Hey everyone,
I’m currently doing an internship with a team of 6, and we’re working on a data engineering project focused on big data. The goal is to build a system that processes real-time streaming bank transactions using Kafka, with an added focus on fraud detection and prediction.
Right now, we’re struggling with one main issue: where can we find large-scale, real-time (or realistically simulated) financial transaction data?
Most datasets we’ve found so far are static and not really suitable for real-time streaming or fraud detection scenarios.
If anyone has recommendations—whether it’s datasets, APIs, synthetic data generators, or even approaches to simulate streaming financial data for fraud detection—we’d really appreciate the help.
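To make the "simulate streaming financial data" option concrete, here is a minimal Python sketch of what such a generator could look like. Everything in it is an assumption for illustration: the function name `simulate_transactions`, the field names, the 2% fraud rate, the arrival rate, and the idea that fraudulent amounts skew larger are all made up and not taken from any real dataset.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def simulate_transactions(n, fraud_rate=0.02, seed=0):
    """Yield n synthetic transactions as JSON strings (all fields are hypothetical)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for i in range(n):
        # Exponential gaps between events give a Poisson-like arrival
        # pattern, here averaging about 5 transactions per second.
        ts += timedelta(seconds=rng.expovariate(5.0))
        is_fraud = rng.random() < fraud_rate
        # Assumption: fraudulent transactions skew toward larger amounts.
        amount = rng.lognormvariate(6.0 if is_fraud else 3.5, 1.0)
        yield json.dumps({
            "transaction_id": i,
            "account_id": rng.randrange(1000),
            "amount": round(amount, 2),
            "timestamp": ts.isoformat(),
            "is_fraud": is_fraud,  # ground-truth label for evaluating a detector
        })


if __name__ == "__main__":
    for message in simulate_transactions(5):
        print(message)
```

Each JSON string could then be handed to whatever Kafka producer client the project uses, with a sleep between sends if wall-clock pacing matters. A generator like this only produces labelled toy data, so it does not replace a realistic dataset.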
Thanks in advance!
submitted by /u/No-Big-4463