I’m trying to get better at handling real-world data drift, not just loading clean CSVs once.
Are there public datasets where:
- Fields get added/removed over time
- Data types quietly change
- Nulls suddenly spike for no obvious reason
Basically datasets that force you to add validation and monitoring instead of assuming everything stays the same.
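To make that concrete, here is a rough sketch of the kind of check I mean, covering the three cases above. It uses only the standard library and compares two batches of records. The names `snapshot_profile` and `diff_profiles` and the `null_jump` threshold are made up for illustration, not taken from any library, and treating a missing key as a null is an assumption.

```python
from typing import Any


def snapshot_profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarise one batch: per field, the non-null type names seen and the null rate.

    Assumption: a key that is missing from a row counts as a null for that field.
    """
    fields = {key for row in rows for key in row}
    profile: dict[str, dict[str, Any]] = {}
    for field in sorted(fields):
        values = [row.get(field) for row in rows]
        nulls = sum(v is None for v in values)
        profile[field] = {
            "types": {type(v).__name__ for v in values if v is not None},
            "null_rate": nulls / len(rows) if rows else 0.0,
        }
    return profile


def diff_profiles(
    old: dict[str, dict[str, Any]],
    new: dict[str, dict[str, Any]],
    null_jump: float = 0.2,  # hypothetical threshold, tune per dataset
) -> list[str]:
    """Return human-readable findings for schema, type, and null-rate drift."""
    findings: list[str] = []
    for field in sorted(old.keys() - new.keys()):
        findings.append(f"removed field: {field}")
    for field in sorted(new.keys() - old.keys()):
        findings.append(f"added field: {field}")
    for field in sorted(old.keys() & new.keys()):
        old_types, new_types = old[field]["types"], new[field]["types"]
        if old_types != new_types:
            findings.append(
                f"type change in {field}: {sorted(old_types)} -> {sorted(new_types)}"
            )
        increase = new[field]["null_rate"] - old[field]["null_rate"]
        if increase > null_jump:
            findings.append(f"null spike in {field}: +{increase:.0%}")
    return findings
```

Usage would be something like `diff_profiles(snapshot_profile(last_week), snapshot_profile(this_week))`, where both arguments are lists of dicts loaded from two pulls of the same feed. What I'm after is data where a check like this actually fires.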
I’m less interested in size and more in realism.
APIs, government feeds, or long-running open datasets all welcome.
Would love examples + what broke for you when you used them.
submitted by /u/crowpng