I can scrape/aggregate pretty much any fragmented public data. What datasets are missing?

I built a large-scale scraping system that can extract data from thousands of sources simultaneously, bypass anti-bot protection, and convert unstructured formats (PDFs, scanned docs, complex HTML) into clean structured datasets.

What public datasets should exist but don’t because:

• Data is scattered across too many jurisdictions (every state/county has its own portal)
• No one has aggregated it yet
• It’s in PDFs or hard-to-parse formats
• Sites actively block automated access

I’m not looking to sell; I’m genuinely trying to understand what public data would be valuable if someone aggregated it. If there’s demand, I might build and release it.

submitted by /u/Sufficient-War-4020