I Can Scrape/Aggregate Pretty Much Any Fragmented Public Data. What Datasets Are Missing?

I built a large-scale scraping system that can extract data from thousands of sources simultaneously, bypass anti-bot protection, and convert unstructured formats (PDFs, scanned docs, complex HTML) into clean structured datasets.

What public datasets should exist but don’t because:

• Data is scattered across too many jurisdictions (every state/county has their own portal)
• No one has aggregated it yet
• It’s in PDFs or hard-to-parse formats
• Sites actively block automated access
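To make the "scattered across jurisdictions" problem concrete, here is a rough sketch of the aggregation step: mapping records from different portals onto one shared schema and dropping duplicates. Everything in it is made up for illustration (the `Permit` record type, the `county_a`/`county_b` portals, and their field names and date formats are assumptions, not real sources), and it only covers the merging step, not the fetching or parsing.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Permit:
    """Hypothetical unified record; a real schema would have more fields."""
    jurisdiction: str
    permit_id: str
    issued: date


# Hypothetical per-portal mappings: each portal names its fields
# and formats its dates differently.
FIELD_MAPS = {
    "county_a": {"id": "PermitNo", "issued": "IssueDate", "fmt": "%m/%d/%Y"},
    "county_b": {"id": "permit_number", "issued": "date_issued", "fmt": "%Y-%m-%d"},
}


def normalize(jurisdiction: str, raw: dict) -> Permit:
    """Convert one raw portal row into the shared Permit schema."""
    mapping = FIELD_MAPS[jurisdiction]
    return Permit(
        jurisdiction=jurisdiction,
        permit_id=str(raw[mapping["id"]]).strip(),
        issued=datetime.strptime(raw[mapping["issued"]].strip(), mapping["fmt"]).date(),
    )


def aggregate(sources: dict) -> list:
    """Normalize rows from every portal and drop duplicates by (jurisdiction, id)."""
    seen = set()
    merged = []
    for jurisdiction, rows in sources.items():
        for raw in rows:
            record = normalize(jurisdiction, raw)
            key = (record.jurisdiction, record.permit_id)
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged
```

The per-portal mapping table is the part that grows with every new jurisdiction, which is a large part of why datasets like this tend not to get aggregated.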

I'm not looking to sell anything. I'm genuinely trying to understand what public data would be valuable if someone aggregated it. If there’s demand, I might build and release it.

submitted by /u/Sufficient-War-4020