I Pulled Data From 1.5 Million US Websites – What Data Would You Want To Know?

Started out with a question, how do I spend $300 in free GCC credits, and how much could I do with it. I started with figuring out how to query HTTP Archives, pulling CRuX data to correlate sites, and learning a bit about BigQuery along the way. I went from ~12 million total sites and pared that down to 1.5 million that I could verify were live, had enough data to be able to classify/categorize, and then built a front end to access the highlights.

So far, I’ve been focused on identifying key business segments with missing opportunities, classic one click misses, some schema mapping for business type, and wondering why in the world any sane business owner would use Weebly.

What would YOU want to know?

submitted by /u/gillygangopolus
[link] [comments]

Leave a Reply

Your email address will not be published. Required fields are marked *