{"id":41549,"date":"2026-06-28T20:27:31","date_gmt":"2026-06-28T18:27:31","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/i-pulled-data-from-1-5-million-us-websites-what-data-would-you-want-to-know\/"},"modified":"2026-06-28T20:27:31","modified_gmt":"2026-06-28T18:27:31","slug":"i-pulled-data-from-1-5-million-us-websites-what-data-would-you-want-to-know","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/i-pulled-data-from-1-5-million-us-websites-what-data-would-you-want-to-know\/","title":{"rendered":"I Pulled Data From 1.5 Million US Websites &#8211; What Data Would You Want To Know?"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>Started out with a question: how do I spend $300 in free GCP credits, and how much could I do with them? I began by figuring out how to query the HTTP Archive, pulling CrUX data to correlate sites, and learning a bit about BigQuery along the way. I started from ~12 million total sites, pared that down to 1.5 million that I could verify were live and that had enough data to classify\/categorize, and then built a front end to access the highlights.<\/p>\n<p>So far, I&#8217;ve been focused on identifying key business segments with missed opportunities, classic one-click misses, and some schema mapping for business type, while wondering why in the world any sane business owner would use Weebly.<\/p>\n<p>What would YOU want to know?<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/gillygangopolus\"> \/u\/gillygangopolus <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1ui4dqp\/i_pulled_data_from_15_million_us_websites_what\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1ui4dqp\/i_pulled_data_from_15_million_us_websites_what\/\">[comments]<\/a><\/span><\/p>","protected":false},
"excerpt":{"rendered":"<p>Started out with a question: how do I spend $300 in free GCP credits, and how much&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-41549","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/41549","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=41549"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/41549\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=41549"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=41549"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=41549"}],"curies":[{"name":"wp","hr
ef":"https:\/\/api.w.org\/{rel}","templated":true}]}}