{"id":33147,"date":"2025-03-23T21:27:17","date_gmt":"2025-03-23T20:27:17","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/where-do-you-source-your-data-frustrated-with-kaggle-synthetic-data-and-costly-apis\/"},"modified":"2025-03-23T21:27:17","modified_gmt":"2025-03-23T20:27:17","slug":"where-do-you-source-your-data-frustrated-with-kaggle-synthetic-data-and-costly-apis","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/where-do-you-source-your-data-frustrated-with-kaggle-synthetic-data-and-costly-apis\/","title":{"rendered":"Where Do You Source Your Data? Frustrated With Kaggle, Synthetic Data, And Costly APIs"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>I\u2019m trying to build a really impressive machine learning project\u2014something that could compete with projects from people who have actual industry experience and access to high-quality data. But I\u2019m struggling <strong>big time<\/strong> with finding good data.<\/p>\n<p>Most of the usual sources (Kaggle, UCI, OpenML) feel overused, and I want something unique that hasn\u2019t already been analyzed to death. I also really dislike synthetic datasets because they don\u2019t reflect real-world messiness\u2014missing data, biases, or the weird patterns you only see in actual data.<\/p>\n<p>The problem is, <strong>I don\u2019t like web scraping<\/strong>. I know it\u2019s technically legal in many cases, but it still feels kind of sketchy, and I\u2019d rather not deal with potential gray areas. That leaves APIs, but it seems like <strong>every good API wants money<\/strong>, and I really don\u2019t want to pay just to get access to data for a personal project.<\/p>\n<p>For those of you who\u2019ve built standout projects, where do you source your data? Are there any free APIs you\u2019ve found useful? Any creative ways to get good datasets without scraping or paying? 
I\u2019d really appreciate any advice!<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/kobastat121987\"> \/u\/kobastat121987 <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1ji7rlx\/where_do_you_source_your_data_frustrated_with\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1ji7rlx\/where_do_you_source_your_data_frustrated_with\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>I\u2019m trying to build a really impressive machine learning project\u2014something that could compete with projects from 
people&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-33147","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/33147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=33147"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/33147\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=33147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=33147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=33147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}