{"id":40956,"date":"2026-05-13T20:27:07","date_gmt":"2026-05-13T18:27:07","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/self-promotion-free-20-record-samples-csv-json-of-20-dev-ai-datasets-npm-mcp-servers-huggingface-models-homebrew-etc\/"},"modified":"2026-05-13T20:27:07","modified_gmt":"2026-05-13T18:27:07","slug":"self-promotion-free-20-record-samples-csv-json-of-20-dev-ai-datasets-npm-mcp-servers-huggingface-models-homebrew-etc","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/self-promotion-free-20-record-samples-csv-json-of-20-dev-ai-datasets-npm-mcp-servers-huggingface-models-homebrew-etc\/","title":{"rendered":"[self-promotion] Free 20-record Samples (CSV + JSON) Of 20 Dev\/AI Datasets \u2014 Npm, MCP Servers, HuggingFace Models, Homebrew, Etc."},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>Hi <a href=\"https:\/\/www.reddit.com\/r\/datasets\">r\/datasets<\/a> \u2014 disclosure first: I sell a paid version of these on Gumroad ($34 launch price, 83% off the $198 list price). I&#8217;m posting the free 20-record samples here because they&#8217;re genuinely useful on their own and the mod rules ask that self-promotion be labeled.<\/p>\n<p>What&#8217;s in the free samples:<\/p>\n<p>20 niche datasets, each with 20 fully enriched records as CSV + JSON. ~55,000 records total in the paid version (54,958 as of today). 
Topics:<\/p>\n<ul>\n<li>ai-tools, ai-agents, ai-prompts, ai-models-pricing (13 paid Llama 3.3 70B providers compared)<\/li>\n<li>public-apis, mcp-servers (2,971), developer-tools, vscode-extensions<\/li>\n<li>self-hosted-software, open-source-alternatives, no-code-lowcode<\/li>\n<li>design-resources, cybersecurity-tools<\/li>\n<li>npm-packages (top by weekly downloads), homebrew-formulae<\/li>\n<li>huggingface-models (top 4,000 by downloads), huggingface-datasets (2,600+)<\/li>\n<li>vector-db \/ RAG ecosystem, ai-agent-frameworks (1,324 records \u2014 grew 6.6x in 8 days)<\/li>\n<\/ul>\n<p>Why I built them:<\/p>\n<p>I kept needing structured, queryable lists of &#8220;all the X tools&#8221; for filterable directory builds. Awesome-lists and READMEs are great for browsing but useless as input to jq, SQL, or search infrastructure. So I curate, normalize, validate (zero invalid records), enrich with stars\/downloads\/installs, and refresh.<\/p>\n<p>Per-record fields are typed \u2014 categorizationTier rates each record 87-100% specific (vs. vague &#8220;tool&#8221; labels). Open question for the sub: how do you handle tier-of-specificity in your own dataset categorization work? My current rubric is config-driven per dataset, but I&#8217;m curious what others do.<\/p>\n<p>Free samples (CSV + JSON, MIT-style permissive license): <a href=\"https:\/\/github.com\/futdevpro\/niche-datasets-free\">https:\/\/github.com\/futdevpro\/niche-datasets-free<\/a><\/p>\n<p>Includes mega-sample.json (5 random records from each of the 20 datasets, 100 records total).<\/p>\n<p>Paid version on Gumroad \u2014 $34 launch price (83% off $198 list), monthly refresh on AI Models Pricing because OpenRouter changes weekly, quarterly on the rest. 
Linked from the GitHub README if anyone wants the full thing.<\/p>\n<p>Happy to answer questions about the catalog, methodology, or specific datasets.<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/Jhonny_Ronnie\"> \/u\/Jhonny_Ronnie <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1tc7907\/selfpromotion_free_20record_samples_csv_json_of\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1tc7907\/selfpromotion_free_20record_samples_csv_json_of\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>Hi r\/datasets \u2014 disclosure first: I sell a paid version of these on Gumroad ($34, 83% 
off&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-40956","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/40956","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=40956"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/40956\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=40956"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=40956"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=40956"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}