{"id":33477,"date":"2025-04-12T08:27:28","date_gmt":"2025-04-12T06:27:28","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/were-creating-an-open-dataset-to-keep-small-merchants-visible-in-llms-heres-what-weve-released\/"},"modified":"2025-04-12T08:27:28","modified_gmt":"2025-04-12T06:27:28","slug":"were-creating-an-open-dataset-to-keep-small-merchants-visible-in-llms-heres-what-weve-released","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/were-creating-an-open-dataset-to-keep-small-merchants-visible-in-llms-heres-what-weve-released\/","title":{"rendered":"We\u2019re Creating An Open Dataset To Keep Small Merchants Visible In LLMs. Here\u2019s What We\u2019ve Released."},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>Here\u2019s the issue that we see (are we right?):<br \/> There\u2019s no such thing as SEO for AI yet. LLMs like ChatGPT, Claude, and Gemini don\u2019t crawl Shopify the way Google does\u2014and small stores risk becoming invisible while Amazon and Walmart take over the answers.<\/p>\n<p>So we created the <strong>Tokuhn Small Merchant Product Dataset (TSMPD-US)<\/strong>\u2014a structured, clean dataset of U.S. small business products for use in:<\/p>\n<ul>\n<li>LLM grounding<\/li>\n<li>RAG applications<\/li>\n<li>semantic product search<\/li>\n<li>agent training<\/li>\n<li>metadata classification<\/li>\n<\/ul>\n<p><strong>Two free versions are available:<\/strong><\/p>\n<ul>\n<li><strong>Public (TSMPD-US-Public v1.0):<\/strong> ~3.2M products, 10 per merchant, from 355k+ stores. Text only (no images\/variants). \ud83d\udc49 Available on Hugging Face<\/li>\n<li><strong>Partner (by request):<\/strong> 11.9M+ full products, 67M variants, 54M images, source-tracked with merchant URLs and store domains. 
Email <a href=\"mailto:jim@tokuhn.com\">jim@tokuhn.com<\/a> for research or commercial access.<\/li>\n<\/ul>\n<p>We\u2019re not monetizing this. We just don\u2019t want the long tail of commerce to disappear from the future of search.<\/p>\n<p><strong>Call to action:<\/strong><\/p>\n<ul>\n<li>If you work with grounding, agents, or RAG systems: take a look and let us know what\u2019s missing.<\/li>\n<li>If you\u2019re a small merchant, drop your store URL and we\u2019ll include you in the next release.<\/li>\n<li>If you\u2019re training models that should reflect real-world commerce beyond Amazon: we\u2019d love to collaborate.<\/li>\n<\/ul>\n<p>Let\u2019s make sure AI doesn\u2019t erase the 99%.<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/tokuhn_founders\"> \/u\/tokuhn_founders <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1jwv547\/were_creating_an_open_dataset_to_keep_small\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1jwv547\/were_creating_an_open_dataset_to_keep_small\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>Here\u2019s the issue that we see (are we right?): There\u2019s no such thing as SEO for 
AI&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-33477","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/33477","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=33477"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/33477\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=33477"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=33477"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=33477"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}