{"id":33098,"date":"2025-03-19T15:27:42","date_gmt":"2025-03-19T14:27:42","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/dataset-with-10k-50k-products-with-many-attributes\/"},"modified":"2025-03-19T15:27:42","modified_gmt":"2025-03-19T14:27:42","slug":"dataset-with-10k-50k-products-with-many-attributes","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/dataset-with-10k-50k-products-with-many-attributes\/","title":{"rendered":"Dataset With 10k-50k Products With Many Attributes"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>I am doing a master&#8217;s thesis on how large language models compare to other tools when extracting structured data from natural language. Essentially, my goal is to translate something like this:<\/p>\n<p>&#8220;I want Asus laptops with relatively good reviews, at least 16 GB RAM, ideally 16 inch screen. Sort all the results by price and reviews&#8221;<\/p>\n<p>into something like this:<\/p>\n<p>{<\/p>\n<p>&#8220;brand&#8221;: &#8220;Asus&#8221;,<\/p>\n<p>&#8220;category&#8221;: &#8220;Electronics&#8221;,<\/p>\n<p>&#8220;subcategory&#8221;: &#8220;Laptops&#8221;,<\/p>\n<p>&#8220;sort&#8221;: [&#8220;price&#8221;, &#8220;review&#8221;],<\/p>\n<p>&#8220;filters&#8221;: [<\/p>\n<p>{<\/p>\n<p>&#8220;attribute&#8221;: &#8220;ram&#8221;,<\/p>\n<p>&#8220;condition&#8221;: &#8220;greater_than_or_equal&#8221;,<\/p>\n<p>&#8220;value&#8221;: &#8220;16 GB&#8221;,<\/p>\n<p>&#8220;is_hard_condition&#8221;: true<\/p>\n<p>},<\/p>\n<p>{<\/p>\n<p>&#8220;attribute&#8221;: &#8220;screen_size&#8221;,<\/p>\n<p>&#8220;condition&#8221;: &#8220;equal&#8221;,<\/p>\n<p>&#8220;value&#8221;: &#8220;16 inch&#8221;,<\/p>\n<p>&#8220;is_hard_condition&#8221;: false<\/p>\n<p>},<\/p>\n<p>{<\/p>\n<p>&#8220;attribute&#8221;: &#8220;review_rating&#8221;,<\/p>\n<p>&#8220;condition&#8221;: &#8220;greater_than_or_equal&#8221;,<\/p>\n<p>&#8220;value&#8221;: 
&#8220;4&#8221;,<\/p>\n<p>&#8220;is_hard_condition&#8221;: true<\/p>\n<p>}<\/p>\n<p>]<\/p>\n<p>}<\/p>\n<p>using large language models, and to analyze how they compare to more traditional tools.<\/p>\n<p>What I need is a dataset with many products, where each product has at least a category (subcategories would be ideal), a brand, and many attributes that vary by product type. For example, a laptop would have CPU, RAM, screen size, and so on, while a sofa would have very different attributes. A smaller dataset (1k-10k products) would also work. Is there a dataset like this?<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/PurpleYellowLeaf\"> \/u\/PurpleYellowLeaf <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1jeja24\/dataset_with_10k50k_products_with_many_attributes\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1jeja24\/dataset_with_10k50k_products_with_many_attributes\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>I am doing a master&#8217;s thesis on how large language models compare to other tools when 
extracting&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-33098","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/33098","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=33098"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/33098\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=33098"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=33098"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=33098"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}