{"id":39037,"date":"2026-02-17T18:27:07","date_gmt":"2026-02-17T17:27:07","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/self-promotion-dataset-search-for-kaggle-huggingface\/"},"modified":"2026-02-17T18:27:07","modified_gmt":"2026-02-17T17:27:07","slug":"self-promotion-dataset-search-for-kaggle-huggingface","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/self-promotion-dataset-search-for-kaggle-huggingface\/","title":{"rendered":"[self-promotion] Dataset Search For Kaggle &amp; Huggingface"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>We made a tool for searching datasets and calculating their influence on model capabilities. It uses second-order loss functions, which makes the solution tractable across model architectures. It can be applied irrespective of domain and has already helped improve several models trained near convergence, as well as more basic use cases.<\/p>\n<p>The influence scores can be used to prioritize datasets for training. 
You can benchmark the search results in the app.<br \/> The research is based on peer-reviewed work.<br \/> We started with Huggingface and added Kaggle support this weekend.<\/p>\n<p>I am looking for feedback and potential improvements.<\/p>\n<p><a href=\"https:\/\/durinn-concept-explorer.azurewebsites.net\/\">https:\/\/durinn-concept-explorer.azurewebsites.net\/<\/a><\/p>\n<p>Currently, only causal LM models are supported, but we have research demonstrating good results for multimodal support.<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/New-Mathematician645\"> \/u\/New-Mathematician645 <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1r7bp2h\/selfpromotion_dataset_search_for_kaggle\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1r7bp2h\/selfpromotion_dataset_search_for_kaggle\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>We made a tool for searching datasets and calculating their influence on model capabilities. 
It uses second-order loss&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-39037","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/39037","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=39037"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/39037\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=39037"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=39037"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=39037"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}