{"id":31308,"date":"2024-11-04T19:27:10","date_gmt":"2024-11-04T18:27:10","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/dataset-introducing-k2q-a-diverse-prompt-response-dataset-for-information-extraction-from-documents\/"},"modified":"2024-11-04T19:27:10","modified_gmt":"2024-11-04T18:27:10","slug":"dataset-introducing-k2q-a-diverse-prompt-response-dataset-for-information-extraction-from-documents","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/dataset-introducing-k2q-a-diverse-prompt-response-dataset-for-information-extraction-from-documents\/","title":{"rendered":"[Dataset] Introducing K2Q: A Diverse Prompt-Response Dataset For Information Extraction From Documents"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>Hey <a href=\"https:\/\/www.reddit.com\/r\/Datasets\">r\/Datasets<\/a>! We\u2019re excited to announce K2Q, a newly curated dataset collection for anyone working with visually rich documents and large language models (LLMs) in document understanding. If you want to push the boundaries on how models handle complex, natural prompt-response queries, K2Q could be the dataset you&#8217;ve been looking for! The paper can be found <a href=\"https:\/\/arxiv.org\/abs\/2410.15484\">here<\/a> and is accepted to the Empirical Methods in Natural Language Processing (EMNLP) Conference.<\/p>\n<p><strong>What\u2019s K2Q All About?<\/strong><\/p>\n<p>As LLMs continue to expand into document understanding, the need for prompt-based datasets is growing fast. Most existing datasets rely on basic templates like &#8220;What is the value for {key}?&#8221;, which don\u2019t fully reflect the varied, nuanced questions encountered in real-world use. K2Q steps in to fill this gap by:<\/p>\n<p>  Converting five Key Information Extraction (KIE) datasets into a diverse, prompt-response format with multi-entity, extractive, and boolean questions. Using bespoke templates that better capture the types of prompts LLMs face in real applications.  <\/p>\n<p><strong>Why Use K2Q?<\/strong><\/p>\n<p>Our empirical studies on generative models show that K2Q\u2019s diversity significantly boosts model robustness and performance compared to simpler, template-based datasets.<\/p>\n<p><strong>Who Can Benefit from K2Q?<\/strong><\/p>\n<p>Researchers and practitioners can use K2Q to:<\/p>\n<p>  Test zero-shot or fine-tuned models with realistic, challenging questions. Improve model performance on KIE tasks through diverse prompt-response training. Contribute to future studies on data quality for generative model training.  <\/p>\n<p>\ud83d\udcc4 Dataset &amp; Paper: K2Q will be presented at the Findings of EMNLP, so feel free to dive into our paper for in-depth analyses and results! 
**Why Use K2Q?**

Our empirical studies on generative models show that K2Q's diversity significantly boosts model robustness and performance compared to simpler, template-based datasets.

**Who Can Benefit from K2Q?**

Researchers and practitioners can use K2Q to:

- Test zero-shot or fine-tuned models with realistic, challenging questions.
- Improve model performance on KIE tasks through diverse prompt-response training.
- Contribute to future studies on data quality for generative model training.

📄 Dataset & Paper: K2Q will be presented at the Findings of EMNLP, so feel free to dive into our paper for in-depth analyses and results! We'd love to see K2Q inspire your own projects and findings in Document AI.

submitted by [/u/blisferatu](https://www.reddit.com/user/blisferatu) · [link](https://www.reddit.com/r/datasets/comments/1gjjy3v/dataset_introducing_k2q_a_diverse_promptresponse/)