{"id":31884,"date":"2024-12-13T12:27:27","date_gmt":"2024-12-13T11:27:27","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/can-we-automate-data-quality-assessment-process-for-small-datasets\/"},"modified":"2024-12-13T12:27:27","modified_gmt":"2024-12-13T11:27:27","slug":"can-we-automate-data-quality-assessment-process-for-small-datasets","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/can-we-automate-data-quality-assessment-process-for-small-datasets\/","title":{"rendered":"Can We Automate Data Quality Assessment Process For Small Datasets?"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>Recently, my friend and I have been thinking of working on a side project (for our portfolios) to automate data quality assessment for small tabular datasets that you often find in kaggle.<\/p>\n<p>We acknowledge that such a tool can&#8217;t be 100% accurate but it can definitely help nontech people and tech people to get started with working on their datasets. We aim to have a platform where the user will upload a dataset, the system will identify anomalies and give suggestions to the user with different ways to fix that anomaly (e.g. imputation of missing value, fixing an email that doesn&#8217;t follow the email pattern, etc).<\/p>\n<p>I would love to discuss the project further and get your thoughts on it. We have been researching similar projects and we found Cocoon, they use proceed column by column, and for each column they have a series of anomalies to fix using an LLM. But we want to have statistical methods for numerical columns, and use LLM only when it&#8217;s needed. Can anyone help?<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/Better_Resource_4765\"> \/u\/Better_Resource_4765 <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1hda0gi\/can_we_automate_data_quality_assessment_process\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1hda0gi\/can_we_automate_data_quality_assessment_process\/\">[comments]<\/a><\/span><\/p><div class='watch-action'><div class='watch-position align-right'><div class='action-like'><a class='lbg-style1 like-31884 jlk' href='javascript:void(0)' data-task='like' data-post_id='31884' data-nonce='65e0e39b87' rel='nofollow'><img class='wti-pixel' src='https:\/\/www.graviton.at\/letterswaplibrary\/wp-content\/plugins\/wti-like-post\/images\/pixel.gif' title='Like' \/><span class='lc-31884 lc'>0<\/span><\/a><\/div><\/div> <div class='status-31884 status align-right'><\/div><\/div><div class='wti-clear'><\/div>","protected":false},"excerpt":{"rendered":"<p>Recently, my friend and I have been thinking of working on a side project (for our portfolios)&#8230;<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-31884","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/31884","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=31884"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/31884\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=31884"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=31884"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=31884"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}