{"id":21458,"date":"2023-08-06T21:27:21","date_gmt":"2023-08-06T19:27:21","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/how-to-improve-dataset-quality-for-a-machine-learning-forecast-project\/"},"modified":"2023-08-06T21:27:21","modified_gmt":"2023-08-06T19:27:21","slug":"how-to-improve-dataset-quality-for-a-machine-learning-forecast-project","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/how-to-improve-dataset-quality-for-a-machine-learning-forecast-project\/","title":{"rendered":"How To Improve Dataset Quality For A Machine Learning Forecast Project"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>I have a dataset composed by IT ticket logs from 2020 to 2023. I have structured the columns as it follows: day, month, year, holiday(0 if its not a holiday and 1 if it is) name of the day(1 to 7), hour of the day(0 to 23), bank campaign (just for July and December, bonus and finally the number of tickets per day and hour. When I organize the logs only by date, the dataset is composed by 1014 logs. If I add the hour attribute, the dataset ends with 6000 logs. I want to train ML algorithms (random forest and lstm) to forecast the number of IT tickets for a certain time (hour) and date but my metrics are underperforming. I\u2019d like to know if there\u2019s a way to improve my metrics? Could it be related to the algorithms? 
How could I improve the quality of my dataset (if that\u2019s even possible)?<\/p>\n<p>Thanks in advance for your help!<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/CheisonVS\"> \/u\/CheisonVS <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/15jx5fe\/how_to_improve_dataset_quality_for_a_machine\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/15jx5fe\/how_to_improve_dataset_quality_for_a_machine\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>I have a dataset composed of IT ticket logs from 2020 to 2023. 
I have structured the&#8230;<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-21458","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/21458","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=21458"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/21458\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=21458"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=21458"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=21458"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}