{"id":27796,"date":"2024-05-27T13:27:34","date_gmt":"2024-05-27T11:27:34","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/automated-dataset-generation-and-augmentation\/"},"modified":"2024-05-27T13:27:34","modified_gmt":"2024-05-27T11:27:34","slug":"automated-dataset-generation-and-augmentation","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/automated-dataset-generation-and-augmentation\/","title":{"rendered":"Automated Dataset Generation And Augmentation"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>Hi guys, I\u2019ve been working on a fine tuned llama3 for quite some time now and want to expand the dataset. Are there any good automated solutions to generate these datasets from pdf or html and can these be augmented automatically? <\/p>\n<p>Thanks so much in advance <\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/OkVegetable2512\"> \/u\/OkVegetable2512 <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1d1oqpx\/automated_dataset_generation_and_augmentation\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1d1oqpx\/automated_dataset_generation_and_augmentation\/\">[comments]<\/a><\/span><\/p><div class='watch-action'><div class='watch-position align-right'><div class='action-like'><a class='lbg-style1 like-27796 jlk' href='javascript:void(0)' data-task='like' data-post_id='27796' data-nonce='614a020375' rel='nofollow'><img class='wti-pixel' src='https:\/\/www.graviton.at\/letterswaplibrary\/wp-content\/plugins\/wti-like-post\/images\/pixel.gif' title='Like' \/><span class='lc-27796 lc'>0<\/span><\/a><\/div><\/div> <div class='status-27796 status align-right'><\/div><\/div><div class='wti-clear'><\/div>","protected":false},"excerpt":{"rendered":"<p>Hi guys, I\u2019ve been working on a fine tuned llama3 for quite some time now and want&#8230;<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-27796","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/27796","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=27796"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/27796\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=27796"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=27796"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=27796"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}