{"id":35489,"date":"2025-09-16T15:27:08","date_gmt":"2025-09-16T13:27:08","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/offer-free-custom-synthetic-dataset-generation-seeking-feedback-partners-for-open-source-tool\/"},"modified":"2025-09-16T15:27:08","modified_gmt":"2025-09-16T13:27:08","slug":"offer-free-custom-synthetic-dataset-generation-seeking-feedback-partners-for-open-source-tool","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/offer-free-custom-synthetic-dataset-generation-seeking-feedback-partners-for-open-source-tool\/","title":{"rendered":"[Offer] Free Custom Synthetic Dataset Generation &#8211; Seeking Feedback Partners For Open Source Tool"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>Hi <a href=\"https:\/\/www.reddit.com\/r\/datasets\">r\/datasets<\/a> community!<\/p>\n<p>I&#8217;m the creator of <strong>DeepFabric<\/strong> (<a href=\"https:\/\/github.com\/lukehinds\/deepfabric\">https:\/\/github.com\/lukehinds\/deepfabric<\/a>), an open-source tool that generates synthetic datasets using LLMs and novel approaches leveraging graphs (DAG) and Trees. I&#8217;m looking for collaborators who need custom datasets and are willing to provide feedback on quality and usefulness.<\/p>\n<p><strong>What DeepFabric does:<\/strong> DeepFabric creates diverse, domain-specific synthetic datasets using a unique graph\/tree-based architecture. 
It generates data in OpenAI chat format, with more formats coming, and minimizes redundancy through structured topic generation.<\/p>\n<p><strong>What I&#8217;m offering:<\/strong> I&#8217;ll create custom synthetic datasets tailored to your specific domain or use case, cover all LLM API costs myself, provide technical support and customization, and generate datasets ranging from small proof-of-concepts to larger training sets.<\/p>\n<p><strong>What I&#8217;m looking for:<\/strong> I need detailed feedback on dataset quality, diversity, and usefulness, insights into how well the synthetic data performs for your specific use case, suggestions for improvements or missing features, and optionally a brief case study write-up of your experience.<\/p>\n<p><strong>Ideal collaborators:<\/strong> I&#8217;m particularly interested in researchers or developers working in a professional capacity, people doing model distillation or building evaluation benchmarks, and anyone who needs training data for specialized or niche domains in machine learning \/ statistical analysis, for example people with limited real-world data available. So far I have received really good feedback from a medical professor who needed mock scenarios of someone complaining about symptoms that could signal risk of a heart attack.<\/p>\n<p><strong>Examples of what I can generate:<\/strong> Think Q&amp;A pairs for specific technical domains, conversational data for chatbot training, domain-specific instruction-following datasets, or evaluation benchmarks for specialized tasks. I am also able to convert to whatever format you need.<\/p>\n<p>If you&#8217;re interested, please comment or PM with your domain\/use case, the approximate dataset size needed, a brief description of your intended use, and a timeline if you have one.<\/p>\n<p>I&#8217;ll prioritize collaborations that offer the most learning opportunities for both of us. 
Looking forward to working with some of you!<\/p>\n<p>Some examples: medical Q&amp;A: <a href=\"https:\/\/huggingface.co\/datasets\/lukehinds\/medical_q_and_a\">https:\/\/huggingface.co\/datasets\/lukehinds\/medical_q_and_a<\/a><\/p>\n<p>Programming Challenges: <a href=\"https:\/\/huggingface.co\/datasets\/lukehinds\/programming-challenges-one\">https:\/\/huggingface.co\/datasets\/lukehinds\/programming-challenges-one<\/a><\/p>\n<p><strong>Repository:<\/strong> <a href=\"https:\/\/github.com\/lukehinds\/deepfabric\">https:\/\/github.com\/lukehinds\/deepfabric<\/a><br \/> <strong>Documentation:<\/strong> <a href=\"https:\/\/lukehinds.github.io\/DeepFabric\/synethic\">https:\/\/lukehinds.github.io\/DeepFabric\/synethic<\/a> data<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/DecodeBytes\"> \/u\/DecodeBytes <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1nig99z\/offer_free_custom_synthetic_dataset_generation\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1nig99z\/offer_free_custom_synthetic_dataset_generation\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>Hi r\/datasets community! 
I&#8217;m the creator of DeepFabric (https:\/\/github.com\/lukehinds\/deepfabric), an open-source tool that generates synthetic datasets using&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-35489","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/35489","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=35489"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/35489\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=35489"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=35489"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=35489"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}