Hey everyone 👋 I’m currently working on my final-year engineering project on disease prediction using machine learning.
Since real medical datasets are hard to find, I decided to generate synthetic data for training and testing my model. Some people told me that’s not a good idea: it might hurt my model’s accuracy or even look bad on my resume.
But my main goal is to learn the entire ML workflow, from preprocessing to model building and evaluation.
So I wanted to ask:
👉 Will using synthetic data affect my model’s performance or generalization?
👉 Does it look bad on a resume or during interviews if I mention that I used synthetic data?
👉 Any suggestions to make my project more authentic or practical despite using synthetic data?
Would really appreciate honest opinions or experiences from others who’ve been in the same situation 🙌
submitted by /u/shrinivas-2003