[Self-promotion][Synthetic] I Built A 100K-row Sleep Health Dataset From Scratch – It Just Earned A Kaggle Silver Medal (7,800 Views, 1,700+ Downloads In 2 Weeks)

A few weeks ago I released a synthetic sleep health dataset on Kaggle and it took off faster than I expected. Sharing it here in case anyone finds it useful.

What’s in it:

– 100,000 records, 32 features, 3 prediction targets

– Sleep architecture: REM %, deep sleep %, latency, wake episodes

– Lifestyle: caffeine, alcohol, screen time, exercise, steps

– Psychological: stress score, chronotype, mental health condition

– Demographics: 12 occupations, 15 countries, ages 18-69

Three ML targets:

– cognitive_performance_score – regression (0–100)

– sleep_disorder_risk – multiclass (Healthy / Mild / Moderate / Severe)

– felt_rested – binary classification
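For anyone wanting to model all three targets, here is a minimal sketch of separating them from the features. The three target names are the ones listed above; the feature names (`stress_score`, `caffeine_mg`) and the two records are made up purely for illustration and may not match the actual column names in the file.

```python
# Target column names are taken from the post; everything else is hypothetical.
TARGETS = ("cognitive_performance_score", "sleep_disorder_risk", "felt_rested")


def split_features_targets(rows):
    """Split each record into a feature dict and one label list per target."""
    features = [
        {key: value for key, value in row.items() if key not in TARGETS}
        for row in rows
    ]
    labels = {target: [row[target] for row in rows] for target in TARGETS}
    return features, labels


# Two made-up records, only to show the shape of the data.
rows = [
    {"stress_score": 7.5, "caffeine_mg": 300,
     "cognitive_performance_score": 58.2,
     "sleep_disorder_risk": "Moderate", "felt_rested": 0},
    {"stress_score": 2.0, "caffeine_mg": 80,
     "cognitive_performance_score": 81.4,
     "sleep_disorder_risk": "Healthy", "felt_rested": 1},
]

features, labels = split_features_targets(rows)
print(features[0])                    # features only, no target columns
print(labels["sleep_disorder_risk"])  # one label per record
```

Keeping the targets out of the feature set matters here because the three targets are probably correlated with each other, so leaving one in as a feature would leak information into the other two models.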

One finding that surprised people:

Lawyers average 5.74 hrs of sleep and 7.3/10 stress. Retired individuals average 8.03 hrs and 2.6/10 stress. That 2.29-hour gap shows up clearly in every model – occupation is the strongest predictor of sleep health in the entire dataset.
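A comparison like this boils down to a per-occupation mean. Below is a rough sketch of that calculation in plain Python. The column names `occupation` and `sleep_duration_hrs` are assumptions, and the four rows are toy values, not figures from the dataset.

```python
from collections import defaultdict


def mean_by_group(rows, group_key, value_key):
    """Return {group: mean of value_key} over a list of dict records."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Toy rows with hypothetical column names; values are invented.
rows = [
    {"occupation": "Lawyer", "sleep_duration_hrs": 5.5},
    {"occupation": "Lawyer", "sleep_duration_hrs": 6.0},
    {"occupation": "Retired", "sleep_duration_hrs": 8.0},
    {"occupation": "Retired", "sleep_duration_hrs": 8.5},
]

means = mean_by_group(rows, "occupation", "sleep_duration_hrs")
gap = means["Retired"] - means["Lawyer"]
print(means)
print(round(gap, 2))
```

On the real file the same function could be fed rows from `csv.DictReader`, after converting the sleep column to `float`.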

All distributions are calibrated against CDC, Sleep Foundation, and Frontiers in Sleep research. Correlations match peer-reviewed values (e.g. stress vs. sleep quality, r = -0.64).
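If you want to check a correlation like that yourself, a minimal Pearson r in plain Python looks roughly like this. The two short lists are invented sample values, not rows from the dataset, so the number they produce says nothing about the real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two deviation norms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Invented sample values: higher stress paired with lower quality.
stress = [2.0, 4.0, 5.0, 7.0, 9.0]
quality = [8.5, 7.0, 7.5, 5.0, 4.5]

r = pearson_r(stress, quality)
print(round(r, 2))  # negative, since quality falls as stress rises
```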

Link in profile if you want to check it out. Happy to answer questions about how it was built.

submitted by /u/Mohan137