A few weeks ago I released a synthetic sleep health dataset on Kaggle and it took off faster than I expected. Sharing it here in case anyone finds it useful.
What’s in it:
– 100,000 records, 32 features, 3 prediction targets
– Sleep architecture: REM %, deep sleep %, latency, wake episodes
– Lifestyle: caffeine, alcohol, screen time, exercise, steps
– Psychological: stress score, chronotype, mental health condition
– Demographics: 12 occupations, 15 countries, ages 18-69
Three ML targets:
– cognitive_performance_score – regression (0–100)
– sleep_disorder_risk – multiclass (Healthy / Mild / Moderate / Severe)
– felt_rested – binary classification
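For anyone setting up the three tasks, here is a minimal sketch of separating the targets from the inputs so none of them leaks into the features. The three target names come from the list above; the feature names and the two demo records are made up for illustration and may not match the real columns.

```python
# Target column names as listed in the post.
TARGETS = ("cognitive_performance_score", "sleep_disorder_risk", "felt_rested")


def split_features_targets(rows):
    """Split each record (a dict) into a features dict and a targets dict."""
    features, targets = [], []
    for row in rows:
        # Everything that is not a target is treated as an input feature.
        features.append({k: v for k, v in row.items() if k not in TARGETS})
        targets.append({k: row[k] for k in TARGETS})
    return features, targets


# Hypothetical records: "stress_score" and "caffeine_mg" are assumed names,
# and the values are invented, not taken from the dataset.
demo_rows = [
    {"stress_score": 7, "caffeine_mg": 200,
     "cognitive_performance_score": 61.5,
     "sleep_disorder_risk": "Moderate", "felt_rested": 0},
    {"stress_score": 3, "caffeine_mg": 80,
     "cognitive_performance_score": 82.0,
     "sleep_disorder_risk": "Healthy", "felt_rested": 1},
]

X, y = split_features_targets(demo_rows)
```

Keeping all three targets out of the inputs matters here because they are likely correlated with each other, so leaving one in would inflate scores on the other two.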
One finding that surprised people:
Lawyers average 5.74 hrs of sleep and 7.3/10 stress. Retired individuals average 8.03 hrs and 2.6/10 stress. That 2.29-hour gap shows up clearly in every model – occupation is the strongest predictor of sleep health in the entire dataset.
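A per-occupation comparison like this can be reproduced with a simple group mean. This is only a sketch: the column names "occupation" and "sleep_duration_hrs" are assumptions, and the four demo rows are invented values, not records from the dataset.

```python
from collections import defaultdict


def mean_by_group(rows, group_key, value_key):
    """Return {group: mean of value_key} over a list of dict records."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for row in rows:
        acc = totals[row[group_key]]
        acc[0] += float(row[value_key])
        acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Invented rows for illustration only.
demo_rows = [
    {"occupation": "Lawyer", "sleep_duration_hrs": 5.5},
    {"occupation": "Lawyer", "sleep_duration_hrs": 6.0},
    {"occupation": "Retired", "sleep_duration_hrs": 8.0},
    {"occupation": "Retired", "sleep_duration_hrs": 8.5},
]

means = mean_by_group(demo_rows, "occupation", "sleep_duration_hrs")
gap = means["Retired"] - means["Lawyer"]
```

With the real file you would pass in rows from csv.DictReader instead of demo_rows.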
All distributions are calibrated against CDC, Sleep Foundation, and Frontiers in Sleep research. Correlations match peer-reviewed values (e.g. stress vs. sleep quality, r = -0.64).
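If you want to check a correlation yourself, a plain Pearson r is enough. The sketch below uses only the standard library. The two demo lists are invented numbers that stand in for a stress column and a sleep quality column, so the result is not expected to match the dataset's value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Invented values for illustration only.
demo_stress = [2, 4, 6, 8]
demo_quality = [8, 6, 7, 3]

r = pearson_r(demo_stress, demo_quality)
```

A negative r means higher stress goes with lower sleep quality, which is the direction the post reports.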
Link in profile if you want to check it out. Happy to answer questions about how it was built.
submitted by /u/Mohan137