- I’m packaging a dataset for ML training and want to do this “properly.”
- What fields make you trust a dataset fast? (license, data lineage, schema, label definitions, splits, leakage checks, etc.)
- Any examples of dataset cards/docs you consider “gold standard”? (Keep it discussion + best practices; avoid sales. r/datasets discourages low-effort requests and prefers original sources.)
submitted by /u/JayPatel24_
[link] [comments]