Egocentric-10K: 10,000 Hours of Real Factory Worker Video Just Open-Sourced as Training Data for Next-Gen Robots

Hey r/datasets, if you’re into training AI that actually works in the messy real world, buckle up. An 18-year-old founder just dropped Egocentric-10K, a massive open-source dataset that’s basically a goldmine for embodied AI. What’s in it?

  • 10K+ hours of first-person video from 2,138 factory workers worldwide.
  • 1.08 billion frames at 30 fps and 1080p, captured via head-mounted cams (no staging, pure chaos).
  • Super dense on hand actions like grabbing tools, assembling parts, and troubleshooting, with way better hand visibility than staged lab footage.
  • Total size: 16.4 TB of MP4s + JSON metadata, streamed via Hugging Face for easy access.
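
The headline numbers can be checked against each other with a few lines of arithmetic. This is only a rough sketch: it treats "10K+ hours" as exactly 10,000 hours and assumes 16.4 TB means decimal terabytes, neither of which the post states.

```python
# Sanity-check the published figures against each other.
HOURS = 10_000      # "10K+ hours", treated here as exactly 10,000 (assumption)
FPS = 30            # frames per second, from the post
WORKERS = 2_138     # number of workers, from the post
SIZE_TB = 16.4      # assumed decimal terabytes (1 TB = 10**12 bytes)

seconds = HOURS * 3600
frames = seconds * FPS                         # total frame count
hours_per_worker = HOURS / WORKERS             # mean footage per worker
avg_mbps = SIZE_TB * 1e12 * 8 / seconds / 1e6  # mean video bitrate in Mbit/s

print(f"frames: {frames:,}")
print(f"hours per worker: {hours_per_worker:.2f}")
print(f"average bitrate: {avg_mbps:.2f} Mbit/s")
```

Under those assumptions the frame count comes out to exactly 1.08 billion, each worker contributes a bit under 5 hours on average, and the average bitrate lands somewhere around 3.6 Mbit/s, so the figures are at least consistent with each other.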

Why does this matter? Current robots suck at dynamic tasks because the datasets they train on are tiny or too “perfect.” This one’s raw, large-scale, and licensed under Apache 2.0, so it’s free for researchers to train imitation learning models. Could mean safer factories, smarter home bots, or even AI surgeons that mimic pros. Eddy Xu (Build AI) announced it on X yesterday.

Grab it here: https://huggingface.co/datasets/builddotai/Egocentric-10K
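
If you just want a look before committing to 16.4 TB, here is a minimal sketch of streaming a couple of samples. It assumes the repo loads through the Hugging Face `datasets` library in streaming mode (`pip install datasets`); the helper name `peek` is made up for illustration, and the split names and sample fields are not assumed, so check the dataset card for the actual layout and any access terms.

```python
from itertools import islice

REPO_ID = "builddotai/Egocentric-10K"  # repo id taken from the link above


def peek(n=2):
    """Return the field names of the first n samples without a full download.

    Hypothetical helper; assumes the repo works with datasets' streaming mode.
    """
    # Imported lazily so this file can be read without the library installed.
    from datasets import load_dataset

    # streaming=True yields iterable splits that fetch data on demand.
    splits = load_dataset(REPO_ID, streaming=True)
    first_split = next(iter(splits.values()))  # take whichever split comes first
    return [sorted(sample.keys()) for sample in islice(first_split, n)]
```

Calling `peek()` needs network access, and if the repo asks you to accept terms or log in, do that first. The returned key lists show what each sample actually contains before you write a real loader.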

submitted by /u/NotSuper-man