I’ve been looking into first-person (egocentric) video datasets for activity recognition and multimodal learning research.
A few challenges that seem to come up repeatedly are:
Motion blur
Rapid viewpoint changes
Occlusions from hands and objects
Long video sequences
Annotation consistency
For people who have worked with these datasets:
Which datasets have been the most useful?
What limitations did you encounter?
How well do models trained on current datasets generalize to real-world applications?
Are there any newer datasets you’d recommend exploring?
I’d appreciate hearing about experiences from both research and production environments.
submitted by /u/Vane1st