I need to make a dataset like this with 100 videos. Is there an open-source tool or model that would help?
I tried CVAT; it was reliable but time-consuming. I also tried this solution, which uses Qwen.
References: The dataset I’m trying to replicate: VideoChat_OpenGV
submitted by /u/Powerful_Solution474