Hello,
I have been working in the field of lip reading for a while now. The dataset currently has about 100,000 videos, each with a YouTube ID and the start and end time of the clip. Since the actual video clips from YouTube cannot be shared, I am working to reduce the friction in using the dataset by adding download scripts and, in the near future, the actual transcripts.
I have transcripts ready for about 80,000 videos. The rest are yet to be made, but since the dataset is constantly expanding (150,000-ish by end of day), the transcripts will lag behind until I am done with the actual videos.
I am also trying to figure out how to avoid getting rate-limited when downloading the videos from YouTube with yt-dlp. If anyone knows, please enlighten me 🙂.
My core aim is to make this a standard dataset like LRS2, LRW, LRS3, etc.
I will soon add a commercial subset to the dataset, made from YouTube videos that specifically allow commercial use, so if someone wants to build a hardware product on it and bring it to market, they can wholeheartedly do so :D.
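For context on the rate-limiting question, here is a minimal sketch of one commonly tried mitigation: throttling yt-dlp itself and backing off between retries. It only builds the command line and does not run it. The function names `build_clip_command` and `backoff_delay`, the output directory, and all the numeric values are assumptions for illustration, not part of the dataset's scripts, and there is no guarantee these settings avoid YouTube's limits. Cutting a time range with `--download-sections` also needs ffmpeg installed.

```python
def build_clip_command(video_id, start, end, out_dir="clips"):
    """Build a yt-dlp argument list that fetches only [start, end] seconds of one video.

    All throttle values below are illustrative guesses, not tuned settings.
    """
    return [
        "yt-dlp",
        # "*" prefix means a time range rather than a chapter name
        "--download-sections", f"*{start}-{end}",
        # pause between metadata/extraction requests (seconds)
        "--sleep-requests", "1",
        # random pause of 5..15 seconds before each download
        "--sleep-interval", "5",
        "--max-sleep-interval", "15",
        # cap bandwidth per download
        "--limit-rate", "2M",
        "--retries", "5",
        # include the time range in the file name so clips from the
        # same video do not overwrite each other
        "-o", f"{out_dir}/%(id)s_{start}_{end}.%(ext)s",
        f"https://www.youtube.com/watch?v={video_id}",
    ]


def backoff_delay(attempt, base=30, cap=1800):
    """Seconds to wait before retry number `attempt` (0-based), doubling up to `cap`."""
    return min(cap, base * 2 ** attempt)


# "abc123" is a placeholder ID, not a real video
cmd = build_clip_command("abc123", 10, 20)
```

The resulting list can be passed to `subprocess.run(cmd)`; if a run fails with a rate-limit error, sleeping for `backoff_delay(attempt)` seconds before the next try keeps the retry rate from making things worse.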
That’s mostly it.
Have a look at the dataset if you would like to 😀
huggingface.co/datasets/Rizul2159/WildVid-LIP
There isn't much on it right now, just a CSV file with 115k videos with their IDs and timestamps, but soon there will be a lot more than that.
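As a rough idea of how the CSV could be consumed, here is a small sketch that reads clip rows into tuples. The column names `youtube_id`, `start_time`, and `end_time` are hypothetical, as is the helper `load_clips`; check the actual header of the file on Hugging Face and adjust accordingly. The sample rows are made up.

```python
import csv
import io


def load_clips(csv_file):
    """Return (video_id, start, end) tuples, skipping rows with an empty or reversed range.

    Column names are assumed for illustration and may differ in the real file.
    """
    clips = []
    for row in csv.DictReader(csv_file):
        start = float(row["start_time"])
        end = float(row["end_time"])
        if end > start:
            clips.append((row["youtube_id"], start, end))
    return clips


# Made-up rows standing in for the real CSV
sample = io.StringIO(
    "youtube_id,start_time,end_time\n"
    "abc123,10.0,14.5\n"
    "xyz789,3.0,3.0\n"
)
clips = load_clips(sample)
```

With a real file, `load_clips(open(path, newline=""))` would work the same way, where `path` points at the downloaded CSV.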
submitted by /u/Historical_Pin1429