There is very little Irish-language material that comes with text, audio and an English translation together. One of the best sources is this soap opera.
It is fairly easy to find the URL of the subtitles by hand when you are on the episode's webpage.
But the VTT URL contains UUIDs that look pretty random, so I can't just work out the URLs for the other episodes.
There are subtitle archive sites, but this soap opera is not on them. So how would you extract a few hundred VTT files? (I want to build NLP datasets, n-grams etc., not make money or anything.)
I can imagine answers like:
With this site you can hire someone, and if you show them the steps they can extract the files for you cheaply.
With this mouse emulator you can do it by XYZ.
There is a way around the UUIDs being random by XYZ.
But I do not know how any of these would actually work.
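For context, the processing side is the part I do know how to do. Here is a minimal sketch of what I would run once the files are downloaded, assuming they are ordinary WebVTT. The names `vtt_to_lines` and `ngrams` and the sample text are made up for illustration, and none of this helps with finding the URLs.

```python
import re
from collections import Counter

TAG = re.compile(r"<[^>]+>")                 # inline cue tags such as <i> or <v Name>
TOKEN = re.compile(r"\w+(?:['’-]\w+)*")      # Unicode-aware words, keeps á, é, í, ó, ú

def vtt_to_lines(vtt_text):
    """Return only the subtitle text lines of a WebVTT document."""
    lines = []
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        rows = block.split("\n")
        # Skip the file header and any NOTE / STYLE / REGION blocks.
        if rows[0].startswith(("WEBVTT", "NOTE", "STYLE", "REGION")):
            continue
        for i, row in enumerate(rows):
            if "-->" in row:                 # timing row; cue text follows it
                for text in rows[i + 1:]:
                    text = TAG.sub("", text).strip()
                    if text:
                        lines.append(text)
                break
    return lines

def ngrams(lines, n=2):
    """Count word n-grams per subtitle line (no case folding, no cross-line n-grams)."""
    counts = Counter()
    for line in lines:
        tokens = TOKEN.findall(line)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

# Made-up sample in the shape of a subtitle file, not real data from the show.
SAMPLE = """WEBVTT

1
00:00:01.000 --> 00:00:03.000
Dia dhuit, a Sheáin.

2
00:00:03.500 --> 00:00:05.000
<i>Dia dhuit</i> arís.
"""

if __name__ == "__main__":
    print(ngrams(vtt_to_lines(SAMPLE)).most_common(1))
```

N-grams are counted within each subtitle line only, so a phrase split across two cues is missed. That is a simplification I would probably revisit.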
submitted by /u/cavedave