All I can find are audio files of single spoken words. So far I’ve found Meta’s MMCSG dataset, but its conversations are only between two people. I’m artificially adding noise to it, but I need more data.
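A common way to add noise artificially is to scale a noise clip so the mixture hits a chosen signal-to-noise ratio. Because SNR in dB is 10·log10(P_clean / P_noise), the noise gain k must satisfy P_clean / (k²·P_noise) = 10^(SNR/10). The sketch below shows only that arithmetic. It assumes both signals are equal-length lists of float samples at the same sample rate. The helper names (`power`, `mix_at_snr`, `measured_snr_db`) are hypothetical, and it is not how the MMCSG data or any particular augmentation library does it.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Hypothetical helper: add `noise` to `clean`, scaled to reach `snr_db`.

    Assumes equal-length float sample sequences at the same sample rate.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = power(clean)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # Gain k such that p_clean / (k**2 * p_noise) == 10 ** (snr_db / 10).
    k = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + k * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, mixed):
    """SNR of `mixed` relative to `clean`, treating the difference as noise."""
    residual = [m - c for c, m in zip(clean, mixed)]
    return 10 * math.log10(power(clean) / power(residual))


if __name__ == "__main__":
    sr = 16_000  # assumed sample rate (Hz)
    # One second of a 440 Hz tone stands in for speech here.
    clean = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    rng = random.Random(0)  # seeded so the run is reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]
    mixed = mix_at_snr(clean, noise, snr_db=10.0)
    print(round(measured_snr_db(clean, mixed), 2))
```

Running it should print a value very close to 10.0. With real recordings you would load the speech and a recorded noise clip (trimmed or looped to the same length) instead of the synthetic tone and white noise.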
(I know I can generate transcriptions with Whisper, but the results are hit or miss, even with the large models. I’m not looking to retrain Whisper; I’m working on an entirely different concept.)
submitted by /u/vardonir