Seeking Audio Data For Multilingual Project – 1000 Hours Needed In Various Languages

I hope you guys are doing well. I’m in need of audio data in several languages. Specifically, I’m looking for 1000 hours of data in each of the following languages:

Australian English, Czech, Danish, Finnish, Hungarian, Portuguese, Romanian, Norwegian, Bulgarian, Croatian, Serbian, Iranian Persian, Swedish, Indonesian, Chinese (Taiwan), Chinese (Hong Kong), Tamil, Japanese

The audio data needs to meet the following specifications:
– Audio file format: WAV, 16-bit, 16 kHz or higher (or any sample rate), 1 or 2 channels
– Duration: minimum 5 minutes and maximum 7 minutes (if other ranges are available, please provide samples and pricing)
– Transcription file format: JSON or any other suitable format
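
For anyone preparing samples, here is a rough sketch of how a file could be checked against these specs before sending it. It assumes Python with only the standard-library `wave` module, and it reads "16 kHz or higher" as a minimum sample rate. The function name `check_wav` and the constants are made up for illustration and are not part of any existing tool.

```python
import wave

# Assumed reading of the specs in this post (illustrative constants).
MIN_RATE_HZ = 16000          # "16 kHz or higher"
SAMPLE_WIDTH_BYTES = 2       # 16-bit audio = 2 bytes per sample
MIN_SECONDS = 5 * 60         # minimum duration: 5 minutes
MAX_SECONDS = 7 * 60         # maximum duration: 7 minutes


def check_wav(path):
    """Return a list of problems; an empty list means the file fits the specs."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()   # bytes per sample
        rate = wav.getframerate()           # samples per second
        frames = wav.getnframes()           # total frames in the file

    duration = frames / rate                # length in seconds

    problems = []
    if sample_width != SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth is {sample_width * 8}-bit, expected 16-bit")
    if rate < MIN_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected at least {MIN_RATE_HZ} Hz")
    if channels not in (1, 2):
        problems.append(f"{channels} channels, expected 1 or 2")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration is {duration:.1f} s, expected {MIN_SECONDS}-{MAX_SECONDS} s")
    return problems
```

Calling `check_wav("sample.wav")` on a hypothetical file returns an empty list when everything matches, or one short message per mismatch. Files outside the 5 to 7 minute range are still welcome as noted above, so a duration message is only a flag, not a rejection.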

Additionally, if you have transcribed files of the same audio data, please provide samples of those as well.

We will be using the data to train an LLM to recognize events in text, and we will also require validation along with it.

If you have any leads, suggestions, or if you can provide the data yourself, please comment below or send me a direct message. Your assistance would be greatly appreciated.

Thank you in advance for your help!

submitted by /u/Disastrous_Piano7831