Hello everyone,
I’m an undergraduate linguistics student currently studying Computational Linguistics and NLP. I live in Brazil and I plan to work with endangered languages in my area.
I’m researching methods for building language models of uncatalogued languages, or of languages with very little available data. I also plan to visit one of the communities that speak these languages to collect data, but that is still far in the future.
Finally, I’m looking for any dataset in a language that has not been modeled yet (my working criterion is that it is not in Google Translate), or in an endangered language. Any suggestion or comment is welcome.
Thanks for taking the time to read this and help me.
P.S.: I’m not an expert, just a student trying to do some research that can help my community.
submitted by /u/Pinguindiniz