Hi guys!
We have collected a multilingual corpus of social media messages paired with their coordinates. The dataset covers the 123 most populous regions of the world: ~500,000 messages with coordinates, stored in a separate JSON file for each region. It is suited to tasks such as geotagging text. Use it and share your opinion 🤗
PS: We also have a similar dataset with timestamps. Let me know if you need it 👾
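For anyone who wants to try the per-region JSON files, here is a rough sketch of how they might be loaded into (text, coordinates) pairs for a geotagging experiment. The post does not spell out the schema, so everything here is an assumption: that each file holds a JSON array of objects, and that the keys are called `text`, `lat`, and `lon`. The function names `load_region` and `load_corpus` are made up for illustration. Adjust them to whatever the real files contain.

```python
import json
from pathlib import Path


def load_region(path, text_key="text", lat_key="lat", lon_key="lon"):
    """Load one region file into a list of (text, (lat, lon)) pairs.

    Assumption: the file is a JSON array of objects. The key names are
    hypothetical and must be changed to match the real schema.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)

    pairs = []
    for rec in records:
        # Skip records that lack any of the assumed fields.
        if text_key in rec and lat_key in rec and lon_key in rec:
            coords = (float(rec[lat_key]), float(rec[lon_key]))
            pairs.append((rec[text_key], coords))
    return pairs


def load_corpus(directory):
    """Map each region name (taken from the file name) to its pairs."""
    files = sorted(Path(directory).glob("*.json"))
    return {p.stem: load_region(p) for p in files}
```

Calling `load_corpus("path/to/dataset")` would then give a dict keyed by file name without the extension, which is handy if the region is used as a classification label.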
submitted by /u/robvbar