I’m looking for English, text-only datasets from December 2014 or earlier. Specifically, I’m interested in datasets that cover a broad range of topics, ideally free of spam and low-quality content. I’d like them to come from Twitter, Reddit, Discord, or email.
If anyone knows where I can find this kind of dataset or has access to one, please let me know. Your help is greatly appreciated!
Thanks in advance!
(I’m building an LLM for my game’s dialogue system, and the game is set in 2014.)
submitted by /u/Affectionate-Bird883