[Dataset] [30 Trillion Tokens] “HPLT 3.0: Very Large-Scale Multilingual Resources For LLM And MT. Mono- And Bi-lingual Data, Multilingual Evaluation, And Pre-Trained Models”, Oepen et al. 2025

Dataset(s): https://hplt-project.org/datasets/v3.0
Paper: https://arxiv.org/abs/2511.01066

submitted by /u/RecmacfonD