“Building a Large Japanese Web Corpus for Large Language Models”, Okazaki et al. 2024 (312B characters), submitted by /u/gwern