Code Dataset From GitHub’s Top-Ranked Developers (1.3M+ Source Code Files)

I curated 1.3M+ source code files from GitHub’s top-ranked developers of all time and compiled them into a dataset for training LLMs to write well-structured, production-grade code.

The dataset covers 80+ languages, including Python, TypeScript, Rust, Go, and C/C++.

submitted by /u/Ok_Employee_6418