This dataset contains:
- GitHub repository embeddings learned from star co-occurrence.
- Raw data for training such embeddings, covering 2016 to 2025.
It is generated by the same pipeline as this repo and is intended for offline analysis, research, and downstream search/indexing.
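For readers unfamiliar with the term, "star co-occurrence" can be pictured as counting how often two repositories are starred by the same user. The snippet below is only a minimal sketch of that idea: the `stars_by_user` mapping, the user names, and the `cooccurrence_counts` helper are hypothetical, and neither the dataset's actual schema nor the pipeline's training method is described here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(stars_by_user):
    """Count how often each unordered pair of repos is starred by the same user.

    `stars_by_user` is a hypothetical mapping of user -> list of starred repos;
    it is not the dataset's real format.
    """
    counts = Counter()
    for repos in stars_by_user.values():
        # Deduplicate and sort so each pair has a single canonical ordering.
        for a, b in combinations(sorted(set(repos)), 2):
            counts[(a, b)] += 1
    return counts


# Toy data, invented for illustration only.
stars_by_user = {
    "alice": ["numpy/numpy", "pandas-dev/pandas", "scipy/scipy"],
    "bob": ["numpy/numpy", "pandas-dev/pandas"],
    "carol": ["numpy/numpy", "scipy/scipy"],
}

pair_counts = cooccurrence_counts(stars_by_user)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pair counts like these could then feed an embedding model so that repositories that are often starred together end up close in vector space; which model the pipeline actually uses is not stated in this post.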
submitted by /u/___mlm___