Periodically Updated Dataset Of All Public Repositories On GitHub With Their Description

Does it exist? I am aware of GitHub Archive on BigQuery, and presumably it could be used to build this dataset, but that would be really inefficient: GitHub Archive records every “event” on GitHub (pushes, commits, issues, etc.), so I would need to read the entire dataset just to get the list of public repositories.
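
To make the inefficiency concrete, here is a rough sketch of what deriving a repository list from an event log involves. It assumes events arrive as one JSON object per line, each with a `repo` object holding `id` and `name`; the sample records and the `distinct_repos` helper are made up for illustration and are not real data or an existing API.

```python
import json

# Hypothetical sample of event records, one JSON object per line.
# Real event logs are vastly larger; these four lines are invented.
SAMPLE_EVENTS = """\
{"type": "PushEvent", "repo": {"id": 1, "name": "alice/project"}}
{"type": "IssuesEvent", "repo": {"id": 1, "name": "alice/project"}}
{"type": "WatchEvent", "repo": {"id": 2, "name": "bob/tool"}}
{"type": "PushEvent", "repo": {"id": 1, "name": "alice/project"}}
"""


def distinct_repos(lines):
    """Scan every event line and return {repo_id: repo_name}."""
    repos = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        repo = event.get("repo")
        if repo:
            # Later events overwrite earlier ones, so a renamed
            # repository ends up with its most recent name.
            repos[repo["id"]] = repo["name"]
    return repos


repos = distinct_repos(SAMPLE_EVENTS.splitlines())
print(len(repos), sorted(repos.values()))
```

Every event has to be read even though most of them refer to a repository that has already been seen, which is why a ready-made list of repositories would be much cheaper to work with.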

There is another dataset on BigQuery, publicly hosted by Google, containing all packages on PyPI, Maven, npm, etc., but I also need repositories that are not necessarily packages.

Any help is appreciated.

submitted by /u/GullibleEngineer4