Google Drive: https://drive.google.com/drive/folders/1MkAiT8Zgm2bF-BWKYOdhVOJS-eduIofb?usp=sharing
JazzSet Dataset:
A remarkably large dataset of digitized, high-quality, full-length jazz session recordings from 1905 to 1966, annotated with instrumentation and performer details.
Statistics:
• 40,329 recordings with 399,761 total performance credits.
• 275 credited instrument types or roles for 12,585 individual performers.
• 11,421 marked examples of 843 jazz “standards” (songs with 5 or more examples).
• 2,202.21952 hours (91.75914 days) of audio, totaling 245 GB as MP3 files.
• Sourced from a well-curated, session-date-specific public-domain collection.
• For 35,201 tracks, Discogs IDs are recorded, either definite (identified by a match to one or more Discogs.com releases by record and catalog number) or probable (matched by name for those individuals whose names correspond unambiguously to Discogs artists). These IDs are meant to aid future metadata cleaning and improvement and to help ensure specific identification of performers, especially if the mappings can be expanded in the future.
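The definite-versus-probable distinction above can be sketched as a two-stage lookup. This is a minimal illustration, not the project's actual matching code: the record shapes (`label`, `catno`, `performers`, `id`, `name`), the normalization rules, and reading "record and catalog number" as a label plus catalog-number key are all assumptions, and `build_release_index`, `build_artist_index` and `match_track` are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical record shapes: the real JazzSet/Discogs metadata layout is not
# described in the post, so these dict fields are illustrative only.


def _release_key(label, catno):
    """Normalize a (label, catalog number) pair so minor formatting differences match."""
    return (label.strip().lower(), catno.replace(" ", "").upper())


def build_release_index(releases):
    """Index Discogs releases by normalized (label, catalog number)."""
    index = defaultdict(set)
    for rel in releases:
        index[_release_key(rel["label"], rel["catno"])].add(rel["id"])
    return index


def build_artist_index(artists):
    """Map each normalized artist name to the set of Discogs artist IDs using it."""
    index = defaultdict(set)
    for art in artists:
        index[art["name"].strip().lower()].add(art["id"])
    return index


def match_track(track, release_index, artist_index):
    """Return ("definite", release_ids), ("probable", {name: artist_id}) or (None, None)."""
    release_ids = release_index.get(_release_key(track["label"], track["catno"]))
    if release_ids:
        # Stage 1: a release matched on label + catalog number counts as definite.
        return "definite", sorted(release_ids)
    # Stage 2: fall back to performer names, keeping only names that map to
    # exactly one Discogs artist (the "unambiguous" condition).
    performers = {}
    for name in track["performers"]:
        ids = artist_index.get(name.strip().lower(), set())
        if len(ids) == 1:
            performers[name] = next(iter(ids))
    if performers:
        return "probable", performers
    return None, None
```

Requiring exactly one artist per name mirrors the post's "unambiguous" condition: a name shared by several Discogs artists yields no probable ID rather than a guess.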
Everything except the audio archive will also be placed on a Neocities page I’ve set up for the project (https://saleach.neocities.org/jazzset/). All audio in the archive has also been uploaded to the Internet Archive’s “Great 78” project, and each card has a direct archive.org file download URL, so you can explore the set and download suitable subsets of training material when downloading the entire archive is not practical.
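Selecting and fetching a subset through the per-card download URLs might look like the sketch below. The post does not describe the card format, so the `instruments` and `download_url` fields, and the helpers `select_cards` and `download_cards`, are hypothetical; only the Python standard library is used.

```python
import os
import shutil
import urllib.parse
import urllib.request

# Hypothetical card fields (not documented in the post):
#   "instruments"  - list of credited instrument/role names
#   "download_url" - the direct archive.org file URL for the recording


def select_cards(cards, instrument, max_items=None):
    """Return cards that credit `instrument` (case-insensitive), optionally capped."""
    wanted = instrument.lower()
    subset = [c for c in cards if any(i.lower() == wanted for i in c["instruments"])]
    return subset if max_items is None else subset[:max_items]


def download_cards(cards, out_dir):
    """Download each card's audio file into out_dir and return the local paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for card in cards:
        url = card["download_url"]
        # Use the last path segment of the URL as the local file name.
        name = urllib.parse.unquote(os.path.basename(urllib.parse.urlparse(url).path))
        path = os.path.join(out_dir, name)
        if not os.path.exists(path):  # skip files fetched on an earlier run
            tmp = path + ".part"
            with urllib.request.urlopen(url) as resp, open(tmp, "wb") as fh:
                shutil.copyfileobj(resp, fh)
            os.replace(tmp, path)  # only complete downloads get the final name
        paths.append(path)
    return paths
```

For example, `download_cards(select_cards(cards, "trumpet", max_items=50), "jazzset_subset")` would fetch at most 50 trumpet-credited recordings. Writing to a `.part` file first means an interrupted run leaves no truncated file under the final name, so rerunning resumes where it stopped.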
submitted by /u/returnstack