I’ve released a new dataset built from the EU’s Tenders Electronic Daily (TED) portal, which publishes official public procurement notices from across Europe.
- Source: Official TED monthly XML package for August 2025
- Processing: Parsed into a clean tabular CSV, normalized fields, and enriched with CPV 2008 labels (Common Procurement Vocabulary).
- Contents (sample):
  - notice_id: unique identifier
  - publication_date: ISO 8601 format
  - buyer_id: anonymized buyer reference
  - cpv_code + cpv_label: procurement category (CPV 2008)
  - lot_id, lot_name, lot_description
  - award_value, currency
  - source_file: original TED XML reference
This free sample contains 100 rows representative of the full dataset (~200k rows).
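For anyone who wants to poke at the sample, here is a minimal loading sketch using only the Python standard library. It assumes the CSV header uses exactly the column names listed above, that publication_date is a plain YYYY-MM-DD date, and that a missing award_value is an empty cell; none of that is confirmed here, so adjust to the real file. The two rows in `SAMPLE` are made-up placeholders, not real notices, and `load_notices` is just an illustrative helper name.

```python
import csv
import io
from datetime import date

# Column names taken from the field list in the post (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "notice_id", "publication_date", "buyer_id", "cpv_code", "cpv_label",
    "lot_id", "lot_name", "lot_description", "award_value", "currency",
    "source_file",
]


def load_notices(handle):
    """Read notice rows from an open CSV file object and convert basic types."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        # Assumption: date-only ISO 8601 values such as 2025-08-01.
        row["publication_date"] = date.fromisoformat(row["publication_date"])
        # Assumption: an empty cell means no award value was published.
        raw = row["award_value"].strip()
        row["award_value"] = float(raw) if raw else None
        rows.append(row)
    return rows


# Made-up placeholder rows, only here to show the expected shape.
SAMPLE = """notice_id,publication_date,buyer_id,cpv_code,cpv_label,lot_id,lot_name,lot_description,award_value,currency,source_file
N-0001,2025-08-01,B-001,45000000-7,Construction work,LOT-1,Example lot,Placeholder description,125000.50,EUR,example_1.xml
N-0002,2025-08-04,B-002,45000000-7,Construction work,LOT-1,Another lot,Placeholder description,,EUR,example_2.xml
"""

notices = load_notices(io.StringIO(SAMPLE))
print(len(notices), notices[0]["publication_date"], notices[1]["award_value"])
```

To run it against the downloaded sample, replace `io.StringIO(SAMPLE)` with `open(path, newline="", encoding="utf-8")`, where `path` points at wherever you saved the CSV.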
Sample dataset on Hugging Face
If you’re interested in the full month (200k+ notices), it’s available here:
Full dataset on Gumroad
Suggested uses: training NLP/ML models (NER, classification, forecasting), procurement market analysis, transparency research.
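As a rough illustration of the market-analysis use case, the sketch below totals award values per CPV division and currency. The first two digits of a CPV code identify its division (45 is construction work, for example). The rows are invented placeholders, `totals_by_division` is a hypothetical helper, and treating an empty award_value as "not published" is an assumption about the data.

```python
from collections import defaultdict


def totals_by_division(rows):
    """Sum award_value per (CPV division, currency), skipping empty values."""
    totals = defaultdict(float)
    for row in rows:
        value = row.get("award_value")
        if value in (None, ""):
            continue  # assumption: empty means no award value was published
        division = str(row["cpv_code"])[:2]  # first two digits = CPV division
        totals[(division, row["currency"])] += float(value)
    return dict(totals)


# Invented placeholder rows using the column names from the post.
rows = [
    {"cpv_code": "45000000-7", "award_value": "1000.0", "currency": "EUR"},
    {"cpv_code": "45233120-6", "award_value": "250.5", "currency": "EUR"},
    {"cpv_code": "72000000-5", "award_value": "", "currency": "EUR"},
    {"cpv_code": "72000000-5", "award_value": "300", "currency": "SEK"},
]

result = totals_by_division(rows)
for (division, currency), total in sorted(result.items()):
    print(division, currency, total)
```

Currencies are kept separate on purpose, since summing EUR and SEK amounts without conversion would give meaningless totals.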
Feedback is welcome. I’d love to hear how others might use this and which extra enrichments would be most useful.
submitted by /u/OpenMLDatasets