Disclosure: I’m the creator of FragDB. The sample is free and MIT licensed. The full database is a paid product.
I’m releasing a structured fragrance database with a free sample for the community.
What’s in the database
| File | Records | Fields |
|---|---|---|
| fragrances.csv | 119,000+ | 28 |
| brands.csv | 7,200+ | 10 |
| perfumers.csv | 2,700+ | 11 |
Data highlights
Fragrances include:
- Notes pyramid (top/mid/base layers with ingredient names)
- Accords with strength percentages (woody:100, amber:85, etc.)
- Community ratings (19.8M total votes)
- Longevity & sillage votes (9.3M and 10.1M respectively)
- Season suitability (winter/spring/summer/fall percentages)
- “People also like” recommendations

Brands include:
- Country of origin
- Parent company (LVMH, Kering, etc.)
- Logo URLs
- Official websites

Perfumers include:
- Professional status (Master Perfumer, etc.)
- Current and previous employers
- Education background
- Biography
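
The accord notation in the highlights (woody:100, amber:85) lends itself to a small parser. The following is a minimal sketch, not the database's documented format: it assumes each accord is stored as a `name:strength` pair and that pairs are separated by a configurable character. The function name `parse_accords` and the default separator are hypothetical, so check the actual field in the sample before relying on it.

```python
def parse_accords(raw: str, sep: str = ",") -> dict[str, int]:
    """Turn a string like 'woody:100, amber:85' into {'woody': 100, 'amber': 85}.

    Assumption: pairs are 'name:strength' joined by `sep`; the real
    separator used in fragrances.csv may differ.
    """
    accords = {}
    for part in raw.split(sep):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. from a trailing separator
        # Split on the last ':' so the strength is always the final piece
        name, _, strength = part.rpartition(":")
        accords[name.strip()] = int(strength)
    return accords


accords = parse_accords("woody:100, amber:85")
strongest = max(accords, key=accords.get)  # accord with the highest strength
print(accords, strongest)
```

If the file turns out to use a different separator, pass it explicitly, for example `parse_accords(value, sep=";")`.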
Technical specs
- Format: Pipe-delimited CSV
- Encoding: UTF-8
- Relational structure via IDs (fragrances → brands, fragrances → perfumers)
- Year range: 1533–2026
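
Since the tables are linked by IDs, a quick referential-integrity check is a reasonable first step when testing a pipeline. The sketch below uses made-up in-memory rows rather than FragDB data, and assumes the layout implied by the quick start: a `brand` field whose second `;`-separated part is the brand ID, and an `id` column in the brands table. All row values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical rows for illustration only; not taken from FragDB.
fragrances_csv = """id|name|brand
f1|Example One|Brand A;b1
f2|Example Two|Brand B;b2
f3|Example Three|Brand C;b9
"""
brands_csv = """id|name|country
b1|Brand A|France
b2|Brand B|Italy
"""

# Read everything as strings so IDs compare consistently on both sides
fragrances = pd.read_csv(io.StringIO(fragrances_csv), sep="|", dtype=str)
brands = pd.read_csv(io.StringIO(brands_csv), sep="|", dtype=str)

# Assumed layout: the brand ID is the second ';'-separated part of `brand`
fragrances["brand_id"] = fragrances["brand"].str.split(";").str[1]

# Left-join against the known brand IDs; `indicator=True` adds a `_merge`
# column showing whether each fragrance found a matching brand
known_ids = brands[["id"]].rename(columns={"id": "brand_id"})
checked = fragrances.merge(known_ids, on="brand_id", how="left", indicator=True)

# Rows marked 'left_only' reference a brand ID missing from the brands table
orphans = checked.loc[checked["_merge"] == "left_only", ["name", "brand_id"]]
print(orphans)
```

The same pattern should carry over to the fragrances → perfumers link once the corresponding ID field is extracted.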
Free sample
The sample includes 10 fragrances (Chanel, Dior, Tom Ford, YSL, etc.) with matching brands and perfumers — enough to test your pipelines and see the data quality.
Links
- GitHub: https://github.com/FragDB/fragrance-database
- Kaggle: https://www.kaggle.com/datasets/eriklindqvist/fragdb-fragrance-database
- Full database: https://fragdb.net
Quick start
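```python
import pandas as pd

# Load the three pipe-delimited files
fragrances = pd.read_csv('fragrances.csv', sep='|')
brands = pd.read_csv('brands.csv', sep='|')
perfumers = pd.read_csv('perfumers.csv', sep='|')

# Join tables: the brand ID is the second ';'-separated part of the `brand` field
fragrances['brand_id'] = fragrances['brand'].str.split(';').str[1]
# Suffix overlapping brand columns so the brand's `name` becomes `name_brand`
df = fragrances.merge(brands, left_on='brand_id', right_on='id', suffixes=('', '_brand'))

print(df[['name', 'name_brand', 'country', 'rating']])
```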
Happy to answer any questions about the data structure.
submitted by /u/FragDBnet