[PAID] FragDB: 119K Fragrances, 7.2K Brands, 2.7K Perfumers — Free Sample On GitHub & Kaggle

Disclosure: I’m the creator of FragDB. The sample is free and MIT licensed. The full database is a paid product.

I’m releasing a structured fragrance database with a free sample for the community.

What’s in the database

| File | Records | Fields |
|---|---|---|
| fragrances.csv | 119,000+ | 28 |
| brands.csv | 7,200+ | 10 |
| perfumers.csv | 2,700+ | 11 |

Data highlights

Fragrances include:

  • Notes pyramid (top/mid/base layers with ingredient names)
  • Accords with strength percentages (woody:100, amber:85, etc.)
  • Community ratings (19.8M total votes)
  • Longevity & sillage votes (9.3M and 10.1M respectively)
  • Season suitability (winter/spring/summer/fall percentages)
  • “People also like” recommendations

Brands include:

  • Country of origin
  • Parent company (LVMH, Kering, etc.)
  • Logo URLs
  • Official websites

Perfumers include:

  • Professional status (Master Perfumer, etc.)
  • Current and previous employers
  • Education background
  • Biography
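The accord strengths quoted above (woody:100, amber:85) lend themselves to a small parser. This is a minimal sketch, assuming the accords field is a single string of `name:strength` pairs separated by commas; the actual delimiter in fragrances.csv may differ, and `parse_accords` is a hypothetical helper, not something shipped with FragDB.

```python
def parse_accords(raw: str, sep: str = ",") -> dict[str, int]:
    """Parse 'woody:100, amber:85' into {'woody': 100, 'amber': 85}.

    Assumption: pairs are separated by `sep` and each pair is 'name:strength'.
    """
    accords = {}
    for pair in raw.split(sep):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty fields and trailing separators
        name, _, strength = pair.rpartition(":")
        accords[name.strip()] = int(strength)
    return accords


accords = parse_accords("woody:100, amber:85")
dominant = max(accords, key=accords.get)  # accord with the highest strength
print(accords, dominant)
```

If the real file uses a different separator, passing it as `sep` should be enough to adapt the sketch.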

Technical specs

  • Format: Pipe-delimited CSV
  • Encoding: UTF-8
  • Relational structure via IDs (fragrances → brands, fragrances → perfumers)
  • Year range: 1533–2026
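For anyone not using pandas, the specs above (pipe delimiter, UTF-8, ID-based references) can be handled with the standard library alone. The sketch below uses two made-up inline excerpts; the column names `id`, `name`, `brand_id`, `year`, and `country` and the helper `read_pipe_csv` are assumptions for illustration, not the real schema.

```python
import csv
import io

# Hypothetical two-row excerpts; the real files have more columns.
FRAGRANCES = "id|name|brand_id|year\n1|Example One|b1|1921\n2|Example Two|b2|2006\n"
BRANDS = "id|name|country\nb1|Brand A|France\nb2|Brand B|USA\n"


def read_pipe_csv(text: str) -> list[dict[str, str]]:
    """Parse pipe-delimited CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


fragrances = read_pipe_csv(FRAGRANCES)
brands_by_id = {row["id"]: row for row in read_pipe_csv(BRANDS)}

# Follow the fragrance -> brand reference through the shared ID.
resolved = [
    (f["name"], brands_by_id[f["brand_id"]]["name"], int(f["year"]))
    for f in fragrances
]
print(resolved)
```

With files on disk, the same reader works on `open(path, encoding="utf-8", newline="")` in place of `io.StringIO`.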

Free sample

The sample includes 10 fragrances (Chanel, Dior, Tom Ford, YSL, etc.) with matching brands and perfumers — enough to test your pipelines and see the data quality.
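One way to test a pipeline against the sample is to check that every ID reference resolves. This is a rough sketch with made-up rows; `find_orphans` and the `brand_id` key are hypothetical names, and the second row is deliberately dangling to show what a failure looks like.

```python
def find_orphans(child_rows, key, parent_ids):
    """Return child rows whose `key` value is missing from `parent_ids`."""
    return [row for row in child_rows if row[key] not in parent_ids]


# Hypothetical rows standing in for the sample files.
fragrances = [
    {"name": "Example One", "brand_id": "b1"},
    {"name": "Example Two", "brand_id": "b9"},  # deliberately dangling
]
brand_ids = {"b1", "b2"}

orphans = find_orphans(fragrances, "brand_id", brand_ids)
print([row["name"] for row in orphans])
```

An empty result on the real sample would suggest that the brand references are complete.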

Links

Quick start

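```python
import pandas as pd

# All three files are pipe-delimited, UTF-8 CSVs
fragrances = pd.read_csv('fragrances.csv', sep='|')
brands = pd.read_csv('brands.csv', sep='|')
perfumers = pd.read_csv('perfumers.csv', sep='|')

# Join tables: the brand field appears to hold "name;id", so take the part after the semicolon
fragrances['brand_id'] = fragrances['brand'].str.split(';').str[1]
brands['id'] = brands['id'].astype(str)  # match the string key produced by split
# suffixes keeps the fragrance 'name' as-is and renames the brand's to 'name_brand'
df = fragrances.merge(brands, left_on='brand_id', right_on='id', suffixes=('', '_brand'))

print(df[['name', 'name_brand', 'country', 'rating']])
```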

Happy to answer any questions about the data structure.

submitted by /u/FragDBnet