Hello everyone, we repurposed data from an old project into a Kaggle dataset ⬇️ Happy to hear your thoughts and feedback.
What this is about:
Major US retailers run hundreds of promotions daily – but there’s no clean, structured source to track them over time. I built a pipeline that scrapes 5 major e-commerce sites daily and extracts every promo, coupon code, and deal into a structured format using GPT-4o-mini and Llama.
Covers Office Depot, Ulta, Home Depot, 1800Flowers, and Shutterfly (for now) – with discount type, value, expiration date, and source URL for every record.
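For anyone who wants to picture a single record, here is a rough sketch of how one row could be modeled in Python. The class name, field names, and example values are my own placeholders, not the actual column names in the Kaggle files, so check the dataset page before relying on them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PromoRecord:
    """Hypothetical shape of one promo row; field names are assumptions."""
    retailer: str                      # e.g. "Office Depot", "Ulta"
    discount_type: str                 # e.g. "percent_off" (illustrative label)
    discount_value: Optional[float]    # None when no numeric value was extracted
    expiration_date: Optional[date]    # None when the promo lists no end date
    source_url: str                    # page the promo was scraped from
    coupon_code: Optional[str] = None  # None when no code is needed


# Made-up example values, purely for illustration.
example = PromoRecord(
    retailer="Ulta",
    discount_type="percent_off",
    discount_value=50.0,
    expiration_date=None,
    source_url="https://example.com/promo",
)
```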
A few things the data shows right now:
- Office Depot dominates volume: 73 promos today vs 10 for Home Depot
- Ulta and 1800Flowers both hit 50% as their max discount: beauty and flowers discount aggressively
- Only 4% of promos have coupon codes: most discounts are applied automatically at checkout
- Home Depot ran 228 promos on April 8th: likely a flash sale event worth investigating
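If you want to reproduce numbers like the ones above, here is a minimal standard-library sketch. The rows are made-up sample data and the keys (`retailer`, `date`, `discount_pct`, `coupon_code`) are assumed names, so map them to the real columns after loading the CSV.

```python
from collections import Counter

# Made-up sample rows; keys are assumed names, not the real dataset columns.
rows = [
    {"retailer": "Office Depot", "date": "2000-01-01", "discount_pct": 20.0, "coupon_code": ""},
    {"retailer": "Office Depot", "date": "2000-01-01", "discount_pct": None, "coupon_code": "SAVE10"},
    {"retailer": "Home Depot", "date": "2000-01-01", "discount_pct": 15.0, "coupon_code": ""},
    {"retailer": "Ulta", "date": "2000-01-02", "discount_pct": 50.0, "coupon_code": ""},
]


def promos_per_retailer(rows, day):
    """Count promos per retailer on a single day."""
    return Counter(r["retailer"] for r in rows if r["date"] == day)


def max_discount(rows):
    """Largest percentage discount seen per retailer, skipping missing values."""
    best = {}
    for r in rows:
        pct = r["discount_pct"]
        if pct is not None:
            best[r["retailer"]] = max(pct, best.get(r["retailer"], pct))
    return best


def coupon_share(rows):
    """Fraction of promos that carry a non-empty coupon code."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r["coupon_code"]) / len(rows)


counts = promos_per_retailer(rows, "2000-01-01")
maxima = max_discount(rows)
share = coupon_share(rows)
```

The same per-day count is also a quick way to spot spikes like the Home Depot one: compute it for every date and look for days far above a retailer's usual volume.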
You can find it here: https://www.kaggle.com/datasets/indext-data-lab-ai/promos-dataset
4,955+ records collected over 37 days and counting. The next update lands tomorrow morning.
submitted by /u/KaiseyTayl