[Self-promotion] A Daily LLM-powered Scraper That Structures E-commerce Promos Into Clean CSV/JSON/Parquet – Free On Kaggle

Hello everyone, we repurposed data from an old project into a Kaggle dataset ⬇️ Happy to hear your thoughts and feedback.

What this is about:
Major US retailers run hundreds of promotions daily – but there’s no clean, structured source to track them over time. I built a pipeline that scrapes 5 major e-commerce sites daily and extracts every promo, coupon code, and deal into a structured format using GPT-4o-mini and Llama.

It covers Office Depot, Ulta, Home Depot, 1800Flowers, and Shutterfly (for now), with discount type, value, expiration date, and source URL for every record.
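For anyone who wants to load records into typed objects, here is a minimal sketch of what one record might look like in Python. The field names (`retailer`, `discount_type`, `discount_value`, `expiration_date`, `source_url`, `coupon_code`) are assumptions based on the description above, not the dataset's confirmed column names, so check them against the actual files first.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PromoRecord:
    # Hypothetical field names; the real dataset columns may differ.
    retailer: str
    discount_type: str                 # e.g. "percent" or "fixed"
    discount_value: Optional[float]    # None when no numeric value is given
    expiration_date: Optional[date]    # None when the promo has no end date
    source_url: str
    coupon_code: Optional[str] = None  # None when no code is required


def parse_promo(raw_json: str) -> PromoRecord:
    """Turn one JSON object into a typed record, normalising empty values."""
    data = json.loads(raw_json)
    value = data.get("discount_value")
    expires = data.get("expiration_date")
    return PromoRecord(
        retailer=data["retailer"],
        discount_type=data["discount_type"],
        discount_value=float(value) if value is not None else None,
        # Assumes ISO dates (YYYY-MM-DD); adjust if the files use another format.
        expiration_date=date.fromisoformat(expires) if expires else None,
        source_url=data["source_url"],
        coupon_code=data.get("coupon_code") or None,
    )
```

Treating an empty coupon code as `None` keeps "no code needed" distinct from an actual code when filtering later.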

A few things the data shows right now:

  • Office Depot dominates volume: 73 promos today vs 10 for Home Depot
  • Ulta and 1800Flowers both hit 50% as their max discount: beauty and flowers discount aggressively
  • Only 4% of promos have coupon codes: most discounts are applied automatically at checkout
  • Home Depot ran 228 promos on April 8th: likely a flash sale event worth investigating
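If you want to reproduce numbers like these yourself, a rough sketch using only the standard library could look like the following. The column names and the tiny inline sample are made up to show the expected input shape; they are not taken from the dataset, so swap in the real CSV and its actual headers.

```python
import csv
import io
from collections import Counter


def summarize(rows):
    """Return per-retailer counts, max percent discount, and coupon-code share."""
    rows = list(rows)
    counts = Counter(r["retailer"] for r in rows)
    max_pct = {}
    for r in rows:
        # Only percent-type promos with a value count toward the max discount.
        if r["discount_type"] == "percent" and r["discount_value"]:
            value = float(r["discount_value"])
            max_pct[r["retailer"]] = max(value, max_pct.get(r["retailer"], 0.0))
    with_code = sum(1 for r in rows if r["coupon_code"].strip())
    share = with_code / len(rows) if rows else 0.0
    return counts, max_pct, share


# Made-up sample rows with hypothetical column names, for illustration only.
SAMPLE = """retailer,discount_type,discount_value,coupon_code
Office Depot,percent,20,
Office Depot,fixed,10,SAVE10
Ulta,percent,50,
Home Depot,percent,15,
"""

counts, max_pct, share = summarize(csv.DictReader(io.StringIO(SAMPLE)))
print(counts.most_common())
print(max_pct)
print(f"{share:.0%} of promos carry a coupon code")
```

To run it on the real data, replace `io.StringIO(SAMPLE)` with an open file handle to the downloaded CSV.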

You can find it here: https://www.kaggle.com/datasets/indext-data-lab-ai/promos-dataset

4,955+ records collected over 37 days and counting. The next update lands tomorrow morning.

submitted by /u/KaiseyTayl