Sharing a dataset I recorded because nothing like it seems to exist publicly: the order book
of Polymarket’s 5-minute crypto up/down markets, sampled once per second.
- ~89,000 markets across 7 coins (BTC, ETH, SOL, XRP, DOGE, HYPE, BNB)
- ~26.8M per-second rows (~300 per market), Mar–May 2026, UTC
- Two Parquet tables per coin, joined on `condition_id`: `markets` (one row per 5-min market) and `ticks` (one row per second)
- Per tick: best bid/ask, resting sizes, and bid-side 5¢ depth for both the Up and Down outcomes
- ~725 MB total, 99.8%+ coverage, no duplicates
- Licence: CC0 (public domain)
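
For anyone wanting to start quickly, here is a minimal sketch of the `markets`/`ticks` join in pandas. It assumes a Parquet engine such as pyarrow is installed. Only `condition_id` is taken from the description above; the function names and any file paths you pass in are hypothetical, so check the data dictionary for the real layout.

```python
import pandas as pd

# 5 minutes sampled once per second gives roughly 300 ticks per market.
EXPECTED_TICKS = 300


def join_ticks_to_markets(markets: pd.DataFrame, ticks: pd.DataFrame) -> pd.DataFrame:
    """Attach market-level columns to every tick via condition_id."""
    # validate="many_to_one" raises if condition_id is repeated in `markets`,
    # which would silently duplicate ticks in the result.
    return ticks.merge(markets, on="condition_id", how="left", validate="many_to_one")


def ticks_per_market(ticks: pd.DataFrame) -> pd.Series:
    """Count ticks per market, for comparison against EXPECTED_TICKS."""
    return ticks.groupby("condition_id").size()


def load_coin(markets_path: str, ticks_path: str) -> pd.DataFrame:
    """Read one coin's two tables (paths are placeholders) and join them."""
    return join_ticks_to_markets(pd.read_parquet(markets_path), pd.read_parquet(ticks_path))
```

Markets where `ticks_per_market` falls well short of `EXPECTED_TICKS` are the ones worth checking against the coverage audit before use.
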
Caveats up front: the window is fixed (collection ended 18 May 2026), each market’s outcome is
inferred from the final tick rather than read on-chain, ask-side depth isn’t recorded, and there
are ~1.5 h of gaps over the span. The gaps hit all coins at the same times, which points to
collector outages rather than market-data loss. Full data dictionary and coverage audit are in the write-up.
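
The "shared across all coins" check can be sketched in plain Python. This is an illustration only, not the audit from the write-up: it assumes tick timestamps have already been converted to integer epoch seconds, and the function names are made up for the example.

```python
def missing_seconds(timestamps, start, end):
    """Seconds in [start, end) for which no tick was recorded."""
    return set(range(start, end)) - set(timestamps)


def shared_outage(per_coin_timestamps, start, end):
    """Seconds missing for every coin at once.

    A gap present in all coins suggests the collector was down,
    whereas a gap in a single coin suggests missing market data.
    """
    gaps = [missing_seconds(ts, start, end) for ts in per_coin_timestamps.values()]
    return set.intersection(*gaps) if gaps else set()


# Toy example: second 3 is missing for both coins, second 2 only for BTC.
per_coin = {"BTC": [0, 1, 4, 5], "ETH": [0, 1, 2, 4, 5]}
print(sorted(shared_outage(per_coin, 0, 6)))  # [3]
```

Building a set per second is fine for a single day but gets heavy over the full span; on the real data a sorted-difference approach would be lighter.
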
Hugging Face: https://huggingface.co/datasets/kachoio/polymarket-5-minute-crypto-up-down-markets
Kaggle: https://www.kaggle.com/datasets/kachoio/polymarket-5-minute-crypto-updown-markets
Write-up (schema, provenance, limitations): https://kacho.io/polymarket-5min-crypto-dataset
submitted by /u/File-Environmental