I published a large-scale, tick-level dataset from Polymarket, the largest prediction market. It should be useful for microstructure research, market efficiency studies, and ML on event-driven markets.
Scale:
| Metric | Value |
|---|---|
| Orderbook ticks | 5,555,777,555 |
| L2 depth snapshots | 51,674,425 |
| Trade executions | 4,126,076 |
| Markets tracked | 123,895 |
| Resolved markets | 23,146 |
| ML feature bars | 5,587,547 |
| Coverage | 21 continuous days |
| Null values | 0 |
Format: Daily Parquet files (ZSTD compressed), around 40 GB total. Includes pre-built 1-minute bar features with L2 depth imbalance ready for ML training on Kaggle’s free tier.
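For anyone unfamiliar with L2 depth imbalance, here is a minimal sketch of one common definition: (bid volume - ask volume) / (bid volume + ask volume) over the top few book levels. The function name, the `levels` parameter, and the `(price, size)` tuple layout are assumptions for illustration only; the pre-built bars in the dataset may use a different definition or schema.

```python
def depth_imbalance(bids, asks, levels=5):
    """Signed depth imbalance in [-1, 1] over the top `levels` of the book.

    bids / asks: lists of (price, size) tuples, best level first.
    This layout is a hypothetical one, not the dataset's actual schema.
    """
    bid_vol = sum(size for _, size in bids[:levels])
    ask_vol = sum(size for _, size in asks[:levels])
    total = bid_vol + ask_vol
    if total == 0:
        return 0.0  # empty book: treat as balanced
    return (bid_vol - ask_vol) / total


# Toy snapshot: 400 shares resting on the bid, 200 on the ask.
bids = [(0.52, 300.0), (0.51, 100.0)]
asks = [(0.53, 100.0), (0.54, 100.0)]
print(depth_imbalance(bids, asks))  # positive value means more bid-side depth
```

A positive value means more resting size on the bid side, a negative value more on the ask side.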
License: CC-BY-NC-4.0 (non-commercial/academic)
Link: https://www.kaggle.com/datasets/marvingozo/polymarket-tick-level-orderbook-dataset
Use cases: HFT signal detection, market maker strategy research, prediction efficiency studies, order flow toxicity (VPIN), cross-market correlation, event study analysis.
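As a rough illustration of the VPIN use case, here is a simplified sketch: trades are packed into equal-volume buckets, and VPIN is the average absolute buy/sell imbalance over the last `window` buckets, divided by the bucket size. It assumes each trade already carries a `'buy'` or `'sell'` side label, which may not match the dataset's trade schema. The original VPIN method classifies volume statistically (bulk volume classification) instead, so treat this as a toy version.

```python
def vpin(trades, bucket_volume, window):
    """Simplified VPIN over equal-volume buckets.

    trades: iterable of (size, side), side being 'buy' or 'sell'
            (hypothetical layout; assumes the side is already known).
    Returns None until `window` buckets have been completed.
    """
    imbalances = []  # |buy volume - sell volume| for each completed bucket
    buy = sell = filled = 0.0
    for size, side in trades:
        remaining = size
        while remaining > 0:
            # A large trade can spill over into the next bucket(s).
            take = min(remaining, bucket_volume - filled)
            if side == "buy":
                buy += take
            else:
                sell += take
            filled += take
            remaining -= take
            if filled >= bucket_volume:
                imbalances.append(abs(buy - sell))
                buy = sell = filled = 0.0
    recent = imbalances[-window:]
    if len(recent) < window:
        return None
    return sum(recent) / (window * bucket_volume)


# Toy example: three buckets of 100 with imbalances 20, 100 and 0.
trades = [(60, "buy"), (40, "sell"), (100, "buy"), (50, "sell"), (50, "buy")]
print(vpin(trades, bucket_volume=100, window=3))
```

Values near 1 mean almost all recent volume traded on one side; values near 0 mean buys and sells were roughly balanced.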
submitted by /u/Upset-Fly-454