Hey everyone,
(Disclosure: I built this dataset and pipeline myself).
I built a Python pipeline to work around the timestamp drift in public financial news APIs. I scraped 400+ high-impact crypto news events (Nov 2025 – May 2026) and mapped each event's exact UTC publication timestamp to 1-minute Binance BTC/USDT candles.
The dataset provides clean T0 anchors and forward-mapped price snapshots (T+5m, T+15m) so you can backtest event-driven volatility decay without look-ahead bias.
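To make the T0 / T+5m / T+15m idea concrete, here is a minimal illustrative sketch, not the pipeline's actual code. It assumes T0 is the first 1-minute candle that opens at or after the publication time (so no price from before the news is public leaks in), and that candles are held in a plain dict keyed by UTC open time. The names `t0_candle_open`, `forward_snapshots`, and `candle_opens` are hypothetical and chosen only for this example.

```python
from datetime import datetime, timedelta, timezone

ONE_MINUTE = timedelta(minutes=1)


def t0_candle_open(published: datetime) -> datetime:
    """Open time (UTC) of the first 1-minute candle starting at or after publication.

    Assumption for this sketch: rounding *up* to the next candle boundary is
    what keeps pre-publication prices out of the T0 anchor.
    """
    if published.tzinfo is None:
        # Naive timestamps are ambiguous, so refuse them rather than guess a zone.
        raise ValueError("publication timestamp must be timezone-aware")
    ts = published.astimezone(timezone.utc)
    floored = ts.replace(second=0, microsecond=0)
    return floored if floored == ts else floored + ONE_MINUTE


def forward_snapshots(published, candle_opens, horizons=(5, 15)):
    """Build one row with the T0 price and the prices `horizons` minutes later.

    `candle_opens` is a hypothetical dict: candle open time (UTC) -> open price.
    Missing candles come back as None instead of being filled in.
    """
    t0 = t0_candle_open(published)
    row = {"t0": t0, "price_t0": candle_opens.get(t0)}
    for minutes in horizons:
        row[f"price_t+{minutes}m"] = candle_opens.get(t0 + timedelta(minutes=minutes))
    return row
```

For example, an event stamped 14:03:27 at UTC+2 is converted to 12:03:27 UTC and anchored to the 12:04 candle, with the forward snapshots read from the 12:09 and 12:19 candles.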
The open-source sample and the EDA notebook just received a Bronze medal on Kaggle! You can download the free sample, check out the methodology, and see the visual volatility decay analysis here:
https://www.kaggle.com/datasets/yevheniipylypchuk/bitcoin-news-vs-1m-btc-price-action-2025-26
(Note regarding Rule 5: The Kaggle link above provides a free sample for EDA and initial modeling. If you find the methodology sound and need the full, unredacted 6-month historical dataset for heavy backtesting, I sell the complete version on Gumroad. The link is inside the Kaggle notebook.)
Let me know if you have any questions about the timezone synchronization or the scraping logic!
submitted by /u/talissman_7