20k Reddit Crypto Sentiment Dataset With Bitcoin Market Labels

I recently created my first public dataset focused on cryptocurrency sentiment analysis and Bitcoin market forecasting. The dataset contains around 20,000 Reddit posts collected from major crypto communities between 2017 and 2025 using PRAW, the Python Reddit API Wrapper.

It includes:

  • Reddit post metadata
  • Cleaned text features
  • Crypto-enhanced VADER sentiment
  • Custom FinBERT sentiment scores
  • Bitcoin prices and returns
  • Binary BTC movement labels for 1h, 6h, 12h, and 24h horizons
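
For anyone wondering how binary movement labels like these can be derived, here is a minimal sketch of one possible approach. It is not necessarily how this dataset was built. It assumes an hourly BTC price series, takes the last known price at or before the post time as the baseline, and marks a horizon as 1 when the price at that horizon is strictly higher. The function names, the `label_1h`-style keys, and the toy prices are all made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

HORIZONS_HOURS = (1, 6, 12, 24)  # horizons listed in the post


def price_at(times, prices, when):
    """Return the last known price at or before `when`, or None if there is none."""
    idx = bisect_right(times, when) - 1
    return prices[idx] if idx >= 0 else None


def movement_labels(times, prices, post_time, horizons=HORIZONS_HOURS):
    """Build one binary label per horizon for a single post (assumed rule: up = 1)."""
    base = price_at(times, prices, post_time)
    labels = {}
    for h in horizons:
        target = post_time + timedelta(hours=h)
        # Leave the label empty when the price series does not cover the horizon.
        if base is None or target > times[-1]:
            labels[f"label_{h}h"] = None
            continue
        future = price_at(times, prices, target)
        labels[f"label_{h}h"] = int(future > base)
    return labels


# Toy hourly series: price rises for 6 hours, then falls by 2 per hour.
start = datetime(2021, 1, 1)
times = [start + timedelta(hours=i) for i in range(31)]
prices = [100 + i if i <= 6 else 106 - 2 * (i - 6) for i in range(31)]

post_time = start + timedelta(minutes=30)
print(movement_labels(times, prices, post_time))
```

Using the last price at or before each timestamp avoids peeking at prices that were not yet known when the post was made, which matters if the labels are later used to evaluate a forecasting model.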

The dataset was built for financial NLP, sentiment analysis, and forecasting research. I am still learning dataset engineering and would appreciate feedback, suggestions, or ideas for improvement.

submitted by /u/Cyclo_Studios