Hi everyone,
I’m a high school student working on my AP Research project, and I’m running into some issues with data collection that I could really use help with. My study focuses on analyzing how Reddit-driven stock recommendations impact long-term investment decisions. I’m specifically looking at subreddits like r/wallstreetbets, r/stock, r/investing, and r/SecurityAnalysis to track sentiment around different stocks and see if that sentiment can predict stock performance over time.
I had originally planned to use the Pushshift API to collect historical Reddit data, but with Reddit’s recent API changes, Pushshift no longer works. Since I’m pretty new to programming and APIs, I’m not sure what the best alternative is. I’ve tried looking into PRAW, but I’m concerned about its limitations when it comes to accessing older posts.
Here’s what I need:
- A reliable way to collect historical Reddit posts (from 2022 to 2025 if possible).
- Advice on whether PRAW can handle this, or if there’s another tool or method I should use.
- Suggestions for workarounds or public datasets that might help with historical Reddit data.
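To make the date requirement concrete, here is a rough sketch of the filtering step that any data source would need to support. It assumes each post is available as a dictionary with `subreddit` and `created_utc` (Unix seconds, UTC) fields, as in Reddit's JSON. The function names, the exact window boundaries, and the sample records are hypothetical and only there for illustration.

```python
from datetime import datetime, timezone

# Assumed study window: 2022-01-01 through the end of 2025 (UTC).
START = datetime(2022, 1, 1, tzinfo=timezone.utc)
END = datetime(2026, 1, 1, tzinfo=timezone.utc)  # exclusive upper bound


def in_study_window(post):
    """Return True if the post's created_utc falls inside [START, END)."""
    created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
    return START <= created < END


def filter_posts(posts, subreddits):
    """Keep posts from the target subreddits that fall in the study window."""
    wanted = {name.lower() for name in subreddits}
    return [
        p for p in posts
        if p["subreddit"].lower() in wanted and in_study_window(p)
    ]


# Hypothetical sample records, not real Reddit data.
sample = [
    {"id": "a1", "subreddit": "wallstreetbets", "created_utc": 1609459200},  # 2021-01-01
    {"id": "b2", "subreddit": "investing", "created_utc": 1672531200},       # 2023-01-01
    {"id": "c3", "subreddit": "aww", "created_utc": 1672531200},             # 2023-01-01
]
kept = filter_posts(sample, ["wallstreetbets", "investing"])
print([p["id"] for p in kept])
```

This only covers filtering records that have already been collected. It does not solve the harder problem of getting the historical posts in the first place.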
Since this is part of a project I hope to eventually publish, I’m really eager to find a solution. I’d love any advice, resources, or guidance you can offer, especially considering I’m new to this and learning as I go.
Here’s a link to my original methodology plan in case it helps answer any questions. Feel free to add comments to the document!
submitted by /u/Immediate-Today-8157