I Scraped And Cleaned 50,000+ Career Discussion Threads From r/AskEngineers And r/EngineeringStudents. Here Is The Tool I Used.

I couldn’t find a good dataset that mapped the “Skills Gap” between university and industry, so I built a local scraper to create one.

The Data:

  • Volume: ~52,000 threads.
  • Fields: Title, Body, Top Comments, Sentiment.
  • Focus: Keywords relating to “Exams” vs “Workplace Tools”.
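For anyone wondering what a record with those fields and the keyword focus could look like, here is a minimal sketch. It is not ORION's actual schema: the `ThreadRecord` class, the `tag_record` helper, and both keyword sets are hypothetical and only illustrate the idea of counting "Exams" terms against "Workplace Tools" terms. How the Sentiment field is computed is not stated above, so it is left empty here.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical keyword sets; the real lists used for the dataset are not given.
EXAM_TERMS = {"exam", "exams", "midterm", "midterms", "gpa"}
TOOL_TERMS = {"excel", "matlab", "solidworks", "autocad"}


@dataclass
class ThreadRecord:
    """One thread with the four fields listed above (names are assumptions)."""
    title: str
    body: str
    top_comments: List[str] = field(default_factory=list)
    sentiment: Optional[float] = None  # scoring method unknown, left unset


def tag_record(record: ThreadRecord) -> Dict[str, int]:
    """Count how many words in the thread fall into each keyword group."""
    text = " ".join([record.title, record.body, *record.top_comments])
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        "exams": sum(1 for w in words if w in EXAM_TERMS),
        "workplace_tools": sum(1 for w in words if w in TOOL_TERMS),
    }
```

Exact word matching like this misses variants (for example "examination"), so a real pipeline would probably need stemming or a larger term list.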

I built the extractor (ORION) to run locally. It uses the requests library with rate limiting so I wouldn’t get IP banned.
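The rate-limiting idea can be sketched roughly as below. This is an assumption about the general technique (a fixed minimum gap between requests), not ORION's actual logic; `MinIntervalLimiter` and `fetch_all` are hypothetical names. The HTTP call is passed in as a plain callable so the sketch runs without any network access.

```python
import time
from typing import Callable, Iterable, List, Optional


class MinIntervalLimiter:
    """Keeps consecutive calls at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay


def fetch_all(urls: Iterable[str], get: Callable[[str], object],
              limiter: MinIntervalLimiter) -> List[object]:
    """Call `get(url)` for each URL, pausing between calls as needed."""
    results = []
    for url in urls:
        limiter.wait()
        results.append(get(url))
    return results
```

With requests installed, something like `fetch_all(urls, get=lambda u: requests.get(u, timeout=10), limiter=MinIntervalLimiter(2.0))` would space calls two seconds apart. The two-second gap is a guess for illustration, not a setting taken from the tool.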

You can grab the tool and the extraction logic here: https://mrweeb0.github.io/ORION-tool-showcase/

Feel free to fork it if you want to scrape other career subreddits (like Nursing or CS).

submitted by /u/No-Associate-6068