Reposting this here. I’ve built a crawler that collects live job listings from 5.6 million US company websites and continuously updates a monthly pool of job listing data.
I’ve seen other people doing this on Reddit, but they refuse to be transparent or actually share their datasets for download.
My Airflow DAGs complete a full crawling cycle of all companies and their associated job boards in under 24 hours. This runs on a Windows machine and a modest home network, so my operating costs are near zero.
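For a sense of scale, here is a rough back-of-the-envelope calculation (not taken from the actual pipeline): covering 5.6 million sites in 24 hours works out to an average of about 65 sites per second. The function name `required_rate` is made up for illustration, and the sketch assumes one fetch per site, which probably understates the real request volume since each company can have several job board pages.

```python
def required_rate(num_sites: int, cycle_hours: float) -> float:
    """Average number of sites to process per second to finish one cycle."""
    return num_sites / (cycle_hours * 3600)


# Figures from the post: 5.6 million sites, one full cycle in 24 hours.
rate = required_rate(5_600_000, 24)
print(f"{rate:.1f} sites/second")  # prints "64.8 sites/second"
```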
This data will remain free forever at jobdatapool.com
submitted by /u/never_sleeping99