Hello everyone! To pass the time during my extra-long summer break before starting college, I decided to learn SQL by scraping and storing data from LinkedIn. Yesterday I uploaded all the data I collected to Kaggle in CSV format. The main file contains 27 columns, and there are several detached files with info such as the benefits, industries, and skills associated with each job (that's right, I discovered what data table normalization is). There's also a separate folder containing company information (name, description, size, employee_count, follower_count, industries).
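In case it helps picture the normalization, here's roughly what the layout looks like as SQL tables. This is just an illustrative sketch based on the files described above, not the dataset's actual schema:

```sql
-- Illustrative schema; names and types are my guesses, not the dataset's actual layout.
CREATE TABLE companies (
    company_id      INTEGER PRIMARY KEY,
    name            TEXT,
    description     TEXT,
    size            TEXT,
    employee_count  INTEGER,
    follower_count  INTEGER
);

CREATE TABLE postings (
    job_id      INTEGER PRIMARY KEY,
    company_id  INTEGER REFERENCES companies (company_id),
    title       TEXT,
    med_salary  REAL
    -- ...plus the remaining columns from the main file
);

-- One row per (job, skill) pair instead of cramming skills into one column:
CREATE TABLE job_skills (
    job_id  INTEGER REFERENCES postings (job_id),
    skill   TEXT
);
```

The detached benefits and industries files follow the same one-row-per-pair idea.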
I plan to run the collection script again next month, which will allow analysis of trends such as company growth, salary changes, and job demand. Also, if anyone wants, I can share the scraper code on GitHub, though keep in mind your LinkedIn account may get banned for running it (especially new accounts).
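For example, once next month's snapshot is up, something like this could track salary drift per title (a rough sketch — postings_month1 and postings_month2 are just placeholder names for the two runs loaded into separate tables):

```sql
-- Average med_salary per title in each snapshot, then compare the two.
WITH m1 AS (
    SELECT title, AVG(med_salary) AS avg_salary
    FROM postings_month1
    GROUP BY title
),
m2 AS (
    SELECT title, AVG(med_salary) AS avg_salary
    FROM postings_month2
    GROUP BY title
)
SELECT m1.title,
       m2.avg_salary - m1.avg_salary AS salary_change
FROM m1
JOIN m2 ON m2.title = m1.title;
```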
These are the columns of the main file:
['job_id', 'company_id', 'title', 'description', 'max_salary', 'med_salary', 'min_salary', 'pay_period', 'formatted_work_type', 'location', 'applies', 'original_listed_time', 'remote_allowed', 'views', 'job_posting_url', 'application_url', 'application_type', 'expiry', 'closed_time', 'formatted_experience_level', 'skills_desc', 'listed_time', 'posting_domain', 'sponsored', 'work_type', 'currency', 'compensation_type']
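And a quick example of joining the files together (column names come from the list above; companies and job_skills are the illustrative tables from the earlier sketch — I'm assuming the company file joins on company_id and the skills file on job_id):

```sql
-- Remote-friendly postings with their company name and associated skills.
-- remote_allowed is assumed to be a 0/1 flag; adjust if the CSV stores it differently.
SELECT p.title,
       c.name AS company,
       s.skill,
       p.med_salary
FROM   postings   p
JOIN   companies  c ON c.company_id = p.company_id
JOIN   job_skills s ON s.job_id = p.job_id
WHERE  p.remote_allowed = 1;
```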
Here’s the link to the dataset:
https://www.kaggle.com/datasets/arshkon/linkedin-job-postings