submitted by /u/gwern
Category: Datatards
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
I’m working on an ML project to find the cheapest ticket prices based on airlines but haven’t been able to secure any datasets. Any help is appreciated. Thank you
submitted by /u/Sarcasticsalad12
I’m currently searching for a dataset on the reading speed of people with dyslexia. I’m trying to find out which letters or letter combinations cause the most problems during reading.
The ideal would be a dataset of a text that has been copied by dyslexic people (so the source text and the same text written by multiple dyslexic people), or a dataset with sample sentences and the time required to read them.
I know this is very specific, so suggestions on alternative data sources that I might infer this information from are also very welcome!
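If a source/copy dataset like the one described does turn up, a rough first pass at spotting problem letters might look like the sketch below. It aligns the two texts with Python's standard `difflib` and counts letter-for-letter swaps. The function name and the example sentence pair are made up for illustration, and real handwriting transcripts would need more careful alignment than this.

```python
from collections import Counter
from difflib import SequenceMatcher


def substitution_counts(source: str, copy: str) -> Counter:
    """Count (source_char, copied_char) pairs where the copy swaps one letter for another."""
    counts = Counter()
    matcher = SequenceMatcher(None, source, copy, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length "replace" spans are treated as letter-for-letter swaps;
        # insertions and deletions are ignored in this sketch.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for a, b in zip(source[i1:i2], copy[j1:j2]):
                counts[(a, b)] += 1
    return counts


# Made-up example pair, not real participant data.
source_text = "the big dog dug a deep pit"
copied_text = "the dig bog dug a deep qit"
errors = substitution_counts(source_text, copied_text)

for (expected, written), n in errors.most_common():
    print(f"{expected!r} written as {written!r}: {n}")
```

Summing these counters over many writers would give a simple confusion table of which letters get swapped most often.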
submitted by /u/X99p
Hello everyone, to pass time during my extra long summer break before starting college I decided to learn SQL through scraping and storing data from LinkedIn. Yesterday, I dumped all the data I collected to Kaggle in CSV format. The main file contains 27 columns, and there are several detached files containing info such as the benefits, industries, and skills associated with each job (that’s right, I discovered what data table normalization is). There’s also a separate folder containing company information (name, description, size, employee_count, follower_count, industries).
I plan to run the collection script again next month, allowing for further analysis of trends such as company growth, salary changes, and job demand. Also if anyone wants, I can potentially share the scraper code on GitHub, although keep in mind you may get banned (especially with new accounts).
These are the columns of the main file:
[‘job_id’, ‘company_id’, ‘title’, ‘description’, ‘max_salary’, ‘med_salary’, ‘min_salary’, ‘pay_period’, ‘formatted_work_type’, ‘location’, ‘applies’, ‘original_listed_time’, ‘remote_allowed’, ‘views’, ‘job_posting_url’, ‘application_url’, ‘application_type’, ‘expiry’, ‘closed_time’, ‘formatted_experience_level’, ‘skills_desc’, ‘listed_time’, ‘posting_domain’, ‘sponsored’, ‘work_type’, ‘currency’, ‘compensation_type’]
Here’s the link to the dataset:
https://www.kaggle.com/datasets/arshkon/linkedin-job-postings
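For anyone who wants to query the normalized layout with SQL, here is a minimal sketch of joining the main postings table to one of the detached per-job tables on `job_id`. It uses Python's built-in `sqlite3` with an in-memory database. The table names, the `skill` column, and the rows are assumptions made up for illustration; the real file and column names should be checked on the Kaggle page.

```python
import sqlite3

# In-memory database. The table names and the "skill" column are hypothetical
# stand-ins for the detached CSV files; job_id, title and max_salary come from
# the column list in the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_postings (job_id INTEGER PRIMARY KEY, title TEXT, max_salary REAL);
    CREATE TABLE job_skills (job_id INTEGER REFERENCES job_postings(job_id), skill TEXT);
""")

# Toy rows for illustration only; they are not taken from the dataset.
conn.executemany(
    "INSERT INTO job_postings VALUES (?, ?, ?)",
    [(1, "Data Analyst", 90000.0), (2, "Backend Engineer", 140000.0)],
)
conn.executemany(
    "INSERT INTO job_skills VALUES (?, ?)",
    [(1, "SQL"), (1, "Excel"), (2, "SQL"), (2, "Go")],
)


def postings_per_skill(connection):
    """Join skills to postings and aggregate per skill."""
    query = """
        SELECT s.skill, COUNT(*) AS postings, AVG(p.max_salary) AS avg_max_salary
        FROM job_skills AS s
        JOIN job_postings AS p ON p.job_id = s.job_id
        GROUP BY s.skill
        ORDER BY postings DESC, s.skill
    """
    return connection.execute(query).fetchall()


rows = postings_per_skill(conn)
for skill, postings, avg_max_salary in rows:
    print(skill, postings, avg_max_salary)
```

Keeping skills in their own table means each posting appears once in the main file, and the join brings the one-to-many relationship back only when a query needs it.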
submitted by /u/Armi2
Hello all,
I come to you after 100s of google searches, 10s of hours spent squinting at my computer screen, and 1 near breakdown.
Basically, I’m trying to get demographics based on job types. For example, I’d like to know the average age, gender, income, and education level for real estate brokers in the U.S. I *think* the BLS has this data, but I have no idea how to find it. I would be eternally grateful if someone could point me in the right direction.
submitted by /u/starlit_ren
I am currently working on my master’s thesis, and it would be a huge help if someone could help me out. What is the deal premium called in the Eikon Screener? I can’t find it in the “add columns” section. Is it Price Premium? I am also trying to map ESG scores of target companies to their financials. Should I use the PermID for the target? Please help if you have experience with this; it is fairly urgent!
submitted by /u/Thick_Sun2297
I’ve got a scraped job postings dataset from Indeed (US). The data is updated daily, with roughly 5-10k new records every day. The dataset has all the fields in the job offer: title, description, salary, urgently hiring, etc. Data goes back to early this year.
I can offer it all in bulk (as of today), or as a subscription if you’re interested in updates.
submitted by /u/conjecturer_
Is anyone aware of any datasets of scraped .onion pages? The only one that I’ve managed to locate is a dead link.
submitted by /u/williamp0044
Looking for a face dataset that is ethnically diverse, preferably with age/gender labels.
submitted by /u/jessifer_dr
Hey guys, I am currently working on a project for which I need a dataset of video clips recorded in parking lots and annotated with the human activities taking place there, both normal and anomalous (fighting, car accidents, and so on). I would be very grateful if someone could suggest such a dataset, or at least tell me which datasets contain such clips so I can filter through them. I have heard of the UCF-Crime dataset, but it covers many diverse situations and I don’t know whether it includes any parking lot clips. Thanks in advance for any help!
submitted by /u/Demonking6444
Hey! I’m trying to find a dataset that shows clothing store sales by year and month, plus a similar dataset for thrift shops. Is there a website where I can find these? Thank you!
submitted by /u/marrcs
Hi! I am a researcher and my current thesis is about PET/CT images. I would like to know where and how I can get images of normal brains, brains with tumors, and/or brains with Alzheimer’s disease. Open-access sources would be best, as I am still a student and have no financial support.
Thank you!
submitted by /u/zadessss
We built a Streamlit demo gallery to help you get started with Cybersyn datasets on Snowflake Marketplace. Some of our favorite apps cover:
Aggregated government data on demographics and economics; FHFA standardized US single-family home appraisals; macroeconomic indicators and banking sector data.
submitted by /u/aiatco2
Hello everyone,
I’ve been looking basically everywhere for a GeoJSON file of the constituencies of the Italian Chamber of Deputies, but to no avail. I’ve checked github.com, dati.gov.it and public.opendatasoft.com and was only successful in finding them as SHP and SVG. Does anyone know how to acquire such a GeoJSON file?
submitted by /u/redditor95dk
Hi everybody,
I had an idea for a survival analysis of marriages.
I would like to find a dataset in which each row represents a couple.
As features I would like information on the husband and wife (dates of birth, city of residence before and after marriage, date of wedding, nationality, skin color…) and, where applicable, the date of separation/divorce.
I know these are somewhat complicated requests, but I hope there exists what I am looking for.
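To make the row layout concrete: with one row per couple, a duration (years from wedding to divorce, or to the end of observation) and a flag saying whether a divorce was observed are enough for a basic Kaplan-Meier survival curve. Below is a minimal pure-Python sketch of that estimator. The function name and the six toy couples are hypothetical, and a dedicated statistics library would be the better choice for real work.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with at least one event.

    durations: years from wedding to divorce, or to end of observation
    observed:  True if the divorce was observed, False if the couple is censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of at-risk couples that survive t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: six couples, three observed divorces, three censored.
durations = [2, 5, 5, 8, 10, 10]
observed = [True, True, False, True, False, False]
curve = kaplan_meier(durations, observed)

for t, s in curve:
    print(f"after {t} years: estimated share still married = {s:.3f}")
```

Couples still married when the data ends are kept as censored rows rather than dropped, which is the main reason to record the observation end date as well as any divorce date.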
submitted by /u/Sim_Check
Hello everyone,
I’m an undergrad linguistics student currently studying Computational Linguistics and NLP. I live in Brazil and I plan to work with endangered languages in my area.
I’m researching a method of creating language models of non-catalogued languages, or of languages with a small amount of data. I also plan to go to one of those groups to collect data, but that is far in the future.
Finally, I’m looking for any dataset in a language that is not modeled yet (my criterion is that it is not in Google Translate), or in an endangered language. Any type of suggestion or comment is welcome.
Thanks for taking the time to read this and help me.
P.S.: I’m not an expert, just a student trying to do some research that can help my community.
submitted by /u/Pinguindiniz
For the last few years I’ve contributed to a side project concerned with curating and maintaining supplemental data related to the Hospital Price Transparency and Transparency in Coverage regulations.
The goal is to make data provided in accord with those regulations more accessible, transparent, and actionable in a maintainable and consistent way.
For example, there are many recent efforts that have attempted to collect all of the underlying price data into databases, and to do so, they need to scrape all of the files served by hospitals, which are unfortunately not required to be centralized. To do that scraping, you need hospital domains, and knowledge of how and where they serve their files. That sort of data is meant to be maintained in this repository.
Just finally got around to adding some new data after a long hiatus, so thought it’d be a nice time to reshare here: https://github.com/TPAFS/transparency-data
Would appreciate your thoughts and feedback!
submitted by /u/tpafs
The POI data covers hundreds of categories ranging from restaurants and parks to commercial brands and hospitals.
Each point of interest includes a name, location, and category and is joinable to Cybersyn address data. Overture Maps is an open data project steered by Amazon, Meta, Microsoft, and TomTom that aggregates map data from multiple sources. The first Overture Maps open dataset was released this July.
Example use cases: Finding the nearest competitors to a specific merchant, identifying target markets with a high concentration of stores to sell into, finding all healthcare facilities or schools near a given location, building or enhancing map applications.
Access the data products, including sample queries and data dictionaries, here:
US Points of Interest & Addresses
US Housing & Real Estate Essentials
submitted by /u/aiatco2
Hi, I am creating an application on Plant Analysis and disease detection. Is there any specific dataset that is available where I can get ALL Plant Identification, ALL Disease Detection and ALL Plant Description (after identification)?
I have found multiple datasets online, but each covers only part of this, so I would have to clean and merge them myself, which is quite time-consuming.
It would be of great help if anyone knows of or has a source for an all-in-one dataset.
submitted by /u/aka1432
I would use it for my thesis. I can’t really find them. Thank you!
submitted by /u/EcstaticButterfly862
Can someone please share access with me to Pitchbook as I would love to use it for writing my paper on venture capital and investments. Please let me know if someone is willing to share either through here or DM, as I need to write my paper as fast as possible and would appreciate any help with gaining access. I have requested a free trial but they are slow in responding.
Thank you in advance!
submitted by /u/analsage
Hi everyone,
The data I found on Statista is highly relevant to the research I am conducting at the moment, and I don’t even mind paying for it. The issue is that they only offer an annual subscription, with no monthly plans, and converted to my currency it’s just too much to invest in this research.
So I was wondering whether there is any alternative to this; that would be really helpful. Thanks 🙂
submitted by /u/VictoryWide1495
Hi everyone! Thanks in advance for taking time to read this.
I am new to data analysis. I have some ability to code in SQL and visualise in Power BI, and I want to put it into practice in SQL Server Management Studio, but I have no datasets and no idea where to find them.
If anyone could be kind enough to give me a list of sites where I can get datasets, I would be really grateful, as I am desperately trying to build my portfolio!
Thanks again to all!
submitted by /u/DesertTraderr
Does anyone have a dataset of all the characters of all spoken languages (modern and ancient)?
submitted by /u/maifee
So that I can make a base-150_000 encoding algorithm.
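As a rough sketch of what such an encoding could look like once a character list exists: the snippet below treats the input as one large integer and writes it in base 150,000, using one character per digit. The alphabet here is only a placeholder, assumed for illustration. It takes consecutive Unicode code points starting at U+0100 and skips the surrogate range, so it includes many unassigned or unprintable code points, which is exactly what a curated dataset of real characters would replace.

```python
def build_alphabet(size, start=0x100):
    """Collect `size` code points from `start` upward, skipping UTF-16 surrogates.

    Placeholder only: many of these code points are unassigned or unprintable.
    """
    alphabet = []
    code_point = start
    while len(alphabet) < size:
        if not 0xD800 <= code_point <= 0xDFFF:
            alphabet.append(chr(code_point))
        code_point += 1
    return alphabet


def encode(number, alphabet):
    """Write a non-negative integer in base len(alphabet), most significant digit first."""
    if number < 0:
        raise ValueError("number must be non-negative")
    base = len(alphabet)
    if number == 0:
        return alphabet[0]
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def decode(text, alphabet):
    """Invert encode(): map each character back to its digit value and accumulate."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    base = len(alphabet)
    number = 0
    for ch in text:
        number = number * base + index[ch]
    return number


ALPHABET = build_alphabet(150_000)
payload = int.from_bytes(b"hello world", "big")
encoded = encode(payload, ALPHABET)
decoded = decode(encoded, ALPHABET)
print(len(encoded), decoded == payload)
```

Note that converting bytes to an integer this way drops leading zero bytes, so a real format would also need to store the original length.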
submitted by /u/TheYummyDogo
Are there any free APIs or sources for public weather data available? Please provide links. I need the data to cover the period from 2015 up until yesterday’s date while ensuring it is up-to-date.
submitted by /u/No_Rate6878