submitted by /u/growth_man
Category: Datatards
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
submitted by /u/x9182
I am trying to find data on the number of casualties over time during World War 2: how are deaths distributed over the course of the war? The closest data I could find is for Italy only, but I am interested in the combined, world-wide deaths over time.
Ideally, I am looking for the number of deaths per month over the course of the war. It would be less ideal, but still ok, to have data at lower frequency.
Does anyone know if there is such data somewhere? If not, I could estimate these numbers by calculating the excess deaths over that time period. Any thoughts on that? Thanks!
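The excess-deaths idea can be sketched as observed deaths minus an expected baseline. This is a minimal sketch, assuming monthly death counts are available for the war years and for a few pre-war years; the function name, the data layout, and all numbers below are hypothetical and for illustration only, not real figures.

```python
from statistics import mean

def excess_deaths(observed, baseline_years):
    """Return excess deaths per month.

    observed: dict mapping month -> deaths in the period of interest.
    baseline_years: list of dicts with the same keys, one per pre-war year.
    The expected value is simply the baseline average for that month.
    """
    result = {}
    for month, deaths in observed.items():
        expected = mean(year[month] for year in baseline_years)
        result[month] = deaths - expected
    return result

# Hypothetical numbers, for illustration only (not real data).
baseline = [{1: 1000, 2: 950}, {1: 1100, 2: 1050}]
wartime = {1: 1500, 2: 1400}
excess = excess_deaths(wartime, baseline)
print(excess)
```

A plain average baseline ignores population change and pre-war trends, so the result is only a rough estimate; it also counts all excess mortality, not just direct war casualties.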
submitted by /u/matmerda
Hi r/datasets! I’m looking for an AZ real estate dataset from recent years that contains any or all of the following attributes:
Price: The selling or listing price of the property.
Size: Total square footage or square meters of the property.
Bedrooms: Number of bedrooms.
Bathrooms: Number of bathrooms.
Property Type: e.g., single-family, condo, townhouse.
Year Built: The year the property was constructed.
City: City where the property is located.
ZIP Code: ZIP or postal code of the property.
Days on Market: Number of days the property has been listed on the market.
Is scraping Zillow the best option? Would appreciate any advice, thanks!
submitted by /u/abc1203218
Hi,
I’m looking for an anonymized dataset of users’ online behavior, e.g. the different websites people visit.
It can be either a public dataset or one I pay for. Any pointers to where I can get such data?
submitted by /u/Winter-Breadfruit943
Am I the only one who thinks ENTSOG is one big lie? The data is wrong most of the time, and analyzing it always turns up problems when it is compared with other official sources.
submitted by /u/No_Rate6878
I’m working on an ML project to find the cheapest ticket prices based on airlines but haven’t been able to secure any datasets. Any help is appreciated. Thank you
submitted by /u/Sarcasticsalad12
I’m currently searching for a dataset on the reading speed of people with dyslexia. I am trying to find out which letters or letter combinations cause the most problems during reading.
The ideal would be a dataset of a text that has been copied by dyslexic people (i.e. the source text plus the same text as written by multiple dyslexic people), or a dataset of sample sentences with the time required to read them.
I know this is very specific, so suggestions on alternative data sources that I might infer this information from are also very welcome!
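If a source-text-plus-copies dataset does turn up, letter confusions could be counted by aligning each copy against the source. This is a minimal sketch using Python's standard `difflib`; the function name and the sample strings are made up for illustration, and character-level alignment of copied text is only a proxy for reading difficulty.

```python
from collections import Counter
from difflib import SequenceMatcher

def substitution_counts(source, copies):
    """Count (source_letter, written_letter) substitutions across copies."""
    counts = Counter()
    for copy in copies:
        matcher = SequenceMatcher(None, source, copy, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only equal-length replacements map letter to letter cleanly;
            # insertions, deletions, and uneven replacements are skipped.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                for a, b in zip(source[i1:i2], copy[j1:j2]):
                    counts[(a, b)] += 1
    return counts

# Made-up example strings, not taken from any dataset.
source_text = "the bad dog"
copied_texts = ["the dad dog", "the bab dog"]
confusions = substitution_counts(source_text, copied_texts)
print(confusions.most_common())
```

Sorting the counter by frequency would then show which letter pairs are swapped most often across writers.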
submitted by /u/X99p
Hello everyone, to pass time during my extra-long summer break before starting college, I decided to learn SQL by scraping and storing data from LinkedIn. Yesterday, I uploaded all the data I collected to Kaggle in CSV format. The main file contains 27 columns, and there are several separate files containing info such as the benefits, industries, and skills associated with each job (that’s right, I discovered what data table normalization is). There’s also a separate folder containing company information (name, description, size, employee_count, follower_count, industries).
I plan to run the collection script again next month, allowing for further analysis of trends such as company growth, salary changes, and job demand. Also if anyone wants, I can potentially share the scraper code on GitHub, although keep in mind you may get banned (especially with new accounts).
These are the columns of the main file:
['job_id', 'company_id', 'title', 'description', 'max_salary', 'med_salary', 'min_salary', 'pay_period', 'formatted_work_type', 'location', 'applies', 'original_listed_time', 'remote_allowed', 'views', 'job_posting_url', 'application_url', 'application_type', 'expiry', 'closed_time', 'formatted_experience_level', 'skills_desc', 'listed_time', 'posting_domain', 'sponsored', 'work_type', 'currency', 'compensation_type']
Here’s the link to the dataset:
https://www.kaggle.com/datasets/arshkon/linkedin-job-postings
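Because the skills, benefits, and industries live in separate files, they have to be joined back to the main file on `job_id`. This is a minimal sketch of that join using only the standard library; the in-memory CSV contents and the `skill` column name are assumptions for illustration, so check the actual file and column names in the Kaggle dataset.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-ins for the real files; the contents are invented.
postings_csv = "job_id,company_id,title\n1,10,Data Analyst\n2,11,Data Engineer\n"
skills_csv = "job_id,skill\n1,SQL\n1,Excel\n2,Python\n"

def attach_skills(postings_file, skills_file):
    """Return posting rows with a 'skills' list gathered from the skills file."""
    skills_by_job = defaultdict(list)
    for row in csv.DictReader(skills_file):
        skills_by_job[row["job_id"]].append(row["skill"])
    joined = []
    for row in csv.DictReader(postings_file):
        # Postings without any skill rows get an empty list.
        row["skills"] = skills_by_job.get(row["job_id"], [])
        joined.append(row)
    return joined

jobs = attach_skills(io.StringIO(postings_csv), io.StringIO(skills_csv))
for job in jobs:
    print(job["title"], job["skills"])
```

With the real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.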
submitted by /u/Armi2
Hello all,
I come to you after 100s of google searches, 10s of hours spent squinting at my computer screen, and 1 near breakdown.
Basically, I’m trying to get demographics based on job types. For example, I’d like to know the average age, gender, income, and education level for real estate brokers in the U.S. I *think* the BLS has this data, but I have no idea how to find it. I would be eternally grateful if someone could point me in the right direction.
submitted by /u/starlit_ren
I am currently doing my Master’s thesis, and it would be a huge help if someone could help me out. What is the deal premium called in the Eikon Screener? I can’t find it in the “add columns” section. Is it Price Premium? I am also trying to map ESG scores of target companies to their financials. Should I use the PermID for the target? This is kind of urgent, so if you have experience with this, please help!
submitted by /u/Thick_Sun2297
I’ve got a scraped job postings dataset from Indeed (US). The data is updated daily, with roughly 5-10k new records every day. The dataset has all the fields in the job offer: title, description, salary, urgently hiring, etc. Data goes back to early this year.
I can offer it all in bulk (as of today), or as a subscription if you’re interested in updates.
submitted by /u/conjecturer_
Is anyone aware of any datasets of scraped .onion pages? The only one that I’ve managed to locate is a dead link.
submitted by /u/williamp0044
Looking for a face dataset that is ethnically diverse, preferably with age/gender labels.
submitted by /u/jessifer_dr
Hey guys, I am currently working on a project for which I need a dataset of parking-lot video clips annotated with the human activities taking place, both normal and anomalous (fighting, car accidents, and others). I would be very grateful if someone could suggest such a dataset, or at least tell me which datasets contain such clips so I can filter through them. I have heard of the UCF-Crime dataset, but it covers many diverse situations and I don’t know whether it includes any parking-lot clips. Thanks in advance for any help!
submitted by /u/Demonking6444
Hey! I’m trying to find a dataset that shows clothing store sales by year and month, and a similar dataset for thrift shops. Is there a website where I can find them? Thank you!
submitted by /u/marrcs
Hi! I am a researcher and my current thesis is about PET/CT images. I would like to know where and how I can get images of normal brains, brains with tumors, and/or brains with Alzheimer’s disease. Open access would be best, as I am still a student and have no financial support.
Thank you!
submitted by /u/zadessss
We built a Streamlit demo gallery to help you get started with Cybersyn datasets on Snowflake Marketplace. Some of our favorite apps cover:
Aggregated government data on demographics and economics
FHFA standardized US single-family home appraisals
Macroeconomic indicators and banking sector data
submitted by /u/aiatco2
Hello everyone,
I’ve been looking basically everywhere for a GeoJSON file of the constituencies of the Italian Chamber of Deputies, but to no avail. I’ve checked github.com, dati.gov.it and public.opendatasoft.com and was only successful in finding them as shp and svg. Does anyone know how to acquire such a GeoJSON file?
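Since the boundaries are available as shp, converting them is one option; GDAL's `ogr2ogr` is a common tool for that. The sketch below only shows the target structure: it assumes the polygon rings have already been read from the shapefile as longitude/latitude pairs in WGS 84, and the record name and coordinates are invented placeholders.

```python
import json

def to_feature_collection(records):
    """Wrap (name, ring) pairs into a GeoJSON FeatureCollection.

    Each ring is a list of [longitude, latitude] positions.
    """
    features = []
    for name, ring in records:
        # GeoJSON polygon rings must be closed: first and last position equal.
        if ring[0] != ring[-1]:
            ring = ring + [ring[0]]
        features.append({
            "type": "Feature",
            "properties": {"name": name},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}

# Placeholder square, not a real constituency boundary.
records = [
    ("Example constituency",
     [[12.0, 41.0], [13.0, 41.0], [13.0, 42.0], [12.0, 42.0]]),
]
collection = to_feature_collection(records)
geojson_text = json.dumps(collection)
print(geojson_text)
```

If the shapefile uses a projected coordinate system, the coordinates need reprojecting to WGS 84 first, which a conversion tool normally handles.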
submitted by /u/redditor95dk
Hi everybody,
I had an idea for a survival analysis of marriages.
I would like to find a dataset in which each row is a couple.
As features I would like information on the husband and wife (dates of birth, city of residence before and after marriage, date of wedding, nationality, skin color…) and, where applicable, the date of separation/divorce.
I know these are somewhat complicated requests, but I hope there exists what I am looking for.
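For such a dataset, the usual starting point would be a Kaplan-Meier survival curve, where couples still married at the end of observation are treated as censored. This is a minimal pure-Python sketch under that assumption; the durations below are invented, and a real analysis would more likely use a dedicated survival library.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the probability a marriage is still intact.

    durations: years from wedding to divorce or to end of observation.
    events: True if the marriage ended in divorce, False if censored.
    Returns a list of (time, survival_probability) at each divorce time.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        divorces = 0
        removed = 0
        # Group all couples sharing the same duration.
        while i < len(pairs) and pairs[i][0] == t:
            divorces += pairs[i][1]
            removed += 1
            i += 1
        if divorces:
            survival *= 1 - divorces / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Invented durations in years, for illustration only.
durations = [2, 5, 5, 8, 10]
divorced = [True, True, False, False, True]
curve = kaplan_meier(durations, divorced)
print(curve)
```

Features such as age at wedding or city of residence could then be used to split couples into groups and compare their curves.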
submitted by /u/Sim_Check
Hello everyone,
I’m an undergrad linguistics student currently studying Computational Linguistics and NLP. I live in Brazil and I plan to work with endangered languages in my area.
I’m researching a method of creating language models of non-catalogued languages, or of languages with a small amount of data. I also plan to go to one of those groups to collect data, but that is far in the future.
Finally, I’m looking for any dataset in a language that has not been modeled yet (my criterion is that it is not in Google Translate), or in an endangered language. Any type of suggestion or comment is welcome.
Thanks for taking the time to read this and help me.
P.S.: I’m not an expert, just a student trying to do some research that can help my community.
submitted by /u/Pinguindiniz
For the last few years I’ve contributed to a side project concerned with curating and maintaining supplemental data related to the Hospital Price Transparency and Transparency in Coverage regulations.
The goal is to make data provided in accord with those regulations more accessible, transparent, and actionable in a maintainable and consistent way.
For example, there are many recent efforts that have attempted to collect all of the underlying price data into databases, and to do so, they need to scrape all of the files served by hospitals, which are unfortunately not required to be centralized. To do that scraping, you need hospital domains, and knowledge of how and where they serve their files. That sort of data is meant to be maintained in this repository.
Just finally got around to adding some new data after a long hiatus, so thought it’d be a nice time to reshare here: https://github.com/TPAFS/transparency-data
Would appreciate your thoughts and feedback!
submitted by /u/tpafs
The POI data covers hundreds of categories ranging from restaurants and parks to commercial brands and hospitals.
Each point of interest includes a name, location, and category and is joinable to Cybersyn address data. Overture Maps is an open data project steered by Amazon, Meta, Microsoft, and TomTom that aggregates map data from multiple sources. The first Overture Maps open dataset was released this July.
Example use cases: Finding the nearest competitors to a specific merchant, identifying target markets with a high concentration of stores to sell into, finding all healthcare facilities or schools near a given location, building or enhancing map applications.
Access the data products, including sample queries and data dictionaries, here:
US Points of Interest & Addresses
US Housing & Real Estate Essentials
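The "nearest competitors" use case mentioned above can be sketched as a great-circle distance search over points of interest. This is a minimal sketch, assuming each POI is available as a record with a name, a category, and latitude/longitude; the field names and the sample points are hypothetical and are not the actual Cybersyn or Overture Maps schema.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest(pois, lat, lon, category, k=1):
    """Return the k POIs of the given category closest to (lat, lon)."""
    candidates = [p for p in pois if p["category"] == category]
    candidates.sort(key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))
    return candidates[:k]

# Hypothetical sample points, for illustration only.
pois = [
    {"name": "Cafe A", "category": "restaurant", "lat": 40.00, "lon": -75.00},
    {"name": "Cafe B", "category": "restaurant", "lat": 40.10, "lon": -75.00},
    {"name": "Clinic C", "category": "hospital", "lat": 40.01, "lon": -75.00},
]
closest = nearest(pois, 40.02, -75.00, "restaurant")
print(closest[0]["name"])
```

A linear scan like this is fine for small samples; on the full dataset, a spatial index or the warehouse's own geospatial functions would be the more practical choice.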
submitted by /u/aiatco2
Hi, I am creating an application on Plant Analysis and disease detection. Is there any specific dataset that is available where I can get ALL Plant Identification, ALL Disease Detection and ALL Plant Description (after identification)?
I have found multiple datasets online, but each covers only part of what I need, so I end up doing a lot of data cleaning, which is quite time-consuming.
It would be of great help if anyone knows or has a source for an all in one type dataset.
submitted by /u/aka1432
I would use it for my thesis. I can’t really find them. Thank you!
submitted by /u/EcstaticButterfly862
Can someone please share access with me to Pitchbook as I would love to use it for writing my paper on venture capital and investments. Please let me know if someone is willing to share either through here or DM, as I need to write my paper as fast as possible and would appreciate any help with gaining access. I have requested a free trial but they are slow in responding.
Thank you in advance!
submitted by /u/analsage