Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Help Finding A Dataset With WWII Deaths Over Time

I am trying to find data on the number of casualties over time during World War 2: how are deaths distributed over the course of the war? The closest data I could find is for Italy only, but I am interested in the combined, world-wide deaths over time.

Ideally, I am looking for the number of deaths per month over the course of the war. Data at a lower frequency (quarterly or yearly) would be less ideal, but still OK.

Does anyone know if there is such data somewhere? If not, I could estimate these numbers by calculating the excess deaths over that time period. Any thoughts on that? Thanks!
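A minimal sketch of the excess-deaths idea mentioned above, assuming monthly death counts are available for some pre-war years and for the war years. The function names and all numbers are hypothetical placeholders, not real figures; the baseline here is simply the pre-war average for each calendar month, which is one of several possible baselines.

```python
def monthly_baseline(prewar):
    """Average deaths per calendar month across pre-war years.

    prewar maps (year, month) -> deaths.
    """
    by_month = {}
    for (_, month), deaths in prewar.items():
        by_month.setdefault(month, []).append(deaths)
    return {month: sum(values) / len(values) for month, values in by_month.items()}


def excess_deaths(wartime, baseline):
    """Observed wartime deaths minus the baseline for the same calendar month."""
    return {key: deaths - baseline[key[1]] for key, deaths in wartime.items()}


# Placeholder counts for illustration only (NOT real data).
prewar = {(1937, 1): 100, (1938, 1): 110, (1937, 2): 90, (1938, 2): 94}
wartime = {(1942, 1): 150, (1942, 2): 130}

result = excess_deaths(wartime, monthly_baseline(prewar))
```

Note that excess deaths estimated this way also pick up famine, disease, and displacement, so they are a broader measure than direct war casualties.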

submitted by /u/matmerda

Best Place To Find Data On Real Estate Transactions In Arizona?

Hi r/datasets! I’m looking for an AZ real estate dataset from recent years that contains any or all of the following attributes:

Price: The selling or listing price of the property.
Size: Total square footage or square meters of the property.
Bedrooms: Number of bedrooms.
Bathrooms: Number of bathrooms.
Property Type: e.g., single-family, condo, townhouse.
Year Built: The year the property was constructed.
City: City where the property is located.
ZIP Code: ZIP or postal code of the property.
Days on Market: Number of days the property has been listed on the market.

Is scraping Zillow the best option? Would appreciate any advice, thanks!

submitted by /u/abc1203218

Looking For Dyslexia Reading Speed And Letter Mix Up Dataset

I’m currently searching for a dataset on the reading speed of people with dyslexia. I am trying to find out which letters or letter combinations cause the most problems during reading.

Ideally, this would be a dataset of a text that has been copied by dyslexic people (so the source text plus the same text as written by multiple dyslexic people), or a dataset with sample sentences and the time required to read them.

I know this is very specific, so suggestions on alternative data sources that I might infer this information from are also very welcome!
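If a source-text-plus-copies dataset did turn up, letter mix-ups could be counted roughly as sketched here. This is only an illustration under stated assumptions: `letter_substitutions` is a hypothetical helper, it aligns the two strings with the standard library's `difflib.SequenceMatcher`, and it counts only same-length replacements, so insertions and omissions are ignored.

```python
from collections import Counter
from difflib import SequenceMatcher


def letter_substitutions(source, copy):
    """Count (source_letter, written_letter) pairs where the copy differs."""
    counts = Counter()
    matcher = SequenceMatcher(None, source, copy, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only same-length replacements can be paired letter by letter.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for original, written in zip(source[i1:i2], copy[j1:j2]):
                counts[(original, written)] += 1
    return counts


# Made-up example pair, not taken from any real dataset.
mixups = letter_substitutions("bad dog", "dad bog")
```

Summing these counters over many writers would give a rough ranking of the most frequently confused letter pairs.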

submitted by /u/X99p

[self-promotion] Kaggle: 16,000+ LinkedIn Job Postings From Last Week

Hello everyone, to pass time during my extra long summer break before starting college, I decided to learn SQL by scraping and storing data from LinkedIn. Yesterday, I dumped all the data I collected to Kaggle in CSV format. The main file contains 27 columns, and there are several detached files containing info such as the benefits, industries, and skills associated with each job (that’s right, I discovered what data table normalization is). There’s also a separate folder containing company information (name, description, size, employee_count, follower_count, industries).

I plan to run the collection script again next month, allowing for further analysis of trends such as company growth, salary changes, and job demand. Also if anyone wants, I can potentially share the scraper code on GitHub, although keep in mind you may get banned (especially with new accounts).

These are the columns of the main file:

['job_id', 'company_id', 'title', 'description', 'max_salary', 'med_salary', 'min_salary', 'pay_period', 'formatted_work_type', 'location', 'applies', 'original_listed_time', 'remote_allowed', 'views', 'job_posting_url', 'application_url', 'application_type', 'expiry', 'closed_time', 'formatted_experience_level', 'skills_desc', 'listed_time', 'posting_domain', 'sponsored', 'work_type', 'currency', 'compensation_type']

Here’s the link to the dataset:

https://www.kaggle.com/datasets/arshkon/linkedin-job-postings
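Because the data is normalized, using it usually means joining the main postings file back to the detached tables. A rough sketch of that join follows, assuming the company table is keyed by `company_id` (an assumption; only the postings columns are listed above). The inline CSV text and the `join_postings` helper are made-up stand-ins, not the actual files or an official loader.

```python
import csv
import io

# Tiny made-up samples standing in for the real files (NOT actual dataset rows).
postings_csv = """job_id,company_id,title
1,10,Data Analyst
2,10,Data Engineer
3,20,Recruiter
"""
companies_csv = """company_id,name
10,Acme
20,Globex
"""


def join_postings(postings_file, companies_file):
    """Attach the company name to each posting via company_id."""
    companies = {
        row["company_id"]: row["name"] for row in csv.DictReader(companies_file)
    }
    joined = []
    for row in csv.DictReader(postings_file):
        joined.append(
            {
                "job_id": row["job_id"],
                "title": row["title"],
                # None when a posting has no matching company row.
                "company": companies.get(row["company_id"]),
            }
        )
    return joined


rows = join_postings(io.StringIO(postings_csv), io.StringIO(companies_csv))
```

With the real download, the two `io.StringIO` objects would be replaced by the opened CSV files.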

submitted by /u/Armi2

Complete Noob Requests Help With BLS Data

Hello all,

I come to you after 100s of Google searches, 10s of hours spent squinting at my computer screen, and 1 near breakdown.

Basically, I’m trying to get demographics based on job types. For example, I’d like to know the average age, gender, income, and education level for real estate brokers in the U.S. I *think* the BLS has this data, but I have no idea how to find it. I would be eternally grateful if someone could point me in the right direction.

submitted by /u/starlit_ren

M&A Deal Premium On Refinitiv Eikon?

I am currently working on my master’s thesis, and it would be a huge help if someone could help me out. What is the deal premium called in the Eikon Screener? I can’t find it in the “add columns” section. Is it Price Premium? I am also trying to map ESG scores and financials of target companies. Should I use the PermID for the target? Please help if you have experience with this; it is kind of urgent!

submitted by /u/Thick_Sun2297

[self-promotion] Indeed Dataset 730k Records

I’ve got a scraped job postings dataset from Indeed (US). The data is updated daily, with roughly 5-10k new records every day. The dataset has all the fields in the job offer: title, description, salary, urgently hiring, etc. Data goes back to early this year.

I can offer it all in bulk (as of today), or as a subscription if you’re interested in updates.

submitted by /u/conjecturer_

Parking Lots Anomalous Activities Video Dataset

Hey guys, I am currently working on a project for which I need a dataset of parking lot video clips annotated with the human activities taking place, both normal and anomalous (fighting, car accidents, and so on). I would be very grateful if someone could suggest such a dataset, or at least tell me which datasets contain such clips so I can filter through them. I have heard of the UCF-Crime dataset, but it covers many diverse situations and I don’t know whether it includes any parking lot clips. Thanks in advance for any help!

submitted by /u/Demonking6444

Looking For A Dataset For Divorce Forecasting Analysis

Hi everybody,

I had an idea for a survival analysis of marriages.

I would like to find a dataset in which each row is a couple.
As features I would like information on the husband and wife (dates of birth, city of residence before and after marriage, date of wedding, nationality, skin color…) and, where applicable, the date of separation/divorce.
I know these are somewhat complicated requests, but I hope that what I am looking for exists.
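For context on why the wedding and separation dates matter: a standard starting point for survival analysis is the Kaplan-Meier estimator, which needs only a duration per couple and a flag saying whether the divorce was observed or the couple was still married when observation ended (censored). Below is a minimal pure-Python sketch; the `kaplan_meier` function and the five example couples are hypothetical and for illustration only.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed divorce time.

    durations: years from wedding to divorce or to end of observation.
    events: True if a divorce was observed, False if the couple is censored.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Couples still married (and still observed) just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        divorced = sum(1 for d, e in zip(durations, events) if e and d == t)
        survival *= 1 - divorced / at_risk
        curve.append((t, survival))
    return curve


# Made-up couples (NOT real data): years observed and whether a divorce occurred.
durations = [2, 5, 5, 8, 10]
events = [True, False, True, True, False]

curve = kaplan_meier(durations, events)
```

Features such as age at marriage or city of residence would then enter through a regression model (for example Cox proportional hazards) rather than through this basic estimator.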

submitted by /u/Sim_Check

Looking For A Text Or Audio Dataset In A Language That Is Not In Google Translate

Hello everyone,

I’m an undergrad linguistics student currently studying Computational Linguistics and NLP. I live in Brazil and I plan to work with endangered languages in my area.

I’m researching a method of creating language models of non-catalogued languages, or of languages with a small amount of data. I also plan to go to one of those groups to collect data, but that is far in the future.

Finally, I’m looking for any dataset in a language that is not modeled yet (my criterion is that it is not in Google Translate), or in an endangered language. Any type of suggestion or comment is welcome.

Thanks for taking the time to read this and help me.

P.S.: I’m not an expert, just a student trying to do some research that can help my community.

submitted by /u/Pinguindiniz

[self-promotion] Hospital Price Transparency Supplemental Data

For the last few years I’ve contributed to a side project concerned with curating and maintaining supplemental data related to the Hospital Price Transparency and Transparency in Coverage regulations.

The goal is to make data provided in accord with those regulations more accessible, transparent, and actionable in a maintainable and consistent way.

For example, there are many recent efforts that have attempted to collect all of the underlying price data into databases, and to do so, they need to scrape all of the files served by hospitals, which are unfortunately not required to be centralized. To do that scraping, you need hospital domains, and knowledge of how and where they serve their files. That sort of data is meant to be maintained in this repository.

I finally got around to adding some new data after a long hiatus, so I thought it’d be a nice time to reshare here: https://github.com/TPAFS/transparency-data

Would appreciate your thoughts and feedback!

submitted by /u/tpafs

[self-promotion] Access Points Of Interest Data From Overture Maps Foundation Directly In Your Snowflake Instance

The POI data covers hundreds of categories ranging from restaurants and parks to commercial brands and hospitals.

Each point of interest includes a name, location, and category and is joinable to Cybersyn address data. Overture Maps is an open data project steered by Amazon, Meta, Microsoft, and TomTom that aggregates map data from multiple sources. The first Overture Maps open dataset was released this July.

Example use cases: Finding the nearest competitors to a specific merchant, identifying target markets with a high concentration of stores to sell into, finding all healthcare facilities or schools near a given location, building or enhancing map applications.
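To make the first use case concrete, here is a rough sketch of a nearest-competitor lookup. It is written in Python over an in-memory list rather than as a Snowflake query, so it is a swapped-in illustration and not one of the sample queries that ship with the data. The field names (`name`, `category`, `lat`, `lon`), the helper functions, and the three example points are assumptions and placeholders.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def nearest(pois, lat, lon, category, k=3):
    """Return up to k POIs of the given category, closest first."""
    matches = [p for p in pois if p["category"] == category]
    return sorted(matches, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))[:k]


# Placeholder points of interest (NOT rows from the actual data product).
pois = [
    {"name": "Cafe A", "category": "restaurant", "lat": 40.0, "lon": -74.0},
    {"name": "Cafe B", "category": "restaurant", "lat": 40.5, "lon": -74.0},
    {"name": "Clinic C", "category": "hospital", "lat": 40.01, "lon": -74.0},
]

closest = nearest(pois, 40.02, -74.0, "restaurant", k=1)
```

At real data volumes a linear scan like this would be replaced by a spatial index or by geospatial functions in the warehouse.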

Access the data products, including sample queries and data dictionaries, here:
US Points of Interest & Addresses
US Housing & Real Estate Essentials

submitted by /u/aiatco2

Dataset On Plant Identification, Disease Detection & Plant Description

Hi, I am creating an application on Plant Analysis and disease detection. Is there any specific dataset that is available where I can get ALL Plant Identification, ALL Disease Detection and ALL Plant Description (after identification)?

I have found multiple datasets online but they are all in portions, resulting in me having to do data cleaning which is quite time consuming.

It would be of great help if anyone knows or has a source for an all in one type dataset.

submitted by /u/aka1432

Does Anyone Have Access To PitchBook?

Can someone please share access with me to Pitchbook as I would love to use it for writing my paper on venture capital and investments. Please let me know if someone is willing to share either through here or DM, as I need to write my paper as fast as possible and would appreciate any help with gaining access. I have requested a free trial but they are slow in responding.

Thank you in advance!

submitted by /u/analsage