Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Looking For The Dataset Of The Percentage Of Women In The Computing Workforce In The United States 1970-2020

In the book Gender Codes by Thomas J. Misa, there is a figure I want to recreate for my stats class. The figure is cited as “Bureau of Labor Statistics Database, accessed May 2008, courtesy of Peter Meyer.”

The U.S. Bureau of Labor Statistics apparently started tracking the computer workforce in 1970, so there should be data covering the last 50 years. I’ve been trying for days to retrieve this data, but none of the data websites will let me.
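If the underlying series can be found in the BLS Public Data API, a query can be sketched as below. The v2 endpoint and the `seriesid`/`startyear`/`endyear`/`registrationkey` payload fields follow the public API documentation; the 20-years-per-request window is an assumption based on the documented limit for registered keys (unregistered use is more restricted), and whether BLS actually publishes this particular series back to 1970 is not confirmed here. The series ID must be looked up separately; none is given in the post.

```python
import json
import urllib.request

# Public v2 endpoint of the BLS Public Data API.
BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"

# Assumption: a registered v2 key allows up to 20 years per request.
MAX_YEARS_PER_REQUEST = 20


def year_windows(start, end, size=MAX_YEARS_PER_REQUEST):
    """Split the inclusive range [start, end] into windows of at most `size` years."""
    windows = []
    year = start
    while year <= end:
        last = min(year + size - 1, end)
        windows.append((year, last))
        year = last + 1
    return windows


def build_payload(series_ids, start, end, api_key=None):
    """Build the JSON body for one request covering start..end (inclusive)."""
    payload = {"seriesid": series_ids, "startyear": str(start), "endyear": str(end)}
    if api_key:
        payload["registrationkey"] = api_key
    return payload


def fetch(series_ids, start, end, api_key=None):
    """POST one request per year window and collect the raw JSON responses."""
    responses = []
    for first, last in year_windows(start, end):
        body = json.dumps(build_payload(series_ids, first, last, api_key)).encode("utf-8")
        request = urllib.request.Request(
            BLS_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as resp:
            responses.append(json.load(resp))
    return responses
```

For 1970-2020, `year_windows` yields three requests (1970-1989, 1990-2009, 2010-2020); a call would look like `fetch(["SERIES_ID_PLACEHOLDER"], 1970, 2020, api_key="...")`, where the ID is a placeholder to replace.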

submitted by /u/icanteventhat

Need: A Data Set Of Chemical Reactions That Use Common Everyday Chemicals

I need a dataset of chemical reactions that use common chemicals, and I need to be able to isolate the molecular formula of each reactant and product in every reaction. I would like the dataset to be as large as possible, but I would take any dataset with more than 250 reactions.

If you know of a dataset of chemical reactions of really any type, even if they don’t primarily use common chemicals, I would still like to look into it, so please tell me.

Thanks in advance for any help or guidance.
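Once a dataset is found, isolating the formulas depends on how reactions are stored. A minimal sketch, assuming reactions arrive as plain strings like `2 H2 + O2 -> 2 H2O` (a hypothetical format; real datasets often use SMILES or other encodings): it splits each side into (coefficient, formula) pairs and counts atoms so the balance can be checked. It handles nested parentheses but not hydrates, charges, or state labels.

```python
import re
from collections import Counter

# An element symbol or a parenthesis, followed by an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?|\(|\))(\d*)")


def parse_formula(formula):
    """Count atoms in a formula such as 'Ca(OH)2', supporting nested parentheses."""
    stack = [Counter()]
    for symbol, count in TOKEN.findall(formula):
        n = int(count) if count else 1
        if symbol == "(":
            stack.append(Counter())
        elif symbol == ")":
            group = stack.pop()
            for element, k in group.items():
                stack[-1][element] += k * n
        else:
            stack[-1][symbol] += n
    return stack[0]


def parse_side(side):
    """Split '2 H2 + O2' into [(2, 'H2'), (1, 'O2')]."""
    species = []
    for term in side.split("+"):
        match = re.fullmatch(r"(\d*)\s*(.+)", term.strip())
        coefficient = int(match.group(1)) if match.group(1) else 1
        species.append((coefficient, match.group(2)))
    return species


def parse_reaction(text):
    """Return (reactants, products) as lists of (coefficient, formula)."""
    left, right = text.split("->")
    return parse_side(left), parse_side(right)


def is_balanced(text):
    """Check that every element appears equally often on both sides."""
    reactants, products = parse_reaction(text)

    def total(side):
        atoms = Counter()
        for coefficient, formula in side:
            for element, n in parse_formula(formula).items():
                atoms[element] += coefficient * n
        return atoms

    return total(reactants) == total(products)
```

For example, `parse_reaction("2 H2 + O2 -> 2 H2O")` gives `([(2, "H2"), (1, "O2")], [(2, "H2O")])`, and `is_balanced` on it returns `True`.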

submitted by /u/Then-Individual4582

Grad Student In Need Of Assistance: [mock] Data Related To Early Childhood Goal-Oriented Play Activity.

Help!

I’m a grad student in the middle of a capstone project. The professor, who is the liaison between me and the sponsor, has gone AWOL, and due dates are near.

Can anyone reading this point me toward a [mock] dataset that would help me build a predictive model to improve developmental cascades in fine and gross motor skills in children aged 0-3 years?

Thanks!
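If no suitable mock dataset turns up, one option is to generate one. The sketch below is purely synthetic: the column names, the 0-100 score scale, and the linear link between play time and motor scores are invented placeholders with no empirical basis, useful only for exercising a modeling pipeline before real data arrive.

```python
import csv
import io
import random


def make_mock_rows(n, seed=0):
    """Generate n synthetic records; every relationship here is an invented placeholder."""
    rng = random.Random(seed)
    rows = []
    for child_id in range(1, n + 1):
        age_months = rng.randint(0, 36)     # 0-3 years
        play_minutes = rng.randint(0, 120)  # hypothetical daily goal-oriented play
        # Placeholder scores: rise with age and play, plus Gaussian noise.
        fine = 20 + 1.5 * age_months + 0.1 * play_minutes + rng.gauss(0, 5)
        gross = 25 + 1.4 * age_months + 0.08 * play_minutes + rng.gauss(0, 5)
        rows.append({
            "child_id": child_id,
            "age_months": age_months,
            "play_minutes_per_day": play_minutes,
            # Clip to the assumed 0-100 scale.
            "fine_motor_score": round(min(max(fine, 0), 100), 1),
            "gross_motor_score": round(min(max(gross, 0), 100), 1),
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Fixing the seed makes the mock data reproducible, so model results can be compared across runs.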

submitted by /u/Comstock1984

Scrape Dynamic Pricing Flight Data For Flight Price Forecasting (University Students’ Science Research Project)

I’m in a group of four second-year students. We are carrying out a flight price forecasting project. Our focus is “optimizing purchase timing”, which requires predicting when a flight’s price will drop to its minimum between the present moment and the departure day.

However, we’re struggling with collecting the actual dataset. Does anyone have any idea how to crawl the historical prices of a flight from the day it opens for booking (or, ideally, at least 1-3 months back)? For example, given a flight that has been available for purchase since September 2023, how do we, on 11 November 2023, get the price that this specific flight had on 1 October 2023?

We are thinking about Vietnamese airlines such as Vietjet and Vietnam Airlines. The historical data may be crawlable through Google Flights or Bing Travel, but we’re not sure of that yet.

I deeply appreciate all of your help!
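Since it is not clear that Google Flights or Bing Travel expose past prices for a specific flight, one fallback is to record daily price snapshots yourself and analyze them afterwards. A minimal sketch, assuming snapshots are stored as (flight ID, observation date, departure date, price) records; the field names and the example values are hypothetical. For each flight it finds the lowest observed price and how many days before departure it was seen, which is the target an "optimal purchase timing" model would predict.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    flight_id: str     # hypothetical identifier, e.g. carrier + flight number + date
    observed_on: date  # day the price was scraped
    departs_on: date   # scheduled departure day
    price: float       # fare seen on observed_on


def best_purchase_lead(snapshots):
    """Map each flight to (lowest price, days before departure when it was seen).

    Ties keep the earliest observation, i.e. the largest lead time."""
    best = {}
    for snap in sorted(snapshots, key=lambda s: s.observed_on):
        lead_days = (snap.departs_on - snap.observed_on).days
        current = best.get(snap.flight_id)
        if current is None or snap.price < current[0]:
            best[snap.flight_id] = (snap.price, lead_days)
    return best
```

With made-up snapshots for a 30 November departure priced 120 on 1 October, 95 on 15 October, 95 on 11 November, and 150 on 25 November, the result is `{"F1": (95.0, 46)}`: the minimum was first seen 46 days out.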

submitted by /u/EhThere-sIceCream

Trafficking In Persons Report: Looking For The Raw Datasets

Hi, I was planning to carry out a study using secondary quantitative data about human trafficking/forced labor. I wanted to use the US Department of State “Trafficking in Persons Report”, as I noticed it was used in previous research. However, I cannot find the raw datasets anywhere, i.e. something I can put into Excel or SPSS for data analysis; I can only find the text reports. Does anyone know where I can look for them? Research says that the data is open source.

Thank you in advance.
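If raw tables really are unavailable, one workaround is to transcribe the country tier placements from the report text into a spreadsheet-friendly file. A minimal sketch, assuming the placements have been copied by hand into (country, year, tier) rows; the country names in the example are placeholders, not real placements. It adds an ordinal code (1 = Tier 1 through 4 = Tier 3) that Excel or SPSS can treat as numeric; the coding direction is an arbitrary choice, and labels such as "Special Case" get an empty code.

```python
import csv
import io

# Ordinal coding of the TIP report tier labels (the direction is an arbitrary choice).
TIER_CODES = {
    "Tier 1": 1,
    "Tier 2": 2,
    "Tier 2 Watch List": 3,
    "Tier 3": 4,
}


def tiers_to_csv(rows):
    """Write (country, year, tier label) rows as CSV with an added numeric tier code.

    Labels outside TIER_CODES (e.g. 'Special Case') get an empty code cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["country", "year", "tier_label", "tier_code"])
    for country, year, label in rows:
        label = label.strip()
        writer.writerow([country, year, label, TIER_CODES.get(label, "")])
    return buffer.getvalue()
```

Writing the result to a `.csv` file lets SPSS import it directly, with `tier_code` as an ordinal variable.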

submitted by /u/jiridij

Looking For A Dataset For A Classification Project

Hello, I am looking for a dataset containing at least 10 features (columns) and 3 target labels (multi-class) to perform a classification task. The only problem is that I am forbidden to use Kaggle, the UCI Machine Learning Repository, and GitHub, since my professor thinks these websites are too famous and that it’s too easy to find other people’s work on the datasets hosted there. Please help!
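Whatever source ends up being allowed, it helps to check a candidate file against the assignment's requirements before investing time in it. A minimal sketch, assuming a CSV with a header row where every column except the label column counts as a feature; the label column name is whatever the dataset uses (`label` in the tests is hypothetical).

```python
import csv
import io


def meets_requirements(csv_text, label_column, min_features=10, min_classes=3):
    """True if the CSV has >= min_features non-label columns and >= min_classes distinct labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        return False
    n_features = len(reader.fieldnames) - 1
    labels = {row[label_column] for row in reader}
    return n_features >= min_features and len(labels) >= min_classes
```

For a file on disk, pass `open(path, newline="").read()` as `csv_text`.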

submitted by /u/kthxbubye

Looking For A Spare Parts Sales Dataset

Looking for a dataset containing sales data at the SKU level for spare parts (date/SKU/amount), preferably in the manufacturing industry. Region/country does not matter. If some item characteristics are included as well (for example manufactured/purchased, lead time, wear-and-tear item, etc.), that would be perfect.

I am looking for this dataset to be used in a Masters research project.
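With a date/SKU/amount layout like the one described, a common first step is to aggregate transactions into monthly demand per SKU, filling months without sales with zero so intermittent demand stays visible. A minimal sketch, assuming dates are already parsed into `datetime.date` objects; the SKU codes in the tests are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_demand(records: list[tuple[date, str, float]]):
    """Aggregate (date, sku, amount) records into {sku: {(year, month): total}}.

    Months between a SKU's first and last sale with no sales are filled with 0."""
    totals = defaultdict(lambda: defaultdict(float))
    for day, sku, amount in records:
        totals[sku][(day.year, day.month)] += amount

    result = {}
    for sku, by_month in totals.items():
        first, last = min(by_month), max(by_month)
        year, month = first
        filled = {}
        # Walk month by month so gaps become explicit zeros.
        while (year, month) <= last:
            filled[(year, month)] = by_month.get((year, month), 0.0)
            month += 1
            if month > 12:
                year, month = year + 1, 1
        result[sku] = filled
    return result
```

Keeping the zero months matters for spare parts, where many SKUs sell only occasionally and a forecasting model needs to see those empty periods.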

submitted by /u/MirjamBleumink

Crunchbase Companies And 👤👤 Data 2.9M Free

2,971,033 companies

uuid,name,type,primary_role,cb_url,domain,homepage_url,logo_url,facebook_url,twitter_url,linkedin_url,combined_stock_symbols,city,region,country_code,short_description

1,174,980 people

uuid,type,first_name,last_name,cb_url,logo_url,facebook_url,twitter_url,linkedin_url,city,region,country_code,featured_job_title,featured_job_organization_name,featured_job_organization_uuid

Enjoy
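The two files can presumably be linked through the person file's `featured_job_organization_uuid` column and the company file's `uuid`. A minimal sketch, assuming both files are CSVs with the headers listed above (the post does not give file names); it indexes companies by `uuid` and attaches each person's featured employer, yielding `None` where there is no match.

```python
import csv


def index_companies(company_lines):
    """Map company uuid -> row dict from an iterable of CSV lines (e.g. an open file)."""
    return {row["uuid"]: row for row in csv.DictReader(company_lines)}


def people_with_employers(person_lines, companies):
    """Yield (person row, company row or None) joined on featured_job_organization_uuid."""
    for person in csv.DictReader(person_lines):
        org_uuid = person.get("featured_job_organization_uuid") or ""
        yield person, companies.get(org_uuid)
```

Both functions accept an open file (`open(path, newline="", encoding="utf-8")`) as well as a list of lines; the company index for ~2.9M rows fits in memory on most machines but is not small.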

submitted by /u/DataExpx

[self-promotion] Analyze Market Share, Compare AOVs Between Retailers, & Measure Consumer Spend By Demographic

Now on Snowflake Marketplace, Cybersyn’s Consumer Spending Foundation is a representative panel of activity in the US consumer economy that includes company-level estimates of:

- Revenue ($), transactions (#), and average order values ($)
- Year-over-year (%) revenue, transactions, and average order values

We will continue to expand this product – subscribe to Cybersyn’s release notes for the latest updates.
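For readers new to these metrics: average order value is revenue divided by the number of transactions, and a year-over-year change compares a period with the same period one year earlier. A minimal sketch of both calculations; any figures plugged in are illustrative and are not Cybersyn data.

```python
def average_order_value(revenue, transactions):
    """Revenue per transaction; None when there are no transactions."""
    if transactions == 0:
        return None
    return revenue / transactions


def yoy_percent(current, prior):
    """Year-over-year change in percent, relative to the prior-year value."""
    if prior == 0:
        return None
    return (current - prior) * 100 / prior
```

Comparing AOVs between two retailers is then just `average_order_value` applied to each retailer's revenue and transaction estimates for the same period.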

submitted by /u/aiatco2

Genetic Diversity In Human Populations: HGDP-CEPH

Stanford University researchers conducted a study on human genetic diversity using the ‘HGDP-CEPH Human Genome Diversity Cell Line Panel.’ This dataset includes genotypes from 1,043 individuals representing 51 global populations, genotyped at over 650,000 SNP loci. The data supports analyses of genetic diversity, shared ancestry, admixture, and variation between populations. Access to the dataset is subject to the HGDP-CEPH guidelines; the genetic markers and their coordinates are provided as tab-delimited files.

You can check it out here: https://sellagen.com/item/650357af4d7ce7e8220d00fe

Pretty cool dataset if you’re into comparative genomics or genetic diversity studies 🙂
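A typical first diversity measure is expected heterozygosity per population at each SNP, H = 1 - Σ p_i², where p_i are the allele frequencies. A minimal sketch, assuming a tab-delimited layout with a header row (SNP ID, then individual IDs), one row per SNP, two-letter genotype calls such as `AG`, and `--` for missing calls; check the actual files' header and coding before relying on this. The individual-to-population mapping is supplied separately, and the IDs in the tests are hypothetical.

```python
import csv
from collections import Counter


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2) over allele frequencies, ignoring missing calls ('--')."""
    alleles = Counter()
    for call in genotypes:
        if call != "--" and len(call) == 2:
            alleles.update(call)  # counts both alleles of the diploid call
    total = sum(alleles.values())
    if total == 0:
        return None
    return 1 - sum((n / total) ** 2 for n in alleles.values())


def heterozygosity_by_population(lines, population_of):
    """Return {snp_id: {population: H_e}} from tab-delimited genotype lines.

    The first row is assumed to be a header: SNP id followed by individual ids."""
    reader = csv.reader(lines, delimiter="\t")
    individuals = next(reader)[1:]
    results = {}
    for row in reader:
        snp_id, calls = row[0], row[1:]
        groups = {}
        for individual, call in zip(individuals, calls):
            population = population_of.get(individual)
            if population is not None:
                groups.setdefault(population, []).append(call)
        results[snp_id] = {pop: expected_heterozygosity(g) for pop, g in groups.items()}
    return results
```

Averaging H over SNPs within each population gives a single diversity figure per population for comparison.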

submitted by /u/nobilis_rex_

Lending Club Feature Information Pre Loan

Hey everyone,

Currently working through the LendingClub dataset. My project is simply to predict whether a borrower will default using only the information available at the time of the application.

Problem: I cannot figure out which features were available at that time and which would leak. I have pored over the data dictionary and looked at similar projects, but there does not seem to be any consensus on which features do not leak the loan outcome.

I have rewritten my code multiple times and am out of ideas. Are there any reports or further information on this?

Thanks
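One way to make the decision explicit is a hand-curated whitelist of columns believed to be known at application time, treating everything else as excluded until justified; this fails safe, unlike blacklisting leaky columns one by one. A minimal sketch: the column names below are illustrative examples in the style of the LendingClub data dictionary, not a vetted list, and each one still has to be confirmed against the dictionary.

```python
# Illustrative whitelist: columns ASSUMED to be known when the application is filed.
# Verify each one against the LendingClub data dictionary before relying on it.
APPLICATION_TIME = {"loan_amnt", "term", "emp_length", "home_ownership",
                    "annual_inc", "dti", "purpose"}

TARGET = "loan_status"


def split_columns(columns, allowed=APPLICATION_TIME, target=TARGET):
    """Partition columns into (kept features, excluded columns), never keeping the target."""
    kept = [c for c in columns if c in allowed and c != target]
    excluded = [c for c in columns if c not in allowed or c == target]
    return kept, excluded
```

Printing the `excluded` list gives a review queue: a column moves into the whitelist only once its dictionary entry shows it was recorded before the loan was issued.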

submitted by /u/loblawslawcah

Dataset For US Company/Employment Information

Hello All,

I work with a non-profit that is looking to collect information about our alumni. One area of interest is their current employers. I am hoping to find a dataset with overview data on United States companies/employers and simple data points (e.g., company size, industry, address) so that if an alumnus tells us where they are employed, we will have some basic information about “who” their employer is. Ideally it would be a dataset that could be purchased as a zip or CSV file and imported into a CRM. Does anyone know whether this exists, or where I could purchase it?
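Once a company file is in hand, the main practical hurdle is matching the employer names alumni report to company records, since the same employer is often written several ways. A minimal sketch of exact matching on a normalized name; the suffix list and the example company names are assumptions, and a real CRM import would likely need fuzzy matching or manual review on top of this.

```python
import re

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "incorporated", "llc", "ltd", "co", "corp", "corporation", "company"}


def normalize_name(name):
    """Lowercase, drop punctuation, spell out '&', and strip trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9& ]+", " ", name.lower()).replace("&", " and ").split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def match_employers(reported, companies):
    """Map each reported employer string to a company record (or None) by normalized name."""
    index = {normalize_name(c["name"]): c for c in companies}
    return {r: index.get(normalize_name(r)) for r in reported}
```

For example, "Acme, Inc." and "ACME Incorporated" both normalize to `acme` and therefore match the same record.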

submitted by /u/Blue_S0l