Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Looking For A Spare Parts Sales Dataset

Looking for a dataset containing sales data on SKU level for spare parts (date/SKU/amount), preferably in the manufacturing industry. Region/country does not matter. If there are some item characteristics included as well (for example manufactured/purchased, lead time, wear/tear item, etc.), that would be perfect.

I am looking for this dataset to be used in a Masters research project.

submitted by /u/MirjamBleumink

Looking For A Dataset For A Classification Project

Hello, I am looking for a dataset containing at least 10 features (columns) and 3 target labels (multi-class) to perform a classification task. The only problem is that I am forbidden to use Kaggle, the UCI Machine Learning Repository, and GitHub, since my professor thinks these websites are too famous and it’s too easy to find work done by others on the datasets you can find there. Please help.

submitted by /u/kthxbubye

Crunchbase Companies And 👤👤 Data 2.9M Free

2,971,033 Company

uuid,name,type,primary_role,cb_url,domain,homepage_url,logo_url,facebook_url,twitter_url,linkedin_url,combined_stock_symbols,city,region,country_code,short_description

1,174,980 person

uuid,type,first_name,last_name,cb_url,logo_url,facebook_url,twitter_url,linkedin_url,city,region,country_code,featured_job_title,featured_job_organization_name,featured_job_organization_uuid

Enjoy

submitted by /u/DataExpx

[self-promotion] Analyze Market Share, Compare AOVs Between Retailers, & Measure Consumer Spend By Demographic

Now on Snowflake Marketplace, Cybersyn’s Consumer Spending Foundation is a representative panel of activity in the US consumer economy that includes estimates for company:

Revenue ($), transactions (#), and average order values ($)

Year-over-year (%) revenue, transactions, and average order values

We will continue to expand this product – subscribe to Cybersyn’s release notes for the latest updates.

submitted by /u/aiatco2

Genetic Diversity In Human Populations: HGDP-CEPH

Stanford University researchers conducted a study on human genetic diversity using the ‘HGDP-CEPH Human Genome Diversity Cell Line Panel.’ This dataset includes genotypes from 1,043 individuals representing 51 global populations, analyzed at over 650,000 SNP loci. The data explores genetic diversity, shared ancestry, admixture, and population variances. Access the dataset adhering to HGDP-CEPH guidelines, with a focus on analyzing genetic markers and coordinates provided in tab-delimited files.

You can check it out here: https://sellagen.com/item/650357af4d7ce7e8220d00fe

Pretty cool dataset if you’re into comparative genomics or genetic diversity studies 🙂

submitted by /u/nobilis_rex_

Lending Club Feature Information Pre Loan

Hey everyone,

I’m currently working through the LendingClub dataset. My project is simply to predict whether a borrower will default using only the info available at the time of the application.

Problem: I cannot figure out which features were available then and which would leak. I have pored over the data dict and found similar projects. There does not seem to be any consensus on which features do not leak the loan outcome.

I have rewritten my code multiple times and am out of ideas. Are there any reports or further info regarding this?

Thanks

submitted by /u/loblawslawcah
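
One common way to approach the leakage question in the post above is to keep an explicit deny-list of columns that are only populated after the loan is funded, and drop them before training. The sketch below assumes the column names used in the public LendingClub data dictionary (`total_pymnt`, `recoveries`, `last_pymnt_d`, and so on). The list is illustrative, not exhaustive or authoritative, and the helper name `split_leaky_columns` is made up for this example.

```python
# Columns that, per the public data dictionary, describe payments, collections,
# or later credit pulls, i.e. events after origination. Illustrative only.
POST_ORIGINATION = {
    "out_prncp", "out_prncp_inv",
    "total_pymnt", "total_pymnt_inv",
    "total_rec_prncp", "total_rec_int", "total_rec_late_fee",
    "recoveries", "collection_recovery_fee",
    "last_pymnt_d", "last_pymnt_amnt", "next_pymnt_d",
    "last_credit_pull_d", "last_fico_range_high", "last_fico_range_low",
}

# Column families that are also only filled in after origination.
POST_ORIGINATION_PREFIXES = ("hardship_", "settlement_", "debt_settlement_")

# The label itself must never be used as a feature.
TARGET = "loan_status"


def split_leaky_columns(columns):
    """Return (safe, leaky) lists in input order; the target is left out of both."""
    safe, leaky = [], []
    for col in columns:
        if col == TARGET:
            continue
        if col in POST_ORIGINATION or col.startswith(POST_ORIGINATION_PREFIXES):
            leaky.append(col)
        else:
            safe.append(col)
    return safe, leaky


if __name__ == "__main__":
    cols = ["loan_amnt", "annual_inc", "dti", "total_pymnt",
            "recoveries", "hardship_flag", "loan_status"]
    safe, leaky = split_leaky_columns(cols)
    print("safe:", safe)
    print("leaky:", leaky)
```

Fields assigned by the platform at origination, such as `grade` and `int_rate`, are a judgment call: they exist at funding time but are not something the borrower supplied in the application, so whether to keep them depends on how strictly "at application" is defined.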

Dataset For US Company/Employment Information

Hello All,

I work with a non-profit that is looking to collect information about our alumni. One area of interest is their current employers. I am hoping to find a dataset that has overview data on United States companies/employers with simple data points (e.g. company size, industry, address, etc.), so that if an alumnus shares that they are employed there, we will have some basic information as to “who” their employer is. Ideally it would be a dataset that could be purchased as a zip or csv and imported into a CRM. Does anyone know if this exists and where I could purchase it?

submitted by /u/Blue_S0l

Need Free-Text Data. Willing To Pay.

I’m looking for large free-text data sets to train a model that will identify and redact sensitive data. Would be awesome if they were already annotated/labeled. Some entity types I’m interested in:

Location, email, name, CC, CVV, Exp, date, product, username, password, passport #, time.

Anything helps.

submitted by /u/tombenom

Does Anyone Know Where I Can Find Data For J1 League (Japanese Soccer)?

Hi guys,

I’m a college student and I’m interested in writing a paper about the J1 League. I would love to look into the impact of nationality on playing time but I’m having a hard time gathering data. I know jleague.co and transfermarkt exist, but I can’t seem to find a way to download any statistics from either website.

Would anyone know where I can download data from the J1 League?

submitted by /u/useless_brownie

Having Trouble Loading A Dataset Into Google Colab

I am trying to load an OpenNeuro dataset into Google Colab to train a model. Based on the website, the dataset size is said to be 13.46 GB, which can definitely be accommodated by the free version of Colab which usually has around 50 GB of free disk space. I first attempted to download it using AWS CLI by running

!pip install awscli
!aws s3 sync --no-sign-request s3://openneuro.org/ds003949 ds003949-download/

But the process terminated as Colab ran out of disk space.

I then attempted to download with openneuro-py, and shrink my download range to just the derivatives folder.

!pip install openneuro-py
!openneuro-py download --dataset=ds003949 --include=derivatives/*

Again, I ran out of disk space before the download finished.

I am new to OpenNeuro so I don’t know how their datasets work exactly, or how to get the “true” dataset size. I tried loading a smaller 6 GB dataset into Colab with the above methods, and the dataset size did match what was stated on the website. I have minimal storage on my local hardware so I would like to try getting it loaded into Colab first before I attempt that route.

Would appreciate some help or advice on what I did wrong from anyone with experience working with OpenNeuro or neuroimaging data. Thanks!

submitted by /u/botsunny
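
One way to narrow down the disk-space problem in the post above is to measure how much data actually sits under the S3 prefix before syncing, and compare it with the free space on the machine. The sketch below is only that, a sketch: it assumes `boto3` is installed and that the bucket allows anonymous listing, and it takes the bucket and prefix (`openneuro.org`, `ds003949/`) from the post's own `aws s3 sync` command. The helper names are made up for this example.

```python
import shutil


def free_gb(path="."):
    """Free disk space at `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def total_gb(sizes_in_bytes):
    """Sum an iterable of object sizes (bytes) and convert to GB."""
    return sum(sizes_in_bytes) / 1e9


def iter_object_sizes(bucket, prefix):
    """Yield the size of every object under an S3 prefix, using anonymous access."""
    # Imported lazily so the two helpers above work without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Size"]


if __name__ == "__main__":
    # Bucket and prefix as used in the post; this part needs network access.
    needed = total_gb(iter_object_sizes("openneuro.org", "ds003949/"))
    available = free_gb(".")
    print(f"needed: {needed:.2f} GB, free: {available:.2f} GB")
    if needed > available:
        print("Not enough disk space; download a narrower prefix instead.")
```

If the total under the prefix turns out to be much larger than the size shown on the website, passing a narrower prefix (for example a single subject folder) to the same function shows which part of the tree accounts for the difference.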

Dataset Of Outgroup Vs Ingroup/neutral Questions

I’m looking for datasets containing questions that people ask to “opponents” along with questions that they ask to other people in similar situations. Examples of what I’m looking for include lawyers asking questions to their own witnesses and cross-examining other witnesses, politicians in hearings asking questions to supporters of different political parties, and detectives asking for information from suspects and from each other. I’d like to analyze any changes people make in asking questions to their “opponents” vs other people as a baseline.

submitted by /u/geartrains

Dataset For Benchmarking Recruiting Software For Bias

Hey all! I was doing some research on companies offering AI solutions for recruiting. I remember seeing a company mentioning that they were benchmarking their algorithm’s results to make sure there was no bias (as it relates to diversity) using some public dataset.

Unfortunately, I forgot to save the link and have been having trouble remembering what that dataset was. I would greatly appreciate it if you could tell me what the dataset could have been.

Thanks!

submitted by /u/opposity

Need Boarding School Or Stay Over Camp (ideally) Data For Funnel Analysis

I apologize in advance for the vague request, but I need to build a Tableau dashboard and present it for an interview. Unfortunately I wasn’t given any firm requirements or data when I asked, except that it needs to support funnel analysis. My Google searches for data haven’t been successful either. The data would ideally deal with maximizing capacity at a boarding school or stay over camp, but it doesn’t have to as long as the data support funnel analysis. I’m still pretty new in BI, so I’m not sure which data would best facilitate this. Thanks in advance for any help!

submitted by /u/skittles_grabber
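
On the question in the post above of which data "supports funnel analysis": at minimum, a funnel needs a count of people reaching each of a series of ordered stages. The sketch below computes step-to-step and overall conversion from such counts. The stage names and numbers are invented purely for illustration (a hypothetical admissions funnel), and `funnel_rates` is not from any library.

```python
def funnel_rates(stages):
    """Compute conversion rates for an ordered list of (stage_name, count) pairs.

    step_rate is the share of the previous stage that reached this stage;
    overall_rate is the share of the first stage that reached this stage.
    """
    if not stages:
        return []
    top = stages[0][1]
    rows = []
    prev = None
    for name, count in stages:
        if prev is None:
            step_rate = 1.0          # the first stage has no predecessor
        elif prev == 0:
            step_rate = 0.0          # avoid dividing by zero
        else:
            step_rate = count / prev
        rows.append({
            "stage": name,
            "count": count,
            "step_rate": step_rate,
            "overall_rate": count / top if top else 0.0,
        })
        prev = count
    return rows


if __name__ == "__main__":
    # Made-up numbers for a hypothetical boarding-school admissions funnel.
    stages = [("inquiry", 1000), ("application", 400),
              ("admitted", 200), ("enrolled", 150)]
    for row in funnel_rates(stages):
        print(f"{row['stage']:<12} {row['count']:>5} "
              f"step={row['step_rate']:.0%} overall={row['overall_rate']:.0%}")
```

Any dataset with one row per person and a column recording the furthest stage reached (or a timestamp per stage) can be reduced to these counts, which is usually enough for a funnel chart in a BI tool.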

Blood Transfusion Service Center Dataset

This dataset from the Blood Transfusion Service Center in Hsin-Chu City, Taiwan, explores blood donation behavior as a classification problem. Collected every three months from 748 randomly selected donors, it includes attributes like recency, frequency, monetary value, and time. The dataset is ideal for studying and predicting blood donation behavior, pretty cool for classification tasks focused on understanding influencing factors.

You can find it here: https://sellagen.com/item/650207244d7ce7e8220cbec5

submitted by /u/nobilis_rex_

Looking For Dataset For Autism Rates

Has anyone come across any datasets dealing with autism rates? I want to work on a personal project since I am close to the subject of autism, but I have not come across any large data sets.

Specifically, it would be nice if the information is broken down by year, country, etc., and shows how the rates are progressing.

submitted by /u/aerost0rm

Why Do So Many Publicly Available Datasets Open In Such Inconvenient/unusable Formats?

Trying to just view the CDC datasets, and the only format they seem to open in is a text document. Why!?!? I can’t tell a single thing that’s going on, not even the variables being measured, because it just looks like blocks of text arranged haphazardly in the Notepad app.

Some other datasets from GitHub contain EDF files and text files again, which are also super inconvenient.

Like where is the option for csv or spreadsheet, or basically anything that’s readily viewable and understandable? Why isn’t that the default? I was expecting that viewing the data files would be the easier part of trying to write a research paper, but no

Also if anyone knows how to get this CDC dataset into a viewable format, please let me know! Thanks

submitted by /u/Classic-Asparagus
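
A possible explanation for the "blocks of text" in the post above is that the file is fixed-width ASCII, where each variable occupies a fixed range of character positions given in a separate codebook or record layout. That is an assumption about the file in question, not something the post confirms. Under that assumption, a minimal sketch of converting such a file to CSV looks like this; the `LAYOUT` below is entirely hypothetical and would have to be replaced with the positions from the dataset's own documentation.

```python
import csv
import io

# Hypothetical layout: (column name, start, end), 0-based and end-exclusive.
# Real positions come from the codebook published alongside the data file.
LAYOUT = [("state", 0, 2), ("age", 2, 5), ("sex", 5, 6)]


def parse_fixed_width(lines, layout):
    """Yield one dict per non-blank line, slicing each field by position."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        yield {name: line[start:end].strip() for name, start, end in layout}


def to_csv(rows, layout, out):
    """Write parsed rows to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=[name for name, _, _ in layout])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = ["36 452\n", "06 031\n"]  # two made-up records
    buf = io.StringIO()
    to_csv(parse_fixed_width(sample, LAYOUT), LAYOUT, buf)
    print(buf.getvalue())
```

For large files, `pandas.read_fwf` with a `colspecs` argument does the same slicing and returns a DataFrame that can be saved with `to_csv`.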

[self-promotion] Free Company Dataset (±17M Records)

BigPicture.io, the company I work for, has just released the latest version of their open-source company dataset, and it’s now available for download. I’ve been on Reddit for a while now, and think that this community might find it useful.

Check it out here: https://docs.bigpicture.io/docs/free-datasets/companies/

You need to sign up first, as we’ve had problems with bots and an AWS bill one month that nearly killed us.

Please feel free to provide your feedback/suggestions as we’re always aiming to improve our services.

submitted by /u/master_in_something

Any List Of All Agencies Submitting/not Submitting Reports To The FBI’s UCR Or NIBRS?

Looking for just a list that contains two kinds of information about the FBI’s uniform crime reports (UCR) or the newer NIBRS (the incident-based reporting system, can’t remember what it stands for):

Which agencies (e.g., police departments, etc.) contributed data to the UCR and/or NIBRS

Which agencies did NOT do that (e.g., last year)

I’m hunting around the FBI’s UCR website looking for this and haven’t found it yet. Does anyone have this info?

submitted by /u/bobbyfiend