submitted by /u/OmOshIroIdEs
[link] [comments]
Category: Datatards
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
I was reading about the first woman to summit Mount Everest without supplemental oxygen and I started down the Wikipedia rabbit hole.
I found this on Wikipedia:
https://en.wikipedia.org/wiki/List_of_people_who_died_climbing_Mount_Everest
I was wondering if there’s a master dataset in CSV of all the people who have died climbing any mountain, along with their demographic info and cause of death?
I found this too
https://en.wikipedia.org/wiki/List_of_deaths_on_eight-thousanders
But I don’t want to have to wrangle the data myself. Data wrangling usually takes me ten times as long as the analysis itself.
I’m planning to regress cause of death on the demographic variables using logistic regression as a classifier.
submitted by /u/Many-Wasabi9141
Hi all,
I have been trying to find a dataset, but no luck.
I am looking for a way to see game statistics and associate them with the jersey color worn by the players and the goalkeepers. Unfortunately, it seems that almost all databases include only game results and statistics, with no information about the jerseys.
Are you aware of any such dataset? Or can you point me to a website that has the jersey information, which I can then merge with another dataset that includes the statistics?
Thank you all in advance
submitted by /u/stephdaedalus
Is there any dataset that contains all the Facebook groups and subreddits?
submitted by /u/bytesagelabs
Hey everyone,
I’m looking for a dataset with hour-by-hour information about people’s habits. That is, it should have a column such as hour_of_day, with values from 0-23 or 1-24, and the other variables can be things such as TV watching, headphone usage, when someone goes for a walk, etc. (basically 1 or 0).
I am basically looking for a dataset where I can predict when people will do a certain action given the time of day.
Can be synthetic or mock.
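Since a synthetic or mock dataset would do, here is a minimal sketch of how one could be generated with the Python standard library. The activity names, peak hours and probabilities are all made-up assumptions, not estimates from real behaviour.

```python
import csv
import io
import random

# Hypothetical activities mapped to an assumed peak hour; not based on real data.
ACTIVITIES = {"tv_watching": 20, "headphone_usage": 8, "walk": 18}


def make_mock_habits(n_people=5, n_days=3, seed=0):
    """Return one dict per person, day and hour with 0/1 activity flags."""
    rng = random.Random(seed)
    rows = []
    for person in range(n_people):
        for day in range(n_days):
            for hour in range(24):
                row = {"person_id": person, "day": day, "hour_of_day": hour}
                for name, peak in ACTIVITIES.items():
                    # Probability falls off with circular distance from the peak hour.
                    dist = min(abs(hour - peak), 24 - abs(hour - peak))
                    prob = max(0.05, 0.8 - 0.15 * dist)
                    row[name] = int(rng.random() < prob)
                rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = make_mock_habits()
csv_text = to_csv(rows)
```

Each row has hour_of_day in 0-23 plus one binary column per activity, which is the shape described above. A fixed seed keeps the mock data reproducible.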
submitted by /u/162739
Hello, I’m looking for a dataset which holds answers to the questions asked in the political compass test or another similar test.
I’m building a fuzzy associative rules generator which would basically find strong correlations between subsets of the columns of the dataset, e.g.
[Strong Agree] I’d always support my country, whether it was right or wrong. => [Strong Disagree] No one chooses their country of birth, so it’s foolish to be proud of it.
This could be interpreted as: if someone answers [Strong Agree] to the question on the left, they will most likely answer [Strong Disagree] to the question on the right. This might seem fairly obvious for this simple example, but things get interesting quickly once you realize that any subset of columns may belong to the LHS or the RHS.
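To make the idea of rule strength concrete, here is a small sketch that scores one LHS => RHS rule by its support and confidence. It uses plain (crisp) matching rather than fuzzy membership, and the answers and the column names q1 to q3 are made up for illustration.

```python
# Made-up answers for illustration only; q1..q3 are hypothetical question columns.
RESPONSES = [
    {"q1": "Strong Agree", "q2": "Strong Disagree", "q3": "Agree"},
    {"q1": "Strong Agree", "q2": "Strong Disagree", "q3": "Disagree"},
    {"q1": "Strong Agree", "q2": "Disagree", "q3": "Agree"},
    {"q1": "Disagree", "q2": "Agree", "q3": "Agree"},
]


def support(rows, items):
    """Fraction of rows matching every (column, answer) pair in items."""
    hits = sum(all(r.get(c) == a for c, a in items) for r in rows)
    return hits / len(rows)


def confidence(rows, lhs, rhs):
    """Estimate P(rhs | lhs) as support(lhs and rhs) / support(lhs)."""
    s_lhs = support(rows, lhs)
    return support(rows, lhs + rhs) / s_lhs if s_lhs else 0.0


lhs = [("q1", "Strong Agree")]
rhs = [("q2", "Strong Disagree")]
rule_support = support(RESPONSES, lhs + rhs)
rule_confidence = confidence(RESPONSES, lhs, rhs)
```

Because lhs and rhs are lists of (column, answer) pairs, either side can hold any subset of columns. A fuzzy version would replace the exact equality test with a membership degree between 0 and 1.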
submitted by /u/Play4u
Most measures for ecstasy, and probably a few other drugs too, collect information on the total number of users, i.e. users per year, and the number of people who use ecstasy over the course of their lives.
I’m trying to find data on the number of pills (and other delivery system forms of ecstasy) consumed in the USA every year.
For instance, 90% of the total annual users might have consumed only 1 or 2 pills that year, but there are also frequent consumers of the drug. If, say, only 5% of total annual users are considered frequent consumers and take an average of 10 pills a year, then that 5% might account for a considerable share of the total pills consumed, possibly a quarter or more.
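As a rough check of that "quarter or more" figure, here is the arithmetic as a sketch. The 1.5 pills per light user is an assumed midpoint of "1 or 2 pills", and only the two groups named above are counted, so the remaining 5% of users are left out.

```python
# Assumed value: the text only says "1 or 2 pills", so 1.5 is a guessed midpoint.
light_share, light_mean = 0.90, 1.5        # 90% of annual users
frequent_share, frequent_mean = 0.05, 10   # 5% of annual users, 10 pills each

# Pills contributed by each group, per average annual user.
light_pills = light_share * light_mean
frequent_pills = frequent_share * frequent_mean

# Share of all counted pills that is taken by the frequent group.
frequent_fraction = frequent_pills / (light_pills + frequent_pills)
print(f"{frequent_fraction:.0%}")
```

Under these assumptions the frequent group accounts for a bit more than a quarter of the pills counted.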
To look at it in table form, I have given some general guesstimates of pills consumed by users to gauge the potential total number of ecstasy pills consumed per year in the USA.
Statistics reflect about an average of 0.8% of Americans having consumed ecstasy in the past year.
| Number of pills consumed in a year (n) | Group average n | Percentage make up of total annual users | Number of pills consumed (representing total users) | Note |
|---|---|---|---|---|
| 1 | 1 | 60 | 0.6 | |
| 2 | 2 | 20 | 0.4 | Very common to consume 2 pills in one session and may not represent users consuming ecstasy in different sessions |
| 3 – 5 | 4 | 10 | 0.4 | Might do this 3 or so times per year |
| 6 – 20 | 12 | 10 | 0.4 | Once per month |
| 21 – 40 | 30 | 3 | 0.9 | Uses most weekends |

Total number of pills consumed (representing total users): average of 3.1 pills consumed per user
Table summary:
If 0.8% of Americans each consume an average of 3.1 pills per year, then the number of pills is equal to roughly 2.5% of the US population (0.8% × 3.1 ≈ 2.5%).
The total number of ecstasy pills consumed in the USA per year would then be about 8.3 million, based on a country population of 332 million.
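The table summary can be checked with a few lines of arithmetic. This only restates the post's own numbers (0.8% past-year use, 3.1 pills per user, 332 million people); the variable names are mine.

```python
population = 332_000_000   # country population used in the post
past_year_rate = 0.008     # 0.8% of Americans used ecstasy in the past year
pills_per_user = 3.1       # average taken from the guesstimate table

users = population * past_year_rate               # past-year users
total_pills = users * pills_per_user              # pills consumed per year
pills_per_capita = past_year_rate * pills_per_user  # pills per resident

print(f"{users:,.0f} users, {total_pills:,.0f} pills, {pills_per_capita:.2%} of population")
```

Without rounding the intermediate figure up to 2.5%, the total comes out at roughly 8.2 million pills rather than 8.3 million.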
Note:
As I said, my calculations are guesses, but I would imagine statistics would be broken up in a similar way.
submitted by /u/Bishopfruiting
I found one called Caltech 101, but the images are fairly low resolution, and I am wondering whether anybody knows of a similar dataset in higher resolution.
Caltech 101: https://people.cs.umass.edu/~marlin/data.shtml
submitted by /u/The-White-Furry
Hey everyone, I am trying to work on a project. I have three datasets
Dataset 1: Machine voltage varying over a period of time (continuous, 40,000 rows).
Dataset 2: Machine runtime, downtime and faults (also continuous, 8,000 rows).
Dataset 3: Machine degree of fault, an integer between 1 and 3 (not exactly continuous; it records the time the alarm was triggered and the degree of the machine fault). About 2,000 rows.
How would I work with these datasets to do data analysis? I would like to find a relationship between voltage and degree of fault.
The end goal is optimizing the machine to minimize machine downtime. One approach is predictive maintenance/forecasting but other approaches are being considered too.
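One possible starting point, sketched here under assumptions, is to pair each alarm with the voltage readings from a short window just before it, so that degree of fault and recent voltage end up in the same row. The sample values, the seconds-based timestamps and the 120-second window are all hypothetical.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical samples; real data would be loaded from the three datasets.
# Readings are (timestamp in seconds, voltage) and must be sorted by time.
voltage = [(0, 229.8), (60, 230.1), (120, 224.9), (180, 218.3), (240, 229.5)]
faults = [(130, 1), (200, 3)]  # (alarm time, degree of fault 1-3)


def voltage_before(readings, t, window):
    """Mean voltage of readings with timestamps in the interval (t - window, t]."""
    times = [ts for ts, _ in readings]
    hi = bisect_right(times, t)            # index just past the last reading <= t
    lo = bisect_right(times, t - window)   # index of the first reading > t - window
    values = [v for _, v in readings[lo:hi]]
    return mean(values) if values else None


# One row per alarm: degree of fault paired with the recent mean voltage.
paired = [(degree, voltage_before(voltage, t, window=120)) for t, degree in faults]
```

With the rows aligned like this, the paired values can be plotted or fed into a model. The window length is a design choice worth varying, since a fault may follow a voltage change with some delay.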
Edit: Changed flair
submitted by /u/maskedhypocriter
I’d appreciate any info on how and where to download the MBTI9k Dataset. I need it to train classification models that predict personality based on text data within an admissions process.
submitted by /u/airmode_fpv
We propose a brand new dataset for human pose estimation. The dataset comprises 40 subjects, each performing 16 fitness-related actions. If you are interested in it, take a look at the repo!
https://github.com/lyhsieh/SPHP
submitted by /u/fo_hsin_gong_sih
Hey!
I’m looking for a dataset that has student feedback about faculty. I’ve looked at Kaggle and HuggingFace and found some datasets from there.
I wanted to know if there are more places I can check to get more data. Ideally, the dataset should have more than 5k rows.
submitted by /u/shobhitnagpal
Can someone please tell me how to get datasets for mental health? I’ve tried Kaggle and Google datasets, but I can’t find one with 1k records. Please help!
submitted by /u/CedricBorne
Where can I get a dataset of Donald Trump’s voice? Speeches?
submitted by /u/Mobile_Bee_9359
Hi,
Does anyone have a transactional email dataset? When I say transactional, I mean emails such as confirmation links, reset password, 2FA codes, shipping notifications, etc.
Cheers!
submitted by /u/RedWyvv
We have a college project on marketing analytics. Any resources for real-life datasets?
submitted by /u/Snoo_21119
Need everything from title, price, bar code, image links, etc.
Any open source database I can access for this?
submitted by /u/omegal0l420
I am looking for information from the 1950s to now ideally, but 70s and 80s to 2020 would be acceptable for me.
submitted by /u/Redditnesh
I am trying to find simple disability rates by state from 1998 to now. This website basically has the information I want https://www.disabilitystatistics.org/, but there is not a way to download all the data.
I have looked into grabbing it from the American Community Survey (which is where the website gets the information). I know how to pull this ACS table (https://data.census.gov/table/ACSST1Y2021.S1810), but it is so messy and convoluted that I wanted to see whether anyone knew of a centralized location before I jumped into dealing with the ACS data.
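For the ACS route, a rough sketch of pulling one S1810 variable for all states through the Census Data API might look like the following. The variable code S1810_C03_001E (which I take to be "percent with a disability") is an assumption and should be checked against the variable list for each year. As far as I know, the ACS 1-year tables do not reach back to 1998, so earlier years would need a different source.

```python
import json
from urllib.parse import urlencode

# Assumption: S1810_C03_001E is the "percent with a disability" estimate in
# subject table S1810; verify it in the API's variable list for each year.
VARIABLE = "S1810_C03_001E"


def acs_state_url(year, variable=VARIABLE):
    """Build a Census API URL for one ACS 1-year subject-table variable, all states."""
    base = f"https://api.census.gov/data/{year}/acs/acs1/subject"
    query = urlencode({"get": f"NAME,{variable}", "for": "state:*"}, safe=",:*")
    return f"{base}?{query}"


def parse_rows(payload, year):
    """Turn the API's list-of-lists JSON (header row first) into a list of dicts."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row), year=year) for row in rows]


url_2021 = acs_state_url(2021)
```

Fetching each year's URL (for example with urllib.request.urlopen) and passing the response text to parse_rows would give one tidy row per state and year, which can then be concatenated into a single table.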
submitted by /u/jyddyj20
So I’m new to this group. I’ve worked in corporate “big data” and “data warehousing” for almost 20 years. I also own a small business that sells the CMS/NPPES NPI database in most popular database flavors.
My observation is as follows…it seems to me that a good portion of the requests in this group are for a very specific set of data.
I’m not an academic, but how do you end up deciding what you want your thesis to be before figuring out if the data is even available?
Many of the assistance requests are for data whose source is behind a corporate policy which the industry / corporations are not going to let become public.
Another example is medical data, which is protected by HIPAA in the US.
Pardon my wording, but is this a “cart before the horse” situation?
submitted by /u/j_w_g_1
I saw on the London Datastore that they released a dataset called MPS Antisocial. I want to contact them to ask whether they can make a similar one for MPS Violence Offences, or link me to an earlier version of MPS Antisocial. It says it started in July 2022, but it only has data for 2023 and none for 2022 (unless they meant 2023).
submitted by /u/infinity123248
Hi, does anyone know how I could find census (ideally) or other microdata on LGBTQ individuals in Canada? The census survey only has a variable for gender and gender diversity in a couple, but not a direct sexual orientation question, which I guess there should be.
submitted by /u/mm-2412
Hi,
Can someone help with scraping data? Unfortunately I don’t have the skills to do that. I want to create a dataset of US corporations’ expenditures on lobbying, for each available year.
Example: https://www.opensecrets.org/federal-lobbying/clients/summary?id=D000023883
Here are Amazon’s total expenditures on lobbying in 2023. You can type in any other company that participates in lobbying. I guess there are more sources for such data. If someone can help me collect this data, it will be highly appreciated. Thanks!
submitted by /u/Porcoddio45
I want to build an application that has a job experience feature, like LinkedIn. Where can I get an API (or other resource) to at least have the name, logo, and location? I would like primarily tech-oriented companies, but all companies would definitely be better.
submitted by /u/imman2005