Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Need A 5 GB+ Structured Labelled Dataset For Machine Learning (regression Or Classification). No Time-series Data. Where Can I Find One?

Hi, I need a labelled, structured dataset larger than 5 GB for building regression or classification models in my big data class. I looked on Kaggle and elsewhere but can’t seem to find one. Most are time-series, image, or text datasets, which I don’t want to use. Where can I find one?

submitted by /u/_aKiRa_26

Are There Datasets About Healthcare For Doing Regression?

Hello there, I’m doing a project about how to solve healthcare prediction problems (like regression or binary classification) with machine learning, specifically tree-based models.

I can only find binary classification problems (like, does this person have cancer or not), but none about predicting a numerical value.

Is there any dataset, preferably educational, related to medicine/healthcare, whose target is numerical? Ideally the relation between features and target is not too simple (not just a linear one), yet with the right tools, like XGBRegressor, I can still make good predictions (that is, not all features are non-informative).

Thanks so much.

submitted by /u/SameItem

UK GDP Time-series Data From 1970 Onwards

I am trying to find annual time series data for current and constant price GDP for the United Kingdom from 1970 to 2019. I am also looking for current price gross investment data for the same time period. Can anyone point me to resources or databases where one can find this data? The ONS website only hosts data from 1997 onwards, which is not particularly helpful.

submitted by /u/sankalpsharmaa

Parasocial Relationships, Maybe Social Media Interaction Anxiety

Hiya, I’ve been trying to find a decent dataset about social media & its effects on folks – whether they develop anxieties while online, or suffer with anxieties & use online as an outlet; anything social media related, inc. gaming. I found loads of literature online about the topic but am finding it difficult to locate a dataset. This is for an end-of-year project on a part-time course, so I’m very rusty with all things ML related after the summer. Tks

submitted by /u/How_thehell7799

NFL Ticket Price Database 2023 Regular Season

Hi! I’m looking for a database that includes ticket prices (highest, lowest, average, etc.) for every NFL game in the regular season. I found data for future games (the SeatGeek API), but I’m missing the first 100 games of the season, as the API only includes games still posted on their website. Help???

submitted by /u/theasummerall

Muscle Distribution DataSet Available?

I am searching for a data set that has workout information, the muscle groups that are hit in each workout, and the percentage each muscle group is hit.

Most datasets I have found have the workout and the muscles that are hit, but not how much a muscle is hit.

I am looking for a list that will say “Squats (40% Quad, 40% Hamstrings, 20% Glute)”

Is there a dataset out there that would have the distribution data I am looking for?

I’ve done some research on this subreddit and through the web and haven’t been able to find anything. Any help will be appreciated.

submitted by /u/DontTouchMyNut9000
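
For anyone who ends up collecting this by hand, here is a minimal sketch of how lines in the format from the post, such as “Squats (40% Quad, 40% Hamstrings, 20% Glute)”, could be turned into structured records. The function names `parse_entry` and `is_complete` and the tolerance value are hypothetical, chosen only for illustration. The format itself is assumed from that single example line.

```python
import re

# Assumed line format, modelled on the example in the post:
#   "<exercise> (<pct>% <muscle>, <pct>% <muscle>, ...)"
ENTRY = re.compile(r"^\s*(?P<exercise>[^()]+?)\s*\((?P<parts>[^()]*)\)\s*$")
PART = re.compile(r"^\s*(?P<pct>\d+(?:\.\d+)?)\s*%\s*(?P<muscle>.+?)\s*$")


def parse_entry(line):
    """Return (exercise, {muscle: percentage}) for one line of text."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    shares = {}
    for part in match.group("parts").split(","):
        part_match = PART.match(part)
        if part_match is None:
            raise ValueError(f"unrecognised muscle share: {part!r}")
        shares[part_match.group("muscle")] = float(part_match.group("pct"))
    return match.group("exercise"), shares


def is_complete(shares, tolerance=0.5):
    """Check that the percentages add up to roughly 100."""
    return abs(sum(shares.values()) - 100.0) <= tolerance


exercise, shares = parse_entry("Squats (40% Quad, 40% Hamstrings, 20% Glute)")
print(exercise, shares, is_complete(shares))
```

Keeping each exercise as a mapping from muscle to percentage makes it easy to check that the shares sum to 100 before adding a row to a dataset.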

Trending Recipes / Food In Real Time

I am trying to find a way to get the names of foods that are trending on social media right now.

On Google I found articles, but I am not sure whether they are up to date.

One of my ideas is to scrape r/foodporn, but it’s not only about trending food.

Which subs or websites provide this type of data and update it frequently?

Or how can I generate my own dataset?

submitted by /u/Universe-89
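
One possible way to build such a dataset yourself, sketched under loud assumptions: post titles have already been collected from some source and split into a recent batch and an older batch, and you supply your own list of food terms. The term list, the function names, and the scoring rule (recent mentions divided by older mentions plus one) are all hypothetical illustrations, not an established trend metric.

```python
import re
from collections import Counter

# Hypothetical vocabulary; in practice this would come from a recipe list.
FOOD_TERMS = {"ramen", "birria", "tiramisu", "sourdough"}


def count_mentions(titles, terms):
    """Count how many titles mention each term (at most once per title)."""
    counts = Counter()
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        counts.update(words & terms)
    return counts


def trending(recent_titles, older_titles, terms, top_n=3):
    """Rank terms by recent mentions relative to older mentions."""
    recent = count_mentions(recent_titles, terms)
    older = count_mentions(older_titles, terms)
    # The +1 avoids division by zero for terms unseen in the older batch.
    scores = {term: recent[term] / (older[term] + 1) for term in recent}
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]


recent = ["Homemade birria tacos", "Birria ramen!", "My first sourdough"]
older = ["Sourdough loaf", "sourdough again", "ramen night"]
print(trending(recent, older, FOOD_TERMS))
```

Comparing a recent window against an older one is what separates "trending" from "always popular"; raw counts alone would keep ranking the same staples.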

Economic Activity Data At The Census Tract Level?

I am looking for a data set that can show the economic activity of census tracts. I’ve found a good one by zip code, but because the rest of my datasets are gonna be by census tract, I need census tract data for this as well and haven’t been able to find it. Literally any suggestions are welcome cause I cannot find anything. Thank you so much!

submitted by /u/E6E6FA_FFB6C1

Anyone Looking/requesting For Some Datasets? Trying To See If I Can Help! [SELF-PROMOTION]

There are tons of dataset requests in this subreddit that just go unfulfilled – I built a tool, as part of my data marketplace project, that connects your data requests with people, organizations, or companies that will be able to fulfill them. No need for you to do the searching. I realized there really isn’t a single place where you can just drop your request and have people come to you, so hopefully this helps some people out there. It’s called sellagen.com, so please let me know if you have any questions or feedback so I can improve on it!

Disclaimer: I built and own this platform

submitted by /u/nobilis_rex_

How To Extract The Inc 5000 List (2023) Into Excel?

Hi there, I have seen a few questions on past years’ lists and Excel sheets, but I couldn’t get the R code to work for the 2023 set. I’m not sure if it’s because I do not have the correct link format or what.
Here is the website I am taking the data from: https://www.inc.com/inc5000/2023

This is the Reddit post I tried to follow on R: https://www.reddit.com/r/datasets/comments/wr3vyz/trying_to_extract_inc_5000_2022_list_to_excel/
More specifically I followed this code: https://gist.github.com/MattSandy/14242b5af9dce69102647e2000848bcc

When I tried to follow the above code, I just substituted 2023 for 2022 and crossed my fingers, which did not work. I can post my R error messages or the exact code I wrote if that is helpful.

submitted by /u/Character-Forever382

Radiation Spread During An Oil Tanker Explosion

Hey y’all! I’ve got a uni project to determine high-risk zones for the scenario of an oil rig or tanker exploding in a specific area. I would like to know if there’s any dataset available that gives a radiation value (e.g. in sieverts) corresponding to some distance from, or intensity of, an explosion.

(It doesn’t have to be specific to the problem; I just need one that has a set of radiation values.)

submitted by /u/qvuuh

[self-promotion] Git Version Controlled Datasets In S3

Ever wanted to use Git to version control datasets or large files, but GitHub LFS turned out to be too expensive, and now you have a bunch of hacky scripts that use S3 for storage but give you no version control?

We’re here to help you with that. You can use your own S3 buckets or our Free LFS Storage with GitHub.

Try out: https://underhive.in (please use on Desktop, the mobile version is broken right now)

Dashboard Screenshot: https://i.imgur.com/eYwGGjw.png

submitted by /u/kaisoma