Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Datasets For Recommending Music To People And How To Use Them

Hello guys, I’m looking to make a recommender system using a music dataset and I can’t find many of them on the web that could help me. Do you have any suggestions or tips on how to use them?

I want to use a dataset that will enable me to use collaborative filtering. I don’t understand how to put a dataset together from the Million Song Dataset. If anyone would like to help, I’d greatly appreciate it!
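One possible starting point, offered as a rough sketch rather than a definitive recipe: collaborative filtering only needs (user, song, play count) triplets, which is the shape of the Taste Profile Subset that accompanies the Million Song Dataset. The sketch below assumes tab-separated lines in that order; the toy IDs and the function names (`parse_triplets`, `recommend`) are made up for illustration. It scores unheard songs by item-to-item cosine similarity.

```python
from collections import defaultdict
from math import sqrt

# Toy triplets standing in for real listening data; all IDs are invented.
TRIPLETS = [
    ("u1", "songA", 5), ("u1", "songB", 3),
    ("u2", "songA", 4), ("u2", "songB", 2), ("u2", "songC", 1),
    ("u3", "songB", 1), ("u3", "songC", 5),
]

def parse_triplets(lines):
    """Parse 'user<TAB>song<TAB>play_count' lines (assumed file layout)."""
    triplets = []
    for line in lines:
        user, song, plays = line.rstrip("\n").split("\t")
        triplets.append((user, song, int(plays)))
    return triplets

def build_item_vectors(triplets):
    """Map each song to a sparse {user: play_count} vector."""
    items = defaultdict(dict)
    for user, song, plays in triplets:
        items[song][user] = plays
    return items

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(user, triplets, top_n=5):
    """Rank songs the user has not played by similarity to songs they have."""
    items = build_item_vectors(triplets)
    heard = {s: p for u, s, p in triplets if u == user}
    scores = defaultdict(float)
    for candidate, vector in items.items():
        if candidate in heard:
            continue
        for song, plays in heard.items():
            # Weight each similarity by how often the user played the song.
            scores[candidate] += plays * cosine(vector, items[song])
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

if __name__ == "__main__":
    print(recommend("u1", TRIPLETS))
```

With real data you would pass the file's lines through `parse_triplets` first. Pure-Python dicts will be slow at full scale, so a sparse-matrix library is probably the next step once the idea works on a sample.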

submitted by /u/CheapJaguar458

Dataset Of MMLU Results Broken Down By Task

I am primarily looking for results of running the MMLU evaluation on modern large language models. I have been able to find some data here https://github.com/EleutherAI/lm-evaluation-harness/tree/master/results and will be asking them if and when they can provide any additional data.

MMLU may be the most common evaluation run on LLMs recently, but it is very rare for papers to report more than a single final number and I have not been able to find datasets for the evaluations that were run for any major recent LLM papers.

submitted by /u/corey1505

Game Analytics Datasets For Gamer Modeling

Does someone have a dataset for game analytics? I will be doing gamer modeling, so I want datasets that include gamer behavior, like how many coins they collected in one level, the tools they used, online session duration, and more. Any type of mobile game dataset would suffice, but if it pertains to a hypercasual game, it would be great. I have searched for relevant datasets on Kaggle and Google but have been unable to find any suitable options.

submitted by /u/rai_shi

Prices For Used Medium And Heavy Duty Vehicles Data Set?

I’m looking for a data set that aggregates the sale price for used medium and heavy duty vehicles in the U.S.
For example, I’d like a data set of used box trucks with attributes (selling price, year, mileage).
Sites like commercial truck trader display trucks that are actively being sold, but there is no seamless way of aggregating this information into a data set.
I’ve been unable to find a data set that matches these preferences. What are my options?

submitted by /u/freshcarrot7

Plastic Waste In Oceans, Water To Save Marine Life

Hi everyone, I’m wondering if you can help with my project. I’m searching for images of plastic waste in oceans, lakes, and other bodies of water. I found a large dataset, but it has already been used in previous papers, so using it would give me the same results. Besides, I need to avoid fake photos generated by AI.
Here are the websites I have already tried:
Kaggle
Google Dataset Search
If you have any idea where to obtain around 3,000 images of “Plastic Waste in the water”, with simple additional info like the date and location of each image so I can get some understanding of the context, please let me know.
Thanks 🙂

submitted by /u/The_Simpsons_22

Dataset Suggestions For Building Text Summarization Model

Hey everyone,

I want to work on a text summarization project and I’m looking for suggestions on open-source datasets for training and evaluation. Specifically, I need datasets suitable for text summarization tasks, such as scientific papers, or domain-specific documents like legal or healthcare texts.

If you know any good datasets or resources, please share the links or relevant information.

Thank you in advance!

submitted by /u/a_vira1

Challenges Surrounding Data Availability For Area Of Interest

Hi guys,

I’ve been teaching myself data science, specifically geospatial data analysis, and am conducting research for personal purposes on a country in the Global South, but I’ve realised that the country I’m looking into lacks comprehensive datasets, unlike the UK (where I’m based).

Any advice on how I can work around this issue? Are there open source websites with decent datasets available? In the same vein, are there any countries other than the UK with a sizeable collection of complete and up-to-date datasets?

Open to any and all advice.

submitted by /u/InternationalSmile7

Introducing NBA Stats API: Access NBA Season And Playoff Totals, Advanced Statistics, And More!

Hello, fellow data enthusiasts and NBA fans!

I am excited to announce the release of my latest project, the NBA Stats API (version 0.1 Beta). This API provides access to NBA season and playoff player totals, advanced statistics, shot chart data, and more. As an NBA fan and data enthusiast myself, I’ve always had a passion for finding patterns and trends in sports statistics. This API is my contribution to the community, in hopes that it will fuel your own analysis, be it for fantasy leagues, sports journalism, predictive modeling, or simply out of curiosity.

I’ve put many hours of work into this project, ensuring that the data is not only accurate but also easy to access and understand. The API is currently in its Beta version (0.1), and I’m excited to see how it will evolve with your valuable feedback and suggestions. Currently, the advanced statistics are in testing and will be made available very soon.

The complete API documentation is available as a POSTMAN collection at the following link: API Documentation.

I’ve also hosted all the code behind this project on GitHub under the MIT license: NBA Stats GitHub Repository

I am continuously working on improving and expanding the API, and your feedback and suggestions are more than welcome. Feel free to ask any questions, provide suggestions, or even share what you’ve managed to achieve using the API. I’m looking forward to your creations!

I’ve created a small website to start visualizing this data. Check out my favorite chart displaying Total Points vs. Win Shares. All data on this site is fetched from the API.

Thank you for your time and happy data diving!

submitted by /u/NBAStatsAPI

Teacher Turnover At The School-level

I am looking for a public data set that has teacher turnover percentages at the school level (preferably in New York City). Really, any similar metric will do (attrition, leavers, movers, etc.). I found a data set from New York that claims to have this data, but most of it is missing. I know this data has to be available somewhere given the intense rhetoric on high teacher turnover.

Any help is greatly appreciated.

submitted by /u/Modular_Dissaray

Lost A Dataset With Science Fiction Stories, Please Help Me Find It Again!

it was a bunch of .txt files (containing the stories) and two xml?-files (or something) with additional metadata for the stories (title, first published, author, appeared in, rating on goodreads, rating on googlebooks etc etc) and the authors (biography, gender, name, country etc).

i remember i had to dig for it when i downloaded it like two weeks ago (just fried the laptop i saved them on, that’s why i need them again). there were some issues of the magazine Galaxy in it and a bunch of old stories: h.g. wells, asimov, le guin, and so on… i think it had a few hundred elements

if that description sounds familiar to anyone here i’d appreciate it if you could tell me where to get it again 🙂

EDIT: Christ alive, i found it: https://github.com/nschaetti/SFGram-dataset

submitted by /u/DrJotaroBigCockKujo

USA Pedestrian Crossing Light Dataset

Hi all,

I am wondering if anyone knows of a data set for USA pedestrian crosswalk lights (those lights which have a red hand and counter when you should not walk and a white stick figure when you can walk). I only need USA lights; however, all I can find are datasets for China or the UK. Any help appreciated.

submitted by /u/aadiman23

Collecting Data HELP For A Scientific Research Paper

Hi everyone,

Not sure if this is the correct thread, but hoping it is. Long story short, I am trying to compile a database of every Indian politician (I have a list of them by name/party which I’ve imported into Excel). I need to include their date of birth and date of death. Many politicians have Wikipedia pages, so currently I am manually going through each politician, searching them up and then entering their details into the database.

Would there be any faster way to do this? I am doing this for a scientific paper so I need it done asap, but this method seems like it would take forever.
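One way this could be automated, sketched under assumptions: Wikidata stores date of birth as property P569 and date of death as P570, and its public SPARQL endpoint can look people up by their English Wikipedia article title. The helper names below (`build_query`, `parse_bindings`, `fetch_dates`) and the User-Agent string are made up for illustration, and the sketch assumes each name in the spreadsheet matches the article title exactly, which will not hold for every politician.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

def build_query(titles):
    """Build a SPARQL query matching English Wikipedia article titles."""
    quoted = " ".join(
        '"' + t.replace("\\", "\\\\").replace('"', '\\"') + '"@en' for t in titles
    )
    return (
        "SELECT ?title ?birth ?death WHERE {\n"
        "  VALUES ?title { " + quoted + " }\n"
        "  ?article schema:about ?person ;\n"
        "           schema:isPartOf <https://en.wikipedia.org/> ;\n"
        "           schema:name ?title .\n"
        "  OPTIONAL { ?person wdt:P569 ?birth . }\n"  # P569 = date of birth
        "  OPTIONAL { ?person wdt:P570 ?death . }\n"  # P570 = date of death
        "}"
    )

def parse_bindings(payload):
    """Turn the endpoint's JSON result into {title: {'birth', 'death'}}."""
    rows = {}
    for binding in payload["results"]["bindings"]:
        title = binding["title"]["value"]
        rows[title] = {
            # Keep only the YYYY-MM-DD part; empty string when missing.
            "birth": binding.get("birth", {}).get("value", "")[:10],
            "death": binding.get("death", {}).get("value", "")[:10],
        }
    return rows

def fetch_dates(titles):
    """Send one batch of titles to the endpoint (makes a network request)."""
    params = urllib.parse.urlencode({"query": build_query(titles), "format": "json"})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"User-Agent": "politician-dates-sketch/0.1 (hypothetical example)"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))

# Example call (not run here because it needs network access):
# fetch_dates(["Jawaharlal Nehru"])
```

Sending the names in batches of a few dozen and exporting the result to CSV would let it be merged back into the Excel list. Names that return nothing, or people with several recorded birth dates (only the last row is kept here), would still need a manual check.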

submitted by /u/Aggravating_Hope2390