Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Data MarketPlace, Is It A Good Idea?

I think the current iteration of the data marketplace sucks. You have to know the specific place you want to get your data from. The variety of data sets available also differs a lot from platform to platform. On top of that, it is incredibly difficult for a non-technical person to get their hands on the data: a business user who wants access has to jump through a lot of hoops just to download it. Is it a good idea to start a marketplace that solves all these problems? Has anyone tried to do this before?

submitted by /u/Responsible_Bell_772

Looking For Group Competition Dataset With Varying Team Composition From A Limited Pool Of Individuals

I’m looking for a dataset of sports, game, or video game events in which two teams of multiple players (ideally 5 to 10) face each other, with each team’s lineup being a different combination drawn from a limited pool of players. And of course the final score/outcome of the event.

For example, if 23 players had played 100 games of Counter-Strike together: who is playing, what is each team’s composition (not always the same 5 dudes facing the other same 5 dudes), and what is the result, plus maybe how long it lasted?

All I can find are datasets of teams with fixed or barely varying composition, like the European football dataset, or broad results without individual differentiation of the team members, like League of Legends ranked games datasets.

Doesn’t have to be highly skilled players. It could be the dataset of one’s kid’s football games at recess.

Any idea if such a dataset exists? I’m currently trying to make my own by recording my own practice games, but at the rate of once a week this will take forever.
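For anyone recording games by hand in the meantime, the layout described above could be sketched roughly like this. It is only a minimal illustration: the `Match` record, its field names, and the player names are made up for the example and do not come from any existing dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Match:
    """One event: two lineups drawn from the same player pool, plus the outcome.

    Hypothetical record layout, not an existing dataset schema.
    """
    team_a: FrozenSet[str]
    team_b: FrozenSet[str]
    score_a: int
    score_b: int
    duration_min: Optional[float] = None  # optional: how long the game lasted


def win_rates(matches):
    """Return each player's share of games won (a draw counts as no win)."""
    played = defaultdict(int)
    won = defaultdict(int)
    for m in matches:
        if m.score_a == m.score_b:
            winners = frozenset()
        else:
            winners = m.team_a if m.score_a > m.score_b else m.team_b
        for player in m.team_a | m.team_b:
            played[player] += 1
            if player in winners:
                won[player] += 1
    return {player: won[player] / played[player] for player in played}


# Made-up sample: the same four players, reshuffled between games.
sample_matches = [
    Match(frozenset({"ana", "ben"}), frozenset({"cy", "dee"}), 16, 10, 35.0),
    Match(frozenset({"ana", "cy"}), frozenset({"ben", "dee"}), 8, 16),
]
rates = win_rates(sample_matches)
```

Storing each lineup as a set of player identifiers, rather than a team name, is what keeps the changing compositions visible in the data.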

submitted by /u/Heliantine

Looking For An Incomplete Dataset That Should Be Messy Or Contain Various Data Quality Issues.

Hello, Reddit community,

I’m working on a project that focuses on query-oriented data cleaning with human expert involvement, and I’m in search of a suitable dataset to support this research. The dataset should ideally contain messy or incomplete data.

If you know of any relevant datasets or sources where I can find such data, I would greatly appreciate your assistance. Additionally, if you have any suggestions or insights on where to look for datasets with data quality issues, please feel free to share them.

Thank you in advance for your help and suggestions!

submitted by /u/thelifeofZ080

Looking For Gaza Bomb Locations & Times. Any Data Out There?

I’m looking for a dataset that has geolocation coordinates (e.g., latitude & longitude) for bombs dropped on Gaza, especially in the past few weeks, but older, as well. Ideally, I’d like a column with location and a matched column with date/time, and any other information is gravy.

Any ideas? I’ve been searching online, trying to follow sources back for reports in WaPo, Reuters, Axios, AP, etc., but they all seem to lead to dead ends (e.g., proprietary data not shared online).

submitted by /u/bobbyfiend

Trying To Get Database Of All Homes With A Heat Pump

Hello, I am doing some real estate-related research, and in particular I am trying to understand which types of buildings and locations are most likely to have homes with certain “green” and sustainability-related features, such as energy-efficient appliances. I do not intend for this to be a discussion about the overall sustainability and performance of heat pumps; I am trying to find a way to obtain a database of as many houses with a heat pump as I can, across the US or just within California. The whole US would be great, but I am most interested in California for the moment. This is real estate-related because heat pumps are a hot topic in the eco-friendly home space.

I know there are data sources like the RECS data sets that have stats on heat pump adoption, but these values are only at the census division level. I want to see how heat pump homes are distributed much more locally and granularly, so that I can understand which cities, regions, districts, neighborhoods, climate zones, etc. have higher clusters of heat pumps installed than others. Additionally, I want to understand the types of homes that have heat pumps, so that I can see if there are any trends to take note of.

At first I thought this idea was absurd and the data was unobtainable, but then it was suggested that I take a look at Zillow’s API, which can be used to pull real estate home data that (sometimes) includes the HVAC system of a home. So I am wondering if I could leverage this to get a read on where heat pump households are located within California. I am also wondering if there are other data sources I could use, such as construction permit databases or tax assessor databases, where I could filter results for houses where a permit was taken out for a heat pump installation.

The idea would be to match all these data points to an address, so that I can map out heat pump homes across the state with GIS. Does this sound reasonable? Would anyone here have suggestions on how I could approach this research challenge? Thank you!
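The permit-filtering idea could look something like the sketch below. It assumes a permit export in CSV form; the column names (`address`, `zip`, `description`) and the sample rows are invented for illustration, and a real county or city export will almost certainly use different fields.

```python
import csv
import io
import re
from collections import Counter

# Matches "heat pump", "heat-pump", and "heatpump", case-insensitively.
# Caveat: this also catches heat pump water heaters, which may or may not
# be wanted alongside HVAC heat pumps.
HEAT_PUMP = re.compile(r"\bheat[\s-]?pump\b", re.IGNORECASE)

# Made-up sample rows standing in for a real permit export.
SAMPLE_PERMITS = """address,zip,description
1 Elm St,94110,Install ducted heat pump and air handler
2 Oak Ave,94110,Replace gas furnace
3 Pine Rd,95814,HEAT-PUMP water heater replacement
"""


def heat_pump_permits(csv_file):
    """Return the permit rows whose free-text description mentions a heat pump."""
    return [
        row for row in csv.DictReader(csv_file)
        if HEAT_PUMP.search(row["description"])
    ]


permits = heat_pump_permits(io.StringIO(SAMPLE_PERMITS))
permits_by_zip = Counter(row["zip"] for row in permits)
```

The matched addresses could then be geocoded and joined to a GIS layer, while the per-ZIP counts give a quick first look at clustering before any mapping is done.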

submitted by /u/teledude_22

ML Sentiment Analysis Project For Mental Health Monitoring

Hi, straight to the point: my team and I chose “social media mental health monitoring” as our project idea for a machine learning course. Basically, the plan is to collect data from social media posts, online forums, or surveys, then develop a sentiment analysis model to monitor and identify signs of mental health issues. I think it’s going to be fun, but the first issue we face is the lack of a usable dataset. We looked into it a lot, but most of what we found was papers and sources, and even the datasets we found (barely a handful) were either unavailable or not exactly aligned with how our project should go. Our professor told us she’d prefer a dataset that’s not from Kaggle for some reason, but I’d really appreciate it if someone could link a similar dataset that can be used for this project, be it on Kaggle or not, and point to any project implementation that’s close to what we’re trying to achieve here.
Thank you.

submitted by /u/_-_VIK_-_

[self-promotion] Issue With Dataset Promotion On Kaggle

I have what I think is an interesting and unique dataset of GitHub accounts, but during its entire existence (20 days) it has had only 90 views and not a single download. This is very strange considering that I have a dataset on a similar topic that collected hundreds of downloads in its first week.

I wanted to know if this could be because the dataset may contain user information (information that is visible to all GitHub users) or because I accidentally added 6 tags when only 5 are allowed (I removed one today).

Do you know of any pitfalls in promoting datasets on Kaggle that I should take into account?

submitted by /u/donBarbos

Comprehensive Criminal Sentencing Dataset

I am searching the internet for a comprehensive, case-by-case dataset describing criminal convictions and sentencing. I know that information on guilty convictions is made publicly available, but I’m not sure if a case-by-case dataset has been aggregated in any form and made available to the public. Does anyone know of any existing sources for this information or have any suggestions for aggregating my own dataset from historical criminal/sentencing data?

submitted by /u/eastbay_jae

Can You Help Me Find Datasets For My Final Year Research Project Topic – “Android Malware Detection From User-generated Content – A Comparison Using CNN And NLP”

Can you help me find datasets for my Final Year Research Project topic – “Android Malware Detection from User-generated content – A Comparison using CNN and NLP”? I am planning to use two machine learning techniques, CNN and NLP, for this comparative study. Please help me find datasets that have relevant variables and are suitable for such a comparison.

submitted by /u/Silver_Hour_9963

Datasets On EU City Happiness Or Quality Of Life?

Hey there, people of Reddit! 🌍🏙️

I’m specifically interested in datasets focused on European cities that break down happiness or quality of life indicators. I can’t find anything. If you’ve stumbled upon such data or have any leads, please do share. Your insights could help shed light on some interesting trends! Thanks a million! 📊😄

#EuropeanCities #QualityOfLifeData #DataQuest

submitted by /u/BlueEurope

Do You Have Any Tips On Where I Can Find Data On The Airline Industry? I Need The Revenue Figures Of The Most Important And Largest Airlines In Each Respective Region (North America, Europe, Etc.). The Data From The Last Five Years Would Be Most Preferable.

Do you have any tips on where I can find data on the airline industry? I need the revenue figures of the most important and largest airlines in each respective region (North America, Europe, etc.). The data from the last five years would be most preferable.

submitted by /u/fndkkxnx