Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Chemical Reaction Datasets Or Websites To Scrape

Hey there, I know it has been asked a couple of times before, but I could not get a good source from them, and besides my request is perhaps simpler.

I am looking for a dataset of chemical reactions, the simplest possible, to construct an interaction graph. For example, from the reaction 2H + O -> H2O, I would construct two edges, (H, H2O) and (O, H2O). Is there a database with a bunch of reactions of any kind that I could use?
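The edge construction described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming reactions arrive as plain strings of the form `A + B -> C`; the function name `reaction_edges` is made up here, and charged species written with a trailing `+` (such as ions) would need a more careful parser.

```python
import re


def reaction_edges(reaction: str) -> list[tuple[str, str]]:
    """Return one (reactant, product) edge per pair in 'A + B -> C + D'."""
    left, right = reaction.split("->")

    def species(side: str) -> list[str]:
        # Drop leading stoichiometric coefficients, e.g. the "2" in "2H".
        terms = [term.strip() for term in side.split("+")]
        return [re.sub(r"^\d+\s*", "", term) for term in terms if term]

    return [(r, p) for r in species(left) for p in species(right)]


edges = reaction_edges("2H + O -> H2O")
print(edges)  # [('H', 'H2O'), ('O', 'H2O')]
```

The resulting list of pairs can be fed to any graph library as an edge list.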

Alternatively, if you know a website whose HTML could be scraped, I could also work with that.

Thanks

submitted by /u/qotsalo

Need Help For Bangladesh’s Car Market Data

My boss asked me to define the target group (TG) for our new products. One is an entry-level sedan, the second is a 5-seat SUV and the last is a full-sized 7-seater SUV. I have to define three TGs for these 3 models. I’ve collected data on the population of Bangladesh by age group and on how many people live in each urban area, and I am trying to relate it to the income level of the population. But it is very hard to quantify how many people can buy our products. Can someone help me with this problem with suggestions or solutions?

submitted by /u/drdoctor98

Need Help With Indian F&O Data Collection From NSE

I have a small question that I hope you can clarify. I’ve been trying to collect F&O daily bhav copies from NSE from 2011 to 2022. I succeeded for 2016 onwards using some Python libraries.

However, a lot of people on the internet including myself have been facing the issue of downloading bhav copies prior to 2016 because the new NSE website is pretty shitty that way (it’s storing the csv file in a zip so the API can’t access the csv directly).
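For the zip-wrapped CSV problem described above, one possible approach is to unpack the archive in memory once its bytes have been downloaded. The sketch below uses only the Python standard library and does not touch the NSE site at all: the archive, its file name and its columns are invented stand-ins, and how the zip bytes are obtained is left open.

```python
import csv
import io
import zipfile


def rows_from_zip_bytes(zip_bytes: bytes) -> list[dict[str, str]]:
    """Read the first CSV member of a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in archive")
        with archive.open(csv_names[0]) as raw:
            # Wrap the binary stream so the csv module receives text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a tiny stand-in archive so the sketch runs offline.
# The file name and columns are hypothetical, not the real bhav copy layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample_bhav.csv", "SYMBOL,CLOSE\nAAA,101.5\nBBB,99.0\n")

rows = rows_from_zip_bytes(buffer.getvalue())
print(rows[0]["SYMBOL"], rows[0]["CLOSE"])
```

In practice `zip_bytes` would be the body of whatever download response returns the archive.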

If you have some time to spare, would you be able to help me out? It’s for a research project I’m working on!

Thank you in advance 🙂

submitted by /u/jingbolosodabama

Dataset Related To Sustainable Development Goals (SDG)

I’m working on a data mining school assignment, with a primary focus on quality education/decent work and economic growth. However, I’m open to exploring datasets related to any other SDGs as well.

I’m looking for two datasets with the following criteria:

At least 10,000 observations and six variables per dataset

They must be mergeable

Must be related to one of the SDGs

I’ve already searched on Kaggle but I haven’t found suitable datasets. If you have any suggestions or if you know of an easier way to filter search results effectively it would be much appreciated.

submitted by /u/Lyn03

Looking For COVID-19 Datasets For Each Country

Hello, I need to find the number of new cases, deaths and recoveries per day in a given country for a project I’m making. Preferably the data should be listed as a spreadsheet from around the day of the outbreak until the 100th day. Any idea where I can find this type of information? I’ve been looking for the past hour, but all I’ve found is just the total number or the newest data (I need it from 2019-2020).
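If the totals that turn up are published as a running (cumulative) count per day, the per-day figures can be recovered by differencing. A minimal sketch, assuming such a daily cumulative series is available; the numbers below are invented for illustration and are not real case counts.

```python
def daily_from_cumulative(totals: list[int]) -> list[int]:
    """Convert running totals into per-day counts (day one keeps its own total)."""
    daily = []
    previous = 0
    for total in totals:
        daily.append(total - previous)
        previous = total
    return daily


# Invented cumulative counts for the first five days of an outbreak.
cumulative_cases = [1, 1, 4, 9, 15]

# Keep at most the first 100 days, as in the request above.
new_cases = daily_from_cumulative(cumulative_cases)[:100]
print(new_cases)  # [1, 0, 3, 5, 6]
```

The same function works for deaths and recoveries if they are reported cumulatively as well.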

submitted by /u/Several-Ad-3048

Seeking Mental Health Data For Research Project

My teammates and I wish to focus on mental health resources and stigma for our Big Data course project, and we could use some help locating data sources. Here’s the rundown:

Project Objective: Our research aims to collect and analyze data related to mental health resources and the evolving stigma around mental health, particularly on social media. We plan to compare trends over the past two decades, both globally and within North America if the global data is limited.

Data Needed: To make this project a reality, we require data on the following:

Availability of mental health facilities across countries.

Information on mental health programs in various nations.

Data regarding non-governmental organizations (NGOs) dedicated to mental health awareness globally.

The percentage of utilization of mental health facilities and programs in each region.

We initially tried to access data from the World Health Organization (WHO) through their Project Atlas Report (https://www.who.int/publications/i/item/9789240036703), but our efforts hit a roadblock. We’ve reached out to WHO, although we’re uncertain if they’ll share this information.

If anyone knows of alternative data sources or has any tips on where to find similar datasets, your input would be incredibly valuable. We’re committed to advancing our understanding of mental health resources and stigma, and your assistance can make a real difference in our research. Thanks in advance! 📈🧠

submitted by /u/hinberry

Noise Pollution In Cities (or Countries?)

I’m curious if there’s any data on noise pollution over time in cities, such as NYC, London, Paris, or perhaps even in entire countries (if that even makes sense).

I’m thinking electric cars are becoming more common, and they’re quieter, so perhaps noise pollution has gone down in recent years?

(To be honest I’d be more interested in a graph than the raw data, but this was the stats/data request sub I found (via askstatistics…) so here I am.)

Thanks!

submitted by /u/cpwnage

LF Approximate Vehicle Maintenance Cost Of Operation Per State (US)

Hello, I’m currently working on a project and we’re trying to offer information to clients about average maintenance costs of ownership of a vehicle. So, basically, if you are planning to buy a vehicle, you would know how much and when you would have certain expenses related to maintaining your vehicle (scheduled services, oil changes, windshield wipers, tire changes, etc).

Ideally, the dataset would show information based on average mileage driven per year, different makes, years, and models, and separate information for each state.

Thank you!

submitted by /u/dumdumbadum

Multiclass Classification Dataset Request

Hello people,

I am looking for a multiclass classification dataset (more than 3 classes) for my data mining project. If you have any leads, please let me know. I have been searching for some time on Kaggle and the UCI repository, but I have not been able to find one. Thanks in advance.

Note: It shouldn’t contain any Text or Image analysis.

submitted by /u/swinging_mood7260

UK Bank Branches Location Dataset / API

Hey, just trying to do some analysis on banks in the UK. However, I’m finding that a lot of available datasets are outdated. Any chance someone could help me find an up-to-date dataset with the location and number of banks? I’m aware that each bank has its own locator API, but I’m unfamiliar with APIs.

Thanks in advance.

submitted by /u/FrostyJozoid

Looking For Two Related Mostly Categorical Datasets For School Work

Dear everyone,

I humbly seek your assistance in my current endeavor. I am tasked with conducting a data analysis as part of my school project. The initial (and, for me, the most challenging) step is to identify two datasets that are interrelated and can be merged. Subsequently, I will proceed with the analytical work, which does not intimidate me. The datasets need not provide instant, magical solutions to one another, but there should be a logical basis for their integration.

The primary dataset should encompass approximately 20 categories, with a predominant emphasis on categorical data. It should be in a format that can be reasonably connected or merged with the second dataset, which should originate from a different data structure or source.

Honestly, after hours of diligent searching, I find myself somewhat disoriented. I would greatly appreciate any insights or suggestions. Initially, we contemplated working with a dataset pertaining to train delays in Poland, aiming to correlate it with weather data based on the date. Unfortunately, the dataset concerning Polish trains contains only 8 columns.

I will be immensely thankful for any guidance or counsel. Thank you!

submitted by /u/M4tel0te

MENA Region Restaurant/food Consumption Data

Hi all,

I’m involved in a food project in the Middle East – specifically Iraq/Kuwait. I was wondering if you know any good market research tools/websites/resources/reports on these countries/Middle East region – specifically related to restaurants/food/franchises/what works vs. what doesn’t etc. Or things like average spend per person etc. The closer it relates to food/restaurants the better.

I’m in the beginning stages of working with franchises and need to do research to assess the market in terms of needs and what could work vs what doesn’t. Having an insight into regional countries would give me something to think about, as there seems to be a real dearth of research/data.

Appreciate any info/tips/help. Thank you.

submitted by /u/RevolutionaryWalk592

Looking For Hours/minutes Of Precipitation Data (not Amount Or % Chance)

For a travel-related project, I need a dataset that contains the total duration of rainfall, as this is the most relevant measure of whether you can spend time outside in a given location. For example: Miami might average 5cm of rainfall, but it may all happen in 30 minutes. Seattle might average only 1cm, but spread out over 8 hours of a day.
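If hourly precipitation amounts can be found for a location, a rough duration figure can be derived from them by counting the hours with measurable rain. The sketch below assumes such hourly readings in millimetres exist; the 0.1 mm threshold and both sample days are invented for illustration, loosely mirroring the Miami and Seattle example above.

```python
def wet_hours(hourly_mm: list[float], threshold_mm: float = 0.1) -> int:
    """Count the hours whose precipitation reaches the threshold."""
    return sum(1 for amount in hourly_mm if amount >= threshold_mm)


# Invented days: a short downpour versus a long drizzle.
downpour_day = [0.0] * 23 + [50.0]        # 50 mm falling within one hour
drizzle_day = [0.0] * 16 + [1.25] * 8     # 10 mm spread over eight hours

print(wet_hours(downpour_day), sum(downpour_day))  # 1 50.0
print(wet_hours(drizzle_day), sum(drizzle_day))    # 8 10.0
```

Hourly resolution cannot separate a 30-minute shower from a full hour of rain, so finer-grained readings would be needed for minute-level durations.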

Has anyone come across a weather dataset that has precipitation duration metrics? Haven’t found anything like this on WeatherSpark, wunderground, etc.

Thanks!

submitted by /u/uberdev

Event-Based, Fine-Grained Tennis Matches Dataset?

Does anyone know of a tennis matches dataset that’s event-based? I mean a dataset listing every event in a tennis match with timestamps, categories, etc.

For instance, I wanted to answer some questions which I think are only answerable with such a dataset, such as:

In the last set of the game, how often does the person who starts win?

Does it matter to start a set? What about a tiebreak?

How often do reversals happen? On best of 3s? On best of 5s?

How beneficial is it to have contested games vs cleanly won games? Does that percentage change when you get to the top when compared to lower ranked players?

Can game duration predict winners?

Out of those players who got injured during a game, how many end up winning? Out of those, how many manage to keep on winning?

I think having something like this would be so valuable for bettors…

I suspect the ATP does have a dataset like this, but I think they do not intend on sharing it.

submitted by /u/Fanaro009

Multilingual Corpus With Text Data And Coordinates

Hi guys!
We have collected a multilingual corpus with text data and coordinates. The dataset is divided into the 123 most populated regions of the world: ~500,000 messages from social media + their coordinates, each in a separate json file according to the region. The dataset is suitable for tasks such as geotagging text data. Use it, share your opinion 🤗
PS we also have a similar dataset with timestamps, let me know if you need it 👾

submitted by /u/robvbar

Does Anyone Know Where I Can Find Price Discrimination Data?

I want a dataset of prices where the price increases for higher consumption bands. One example could be consumption caps during the COVID pandemic panic (on toilet paper, masks or alcohol gel). Another example could be residential energy consumption bands, where heavy spenders are penalized, or where another environmental tax is applied.
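To make the pricing structure concrete, here is a small sketch of an increasing block tariff, where each band prices only the units that fall inside it. The band limits and unit prices are made up for illustration and do not come from any real utility.

```python
def block_tariff_cost(units: float, bands: list[tuple[float, float]]) -> float:
    """Total cost when each (upper_limit, unit_price) band prices only its own units."""
    cost = 0.0
    lower = 0.0
    for upper, price in bands:
        if units <= lower:
            break
        # Charge only the part of consumption that lies inside this band.
        cost += (min(units, upper) - lower) * price
        lower = upper
    return cost


# Hypothetical bands: first 100 units, next 200 units, everything above 300.
bands = [(100, 0.10), (300, 0.15), (float("inf"), 0.25)]

for units in (50, 200, 500):
    total = block_tariff_cost(units, bands)
    # The average price per unit rises with consumption.
    print(units, round(total, 2), round(total / units, 4))
```

A dataset with this structure would show a higher average price per unit for consumers in the upper bands.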

I specifically want transactional data, longitudinal per consumer, but time series panel data with the average price per location that shows this higher price for higher consumption would also be nice.

I can find only limited/paid datasets. Any pointers would be of great help!

submitted by /u/TomSargent