Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Remote Sensing: High Resolution/UHR Dataset Of Sub-Saharan African Cities W. Ground Truth Labels For Semantic Segmentation

Hi. I’m working on a supervised learning computer vision project to segment green spaces in sub-Saharan African cities. I saw the OpenCities challenge dataset, but it lacks labels for anything but building footprints.

I can’t seem to find any datasets that meet my needs. Ideally this would have around 5 labels (e.g., road, building, vegetation, water, background), but anything you may know of helps. I know there are various ones for cities around the world, but these aren’t useful for my project, unfortunately.

Would really appreciate any help! Can’t find anything on huggingface, google, or kaggle.

submitted by /u/Atticus_ass

Looking For A Dataset Regarding Australian Businesses Affected By Rainfall

I was looking in the energy, solar, and public transport sectors for publicly available datasets and trying to correlate them with the weather and rainfall datasets from the Bureau Of Meteorology (http://www.bom.gov.au/climate/data/index.shtml?bookmark=136&zoom=3&lat=-32.5355&lon=147.74&layers=B00000TFFFFFFFTFFFFFFFFFFFFFFFFFFFFTTT&dp=IDC10002-d).

But the correlation between public transport passenger numbers and rainfall seems to be weak across the few states that I looked at (NSW, ACT, QLD). The same goes for the correlations between rainfall and electricity demand, and between weather and electricity demand, including demand in peak and off-peak times. I’m guessing that’s because people will just turn on their heaters when it’s cold and their ACs when it’s hot. Similarly, with population increases it would make sense that rainfall doesn’t affect public transport much.

So, I was wondering if anyone knew of any public datasets where I can find a reasonably strong relationship between rainfall and revenue or something similar. Maybe retail and restaurants, but they don’t really have their datasets out on display.
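For anyone who wants to reproduce this kind of check, here is a minimal sketch of a Pearson correlation between a rainfall series and a second series. The numbers are made-up placeholders, not BoM or transport data, and the variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only (not real data).
daily_rainfall_mm = [0.0, 2.5, 10.0, 0.0, 30.2, 5.1, 0.0]
daily_passengers = [1200, 1180, 1150, 1210, 1100, 1190, 1205]

r = pearson_r(daily_rainfall_mm, daily_passengers)
print(f"Pearson r = {r:.3f}")
```

A value of r near 0 would match the weak relationship described above. Pearson only captures linear association, so a rank-based measure may be worth trying as well.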

submitted by /u/D3V1RG1NATOR

Check Out The New Global Crypto Currency Price Database!

Dataset Link: https://www.kaggle.com/datasets/lasaljaywardena/global-cryptocurrency-price-database. This dataset covers 7,500+ cryptocurrencies priced against USD, and it gets updated daily. It is an invaluable resource for anyone interested in exploring the world of digital currencies and analyzing their market behavior. It includes not only popular coins such as BTC, ETH, and SOL but also newly released coins.

submitted by /u/Common_Protection667

I Am Trying To Get ANY Open-source Datasets Created By You Guys!

I’ve just launched a website repository where people can share and access free datasets, with the goal of making datasets more accessible. I’m also planning to integrate a donation feature to encourage people to support contributors if they wish. If you have a dataset you’d like to share, please don’t hesitate to reach out—I’m interested, it’s super easy to post/list!

submitted by /u/nobilis_rex_

Datasets For Stats Project – Thinking Something In Sports And Comparing Datasets (ask)

Looking for help on where to start and maybe some ideas on how to chop it up.

My first thought is to compare unique sports to legacy/mainstream sports. (What’s the probability that a UFC fighter is over 6 ft? What does the standard deviation look like for the weigh-in of a jockey versus a shot putter?)

Would love to dive in and find some weirdness. We will start with a null hypothesis but want to highlight funny or unexpected findings.
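As a rough starting point, the two example questions could be sketched like this. All numbers below are made-up placeholders rather than real athlete data, and the "probability" is read here as the share of fighters in a sample who are taller than 6 ft, which is one possible interpretation.

```python
import statistics


def share_above(values, threshold):
    """Empirical fraction of values strictly greater than threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v > threshold) / len(values)


# Placeholder samples for illustration only (not real data).
fighter_heights_in = [68, 70, 73, 75, 66, 72, 74, 69]
jockey_weights_lb = [108, 112, 115, 110, 118]
shot_putter_weights_lb = [260, 285, 300, 275, 310]

# 6 ft = 72 in
p_over_6ft = share_above(fighter_heights_in, 72)
jockey_sd = statistics.stdev(jockey_weights_lb)  # sample standard deviation
shot_putter_sd = statistics.stdev(shot_putter_weights_lb)

print(f"share of fighters over 6 ft: {p_over_6ft:.3f}")
print(f"jockey weight SD: {jockey_sd:.1f} lb, shot putter weight SD: {shot_putter_sd:.1f} lb")
```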

Thanks again for any help.

submitted by /u/toddwetm

Request For Airplane Ticket Price Datasets

I’m a college student attempting to create a flight price predictor with machine learning algorithms.

But I have run into trouble finding datasets of airfare prices that include travel destinations, dates, prices, airlines and countries, preferably international.

Most datasets are behind a paywall, and I can’t find any up-to-date, working tutorials on using web scraping to get that data.

So could you guys help me find those resources? Thank you.

submitted by /u/Sarcasticsalad12

[self-promotion] Looking To Help With Your Data Request!

For a few months now I’ve been working on a data marketplace platform where users can buy, sell, request and subscribe to data/datasets. We have a request feature where users can submit data requests for free with descriptions, fields required, geography scope, budget, etc. Once a request is posted, it gets sent to tons of companies/organizations/data vendors that can potentially fulfill your request.

I personally know how frustrating the data acquisition process can be, so we’re building this to be your one-stop shop for all data-related transactions. You don’t need to waste weeks or months dealing with different vendors/companies through slow emails, and you can request, negotiate and purchase all in one platform.

It’s completely free to post a request btw 🙂

We’ve been seeing some successes so hopefully we can help more and more people get the dataset they need since this subreddit has a dedicated request tag and a lot of them never get answered.

submitted by /u/nobilis_rex_

Getting Dataset With Balance Sheet Of 1000s Of Companies

The features I am looking for are:

A balance sheet has two sections: assets and liabilities.

The asset section has two parts: short-term assets and long-term assets.

The liabilities section also has two parts: short-term liabilities and long-term liabilities.

Short-term assets: cash and cash reserves, cash equivalents, inventories, accounts receivable, securities, etc.

Long-term assets: property, plant and equipment, long-term investments, all intangible assets.

Short-term liabilities: short-term debt, dividends payable, trade accounts payable, customer deposits, current portion of long-term debt.

Long-term liabilities: long-term loans, deferred revenue, deferred compensation, etc.

If there is no such dataset, there must be a website where I can get these details through an API.
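To make the requested layout concrete, here is a minimal sketch of how one company’s record could be structured. The class names, field names and figures are hypothetical illustrations, not the schema of any particular dataset or API.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One side of the balance sheet, split into short-term and long-term items."""
    short_term: dict[str, float] = field(default_factory=dict)
    long_term: dict[str, float] = field(default_factory=dict)

    def total(self) -> float:
        return sum(self.short_term.values()) + sum(self.long_term.values())


@dataclass
class BalanceSheet:
    company: str
    assets: Section = field(default_factory=Section)
    liabilities: Section = field(default_factory=Section)

    def equity(self) -> float:
        # Accounting identity: equity = total assets - total liabilities
        return self.assets.total() - self.liabilities.total()


# Placeholder company and figures for illustration only.
sheet = BalanceSheet(
    company="Example Co (hypothetical)",
    assets=Section(
        short_term={"cash_and_equivalents": 100.0, "inventories": 50.0},
        long_term={"property_plant_equipment": 300.0},
    ),
    liabilities=Section(
        short_term={"short_term_debt": 80.0},
        long_term={"long_term_loans": 120.0},
    ),
)
print(sheet.company, "equity:", sheet.equity())
```

A dataset covering thousands of companies would then be a list of such records, one per company and reporting period.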

Thanks for the help.

submitted by /u/qhelspil

Looking For Time Histories Of The 6D Pose Of An Object In Space (i.e., The 3D Location And Roll/pitch/yaw Orientation Of A Camera Or A Drone)

Real is preferred, synthetic is also fine. This has got to be a pretty common piece of data, but I’m having trouble finding it. The csv from the pose directory of https://github.com/CenekAlbl/drone-tracking-datasets/tree/master/dataset5 would be perfect, but the range of roll/pitch/yaw angles is surprisingly tight. The drone in this example stayed in pretty level flight, and some more variation would be useful for visualization. Thank you!
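Since synthetic data is acceptable, one option is to generate a pose history directly. The sketch below produces a made-up helical flight with deliberately wide roll and pitch swings. The trajectory, amplitudes, units (metres and radians) and CSV column layout are all assumptions for illustration, not the format of the linked repository.

```python
import csv
import io
import math


def synthetic_pose_history(n_steps=200, dt=0.05):
    """Return rows of (t, x, y, z, roll, pitch, yaw) for a made-up helical flight.

    Positions are in metres, angles in radians.
    """
    rows = []
    for i in range(n_steps):
        t = i * dt
        heading = 0.5 * t + math.pi / 2  # tangent direction of the circular path
        rows.append((
            t,
            5.0 * math.cos(0.5 * t),                            # x
            5.0 * math.sin(0.5 * t),                            # y
            2.0 + 0.2 * t,                                      # z, slow climb
            math.radians(30.0) * math.sin(1.3 * t),             # roll, up to 30 degrees
            math.radians(20.0) * math.sin(0.7 * t + 1.0),       # pitch, up to 20 degrees
            math.atan2(math.sin(heading), math.cos(heading)),   # yaw wrapped to [-pi, pi]
        ))
    return rows


def to_csv(rows):
    """Serialize pose rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["t", "x", "y", "z", "roll", "pitch", "yaw"])
    writer.writerows(rows)
    return buf.getvalue()


poses = synthetic_pose_history()
print(to_csv(poses[:3]))
```

Changing the amplitudes and frequencies in `synthetic_pose_history` gives as much angular variation as the visualization needs.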

submitted by /u/TheMeiguoren

Food-101N: Quantifying Thousands Of (Known) Errors [self-promotion]

Hello redditors,

The Food-101N dataset is a computer vision dataset that is a variant of Food-101 with extra images and added label noise. I spent some time using an automated data correction platform to really quantify the amount of noise in this dataset. With over 100k examples, manual inspection isn’t an option.

To my surprise, I didn’t just find noise, I also found outliers, ambiguous examples, and duplicates. It was quite an eye-opener seeing thousands of issues that were not included in the “disclaimer” of added label noise by the authors.

Here’s a quick breakdown of what I found:

27,488 Mislabeled Examples
8,519 Outliers
13,538 Ambiguous Examples
17,510 (Near) Duplicate Examples

If you’d like to read and see a bit more, you can check out the article. There are many visuals that show all of the errors that I wish I could upload here.

* Disclaimer: I am a data scientist for Cleanlab who builds Cleanlab Studio, the automated data correction platform that I used to find these issues.

submitted by /u/cmauck10

Startups/companies That Have Pivoted

Can someone help me find a dataset focused on startups and companies, particularly those associated with Y Combinator, that have pivoted their ideas and products? I am specifically interested in knowing the original idea they pivoted from, the idea they pivoted to, and the year (batch in the case of YC). Also, at the moment I am only interested in tech startups. Thanks for your time.

submitted by /u/swaptr

Tips On Comprehensive Sports Team Datasets?

Does anyone have tips on where I can find openly available and comprehensive sports datasets for a Business Intelligence assignment? Preferably I need similar datasets for two teams, and I need them to have some information that I can analyze and base predictions on.

It can be basically any sport at this point. I have already looked at Kaggle and some other sites, but usually the datasets lack sufficient depth. And it is hard to find similar datasets to compare two teams from the same year, with the same information, etc.

submitted by /u/Bitzer-