Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

[self-promotion] Looking To Help With Your Data Request!

For the past few months I’ve been working on a data marketplace platform where users can buy, sell, request, and subscribe to data and datasets. We have a request feature where users can submit data requests for free, with descriptions, required fields, geographic scope, budget, etc. Once a request is posted, it gets sent to tons of companies, organizations, and data vendors that can potentially fulfill it.

I personally know how frustrating the data acquisition process can be, so we’re building this to be your one-stop shop for all data-related transactions. Instead of wasting weeks or months dealing with different vendors and companies through slow emails, you can request, negotiate, and purchase all on one platform.

It’s completely free to post a request btw 🙂

We’ve been seeing some successes, so hopefully we can help more and more people get the datasets they need. This subreddit has a dedicated request tag, and a lot of those requests never get answered.

submitted by /u/nobilis_rex_

Getting Dataset With Balance Sheet Of 1000s Of Companies

The features I am looking for are:

A balance sheet has two sections: assets and liabilities.

The assets section has two parts: short-term assets and long-term assets.

The liabilities section also has two parts: short-term liabilities and long-term liabilities.

Short-term assets: cash and cash reserves, cash equivalents, inventories, accounts receivable, securities, etc.

Long-term assets: property, plant and equipment, long-term investments, all intangible assets

Short-term liabilities: short-term debt, dividends payable, trade accounts payable, customer deposits, current portion of long-term debt

Long-term liabilities: long-term loans, deferred revenue, deferred compensation, etc.
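For anyone answering, the layout described above can be written down as a small record type. This is only a sketch of the requested shape: the class name, field names, and the figures for "Example Corp" are all made up for illustration and do not come from any real dataset or API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BalanceSheet:
    """One company's balance sheet, split into the four groups the post lists."""
    company: str
    # Each group maps a line-item name (hypothetical keys) to an amount.
    short_term_assets: Dict[str, float] = field(default_factory=dict)
    long_term_assets: Dict[str, float] = field(default_factory=dict)
    short_term_liabilities: Dict[str, float] = field(default_factory=dict)
    long_term_liabilities: Dict[str, float] = field(default_factory=dict)

    def total_assets(self) -> float:
        # Sum of short-term and long-term asset line items.
        return sum(self.short_term_assets.values()) + sum(self.long_term_assets.values())

    def total_liabilities(self) -> float:
        # Sum of short-term and long-term liability line items.
        return (sum(self.short_term_liabilities.values())
                + sum(self.long_term_liabilities.values()))


# Made-up figures for a fictional company, purely for illustration.
example = BalanceSheet(
    company="Example Corp",
    short_term_assets={"cash_and_equivalents": 120.0, "inventories": 80.0},
    long_term_assets={"property_plant_equipment": 300.0},
    short_term_liabilities={"short_term_debt": 50.0},
    long_term_liabilities={"long_term_loans": 200.0},
)
```

A dataset with one such record per company and reporting period would cover every field in the request.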

If there is no such dataset, there must be a website I can pull these details from through an API.

Thanks for the help.

submitted by /u/qhelspil

Looking For Time Histories Of The 6D Pose Of An Object In Space (ie, The 3D Location And Roll/pitch/yaw Orientation Of A Camera Or A Drone)

Real is preferred, synthetic is also fine. This has got to be a pretty common piece of data, but I’m having trouble finding it. The csv from the pose directory of https://github.com/CenekAlbl/drone-tracking-datasets/tree/master/dataset5 would be perfect, but the range of roll/pitch/yaw angles is surprisingly tight – the drone in this example stayed in pretty level flight, and some more variation would be useful for visualization. Thank you!
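Since synthetic data is acceptable, one fallback is to generate a pose history directly. The sketch below is an assumption-laden toy, not a flight model: the function name, the helix path, the sinusoid frequencies, and the column order (t, x, y, z, roll, pitch, yaw in degrees) are all invented for illustration. The only goal is a wide, controllable range of roll/pitch/yaw angles.

```python
import math
from typing import List, Tuple

# One sample: (t, x, y, z, roll_deg, pitch_deg, yaw_deg)
Pose = Tuple[float, float, float, float, float, float, float]


def synthetic_pose_history(n_samples: int = 200, dt: float = 0.05,
                           max_angle_deg: float = 60.0) -> List[Pose]:
    """Generate a toy 6D pose time history along a rising helix."""
    rows: List[Pose] = []
    for i in range(n_samples):
        t = i * dt
        # Position: circle of radius 5 in x/y while climbing slowly in z.
        x = 5.0 * math.cos(0.5 * t)
        y = 5.0 * math.sin(0.5 * t)
        z = 2.0 + 0.2 * t
        # Roll and pitch: sinusoids bounded by max_angle_deg.
        roll = max_angle_deg * math.sin(1.3 * t)
        pitch = max_angle_deg * math.sin(0.9 * t + 1.0)
        # Yaw: heading along the circle's tangent, wrapped to [-180, 180).
        heading = math.degrees(0.5 * t) + 90.0
        yaw = (heading + 180.0) % 360.0 - 180.0
        rows.append((t, x, y, z, roll, pitch, yaw))
    return rows
```

Each tuple can be written out as one CSV row with the standard `csv` module, and `max_angle_deg` sets how far the attitude swings from level flight.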

submitted by /u/TheMeiguoren

Food-101N: Quantifying Thousands Of (Known) Errors [self-promotion]

Hello redditors,

The Food-101N dataset is a computer vision dataset, a variant of Food-101 with extra images and added label noise. I spent some time using an automated data correction platform to really quantify the amount of noise in this dataset. With over 100k examples, manual inspection isn’t an option.

To my surprise, I didn’t just find noise, I also found outliers, ambiguous examples, and duplicates. It was quite an eye-opener seeing thousands of issues that were not included in the “disclaimer” of added label noise by the authors.

Here’s a quick breakdown of what I found:

– 27,488 Mislabeled Examples
– 8,519 Outliers
– 13,538 Ambiguous Examples
– 17,510 (Near) Duplicate Examples

If you’d like to read and see a bit more, you can check out the article. There are many visuals that show all of the errors that I wish I could upload here.

* Disclaimer: I am a data scientist for Cleanlab who builds Cleanlab Studio, the automated data correction platform that I used to find these issues.

submitted by /u/cmauck10

Startups/companies That Have Pivoted

Can someone help me find a dataset focused on startups and companies, particularly those associated with Y Combinator, that have pivoted their ideas and products? I am specifically interested in knowing the original idea they pivoted from, the idea they pivoted to, and the year (batch in the case of YC). Also, at the moment I am only interested in tech startups. Thanks for your time.

submitted by /u/swaptr

Tips On Comprehensive Sports Team Datasets?

Does anyone have tips on where I can find openly available and comprehensive sports datasets for a Business Intelligence assignment? Preferably I need similar datasets for two teams, and I need them to have some information that I can analyze and make predictions based on.

It can be basically any sport at this point. I have already looked at Kaggle and some other sites, but the datasets usually lack sufficient depth. It is also hard to find similar datasets to compare two teams from the same year, with the same information, etc.

submitted by /u/Bitzer-

Percentage Of Land Covered In Flood Water And Flood Water Volume Estimates By States (USA)

I’m trying to find some data on flooding frequency and averages for land covered by flood water annually by state. I can’t seem to find any information other than specific address searches. Is there a data set available that would give me any of the information below?

– % of land affected by floods annually by state, OR
– amount of land covered in flood water annually by state
– Water volume estimates for floods

I searched everywhere and I can’t seem to find anything that fits the criteria.

submitted by /u/Possibl3DumbQuestion

Seeking Dataset To Train A Mental Health Treatment Chatbot

Hey fellow Redditors,
I hope you’re all doing well. I’m reaching out to this amazing community today with a request for assistance. I am currently working on developing a mental health treatment chatbot like Woebot, and I am in need of a suitable dataset to train it effectively.

To create an effective mental health treatment chatbot, it is essential to have a diverse and comprehensive dataset. This dataset should ideally include a wide range of mental health conditions, symptoms, treatment approaches, and relevant conversations between mental health professionals and patients. By training the chatbot on such a dataset, we can ensure that it is equipped with the knowledge and empathy necessary to provide meaningful support to users.
Therefore, I kindly request the assistance of this community in locating or providing a suitable dataset for training my mental health treatment chatbot. If you have access to any relevant resources or know of any existing datasets that could be utilized for this purpose, I would greatly appreciate your input.
Additionally, if you have any suggestions, advice, or experiences related to developing a mental health treatment chatbot, I would love to hear from you. Your insights could prove invaluable in shaping the direction of this project.

submitted by /u/Amans-r

Map Instances From Wikidata And DBPedia

Is there any way to map entities between Wikidata and DBPedia? There is a method to map property types using SPARQL queries (e.g. date of birth), but is there a way to map instances of classes? Let’s say Michael Jackson: given the URL/ID of Michael Jackson from Wikidata, I need to find the corresponding instance in DBPedia. Can someone help me with this? Please let me know if there is anything ambiguous in the question.
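One possible approach, sketched under stated assumptions: DBpedia resources generally carry `owl:sameAs` links to other datasets, and this sketch assumes those links include Wikidata entity URIs of the form `http://www.wikidata.org/entity/Q…`. The helper names below are hypothetical, and the `format` URL parameter is an assumption about the public endpoint at `https://dbpedia.org/sparql`. The code only builds the query and request URL; it does not call the network.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed to be reachable at this address).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def sameas_query(wikidata_id: str) -> str:
    """Build a SPARQL query for DBpedia resources linked to a Wikidata ID."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT ?dbpedia WHERE {\n"
        f"  ?dbpedia owl:sameAs <http://www.wikidata.org/entity/{wikidata_id}> .\n"
        "}"
    )


def request_url(wikidata_id: str) -> str:
    """Build a GET URL for the query; the 'format' parameter is an assumption."""
    params = {
        "query": sameas_query(wikidata_id),
        "format": "application/sparql-results+json",
    }
    return f"{DBPEDIA_ENDPOINT}?{urlencode(params)}"


# Q2831 is the Wikidata ID for Michael Jackson.
print(sameas_query("Q2831"))
```

Fetching `request_url("Q2831")` with any HTTP client should, if the assumptions hold, return bindings for `?dbpedia` containing the matching DBpedia resource.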

submitted by /u/Designer_Ad_6525

Manufacturing Dataset For Time Series Classification

Hi,

I am looking for a dataset with specific traits:

– Industrial manufacturing domain
– Sensor data (multivariate time series)
– The machine is performing different operations
– Ideally the data is labelled according to those operations so it can be used for time series classification
– Open source (for the purposes of a thesis)

I know there are several repositories with industrial datasets, but I haven’t found one that fits these requirements. Maybe somebody has an idea.

Thank you.

submitted by /u/GetThere2023

Looking For Datasets For Grocery Item Detection

Hi all, I’m trying to build a classification model for grocery items and was wondering if anyone knows where I could get labelled grocery item data? I’ve seen a couple on Kaggle, but they are usually labelled by broad class (e.g. fruit, vegetable, animal meat) rather than by specific item like broccoli, chicken breast, etc.

submitted by /u/Black_God_Ho

Median Income By Zip Code And Year In US

Hey all! Anyone know where I can get median household income data by ZIP code from around 20 years ago? Trying to calculate median household income based on where people lived when they were born (sample is 18-25 years old). Seems like the US Census website only has current information, but I may not be looking in the right spot. Thanks!

submitted by /u/Neurotic-raccoon

Looking For Tracking Data For Rugby Union.

Hello everyone,

I’m hoping you all can help. I am looking for a Rugby tracking data set that shows the XY position of players on the field. I know some more things exist for football, both American and European but I am really struggling to find that information for Rugby.

Anything helps. If you have an idea or know somewhere I should start my search, please let me know.

submitted by /u/abrax55

[self-promotion] 13F, 10-Q, 10-K, And 8-K Reports + OpenFigi Ids Direct To Your Snowflake Instance

Last night Cybersyn added 13Fs and OpenFigi IDs to Snowflake Marketplace.
You can leverage 13Fs to track institutional investors’ securities holdings and OpenFigi IDs (financial instrument global identifiers) to facilitate easier mapping of securities across data sources.
This release builds on the 8-K, 10-K, & 10-Q reports and attached exhibits originally available in Cybersyn SEC Filings.

submitted by /u/aiatco2

Congressional Data, Preferably With Bills Introduced

I’d like a dataset with columns for the name of the bill introduced, date introduced, title, subject, number of co-sponsors, etc.

I want to analyze (in R) congressional action related to Taiwan, so I hope to get a dataset of bills from, say, the last 5-10 congresses and evaluate how many were passed, what share had bipartisan support, and temporal trends.

I’ve researched a couple of options but have run into problems with both:

ProPublicaR Congress API: I have the API working in R, but its functions return lists, and the function it suggests for turning the output into a data frame returns an error: “no method for [function] applied to an object of class list”. I’m also unsure how comprehensive the data is from this source.

GovInfo bulk data: this site has data on congressional bills, but the bills come in individual XML files and I don’t know how to get those into R (and then into a format in which I can analyze the bills as I described above).
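For the XML route, one option is to flatten each bill file into a single row and write a CSV, which R can then load with `read.csv`. The sketch below uses Python's standard library. The sample document and every element and attribute name in it (`bill`, `number`, `introducedDate`, `cosponsor`, `party`) are hypothetical placeholders and not the real GovInfo schema, so the paths would need adjusting to the actual files. The "bipartisan" flag is also a simplification that looks only at cosponsor parties.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one bill file; real GovInfo XML is structured differently.
SAMPLE_XML = """<bill>
  <number>H.R. 0000</number>
  <title>Example Act</title>
  <introducedDate>2023-01-15</introducedDate>
  <cosponsors>
    <cosponsor party="D">Member A</cosponsor>
    <cosponsor party="R">Member B</cosponsor>
  </cosponsors>
</bill>"""


def bill_to_row(xml_text: str) -> dict:
    """Flatten one bill document into a single flat record."""
    root = ET.fromstring(xml_text)
    cosponsors = root.findall("./cosponsors/cosponsor")
    parties = {c.get("party") for c in cosponsors}
    return {
        "number": root.findtext("number", default=""),
        "title": root.findtext("title", default=""),
        "introduced": root.findtext("introducedDate", default=""),
        "n_cosponsors": len(cosponsors),
        # Simplified: True when cosponsors include both a D and an R member.
        "bipartisan": {"D", "R"} <= parties,
    }


def rows_to_csv(rows: list) -> str:
    """Serialize a non-empty list of row dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(rows_to_csv([bill_to_row(SAMPLE_XML)]))
```

Looping `bill_to_row` over every downloaded file and saving the CSV text gives one table with a row per bill, which suits the pass-rate and trend analysis described in the post.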

Thanks!

submitted by /u/Rude_Inside_4089

Is There An API Or Daily Dataset For Large, In-person Event Information?

I’m looking for a way to get up-to-date information about large, in-person events happening today or in the near future (hundreds to thousands of attendees), e.g. concerts, festivals/fairs, conferences, sports, etc.

Ideally, the dataset provides simple information, like the time the event starts & ends, and the location of the event. Events could be global, but would be best if it focused on US and/or English-speaking countries.

submitted by /u/coinclink