Title
submitted by /u/Snoo_72181
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
No affiliation, but I love their dashboards (example: https://www.gridstatus.io/live/caiso) and have seen them mentioned on my local news station.
Looks like they have an API to access their data.
submitted by /u/semicausal
I’m working on a marketplace for selling datasets and decided to discuss the idea with the community here. The goal is to connect ML teams/researchers with the exact datasets that they need. These would be high quality and, as in any other marketplace, quality-controlled via reviews/comments.
Would any of you find a need for this if the selection was robust enough and quality was good? Would you pay for it? Or are you finding what you need mostly free in the public domain? Curious to get your thoughts.
submitted by /u/brequinn89
Anyone have a dataset of English bigrams and/or trigrams extracted from the OpenSubtitles dataset?
Preferably Creative Commons.
So far I’m only aware of this frequency list: https://github.com/hermitdave/FrequencyWords
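If no ready-made list turns up, counting the n-grams yourself is not much code. Below is a minimal sketch, assuming the subtitle text is already available as plain lines. The names `tokenize` and `count_ngrams` are made up for illustration, the tokenizer is deliberately crude, and the three sample lines are invented, not taken from OpenSubtitles.

```python
import re
from collections import Counter


def tokenize(line):
    # Lowercase and keep runs of letters and straight apostrophes.
    # This is a simplistic tokenizer, not a linguistic one.
    return re.findall(r"[a-z']+", line.lower())


def count_ngrams(lines, n):
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        # Slide a window of width n over each line, so an n-gram
        # never spans two separate subtitle lines.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Invented sample lines standing in for real subtitle text.
sample = ["I don't know.", "I don't think so.", "Do you know?"]
bigrams = count_ngrams(sample, 2)
trigrams = count_ngrams(sample, 3)
print(bigrams.most_common(3))
```

On a full corpus you would stream the lines from disk instead of holding them in a list, and `Counter.most_common` gives the frequency-sorted output.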
submitted by /u/CheBiblioteca
I am building a social platform and I want to use AI to predict some of the user’s interests. I imagined that when you post something on the platform, an AI model would tag the post with labels such as “funny”, “politics”, “technology”, “entertainment”, “other”, etc. Now I need a dataset of example posts, each with a tag, e.g. “politics”. Do you know of any datasets that would meet these requirements?
submitted by /u/RokKuz3
Is there any way to get a dataset with a good number of English songs (audio, if possible with lyrics)?
submitted by /u/Sri_Krishna_Kireeti
I want to do a project that lets me use big data technologies (Spark, Scala, etc.) and was told big data is in the realm of 500,000+ rows of data or a certain size (I forget how many MB). Where can I find data sets this big for analysis?
submitted by /u/Nickaroo321
I need customer support chat data for an insurance company. It would be great if someone here could provide that.
submitted by /u/S_m_nt
It must be about sports or something similar. I will use it for my project.
submitted by /u/North-Ad2847
Hey r/datasets! I’m working on a startup and will pay $200+ for datasets from small & medium businesses!
All kinds of datasets related to SMBs are welcome — timesheets, balance sheets, payroll, expenses, etc.
Along with the dataset, please submit 15 questions which can be answered using your dataset. For example: “What was the best selling item in January 2022? Who is the top performing salesperson in this dataset? How many products were purchased in this dataset?”
Please comment if you’re interested — thank you so much in advance.
submitted by /u/mewolove
Is translating data something you have to deal with often? How do you typically solve this? I tried to build something that automates dataset translation, and I’m curious to understand if other folks struggle with this often. Would love to get your thoughts and input on the topic.
What is it: A script that automatically translates any dataset to your language of choice, using the Google Cloud Translation API. The example uses a dataset with dummy customer data, which gets translated from English to German.
Why use it: To create reports and dashboards in multiple languages. The output feeds directly into an embedded BI tool (in the project, I used Luzmo), and the script can be run on any dataset out of the box. With heavier modifications to the script, you could also store the translated data in a database, data warehouse or other destination.
Who it’s for: Software developers, product managers or data engineers who are working on multi-lingual apps, especially for analytical features, dashboards or reports.
How it works: There’s a GitHub repo you can clone, and a tutorial to walk you through the full set-up. Once you have the script up and running, you can run it repeatedly on any dataset, with any language.
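To make the idea concrete, here is a rough sketch of the core loop such a script needs. It is not the code from the repo. `translate_rows` and `translate_fn` are hypothetical names, and the translator is a stand-in dictionary so the example runs offline; in a real set-up `translate_fn` would wrap a call to a translation service such as the Google Cloud Translation API.

```python
import csv
import io


def translate_rows(rows, columns, translate_fn):
    """Return copies of rows with the given text columns translated.

    translate_fn is a hypothetical callable (text -> translated text).
    """
    cache = {}  # translate each distinct string once to limit API calls
    out = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col in columns:
            text = row.get(col, "")
            if text:
                if text not in cache:
                    cache[text] = translate_fn(text)
                new_row[col] = cache[text]
        out.append(new_row)
    return out


# Stand-in translator (English -> German) so the sketch runs offline.
FAKE_DE = {"Customer": "Kunde", "Supplier": "Lieferant"}
calls = []


def fake_translate(text):
    calls.append(text)
    return FAKE_DE.get(text, text)


# Dummy data, invented for this example.
raw = "name,type\nAda,Customer\nBob,Supplier\nCy,Customer\n"
rows = list(csv.DictReader(io.StringIO(raw)))
translated = translate_rows(rows, ["type"], fake_translate)
print(translated)
```

Passing the translator in as a function keeps the row handling separate from the service call, and the cache matters because categorical columns repeat the same few strings many times.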
Would love to get your feedback on whether this is useful, as well as any improvements that could make it better!
submitted by /u/InsightScripter
Folks,
I need help finding a dataset of walking, walk scores, or any other index of walkability for cities across Europe.
Any help or pointers will be much appreciated.
Thank you.
submitted by /u/BBjayjay
Hello!
I’m doing a project on auto-grading handwritten exam papers and so am looking for a dataset to help me with that. I specifically want to do this project for auto-marking GCSE/A level exam papers, but it seems that no dataset with answered papers exists, so I am looking for alternatives. I am new to ML projects, so any advice would be very much appreciated. Thanks!
submitted by /u/cakeandflowers2202
All my searching so far leads me to suspect that this is a dataset that does not exist. There are a bunch of datasets that primarily focus on examples of English-to-ZOL, but the creators always insist on throwing some first-order logic in there as well. I can explain why that’s a problem if anyone is genuinely curious (as opposed to simply wanting to have an argument.)
TL;DR: I need a dataset that makes a point of including examples of English (when a sentence actually allows for it) being translated only into ZOL, no higher-order logic whatsoever.
submitted by /u/evangelos520
I want to play with global elevation data, but I’m not good at parsing special files. Is there a simple text format dataset of global elevation? Something like a CSV of
LONGITUDE, LATITUDE, ELEVATION
0,0,0
It doesn’t have to be super-high resolution. I’ve found a few sources, but I don’t know how to parse an hgt or kml file.
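For what it's worth, an SRTM .hgt tile is simple to read by hand: it is a square grid of big-endian signed 16-bit integers (1201 or 3601 samples per side), stored row by row from north to south, with -32768 marking missing samples, and the file name gives the tile's south-west corner. Below is a minimal sketch of turning one tile into CSV rows. `hgt_to_rows` is a made-up name, and the 3x3 tile is synthetic so the example runs without a download.

```python
import csv
import io
import math
import struct

VOID = -32768  # value SRTM uses for missing samples


def hgt_to_rows(data, sw_lat, sw_lon, step=1):
    """Yield (longitude, latitude, elevation) from the raw bytes of one tile.

    sw_lat and sw_lon are the tile's south-west corner, which the file name
    encodes (for example N37W123 means sw_lat=37, sw_lon=-123).
    step > 1 skips samples to lower the resolution.
    """
    size = math.isqrt(len(data) // 2)  # samples per side
    if size < 2 or size * size * 2 != len(data):
        raise ValueError("not a square grid of 16-bit samples")
    values = struct.unpack(f">{size * size}h", data)  # big-endian int16
    for row in range(0, size, step):      # rows run north to south
        lat = sw_lat + 1 - row / (size - 1)
        for col in range(0, size, step):  # columns run west to east
            elev = values[row * size + col]
            if elev != VOID:
                yield (sw_lon + col / (size - 1), lat, elev)


# Synthetic 3x3 tile standing in for a real file's bytes.
tile = struct.pack(">9h", 1, 2, 3, 4, 5, 6, 7, 8, VOID)
rows = list(hgt_to_rows(tile, sw_lat=0, sw_lon=0))

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["LONGITUDE", "LATITUDE", "ELEVATION"])
writer.writerows(rows)
print(buf.getvalue())
```

For a real tile you would pass in `open(path, "rb").read()` and write to a file instead of a `StringIO`. A large `step` keeps the CSV small, which fits the low-resolution requirement.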
submitted by /u/stable_maple
Hello,
I am looking for a way to get all contributors of the Linux kernel GitHub repo, and then also get all followers of each contributor, preferably in Python.
Unfortunately I have never done anything in this direction. I need this for a course at uni.
Is there any way to do this? If so, which programs, libraries or tutorials can you recommend?
Cheers!
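One possible starting point is the GitHub REST API, which has list endpoints for a repo's contributors (`/repos/{owner}/{repo}/contributors`) and a user's followers (`/users/{username}/followers`). The sketch below uses only the standard library. The function names are made up, and the stub fetcher at the bottom returns invented data so the example runs without network access. Unauthenticated requests are limited to 60 per hour, so real use needs a token. GitHub may also refuse or truncate the contributor list for a repo with a history as large as the kernel's, so check that first.

```python
import json
import urllib.request

API = "https://api.github.com"


def http_get_json(url):
    # Unauthenticated call; add an Authorization header with a token
    # for real use, otherwise the rate limit is hit almost immediately.
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_all(path, get_json=http_get_json, per_page=100):
    """Collect every page of a list endpoint until an empty page comes back."""
    items, page = [], 1
    while True:
        sep = "&" if "?" in path else "?"
        batch = get_json(f"{API}{path}{sep}per_page={per_page}&page={page}")
        if not batch:
            return items
        items.extend(batch)
        page += 1


def contributors_with_followers(owner, repo, get_json=http_get_json):
    """Map each contributor login to the list of their followers' logins."""
    result = {}
    for contributor in fetch_all(f"/repos/{owner}/{repo}/contributors", get_json):
        login = contributor["login"]
        followers = fetch_all(f"/users/{login}/followers", get_json)
        result[login] = [f["login"] for f in followers]
    return result


def fake_get_json(url):
    # Offline stub with invented users, used in place of real HTTP calls.
    page = int(url.rsplit("page=", 1)[1])
    if page > 1:
        return []
    if "/contributors" in url:
        return [{"login": "alice"}, {"login": "bob"}]
    if "/users/alice/followers" in url:
        return [{"login": "carol"}]
    return []


demo = contributors_with_followers("example-owner", "example-repo", fake_get_json)
print(demo)
```

Dropping the third argument makes the same call go to the live API. One follower lookup per contributor adds up quickly, so caching the results to disk between runs is worth considering.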
submitted by /u/ro-oope
Hello, can someone please help me get the code for “Feature Programming for Multivariate Time Series Prediction”? I searched the internet and couldn’t find it.
submitted by /u/Fantastic_Prize7240
I am looking for a dataset of tweets or subreddits that includes location and time information, ideally covering 2010-2020.
submitted by /u/FakeNinshu
I’m working on a project and need some free food recipes. I can’t really buy any data currently, so I was wondering if the data is already out there before I try to scrape it.
submitted by /u/DumShrimp
Hello everyone, I am currently working on a project that involves using R-based statistical analysis to improve precision plant growth and farming in greenhouses. I have generated a data set for a few plants, but it is not very efficient as it is randomly generated. Therefore, I am wondering if there is a real-life data set available for a few plants that includes sensor readings for temperature, humidity, and light intensity. If anybody has accomplished anything similar to this, I would very much appreciate hearing about it.
submitted by /u/Biocandy93
This is a dataset including text from South Africa’s 84-page case submitted to the International Court of Justice accusing Israel of committing genocide against the people of Gaza.
Link to Dataset: https://www.kaggle.com/datasets/samerhijjazi/south-africa-genocide-case-against-israel-2324
Original source: https://www.pbs.org/newshour/world/read-the-full-application-bringing-genocide-charges-against-israel-at-un-top-court
submitted by /u/Embarrassed-Big-5823
I need a dataset with information on place attachment, or on the favourite places of different kinds of people. The dataset should be small to moderately sized.
submitted by /u/StrengthNo3171
Most people can agree that data is the new gold. Companies own a lot of valuable data that their customers, partners, or other companies could use, making money for both sides, so I am surprised there aren’t more data products out there, especially for small-medium businesses.
Curious for the community’s thoughts on the biggest barriers to selling data (I guess both for data companies and for other companies who just want to make extra revenue?).
submitted by /u/kitkat_126
I am looking for a data set that includes state-by-state data on the number of commercial pools and commercial elevators in the United States.
I have tried looking at government data state by state but there are a lot of inconsistencies and some states have no information available. I am looking to complete a project that requires me to look at all of the locations for pools and elevators.
Does anyone know where this data would exist? Any pointers or tips that anyone may have to lead me in the right direction would be greatly appreciated. TYIA!!
submitted by /u/ilovemarketresearch
I wanted to build some AI projects in this domain by employing models like time-series forecasting, computer vision, probably some sort of NLP as well as classic techniques like regression, classification, clustering.
submitted by /u/Snoo_72181
As the title says, I need a data set containing noisy medical images so that I can apply denoising algorithms to them and maybe try new things. I have to tell my project guide which data set I will be using by this Saturday, and I am unable to find one. All the medical image data sets I find online contain only clean images. I want medical image data sets containing noisy images as well as the ground truth. Could someone please help?
submitted by /u/No1_unpredictablenin
Greetings,
I am looking for datasets on salinity, specifically for Bangladesh, as my supervisor instructed me to do so. I have found a few repositories, but they are paid. It would be helpful if I could find some free resources.
TIA
submitted by /u/RealFeature4520
Looking for a smaller sample size, around n=100-1000 or so, w/ a small number of variables, one of which is an ordinal variable. Preference for CSV or Excel files, as well as for government or university data, but that’s not a stringent requirement. I’ve been looking for a few days on Kaggle & the UC Irvine Machine Learning Repository & haven’t had much luck so far.
submitted by /u/GhostGlacier