Looking for a dataset that contains only pastry recipes. I came across food/recipe datasets that include pastries, but they are intermingled with other foods/cuisines.
submitted by /u/ElectionJealous7922
[link] [comments]
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
Hey all,
I’m looking for this data set and have no idea where to get it from. Those leads don’t have a strong GitHub presence, so scraping it won’t work.
Thank you!
submitted by /u/blkmamba101
[link] [comments]
every time i drive i find myself wondering what kind of data goes into decisions like stoplight vs stop sign, roundabout, etc. Or like how much collective time is wasted due to an accident. as a kid i used to think about how if an accident caused a 30 minute delay for 500 cars, that was collectively 250 hours of waste. never knew what to do with that data, lol. but anyway yeah i’ve always wanted to get access to data like this.
anyone got any other dream data sets? or even just something that’s super inaccessible if it does technically exist
submitted by /u/bhousecjs
[link] [comments]
Hi there, this is my first post, so I’m not sure if this is the right sub for it.
I am working on a weather dataset (taken from Stats Can: https://climate.weather.gc.ca/index_e.html). The dataset has some missing values that I want to fill using another dataset from a similar location. I found two other datasets from similar locations, but they report slightly different numbers (as expected).
I want to figure out whether these differences are significant enough that I should not use these datasets. How do I go about this? Do I use a t-test on each column individually, or ANOVA?
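One possible way to frame the comparison, sketched below: because both stations report on the same dates, the readings are matched pairs, so a paired comparison on the overlapping dates may fit better than an independent t-test or ANOVA. The function name and the sample values are made up for illustration, and the t statistic assumes the daily differences are roughly independent, which weather data often is not. The mean absolute difference is arguably the more practical number for deciding whether a station is close enough to borrow from.

```python
import math
import statistics


def paired_summary(target, candidate):
    """Compare two stations on the dates where both have a reading.

    target and candidate are equal-length lists aligned by date;
    None marks a missing value.
    """
    diffs = [c - t for t, c in zip(target, candidate)
             if t is not None and c is not None]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)   # systematic offset between stations
    sd_diff = statistics.stdev(diffs)    # spread of the day-to-day differences
    # Paired t statistic: mean difference divided by its standard error.
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    mae = statistics.mean(abs(d) for d in diffs)
    return {"n": n, "mean_diff": mean_diff, "mae": mae, "t_stat": t_stat}


# Made-up daily temperatures; the third day is missing in the target station.
target = [1.0, 2.0, None, 4.0, 5.0]
candidate = [1.5, 2.5, 3.0, 4.5, 5.0]
summary = paired_summary(target, candidate)
print(summary)
```

Running this for each candidate station and each column gives a like-for-like basis for choosing between them. A large t statistic with a tiny mean difference can still be acceptable for gap filling, so the size of the offset matters as much as its significance.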
submitted by /u/Nepoleon_bone_apart
[link] [comments]
We are looking for researchers and members of AI development teams who are at least 18 years old with 2+ years in the software development field to take an anonymous survey in support of my research at the University of Maine. The survey may take 20-30 minutes and asks for your viewpoints on the challenges posed by the future development of AI systems in your industry. If you would like to participate, please read the following recruitment page before continuing to the survey. Upon completion of the survey, you can be entered in a raffle for a $25 Amazon gift card.
https://docs.google.com/document/d/1Jsry_aQXIkz5ImF-Xq_QZtYRKX3YsY1_AJwVTSA9fsA/edit
submitted by /u/wildercb
[link] [comments]
I want some dataset recommendations as well as project ideas for EDA and econometrics projects. I want datasets where I can do things like data cleaning, data visualisation and EDA, along with some econometric inference. Please help. Sample project examples would also be welcome.
submitted by /u/indianmanan
[link] [comments]
Hi! I have a dataset of BICs and am filling in a master data template. The template also wants me to put in the bank’s name. Is there any resource where I can get a table of BIC codes with bank names that I can then use to fill in the name slots via lookups?
I’ve found sites that convert BIC codes, but unfortunately only one at a time, and I have about 2k entries…
Any help would be appreciated! Thx
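Once a reference table exists, the lookup step itself could look something like this minimal sketch. The column names, the BICs and the bank names below are all made up for illustration, and the fallback to the first 8 characters assumes the reference table lists institutions by their 8-character code.

```python
import csv
import io

# Hypothetical reference table; these BICs and names are made up.
REFERENCE_CSV = """bic,bank_name
AAAAGB2L,Example Bank One
BBBBDEFF,Example Bank Two
"""

# Hypothetical master data rows that still need a bank name.
MASTER_CSV = """bic
aaaagb2l
BBBBDEFFXXX
CCCCFRPP
"""


def normalise(bic):
    """Trim whitespace and upper-case so lookups are not case-sensitive."""
    return bic.strip().upper()


def fill_bank_names(master_rows, reference_rows):
    """Return the master rows with a bank_name column filled from the reference."""
    lookup = {normalise(r["bic"]): r["bank_name"] for r in reference_rows}
    filled = []
    for row in master_rows:
        bic = normalise(row["bic"])
        # Try the full code first, then the 8-character institution part.
        name = lookup.get(bic) or lookup.get(bic[:8], "")
        filled.append({"bic": bic, "bank_name": name})
    return filled


reference = list(csv.DictReader(io.StringIO(REFERENCE_CSV)))
master = list(csv.DictReader(io.StringIO(MASTER_CSV)))
result = fill_bank_names(master, reference)
for row in result:
    print(row)
```

Rows that come back with an empty name are the ones the reference table does not cover, so they are easy to filter out and check by hand.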
submitted by /u/Gregib
[link] [comments]
Hi everyone,
I’m a data science researcher focusing on process engineering and optimization, and I’m looking to further strengthen my knowledge through different use cases. I’m reaching out for recommendations on extensively large datasets that can be processed using cloud platforms.
My goal is to create an end-to-end Data Science/Data Engineering project that involves ingesting these large datasets and applying domain knowledge to derive insights. I’m particularly interested in **time series** modeling, which is crucial for capturing temporal trends.
Some areas I’m considering include:
- Oil and gas unit operations datasets
- Carbon Capture, Utilization, and Storage (CCUS) datasets
- FMCG manufacturing datasets, such as edible oil or biomass production
- Water treatment units, especially where time-sensitive data is key
To give you an idea of my background, I’ve worked on modeling and optimization in amine treating, sulfur recovery, and carbon capture datasets. I’ve also successfully developed an anomaly detection model for the Tennessee Eastman process. However, I’m eager to dive deeper into time series modeling for my next project.
Major requirements:
- Focus on time series data
- Can involve classification or regression tasks
- Comparatively large datasets with many columns (variables) and data points
I would greatly appreciate any suggestions or pointers to datasets that align with what I mentioned.
Thanks in Advance!
submitted by /u/ryanroy0698
[link] [comments]
Looking for a dataset of airport footprints or bounding area
submitted by /u/Upper_Distance_6882
[link] [comments]
Just curious, want ones I can use or send others without having them need to pay, etc.
submitted by /u/trace186
[link] [comments]
Hi all,
Several new partnerships/doors have opened up and allowed my business to aggregate historical (road) freight transactions. They are mostly lane/rate confirmations, and include information such as route, $ rate, shippers, carriers, brokers, etc. They are all PDFs, but we’re working on building out a pipeline to start structuring them.
This data is not free for us to collect, so we were debating whether or not it’s worthwhile to continue to collect this data. Are there any businesses/places this data might be useful?
submitted by /u/Interesting_Law_9138
[link] [comments]
I am pulling data from NCEI for annual average temperature and similar statistics, and the CSV it gives me for the local sites has a weird temperature format I cannot figure out. What in the heck are these numbers, and why are they not in Celsius?
TMP
-0017,5
-0028,5
-0033,5
-0044,5
-0056,5
-0067,5
-0078,5
-0078,5
-0094,5
-0089,5
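These values match the layout NOAA uses for its Integrated Surface Database (global hourly) files, where TMP holds the air temperature in tenths of a degree Celsius followed by a comma and a quality code, and 9999 marks a missing reading. Assuming the file really is in that layout, a small parser along these lines would turn -0017,5 into -1.7 °C with quality code 5. The function name is made up for this sketch.

```python
def parse_tmp(value):
    """Split an ISD-style TMP field such as '-0017,5'.

    Returns (degrees_celsius, quality_code); the temperature is None
    when the field carries the 9999 missing-value marker.
    """
    raw, quality = value.strip().split(",")
    if raw.lstrip("+-") == "9999":
        return None, quality
    # The stored number is the temperature scaled by 10.
    return int(raw) / 10.0, quality


for field in ["-0017,5", "-0094,5", "+9999,9"]:
    print(field, "->", parse_tmp(field))
```

So the numbers are already Celsius, just scaled by ten and packed together with a flag.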
submitted by /u/agonzal7
[link] [comments]
Hey everyone! I created a dataset of ~125k job postings from LinkedIn with attributes like job title, description, company, compensation, benefits, zip code etc. All the postings are from the United States and over a period of ~1 week, but you can fork the repo and modify it for a specific location/keyword for real-time data.
It was originally intended both to extract some insights about the job market and help me filter live postings. Published the code to save time for anyone pursuing a similar goal.
submitted by /u/Armi2
[link] [comments]
Hi All, I’m currently in a bootcamp and need to find an applicable data set for the problem we are trying to solve. I’m having a hard time finding something suitable, so I’m here to ask for some advice. I’m looking for a data set that has sensor data recorded at varying intervals (this part is easy), but the issue is finding one that also contains operational cost data. Any pointers on where or how to find such a dataset would be very much appreciated!
submitted by /u/Jeromes-in-the-House
[link] [comments]
Hi guys, I am starting to build my DS portfolio. I already work with DS and ML, but I cannot use my job projects in my portfolio due to an NDA. I am having a hard time finding datasets or even coming up with ideas for ML projects such as regression, classification, etc. Do you have any suggestions for datasets or projects? (I didn’t want to use Kaggle datasets because some say companies don’t much like projects done with Kaggle datasets.) I appreciate your help!
submitted by /u/pdrmrtn
[link] [comments]
Hi!
As part of my thesis, I am conducting an econometric analysis of the housing market in the US.
For this I really need historical LTV data. However, I am having a hard time finding it for a longer time period.
The closest I have come is FRED, where they have data back to 2012.
Preferably I would need it back to year 2000 or earlier.
Any help would be greatly appreciated!
submitted by /u/NielsSm0ker
[link] [comments]
Is it even possible to find that?
I mostly just want unemployment, FDI (inflows), GDP, imports and exports
submitted by /u/Default-Name-100
[link] [comments]
Hey, I’m currently working on a project on Alzheimer’s disease. I need an audio dataset for the same. I tried looking for the dataset online, but none of them are readily available. If anyone can help me figure this out, it would be of great help!!
submitted by /u/Strange_Economist710
[link] [comments]
As the title states, I would like to find a website that has data on say how many US employees Ford had from 2000 to 2020. Or Toyota. Or GM. Or Tesla. Etc…
submitted by /u/insidiousfruit
[link] [comments]
Hello guys. I’m looking for datasets (free only) for several things (on HF, or just Reddit subs to scrape):
- Labeled music: a dataset with songs and corresponding descriptions, like tempo, key signatures, or just the general mood
- Discussions of super controversial, NSFW, and unethical ideas about everything from conspiracy theories to the meaning of life
- Role-play dialogs, or just general dialogs, but not just texting
- World knowledge Q&As
- Grammarly-like datasets, with bad and good sentences
Thanks.
submitted by /u/yukiarimo
[link] [comments]
Hi!
As part of my thesis I would like to combine AI and football. To achieve this I would need whole-match recordings of some team’s previous season. Maybe someone has recordings of their local team that I could legally use, or knows where I could get such materials (also legally, pls). Thanks in advance for any help and suggestions 🙂
submitted by /u/G1b0
[link] [comments]
Looking for datasets to fuel your next AI project? DatasetHunt (https://datasethunt.webflow.io/) is your go-to directory for discovering a wide range of open datasets across various domains. Whether you’re a data scientist, researcher, or enthusiast, find and access the data you need quickly and easily.
Would love to hear your thoughts—do you find it useful?
submitted by /u/hasibhaque07
[link] [comments]
Hi everyone,
I’m currently working on a project that requires a specific dataset type, and I’d like someone here to point me in the right direction or offer some advice.
What I need:
- Task descriptions: a list of tasks or activities with explanations.
- Seniority levels: the seniority level (Junior, Mid, Senior) of the person who performed each task.
- Time taken: the actual amount of time it took to complete each task.
Where I’ve looked:
I’ve checked platforms like Kaggle, Google Datasets and some project management tools, but I haven’t found exactly what I’m looking for. I’ve also considered synthetic data generation, but I hope to find a real dataset.
Does anyone know of a dataset that fits this description? If not, any suggestions on where I might find this kind of data? Lastly, if finding a dataset is challenging, do you think web scraping could be a viable option? If so, from where?
Thanks in advance for any help or suggestions!
submitted by /u/Pretend_Cartoonist27
[link] [comments]
Hi everyone,
I’m excited to share something I’ve been working on—a new AI-powered API called FragranceFinder API! 🎉
For all the data enthusiasts and developers out there, this API allows you to search through thousands of fragrances effortlessly.
Whether you’re building an app, exploring scent data, or just curious about different perfumes, this tool can help you find what you’re looking for.
Here’s what you can do with it:
Search by name, notes, or brand: Quickly locate specific fragrances or discover new ones. Get detailed information: Includes fragrance names, brands, scent notes, and even images. (The image URLs use a prefix of —just add
I’d love to hear your thoughts or feedback! If you have any questions or need help with integration, feel free to ask.
Happy scent hunting!
Best,
submitted by /u/Affectionate-Olive80
[link] [comments]
Hi guys, I developed a tool that allows you to request your data from various UK retailers. Thought you guys would appreciate being able to generate your own retailer data sets from UK retailers like Waitrose, Boots, Tesco etc.
Full disclosure: I own the site, but I don’t make money off of it, and we won’t share your data with anyone. In fact, we delete all the personal data as soon as we receive it because, to us, it’s all about improving our request process. And the more users we make requests for, the better our relationship with the retailer data teams will be.
submitted by /u/SuperMarketerUK
[link] [comments]
Hi, I need to host a little site so that people from my team could all connect and label the data: more precisely, choose from two shown pictures: first picture, second picture, draw or skip. I have a vague idea of how to do this on my own PC but was wondering if there’s already an online tool for simplifying something like this. If anyone has some tips on the subject, I’d be very thankful!
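Whatever tool ends up hosting the pictures, the labels themselves can be stored very simply. Here is a rough sketch of one way to record and tally the four choices described above; the names `CHOICES` and `tally` and the sample votes are made up, and it says nothing about how the site itself would be served.

```python
from collections import Counter, defaultdict

# The four answers a labeler can give for a pair of pictures.
CHOICES = {"first", "second", "draw", "skip"}


def tally(votes):
    """Count answers per picture pair, ignoring skips.

    votes is an iterable of (pair_id, annotator, choice) tuples.
    """
    counts = defaultdict(Counter)
    for pair_id, annotator, choice in votes:
        if choice not in CHOICES:
            raise ValueError(f"unknown choice {choice!r} from {annotator}")
        if choice != "skip":
            counts[pair_id][choice] += 1
    return {pair_id: dict(c) for pair_id, c in counts.items()}


# Made-up votes from three team members.
votes = [
    ("pair-1", "ann", "first"),
    ("pair-1", "bob", "first"),
    ("pair-1", "cat", "draw"),
    ("pair-2", "ann", "skip"),
]
result = tally(votes)
print(result)
```

Keeping one row per vote, rather than one row per pair, makes it easy to check agreement between team members later.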
submitted by /u/speedmotel
[link] [comments]
I’m looking for a dataset of weightlifting exercises with a focus on the muscles involved. I don’t care about gifs, pics or training plans.
I’ve found https://github.com/yuhonas/free-exercise-db, but it’s rather limited in terms of muscles involved. I’m aware of exrx.net, which is quite… unfriendly license-wise or paid, although it’s pretty much perfect in terms of content quality. I found a few other sources that were generally worse on both dimensions, often due to a focus on visual content.
submitted by /u/teleoflexuous
[link] [comments]