Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

BIC (Bank Identifier Code) To Bank Name?!

Hi! I have a dataset of BICs and am building a master data template. The template also wants me to put in each bank’s name. Is there any resource where I can get a table of BIC codes with bank names that I can then use to fill in the name slots via lookups?

I’ve found sites that convert BIC codes, but only one at a time, and I have about 2k entries…

Any help would be appreciated! Thx
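Once a reference table is in hand, the lookup itself is short. The sketch below is only an illustration: the `bic` and `bank_name` column names and the sample rows are made up, and it assumes that an 8-character BIC and its 11-character form ending in `XXX` identify the same institution, and that a branch BIC can fall back to the name of its 8-character parent.

```python
import csv
import io

def load_bic_names(reference_csv):
    """Build a {BIC: bank name} dict from a reference table (file-like object).

    Assumes hypothetical column names 'bic' and 'bank_name'.
    """
    lookup = {}
    for row in csv.DictReader(reference_csv):
        lookup[row["bic"].strip().upper()] = row["bank_name"].strip()
    return lookup

def bank_name_for(bic, lookup):
    """Return the bank name for a BIC, or None if nothing matches."""
    bic = bic.strip().upper()
    candidates = [bic]
    if len(bic) == 11:
        # Branch code present: fall back to the institution-level code.
        candidates += [bic[:8], bic[:8] + "XXX"]
    elif len(bic) == 8:
        # 8-character BIC: also try the explicit primary-office form.
        candidates.append(bic + "XXX")
    for candidate in candidates:
        if candidate in lookup:
            return lookup[candidate]
    return None

# Made-up reference rows, standing in for a real BIC directory export.
reference = io.StringIO(
    "bic,bank_name\n"
    "AAAABBCCXXX,Example Bank\n"
    "ZZZZYY11,Other Bank\n"
)
lookup = load_bic_names(reference)

for code in ["aaaabbcc", "ZZZZYY11123", "QQQQQQQQ"]:
    print(code, "->", bank_name_for(code, lookup))
```

Codes that come back as `None` are the ones left to resolve by hand.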

submitted by /u/Gregib

Recommendations For Extensive Datasets In Process Engineering And Optimization For End-to-End DS/DE Projects

Hi everyone,

I’m a data science researcher focusing on process engineering and optimization, and I’m looking to strengthen my knowledge through different use cases. I’m reaching out for recommendations on very large datasets that can be processed using cloud platforms.

My goal is to create an end-to-end Data Science/Data Engineering project that involves ingesting these large datasets and applying domain knowledge to derive insights. I’m particularly interested in **time series** modeling, which is crucial for capturing temporal trends.

Some areas I’m considering include:

- Oil and gas unit operations datasets
- Carbon Capture, Utilization, and Storage (CCUS) datasets
- FMCG manufacturing datasets, such as edible oil or biomass production
- Water treatment units, especially where time-sensitive data is key

To give you an idea of my background, I’ve worked on modeling and optimization in amine treating, sulfur recovery, and carbon capture datasets. I’ve also successfully developed an anomaly detection model for the Tennessee Eastman process. However, I’m eager to dive deeper into time series modeling for my next project.

Major requirements:

- Focus on time series data
- Can involve classification or regression tasks
- Comparatively large datasets with many columns (variables) and datapoints

I would greatly appreciate any suggestions or pointers to datasets that align with what I mentioned.

Thanks in Advance!

submitted by /u/ryanroy0698

Value Of Historical Freight Transaction Dataset?

Hi all,

Several new partnerships/doors have opened up and allowed my business to aggregate historical (road) freight transactions. They are mostly lane/rate confirmations and include information such as route, $ rate, shippers, carriers, brokers, etc. They are all PDFs, but we’re working on building out a pipeline to start structuring them.

This data is not free for us to collect, so we are debating whether it’s worthwhile to keep collecting it. Are there any businesses or places where this data might be useful?

submitted by /u/Interesting_Law_9138

Help Deciphering Data Sets From NCEI

I am pulling data from NCEI for things like annual average temperature, and the CSV it gives me for the local sites has a weird temperature format I cannot figure out. What in the heck are these numbers, and why is it not in Celsius?

| TMP |
|---|
| -0017,5 |
| -0028,5 |
| -0033,5 |
| -0044,5 |
| -0056,5 |
| -0067,5 |
| -0078,5 |
| -0078,5 |
| -0094,5 |
| -0089,5 |
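If this CSV follows NOAA's Integrated Surface Data layout (an assumption, since the post does not say which NCEI product it came from), a `TMP` field is a signed air temperature in tenths of a degree Celsius, then a comma, then a quality code, with `+9999` marking a missing reading. Under that reading, `-0017,5` would be -1.7 °C with quality code 5. A minimal sketch of the conversion (the function name `parse_tmp` is made up for illustration):

```python
def parse_tmp(field):
    """Split a TMP field such as '-0017,5' into (degrees Celsius, quality code).

    Assumes the ISD convention: value scaled by 10, '+9999' meaning missing.
    """
    raw, quality = field.strip().split(",")
    value = int(raw)
    if value == 9999:
        # Missing-value sentinel: no usable temperature.
        return None, quality
    return value / 10.0, quality

samples = ["-0017,5", "-0028,5", "-0094,5", "+9999,9"]
for field in samples:
    celsius, quality = parse_tmp(field)
    print(field, "->", celsius, "quality", quality)
```

The quality codes are worth checking against the dataset's own format documentation before averaging anything.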

submitted by /u/agonzal7

125k LinkedIn Job Postings From 2024

Hey everyone! I created a dataset of ~125k job postings from LinkedIn with attributes like job title, description, company, compensation, benefits, zip code etc. All the postings are from the United States and over a period of ~1 week, but you can fork the repo and modify it for a specific location/keyword for real-time data.

It was originally intended both to extract some insights about the job market and to help me filter live postings. I published the code to save time for anyone pursuing a similar goal.

Dataset link

Scraper link

submitted by /u/Armi2

Best Way/place To Find Specific Datasets?

Hi All, I’m currently in a bootcamp and need to find an applicable data set for the problem we are trying to solve. I’m having a hard time finding something suitable, so I’m here to ask for some advice. I’m looking for a data set that has sensor data recorded at varying intervals (this part is easy), but the issue is finding one that also contains operational cost data. Any pointers on where or how to find such a dataset would be much appreciated!

submitted by /u/Jeromes-in-the-House

Regression Project For Portfolio, Suggestions Please

Hi guys, I am starting to build my DS portfolio. I already work with DS and ML, but I cannot use my work projects in my portfolio due to NDA. I am having a hard time finding datasets or even coming up with ideas for ML projects such as regression, classification, etc. Do you have any suggestions for datasets or projects? (I didn’t want to use Kaggle datasets because some say companies don’t much like projects done with Kaggle datasets.) Appreciate your help!

submitted by /u/pdrmrtn

Historical Loan-to-value Ratios For USA

Hi!

As part of my thesis, I am conducting an econometric analysis of the housing market in the US.

For this I really need historical LTV data. However, I am having a hard time finding it for a longer time period.

The closest I have come is FRED, which has data back to 2012.

Preferably I would need it back to the year 2000 or earlier.

Any help would be greatly appreciated!

submitted by /u/NielsSm0ker

I’m Looking For Unique Datasets For Multiple Modalities

Hello guys. I’m looking for datasets (free only) for several things (on HF, or just Reddit subs to scrape):

- Labeled music: a dataset with songs and corresponding descriptions, like tempo, key signatures, or just the general mood
- Discussions of super controversial, NSFW, and unethical ideas about everything from conspiracy theories to the meaning of life
- Role-play dialogs, or just general dialogs that are more than texting
- World knowledge Q&As
- Grammarly-like datasets, with bad and good sentences

Thanks.

submitted by /u/yukiarimo

Legally Acquired Footage Of Football Games

Hi!

As part of my thesis I would like to combine AI and football. To achieve this I would need whole-match recordings of some team’s previous season. Maybe someone has recordings of their local team that I could legally use, or knows where I could get such materials (also legally, please). Thanks in advance for any help and suggestions 🙂

submitted by /u/G1b0

Looking For A Dataset With Task Descriptions, Time, And Seniority Levels – Any Suggestions?

Hi everyone,

I’m currently working on a project that requires a specific dataset type, and I’d like someone here to point me in the right direction or offer some advice.

What I need:

- Task descriptions: a list of tasks or activities with explanations.
- Seniority levels: the seniority level (Junior, Mid, Senior) of the person who performed each task.
- Time taken: the actual amount of time it took to complete each task.

Where I’ve looked:

I’ve checked platforms like Kaggle, Google Datasets and some project management tools, but I haven’t found exactly what I’m looking for. I’ve also considered synthetic data generation, but I hope to find a real dataset.

Does anyone know of a dataset that fits this description? If not, any suggestions on where I might find this kind of data? Lastly, if finding a dataset is challenging, do you think web scraping could be a viable option? If so, from where?

Thanks in advance for any help or suggestions!

submitted by /u/Pretend_Cartoonist27

Just Launched: AI-Powered FragranceFinder API 🌸✨

Hi everyone,

I’m excited to share something I’ve been working on—a new AI-powered API called FragranceFinder API! 🎉

For all the data enthusiasts and developers out there, this API allows you to search through thousands of fragrances effortlessly.

Whether you’re building an app, exploring scent data, or just curious about different perfumes, this tool can help you find what you’re looking for.

Here’s what you can do with it:

- Search by name, notes, or brand: Quickly locate specific fragrances or discover new ones.
- Get detailed information: Includes fragrance names, brands, scent notes, and even images. (The image URLs use a prefix of —just add

I’d love to hear your thoughts or feedback! If you have any questions or need help with integration, feel free to ask.

Happy scent hunting!

Best,

submitted by /u/Affectionate-Olive80

Request Your Own Data Sets From UK Supermarket Loyalty Cards

Hi guys, I developed a tool that allows you to request your data from various UK retailers. I thought you would appreciate being able to generate your own retailer data sets from UK retailers like Waitrose, Boots, Tesco, etc.

Full disclosure: I own the site, but I don’t make money off of it, and we won’t share your data with anyone. In fact, we delete all the personal data as soon as we receive it because, to us, it’s all about improving our request process. The more users we make requests for, the better our relationship with the retailer data teams becomes.

supermarketer.co.uk/beta

submitted by /u/SuperMarketerUK

Online Tools For Image Labeling (online Hosted Gradio)

Hi, I need to host a little site so that people from my team could all connect and label the data: more precisely, choose from two shown pictures: first picture, second picture, draw or skip. I have a vague idea of how to do this on my own PC but was wondering if there’s already an online tool for simplifying something like this. If anyone has some tips on the subject, I’d be very thankful!

submitted by /u/speedmotel

Datasets With Physical Exercises, Focused On Involved Muscles.

I’m looking for a dataset of weight lifting exercises with a focus on the muscles involved. I don’t care for gifs, pics or training plans.

I’ve found https://github.com/yuhonas/free-exercise-db – it’s rather limited in terms of muscles involved. I’m aware of exrx.net, which is quite… unfriendly license-wise or paid, although it’s pretty much perfect in terms of content quality. I found a few other sources that were generally worse on both dimensions, often due to a focus on visual content.

submitted by /u/teleoflexuous

Seeking Real-estate Developer Contacts

Hi all,

I’m a retail real estate investor looking to compile a list of small to mid-size retail real estate developers, specifically focused on FL, NY, NJ, TX, and GA. Ideally, I’d like to find developers with contact info like a phone number or email. Does anyone know of good databases, startups, or resources that might help? Any tips on where to look or how to go about finding this information would be greatly appreciated!

Thanks in advance!

submitted by /u/No_Way_1569

Looking For Datasets On Companies That Changed Their Logos During Pride Month

Hi all! I’m playing around with a project on rainbow washing and need a dataset of companies that changed their logos online during Pride Month. It would pretty much be [company name] [yes/no] [year]. I’ve found one, linked below, as an example. I’m curious whether the community knows of other sources. If not, is there a manual way to hunt it down myself? Because Pride Month is over, companies have already reverted their logos on social media, so I can’t tell from their current pages. I’ve tried using the Wayback Machine to check their social media pages during June, but nothing is showing (unless I’m doing something wrong). Thanks! https://dongou.notion.site/1f26ed07c9c84bc69c56447b9d989115?v=d8cb928e5791411cb5b86f39833d0b6d
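One manual route is to ask the Internet Archive's availability endpoint for the snapshot closest to a date in June, rather than browsing the calendar view. The sketch below is a rough illustration: the helper names and the example page are made up, and it will only help where the Wayback Machine actually captured the page, which is often not the case for social media profiles.

```python
import json
import urllib.parse
import urllib.request

API = "https://archive.org/wayback/available"

def availability_url(page_url, timestamp):
    """Build the availability query for a page and a YYYYMMDD timestamp."""
    query = urllib.parse.urlencode({"url": page_url, "timestamp": timestamp})
    return f"{API}?{query}"

def closest_snapshot(page_url, timestamp):
    """Return the URL of the archived snapshot nearest the timestamp, or None."""
    with urllib.request.urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

# Example query string only; calling closest_snapshot() needs network access.
print(availability_url("example.com", "20230615"))
```

Looping `closest_snapshot` over a list of company pages and years would give candidate snapshots to inspect by eye for the [company name] [yes/no] [year] table.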

submitted by /u/silverdrgn