Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Real-World German Customer Service Dataset (Open To Collaboration!)

Hey everyone,

I’m looking for a real-world German customer service dataset for my Master’s thesis. My research focuses on analyzing linguistic patterns in customer interactions to develop a sentiment analysis model to increase quality and personalize the customer service experience. The exact focus of my study depends on the available data—so if you know of any datasets with authentic customer inquiries, support tickets, or service chat logs, tell me about it (I’m also open to collaborations!).

🫱🏽‍🫲🏻 Let’s connect!

submitted by /u/No-String-8114

Searching For The AI4Leprosy Dataset

Hi All

In the paper “Reimagining leprosy elimination with AI analysis of a combination of skin lesion images with demographic and clinical data”, the authors released an open-source image and data bank for leprosy.

In the paper, they link to the dataset as “The DOI for repository can be accessed at: https://doi.org/10.35078/1PSIEL.” This link no longer works.

Can someone help me find this dataset?

Thank you

submitted by /u/txtcl

Want: AP’s Database Of Military DEI Content Flagged For Deletion

War heroes and military firsts are among 26,000 images flagged for removal in Pentagon’s DEI purge

tens of thousands of photos and online posts marked for deletion as the Defense Department works to purge diversity, equity and inclusion content, according to a database obtained by The Associated Press.

The database, which was confirmed by U.S. officials and published by AP, includes more than 26,000 images that have been flagged for removal across every military branch. But the eventual total could be much higher.

WANT.

The story includes a pane with a text search, apparently connected to the whole database, but I haven’t found any way to actually download the dataset, short of scraping the pane in the story itself and automating paging through it (which would be really obnoxious and would probably not work).

submitted by /u/gnurdette

Looking For Datasets On Voice Signal Classification For Disease Recognition

Hi everyone!

I’m an undergraduate student in computer engineering, and I’m starting to work on my thesis. My goal is to perform classification on voice signals to recognize various diseases by fine-tuning an existing model.

I’ve found several datasets for Parkinson’s disease, but I’m looking for datasets covering other conditions like Alzheimer’s, ALS, etc. Ideally, a mixed dataset with multiple diseases would be great, but even single-disease datasets would be really helpful.

Since I’m still a beginner in this field, any additional advice or resources would also be greatly appreciated!

Thanks a lot!

submitted by /u/tsox_

Platforms Or APIs For Data Labeling?

Hey folks, does anyone have a solution for input-output data labeling? I just need a drag & drop or API solution where I upload a dataset, and get it processed/segmented with labels. I wanted to use Scale Rapid, but apparently they closed.

submitted by /u/vohkay

Looking For Multimodal Financial Datasets

I am currently working on a project on multimodal financial sentiment analysis and have been looking for open-source multimodal financial datasets, but I couldn’t find any. Are there any open-source bimodal or trimodal datasets related to financial news? Please recommend any you know of. Thanks

submitted by /u/karthic2811

List Of European Countries With Country Specific Characteristics

Hi,

My small family company sells a product in most European countries. We experienced a significant boom and decided to ride the wave. However, we struggle to understand why some countries outperform others, as we have naturally never investigated that.

Before we employ any external consultants (who are pricey), I decided to run an in-house analysis. Is there an online database with all European countries and characteristics like “GDP per capita”, “English-speaking % of the population” and/or even “Average temperature in the year”? I give these 3 random examples because, from my point of view, I assume I know nothing and therefore don’t want to be biased by any assumptions. I want to have dozens or even hundreds of country-specific inputs so I can let my sales analyst run all the regressions to find any relationships.

Sorry I don’t use a data science language but I hope you understand my question. Would be grateful for any support 🙂

submitted by /u/4681744148

Room Furnishing AI Model CSV Dataset

I am working on a model that helps users design their rooms (e.g. bathrooms, bedrooms, etc.). The model should take the room type, the room dimensions and the furniture in the room, and should predict each fixture’s position in the 2D layout (X-Y coordinates) and which wall it is placed on.

submitted by /u/Alive-Examination819

Looking For Full Dubai Real Estate Transaction Data (2023 & 2024)

I’m looking for the full real estate transaction data for Dubai from the last two years (2023 & 2024).

I know that Dubai Land Department provides open data through two sources:

  1. Dubai Land Department Open Data – provides only the current year’s data but includes a parking field as a string.

  2. Dubai Pulse – provides data from all years but lacks the parking field.

I can easily download the 2025 data from Dubai Land Department, but I want the complete dataset for 2023 and the full 2024 transactions (at least the last 6 months of 2024 so far). I’ve found some partial datasets on GitHub but not the full one.

Has anyone downloaded the complete dataset or at least the last 6 months of 2024? If so, I’d appreciate it if you could share or point me in the right direction. Thanks!

submitted by /u/Competitive_Put_8758

Dataset For Normal Or Clear Skins To Classify Them From Abnormal Ones?

I am trying to build a binary classifier for normal versus abnormal skin. While I can get many images of abnormal skin, I don’t know where to get images of clear or normal skin. I can take some myself, but it won’t be nearly enough to balance the abnormal ones. Is there anywhere I could get images of normal skin, that is, with no abnormalities?

I would need diverse images too, from the face, hands, thighs, feet, between the toes, behind the ears, neck, armpits, basically everywhere. They should also be diverse in age, gender, skin type, and race.

submitted by /u/Damn_thats_hottt

World Development Indicator Dataset From World Bank And IDP/Refugees

Trying to figure out something: does anyone know if IDPs/refugees are included in the stats on employment/unemployment, vulnerable employment, and ag employment in the WDI dataset from the WB?

I’m trying to figure out what happened in Somalia, with its 18m population and over 4m IDPs and refugees. Their ag industry only employs 25% of the workforce (much, much lower than the rest of Africa), vulnerable employment is 45% (also much lower than other African countries, though it usually includes ag employment) and unemployment is 18%. I’m trying to figure out where the IDPs fit in. If you didn’t know there was a conflict there, it would look like the formal employment sector is doing well, but of course it isn’t.

Old reports say 80% of employment is in ag, but that is such an anomaly!

Thanks for any insight.

submitted by /u/nowheresmiddle99

Looking For Realtor Contacts With Active Short Sale Listings (150+ DOM, $500K+)

I’m looking for contact info for realtors with active short sale listings nationwide, specifically properties that have been on the market for 150+ days and are priced at $500K or more. Ideally, I need agent details, MLS IDs, and listing info.

This type of data usually comes from MLS, Zillow, Redfin, or real estate aggregators like PropStream or CoreLogic.

If anyone has access to this or knows where to find it, I’d appreciate the help! Feel free to DM me or drop a comment.

Thanks! 🙌

submitted by /u/Nandhagopalakrishnan

Looking For Datasets On Manufacturing Equipment Faults/failures For ML Project

I’m working on an AI project focused on predicting equipment failures in manufacturing settings. I’m looking to build a machine learning pipeline in PyTorch that can identify patterns leading to failures before they happen. What I’m looking for is:

time series datasets from manufacturing equipment, with labelled failures

preferably real-world data, but high-quality synthetic datasets would also work

open-source or academic datasets that can be used for university projects

I’m interested in any industry. I know companies often keep this data private, but there must be some research datasets or anonymized industrial data available. If anyone is interested in supporting this project, please let me know. I will make sure to anonymise any industrial data I am given.

submitted by /u/mayodoctur

Audio Dataset Of Real Conversations Between Two Or More People (Hopefully With Transcriptions As Well)

All I can find are one-word audio files. So far, I have found Meta’s MMCSG dataset, but it only covers conversations between two people. I’m artificially adding noise to it, but I need more.

(I know I can generate a transcription using Whisper, but it tends to be hit or miss, especially with the large models. I’m not looking to retrain Whisper; I’m working on an entirely different concept.)

submitted by /u/vardonir

Need Help Finding Snapchat DAU Dataset

I came across this Snapchat DAU dataset on Statista, but I can’t afford the subscription needed to access it. Do any of you know how I can access it, or whether I can get it elsewhere? I couldn’t find it on Kaggle, UCI, or any other data source websites. I need it for a time series forecasting project :(

submitted by /u/Relative-Ear-1356

Need Help With Finding Datasets, U.S. Or EU

Hello everyone,

I’m a CS major working on a project for my Advanced Data Structures class. My idea is to develop an app that optimizes routes for emergency responders by analyzing traffic density, 911 calls, and past response routes to recommend the fastest possible paths. Now the issue I have is finding recent datasets for traffic density, emergency response times, and road networks—especially for Boston (but I’d be happy with data from anywhere in the U.S. or Europe). Most datasets I’ve found are either outdated or incomplete.

Does anyone know where I can find:

Live or historical traffic density data

Emergency response datasets

Road network data

Any help would be appreciated, thanks in advance!

submitted by /u/BottleDisastrous

What Real Estate Sales Data Is Already Out There That I’m Overlooking?

In the past, I’ve posted here looking for specific real estate data, but this time I want to flip the question around.

Rather than trying to create my own dataset from scratch, I’m curious to learn what existing data is already out there regarding residential real estate sales that’s either free or inexpensive to access.

I’m especially interested in datasets covering things like:

Sale prices

Time on market

Property details (beds, baths, square footage, etc.)

FSBO (For Sale By Owner) vs. agent-listed transactions

Regional trends

Before I invest the time into building something from the ground up, I’d love to know:
What sources have you found surprisingly useful? What data might already be hiding in plain sight—whether public records, government databases, or other unexpected places?

Thanks so much for any insights!

submitted by /u/Ykohn

C++ Dataset Needed Where A Question Is Given With The Response Code From A Student AND A Teacher

I need a dataset in which each question comes with code written by a student and code written by a teacher in response. I tried to find one on the web but came up with nothing. If having both the student’s and the teacher’s code for the same question is not possible, I would also accept separate datasets, meaning the questions are not the same for both parties. I need this to compare the quality of the code.

Thank you!

submitted by /u/Rotten-Apple420