Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

[URGENT] Seeking Point Of Sale (POS) Or Sales Data For Academic Capstone Project (Authorized By IIT Madras)

Hi everyone,

I’m currently working on a business analytics project as part of my academic work at IIT Madras, and I’m seeking access to Point of Sale (POS) data or any related sales/transactional datasets from any business.

Purpose: The data will be used strictly for educational and analytical purposes to explore trends, build predictive models, and derive business insights.

What I’m looking for:

->POS data (product ID, timestamp, quantity, price, etc.)

->Inventory or stock movement records

->Sales by region, time, or category

If you or your organization is willing to help, or if you can point me in the right direction, I’d be incredibly grateful! I’m also open to signing NDAs or any data use agreements as needed.

Any suggestions are also welcome.
Thank you!

submitted by /u/midhunreddy

Looking For High Quality Datasets Of Plastic Litter On Ground And Water

Hello everyone,

I’m a third-year undergrad student pursuing a degree in Artificial Intelligence and Machine Learning. For my Deep Learning course project, I’m planning to build a model that detects plastic litter both on the ground and in water.

I’m specifically looking for dataset suggestions — preferably satellite or aerial imagery datasets — that could help with training and testing such a model.

If you know of any publicly available datasets, research projects, or organizations that might share relevant data, I’d greatly appreciate your recommendations.

Thanks in advance!

submitted by /u/CartographerOk858

API For Historical US Stock Prices & Financial Statements: Feedback Welcome

Hey everyone,

I put together an API to make it easier to get historical OHLCV stock prices and full financial statements (income, balance sheet, cash flow) without scraping or manual downloads.

The API:

  • Returns quarterly reports in JSON format
  • Provides complete price history for any US stock
  • Is accessible via RapidAPI for easy integration

Could you give me some feedback on:

  • Any missing data fields
  • How easy it is to integrate into Python/JS workflows
  • Other endpoints you’d want added

Here is the link: https://rapidapi.com/vincentbourgeois33/api/macrotrends-finance1

Thanks for checking it out!

submitted by /u/gozunoob

Learning Data Science — I’ll Clean & Analyze Your Messy CSV/Excel Data For Free

Hey everyone,
I’m learning data science and want to build my skills by working on real-world data. If you have any messy datasets (CSV, Excel, Google Sheets) that need:

  • Cleaning (removing duplicates, handling missing values, etc.)
  • Structuring
  • Basic analysis or summary
  • Visualizations (charts, graphs)

…I’d be happy to do it completely free — no catch.

You get clean data and maybe some cool insights. I get practice and accountability.
Drop me a message or comment below if you’re interested — I’ll handle only a couple of small projects each week to give proper focus.

Thanks!

submitted by /u/Comfortable_Gene_269

Looking For A Student Sentiment Analysis Dataset (Mental Health / Education Feedback)

Hi everyone!
I’m working on a final year project related to sentiment analysis on students, aiming to explore aspects like mental health, teacher behavior, course feedback, class schedules, and academic stress.

I’m looking for a dataset that contains:

  • Student responses or posts (can be survey-based, forum discussions, or open-ended feedback)
  • Labeled sentiments (positive/neutral/negative) or at least raw text suitable for labeling
  • Data like year of study, age range, CGPA, course/subject, etc.

Does anyone know of such a dataset or where I might find something similar (publicly available or open for research use)? Any help or direction is greatly appreciated!

Thanks in advance!

submitted by /u/Particular_Meat_2304

Where To Find Super Rare Diseases Dataset

For example, take Fusariosis (Fusarium infections) or Candida auris infection. I want to train my model on these diseases for a research paper, but I haven't found a good dataset so far. If anyone can help, thanks.
If not, I will just augment the images I have (increase the saturation, rotate them, add noise, and so on) and train on those.
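
A minimal sketch of the fallback augmentations mentioned in this post (saturation change, rotation, added noise), using only the Python standard library. The nested-list image representation, the function names, and the parameter values are all assumptions for illustration; in practice an image library would normally do this work.

```python
import colorsys
import random

# Assumption: an image is a list of rows, each row a list of (r, g, b)
# tuples with integer channels in 0..255.

def adjust_saturation(image, factor):
    """Scale the HSV saturation of every pixel by `factor` (clamped to 0..1)."""
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            s = min(1.0, max(0.0, s * factor))
            nr, ng, nb = colorsys.hsv_to_rgb(h, s, v)
            new_row.append((round(nr * 255), round(ng * 255), round(nb * 255)))
        out.append(new_row)
    return out

def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to each channel, clamped to 0..255."""
    def clamp(x):
        return max(0, min(255, round(x)))
    return [
        [tuple(clamp(c + rng.gauss(0, sigma)) for c in px) for px in row]
        for row in image
    ]

def augment(image, seed=0):
    """Return three augmented variants of one image (hypothetical settings)."""
    rng = random.Random(seed)
    return [
        adjust_saturation(image, 1.5),
        rotate_90_clockwise(image),
        add_gaussian_noise(image, sigma=8.0, rng=rng),
    ]
```

Augmentation only produces variations of images that already exist, so it stretches a small dataset without adding new cases.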

submitted by /u/Dapper_Owl_361

Where Do You Find Real Messy Datasets For Portfolio Projects That Aren’t Titanic Or Iris?

I swear if I see one more portfolio project analyzing Titanic survival rates, I’m going to start rooting for the iceberg.

In actual work, 80% of the job is cleaning messy, inconsistent, incomplete data. But every public dataset I find seems to be already scrubbed within an inch of its life. Missing values? Weird formats? Duplicate entries?

I want datasets that force me to:
– Untangle inconsistent date formats
– Deal with text fields full of typos
– Handle missing data in a way that actually matters for the outcome
– Merge disparate sources that almost match but not quite
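
To make the first two items concrete, here is a small sketch of what that cleaning can look like with only the Python standard library. The list of candidate date formats, the function names, and the similarity cutoff are assumptions chosen for illustration, not a recommendation for any particular dataset.

```python
from datetime import datetime
import difflib

# Assumed formats, tried in order. Ambiguous strings such as "05/03/2024"
# are read day-first here; that choice has to be checked against the source.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y", "%d %b %Y", "%B %d, %Y")

def parse_messy_date(text):
    """Return a date for the first format that matches, or None."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None  # leave unparseable values visible instead of guessing

def normalize_category(value, known_values, cutoff=0.8):
    """Map a possibly misspelled value to the closest known value, if any."""
    matches = difflib.get_close_matches(value.strip(), known_values, n=1, cutoff=cutoff)
    return matches[0] if matches else value
```

Returning `None` for dates that match no format keeps the failures countable, which is usually more useful than silently coercing them.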

My problem is, most companies won’t share their raw internal data for obvious reasons, scraping can get into legal gray areas, and public APIs are often rate-limited or return squeaky clean data.

Finding data sources is about as hard as interpreting the data. I’ve been using beyz to practice explaining my data cleaning decisions, but it’s not as compelling without a genuinely messy dataset to showcase.

So where are you all finding realistic, sector-specific, gloriously imperfect datasets? Bonus points if they reflect actual business problems and can be tackled in under a few weeks.

submitted by /u/Various_Candidate325

Help Finding/making Dataset For Car Sales

I’m doing a history project on British cars, and I need datasets regarding car sales in Britain going back to at least the 50s, on cars like the Mini, Rolls Royces and Aston Martins. I’ve poked around a bit already, but I can’t find anything that goes back far enough. I want to be able to reference the data sets to see how various forms of advertising (like TV commercials or celebrity endorsement) affected car sales. Would love some help putting all this together!

submitted by /u/Mundane_Purchase_337

[R] VQG Dataset Query: Generating Questions For Geometric Shapes

I have to build a VQG model that takes an image containing one or more geometric shapes and generates questions such as "How many types of shapes are there?", "Which is the biggest shape?", "What color is the square?", and so on.

I already have the images; the questions are what's left. I was thinking of annotating each image with attributes like shape type, color, and size, and then using those annotations in scripts that fill question templates such as "What is the (shape_name) color?".

What would you suggest annotating, and how would you generate the questions? Thanks
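
One way the annotate-then-template idea could look, as a rough sketch. The annotation schema (a dict per shape with `shape`, `color`, and `area` keys) and the function name are hypothetical; the question wordings follow the examples in the post.

```python
from collections import Counter

def generate_qa(shapes):
    """Build (question, answer) pairs from per-shape annotations.

    `shapes` is assumed to be a list of dicts such as
    {"shape": "square", "color": "red", "area": 400}.
    """
    if not shapes:
        return []
    qa = []
    type_counts = Counter(s["shape"] for s in shapes)
    qa.append(("How many types of shapes are there?", str(len(type_counts))))
    for shape, count in sorted(type_counts.items()):
        qa.append((f"How many {shape}s are there?", str(count)))
    biggest = max(shapes, key=lambda s: s["area"])
    qa.append(("Which is the biggest shape?", biggest["shape"]))
    # Ask about color only when the shape type is unique in the image,
    # otherwise the question would have more than one correct answer.
    for s in shapes:
        if type_counts[s["shape"]] == 1:
            qa.append((f"What color is the {s['shape']}?", s["color"]))
    return qa
```

Generating the answer alongside each question means the same annotations can also be used to check the model's output later.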

submitted by /u/SyedUmer1

Dexa Scan Dataset (Image / Bodyfat Pairs) Needed

I’m working on a project that requires a dataset containing body images paired with accurate body fat percentage measurements.

I’ve found several DEXA scan datasets, but they only include anthropometric data and no images. I’ve also scraped a number of publicly available images and estimated body fat visually, but I’m looking for a more accurate dataset.

If anyone can recommend an existing dataset or suggest ways to acquire such data, I’d really appreciate it.

submitted by /u/Unable-Bonus-9992

Dataset Explorer – Tool To Search Any Public Datasets (Free Forever)

Dataset Explorer is now LIVE, and will stay free forever.

Finding the right dataset shouldn’t be this painful.

There are millions of quality datasets on Kaggle, data.gov, and elsewhere – but actually locating the one you need is still like hunting for a needle in a haystack.

From seasonality trends, weather data, holiday calendars, and currency rates to political datasets, tech layoffs, and geo info – the right dataset is out there.

That’s why we created dataset-explorer. Just describe what you want to analyze, and it uses Perplexity, scraping (Firecrawl), and other sources to bring relevant datasets.

Quick example: I analyzed tech layoffs from 2020–2025 and found:

📊 2023 was the worst year: 264K layoffs
🏢 Post-IPO companies made 58% of the cuts
💻 Hardware firms were hit hardest, with Intel topping the list
📅 Jan 2023 = worst month ever: 89K people lost jobs in 30 days

Once you find your dataset, you can run a full analysis for free on Hunch, an AI data analytics platform.

Dataset Explorer – https://hunch.dev/data-explorer
Demo – https://screen.studio/share/bLnYXAvZ

Give it a try and let us know what you think.

submitted by /u/matkley12

[self-promotion] WildChat-4.8M: 4.8M Real User–Chatbot Conversations (Public + Gated Versions)

We are releasing WildChat-4.8M, a dataset of 4.8 million real user-chatbot conversations collected from our public chatbots.

  • Total collected: 4,804,190 conversations from Apr 9, 2023 to Jul 31, 2025.
  • After removing conversations flagged with “sexual/minors” by OpenAI Moderations, 4,743,336 conversations remain.
  • From this, the non-toxic public release contains 3,199,860 conversations (all toxic conversations removed from this version).
  • The remaining 1,543,476 toxic conversations are available in a gated full version for approved research use cases.

Why we built this dataset:

  • Real user prompts are rare in open datasets. Large LLM companies have them, but they are rarely shared with the open-source communities.
  • Includes 122K conversations from reasoning models (o1-preview, o1-mini), which are real-world reasoning use cases (instead of synthetic ones) that often involve complex problem solving and are very costly to collect.

Access:

Original Source:

submitted by /u/yuntiandeng

Fundamentals Of Deep Learning: Building Practical Deep Learning Projects

Deep learning is revolutionizing industries by enabling computers to learn from complex data with remarkable accuracy. From training your first CNN to leveraging pre-trained LLMs, the fundamentals covered in this article provide a solid foundation for building AI solutions. By mastering tools like PyTorch, techniques like transfer learning, and applications in computer vision and NLP, you’re well-equipped to tackle real-world challenges. Whether creating a personalized doggy door or classifying fruit, deep learning opens a world of possibilities. Start experimenting, set up your AI environment, and join the global community driving innovation through deep learning.

https://open.substack.com/pub/ahmedgamalmohamed/p/fundamentals-of-deep-learning?r=58fr2v&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

submitted by /u/ahmed4929

911 Calls Analysis For A Research Project

Hello, I have a research project about 911 calls. I need a dataset of 911 call audio so that we can listen to the calls, analyze them, and answer our research questions.

If you know of an AI model that can listen to calls and analyze them, please share it with me.

Also, if there are publications about the analysis of 911 audio calls, please share them with me.

submitted by /u/AhmedUSMLE

Looking For Some Kind Of Data Correlated With BT Corn Adoption

I have a resource showing BT, HT, and hybrid GMO corn adoption in the years since 2000 and I want data that correlates with it somehow.

Examples:

-European Corn Borer Populations (By State)

-European Corn Borer Diversity/Species Richness (By State)

-European Corn Borer Larvae In Non-BT Corn (By State)

-European Corn Borer Larvae In (Crop other than BT Corn) By State

-Non-BT Corn Deaths Due to Insects

-(Crop other than BT corn) Deaths due to Insects

If anyone knows how to get data related to anything above, it would be a lot of help. It can be a species other than European Corn Borers and a crop other than corn. It can also be about weeds instead of insects.

submitted by /u/Empty-Wing7678

Built An IDE For Web Scraping — Introducing Crawbots

We’ve been working on a desktop app called Crawbots — an all-in-one IDE for web data extraction. It’s designed to simplify the scraping process, especially for developers working with Puppeteer, Playwright, or Selenium.

We’re aiming to make Crawbots powerful yet beginner-friendly, so junior devs can jump in without fighting boilerplate or complex setups.

We would appreciate any thoughts, questions, or brutal feedback.

submitted by /u/varvolta