Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

[self-promotion] Daily Updated Sephora Australia Skincare Sales (by Category, Brand, And Promotion %)

I’ve been tracking Sephora Australia’s skincare promotions and put together a dataset that might be useful for anyone studying beauty retail, pricing, or promotions.

  • Covers all skincare products currently on sale
  • Organized by category and subcategory
  • Further grouped by brand and promotion %
  • Updated daily
  • Free to view and explore
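Grouping by promotion % usually means computing the discount relative to the original price and then bucketing products by category, brand, and that percentage. The sketch below assumes each product record has hypothetical `name`, `category`, `brand`, `original_price`, and `sale_price` fields; the post does not describe the dataset’s actual schema.

```python
from collections import defaultdict


def promotion_pct(original_price: float, sale_price: float) -> float:
    """Discount as a percentage of the original price, rounded to one decimal."""
    if original_price <= 0:
        raise ValueError("original_price must be positive")
    return round((original_price - sale_price) / original_price * 100, 1)


def group_sales(products):
    """Group product names by (category, brand, promotion %).

    Each product is a dict with the hypothetical keys
    name, category, brand, original_price, sale_price.
    """
    groups = defaultdict(list)
    for p in products:
        pct = promotion_pct(p["original_price"], p["sale_price"])
        groups[(p["category"], p["brand"], pct)].append(p["name"])
    return dict(groups)
```

Rounding to one decimal keeps near-identical discounts (for example 19.99% vs 20.0%) from landing in separate buckets because of float noise.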

Here’s the link: https://www.kungfutemplate.com/What-s-on-Sale-Today-Australia-Sephora-2763de239fe3801f82fefe478cd72c53?source=copy_link

Hope it helps anyone interested in retail analytics or consumer behavior, or anyone just curious about beauty sales trends.

submitted by /u/IntelligentHome2342

[Tool] I Built A Free Web Tool To Automatically Join And Enrich Different Datasets Using AI.

Hey r/datasets,

I’ve often found amazing related datasets on this sub and elsewhere, but combining them for a project was always a manual chore. If the column names or key formats didn’t line up, it meant breaking out Python scripts.

To make this easier, I built a free tool called Datum Fuse AI.

The main goal is to help you take two separate datasets and quickly harmonize and join them. For example, if you have a CSV with country names and another with country codes, it can help you merge them.

Key features:

  • AI suggests how to map columns between two files.
  • It can join the files based on your mapped keys.
  • It can also augment a dataset with things like Geolocation (City/State/County from a Zip Code column) or add a column for US Holidays if your data is time-based.
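The country-name/country-code example comes down to normalizing one side’s key through a lookup table and then joining on the shared key. Below is a minimal stdlib sketch of that idea; it is not how Datum Fuse AI works internally, and the tiny `NAME_TO_ISO3` table and the `country`/`iso3` column names are hypothetical stand-ins for a full reference list such as ISO 3166.

```python
# Hypothetical lookup table; a real project would load a complete ISO 3166 list.
NAME_TO_ISO3 = {
    "germany": "DEU",
    "france": "FRA",
    "japan": "JPN",
}


def join_on_country(left_rows, right_rows):
    """Inner-join rows keyed by country name (left) to rows keyed by ISO3 code (right)."""
    # Index the right side by code; if a code repeats, the last row wins.
    right_by_code = {row["iso3"]: row for row in right_rows}
    joined = []
    for row in left_rows:
        # Normalize the name (trim + lowercase) before looking up its code.
        code = NAME_TO_ISO3.get(row["country"].strip().lower())
        match = right_by_code.get(code) if code else None
        if match is not None:
            # Merge the two records; right-hand values win on column-name clashes.
            joined.append({**row, **match})
    return joined
```

Rows whose names have no mapping are silently dropped here, as in an inner join; in practice it is worth logging them, since unmapped keys are usually where the harmonization work is.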

It’s in free public beta right now. I’m hoping it can be a useful utility for this community when you’re working on your data projects. I’d appreciate any feedback on what other features or augmentations would be helpful.

Check it out at: https://www.datumfuse.ai

Thanks!

submitted by /u/Bootes-sphere

[Request] IEEE DataPort Datasets: PV Arrays: Suffled Frog Leaping Algorithm And Other MPPTs Under Partial Shading – PSIM Model

We have a college project coming up. Could someone please help us get access to this dataset? Thanks in advance.

Fábio José Rodrigues, Fernando Marcos de Oliveira, Oswaldo Hideo Ando Junior, “PV arrays: Suffled Frog Leaping Algorithm and other MPPTs under partial shading – PSIM model”, IEEE Dataport, July 23, 2024, doi:10.21227/a1m0-gs94

https://ieee-dataport.org//documents/pv-arrays-suffled-frog-leaping-algorithm-and-other-mppts-under-partial-shading-psim-model

submitted by /u/Vivid-Turnover-620

Need Real Dataset Like MIMIC-IV For ML Model

Can you point me to a real dataset with bed types such as ICU, telemetry, medical, and surgery, and departments such as oncology, cardiology, etc., with real length of stay (LOS), at least around 1,000 rows? I am working on an AI model to reduce LOS, but the dataset I was using is synthetic and contains illogical records, such as an ICU patient admitted for only 2 minutes. Can you help me out?
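Whichever dataset ends up being used, records like the 2-minute ICU stay can be screened by computing LOS from admission and discharge timestamps and flagging stays shorter than a minimum for each bed type. This is a sketch only: the field names (`bed_type`, `admit_time`, `discharge_time`) and the minimum-stay thresholds are illustrative assumptions, not clinical standards.

```python
from datetime import datetime

# Hypothetical minimum plausible stays, in hours, per bed type (illustrative only).
MIN_LOS_HOURS = {"icu": 4.0, "telemetry": 2.0, "medical": 2.0, "surgery": 1.0}


def los_hours(admit: str, discharge: str) -> float:
    """Length of stay in hours, from ISO 8601 timestamp strings."""
    delta = datetime.fromisoformat(discharge) - datetime.fromisoformat(admit)
    return delta.total_seconds() / 3600


def flag_implausible(records):
    """Return records whose LOS is below the bed-type minimum, with LOS attached.

    Unknown bed types get a minimum of 0, and every minimum is >= 0,
    so negative stays (discharge before admission) are always flagged.
    """
    flagged = []
    for r in records:
        hours = los_hours(r["admit_time"], r["discharge_time"])
        minimum = MIN_LOS_HOURS.get(r["bed_type"].lower(), 0.0)
        if hours < minimum:
            flagged.append({**r, "los_hours": hours})
    return flagged
```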

submitted by /u/Time_Photograph6748

Global Urban Polygons & Points Dataset, Version 1

Hi there!

I am doing research on the urbanisation of our planet and the rapid rural-to-urban migration trends of the last 50 years. I have come across the following dataset, which would help me a lot; however, I am unable to convert it to an Excel-ready format.

I am talking about the Global Urban Polygons & Points Dataset, Version 1 from the NASA SEDAC data-verse. TL;DR: GUPPD is a global collection of named urban “polygons” (and associated point records) that builds upon the JRC’s GHSL Urban Centre Database (UCDB). Unlike many other datasets, GUPPD explicitly distinguishes multiple levels of urban settlement (e.g. “urban centre,” “dense cluster,” “semi‑dense cluster”). Its first version (v1) includes 123,034 individual named urban settlements worldwide, each with a place name and a population estimate for every five‑year interval from 1975 through 2030.

What I would like is an Excel-ready dataset that includes all 123k urban settlements with their populations and the other provided attributes at every available point in time (1975, 1980, 1985, …). The dataset landing page only offers .gdbtable, .spx, and similar shapefile-type files (urban polygons and points) plus metadata, which are meant to be used with a GIS tool, but no ready-made CSV file.
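The .gdbtable/.spx files are parts of an Esri File Geodatabase, so one common route is to point a GDAL-based reader at the enclosing `.gdb` folder, drop the geometry column, and write the attribute table to CSV (GDAL’s command-line `ogr2ogr -f CSV output.csv input.gdb <layer>` does the same). The sketch below does this with GeoPandas and then optionally reshapes per-year population columns into one row per settlement and year. The layer name and the `POP1975`-style field pattern are hypothetical; check the real field names in the dataset’s metadata first.

```python
import csv
import re


def gdb_layer_to_csv(gdb_path: str, layer: str, csv_path: str) -> None:
    """Export one geodatabase layer's attribute table (without geometry) to CSV.

    Needs geopandas with a GDAL-backed I/O engine; imported lazily so the
    pure-Python helpers below work without it.
    """
    import geopandas as gpd  # third-party dependency

    gdf = gpd.read_file(gdb_path, layer=layer)
    gdf.drop(columns=gdf.geometry.name).to_csv(csv_path, index=False)


# Hypothetical naming scheme for per-epoch population fields, e.g. POP1975.
POP_FIELD = re.compile(r"^POP(\d{4})$")


def wide_to_long(rows):
    """Turn one row per settlement with POPyyyy columns into one row per (settlement, year)."""
    long_rows = []
    for row in rows:
        # Keep every non-population attribute on each output row.
        base = {k: v for k, v in row.items() if not POP_FIELD.match(k)}
        for key, value in row.items():
            m = POP_FIELD.match(key)
            if m:
                long_rows.append({**base, "year": int(m.group(1)), "population": value})
    return long_rows


def write_csv(rows, path: str) -> None:
    """Write a list of dicts to CSV using the first row's keys as the header."""
    if not rows:
        raise ValueError("no rows to write")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

The wide CSV from `gdb_layer_to_csv` opens directly in Excel; the long format (about 12 rows per settlement for 1975 to 2030) is often easier for pivot tables and charts.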

I have already reached out to them, but without success so far. Does anybody have an idea how to do this conversion?

Many thanks in advance!

submitted by /u/Important_Load2334

Building My First Data Analyst Personal Project | Need A Mentor!!!

So, I am currently looking for job opportunities as a data analyst. What I have realized is that talking about and showcasing the work you have done is worth far more than earning certificates.
So this is Day 1 of my project-building journey, and also the first project I am working on by myself.
I work better in a team, so if anyone out there would like to join me on this journey and work on projects together, join me!

submitted by /u/Puzzleheaded_Mud1923

Data Analysis In Excel | Question | Advice

So my question is: after you have done all the technical work in Excel (cleaned the data, built a dashboard, etc.), how do you write your report, I mean the written part (recommendations, insights, etc.)? I just want to hear from professionals how to do it in the right format and what to include. Also, I have heard that in interviews recruiters look for your ability to look at data and interpret it, so I want to learn that. Help!

submitted by /u/dollywinnie

Looking For Free / Very Low-cost Sources Of Financial & Registry Data For Unlisted Private & Proprietorship Companies In India — Any Leads?

Hi, I’m researching several unlisted private companies and proprietorships (I need basic financials, ROC filings where available, import/export traces, and contact info). I’ve tried MCA (documents can be viewed/downloaded for a small fee) and aggregators like Tofler and Zauba; those help but can get expensive at scale. I’ve also checked Udyam/MSME lists for proprietorships.

submitted by /u/Interesting-Chef6209

Why Is Modern Data Architecture So Confusing? (and What Finally Made Sense For Me – Sharing For Beginners)

I’m a data engineering student who recently decided to shift from a non-tech role into tech, and honestly, it’s been a bit overwhelming at times. This guide I found really helped me bridge the gap between all the “bookish” theory I’m studying and how things actually work in the real world.

For example, earlier this semester I was learning about the classic three-tier architecture (moving data from source systems → staging area → warehouse). Sounds neat in theory, but when you actually start looking into modern setups with data lakes, real-time streaming, and hybrid cloud environments, it gets messy real quick.
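The classic three-tier flow can be illustrated with an in-memory SQLite database standing in for both the staging area and the warehouse. This is a toy sketch: the table names, columns, and cleaning rules are made up, and real staging areas and warehouses are usually separate systems.

```python
import sqlite3


def run_pipeline(source_rows):
    """Toy three-tier flow: source rows -> staging table -> warehouse table -> report."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Staging: land the raw extract as-is (all text), with no cleaning yet.
    cur.execute("CREATE TABLE staging_sales (order_id TEXT, region TEXT, amount TEXT)")
    cur.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", source_rows)

    # Warehouse: typed, cleaned, deduplicated table that analysts query.
    cur.execute(
        "CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    # Transform while loading: cast types, normalize region names,
    # skip rows with no amount, and ignore duplicate order IDs.
    cur.execute(
        """
        INSERT OR IGNORE INTO fact_sales (order_id, region, amount)
        SELECT CAST(order_id AS INTEGER), UPPER(TRIM(region)), CAST(amount AS REAL)
        FROM staging_sales
        WHERE amount IS NOT NULL AND TRIM(amount) <> ''
        """
    )
    con.commit()

    # Reporting runs against the warehouse tier only.
    totals = cur.execute(
        "SELECT region, SUM(amount) FROM fact_sales GROUP BY region ORDER BY region"
    ).fetchall()
    con.close()
    return totals
```

Modern setups mostly change where each tier lives (object storage as a lake, streams instead of batch extracts), but the land-raw, then clean-and-model split shown here still applies.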

I’ve tried YouTube and random online courses before, but the problem is they’re often either too shallow or too scattered. Having a sort of one-stop resource that explains concepts while aligning with what I’m studying and what I see at work makes it so much easier to connect the dots.

Sharing here in case it helps someone else who’s just starting their data journey and wants to understand data architecture in a simpler, practical way.

https://www.exasol.com/hub/data-warehouse/architecture/

submitted by /u/UnusualRuin7916

Looking For Real‑Time Social Media Data Providers With Geographic Filtering

I’m working on a social listening tool and need access to real‑time (or near real‑time) social media datasets. The key requirement is the ability to filter or segment data by geography (country, region, or city level).

I’m particularly interested in:

  • Providers with low latency between post creation and data availability
  • Coverage across multiple platforms (Twitter/X, Instagram, Reddit, YouTube, etc.)
  • Options for multilingual content, especially for non‑English regions
  • APIs or data streams that are developer‑friendly
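Whatever provider is chosen, geographic segmentation often ends up as a filter over each post’s location metadata, applied as the stream arrives. Below is a provider-agnostic sketch that assumes each post is a dict with hypothetical `country_code` and optional `lat`/`lon` fields; real APIs expose location differently (and often sparsely), so these names are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in degrees (does not handle the antimeridian)."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def filter_by_geo(posts: Iterable[dict], countries: Optional[set] = None,
                  bbox: Optional[BBox] = None) -> Iterator[dict]:
    """Lazily yield posts matching the country set and/or bounding box."""
    for post in posts:
        if countries is not None and post.get("country_code") not in countries:
            continue
        if bbox is not None:
            lat, lon = post.get("lat"), post.get("lon")
            # Posts without coordinates cannot be placed in a box, so skip them.
            if lat is None or lon is None or not bbox.contains(lat, lon):
                continue
        yield post
```

Because it is a generator, the same filter works on a list of historical posts or on an unbounded stream from a real-time API.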

If you’ve worked with any vendors, APIs, or open datasets that fit this, I’d love to hear your recommendations, along with any notes on pricing, reliability, and compliance with platform policies.

submitted by /u/To_Iflal

[Resource] A Hub To Discover Open Datasets Across Government, Research, And Nonprofit Portals (I Built This)

Hi all, I’ve been working on a project called Opendatabay.com, which aggregates open datasets from multiple sources into a searchable hub.

The goal is to make it easier to find datasets without having to search across dozens of government portals or research archives. You can browse by category, region, or source.

I know r/datasets usually prefers direct dataset links, but I thought this could be useful as a discovery resource for anyone doing research, journalism, or data science.

Happy to hear feedback or suggestions on how it could be more useful to this community.

Disclaimer: I’m the founder of this project.

submitted by /u/Winter-Lake-589

Looking For A Dataset For Project!! (Stock Prediction Using Sentiment Analysis)

Please recommend any datasets even remotely close to the structure below:

| Company ticker | DJIA value of company on Day 3 (t-2) | DJIA value on Day 2 (t-1) | DJIA value on Day 1 (t) | Twitter sentiment about company on Day 3 | Twitter sentiment on Day 2 | Twitter sentiment on Day 1 | Label: prediction (up or down) (t+1) |
|---|---|---|---|---|---|---|---|

where Day 3 is the day before yesterday, Day 2 is yesterday, Day 1 is today, and the prediction (label) is for tomorrow.
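If no ready-made dataset matches, rows in this shape can be assembled from a daily value series and a daily sentiment series per ticker. The sketch assumes both series are already aligned by day (oldest first); the column names are hypothetical, and how the sentiment scores are produced is out of scope.

```python
from typing import List, Sequence


def build_rows(ticker: str, values: Sequence[float], sentiment: Sequence[float]) -> List[dict]:
    """Build rows with values/sentiment at t-2, t-1, t and an up/down label for t+1.

    values[i] and sentiment[i] must refer to the same day i, oldest first.
    """
    if len(values) != len(sentiment):
        raise ValueError("values and sentiment must be aligned day by day")
    rows = []
    # Each row needs two days of history before t and one day after t for the label.
    for t in range(2, len(values) - 1):
        rows.append({
            "ticker": ticker,
            "value_t-2": values[t - 2],
            "value_t-1": values[t - 1],
            "value_t": values[t],
            "sent_t-2": sentiment[t - 2],
            "sent_t-1": sentiment[t - 1],
            "sent_t": sentiment[t],
            # Label: 1 ("up") if tomorrow's value is higher than today's, else 0 ("down").
            "label_up": int(values[t + 1] > values[t]),
        })
    return rows
```

Note that an unchanged value counts as "down" here; a three-class label (up/flat/down) is an alternative if ties are common.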

Also, any recommendations for datasets of stock-related tweets would be appreciated!

submitted by /u/Dull-Assignment-3273

What’s The Smoothest Way To Share Multi-gigabyte Datasets Across Institutions?

I’ve been collaborating with a colleague on a project that involves some pretty hefty datasets, and moving them back and forth has been a headache. Some of the files are 50–100GB each, and in total we’re looking at hundreds of gigabytes. Standard cloud storage options don’t seem built for this either: they throttle speeds, enforce strict limits, or require subscriptions that don’t make sense for one-off transfers.

We’ve tried compressing and splitting files, but that just adds more time and confusion when the recipient has to reassemble everything. Mailing drives might be reliable, but it feels outdated and isn’t practical when you need results quickly. Ideally, I’d like something that’s both fast and secure, since we’re dealing with research data.
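Whatever transfer method is used, publishing a checksum manifest alongside the files lets the recipient confirm that each 50–100GB file (or each split part before reassembly) arrived intact. Below is a stdlib sketch that hashes files in fixed-size chunks so memory use stays flat regardless of file size; the 8 MiB chunk size is an arbitrary choice. Note that a checksum verifies integrity only; confidentiality still depends on an encrypted transfer channel.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per read; arbitrary, tune as needed


def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 without loading it into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths, manifest_path: str) -> None:
    """Write 'hexdigest  filename' lines (bare file names, as `sha256sum -c` expects)."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for p in paths:
            out.write(f"{sha256_of(p)}  {Path(p).name}\n")


def verify_manifest(manifest_path: str) -> dict:
    """Re-hash each listed file (looked up next to the manifest); map name -> matches?"""
    base = Path(manifest_path).parent
    results = {}
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            expected, name = line.rstrip("\n").split("  ", 1)
            results[name] = sha256_of(str(base / name)) == expected
    return results
```

Since the manifest uses the same layout as GNU `sha256sum`, a recipient without Python can also run `sha256sum -c` on it from the folder containing the files.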

For those of you who routinely share large datasets across universities, labs, or organizations: what’s worked best in your experience? Do you stick with institutional servers and FTP setups, or is there a practical modern tool for big dataset transfers?

submitted by /u/d4rk_diamond