Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Data-Driven “Men’s Global Wellbeing Index” Project (With Domain + Dashboard + Dataset)

Hey everyone,

I’ve been working on a project called the Men’s Global Wellbeing Index (MGWI) — a data-driven scoring system that compares men’s wellbeing conditions across different countries. I’ve put a lot into building the core foundation, but I’m shifting my focus to other projects and don’t want this one to sit unused.

I’m looking for someone who wants to take it over, expand it, or build something bigger on top of it, or someone who wants to repurpose it for a similar project.

🔧 What MGWI Includes

  • 10 fully defined metrics (Suicide, Social Bias, Child Custody, Legal Bias, Homelessness, Workplace Fairness, Freedom of Expression, Mental Health Access, Violence Against Men, Loneliness)

Each metric includes:

  • Emoji marker
  • Full rationale/explanation
  • Consistent scoring system

Additional assets:

  • 10 countries scored (100-point total index)
  • Airtable backend with all data structured
  • Softr dashboard (mock-up style)
  • Domain name: Mensglobalwellbeingindex dot com
  • Brand notes, methodology, and all assets included
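
For anyone picking this up, a 100-point index over 10 metrics could be computed roughly like the sketch below. It assumes each metric is scored from 0 to 10 and the index is a plain sum; the post doesn't spell out the actual methodology, and the function name is hypothetical.

```python
# The ten metric names come from the list above.
METRICS = [
    "Suicide", "Social Bias", "Child Custody", "Legal Bias", "Homelessness",
    "Workplace Fairness", "Freedom of Expression", "Mental Health Access",
    "Violence Against Men", "Loneliness",
]

# Assumption: 10 metrics x 10 points each = 100-point index.
MAX_PER_METRIC = 10


def index_score(scores: dict[str, float]) -> float:
    """Sum per-metric scores into a 0-100 index, validating the input first."""
    missing = [m for m in METRICS if m not in scores]
    if missing:
        raise ValueError(f"missing metrics: {missing}")
    for metric in METRICS:
        if not 0 <= scores[metric] <= MAX_PER_METRIC:
            raise ValueError(f"{metric} must be between 0 and {MAX_PER_METRIC}")
    return sum(scores[metric] for metric in METRICS)
```

If the real methodology weights metrics differently, the sum would be replaced by a weighted sum with weights that add up to 100.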

🔎 SEO Notes

Some MGWI-related pages are already ranking on the first page for keywords like:

  • global wellbeing index for men
  • men’s wellbeing index
  • men’s global index
  • global index for men
  • index for men’s global wellbeing

(Useful if someone wants to continue the project or build an SEO-focused site.)

🎯 Who This Is Good For

  • Researchers
  • Activists or NGOs
  • University projects
  • Startups in wellbeing, mental health, or analytics
  • Indie makers looking for a meaningful data project
  • Anyone wanting a niche SEO website with long-term potential

📦 What I Can Share If You’re Interested

  • Demo video of the dashboard
  • Sample of the dataset
  • Full scoring methodology
  • Asset list + structure
  • Notes on future expansion (global rankings, crowdsourced sentiment, etc.)

I’m open to offers — mainly want this to go to someone who will actually build it out.

If you’re interested or want to see more, just comment or DM me.

submitted by /u/Zealousideal-Gap414

Need Ideas For Utilizing GCP’s $300 Free Credits In The Next Three Days And Get The Most Long Term Value Out Of It (Something That Stays Even After The Credits Expire)

So the thing is, my GCP account’s free trial is expiring in 3 days. I was hoping to get some long-term value out of it, something that stays even after the free credits expire, like maybe running a VM 24/7 for a data extraction process, but I’m not sure what kind of data to extract. It can be anything that will be of value to me later on, after the credits expire; it doesn’t necessarily have to be datasets.

submitted by /u/Mean_Interest8611

Benchmarked TabPFN On 1M-10M Row Datasets

We just put out a blog post with TabPFN benchmarks on datasets from 1M to 10M rows.

For context: TabPFN is a transformer pretrained on millions of synthetic datasets that does in-context learning for tabular classification/regression. No hyperparameter tuning needed – you just give it training data at inference and it predicts.

Compared our Scaling Mode against CatBoost, XGBoost, LightGBM on internal classification datasets. Performance keeps improving with more data and the gap to gradient boosting isn’t shrinking.

Benchmark results show normalized scores across datasets plus individual results showing ROC AUC improvements. You can find them here: https://priorlabs.ai/technical-reports/large-data-model
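
For anyone who wants to compare their own models on the same metric, ROC AUC can be computed from scratch with the rank-sum formulation. This is a minimal sketch and not the benchmark code behind the report; the function name is hypothetical, and labels are assumed to be 0/1.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative, with tied scores counting as one half."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied items share the mean of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    # Mann-Whitney U statistic for the positives, normalized to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

On very large datasets a vectorized library implementation will be much faster; this version is only meant to make the metric concrete.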

Would be interesting to keep on benchmarking this on public large tabular datasets. Anyone know good large public tabular datasets?

submitted by /u/Diligent_Inside6746

Guidance On Beginning A Data Project On Matcha And Its Rise

Hello Reddit! Apologies if this isn’t the right sub, but I’m working on a fun data project exploring how matcha lattes have exploded in popularity over the last year or so.

The thing is, I’m having a hard time finding any datasets that actually include matcha sales. My backup idea is to look for a dataset from a boba or Thai tea shop (since they usually sell matcha) and compare those sales over the same time period to a cafe that may not sell matcha.

This project is just for fun—mainly an excuse for me to play around with Kaggle, SQL, R, etc.—so the dataset doesn’t have to be perfect. If anyone has suggestions, dataset ideas, or guidance on where to look, I’d really appreciate it!
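
One way the backup comparison could work, once two sales series are in hand, is to rescale each shop so its first period equals 100 and then look at the gap between the two trends. This is only a sketch: the function names are hypothetical, and it assumes both series cover the same periods in the same order.

```python
def indexed(series):
    """Rescale a sales series so that its first period equals 100."""
    base = series[0]
    if base == 0:
        raise ValueError("first period must be non-zero to use it as the base")
    return [100 * value / base for value in series]


def growth_gap(shop_a, shop_b):
    """Per-period difference between two indexed series of equal length.

    Positive values mean shop_a has grown faster than shop_b since period 0.
    """
    if len(shop_a) != len(shop_b):
        raise ValueError("both series must cover the same periods")
    return [a - b for a, b in zip(indexed(shop_a), indexed(shop_b))]
```

Indexing to a common base keeps a big cafe and a small tea shop comparable, since only relative growth is compared and not raw sales volume.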

submitted by /u/Pristine-Rhubarb-787

Looking For Science Education Data Sets

I have an introductory data science class, and my project requires me to do some basic analysis on a data set related to a topic I like. The topic I am genuinely interested in is education in computer science, but I have had some trouble finding a data set I can work with. I found the annual Stack Overflow questionnaire, but I don’t think it will work because of how they asked the questions. I also found another one that has all the schools that offer computer science in the US, but my professor didn’t like that one. I have like two days to do the project, so I need to find the data like today. Please, please, if anyone knows of one, I’d love the help. I’ve decided that it can be something related to just science in general or even education in general. It’s just a topic I want to study, but I have struggled so much to find a good data set that I am pretty far from my original question anyway. Please and thanks to anyone who can help!

submitted by /u/papiyou

96 Million iNaturalist Research-Grade Plant Records Dataset (Free And Open Source)

I’ve built a large-scale plant dataset from iNaturalist research-grade observations:
96.1 million rows containing:

  • species / genus / family names
  • GBIF taxonomy IDs
  • lat / lon
  • event dates
  • image URLs (iNat open data)
  • license information
  • dataset keys / source info

It’s meant for anyone doing:

  • image classification (plants, ecology, biodiversity)
  • large-scale ViT/ConvNeXt pretraining
  • location-aware species modelling
  • weakly supervised learning from image URLs
  • training LoRA adapters for regional plant ID

Dataset (parquet, streamable via HF Datasets):
https://huggingface.co/datasets/juppy44/gbif-plants-raw
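
A rough sketch of streaming the dataset and keeping only records inside a bounding box, without downloading all 96 million rows. The column names (`decimalLatitude`, `decimalLongitude`) and the `train` split are assumptions based on common GBIF naming, so check the dataset card first; the helper names are hypothetical.

```python
from itertools import islice


def in_bbox(record, south, west, north, east,
            lat_key="decimalLatitude", lon_key="decimalLongitude"):
    """True if the record has coordinates inside the given bounding box.

    The default column names are an assumption, not confirmed by the post.
    """
    lat, lon = record.get(lat_key), record.get(lon_key)
    if lat is None or lon is None:
        return False
    return south <= lat <= north and west <= lon <= east


def first_matches(records, n, **bbox):
    """Collect up to n records from any iterable that fall inside the box."""
    return list(islice((r for r in records if in_bbox(r, **bbox)), n))


def stream_plants():
    """Lazily iterate the dataset (needs `pip install datasets` and network)."""
    from datasets import load_dataset
    # Assumption: the data lives in a split called "train".
    return load_dataset("juppy44/gbif-plants-raw", split="train", streaming=True)
```

Usage would look something like `first_matches(stream_plants(), 100, south=45, west=5, north=55, east=15)`. Because the filter works on any iterable of dicts, it can be tried on a handful of hand-written records before touching the real data.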

Let me know what you build with it!

submitted by /u/Lonely-Marzipan-9473