We are working on a computer vision project with one of its functions being detecting fainting or abnormal conditions. Any help would be appreciated.
submitted by /u/LockedSouI
[link] [comments]
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
I wonder if there are any datasets where I can type “holding hands” and instances of this from different movies show up as the search result.
submitted by /u/drumchant
[link] [comments]
Hey everyone 👋
I wanted to share a resource we’ve been working on that may help those who spend time hunting for open or synthetic datasets for AI/ML training, benchmarking, or research.
It’s called Opendatabay, a searchable directory that aggregates and organizes datasets from various open data sources, including government portals, research repositories, and public synthetic dataset projects.
What makes it different:
Everything listed is open-source or publicly available, with no paywall or gated access.
We’re also working on indexing synthetic datasets specifically designed for AI model training and evaluation.
Would love feedback from this community, especially around what metadata or filters you’d find most useful when exploring large-scale datasets.
(Disclosure: I’m part of the team building Opendatabay.)
submitted by /u/Winter-Lake-589
[link] [comments]
I have a beast of a dataset with about 2M business names and roughly 26,000 categories. Some of the categories are off: Zomato, for example, is categorized as a tech startup, which is correct, but from a consumer's point of view it should be food and beverages. Some are straight-up wrong, and a lot of them are confusing too. Many are really subcategories, so although 26,000 is the raw count, in practice there are a couple of hundred categories, which is still a shitload. Is there any way I can fix this mess? Keyword-based cleaning isn't working. Any help would be great.
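One possible first pass, sketched here under assumptions: normalize each raw label and snap it to the closest entry in a short hand-written list of parent categories using fuzzy string matching from Python's standard library. The `CANONICAL` list, the `cutoff` value, and the function names are all hypothetical. This only collapses spelling and formatting variants of the same label; it would not fix semantic cases like the Zomato one, which likely need manual rules or a trained classifier.

```python
from difflib import get_close_matches

# Hypothetical parent categories; a real list would come from the data owner.
CANONICAL = ["food and beverages", "technology", "retail"]


def normalize(label: str) -> str:
    """Lowercase, expand '&', and collapse repeated whitespace."""
    return " ".join(label.lower().replace("&", "and").split())


def map_category(label: str, canonical=CANONICAL, cutoff: float = 0.6):
    """Return the closest canonical category, or None if nothing is close enough."""
    matches = get_close_matches(normalize(label), canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["Food & Beverages ", "Technolgy", "zzzz"]:
        print(repr(raw), "->", map_category(raw))
```

Labels that come back as `None` can be queued for manual review instead of being forced into a category.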
submitted by /u/Existing_Pay8831
[link] [comments]
I’m trying to reproduce HMR2.0 / 4D-Humans evaluation on Human3.6M, using the official config and h36m_val_p2.npz.
Training runs fine and 3DPW evaluation works correctly, but H36M eval completely fails (black crops, sky-high errors).
After digging through the data, it turns out the problem isn’t the code: the h36m_val_p2.npz expects full-resolution images (~1000×1000) with names like:
```
S9_Directions_1.60457274_000001.jpg
```
But there’s no public dataset that matches both naming and resolution:
| Source | Resolution | Filename pattern | Matches npz? |
|---|---|---|---|
| HuggingFace “Human3.6M_hf_extracted” | 256×256 | S11_Directions.55011271_000001.jpg | ✅ name, ❌ resolution |
| MKS0601 3DMPPE | 1000×1000 | s_01_act_02_subact_01_ca_01_000001.jpg | ✅ resolution, ❌ name |
| 4D-Humans auto-downloaded h36m-train/*.tar | 1000×1000 | S1_Directions_1_54138969_001076.jpg | close, but _ vs . mismatch |
So the official evaluation .npz points to a Human3.6M image set that doesn’t seem to exist publicly. The repo doesn’t provide a download script for it, and even the HuggingFace or MKS0601 versions don’t match.
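For the "close, but _ vs . mismatch" row, one workaround worth trying is to rename the extracted files so the separator before the camera ID matches what the npz expects. The sketch below is only an assumption-laden illustration: `to_npz_style` is a hypothetical helper, the pattern is inferred from the two example filenames above, and it is not confirmed that the frame indices in the tar files line up with those referenced by h36m_val_p2.npz.

```python
import re

# Inferred pattern: <subject>_<action>_<8-digit camera>_<6-digit frame>.jpg
_TAR_STYLE = re.compile(
    r"^(?P<prefix>S\d+_.+)_(?P<camera>\d{8})_(?P<frame>\d{6})\.jpg$"
)


def to_npz_style(name: str) -> str:
    """Replace the underscore before the camera ID with a dot.

    Names that do not fit the inferred tar-style pattern are returned unchanged.
    """
    m = _TAR_STYLE.match(name)
    if m is None:
        return name
    return f"{m['prefix']}.{m['camera']}_{m['frame']}.jpg"


if __name__ == "__main__":
    print(to_npz_style("S1_Directions_1_54138969_001076.jpg"))
    # prints S1_Directions_1.54138969_001076.jpg
```

Comparing the renamed names against the image names stored in the npz would show quickly whether the two sets actually overlap.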
Has anyone successfully run HMR2.0 or 4D-Humans H36M evaluation recently?
If so, which image set did you use with h36m_val_p2.npz?
I’ve already registered on the official Human3.6M website and requested dataset access,
but it’s been weeks with no approval or response, and I’m stuck.
Would appreciate any help or confirmation from anyone who managed to get the proper eval set.
submitted by /u/Last_Raise4834
[link] [comments]
The original download link for the MIT Blackbird Dataset (http://blackbird-dataset.mit.edu/) seems to be dead, and no one’s seeding it on Academic Torrents (https://academictorrents.com/details/eb542a231dbeb2125e4ec88ddd18841a867c2656) either.
submitted by /u/Afraid_Radish2408
[link] [comments]
Hi, I’m looking for a dataset that has one continuous response variable, at least six continuous covariates, and one categorical variable with three or more categories. I’ve been searching for a while but haven’t found anything yet. If you know a dataset that fits that, I’d really appreciate it.
submitted by /u/SeaworthinessOk3084
[link] [comments]
Hey guys, does anyone know of a data source or link with a free, available dataset on maternal health risk, at least 1 GB in size? It would be very much appreciated, as this is for my course project. Thank you!!
submitted by /u/Glad_Bat_7513
[link] [comments]
👋 Hey, I have just uploaded 2 new datasets for code and scientific reasoning models:
ArXiv Papers (4.6TB): a massive scientific corpus with papers and metadata across all domains. Perfect for training models on academic reasoning, literature review, and scientific knowledge mining. 🔗 Link: https://huggingface.co/datasets/nick007x/arxiv-papers
GitHub Code 2025: a comprehensive code dataset for code generation and analysis tasks. It mostly contains GitHub’s top 1 million repos above 2 stars. 🔗 Link: https://huggingface.co/datasets/nick007x/github-code-2025
submitted by /u/its_just_me_007x
[link] [comments]
I want to train a personal assistant to use at work. I want to fine-tune it on work-related conversations and was wondering if anyone has ideas on where I can find such data.
On Kaggle I have seen one dataset, but it was quite small and not enough.
Thanks!
submitted by /u/Potential-Will-9273
[link] [comments]
I’ve been looking for a labeled snoring dataset, which I need for sleep apnea detection. I found that many research papers have used the MPSSC dataset, and it is basically the largest and best-labeled dataset available. I have looked almost everywhere for it but I can’t find it. If anyone knows how to access that dataset, or has it downloaded somewhere, or has a torrent, I’d really appreciate it if you could link it here or in my DMs.
submitted by /u/hydrastrix
[link] [comments]
Anyone know of any good ones? Or an enrichment API that’s pretty cheap?
submitted by /u/Vegetable-Emu-4370
[link] [comments]
I’m working on a final-year project to optimise baggage handling, using AI to route baggage better through the airport and minimise carousel conflicts and overloads to increase throughput. Unfortunately, there’s not much data I can find to work with. If anyone knows of a dataset that includes conveyor travel times, error rates, carousel capacity, etc., that would be great. Thank you.
submitted by /u/thelordgodj1
[link] [comments]
Is a natural language translation dataset from ENG to another language in a very specific domain worthwhile to curate for conference submission?
I am a student who works part-time as a translator in this specific domain, and I am wondering if this could be a potential submission. I have several peers who are willing to put in the effort to curate a decent-sized dataset (~2k translated scripts) for research use and conference submission.
However, I am not confident about how useful or meaningful a contribution this would be to the community.
submitted by /u/AdGlittering3010
[link] [comments]
Hello everyone,
I’m an engineering student currently taking a course called Applied Machine Learning. As part of the course, I need to develop a web application that demonstrates key machine learning concepts such as segregation and classification. I’m looking for datasets related to housing markets or middle-class neighborhoods. Additionally, I’d appreciate any review-based datasets, as I plan to incorporate NLP into my project.
Thank you in advance!
submitted by /u/mendaX20
[link] [comments]
I’m currently working on a car classification project for a university-level neural network course. The Car-1000 dataset is the ideal candidate for our fine-grained visual categorization task.
The official paper cites a GitHub repository for the dataset’s release (toggle1995/Car-1000), but unfortunately, the repository appears to contain only the README.md and no actual data files.
Has anyone successfully downloaded or archived the full Car-1000 image dataset (140,312 images across 1,000 models)? If so, I would be very grateful if you could share a link or guide me to an alternative download source.
Any help with this academic project is highly appreciated! Thank you.
submitted by /u/Porsche_Lover2002
[link] [comments]
I created a dataset for a research project to get data about diplomatic visits by Chinese leaders from 1950 to 2025.
submitted by /u/janethelame_
[link] [comments]
Hi guys,
Doing a bit of research here for school, but I really need a dataset of images/videos of swifts in their nests/birdboxes getting fed or not fed, or just videos from birdbox cams of swifts in general. Not really that urgent, but any help is appreciated.
Thanks
submitted by /u/Horror-Tower2571
[link] [comments]
Looking for free/open-source/publicly available data on US businesses for my project.
The project is a weather engine, connecting affected customers to nearby prospects.
submitted by /u/BrilliantSea8202
[link] [comments]
Any ideas =(
Everything I’ve liked has been under 100 MB so far.
submitted by /u/TokkiJK
[link] [comments]
A little bird from mangoblogger.com told me that all the images from the homepages of the world’s leading websites can be found here – http://cdn.mangoblogger.com
Maybe good for training models or running experiments. Not sure how long this will be public but users of mangoblogger.com can always access this. The dataset drills down from the top level domains to individual websites.
submitted by /u/Pristine-Arachnid-41
[link] [comments]
Hi everyone, I’m conducting a research project on business behavior patterns and looking for recommendations on legally licensed, large-scale firmographic or B2B datasets.
Purpose: strictly for data analysis and AI behavioral modeling and not for marketing, lead generation, or outreach.
What I’m looking for:
Requirements:
If anyone has experience with trusted data providers or knows of reputable sources that can deliver at this scale, I’d really appreciate your suggestions.
Mods: this post does not request PII, only guidance on compliant data sources. Happy to adjust wording if needed.
submitted by /u/Axiata244
[link] [comments]
https://huggingface.co/datasets/ronantakizawa/japanese-text-difficulty
This dataset gathered texts from Aozora Bunko (A corpus of Japanese texts) and marked them with jReadability scores, plus detailed metrics on kanji density, vocabulary, grammar, and sentence structure.
This is an excellent dataset if you want to train your LLM to understand the complexities of the Japanese language 👍
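To make one of the listed metrics concrete, here is a rough sketch of a kanji density calculation: the share of characters in a text that fall in the main CJK Unified Ideographs block (U+4E00 to U+9FFF). This is an assumed definition for illustration only; the dataset may compute its kanji density differently, and `kanji_density` is a hypothetical name.

```python
def is_kanji(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def kanji_density(text: str) -> float:
    """Fraction of non-whitespace characters that are kanji (0.0 for empty text)."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_kanji(ch) for ch in chars) / len(chars)


if __name__ == "__main__":
    # 3 kanji out of 8 characters
    print(kanji_density("日本語のテキスト"))  # prints 0.375
```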
submitted by /u/Ok_Employee_6418
[link] [comments]
Hi, datasets!
Want to know France’s GDP growth? You’re checking Eurostat, World Bank, OECD… then wrestling with CSVs, different formats, inconsistent naming. It’s 2025, and we’re still doing this manually.
qoery.com makes every time-series statistic queryable in plain English or SQL. Just ask “What’s the GDP growth rate for France?” and get structured data back instantly:
...
"id": "14256",
"entity": { "id": "france", "name": "France" },
"metric": { "id": "gdp_growth_rate", "name": "GDP change percent" },
...
"observations": [
  { "timestamp": "1993-12-31T00:00:00+00:00", "value": "1670080000000.0000000000" },
  { "timestamp": "1994-12-31T00:00:00+00:00", "value": "1709890000000.0000000000" },
  { "timestamp": "1995-12-31T00:00:00+00:00", "value": "1749300000000.0000000000" },
  ...
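Since the excerpt above returns timestamps and values as strings, a consumer would probably want to convert them to typed values before analysis. The following is a minimal sketch using only the Python standard library. The `SAMPLE` payload is a hand-built, complete JSON object containing only the fields visible in the excerpt; the real response is elided in the post and may include more fields, and `parse_observations` is a hypothetical helper, not part of any qoery client.

```python
import json
from datetime import datetime
from decimal import Decimal

# Hand-built sample using only fields shown in the excerpt (assumed shape).
SAMPLE = """
{
  "id": "14256",
  "entity": {"id": "france", "name": "France"},
  "metric": {"id": "gdp_growth_rate", "name": "GDP change percent"},
  "observations": [
    {"timestamp": "1993-12-31T00:00:00+00:00", "value": "1670080000000.0000000000"},
    {"timestamp": "1994-12-31T00:00:00+00:00", "value": "1709890000000.0000000000"}
  ]
}
"""


def parse_observations(payload: str):
    """Return (datetime, Decimal) pairs from a series payload."""
    series = json.loads(payload)
    return [
        (datetime.fromisoformat(obs["timestamp"]), Decimal(obs["value"]))
        for obs in series["observations"]
    ]


if __name__ == "__main__":
    for ts, value in parse_observations(SAMPLE):
        print(ts.date(), value)
```

`Decimal` is used here instead of `float` so that the string values are kept exactly as delivered.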
We’ve indexed 50M observations across 1.2M series from ~10,000 sources, including the World Bank, Our World in Data, and more.
Right now we’re focused on economic/demographic data, but I’m curious:
– What statistics do YOU constantly need but struggle to access?
We have a free tier (250 queries/month) so you can try it today. Would love your feedback on what data sources to prioritize next!
submitted by /u/SammieStyles
[link] [comments]
Hey folks! 👋
I’m looking for good websites where I can find free, copyright-free (or Creative Commons) images that are already organized or easy to browse by category, for example:
• Dog breeds 🐶
• Musical instruments 🎸
• Football teams ⚽️
• Landmarks, foods, etc.
Basically, something I could use for an educational or guessing-style game project. I’ve checked Unsplash and Pexels, but they’re quite general — not very structured by category.
Any recommendations for sites or archives that have structured collections or datasets of free images? They should be easy to scrape or download.
Bonus points if they allow attribution-free use or have clear licensing info.
I have found a few, but they usually require a paid subscription.
Thanks in advance! 🙌
submitted by /u/Vanals
[link] [comments]
Recently, I have been reading papers on social networks in which several social network datasets were used for experiments (Email, NetScience, Facebook, Wiki-Vote, PGP, NetHEPT, CondMat, NetPHY). I couldn’t find some of these networks, such as NetHEPT, NetPHY, and CondMat, on Stanford SNAP or the Network Repository website. May I ask where I can find these social network datasets?
submitted by /u/Remarkable-Scale2170
[link] [comments]