I was planning to take the GeeksforGeeks data science course, but I can’t find a single genuine review of it. Should I take it, or are there better options?
submitted by /u/Weak_Salamander_9540
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
I am trying to find a dataset with all the scores from NCAA tournaments dating back to sometime around 2000. Is there any dataset like this? Thanks in advance for your help!
submitted by /u/rootbeerjayhawk
I came across this Snapchat DAU dataset on Statista, but I can’t afford the subscription needed to access it. Does anyone know how I can access it, or whether I can get it elsewhere? I couldn’t find it on Kaggle, UCI, or any other data source websites. I need it for a time series forecasting project :(
submitted by /u/Relative-Ear-1356
Hello everyone,
I’m a CS major working on a project for my Advanced Data Structures class. My idea is to develop an app that optimizes routes for emergency responders by analyzing traffic density, 911 calls, and past response routes to recommend the fastest possible paths. Now the issue I have is finding recent datasets for traffic density, emergency response times, and road networks—especially for Boston (but I’d be happy with data from anywhere in the U.S. or Europe). Most datasets I’ve found are either outdated or incomplete.
Does anyone know where I can find:
- Live or historical traffic density data
- Emergency response datasets
- Road network data
Any help would be appreciated, thanks in advance!
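For the routing idea itself, one common starting point is Dijkstra's algorithm over a road graph whose edge weights are travel times scaled by congestion. The following is a minimal sketch under that assumption; the graph layout, the `fastest_route` name, the congestion factors, and the toy network are all made up for illustration and are not taken from any real dataset.

```python
import heapq


def fastest_route(graph, source, target):
    """Dijkstra's algorithm over congestion-adjusted travel times.

    graph maps node -> list of (neighbor, base_minutes, congestion_factor).
    Returns (total_minutes, path), or (inf, []) if target is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, base_minutes, congestion in graph.get(node, []):
            new_cost = cost + base_minutes * congestion
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical toy network: the route via "b" is shorter but congested.
roads = {
    "station": [("a", 4.0, 1.0), ("b", 2.0, 3.5)],
    "a": [("incident", 3.0, 1.0)],
    "b": [("incident", 1.0, 1.0)],
}

print(fastest_route(roads, "station", "incident"))
# -> (7.0, ['station', 'a', 'incident'])
```

With real data, the congestion factors would come from the traffic density feed and the graph from a road network dataset; those parts are left out here.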
submitted by /u/BottleDisastrous
In the past, I’ve posted here looking for specific real estate data, but this time I want to flip the question around.
Rather than trying to create my own dataset from scratch, I’m curious to learn what existing data is already out there regarding residential real estate sales that’s either free or inexpensive to access.
I’m especially interested in datasets covering things like:
- Sale prices
- Time on market
- Property details (beds, baths, square footage, etc.)
- FSBO (For Sale By Owner) vs. agent-listed transactions
- Regional trends
Before I invest the time into building something from the ground up, I’d love to know:
What sources have you found surprisingly useful? What data might already be hiding in plain sight—whether public records, government databases, or other unexpected places?
Thanks so much for any insights!
What Real Estate Sales Data Is Already Out There That I’m Overlooking?
submitted by /u/Ykohn
I need a dataset in which, for each question, a student writes code and then a teacher writes code for the same question. I searched the web but came up with nothing. If student and teacher code for the same questions isn’t available in a single dataset, separate datasets would also work, meaning the questions need not be the same for both parties. I need this to compare the quality of the code.
Thank you!
submitted by /u/Rotten-Apple420
Hi everyone!
I’m working on a research paper where I’m analyzing the impact of IPL auction strategies on team performance (specifically Net Run Rate). I’ve already collected detailed auction data for the 2022 and 2023 seasons from Cricbuzz, but I’m struggling to find complete data for 2021 and earlier seasons.
For each team, I want how much they spent on each player in the squad, categorized by player type (bowler, batsman, all-rounder, and wicketkeeper). Something like:
CSK:
Retentions – __ Cr.
Auction Spent –
Batsman:
Ruturaj Gaikwad (retained) – 6.00 Cr.
You can check the IPL 2022 auction on Cricbuzz, then go to Teams and select any team to see exactly what I want. LINK: https://m.cricbuzz.com/cricket-series/ipl-2022/auction/teams/58 (I want something like this for all teams from the 2015 to 2022 seasons.)
The issue I’m facing is that the data for 2021 and earlier seasons on Cricbuzz is mostly incomplete and doesn’t include retentions or detailed breakdowns. If anyone has access to a complete dataset or knows where I can find one, I’d really appreciate your help!
Alternatively, if you have any suggestions for other sources (e.g., archives, news articles, or datasets), please let me know.
Thanks in advance!
submitted by /u/WaltzWeird
I’m working on a project that requires a dataset of small, self-contained Python files that are known to be bug-free. Ideally, these files would represent complete, functional units of code, not just snippets.
Specifically, I’m looking for:
- Self-contained Python files: Each file should be runnable on its own, without external dependencies (beyond standard libraries, if necessary).
- Bug-free: The files should be reasonably well-tested and known to function correctly.
- Small to medium size: I’m not looking for massive projects, but rather individual files that demonstrate good coding practices.
- Optional but desired: Unit tests attached to the files would be a huge plus!
I want to use this dataset to build a static analysis tool. I have been looking for GitHub repositories that match this description. I have tried the LeetCode dataset, but I need more than that.
Thank you 🙂
submitted by /u/Serious-Aardvark9850
Looking for some data on publishing companies for my university assignment: book manufacturing orders and material supply for book production. To be clear, I need data from the perspective of the publishing house, not bookshops (sales) but publishing houses (orders, material supplies). Any help would be appreciated.
submitted by /u/VanDarkholme111
Hello!
The dataset I have created got an update! It now includes over 230 000 football matches’ data such as scores, stats, odds and more! All updated up to 01/2025 🙂 The dataset can be used for training machine learning models or creating visualizations, or just for personal data exploration 🙂
Please let me know if you want me to add anything to it or if you find a mistake, and if you intend to use it, share your results :)
Here are the links:
Kaggle: https://www.kaggle.com/datasets/adamgbor/club-football-match-data-2000-2025/data
GitHub: https://github.com/xgabora/Club-Football-Match-Data-2000-2025
submitted by /u/AdkoSokdA
Is sentiment data still valuable today, and if so, who actually uses it? AI companies, marketing, hedge funds? If you use data to make decisions, I’m curious to hear what you look out for.
submitted by /u/oym69
What challenges do you face when it comes to data annotation?
Annotated datasets are poised to become even more critical over the next five years as artificial intelligence (AI) and machine learning (ML) continue to evolve and integrate into various industries.
submitted by /u/LifeBricksGlobal
What’s the easiest way to get an accurate, up-to-date NBA data set? I’d like to put this structured data in PostgreSQL.
submitted by /u/Safe-Worldliness-394
Does anyone have the USAID GHSC-PSM Health Commodity Delivery Dataset that they could send to me? I need it for a thesis I’m doing, and I’m not sure how I can get it after it was taken down.
submitted by /u/Public-Consequence62
My background is in insights and market research. I’m currently job hunting and I’m seeing a lot of roles in audience insights and marketing research, which I don’t have direct experience in. I was thinking about trying to do some small projects to include in my applications to show I have transferrable skills, but I’m struggling to find open source data to work with. Does anyone have any suggestions? Thanks so much.
submitted by /u/belledamesans-merci
Howdy folks,
I’m based in the States. I’m wondering if anyone knows of any data that shows at what mileage particular cars/models tend to need services or have breakdowns, and what those services or items tend to be.
I’m looking at this retrospectively: I’m not trying to predict or project what services will be needed at future mileage, but want something that actually SHOWS at what mileage a particular model has PREVIOUSLY received particular services/repairs or had breakdowns.
Does anyone know if anything like this exists or is available?
submitted by /u/WhatsTheAnswerDude
I’ve found such data difficult to find. I’ve only found one website (warn tracker), but I would have to pay.
I’m especially interested in layoffs at big tech corporations (Meta, Intel, etc.).
submitted by /u/Flying_Trying
Has anyone ever used data sets from trainingdata.pro or applied to their student program https://trainingdata.pro/university ? I’m interested in one of their datasets (or potentially a combination of two) for my thesis project, and I’m curious how long it takes them to answer and whether you’ve had a good experience with them.
submitted by /u/anonymousD1812
Hi everyone,
I’m currently working on training a 2D virtual try-on model, specifically something along the lines of TryOnDiffusion, and I’m looking for datasets that can be used for this purpose.
Does anyone know of any datasets suitable for training virtual try-on models that allow commercial use? Alternatively, are there datasets that can be temporarily leased for training purposes? If not, I’d also be interested in datasets available for purchase.
Any recommendations or insights would be greatly appreciated!
Thanks in advance!
submitted by /u/Straight-Piccolo5722
I would like to create a database of historical soccer results and odds. Since I have no programming experience, I had thought about using Excel or Google Sheets. The question is, how do I get the data? I have heard of web scraping and of using an API. There are some on RapidAPI, e.g. for Sofascore, but they have limits in the free version. I imagined columns like these: country, league, date, season, round, home team, away team, home goals, away goals, half-time home goals, half-time away goals, odds 1 X 2, Elo home, Elo away.
ChatGPT pointed me to Google Sheets, using Google Apps Script to call the API, but I just can’t get along with the endpoints. Furthermore, I want the results from the last day or days to be fetched automatically or on command, as well as upcoming games with odds for the next 7 days.
How can I implement this? What ideas do you have? Thanks a lot.
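Whatever source ends up supplying the data, the column layout described above can be pinned down first. Here is a minimal Python sketch that writes match records into CSV text with those columns; the column names, the `matches_to_csv` helper, and the example record are all made-up placeholders, not the format of Sofascore or any other real API.

```python
import csv
import io

# Column names are an assumed spelling of the fields listed in the post.
COLUMNS = [
    "country", "league", "date", "season", "round",
    "home_team", "away_team",
    "home_goals", "away_goals", "ht_home_goals", "ht_away_goals",
    "odds_1", "odds_x", "odds_2",
    "elo_home", "elo_away",
]


def matches_to_csv(matches):
    """Write match dicts to CSV text; missing fields are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        restval="",              # blank cell when a field is missing
        extrasaction="ignore",   # drop fields that are not in COLUMNS
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(matches)
    return buffer.getvalue()


# Placeholder record (not real data); Elo ratings are deliberately omitted.
example_matches = [{
    "country": "Exampleland", "league": "Example League",
    "date": "2025-01-11", "season": "2024/25", "round": 20,
    "home_team": "Team A", "away_team": "Team B",
    "home_goals": 2, "away_goals": 1,
    "ht_home_goals": 1, "ht_away_goals": 0,
    "odds_1": 1.85, "odds_x": 3.6, "odds_2": 4.2,
}]

print(matches_to_csv(example_matches))
```

The resulting text can be saved as a .csv file and opened in Excel or imported into Google Sheets. Fetching the records from an API on a schedule is a separate step that depends on the provider's endpoints and limits.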
submitted by /u/PokerMurray
It seems 2024 US General election data should be published but I’m not seeing it posted in the usual spots. I see a request from three months ago that stated the data should be available after a few months. Am I just missing something? Does anyone have a lead or am I just impatient?
submitted by /u/SquiggleQuotient
I’m working on an econometrics paper for my college course. I am aiming to reproduce the results of the following paper:
Incentives, time use and BMI: The roles of eating, grazing and goods by Daniel S. Hamermesh
I want to reproduce these results using more modern and accurate measures than BMI, but I am having trouble finding the data. I’d appreciate any help you guys can offer.
submitted by /u/seventydaily
Hello Everyone,
These data are needed for a student, but they are unable to find or download them. CDC’s website currently only lists up to phase 8. Does anyone know where or if this dataset can be located?
submitted by /u/Suspicious-One-1260
I’ve been doing a lot of work on building computer vision models to track infants in cribs, since becoming a parent. Recently I’ve tried to start making models and datasets that are more generalized and not just for my kid. Turns out this is pretty difficult, since there aren’t a lot of datasets made for tracking infants in cribs.
I made a first attempt at producing a synthetic dataset that can be used to bootstrap a model. The idea is you’d either supplement the synthetic data with a small subset of real data, or something else like transfer learning. The dataset was made using path tracing, so it looks a little bit better than some of the other synthetic datasets on infants that I’ve seen (links on my GitHub repo).
Relevant Links:
https://github.com/tay10r/infant-detection-dataset
https://www.kaggle.com/datasets/tay10r/synthetic-infant-dataset
It’ll be a week or so before the full dataset is done rendering (10k images). I’m traveling over the weekend so I was only able to upload a subset of the dataset (a little over 100 images).
Currently I use a trained model I made with about 2000 labeled images on my kid to analyze sleep patterns. I’m hoping this dataset, perhaps after a few improvements, will help produce more general models for this type of work. I’m curious to know if anyone else finds this interesting or practical. Let me know what you think!
submitted by /u/taylorcholberton
Does anyone know where I could get a dataset (preferably over 200 rows long) of different songs with the corresponding artist and genre, preferably in CSV format? I need it for a computer science project and can’t find any datasets. I need CSV because I have to use it with JavaScript code in code.org.
submitted by /u/Zanman2000
Hey amazing people! First post here! Today, I’m excited to announce that you can now train your own reasoning model with just 5GB VRAM for Qwen2.5 (1.5B) using our open-source project Unsloth: https://github.com/unslothai/unsloth
GRPO is the algorithm behind DeepSeek-R1 and how it was trained. You need a dataset with about 500 rows of question-answer pairs and a reward function, and you can then start the whole process!
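As a rough illustration of what a reward function for question-answer pairs might look like, here is a minimal sketch in plain Python. The `<answer>` tag format, the function names, and the score values are all assumptions made for illustration; this is not Unsloth's or any other library's API.

```python
import re


def extract_final_answer(completion: str):
    """Return the text inside the last <answer>...</answer> tag, or None."""
    matches = re.findall(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def correctness_reward(completion: str, reference: str) -> float:
    """Score one model completion against the reference answer.

    Assumed scoring: 1.0 for an exact match, 0.1 for a well-formed
    but wrong answer, 0.0 when no answer tag can be parsed.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    if answer == reference.strip():
        return 1.0
    return 0.1


print(correctness_reward("6 * 7 is 42. <answer>42</answer>", "42"))  # 1.0
print(correctness_reward("Probably <answer>41</answer>", "42"))      # 0.1
print(correctness_reward("I think it is 42.", "42"))                 # 0.0
```

The small partial score for a well-formed but wrong answer is one possible design choice; it rewards following the output format even before the model gets answers right.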
This allows any open LLM like Llama, Mistral, Phi, etc. to be converted into a reasoning model with a chain-of-thought process. The best part about GRPO is that it doesn’t matter much whether you train a small model or a larger one: a small model trains faster, so you can fit in more training in the same time, and the end result will be very similar! You can also leave GRPO training running in the background of your PC while you do other things!
Our newly added Efficient GRPO algorithm enables 10x longer context lengths while using 90% less VRAM than all other GRPO LoRA/QLoRA (fine-tuning) implementations, with 0 loss in accuracy. With a standard GRPO setup, Llama 3.1 (8B) training at 20K context length demands 510.8GB of VRAM. However, Unsloth’s 90% VRAM reduction brings the requirement down to just 54.3GB in the same setup. We leverage our gradient checkpointing algorithm, which we released a while ago. It smartly offloads intermediate activations to system RAM asynchronously whilst being only 1% slower. This shaves a whopping 372GB of VRAM since we need num_generations = 8. We can reduce this memory usage even further through intermediate gradient accumulation. Use our GRPO notebook with 10x longer context using Google’s free GPUs: Llama 3.1 (8B) on Colab-GRPO.ipynb
Blog for more details on the algorithm, the maths behind GRPO, issues we found and more: https://unsloth.ai/blog/grpo
GRPO VRAM Breakdown:
| Metric | Unsloth | TRL + FA2 |
| --- | --- | --- |
| Training Memory Cost (GB) | 42GB | 414GB |
| GRPO Memory Cost (GB) | 9.8GB | 78.3GB |
| Inference Cost (GB) | 0GB | 16GB |
| Inference KV Cache for 20K context (GB) | 2.5GB | 2.5GB |
| Total Memory Usage | 54.3GB (90% less) | 510.8GB |
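The figures in the breakdown can be checked with a few lines of arithmetic. The sketch below only re-adds the numbers quoted above; the dictionary keys are made-up labels.

```python
# Memory costs in GB, copied from the breakdown above.
unsloth = {"training": 42.0, "grpo": 9.8, "inference": 0.0, "kv_cache_20k": 2.5}
trl_fa2 = {"training": 414.0, "grpo": 78.3, "inference": 16.0, "kv_cache_20k": 2.5}

unsloth_total = sum(unsloth.values())
trl_total = sum(trl_fa2.values())
reduction = 1 - unsloth_total / trl_total
training_saving = trl_fa2["training"] - unsloth["training"]

print(f"Unsloth total:   {unsloth_total:.1f} GB")    # 54.3 GB
print(f"TRL + FA2 total: {trl_total:.1f} GB")        # 510.8 GB
print(f"Reduction:       {reduction:.1%}")           # 89.4%
print(f"Training saving: {training_saving:.0f} GB")  # 372 GB
```

The rows sum to the stated totals, the "90% less" figure is a rounding of roughly 89.4%, and the 372GB saving is the difference between the two training memory rows.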
We also spent a lot of time on our guide (with pics) covering everything on GRPO + reward functions/verifiers, so we highly recommend reading it: docs.unsloth.ai/basics/reasoning
Thank you so so much for reading! 😀
submitted by /u/yoracale
I am doing a business project and want to relate it to Korea or Japan, but I can’t find much data on many aspects. I mainly find data on K-dramas or pollution, but I want more business-related topics.
submitted by /u/PhysicalWorldliness5
So guys, I’m cooked and urgently need a kicking video dataset, just simple kicking. I’ve looked all over the internet and couldn’t find one, so this is my last resort. Please help me.
submitted by /u/AccomplishedSnow5004
I am a journalism student looking for Hinge datasets to analyze dating patterns. Hinge lets users export their personal data including likes sent and received, matches, conversations, etc. If someone has a dataset of multiple users or is willing to share their own data please let me know. If sharing personal data, I could anonymize your name in my findings if you prefer. Thanks in advance!
submitted by /u/cappingaf
I’m exploring how people discover D2C brands and want to improve search/filtering experiences in large directories. To do this, I’m looking for well-structured datasets related to:
- D2C brand directories (with categories, tags, or attributes)
- E-commerce product databases with metadata
- Consumer search behavior for brands/products
If you know of any publicly available datasets that could help, I’d love to hear about them! Also, if you have tips on structuring datasets for better discoverability, feel free to share.
Thanks in advance!
submitted by /u/Mobile_Candidate_926