I need to find a list of head heights (chin to crown) and eyeball diameters for different animals.
submitted by /u/Squeaky-Fox49
I want to build an end-to-end machine learning project that incorporates MLOps practices, including data versioning. Other than stock market data, I couldn’t find any datasets that are regularly updated. Can anyone help me find some? I need datasets where new data is added frequently.
submitted by /u/IshanDandekar
I’m working on a project focused on missing data. Does anyone know of interesting datasets with the following criteria?
- Publicly available for download, in a tractable format
- Data arrives over time (e.g. a new batch every day/week/month, or at least new rows added from time to time)
- Some columns have missing values
- Ideally, missing values show interesting patterns of some kind (e.g. “column X is sometimes missing when column Y == A, but never when column Y == B” or “the percentage of missing values in column Z is much higher on weekends”)
I’m willing to wade through a fair amount of EDA to find interesting patterns. Really, anything you can point me to would be helpful.
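One quick EDA check for the kind of pattern described above (column X missing only for some values of column Y) is to compute the missing-value rate per group. This is a minimal sketch in plain Python. It assumes rows are dicts and that `None` or an absent key marks a missing value. The function name and the toy rows are hypothetical:

```python
from collections import defaultdict


def missing_rate_by_group(rows, target, group_by):
    """Fraction of rows where `target` is None or absent, split by the value of `group_by`."""
    counts = defaultdict(lambda: [0, 0])  # group value -> [missing count, total count]
    for row in rows:
        key = row.get(group_by)
        counts[key][1] += 1
        if row.get(target) is None:
            counts[key][0] += 1
    return {key: missing / total for key, (missing, total) in counts.items()}


# Hypothetical toy rows: X is sometimes missing when Y == "A", never when Y == "B".
rows = [
    {"X": 1.0, "Y": "A"},
    {"X": None, "Y": "A"},
    {"X": 3.0, "Y": "B"},
    {"X": 4.0, "Y": "B"},
]
print(missing_rate_by_group(rows, target="X", group_by="Y"))  # {'A': 0.5, 'B': 0.0}
```

The same function can test the weekend pattern if you first add a derived day-of-week field to each row and group by that field.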
submitted by /u/grumpy_greybox
Hello, all. I am working on a statistical analysis of NASCAR Cup Series drivers in the modern era (1972 to present) and need data. Currently, I can access the information I need through a few different channels, but I wanted to see whether there is an already-compiled database that would reduce the time this is taking.
The most cumbersome fields to gather are dates of birth and number of race starts pre-1972. Additionally, I am using data pieces like driver name, finishing position, condition of car at finish, team, manufacturer, etc., but those are simple enough to get right now.
If there is a dataset with all this information, or multiple datasets that would encompass all this, I would really appreciate being able to access them to use for this project.
Thank you all in advance for any help you can offer!
submitted by /u/tarvusdreytan
Choosing the right off-the-shelf AI training data provider can be a daunting task, especially with the large number of options available. Here are some factors to consider when selecting an AI training data provider:
- Quality: One of the most critical factors to consider is the quality of the training data. The provider should have high-quality data that accurately reflects the real-world scenarios that the AI system will encounter.
- Diversity: It is also essential to ensure that the provider offers a diverse range of datasets that cover a wide variety of scenarios. This ensures that the AI model is trained on a comprehensive dataset that reflects the real world.
- Customizability: The provider should offer customizable datasets that allow you to select the specific data that best suits your needs.
- Data Security: The provider should have robust data security measures in place to ensure that your data remains secure and confidential.
- Scalability: The provider should be able to provide a scalable solution that can grow with your business’s needs.
- Cost: Finally, consider the cost of the datasets and ensure that it is within your budget. Be wary of providers that offer datasets at an unusually low price, as this may indicate low-quality data.
By considering these factors, you can choose the right off-the-shelf AI training data provider that will provide you with the best possible training data for your AI system.
submitted by /u/Shaip111
What issues or challenges do you face with current data science/analytics tools?
submitted by /u/lightversetech
I’ve been learning web scraping recently and wanted to do a project to post on Kaggle. I’ve searched and can’t find any statement expressly permitting web scraping of their site. I wanted to scrape their rental data (the for_sale and sold data are already available as CSV files, but the rentals aren’t). Can anyone link me to a permission statement or some other legal basis, so that I can include it in my project? The legality of scraping is new to me, so apologies for any ignorance on my part.
Edit: I emailed them to ask, and they said they don’t allow scraping. I was under the impression that scraping publicly available data isn’t illegal?
submitted by /u/bingopajamma
I’m interested in any cyber incident data that links MITRE ATT&CK labels to the time of detection or to the attacker’s kill chain, such as annotated cyber incident timelines. I’m particularly interested in mapping progress through the kill chain to draw out the most common attack paths.
I know much of this data will be commercially sensitive, or the IP of incident response companies, but any suggestions or direction would be very welcome.
submitted by /u/swivel_chair_jockey
TL;DR: My friends and I have a stupid hobby that’s getting out of control and I need your help spiraling it further. Please help me create a fair power rankings system (using the attached spreadsheet for reference) for the Beerio Kart tournaments we host.
Dear members of the Statistics community,
I call humbly upon the statisticians, mathematicians, programming aficionados, excel experts, sports analysts, and power rankings enthusiasts of this great community to assist me with a vital task — creating a fair and representative power ranking formula for the International Beerio Kart Championships of the World.
A little background: my buddies and I were trapped at home Thanksgiving of ’21 for a fourteen day COVID quarantine. We were saddened by a missed opportunity to see our families, but with competitive spirit running through our veins and a surplus of leftover PBR from a party we threw (which was undoubtedly what gave us COVID), we found solace in roughly two weeks straight of fierce competition in the best drinking/video game pair to ever exist: Beerio Kart. For the uninitiated: Beerio Kart is Mario Kart, however, you need to finish your beer before the end of each race, and you can’t drink and drive (i.e. chug and control your character simultaneously). Our version of the game has many extra rules and sub-rules, however, that’s the basic premise of the game.
After two weeks of this, we needed an outlet to determine who was truly the best of us, and thusly the International Beerio Kart Championships of the World were born. It started with a modest eight competitors, but interest has increased steadily over the past three years, and in recent events we’ve had as many as 58 competitors fighting for spots in a 32-person bracket (surplus competitors play in Play-in Prix’s for entry into the main bracket). We’ve now had 75 people play in official brackets and obtain power rankings, and close to 100 participate in the events overall. For a little context on how the tournaments are run: four competitors participate in each Grand Prix, and the top two competitors advance from each round until the championship. In the preliminary rounds, players must drink a beer on races two and four of each Grand Prix, and in the finals all four races are drinking rounds, so the final four competitors must drink a minimum of 10 beers to win the tournament.
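The 10-beer minimum follows from the bracket structure. Here is a small sketch of that arithmetic. It assumes a full 32-person main bracket, four players per Grand Prix with two advancing, two drinking races in every preliminary Grand Prix, and four in the final. The function name is hypothetical:

```python
def minimum_beers_for_finalist(bracket_size: int, players_per_prix: int = 4, advance: int = 2) -> int:
    """Minimum beers a player drinks to reach and finish the final Grand Prix.

    Assumes every preliminary Grand Prix has 2 drinking races (races two and four)
    and the final has 4, as described in the post.
    """
    beers = 0
    players = bracket_size
    while players > players_per_prix:  # each loop iteration is one preliminary round
        beers += 2  # races two and four
        players = players // players_per_prix * advance  # top two of each prix advance
    return beers + 4  # all four races in the final are drinking races


print(minimum_beers_for_finalist(32))  # 10
```

A 32-person bracket has three preliminary rounds (32 → 16 → 8 → 4) at two beers each, plus four in the final. Under the same assumptions, a 16-person bracket with three rounds would give 8.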
As tournaments got larger and more intricate (and people started complaining that they were seeded unfairly), we realized we needed an objective ranking system to seed players so that the Prix’s leading up to the championship were fair and quantitative. This background brings me to the hallowed undertaking I beseech your help with: please help me figure out how to do this.
We’ve tried a few formulas, but we are but amateur statisticians and none have felt like they effectively capture a player’s skill level.
First we tried the following formula:
IBKC power ranking = 0.33 × t/(60n) + 0.33 × z/60 + 0.33 × y/60
where:
- 60 = the maximum number of points that can be scored in any given Grand Prix
- t = total points accrued over all past tournaments attended
- n = total number of Grand Prix held in all official tournaments
- z = average points scored per prix, per tournament, in all tournaments attended
- y = average points scored per prix, per tournament, in all tournaments attended this calendar year
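Here is the first formula written as a function. This is a sketch that assumes the variable definitions listed above. The name `first_power_ranking` and the sample numbers are hypothetical:

```python
MAX_POINTS_PER_PRIX = 60  # the "60" in the formula: max points in any Grand Prix


def first_power_ranking(t: float, n: int, z: float, y: float) -> float:
    """0.33 * t/(60n) + 0.33 * z/60 + 0.33 * y/60, with t, n, z, y as defined in the post."""
    return (0.33 * t / (MAX_POINTS_PER_PRIX * n)
            + 0.33 * z / MAX_POINTS_PER_PRIX
            + 0.33 * y / MAX_POINTS_PER_PRIX)


# Hypothetical player: 600 total points, 20 Grand Prix held overall,
# 30 points per prix on average all-time, 45 this calendar year.
print(round(first_power_ranking(t=600, n=20, z=30, y=45), 4))  # 0.5775
```

The first term compares a player's total points t with every Grand Prix ever held (n). As a result, it grows with the number of tournaments a player attends, not only with how well they score.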
It was a good start, but it unfairly favored players who had played in more tournaments and wasn’t an accurate reflection of current skill level. It would be like baseball power rankings putting the Yankees at the top because they’re an ancient ball club with 27 World Series titles, even though the last time they won was 2009, or putting the Astros low in the power rankings because they didn’t win their first Series until 2017, even though they’ve won twice in the past five years.
We then created a formula based on Pythagorean expectation, where a player’s skill level is calculated by averaging their (points accrued in a prix)/(points accrued in a prix + total number of possible points in a prix). Each round of a tournament was weighted more heavily than the last, and tournaments with four rounds carry more weight than tournaments with three rounds. Each player’s Pythagorean expectation was then averaged over all tournaments they’ve participated in, over the last four tournaments held, and over the last two tournaments held. Their power score was then calculated by averaging these three numbers together, with the intention that more recent tournaments would be weighted more heavily than older ones. This is the formula that the attached spreadsheet uses.
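Here is a sketch of the Pythagorean-style power score as described, in Python. The post does not give the round or tournament weights. This sketch assumes round k gets weight k, and that each tournament is weighted by its number of rounds (3 or 4). It also assumes that missed tournaments are simply skipped within each window. All names are hypothetical:

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

MAX_POINTS_PER_PRIX = 60  # maximum points in a Grand Prix, per the post

# One entry per tournament held, oldest first:
# (rounds in that tournament, the player's points in each round they played),
# or None if the player missed that tournament.
Entry = Optional[Tuple[int, Sequence[float]]]


def prix_ratio(points: float) -> float:
    """Points / (points + max possible points) for a single Grand Prix."""
    return points / (points + MAX_POINTS_PER_PRIX)


def tournament_score(points_by_round: Sequence[float]) -> float:
    """Weighted mean of prix ratios. ASSUMPTION: round k (1-based) has weight k."""
    weights = list(range(1, len(points_by_round) + 1))
    weighted = sum(w * prix_ratio(p) for w, p in zip(weights, points_by_round))
    return weighted / sum(weights)


def window_score(history: List[Entry], window: Optional[int] = None) -> Optional[float]:
    """Mean over the last `window` tournaments held (all if None), skipping missed ones.
    ASSUMPTION: each tournament is weighted by its number of rounds."""
    recent = history if window is None else history[-window:]
    played = [entry for entry in recent if entry is not None]
    if not played:
        return None
    total_weight = sum(rounds for rounds, _ in played)
    return sum(rounds * tournament_score(pts) for rounds, pts in played) / total_weight


def power_score(history: List[Entry]) -> float:
    """Average of the all-time, last-four and last-two window scores.
    ASSUMPTION: empty windows are dropped; the player must have played at least once."""
    parts = [window_score(history), window_score(history, 4), window_score(history, 2)]
    return mean(p for p in parts if p is not None)


# Hypothetical player: a weak older result, then a strong recent one.
player = [(4, [20, 25]), (4, [50, 55])]
print(power_score(player + [None]) > power_score(player))  # True
```

Under these assumptions, missing a tournament does not lower a player's score. Instead, it shifts the short windows onto older results, so a player can even move up by sitting out. This is the same lack of rank decay described below.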
This new formula was better than the first but has the inverse problem: it weighs recent tournaments too heavily and doesn’t account for any rank decay from missing tournaments. For example, you can see that BAT has won 6 of 8 tournaments, but after a huge upset in the semis, BAT did not make the finals of the last tournament and was booted from first place overall to third. Meanwhile, Squirt4Boyz advanced from second place overall to first, even though Squirt4Boyz didn’t even participate in the last tournament.
There are all sorts of hidden columns and rows and whatnot in this spreadsheet, so please DM me with any questions you might have. But please, I beg of you, fine and glorious proprietors of the world’s most stressful game, help me create a ranking system that makes sense. Ultimately, we need a system that reflects how many points a player is expected to score; considers that player’s tournament wins, podium finishes, and finals appearances; accounts for rank decay; and, like global tennis or golf rankings, has some bias toward recent events.
Thank you, friends.
Your servant,
The International Beerio Kart Championships of the World League Commissioner
submitted by /u/zakarm22
Guys, I need a phishing website dataset. Where can I get one other than Kaggle?
submitted by /u/its_falling_D
For learning purposes, I want to do a toy project: training a language model that can suggest data fields, data types, default values, and constraints.
submitted by /u/saintshing
Looking for any sort of data on click-through rates/landing rates for TikTok ads. If you don’t have it, let me know if there are any good ways of obtaining this data.
submitted by /u/DetachedOptimist
I need to find a research study in which someone uses a dataset and applies distributions such as the normal distribution, the t distribution, the F distribution (used in ANOVA), etc., and then I need to show my understanding of it. It doesn’t have to be very complicated, as I’m just a fresher (undergrad); all I need to do is show the use of any of these distributions in real-life research. Does anyone have links or ideas for such research papers, or real-life uses of these distributions?
Thanks in advance
submitted by /u/youredumbaflol
I’m working on a study at work and one thing I am hoping to get some data together for is counts of people in the US who hold multiple professional licenses, specifically for skilled trades. If anyone knows something available this would be a huge help
submitted by /u/Caconym32
They say that your data analytics are only as good as the data you have to analyze.
Where do I go to find sources of data that I can do analytics on?
Is there a directory for this or something?
submitted by /u/sanman
Hello all, just wondering: if I have a massive dataset that I want to analyze for trends, is there a good way to do this through a website, or should I look for the trends manually? Also, how could I easily spot trends or important figures within my data? Thanks!
submitted by /u/floppy11
Hello, I am trying to find a data set that looks like the following:
consumer goods consumption or price (as the dependent variable) predicted by demographics like gender, race, income, education, etc.
I’m also interested in any other factors that could impact consumption or price.
submitted by /u/AssignmentOk1408
I am looking for historical datasets on the worldwide sales of the top 10 luxury brands over the years, something like a year-by-year comparison of brands like Hermès, Gucci, and Chanel based on their sales numbers. Please help.
submitted by /u/gtrivedi47
Hi everyone,
I am fairly new: I have been learning Python since December 2022 and come from a non-tech background. I took part in the DataTalksClub Zoomcamp and started using the tools in this project in January 2023.
Project link: GitHub repo for Magic: The Gathering
Project background:
- I used to play Magic: The Gathering a lot back in the 90s
- I wanted to understand the game from a meta perspective and tried to answer questions that I was interested in
Technologies used:
- Infrastructure via Terraform, with GCP as the cloud
- Read the Scryfall API for card data
- Push it to my storage bucket
- Push the needed data points to BigQuery
- Transform the data there with DBT
- Visualize the final dataset with Looker
I am somewhat proud of having finished this, as I never would have thought I could learn all this. I put a lot of long evenings, early mornings, and weekends into it. In the future I plan to do more projects and apply for a Data Engineering or Analytics Engineering position, preferably at my current company.
Please feel free to leave constructive feedback on code, visualization or any other part of the project.
Thanks 🧙🏼♂️ 🔮
submitted by /u/binchentso
Hi everyone! For the past couple of weeks, I’ve been helping some fellow community members with their data requests, and I’m wondering: on which other channels do people request specific datasets? It seems like r/datasets is the most active online forum for data requests!
submitted by /u/nobilis_rex_
Hello everyone,
I am currently pursuing a career as a Senior Business Analyst, and I know that having a strong understanding of SQL is essential for this role. However, there are so many aspects of SQL to learn, and I’m not sure where to focus my attention.
I would like to know from those who work as Senior Business Analysts, or those who have experience working with them, what are the best aspects of SQL to learn for this position? Which SQL skills do you use the most in your day-to-day work, and which ones have been the most valuable for you?
I appreciate any insights or advice you can offer, and I look forward to learning from your experiences. Thank you!
submitted by /u/LampRunner
Hello everyone,
I am currently working on creating a chatbot that can recommend solutions to log errors that occur in Java applications. To do this, I need a dataset that contains examples of log errors along with their corresponding solutions. I am hoping to find a dataset that is large enough to train a machine learning model to accurately suggest solutions based on the log error message.
If anyone knows of a dataset that would be helpful for this project or has any suggestions on where to find one, I would greatly appreciate it. Any information or assistance would be extremely valuable to me.
Thank you for your time and consideration.
submitted by /u/Farjou69
Looking for data that can help me assess whether COVID encouraged more people to take hobby flying lessons. I could use any of the following:
- # of people who signed up for classes
- # of takeoffs/landings of smaller aircraft like Cessnas
- # of PPLs/CPLs issued, as a proxy for the impact
submitted by /u/Eeshoo
Financial thematic data package, pertaining to banking
https://app.snowflake.com/marketplace/listing/GZTSZAS2KF7/cybersyn-inc-financial-data-package
Includes data from:
- Federal Deposit Insurance Corporation (FDIC)
- Federal Reserve Economic Data (FRED)
- Federal Financial Institutions Examination Council (FFIEC)
- Consumer Financial Protection Bureau (CFPB)
submitted by /u/aiatco2
Here is a simple spreadsheet of several thousand battles. I am working (slowly) to get a ton of information on each battle. Please critique and notify me of errors. Cheers.
submitted by /u/UnlimitedRed