Category: Other Nonsense & Spam

Monthly Snowfall For Last 3 Years In Canada

I am working on a project comparing response times for service requests between departments. I want to be sure that the differences I notice in the process have nothing to do with the added pressure of heavy snowfall.

I can’t find any snowfall data for the last 3 years. I checked the Government of Canada’s website, but the monthly data available in CSV format has headings that don’t make sense.

Precipitation is not the same as snowfall.

At minimum, I am looking for snowfall per month (not per year, because the snow season runs September to April and our services run February to April).

Please help! Does anyone know of a website where I can freely search snowfall by month?
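
If daily station data turns out to be easier to find than monthly summaries, the monthly totals can be built from it. A minimal sketch, assuming a daily CSV with a `Date/Time` column in YYYY-MM-DD form and a `Total Snow (cm)` column (the names are modeled on Environment and Climate Change Canada daily downloads, but treat them as assumptions and check them against your file's header). The sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# ASSUMPTION: header names modeled on ECCC daily climate CSVs; verify against your file.
DATE_COL = "Date/Time"        # expected format: YYYY-MM-DD
SNOW_COL = "Total Snow (cm)"  # daily snowfall, separate from total precipitation

def monthly_snowfall(csv_text):
    """Sum daily snowfall (cm) into {(year, month): total_cm}."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            snow_cm = float(row.get(SNOW_COL) or "")
        except ValueError:
            continue  # blank or flagged value: skip it rather than count it as 0
        year, month, _day = row[DATE_COL].split("-")
        totals[(int(year), int(month))] += snow_cm
    return dict(totals)

# Made-up sample rows for illustration only.
sample = """Date/Time,Total Snow (cm),Total Precip (mm)
2023-02-01,4.5,3.8
2023-02-02,,0.0
2023-02-03,10.0,9.1
2023-03-01,2.0,2.4
"""
print(monthly_snowfall(sample))  # {(2023, 2): 14.5, (2023, 3): 2.0}
```

Keeping only the February to April keys then gives the service-period totals. Note that a month whose values are all blank simply won't appear in the result.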

submitted by /u/somewhenimpossible

Requesting A Structured Folder Structure For Dark Face Dataset.

I was looking at the Dark Face dataset available at https://flyywh.github.io/CVPRW2019LowLight/.
According to the dataset description:
DARK FACE dataset provides 6,000 real-world low-light images captured during the nighttime at teaching buildings, streets, bridges, overpasses, parks, etc., all labeled with bounding boxes for human faces, as the main training and/or validation sets. We also provide 9,000 unlabeled low-light images collected from the same setting. Additionally, we provide a unique set of 789 paired low-light/normal-light images captured in controllable real lighting conditions (but not necessarily containing faces), which can be used as part of the training data at the participants’ discretion. There will be a hold-out testing set of 4,000 low-light images, with human face bounding boxes annotated.
When I downloaded the dataset, I was hoping the folders would be structured something like:
Dark Face Dataset
├── Low-light
│   ├── Training
│   │   ├── Positive
│   │   └── Negative
│   └── Validation
│       ├── Positive
│       └── Negative
├── Paired Low-light and Normal-light
│   ├── Training
│   └── Validation
└── Testing
    └── Low-light

Instead, all I got was:
images: 6,000 images
labels: 6,000 text files containing information about the images
There is no mention anywhere of training or validation splits, and there are no low-light/normal-light pairs.

I may be missing something very obvious, but I have searched online quite a bit and can’t figure out how to get the complete dataset. It was part of the ‘UG 2+ challenge’ competition (website: http://cvpr2023.ug2challenge.org/).

Can someone please point me to link(s) where I can access the dataset in a properly structured form?
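
If the official splits can’t be found, one stopgap is to build a reproducible split yourself. A minimal sketch, assuming each image in `images/` has a label file with the same stem in `labels/` (e.g. `1.png` and `1.txt`). The 80/20 ratio and the seed are arbitrary choices, and this will not recreate the organizers’ official validation or hold-out test sets.

```python
import random
from pathlib import Path

def paired_stems(image_dir, label_dir):
    """Stems (file names without extension) that have both an image and a .txt label."""
    images = {p.stem for p in Path(image_dir).iterdir() if p.is_file()}
    labels = {p.stem for p in Path(label_dir).glob("*.txt")}
    return sorted(images & labels)

def split_stems(stems, val_fraction=0.2, seed=0):
    """Shuffle with a fixed seed, then split into (train, validation) stem lists."""
    shuffled = list(stems)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return sorted(shuffled[n_val:]), sorted(shuffled[:n_val])

# Placeholder stems; on the real download use paired_stems("images", "labels").
train, val = split_stems([str(i) for i in range(1, 11)], val_fraction=0.2, seed=42)
print(len(train), len(val))  # 8 2
```

Writing the two stem lists to `train.txt` and `val.txt` keeps the split fixed without having to move 6,000 files around.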

submitted by /u/Altruistic-Hat-9604

ACS Data In Easily Digestible Format

I want ACS 5-year (acs5) data for 2021 for every category. I’m burnt out: I tried the API and it’s not going well. I found a map that is exactly what I was hoping for, but it has license requirements I can’t agree to. I think I’ll eventually have to give in, spend the time finding the right zip file, and process the summary file. I downloaded the dataset and the keys once and tried converting it into an Esri table, relabeling 2,000 headers with their descriptions. Maybe I need to export the tables and use pandas instead?

Thoughts? Suggestions from anyone who’s done this before?
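
On the 2,000-header problem: rather than editing headers inside an Esri table, one option is to keep the raw variable codes in the data and apply a code-to-description lookup when reading. A minimal standard-library sketch. `B01001_001E` and `B19013_001E` are real ACS variable codes, but the short descriptions and all values below are placeholders; in practice the lookup would be built from the Census Bureau’s published variable list for the 2021 5-year release.

```python
import csv
import io

def relabel_rows(data_csv, lookup):
    """Yield each row as a dict keyed by description instead of variable code.

    Codes missing from `lookup` (e.g. geography columns) keep their original name.
    """
    for row in csv.DictReader(io.StringIO(data_csv)):
        yield {lookup.get(code, code): value for code, value in row.items()}

# Placeholder lookup and values for illustration only.
lookup = {
    "B01001_001E": "Total population",
    "B19013_001E": "Median household income",
}
data_csv = """NAME,B01001_001E,B19013_001E
Tract A,1000,52000
Tract B,2500,61000
"""
rows = list(relabel_rows(data_csv, lookup))
print(rows[0])  # {'NAME': 'Tract A', 'Total population': '1000', 'Median household income': '52000'}
```

If you move to pandas, the same dict works with `DataFrame.rename(columns=lookup)`.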

submitted by /u/Different_Camp4002

[Request] Global Data Of Federal Grants

For my bachelor’s thesis, I’m researching variable importance assessment. I’d like to apply it to find the most important contributors to a country’s greenhouse gas emissions in terms of government spending (e.g. should we focus on limiting spending on airports, or rather on investing in public transport?). A rather big hurdle is finding a comprehensive dataset of grants/spending. Is there a dataset that springs to mind, or should I compile one myself by looking at each country’s annual budget?

Thanks in advance!
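
For the variable-importance part, one common model-agnostic technique is permutation importance: shuffle one feature at a time and measure how much the model’s error grows. A minimal sketch in plain Python; the stand-in `model` and the two spending columns are hypothetical, and in practice you would pass a model fitted to your emissions data (scikit-learn ships this idea as `sklearn.inspection.permutation_importance`).

```python
import random

def mse(model, X, y):
    """Mean squared error of `model` (a function of one feature row) on X, y."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical data: column 0 = airport spending, column 1 = public-transport spending.
X = [[i, (i * 7) % 5] for i in range(20)]
y = [3 * row[0] for row in X]  # emissions depend only on column 0 in this toy setup

def model(row):
    """Stand-in for a fitted model."""
    return 3 * row[0]

print(permutation_importance(model, X, y))  # column 1 scores 0.0
```

Because the toy model ignores column 1, shuffling it never changes the error, while shuffling column 0 does; with real data the scores are only as meaningful as the fitted model behind them.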

submitted by /u/joren109

Public Datasets With Interesting Patterns In NULL/missing Data

I’m working on a project focused on missing data. Does anyone know of interesting datasets with the following criteria?

- Publicly available for download, in a tractable format
- Data arrives over time (e.g. a new batch every day/week/month, or at least new rows added from time to time)
- Some columns have missing values
- Ideally, the missing values show interesting patterns of some kind (e.g. “column X is sometimes missing when column Y == A, but never when column Y == B” or “the percentage of missing values in column Z is much higher on weekends”)

I’m willing to wade through a fair amount of EDA to find interesting patterns. Really, anything you can point me to would be helpful.
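
Once a candidate dataset turns up, a quick first EDA check for the kind of pattern described above is the missing rate of one column grouped by another. A minimal sketch; the `day` and `reading` column names and the rows are made up.

```python
from collections import defaultdict

def missing_rate_by(rows, target, group):
    """Fraction of rows where `target` is None or "" for each value of `group`."""
    counts = defaultdict(lambda: [0, 0])  # group value -> [missing, total]
    for row in rows:
        tally = counts[row.get(group)]
        tally[1] += 1
        if row.get(target) in (None, ""):
            tally[0] += 1
    return {key: missing / total for key, (missing, total) in counts.items()}

# Made-up rows for illustration.
rows = [
    {"day": "Sat", "reading": None},
    {"day": "Sat", "reading": 4.1},
    {"day": "Mon", "reading": 3.9},
    {"day": "Mon", "reading": 4.0},
]
print(missing_rate_by(rows, target="reading", group="day"))  # {'Sat': 0.5, 'Mon': 0.0}
```

Running this for every (target, group) column pair is a crude but fast way to shortlist datasets worth a closer look.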

submitted by /u/grumpy_greybox

In Need Of: NASCAR Cup Series Dataset(s)

Hello, all. I am working on a statistical analysis of NASCAR Cup Series drivers in the modern era (1972 to present) and need data. I can currently get the information I need through a few different channels, but I wanted to see whether there is an already-compiled database that would cut down the time this is taking.

The most cumbersome fields to gather are dates of birth and number of race starts pre-1972. Additionally, I am using data pieces like driver name, finishing position, condition of car at finish, team, manufacturer, etc., but those are simple enough to get right now.

If there is a dataset with all this information, or multiple datasets that would encompass all this, I would really appreciate being able to access them to use for this project.

Thank you all in advance for any help you can offer!

submitted by /u/tarvusdreytan

How To Choose The Right Off-the-Shelf AI Training Data Provider?

Choosing the right off-the-shelf AI training data provider can be a daunting task, especially with the large number of options available. Here are some factors to consider when selecting an AI training data provider:

- Quality: One of the most critical factors to consider is the quality of the training data. The provider should have high-quality data that accurately reflects the real-world scenarios the AI system will encounter.
- Diversity: It is also essential to ensure that the provider offers a diverse range of datasets covering a wide variety of scenarios. This ensures the AI model is trained on a comprehensive dataset that reflects the real world.
- Customizability: The provider should offer customizable datasets that allow you to select the specific data that best suits your needs.
- Data Security: The provider should have robust data security measures in place to ensure that your data remains secure and confidential.
- Scalability: The provider should be able to offer a scalable solution that can grow with your business’s needs.
- Cost: Finally, consider the cost of the datasets and ensure that it is within your budget. Be wary of providers that offer datasets at an unusually low price, as this may indicate low-quality data.

By considering these factors, you can choose the right off-the-shelf AI training data provider that will provide you with the best possible training data for your AI system.

submitted by /u/Shaip111

Is It Legal To Scrape Data From RedFin Using Selenium?

I’ve been learning web scraping recently and wanted to do a project to post on Kaggle. I’ve searched and can’t find anything giving express permission to scrape their site. I wanted to scrape their rental data (the for_sale and sold data are already available in CSV files, but rentals aren’t). Can anyone link me to a permission statement or something similar that I can include in my project? This world of scraping legality is new to me, so apologies for any ignorance on my part.

Edit: I emailed them and asked, and they said they don’t allow scraping. I was under the impression that scraping publicly available data isn’t illegal?
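
On the technical side, Python’s standard library can read a site’s robots.txt and report whether a given path is disallowed for crawlers. This is only a courtesy check: robots.txt is a crawling convention, not a grant of legal permission, and it doesn’t override a site’s terms of service or a direct answer like the one Redfin gave. The rules below are a made-up example, not Redfin’s actual file; for a live site you would call `set_url(...)` and `read()` instead of `parse(...)`.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration; NOT the contents of any real site's robots.txt.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /rentals/
Allow: /
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS.splitlines())

for url in ("https://example.com/rentals/123", "https://example.com/about"):
    print(url, parser.can_fetch("my-learning-bot", url))
# https://example.com/rentals/123 False
# https://example.com/about True
```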

submitted by /u/bingopajamma

[REQUEST] MITRE ATT&CK Annotated Cyber Attack Trees

I’m interested in any cyber incident data that links MITRE ATT&CK labels to the time of detection or to the attacker’s kill chain, such as annotated incident timelines. I’m particularly interested in mapping progress through the kill chain to identify the most common attack paths.

I know much of this data will be commercially sensitive or the IP of incident response companies; any suggestions or direction would be very welcome.
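
If annotated timelines do turn up, one simple way to surface common attack paths is to count consecutive technique pairs across incidents. A minimal sketch; the technique IDs are real ATT&CK identifiers (T1566 Phishing, T1204 User Execution, T1059 Command and Scripting Interpreter, T1078 Valid Accounts, T1021 Remote Services), but the three timelines are invented.

```python
from collections import Counter

def transition_counts(timelines):
    """Count consecutive (technique, next_technique) pairs across ordered timelines."""
    pairs = Counter()
    for techniques in timelines:
        pairs.update(zip(techniques, techniques[1:]))
    return pairs

# Invented incident timelines, each ordered by time of detection.
timelines = [
    ["T1566", "T1204", "T1059", "T1021"],
    ["T1566", "T1204", "T1059"],
    ["T1078", "T1021"],
]
for (src, dst), count in transition_counts(timelines).most_common(2):
    print(f"{src} -> {dst}: {count}")
# T1566 -> T1204: 2
# T1204 -> T1059: 2
```

Dividing each pair count by the total number of transitions out of its source technique turns this into a simple first-order transition table; longer paths would need n-gram counts instead of pairs.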

submitted by /u/swivel_chair_jockey

International Beerio Kart Championships Of The World: Power Rankings Development Help!

TL;DR: My friends and I have a stupid hobby that’s getting out of control and I need your help spiraling it further. Please help me create a fair power rankings system (using the attached spreadsheet for reference) for the Beerio Kart tournaments we host.

https://docs.google.com/spreadsheets/d/1CS5pWnmgS8wIZAvFQL4cc_jHWbTZ_khS/edit?usp=sharing&ouid=114408781303577995971&rtpof=true&sd=true

Dear members of the Statistics community,

I call humbly upon the statisticians, mathematicians, programming aficionados, excel experts, sports analysts, and power rankings enthusiasts of this great community to assist me with a vital task — creating a fair and representative power ranking formula for the International Beerio Kart Championships of the World.

A little background: my buddies and I were trapped at home over Thanksgiving of ’21 for a fourteen-day COVID quarantine. We were saddened by a missed opportunity to see our families, but with competitive spirit running through our veins and a surplus of leftover PBR from a party we threw (which was undoubtedly what gave us COVID), we found solace in roughly two weeks straight of fierce competition in the best drinking/video game pairing ever to exist: Beerio Kart. For the uninitiated: Beerio Kart is Mario Kart, except that you need to finish your beer before the end of each race, and you can’t drink and drive (i.e. chug and control your character simultaneously). Our version of the game has many extra rules and sub-rules, but that’s the basic premise.

After two weeks of this, we needed an outlet to determine who was truly the best of us, and thusly the International Beerio Kart Championships of the World were born. It started with a modest eight competitors, but interest has increased steadily over the past three years, and in recent events we’ve had as many as 58 competitors fighting for spots in a 32-person bracket (surplus competitors play in Play-in Prix for entry into the main bracket). We’ve now had 75 people play in official brackets and obtain power rankings, and close to 100 participate in the events overall. For a little context on how the tournaments are run: four competitors participate in each Grand Prix, and the top two from each Grand Prix advance to the next round until the championship. In the preliminary rounds, players must drink a beer on races two and four of each Grand Prix, and in the finals all four races are drinking rounds, so the final four competitors must drink a minimum of 10 beers to win the tournament.
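
The 10-beer minimum follows from the bracket arithmetic. A small sketch, assuming the main bracket size is 4 × 2^k (like 32) so that every round splits evenly into four-player Grand Prix:

```python
def rounds_in_bracket(bracket_size, per_prix=4, advance=2):
    """Rounds played until a single four-player final, counting the final."""
    rounds, players = 1, bracket_size
    while players > per_prix:
        players = players // per_prix * advance  # top two of each Grand Prix move on
        rounds += 1
    return rounds

def min_beers_for_champion(bracket_size, prelim_beers=2, final_beers=4):
    """Races 2 and 4 in each preliminary round, plus all four races in the final."""
    return (rounds_in_bracket(bracket_size) - 1) * prelim_beers + final_beers

print(rounds_in_bracket(32), min_beers_for_champion(32))  # 4 10
```

With 32 players the rounds go 32 → 16 → 8 → 4, so a champion drinks 3 × 2 = 6 beers in the preliminaries and 4 in the final.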

As tournaments got larger and more intricate (and people started complaining that they were seeded unfairly), we realized we needed an objective ranking system to seed players so that the Prix leading up to the championship were fair and quantitative. This background brings me to the hallowed undertaking I beseech your help with: please help me figure out how to do this.

We’ve tried a few formulas, but we are but amateur statisticians and none have felt like they effectively capture a player’s skill level.

First we tried the following formula:

ibkc power ranking = 0.33 × t/(60n) + 0.33 × z/60 + 0.33 × y/60

where:

- 60 = the maximum number of points that can be scored in any given Grand Prix
- t = total points accrued over all past tournaments attended
- n = total number of Grand Prix held in all official tournaments
- z = average points scored per prix, per tournament, in all tournaments attended
- y = average points scored per prix, per tournament, in all tournaments attended this calendar year

It was a good start, but it was unfairly biased toward players who had played in more tournaments, and it wasn’t an accurate reflection of current skill level. It would be like baseball power rankings putting the Yankees at the top because they’re an ancient ball club that has won 27 World Series, even though their last title was in 2009, or putting the Astros low in the rankings because they didn’t win their first Series until 2017, even though they’ve won twice in the past 5 years.
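
Reproducing the first formula, read as 0.33·t/(60n) + 0.33·z/60 + 0.33·y/60, makes the attendance bias easy to see: t keeps growing with every tournament a player attends, while n is shared by everyone. A minimal sketch with made-up numbers for two players whose per-prix averages (z and y) are identical:

```python
MAX_PRIX_POINTS = 60  # maximum points in any Grand Prix

def power_ranking_v1(t, n, z, y, weight=0.33):
    """First IBKC formula: 0.33*t/(60n) + 0.33*z/60 + 0.33*y/60."""
    return (weight * t / (MAX_PRIX_POINTS * n)
            + weight * z / MAX_PRIX_POINTS
            + weight * y / MAX_PRIX_POINTS)

# Made-up players: same averages, different attendance.
n = 20  # Grand Prix held across all official tournaments
veteran = power_ranking_v1(t=600, n=n, z=40, y=40)
newcomer = power_ranking_v1(t=200, n=n, z=40, y=40)
print(f"{veteran:.3f} {newcomer:.3f}")  # 0.605 0.495
```

The entire gap comes from the first term, which rewards accumulated points rather than per-prix performance.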

We then created a formula based on Pythagorean expectation, where a player’s skill level is calculated by averaging their (points accrued in a prix)/(points accrued in a prix + total number of possible points in a prix). Each round of a tournament was weighted more heavily than the last, and tournaments with four rounds carry more weight than tournaments with three rounds. The player’s Pythagorean expectation was then averaged over all tournaments they’ve participated in, over the last four tournaments held, and over the last two tournaments held. Their power score was then calculated by averaging these three numbers together, with the intention that more recent tournaments would be weighted more heavily than older ones. This is the formula the attached spreadsheet uses.

This new formula was better than the first but has the inverse problem: it weighs recent tournaments too heavily and doesn’t account for any rank decay from missing tournaments. For example, you can see that BAT has won 6 of 8 tournaments, but after a huge upset in the semifinals, BAT did not make the finals of the last tournament and was bumped from first place overall to third. Meanwhile, Squirt4Boyz advanced from second place overall to first, even though Squirt4Boyz didn’t even participate in the last tournament.
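
A sketch of the second approach can make the missed-tournament effect visible. Several details aren’t specified above, so they are labeled assumptions in the code: round r of a tournament gets weight r, a tournament’s weight is its number of rounds, and when averaging over the last four or last two tournaments held, only the ones the player attended count. That last assumption is one way a player who skipped the most recent event can lose nothing, as in the Squirt4Boyz example. Note that points/(points + 60) tops out at 0.5 when a player scores the full 60.

```python
MAX_PRIX_POINTS = 60

def prix_ratio(points):
    """Per-prix 'Pythagorean' ratio: points / (points + 60), at most 0.5."""
    return points / (points + MAX_PRIX_POINTS)

def tournament_score(points_by_round):
    """Weighted mean of prix ratios. ASSUMPTION: round r (1-based) has weight r."""
    weights = range(1, len(points_by_round) + 1)
    weighted = sum(w * prix_ratio(p) for w, p in zip(weights, points_by_round))
    return weighted / sum(weights)

def window_average(history, tournament_ids):
    """ASSUMPTION: unattended tournaments are skipped; each counts by its number of rounds."""
    attended = [t for t in tournament_ids if t in history]
    if not attended:
        return 0.0  # ASSUMPTION: no attendance in the window scores zero
    total_rounds = sum(len(history[t]) for t in attended)
    return sum(len(history[t]) * tournament_score(history[t]) for t in attended) / total_rounds

def power_score(history, held_ids):
    """Average of the all-time, last-four-held and last-two-held window scores.

    history: {tournament_id: [points in each round played]} for one player
    held_ids: every tournament id, oldest first
    """
    return (window_average(history, held_ids)
            + window_average(history, held_ids[-4:])
            + window_average(history, held_ids[-2:])) / 3

# Made-up histories: the second player skipped the latest tournament, T3.
held = ["T1", "T2", "T3"]
steady = {"T1": [50, 50, 50], "T2": [50, 50, 50], "T3": [50, 50, 50]}
skipper = {"T1": [50, 50, 50], "T2": [50, 50, 50]}
print(round(power_score(steady, held), 4), round(power_score(skipper, held), 4))  # 0.4545 0.4545
```

Under these assumptions skipping T3 costs the second player nothing; replacing the skip rule with, say, counting a missed tournament as zero is one place where rank decay could enter.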

There are all sorts of hidden columns and rows and whatnot in this spreadsheet, so please DM me with any questions you might have, but please, I beg of you, fine and glorious proprietors of the world’s most stressful game, help me create a ranking system that makes sense. Ultimately we need a system that reflects how many points a player is expected to score, considers that player’s tournament wins, podium finishes and finals appearances, accounts for rank decay, and, like global tennis or golf rankings, has some bias toward recent events.

Thank you, friends.

Your servant,

The International Beerio Kart Championships of the World League Commissioner

submitted by /u/zakarm22

Help Finding Actual Research And A Dataset That Uses Distributions

I need to find a piece of research in which someone uses a dataset and applies distributions such as the normal distribution, the t distribution, or the F distribution (used in ANOVA), and then I need to show my understanding of it. It doesn’t have to be very complicated, as I’m just a fresher (first-year undergrad); all I need to do is show how any of these distributions is used in real-life research. Any links or ideas about such research papers, or real-life uses of these distributions?

Thanks in advance

submitted by /u/youredumbaflol

Best Ways To Analyze Data, Useful For NBA Stats

Hello all, I’m wondering: if I have a massive dataset that I want to analyze for trends, is there a good website for doing this, or should I look for the trends manually? Also, how could I easily spot trends or important figures within the data? Thanks!
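
One low-tech way to start spotting trends is a rolling average, which smooths game-to-game noise so longer stretches of rising or falling output stand out. A minimal sketch; the points-per-game values and the 3-game window are made up.

```python
from collections import deque

def rolling_mean(values, window):
    """Mean of each consecutive `window`-length run of values."""
    recent, total, means = deque(), 0.0, []
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            total -= recent.popleft()  # drop the game that fell out of the window
        if len(recent) == window:
            means.append(total / window)
    return means

points_per_game = [20, 22, 30, 28, 35]  # made-up player stats
print(rolling_mean(points_per_game, window=3))  # [24.0, 26.666666666666668, 31.0]
```

Comparing a player’s rolling average against their season average is a quick way to flag hot and cold stretches worth a closer look.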

submitted by /u/floppy11
