Category: Other Nonsense & Spam

Statista Reports – Full Availability

Good morning, all. If any of you need reports from the official Statista website, don’t hesitate to contact me.

This is not a scam, and the reports are genuine. If needed, we can issue a regular invoice. How is it possible? My company paid for the full commercial licence to provide such reports.

Thank you very much for your attention

submitted by /u/satchurated

Looking For Active List Of Domain Names

There seem to be about 350 million registered domain names (if not more), but I can’t find any source that offers this data for download.

I am only interested in root domains, e.g. dailynews.com. I came across this repo: https://github.com/tb0hdan/domains. But after filtering for root domains I end up with about 150 million. There are also paid services such as zonefiles.io, which offers about 260 million domains. Does anyone know of any other sources that provide the complete set?

Thanks in advance.

P.S. Is it worth it to set up your own crawlers for this type of thing?

submitted by /u/activelearning23

[Synthetic] DatasetGPT – A Command-line Tool To Generate Datasets By Inferencing LLMs At Scale. It Can Even Make Two ChatGPT Agents Talk With One Another.

GitHub: https://github.com/radi-cho/datasetGPT

It can generate texts by varying input parameters and using multiple backends. But, personally, the conversations dataset generation is my favorite: It can produce dialogues between two ChatGPT agents.

Possible use cases may include:

Constructing textual corpora to train/fine-tune detectors for content written by AI.

Collecting datasets of LLM-produced conversations for research purposes, analysis of AI performance/impact/ethics, etc.

Automating a task that an LLM can handle over large amounts of input text. For example, using GPT-3 to summarize 1000 paragraphs with a single CLI command.

Leveraging APIs of especially big LLMs to produce diverse texts for a specific task and then fine-tuning a smaller model with them.

What would you use it for?

submitted by /u/radi-cho

Suggestions For Ecology Dataset For Classification

I’m looking for a dataset similar to the Amphibians dataset from UCI for an undergraduate data science project. It should be a classification problem, i.e. presence/absence of a species dependent on habitat characteristics such as temperature, type of vegetation, size of water reservoir, amount of rainfall, distance to roads/civilisation, etc.

It should include

>15 numerical and categorical features

>300 observations

temporal and/or spatial data if possible, so I can play around with some heat maps or time series analysis.

Any hints are highly appreciated as I’m a beginner and I’ve been scrolling my eyes out on kaggle etc. all weekend.

submitted by /u/apex—-predator

Finding Datasets For Computer Vision

Hello! I’m a senior electronics engineering student. My friend is trying to build an assistant that helps blind people tell apart objects with the same shape, such as Coca-Cola vs. Sprite. He designed the hardware around an ESP8266 and uses a cloud service to store the datasets. We created a dataset by taking photos of Cokes, but it’s hard to do that for every item. Is there any solution or resource for finding datasets of everyday objects? We have dug through a lot of open datasets (CIFAR, Berkeley, Kaggle, COCO, MNIST), but we require 224×224-pixel images for our ML model.

submitted by /u/yagmurxyildiz

Where Can We Actually Buy Big Data For A Company?

Hi

I’m wondering where I can buy machine learning data directly for my project/product. Let’s say it’s a music or allergy app. I would like to connect a chat/predictor which, based on a small amount of input data, can indicate a certain percentage of something. However, large amounts of data are needed to train such algorithms. Where can you actually buy that data?

submitted by /u/jackoborm

The Largest Dataset Of Graded Diamonds On Kaggle

Hi there!

I just put up a new dataset on Kaggle. It’s cryptically titled "The largest diamond dataset currently on Kaggle".

It has just under 220,000 diamonds and 25 columns of data, making it about 3x larger than the next largest. I think it’s perfect for regression models, and there is an attached notebook.

This is my first submission to Kaggle so I’d be very much interested in any feedback you might have.

Thanks!

submitted by /u/hrokrin

Is It Legal To Scrape Data From RedFin Using Selenium?

I’ve been learning web scraping recently and wanted to do a project to post on Kaggle. I’ve searched and can’t find anything on their site giving express permission to scrape it. I wanted to scrape their rental data (the for_sale and sold data are already available in csv files, but rentals aren’t). Can anyone link me to a permission statement or something legal that I can include in my project? This world of scraping legality is new to me, so apologies for any ignorance on my part.

Edit: I emailed them and asked and they said they don’t allow scraping. I was under the impression that if it’s publicly available data then it’s not illegal to scrape?

submitted by /u/bingopajamma

[REQUEST] MITRE ATT&CK Annotated Cyber Attack Trees

I’m interested in any cyber incident data that links MITRE ATT&CK labels to the time of detection or to the attacker kill chain, such as annotated cyber incident timelines. I’m particularly interested in mapping progress through the kill chain to draw out the most common attack paths.

I know much of this data will be commercially sensitive, or IP for incident response companies, so any suggestions or direction would be greatly welcomed.

submitted by /u/swivel_chair_jockey

International Beerio Kart Championships Of The World: Power Rankings Development Help!

TL;DR: My friends and I have a stupid hobby that’s getting out of control and I need your help spiraling it further. Please help me create a fair power rankings system (using the attached spreadsheet for reference) for the Beerio Kart tournaments we host.

https://docs.google.com/spreadsheets/d/1CS5pWnmgS8wIZAvFQL4cc_jHWbTZ_khS/edit?usp=sharing&ouid=114408781303577995971&rtpof=true&sd=true

Dear members of the Statistics community,

I call humbly upon the statisticians, mathematicians, programming aficionados, excel experts, sports analysts, and power rankings enthusiasts of this great community to assist me with a vital task — creating a fair and representative power ranking formula for the International Beerio Kart Championships of the World.

A little background: my buddies and I were trapped at home Thanksgiving of ’21 for a fourteen-day COVID quarantine. We were saddened by a missed opportunity to see our families, but with competitive spirit running through our veins and a surplus of leftover PBR from a party we threw (which was undoubtedly what gave us COVID), we found solace in roughly two weeks straight of fierce competition in the best drinking/video game pair to ever exist: Beerio Kart. For the uninitiated: Beerio Kart is Mario Kart, except you need to finish your beer before the end of each race, and you can’t drink and drive (i.e. chug and control your character simultaneously). Our version of the game has many extra rules and sub-rules, but that’s the basic premise of the game.

After two weeks of this, we needed an outlet to determine who was truly the best of us, and thusly the International Beerio Kart Championships of the World were born. It started with a modest eight competitors, but interest has increased steadily over the past three years, and in recent events we’ve had as many as 58 competitors fighting to compete in a 32-person bracket (surplus competitors play in Play-in Prix for entry into the main bracket). We’ve now had 75 people play in official brackets and obtain power rankings, and close to 100 participate in the events overall. For a little context on how the tournaments are run: four competitors participate in each Grand Prix, and the top two competitors advance from each round until the championship. In the preliminary rounds, players must drink a beer on races two and four of each Grand Prix, and in the finals all four races are drinking rounds; thusly, the final four competitors must drink a minimum of 10 beers to win the tournament.

As tournaments got larger and more intricate (and people started complaining that they were seeded unfairly), we realized we needed an objective ranking system to seed players so that the Prix’s leading up to the championship were fair and quantitative. This background brings me to the hallowed undertaking I beseech your help with: please help me figure out how to do this.

We’ve tried a few formulas, but we are but amateur statisticians and none have felt like they effectively capture a player’s skill level.

First we tried the following formula:

ibkc power ranking = 0.33t/60n + 0.33z/60 + 0.33y/60

where:

60 = the maximum number of possible points scored in any given grand prix

t = total points accrued over all past tournaments attended

n = total number of grand prix held in all official tournaments

z = average points scored per prix, per tournament, in all tournaments attended

y = average points scored per prix, per tournament, in all tournaments attended this calendar year
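For anyone who wants to play with the numbers, here is a minimal Python sketch of that first formula. It assumes "0.33t/60n" means 0.33 * t / (60 * n); the function name and the sample values are made up for illustration and do not come from the spreadsheet.

```python
MAX_POINTS_PER_PRIX = 60  # maximum points a player can score in one Grand Prix


def first_power_ranking(t, n, z, y):
    """First formula from the post: 0.33*t/(60*n) + 0.33*z/60 + 0.33*y/60.

    t: total points accrued over all past tournaments attended
    n: total number of Grand Prix held in all official tournaments
    z: average points per prix over all tournaments attended
    y: average points per prix over tournaments attended this calendar year
    Assumption: the denominator of the first term is 60 * n.
    """
    career_term = 0.33 * t / (MAX_POINTS_PER_PRIX * n)
    average_term = 0.33 * z / MAX_POINTS_PER_PRIX
    recent_term = 0.33 * y / MAX_POINTS_PER_PRIX
    return career_term + average_term + recent_term


# Two hypothetical players with identical per-prix averages (30 points),
# differing only in how many points they have accumulated over time.
veteran = first_power_ranking(t=600, n=40, z=30, y=30)
newcomer = first_power_ranking(t=120, n=40, z=30, y=30)
print(veteran, newcomer)
```

With equal averages, the player with more accumulated points still ranks higher, because only the first term depends on how many events someone has attended.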

It was a good start, but it unfairly favored players who had played in more tournaments, and wasn’t an accurate reflection of current skill level. It would be like baseball power rankings putting the Yankees at the top because they’re an ancient ball club and have won 27 World Series, even though the last time they won was 2009, or putting the Astros low down on the power rankings because they didn’t win their first Series until 2017, even though they’ve won twice in the past 5 years.

We then created a formula based on Pythagorean expectation, where a player’s skill level is calculated by averaging their (points accrued in a prix)/(points accrued in a prix + total number of possible points in a prix). Each round of a tournament was weighted heavier than the last, and tournaments with four rounds carry more weight than tournaments with three rounds. The player’s Pythagorean expectation was then averaged over all tournaments they’ve participated in, averaged over the last four tournaments held, and averaged over the last two tournaments held. Their power score was then calculated by averaging these three numbers together with the intention that more recent tournaments would be weighted heavier than older ones. This is the formula that the attached spreadsheet uses.
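A rough Python sketch of that second approach, for reference. It is not the spreadsheet’s actual implementation: the round and tournament-size weights are left out because their values aren’t stated, missed tournaments (marked None) are simply skipped, and a window with no attended tournaments is dropped from the final average. All names are hypothetical.

```python
from statistics import mean

MAX_POINTS_PER_PRIX = 60  # maximum points a player can score in one Grand Prix


def prix_ratio(points):
    """points / (points + max possible points), as described in the post."""
    return points / (points + MAX_POINTS_PER_PRIX)


def tournament_score(prix_points):
    """Unweighted average ratio over the prix a player raced in one tournament."""
    return mean(prix_ratio(p) for p in prix_points)


def window_average(history, last_n=None):
    """Average tournament score over the last `last_n` tournaments held.

    `history` has one entry per tournament held, oldest first: a list of
    per-prix points, or None if the player did not attend. Missed
    tournaments are skipped (assumption), so they cause no rank decay.
    """
    window = history if last_n is None else history[-last_n:]
    scores = [tournament_score(t) for t in window if t]
    return mean(scores) if scores else None


def power_score(history):
    """Mean of the all-time, last-four and last-two averages."""
    parts = [window_average(history), window_average(history, 4), window_average(history, 2)]
    parts = [p for p in parts if p is not None]  # drop empty windows (assumption)
    return mean(parts)  # raises StatisticsError if no tournament was attended


# Hypothetical player: attended the first and third of three tournaments.
history = [[60, 60], None, [30]]
print(power_score(history))
```

Note that the ratio tops out at 0.5 for a perfect 60-point prix, and that a player who skips a tournament keeps their earlier averages untouched under these assumptions.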

This new formula was better than the first but has an inverse problem: it weighs recent tournaments too heavily and doesn’t account for any rank decay from missing tournaments. For example, you can see that BAT has won 6 of 8 tournaments, but after a huge upset in the semis, BAT did not make the finals of the last tournament, and was booted from first place overall to third. All the while, Squirt4Boyz advanced from second place overall to first, even though Squirt4Boyz didn’t even participate in the last tournament.

There are all sorts of hidden columns and rows and whatnot in this spreadsheet, so please dm me with any questions you might have, but please, I beg of you fine and glorious proprietors of the world’s most stressful game, help me create a ranking system that makes sense. Ultimately we need a system that reflects how many points a player is expected to score, considers that player’s tournament wins, podium finishes and finals appearances, accounts for rank decay, and, as in global tennis or golf rankings, has some bias for recent events.

Thank you, friends.

Your servant,

The International Beerio Kart Championships of the World League Commissioner

submitted by /u/zakarm22

Briefly Describing How A Titty Feels Like After Touching One Only Once In Life

Soft, vibrant, has a certain warm temperature, good grip. Titty is soft but when grabbing, has great resistance. Sense of awe highly present, somewhat like being starstruck and not being able to hold back smile or state of excitement. Time was experienced very quickly. Hard to believe. The situation itself becomes isolated, environment seems to be in a lower dimension. Titty is confirmed 3D. My recollection of touching both of them with two hands is too blurred but the possibility lies currently at 51,3%. Looking forward to do it again if opportunity is given. Sending new query to titty dispatch.

Help Finding An Actual Research And Dataset That Uses Distributions.

I need to find a study in which the authors use a dataset and apply distributions such as the normal distribution, the t distribution, the F distribution used in ANOVA, etc., and then I need to show my understanding of it. It doesn’t have to be very complicated, as I’m just a fresher (undergrad) and all I need to do is show the use of any of these distributions in real-life research. Any links or ideas about such research papers or real-life uses of these distributions?

Thanks in advance

submitted by /u/youredumbaflol

Best Ways To Analyze Data, Useful For NBA Stats

Hello all, just wondering: if I have a massive set of data that I want to compare or analyze for trends, is there a good way to do this through a website, or should I look for these trends manually? Another question: how can I easily spot trends or important figures within my data? Thanks!

submitted by /u/floppy11