Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, I’m interested. Maybe they know where the chix are. But what do they need them for? World domination?

Local Automotive Repair Shops Data On Repairs Performed

Hi everyone, I have a request for a dataset pertaining to automotive repairs.

I am voluntarily building a free application/platform that anyone can use at any time to help the public make informed decisions about where to take their motor vehicles for repairs. My interest in this comes from the fact that I love cars and I hate seeing people get ripped off. I’ve worked on countless cars and helped many people with free repairs. Specifically, this platform would allow users to search for nearby automotive repair shops and see a graphical summary of the number of repairs each shop has done in a given period of time (X brake repairs, Y engine oil changes, Z front-end alignments, etc.). More features would be added over time, but this is the starting point.

I have already done legwork before coming here to make this platform a reality.

I contacted my state’s Department of Motor Vehicles (DMV) and submitted a Freedom of Information Act (FOIA) request to obtain access to the necessary dataset. My state’s DMV has a legal clause that specifically requires all automotive repair shops to retain records of estimates, work orders, invoices, parts purchase orders, and appraisals and keep them available for inspection by the DMV. The DMV kindly responded to my request and, unfortunately, I learned that although all automotive repair shops are required to retain these records, they are not obligated to submit them to the DMV for archiving at any point. Furthermore, the DMV would audit a shop to inspect these records only under very specific circumstances, which are exceptionally rare.

For clarification, my intent is to only depict the values contained in these records through visual means such as graphs and charts. Customer names, cost of repairs, parts vendor names, mechanic names, and any other personally identifiable information (except for the name of the shop doing the repair) would all be obscured.
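A minimal Python sketch of the per-shop summary described above, assuming the work orders could someday be obtained as records with hypothetical `shop`, `repair_type` and `date` fields (no such feed exists yet, per the DMV’s answer). Only those three fields are read, so customer names, costs, mechanics and vendors never reach the counts that would feed the charts.

```python
from collections import Counter
from datetime import date


def summarize_repairs(work_orders, start, end):
    """Count repairs per (shop, repair_type) for orders dated within [start, end].

    Only the "shop", "repair_type" and "date" keys are read, so any other
    (personally identifiable) fields are never copied into the summary.
    """
    counts = Counter()
    for order in work_orders:
        if start <= order["date"] <= end:
            counts[(order["shop"], order["repair_type"])] += 1
    return counts


# Hypothetical sample records; the field names are assumptions, not a DMV schema.
orders = [
    {"shop": "Shop A", "repair_type": "brake repair", "date": date(2023, 1, 5), "customer": "J. Doe"},
    {"shop": "Shop A", "repair_type": "oil change", "date": date(2023, 1, 9), "customer": "R. Roe"},
    {"shop": "Shop A", "repair_type": "brake repair", "date": date(2023, 2, 14), "customer": "A. Poe"},
    {"shop": "Shop B", "repair_type": "alignment", "date": date(2022, 12, 30), "customer": "B. Loe"},
]

summary = summarize_repairs(orders, date(2023, 1, 1), date(2023, 3, 31))
for (shop, repair), n in sorted(summary.items()):
    print(f"{shop}: {n} x {repair}")
```

The resulting counts map directly onto a bar chart per shop; the date window is what makes “X brake repairs in a given period” comparable between shops.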

After hitting this brick wall, I learned about some existing platforms that collect and aggregate automotive repair data (RepairPal, iATN, Mechanic Advisor, AutoMD, CarMD). Although these platforms give users the ability to post reviews like Google Reviews and Yelp, they don’t contain the fundamental data I need to build this free platform. Some also sell products or services to automotive repair shops (namely OEM how-to tutorials for specific make/model cars) and I don’t want to get involved with any financial sponsorships or political bureaucracy.

I have thought about reaching out to local automotive repair shops I have close relationships with, but there are fewer than a handful that trust me enough to give me access to their data, and to accurate data at that. Networking with every automotive repair shop in my entire state is just not realistic.

Any feedback would be greatly appreciated. Thanks in advance!

submitted by /u/justLURKin220020

[self-promotion] Every Product Listed On LEGO.com, May 2023

I made a little Python crawler that slurps up data about products listed on LEGO.com. That’s every product on the site, not just LEGO sets.

Here’s the crawler’s JSON output from May 9, 2023: https://gist.github.com/ryukoposting/070bea86a3b9fefc285388b0ffe651aa

Each product includes the following information:

• The product’s name.
• A link to the product page on LEGO.com.
• The product’s price in USD.
• The product’s discount price in USD, if there is a discount.
• The number of LEGO pieces in the product (if the product isn’t a LEGO set, this value is null).
• LEGO’s suggested age range for the product, if one is available.
• Whether or not the product is currently available for purchase. Note: this field is misleadingly called in_stock, but its value will be true for products that are on backorder.
• The product’s customer rating average, 1-5 stars.
• A list of themes to which the product belongs. Many products have only one theme, but some belong to multiple themes.
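A short sketch of working with the crawler output, here ranking LEGO sets by price per piece. Only the `in_stock` field name is given in the post; `name`, `price`, `pieces` and `themes` are assumed key names and should be checked against the gist before reuse. Products that aren’t sets have a null piece count and are skipped.

```python
import json

# Hypothetical excerpt of the crawler's JSON output. Key names other than
# "in_stock" are assumptions, not confirmed against the real file.
SAMPLE = """
[
  {"name": "Set A", "price": 49.99, "pieces": 500, "in_stock": true, "themes": ["City"]},
  {"name": "Mug", "price": 12.99, "pieces": null, "in_stock": true, "themes": ["Gear"]},
  {"name": "Set B", "price": 99.99, "pieces": 1250, "in_stock": false, "themes": ["Technic", "Ideas"]}
]
"""


def price_per_piece(products):
    """Return (name, USD per piece) for LEGO sets, cheapest per piece first."""
    result = []
    for p in products:
        # Non-set products have pieces == null (None after json.loads);
        # a zero count is skipped too, which also avoids dividing by zero.
        if p.get("pieces"):
            result.append((p["name"], round(p["price"] / p["pieces"], 4)))
    return sorted(result, key=lambda item: item[1])


products = json.loads(SAMPLE)
print(price_per_piece(products))
```

Because in_stock is also true for backordered items, a filter on it selects “purchasable”, not “on the shelf”.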

submitted by /u/ryu-ryu-ryu

Dataset Or Repository Of People Looking To Acquire External Datasets

I am looking for a dataset or repository that has a list of individuals or organizations actively searching and looking to purchase external datasets. The datasets can be used for research, academia, or business purposes, and they can encompass any type of data as long as the potential buyers have the intent and budget to make the purchase. I’m not even sure such a compilation exists (besides r/datasets) but thought it would be worth a try to ask!

submitted by /u/-x-Knight

Dataset Suggestions For These Requirements

Hey guys. I am currently starting work on my university project for my Fundamentals of Artificial Intelligence class. I would really appreciate it if you could suggest datasets that meet these requirements:

“Select a dataset that is suitable for a classification task. The student must avoid selecting the Iris dataset or the Palmer Archipelago (Antarctica) penguin dataset. In addition, the meaningfulness of the classification has to be considered, e.g. it is meaningless to classify continents by the number of Covid-19 cases because, first, there are only six continents and new ones will not appear soon, second, the number of Covid-19 cases is not a defining characteristic of continents;

• it is preferable to select a dataset that is already given in the format of a .csv datafile;

• the dataset should be well-documented (there should be information about who created the set, when and what the data source is);

• the dataset should be of reasonable size (at least 200 data objects);

• the dataset should be deeply annotated (there should be information about which features are stored and what they mean);

• the number of features should be between 5-15;

• the dataset should be labelled;

• the student must avoid datasets with many Boolean (true/false, 1/0, etc.) or categorical type feature (attribute) values. It is preferable to use datasets in which most of the attributes are represented by continuous attribute values;

• you should avoid datasets of unlabelled data (e.g. text corpora and raw images)”
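A rough self-check for a candidate CSV against the measurable rules in the quoted list (at least 200 rows, 5-15 features, a label column, mostly continuous features). The “continuous” test is an assumed heuristic, not part of the assignment: a column counts as continuous if every value parses as a number and it has more than two distinct values, so 0/1 flags are treated as Boolean. Documentation, annotation and meaningfulness still have to be judged by hand.

```python
import csv
import io


def is_number(value):
    """True if the cell parses as a float (empty or missing cells do not)."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def is_continuous(values):
    """Heuristic: all numeric and more than two distinct values (so 0/1 flags are excluded)."""
    return all(is_number(v) for v in values) and len(set(values)) > 2


def check_dataset(csv_text, label_column):
    """Check the measurable requirements from the assignment for one CSV file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    features = [name for name in fieldnames if name != label_column]
    continuous = [f for f in features if is_continuous([row[f] for row in rows])]
    return {
        "enough_rows": len(rows) >= 200,               # "at least 200 data objects"
        "feature_count_ok": 5 <= len(features) <= 15,  # "between 5-15" features
        "labelled": label_column in fieldnames
        and all(row[label_column] for row in rows),    # every row has a label
        "mostly_continuous": len(continuous) > len(features) / 2,
    }
```

Pass the file contents and the name of the label column, e.g. `check_dataset(open("data.csv").read(), "class")`, where both names are placeholders.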

submitted by /u/kktsrvii

Looking For Java Exception/Error Datasets And Solutions

Hey fellow developers!

I hope you’re all doing well. I’m currently working on a project that involves analyzing Java exceptions and errors. To enhance the accuracy of my analysis, I’m in need of a comprehensive dataset that includes various Java exceptions, errors, and their corresponding solutions. I believe having such a dataset would greatly benefit the development community as a whole.

Therefore, I’m reaching out to you all to see if anyone knows of any existing datasets or resources that provide information about Java exceptions and errors. Specifically, I’m looking for a dataset that encompasses a wide range of exceptions, covering different classes, such as NullPointerException, ArrayIndexOutOfBoundsException, and IllegalArgumentException, among others.

Ideally, the dataset would include:

Exception/Error name

Description and context of the exception/error

Stack trace (if available)

Common causes/triggers of the exception/error

Recommended solutions and best practices to handle or avoid the exception/error
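Of the fields listed above, the stack trace is the least structured. A small Python sketch (an illustrative assumption, not an existing tool or dataset format) that turns a raw Java stack trace into a record with the exception name, message and frames. It only recognizes class names ending in Exception or Error and stops at the first “Caused by:” line.

```python
import re

# Header line, e.g. 'Exception in thread "main" java.lang.NullPointerException: msg'
# or just 'java.lang.IllegalArgumentException'. Only *Exception / *Error names match.
HEADER = re.compile(r'^(?:Exception in thread "[^"]*" )?([\w$.]+(?:Exception|Error))(?::\s*(.*))?$')
# Frame line, e.g. '\tat com.example.App.main(App.java:5)'.
FRAME = re.compile(r'^\s*at\s+([^\s(]+)\(([^)]*)\)$')


def parse_stack_trace(text):
    """Turn one Java stack trace into a dict with exception, message and frames."""
    lines = text.strip().splitlines()
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError("first line does not look like a Java exception header")
    frames = []
    for line in lines[1:]:
        frame = FRAME.match(line)
        if frame:
            frames.append({"method": frame.group(1), "location": frame.group(2)})
        elif line.strip().startswith("Caused by:"):
            break  # this sketch records only the outermost exception
    return {"exception": match.group(1), "message": match.group(2), "frames": frames}
```

Records in this shape could then be joined with hand-curated causes and recommended fixes keyed on the exception name.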

I understand that documenting every exception and error might be an enormous task, but even a partial dataset or relevant resources would be highly appreciated. I’m willing to put in the effort to curate and organize the information into a cohesive format, making it accessible to the community.

Additionally, if you have any personal experiences or insights related to specific Java exceptions or errors, feel free to share them! Practical examples and real-life scenarios are often invaluable for understanding and addressing these issues effectively.

Thank you in advance for your time and assistance. Your contribution will not only aid my project but will also assist numerous developers who encounter similar challenges in their Java projects. Let’s collaborate and make Java development more seamless for everyone!

Looking forward to your suggestions, datasets, and insights.

Happy coding!

submitted by /u/Farjou69

Looking For A Dataset Of Letters. Any Ideas?

I’m doing a project for a website where I analyze the similarity in writing style and content of letters by different users and try to match each user with the other user whose letters are most similar. I need a dataset of letters, emails, or long text messages for that, and that’s what I’m looking for. I’ve found the subreddits r/letters and r/loveletters, but the quality of the texts there hasn’t been very satisfactory. I’ve thought about building a dataset from sample letters in English exams, but since there is no single authentic human writer behind them, that isn’t the best source either. Historical archives exist, but since my focus is on modern, casual letter and email writing, I’ve decided to pass on those. If there were a blog, for example, where someone publicly wrote letters to someone, that would be great, but I haven’t been able to find one. Any help would be much appreciated!
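As a starting baseline for the matching step, a bag-of-words cosine similarity in plain Python. This is a swapped-in, content-only technique, not the poster’s method; capturing writing style would need extra features such as sentence length, punctuation habits or function-word frequencies. The user names and letters in any test data are made up.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Lower-cased word counts: a crude stand-in for content features."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(user, letters):
    """Return the other user whose letter is most similar to `user`'s letter.

    `letters` maps user name -> letter text; it needs at least two users.
    """
    target = word_vector(letters[user])
    others = (u for u in letters if u != user)
    return max(others, key=lambda u: cosine(target, word_vector(letters[u])))
```

Swapping `word_vector` for a richer feature extractor keeps `cosine` and `best_match` unchanged.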

submitted by /u/cakeandflowers2202

You Haven’t Killed Anyone Driving, Have You? Of Course Not!

You might never have been in an accident, and certainly not one where three people were sent to the hospital. Or the morgue. I mean, that option was put on the table, too.

And you might not be that bad of a driver, no matter what the others say about you.

I’m in your corner here. I want you to know that. And to help you, my friend, here are 10 years of [Denver Traffic Accident data](https://www.kaggle.com/datasets/hrokrin/denver-traffic-accidents).

Now, you might be thinking: “How is this going to help me?” A valid question.

Cherry-picking is always a good option, but let’s not forget obfuscation and actual analysis. That’s three solid options right there and, let’s be honest, this has already been worth your time.

Think of how good you’re going to look when you can *conclusively* (or not) show how accidents due to cell phone usage have been trending so that fender bender is not *technically* your fault.
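A sketch of the cell-phone trend question: the yearly share of accidents whose contributing-factor text mentions a cell phone. The column names `date` and `factor` and the “cell phone” wording are hypothetical; check the dataset’s data dictionary on Kaggle for the real ones. Using a share rather than a raw count keeps years with more total accidents from looking worse.

```python
import csv
import io
from collections import Counter


def cell_phone_share(csv_text, keyword="cell phone", date_col="date", factor_col="factor"):
    """Return {year: fraction of that year's accidents whose factor mentions keyword}.

    Column names and the keyword are assumptions; dates are assumed to start
    with a four-digit year (e.g. ISO 8601).
    """
    totals, matches = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = row[date_col][:4]
        totals[year] += 1
        if keyword in (row[factor_col] or "").lower():  # case-insensitive match
            matches[year] += 1
    return {year: matches[year] / totals[year] for year in sorted(totals)}
```

A rising share would support the trend claim; a flat one would, technically, not help your fender bender case.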

The [attached notebook](https://www.kaggle.com/code/hrokrin/denver-traffic-accidents-eda) is there … just waiting for you. Your improvements; your questions. Just waiting.

What’s the best place to hit a pedestrian in a car? Just waiting. Which precinct does the worst job with its paperwork? Just waiting. What’s the best neighborhood to take a bike ride in if you don’t want to get hit? Just waiting. Is there a correlation between road conditions and accidents? Denver has great snow clearing, right? Right? Just waiting.

Oh, and there’s a heat map.

This isn’t some picked-over dataset about people on a boat. Who cares? They’re dead already! Not that many in this dataset are.

OK, so in all seriousness, I’d love feedback. And for you to take the two for a spin.

submitted by /u/hrokrin

ECG Data Using Apple Watch And HealthKit API In CSV Format

Hi fellows, I need ECG data from an Apple Watch in .csv format for a project which is due in a week. I need only 10 samples to prove what I am doing. Unfortunately, I live in a region where Apple’s feature to collect and export ECG data in .csv format is not available. I need your help to get 10 ECG samples taken at rest from 10 different people using an Apple Watch and Apple’s official app, in .csv format. Can anyone here help me get the samples?

submitted by /u/u109e114