Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Dataset For Rotten Tomatoes Movies 1970 – 2024

Hey, I scraped Rotten Tomatoes! For each movie I grabbed the URL, title, release date, critic score, and audience score. These were the only data points I needed, so no other information is included. It covers major-release US titles from 1970 – 2024 only. If this is useful to you at all, here are both the CSV and JSON files.

This data does not include ALL movies on Rotten Tomatoes in this range. Unfortunately, Rotten Tomatoes uses very inconsistent naming conventions in its URLs, which makes it very difficult not to miss a few movies here and there, but I managed to get over 12,000 of them. I hope this is useful to someone.

https://drive.google.com/file/d/12IpMErb4j83h5gGTdTpv0WZOf5ceY7b3/view?usp=sharing
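
A minimal sketch of how the CSV might be explored once downloaded. The column names `title`, `critic_score` and `audience_score` are assumptions (the real headers in the file may differ), as is the idea that scores are stored as integers, optionally with a trailing `%`, and left blank when missing.

```python
import csv
import io

def score_gaps(csv_text):
    """Return (title, critic - audience) pairs, largest absolute gap first.

    Column names are assumptions; adjust them to the real file headers.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    gaps = []
    for row in rows:
        critic = (row.get("critic_score") or "").strip().rstrip("%")
        audience = (row.get("audience_score") or "").strip().rstrip("%")
        if not critic or not audience:
            continue  # skip movies with a missing score
        gaps.append((row["title"], int(critic) - int(audience)))
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)
```

To run it on the downloaded file, pass the file contents, for example `score_gaps(open("movies.csv", encoding="utf-8").read())`, where `movies.csv` is a placeholder file name.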

submitted by /u/Business-Platform301

How Would You Contact A Company To Get Data On Their Products?

I want to get food product label information that is on the packaging. If you were to write to a company and ask for data on all their current products who would you contact? A Board Member, some customer service phone number or is there a better person to ask for this info? I do have a USDA database, but I am finding that some of their values don’t match the values on the labels from the store.

submitted by /u/TexasBound1973

Guitar Chords Dataset For Classification Task

Hello everyone.

Recently I embarked on the adventure of implementing a guitar chord classifier. My goal is to classify 14 guitar chords (A, Am, B, Bm, C, Cm, D, Dm, E, Em, F, Fm, G, Gm). Unfortunately, I found no publicly available dataset containing all of the chords, as most contained only the non-minor chords. Therefore, I decided to build my own dataset.

I shot three different videos for each chord, in three different locations, with two different people playing the chords. The videos were shot in 4K at 60 fps, so I had plenty of high-quality frames to choose from.

I sampled 250 images for training, 100 for validation, and 100 for testing. After that, I augmented my data using Roboflow; the dataset is now publicly available.

I started training my classifier (MobileNetV2) on this data and was surprised to find that the model achieved almost 100% on the validation set (97%). This seems very suspicious to me, but so far I cannot pinpoint the problem. I generated a PCA plot for each of the splits, which can be found here: PCA

Any feedback would be appreciated 🙂
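
One thing that might be worth ruling out, offered as a sketch only: if frames were sampled at random across all videos, near-identical frames from the same clip can end up in both the training and validation splits, which tends to inflate validation scores. The snippet below assumes a hypothetical file-naming scheme `<chord>_<video id>_<frame>.jpg` (not taken from the dataset) and assigns whole videos to a split instead of individual frames.

```python
from collections import defaultdict

def video_id(filename):
    # Hypothetical naming scheme: "<chord>_<video id>_<frame>.jpg"
    chord, video, _frame = filename.rsplit(".", 1)[0].split("_")
    return f"{chord}_{video}"

def split_by_video(filenames, val_videos):
    """Put every frame of a video into exactly one split."""
    splits = defaultdict(list)
    for name in filenames:
        key = "val" if video_id(name) in val_videos else "train"
        splits[key].append(name)
    return dict(splits)

def leaked_videos(train_files, val_files):
    """Return the video ids that appear in both splits."""
    return {video_id(f) for f in train_files} & {video_id(f) for f in val_files}
```

If `leaked_videos` returns a non-empty set for the existing splits, the high validation score may partly reflect memorised scenes, not chord shapes. Augmenting before splitting can cause the same kind of overlap.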

submitted by /u/dduka99

A 100% Synthetic Dataset Hub / Search UI

My goal is to never hear “I don’t have data” from ML people again.

So I built this app, which is still experimental: it’s a search engine UI that uses an LLM to invent datasets that match your query. That means you can type in any kind of dataset and you will always get results.

https://huggingface.co/spaces/infinite-dataset-hub/infinite-dataset-hub

For example for `star wars vs star trek preference classification`:

https://huggingface.co/spaces/infinite-dataset-hub/infinite-dataset-hub?q=star+wars+vs+star+trek+preference+classification

It was pretty fun to make, it runs for free on HF, and it’s open source in case you want to modify it.

submitted by /u/qlhoest

Need Direction On A Project I Am Going To Start Regarding Analysis Of How The Creative Class Responds To Global (Western And Non-Western) Events By Examining Discussions And Sentiments In Art-related Subreddits.

I have to examine how the creative class (particularly musicians) responded to wars and how music was affected by these events. I am unsure how to approach this. The scope is not final: I can amend the project and add to it to get more useful insights, and I am open to discussion, but everything needs to be logical.

One idea I have come across is to categorise the songs into protest songs, loss and grief songs, hope songs, etc., and then compare these categories.

I am open to ideas.

submitted by /u/OneAnalysisbc

Satellite Images Dataset (like Google Maps Satellite Images)

I need some satellite images of the Earth: just plain photos without clouds, not the water/forest or other specialised imagery that most satellite image datasets focus on.

My intention is to use computer vision (with a Jetson Orin Nano) on a drone that will be taking photos of the ground below it. These photos will be compared to the dataset images so that the drone can estimate its location without GPS (I’m still not sure if this will work out, and I wanna know your opinions on this too).

Correct me if I’m wrong, but I need a dataset for a specific region, because a dataset covering the whole Earth is far too large to fit on a Jetson Nano SD card. So being able to restrict the dataset to a region is also a must.
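
Restricting imagery to one region can be sketched with the standard Web Mercator ("slippy map") tile scheme, which many map tile sources use. This assumes the imagery source actually serves tiles in that scheme; the function names are illustrative.

```python
import math

def latlon_to_tile(lat, lon, zoom):
    """Convert WGS84 latitude/longitude to Web Mercator tile x/y indices."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon == 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_for_bbox(south, west, north, east, zoom):
    """List every (x, y) tile touching the bounding box at one zoom level."""
    x_min, y_min = latlon_to_tile(north, west, zoom)  # tile y grows southward
    x_max, y_max = latlon_to_tile(south, east, zoom)
    return [(x, y) for x in range(x_min, x_max + 1) for y in range(y_min, y_max + 1)]
```

Multiplying `len(tiles_for_bbox(...))` by a typical tile file size gives a rough storage estimate for a region before downloading anything.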

submitted by /u/Pelochus

I Am Looking For Wage, Steel And Shipyard Availability Time Series

After wasting literally two days trying to find publicly available data, I’m reaching out to the community. For a project I need steel, wage, and some shipping-related time series.

Steel: I was able to find data at the US Bureau of Labor Statistics (Series ID “WPU101”, if anyone is interested). (I wasn’t looking for steel plates, but it’ll do.)
Wage: This is super tough. A “world” index would be nice, but even something more granular (Advanced Economies, and Emerging Markets and Developing Economies) would do.
Shipyard capacity: I’d like to somehow model how busy shipyards currently are. It is a long shot here, but maybe someone has an idea of how to put this together.
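
For the steel series, a sketch of pulling it programmatically. It assumes the BLS Public Data API v2 endpoint and the JSON response layout as publicly documented, so both are worth checking against the current documentation; the function names are illustrative.

```python
import json
import urllib.request

BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"

def build_request(series_id, start_year, end_year):
    """Build the POST request for one series (unregistered access is rate limited)."""
    payload = {"seriesid": [series_id], "startyear": str(start_year), "endyear": str(end_year)}
    return urllib.request.Request(
        BLS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def parse_monthly(response):
    """Turn the assumed response layout into sorted ("YYYY-MM", value) pairs."""
    points = []
    for series in response["Results"]["series"]:
        for row in series["data"]:
            period, value = row["period"], row["value"]
            # Keep monthly rows only (M13 is the annual average) and skip gaps.
            if not period.startswith("M") or period == "M13" or value in ("", "-"):
                continue
            points.append((f'{row["year"]}-{period[1:]}', float(value)))
    return sorted(points)

def fetch_series(series_id, start_year, end_year):
    """Download and parse one series; requires network access."""
    with urllib.request.urlopen(build_request(series_id, start_year, end_year)) as resp:
        return parse_monthly(json.load(resp))
```

A call such as `fetch_series("WPU101", 2015, 2024)` would then return the monthly index values, provided the assumptions above hold.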

Any productive ideas are most welcome.

submitted by /u/erkan_lange

Ice Hockey Dataset – Offset Penalties

Hey,

I’m wondering if anyone has a data set that includes what percentage of penalties in the NHL (minor, major, etc.) come from offsetting penalties? In other words, how many of the total penalties in a season are offset, such that teams play at even strength post penalty? Additionally, is there season level data on this over the past few seasons?

Trying to avoid matching player level data (player penalties) and game level data (coding for offset penalties based on time), which can provide this data but will take a while to compile. This is to address a question that an editor for an academic publication asked during a conditional accept on a research project (final hurdle before publication), so any data that helps answer it would be extremely appreciated.
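
The time-based coding mentioned above could be sketched roughly as follows. This assumes a hypothetical list of penalty records with `game_id`, `time` (seconds elapsed), `team` and `minutes` fields, and it uses a deliberately simplified rule: penalties assessed at the same game time count as offsetting when exactly two teams were penalised for the same total minutes. The actual NHL coincidental-penalty rules cover more cases than this.

```python
from collections import defaultdict

def offsetting_share(penalties):
    """Fraction of penalties called at a stoppage where both teams got equal minutes."""
    by_stoppage = defaultdict(list)
    for p in penalties:
        by_stoppage[(p["game_id"], p["time"])].append(p)

    offsetting = 0
    for group in by_stoppage.values():
        minutes = defaultdict(int)
        for p in group:
            minutes[p["team"]] += p["minutes"]
        # Simplified rule: exactly two teams penalised, equal total minutes.
        if len(minutes) == 2 and len(set(minutes.values())) == 1:
            offsetting += len(group)
    return offsetting / len(penalties) if penalties else 0.0
```

Grouping the input by season before calling `offsetting_share` would give the season-level figures.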

Thanks!

submitted by /u/Trying2bAProf

Looking For COVID-related Social Media Posts From 2020 Posted To Healthcare Or Nursing Groups

Title. I’m looking to do some research on what was posted to popular social media sites in 2020 about COVID. Specifically, things posted onto subreddits/forums/etc. devoted to healthcare or nursing.

It’s a shot in the dark, I know. But I wanted to at least put a feeler out here since the entire world was studying COVID-19 for a while there.

If anyone knows of a related dataset or has already scraped social sites for this sort of data before, please let me know!

submitted by /u/SailorNash

Looking For CVs Dataset With Linkedin Formats And Non-Linkedin Formats For A CV Parsing And Candidate Ranking Project.

Hello everyone. As the title says, I’m looking for a dataset that includes CVs in the Linkedin format and in other regular CV formats, for parsing and for training a candidate-ranking model. I tried searching for what a “Linkedin” CV format means but didn’t find anything meaningful, so I’d appreciate it if someone could tell me what it means.

submitted by /u/Raki360

Request For Shipping Cargo Dataset For Data Analysis Project

Hello everyone,

I hope this message finds you well. I’m currently working on a project related to shipping logistics and cargo data analysis. I’m in search of a comprehensive dataset that includes information on shipping routes, cargo types, volumes, and possibly costs.

If anyone has access to or knows where I could find such a dataset, I would greatly appreciate your help. Please feel free to either reply here or send me a private message with any leads or suggestions you may have.

submitted by /u/mr1Hunned

Datasets With Abdominal Vessels That Are Annotated

Hi everyone! I’m trying to find a dataset of abdominal CT scans with labeled annotations of some of the common abdominal vessels near the pancreas and liver (e.g. aorta, celiac artery, superior mesenteric artery, inferior vena cava, portal vein, superior mesenteric vein, splenic vein, and renal veins). I have found some research papers that use these types of annotated datasets, but the data were all collected from hospitals and annotated by medical professionals on the authors’ teams, so they are not publicly available. If anyone knows where I can get my hands on such a dataset, that would be great! Thank you so much!!!

submitted by /u/DiyaRamakrishnan

Synthetic Image Dataset For Indian Road Signs In Challenging Conditions.

https://imgur.com/a/2HvaRLU
https://imgur.com/a/CY9gTYf
Update on my Synthetic Image Dataset for Indian Road Signs in Challenging Conditions.

Here I showcase the angles and corresponding labels generated for a sample of the dataset.

Next, I am going to add rain to the scene to increase the challenge for computer vision perception models.

I am using Unity Perception 1.0 and will write some custom C# scripts along the way.

Thanks

#syntheticimagegeneration #syntheticdata #syntheticimages

submitted by /u/Gold_Worry_3188

Looking For LG INR21700 M50 Battery Dataset

I am working on a project to build a machine learning model that estimates the State of Health/Charge and Remaining Useful Life of batteries. For that I am looking for a dataset of LG INR21700 M50 cells. Has anyone worked with it? Do I have to request access, or is it publicly available?

Thank you in advance.

submitted by /u/RoxstarBuddy

Looking For Medicine Dataset With Focus On Name, Chemical Structure (SMILES), Molecular Descriptors, Protein Targets, Pharmacological Properties, Medicine Ontology Information, Combination, Adverse Events, Gene Expression Profile, Known DDIs.

I’ve applied for an academic license at DrugBank.com, but my application has been under review for 4/5 days and this is an internship project, so if anyone can point me to sources and explain how to access those datasets, thank you. I’ve looked at PubChem, DrugBank, and ChEMBL, but I can’t figure out how to download the data.
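
For PubChem specifically, a sketch of one download route, assuming the PUG REST URL pattern (`/compound/name/<name>/property/<list>/<format>`). The helper name is illustrative, and only two well-established property names are used by default; the exact property names for SMILES strings have changed over time, so they should be taken from the current PUG REST documentation and not from this sketch.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def property_url(drug_name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that returns the given properties as CSV."""
    # quote() percent-encodes spaces and other unsafe characters in the name.
    return f"{PUG_REST}/compound/name/{quote(drug_name)}/property/{','.join(properties)}/CSV"
```

Opening the resulting URL in a browser, or fetching it with `urllib.request.urlopen`, should return a small CSV for that compound; looping over a list of drug names is one way to assemble a table.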

submitted by /u/Anxiousbanana001