Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Where Can I Find Sports Application Datasets?

Hi everyone, can anyone tell me where I can find datasets related to users of sports applications? I tried sending forms and surveys to several people, but I didn’t get a reasonable number of responses. 🥲🥲 The data would be used to create an AI for potential users of a sports application. Nothing too complex, as it is just for a school project. I will be very grateful to anyone who can help me.

submitted by /u/Rough-Chef-6215

I Need Datasets For A Machine Translation Project, And If I Can’t Find A Dataset Of The Equivalent Translation I Need, How Can I Make One?

This is my first real project, and the translation pair I need isn’t popular because it’s between two dialects of the same language.

My bet is that I won’t be able to find a dataset for my project, so my question is how to make a translation dataset to train my translation model.

If anyone can provide help through material or tutorials, or has been through the same problem, I will be thankful.
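One common shape for a translation dataset is a parallel corpus: a file of aligned sentence pairs, one pair per row. The sketch below is only an illustration of that idea, not a recommended pipeline. The function names, column names, and placeholder sentences are all assumptions; the real work is collecting or writing the aligned sentences in the two dialects.

```python
import csv
import io


def build_parallel_rows(pairs):
    """Trim each (source, target) pair and drop empty or duplicate pairs."""
    seen = set()
    rows = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # a pair is only useful if both sides are present
        if (src, tgt) in seen:
            continue  # skip exact duplicates
        seen.add((src, tgt))
        rows.append((src, tgt))
    return rows


def write_tsv(rows, handle):
    """Write the pairs as tab-separated values with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(rows)


# Placeholder sentences standing in for the two dialects (hypothetical data).
raw_pairs = [
    ("sentence in dialect A", "same sentence in dialect B"),
    ("  sentence in dialect A ", "same sentence in dialect B"),
    ("another sentence", ""),
]

rows = build_parallel_rows(raw_pairs)
buffer = io.StringIO()
write_tsv(rows, buffer)
print(buffer.getvalue())
```

Writing to a real file instead of `io.StringIO` only requires passing an open file handle (opened with `newline=""`) to `write_tsv`.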

submitted by /u/Emotional-Rhubarb725

Data Query For Locations?? Want To Find X Within Y Distance

This is so random I’m not sure if this is where I’m supposed to be but I am trying to look up locations relative to other locations. So for example I want to find all the apartments in Mississippi that are within 10 miles of an AMC movie theater. Or let’s say I drive an hour to work every day and I want to know every gas station on the route to work. How do I do this?
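For the "X within Y distance" part, one possible approach is to get latitude/longitude coordinates for both sets of places and keep every candidate that is within the radius of at least one anchor, using the haversine great-circle distance. This is a minimal sketch under stated assumptions: the place names and coordinates are made up for illustration, the function names are hypothetical, and it measures straight-line distance rather than driving distance, so it does not cover the "along my route" case.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(candidates, anchors, max_miles):
    """Return names of candidates within max_miles of at least one anchor."""
    return [
        name
        for name, lat, lon in candidates
        if any(haversine_miles(lat, lon, a_lat, a_lon) <= max_miles
               for _, a_lat, a_lon in anchors)
    ]


# Made-up example rows: (name, latitude, longitude).
apartments = [
    ("Apartment A", 32.30, -90.18),
    ("Apartment B", 34.00, -89.00),
]
theaters = [
    ("Theater 1", 32.35, -90.15),
]

nearby = within_radius(apartments, theaters, max_miles=10)
print(nearby)
```

The comparison is a brute-force loop over every pair, which is fine for a few thousand rows; larger sets would usually call for a spatial index or a GIS tool.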

submitted by /u/Mountain_Mud_4720

Free Pet Insurance Dataset: 50,000+ Quotes For Data Analysis And ML Projects

I’ve just come across a free sample dataset of 500,000+ pet insurance quotes from the UK market. This real-world dataset includes information on:

Pet details (species, breed, age)
Policy features (coverage types, limits, premiums)
Geographical data (postcodes)
Policyholder demographics

It’s perfect for:

Predictive modeling of insurance premiums
Risk analysis in the pet insurance market
Exploring geographical trends in pet ownership and insurance
Practice projects for data cleaning and analysis

You can access the dataset here: https://app.snowflake.com/nkkubsv/hjb89858/#/data/provider-studio/provider/listing/GZTSZ2DR6BH

I’m excited to see what insights and models the community can derive from this data, which comes from https://marketdatainsightica.com

submitted by /u/Comfortable-Ad-6686

Where And How Do You Normally Find Data For Your AI Projects?

I know this question may vary depending on industry and use case, but I’ve spent hours navigating pages for different types of data for my projects and still feel like I’m not finding the right datasets.

I’m starting to suspect that I’m either using the wrong process for determining what type of data I need or not looking in the right places.

For context: I’m working on both LLM and conventional ML projects, and I’m looking for both structured public EU datasets and unstructured private data. However, I’m curious to learn about your experiences in general so that I can assess my own process.

How do you go about finding datasets for your projects, and where do you normally search for them?

submitted by /u/Impressive_Bit_979

[Request] Ecommerce Data Pertaining To A Specific Product.

Hi comrades. I’ve got myself in a pickle by promising something that I’m not sure how to deliver.

My boss would like to know when a specifically shaped product first went on sale in the UK (not just by us, which would be easy, but by any of our dozens of competitors). We identify the product by a vague description, e.g. “Fairy decoration with illuminated wings”, but we’re also interested in “decorative fairy with light up wings”.

Google reverse image search can get me a list of product names from various suppliers for what’s on sale now, but I’ve struggled to find out how far back these sales go. I thought the Wayback Machine would help, but it’s really light on e-commerce sites. This may be because “view product” pages on most sites aren’t stored, but are generated dynamically.

I think EAN data might help us, but I’m not really familiar with that. Similarly, eBay or Amazon might hold the key, but I don’t know how easy it is to access old data from them.

Do any of you guys know a decent source of data that could reliably show when a product first appeared on the market?

submitted by /u/JoeDidcot

Web Site Traffic Identification PLUS 80+ Demographic Pieces Of Data

I run a company that provides extremely detailed data on your anonymous web traffic. We can identify 40-50% of your anonymous web traffic for you to convert. These pixels can be placed on funnels or websites. This traffic is already familiar with your product or service, so would you pay $.12 - $.19 to identify and target them? It’s a no-brainer! If you want a demo and to test it out, email me at [Stephen@vitaledge.io](mailto:Stephen@vitaledge.io). My calendar link

100% compliant with CAN-SPAM & CCPA laws. Available for United States consumers only. We do work with businesses outside the US targeting US consumers. We say we can only identify 40-50% because we scrub and remove anyone on the DNC, anyone outside the US, bots, and anyone who does not have verifiable opt-in status. Our list is 100% ready to be marketed to.

Data Points Captured:
Name, Email (0.9% bounce rate), Phone (80-85% accuracy), Address, Opt In Status (verified, date, IP, URL), Facebook, Twitter, LinkedIn Profiles, Age, Gender, Religion, Marital Status, Address, Secondary Address, Urbanicity (rural, suburban, city), Congressional District, Languages, Behaviors, Interests, Generation, Ethnic Group, Political Party Affiliation, Political campaign contributor, Education level, Occupation Detail, iOS or Android, Income Level, Total Household Income, Credit Score Range, # Adults in the house, # Children, Age Ranges of Children, Dwelling Type, Own vs Rent, Mortgage Details, Purchase Price, Current Value, Refinance Status, Refi amount, Type of Mortgage (conventional, FHA, VA), Amex Card, bank cards, Credit Cards, Investments, Premium Credit Cards, Stocks, Bonds, make and model of vehicle.

We also do data enrichment. If you have a customer database, we can enrich it with all the data points here!

If you are running FB/Google Ads, we can also provide custom audiences of people who have searched for your product or service in the last 24 hrs, and we set up a live feed so your audience is always within the last 24 hrs. Your CPL may be higher, but your CPA can be cut by 20%-50%! Send me an email! [stephen@vitaledge.io](mailto:stephen@vitaledge.io)

submitted by /u/stevo1586

Looking For Dataset(s) That Shows “appreciation” Of Subjects Taught In High School AFTER Graduation

I’m currently investigating the general opinion U.S. high school graduates hold of the commonly taught subjects (e.g. history, math, biology, chemistry) that are ubiquitous for all students. In other words, I’m only interested in subjects that are generally mandatory throughout all of high school. Personally, I know physics was NOT a mandatory subject at my school, and if that policy holds for most of the rest of the country, I would not be interested in opinions of physics.

I’m trying to study and better understand which subjects people have respect, and to some extent admiration, for, regardless of whether they did well in them. I did not do particularly well in my high school biology course, but my opinions of biology as a subject were never negative, just my opinions of the textbook, teacher, etc. I would like to know if there are any datasets that probe this particular aspect of high school classes. My gut feeling is that mathematics classes leave more people with a genuine disdain for the subject itself than an art class would, even though, for example, the frequency of people unable to paint a fence after an art class might be comparable with that of people unable to find the zeros of a quadratic equation.

Normally, I would put a lot more effort into tracking down this kind of data myself, but this endeavor would be best described as a side-side project, and I can’t afford to get too sidetracked on it. If anyone has a reference to such a dataset, or even an analysis that mirrors what I’ve detailed, I would greatly appreciate it if you could point me toward it.

submitted by /u/InterestingAd4287

Reddit Posts Dataset For Kaggle Community

Hi folks,

I am a data analyst and spent yesterday fetching hot Reddit posts for top subreddits as an EDA activity. I fetched the common parameters like post title, URL, upvotes, number of comments, shares, etc. I removed the ‘author’ (user ID of the person who made the post) for privacy reasons.

I am thinking of uploading the dataset to Kaggle for other fellow analysts and researchers, under the Reddit API Terms license available on Kaggle.

Is this OK? Or am I going to get into any legal trouble?

Regards

submitted by /u/we-r-just-stardust

Top Reddit Posts Across 50 Subreddits

Link to Dataset – Kaggle

I am relatively new to Python and pandas, though I’ve recently been getting better.
So I wanted to do an EDA on the top Reddit posts of all time. I couldn’t find something concise. I saw a few datasets, tens of GBs in size, that were entire data dumps by Pushshift. But that was too much for me to go through.

I wanted something simpler, lightweight for myself and potentially other newbies to get their feet wet when coming into analytics.

So I wrote a script, with some help from ChatGPT (pardon my poor coding skills, I’m not from a programming background), that uses Reddit’s API to fetch top posts from the top 50 subreddits.

I did a bit of data preprocessing and cleaning to ensure the formatting was OK, and removed the OP (author) field for privacy.
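The author-removal step can be sketched roughly as below. This is not the actual script; the record layout, the sample values, and the `drop_fields` helper are assumptions made for illustration, with only the `author` field name taken from the description.

```python
def drop_fields(posts, fields=("author",)):
    """Return copies of the post records without the given fields."""
    return [
        {key: value for key, value in post.items() if key not in fields}
        for post in posts
    ]


# Hypothetical records shaped like fetched post metadata.
sample_posts = [
    {"title": "Example post", "url": "https://example.com/1",
     "upvotes": 120, "num_comments": 14, "author": "some_user"},
    {"title": "Another post", "url": "https://example.com/2",
     "upvotes": 45, "num_comments": 3, "author": "other_user"},
]

cleaned = drop_fields(sample_posts)
print(cleaned)
```

Because the function builds new dictionaries, the original records are left untouched, which makes it easy to check what was removed before publishing.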

Uploaded to Kaggle and prepared a starter notebook – whether or not you check out the dataset, the notebook is a must look. Short and to the point intro.

The script needs work, cleanup, and commenting, plus updates to ensure I don’t fetch OP info in the first place. I will also try to fetch some other necessary parameters. When it is finalized, I will share it on GitHub. (I do not know how to use GitHub yet, again sorry.)

Thanks for your time.

I hope to find some interesting datasets on r/datasets for my EDA as well.

Thenk 😀

submitted by /u/pale-blue-dotter

Bulk Weather Data Download Of Multiple Locations

I’m looking for some kind of service (free or cheap) that offers daily weather data for multiple cities around the globe. I initially tried OpenWeatherMap, but their bulk data downloads require a professional subscription which costs over $400, which is too much for my simple project. Any alternatives? Ideally it would be a CSV or JSON file of various cities with the average weather for that day; hourly would be even better.

submitted by /u/blackpanther28

Job Postings Dataset: Enriched Exactly How You Need It

We built the best job postings database which includes:

De-duplicating and removing ghost job postings
Tagging jobs by O*NET SOC code (the standard occupation taxonomy in the US)
Tagging employers by NAICS code
Extracting job title, salary range, benefits, and qualifications
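As a rough illustration of what de-duplication can look like (not a description of how this database does it), one simple approach is to build a normalized key from a few fields and keep the first posting for each key. The field names, the sample rows, and the choice of key are all assumptions.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedupe_postings(postings):
    """Keep the first posting for each normalized (title, employer, location) key."""
    seen = set()
    unique = []
    for posting in postings:
        key = (
            normalize(posting["title"]),
            normalize(posting["employer"]),
            normalize(posting["location"]),
        )
        if key not in seen:
            seen.add(key)
            unique.append(posting)
    return unique


# Hypothetical postings; the second differs only in case and spacing.
sample_postings = [
    {"title": "Data Analyst", "employer": "Acme Corp", "location": "Austin, TX"},
    {"title": "data  analyst ", "employer": "ACME Corp", "location": "Austin, TX"},
    {"title": "Data Engineer", "employer": "Acme Corp", "location": "Austin, TX"},
]

unique_postings = dedupe_postings(sample_postings)
print(len(unique_postings))
```

Exact-key matching like this only catches near-identical reposts; reworded duplicates would need fuzzier matching.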

Disclaimer: I am one of the founders. If you’d like to try a sample of the dataset, please comment below or DM.

submitted by /u/Different-General700

Pharmacoepidemiology/Cancer Datasets

Hi,

I am finishing my master’s in epidemiology and I need to analyze a dataset for my thesis. I am looking for any datasets related to pediatric brain cancer, cancer survivorship/outcomes, and treatment modalities (chemotherapy, surgery, radiation). I am familiar with SEER but was wondering if anybody had other recommendations. I am hoping there is a dataset out there with more specific treatment information than SEER.

Ty!

submitted by /u/thenotoriouskara

Looking For A Detailed Births Dataset

Hi, I am looking for a detailed dataset with information about births, including the estimated gestation week or even day, the mother’s age, whether it was a natural delivery or a C-section, and any other details. I am interested in applying the possible results in Europe, but different geographic contexts would be really interesting too. Thanks

submitted by /u/Jfpalomeque

Need Some Advice Regarding Finding Data Sets Or Establishing A Set Of Questions Regarding The E-commerce Problem Domain

I’m a student at CSU and I’m taking CIS 250. For a project I need to determine a problem domain, establish a set of questions to answer, and find a data set to adequately answer those questions.

The domain I decided upon is E-Commerce and the questions I set were Q1 “How has the use of online storefronts by customers changed over the decade?” and Q2 “How much does ease of access in a digital storefront’s UI affect the amount of customers who order from said storefront?”. I chose this data set called Amazon Data Set (on Kaggle.com. Link to dataset here), but it doesn’t have sales data, making it unfeasible to use to answer the questions. That’s when I realized how tricky it is to find a data set that does answer those questions.

So, is it possible any of you know any good sites where I can find data that suits those questions or should I propose a new set of questions that are feasible to answer with the data I have access to?

submitted by /u/Allustar1

Looking For Historical Cloud / Cloud Tops / Satellite / Lightning Data Sets

I want to create a detector for nearby thunderstorms. I’m a bit of an amateur meteorologist and a full-time machine learning engineer. It’s always annoyed me that you can basically tell if there’s bad weather coming your way from just a glimpse at the weather radar sites, but somehow there’s no personalized app that warns me.

I teach kayaking to groups on the water, so there’s a bit of personal safety involved. My wife does research on open fields so I’d also like to provide her with warnings.

I’m a European citizen, so I might have access to ESA data?

submitted by /u/Captain_Flashheart

Looking For Data Sets For College Classroom

I am trying to make my university-level statistics class more engaging. I previously used the data sets provided by the book in my class notes, but I would like to start using real-world data sets that are more relatable and interesting to college students.

Would anyone happen to have a suggestion of where I can find these types of data sets? Does anyone know what kind of data sets seem to click with 18-20 year olds? I’m thinking social media use, maybe specific data about the college they are currently attending, anything about money.

Thank you!

submitted by /u/Mathislove87