Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Searching A Small Dataset For Sarcasm Detection

Hello! I have an assignment and I want to do a sentiment analysis, specifically sarcasm detection, on a small amount of data (about 150 tweets on the same topic, e.g. Harry Potter or Marvel). I’m going to use an already-trained model; I just need to show that I know how to use it. Can you help me find a dataset like this? I’m very new to all of this and I don’t really know where to search 🙁

submitted by /u/Artistic-Ad-5790

Access To Financial Data On Refinitiv LSEG

Hello. I am a PhD researcher in an emerging country looking for access to Refinitiv Eikon (LSEG) for specific data points (country bonds, bank data, ESG scores, etc.). I am a single researcher working alone, and my university could not set up a partnership with LSEG. When I contacted LSEG directly, they declined to give solo access (they said they only provide it to companies or universities).

My PhD deadline is approaching, and collecting data takes a lot of energy and field work, so I’m looking for someone with an existing account who may be willing to share access or send specific information in return for a reasonable fee. I’d be more than pleased. Thanks in advance.

submitted by /u/Electrical_Life5638

I Need A Detailed Dataset For A Football Scouting App

Hi everyone. I am currently working on a football scouting app for a school project, and I was wondering if someone who has done something similar has a detailed dataset of player statistics for Europe’s top 5 leagues (at least; anything more is a bonus). The season doesn’t matter much, as the set will only be used for demonstration purposes. Thank you in advance.

submitted by /u/Comfortable-Play9718

Are There Any Datasets Of Labeled Aerial Imagery, Possibly Of Google Earth, For Training A Deep Learning Object Identification Model?

I’m working on a project where I need to train a deep learning model that can identify roads, houses, cars, and trains in aerial/satellite imagery from Google Earth. I’ve been counting cars and houses manually, but I’d rather make a model from scratch that’ll identify them for me. Is there a repository of reliable labeled aerial images, ideally from Google Earth?

submitted by /u/literallybateman

Why Is Cleaning Data Always Such A Mess?

been working on something lately and keep running into the same annoying stuff with datasets. missing values that mess everything up, weird formats all over the place, inconsistent column names, broken types. you fix one thing and three more pop up.

i’ve been spending way too much time just cleaning and reshaping instead of actually working with the data. and half the time it’s tiny repetitive stuff that feels like it should be easier by now.

interested to know what data cleaning headaches you run into the most. is it just part of the job or have you found ways/AI tools to make it suck less?
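For the tiny repetitive stuff (inconsistent column names, missing-value markers, numbers stored as text), a minimal sketch in plain Python might look like the following. The helper names and the set of strings treated as missing are assumptions for illustration, not a standard; adjust them to the dataset at hand.

```python
import re

# Strings treated as "missing" here. This set is an assumption; extend it per dataset.
MISSING = {"", "na", "n/a", "nan", "null", "none", "-"}

def clean_column_name(name):
    """Lowercase, trim, and collapse runs of non-alphanumerics into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def clean_value(value):
    """Map missing markers to None and numeric-looking strings to float."""
    text = value.strip()
    if text.lower() in MISSING:
        return None
    try:
        # Assumes commas are thousands separators, not decimal marks.
        return float(text.replace(",", ""))
    except ValueError:
        return text

def clean_rows(rows):
    """Apply both helpers to a list of dicts, e.g. rows from csv.DictReader."""
    return [
        {clean_column_name(key): clean_value(val) for key, val in row.items()}
        for row in rows
    ]

raw = [{" Unit Price ($) ": " 1,200 ", "Customer Name": "N/A"}]
cleaned = clean_rows(raw)
```

Keeping each rule in its own small function makes it easier to reuse the same cleanup across files instead of redoing it by hand each time.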

submitted by /u/shopnoakash2706

Biggest Challenges In Data Cleaning?

Hi all! I’m exploring the most common data cleaning challenges across the board for a product I’m working on. So far, I’ve identified a few recurring issues: detecting missing or invalid values, standardizing formats, and ensuring consistent dataset structure.

I’d love to hear what others frequently encounter when cleaning data!

submitted by /u/Academic_Meaning2439

I Need Datasets For Learning Machine Learning

Hi! I’m currently doing a Data Science Bootcamp and I need to make a Machine Learning project. I can do whatever I want; it’s an easy project so they can see whether I can follow the process. Finding a dataset is part of the project, but that part isn’t evaluated, so it doesn’t matter how I get the dataset.

I’ve been looking for datasets, but they’re either too complex or too simple. For example, I wanted to do research on Amazon products and I found this, but the dataset is huge; I think I’d spend more time figuring out how to work with it than doing the actual project, and that’s time I don’t necessarily have.

Another problem is that I want to do something that, while simple, still genuinely needs machine learning. With some of the datasets I found I could do something, but it feels like over-engineering, and I’d like to make something closer to what a real project could look like, including a reason to do it that way.

If anyone knows of a dataset I could do the project with, I’d be grateful.

submitted by /u/chucklemuff

Automatic Report Generation From Questionnaire Data

Hi all,

I am trying to find a way for AI/software/code to create a safety culture report (and other kinds of reports) simply from the raw data of questionnaire/survey answers. I want it to create a good, solid first draft that I can tweak if need be. I have lots of these to do, so it would save me typing them all out individually.

My report would include things such as an introduction, survey item tables, graphs and interpretative paragraphs of the results, plus a conclusion etc. I don’t mind using different services/products.

I have a budget of a few hundred dollars per month, but the less the better. The reports are based on survey data using 1-5 Likert statements, e.g. from strongly disagree to strongly agree.
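As a rough idea of the numbers such a report would be built on, here is a minimal sketch that summarizes one 1-5 Likert item and turns the summary into a draft sentence. The function names, the middle labels (Disagree, Neutral, Agree), and the choice of "percent agreeing" as the headline figure are assumptions for illustration only.

```python
from collections import Counter
from statistics import mean

# Only the endpoints come from the survey description; the middle labels are assumed.
LABELS = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
}

def summarize_item(responses):
    """Return count, mean, percent agreeing (4 or 5), and distribution for one item."""
    valid = [r for r in responses if r in LABELS]  # drops blanks and out-of-range values
    counts = Counter(valid)
    n = len(valid)
    return {
        "n": n,
        "mean": round(mean(valid), 2) if n else None,
        "pct_agree": round(100 * (counts[4] + counts[5]) / n, 1) if n else None,
        "distribution": {LABELS[k]: counts[k] for k in LABELS},
    }

def interpret(item, summary):
    """Build a one-sentence draft interpretation to be edited by hand."""
    return (
        f"{item}: {summary['pct_agree']}% agreed or strongly agreed "
        f"(mean {summary['mean']} of 5, n={summary['n']})."
    )

summary = summarize_item([5, 4, 4, 3, 2, None])
sentence = interpret("Management takes safety seriously", summary)
```

The per-item summaries could feed the tables and graphs, while sentences like the one above give a first draft of the interpretative paragraphs.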

Please, if you have any tips or suggestions, let me know!! Thanksssss

submitted by /u/BodyFun5162

Computing Education Resources Data Collection?

Hi everyone,

I’ve been struggling with this for the past few weeks… I’m currently working on a project to build a dashboard for computing education resources in the community. The focus is on out-of-school programs, things like after-school coding clubs, library events, university outreach programs, summer camps, etc.

The problem is: there’s no existing dataset for this kind of information, so I need to build a database from scratch. I’m stuck on how to collect this data in an efficient and scalable way. I don’t have much experience with data collection, and right now the only way I can think of is manually searching and entering the information, which is obviously not ideal given the time and effort and wouldn’t be a long-term solution.

I was thinking about using something like the Yelp API, but it doesn’t really cover academic or nonprofit events very well.

Has anyone encountered something like this before, or does anyone have ideas on how to approach it? I’d really appreciate any advice, tools, or suggestions!

submitted by /u/CherryLetter

Looking For Hinglish (Hindi-English Code-Mixed) Emotion-Labeled Speech Audio Dataset

Hi everyone,

I’m working on a deep learning project focused on emotion recognition from Hinglish (code-mixed Hindi-English) speech.

I’m specifically looking for:

  • Audio recordings of Hinglish speakers
  • With emotion labels (happy, sad, angry, etc.)
  • Spoken in natural code-mixed sentences (not just Hindi or English alone)

So far, I’ve only found datasets like:

  • CREMA-D, RAVDESS – English only
  • IITKGP Emotion Hindi Speech, hindiemo – Hindi only

But nothing for Hinglish, especially with emotion labels.

Even small datasets (100–500 samples) or research projects that have created or used such data would be extremely helpful. If no such dataset exists, I’d appreciate any advice on similar resources or potential alternatives.

Thanks a lot! 🙏

submitted by /u/Due_Confusion_8014

Homeowner And LinkedIn People Data Set?

I’ve been tasked with a project to correlate the professional success of people in Texas with the sizes of their homes. Are there data sets that pair homeowner information with LinkedIn profiles?

I’ve found homeowner names and their homes’ square footage on county clerk websites, and I can manually search people’s names on LinkedIn and make educated guesses as to whether they’re the same person, but I’m wondering if there’s a faster way of doing this.

submitted by /u/ChineseFoodRocks

Need Help Finding Two Datasets Around 5k And 20k Entries To Train A Model (Classification). I Need To Pass A Project, Help Pls

Hi, I need these two datasets for a project, but I’ve been having a hard time finding ones with that many entries, and on top of that finding two completely different datasets that I can merge.

Do any of you know of some datasets I can use (they could be well-known ones)? I am studying computer science, so I am not really that experienced with data manipulation.

They have to be two different datasets that I can merge to get a broader view and draw conclusions. In addition, I need to train a classification model.

I would be very grateful

submitted by /u/Jproxy122

Trying To Build A Dataset Of Political Donations By Industry, Need Some Help Starting.

I’m working on a little passion project, a dataset of political donations in Alaska that would be broken down by company, industry, donor location, and candidate.

But campaign finance filings are very scattered and inconsistent. Some candidates over the years have reported via PDFs, others dump spreadsheets, and a few towns barely publish anything. I had more luck with the statewide Akorgs company register, which is good for data on who actually owns what, but it’s a small part of this “research”.

I’ve also looked through municipality and state election sites manually, but I’m missing smaller local races or entities that don’t get flagged properly (especially Native corporations or smaller PACs). Ideally, I want a clean CSV or database where I can filter donors by SIC code or address.

So, if anyone knows a (maybe free) consolidated repository by state, even just for some years, I’d appreciate it. Any other data sources or tools for this, including third-party aggregators, are also welcome.

submitted by /u/Sharp-Self-Image

Dataset: Material Deformation Data For A New Phase-changing Polymer

This dataset comes from an early-stage lab experiment examining deformation behavior in a novel phase-changing polymer under varying loads.

Dropbox Link: https://www.dropbox.com/scl/fo/2by3a3cvyimg5zp1fyabf/ABG0s8meLsN2LQvkwqgCQnU?rlkey=9iub6b8oufwf1fbogayh662n2&st=winsvzjk&dl=0

Columns in the CSV include:

  • `sample_id`
  • `strain_rate`
  • `temperature`
  • `load`
  • `elonation`
  • `phase_transition`

This dataset is free to use for research and educational purposes.

submitted by /u/Automatic_Program114

Dataset Required For Quantitative Behavioural Analysis On Sustainability Behaviours

Hi all,

I’m working on a project that involves analyzing sustainability-related behaviors (e.g. energy use, recycling, green consumption, sustainable transport, etc.) using quantitative data.

These could include:

  • Household or individual-level data on energy, water, or transport usage
  • Panel data on product or brand choices, especially eco-labeled or green products
  • Surveys with attitudinal + behavioral questions
  • Pre/post intervention data (even better if from sustainability campaigns)
  • Consumer or municipal-level data on waste, electricity, or mobility

The project is for my portfolio and non-commercial, and I’m happy to share back any insights or modeling techniques with those interested. Any pointers to open datasets, research repositories, or organizations sharing such data would be hugely appreciated.

Thanks in advance!

submitted by /u/sarthook

[CSV] US Plastic‑Surgery Cost & Surgeon‑Availability — 600 Rows (100 Metros × 6 Procedures, July 2025)

**TL;DR – data updated 2025‑07‑04**

> *Example:* In **Phoenix** a **rhinoplasty** averages **$10 250** (range $7 k–$14 k) with **38** board‑certified plastic surgeons; next consult ≈ 14 days.

**Raw CSV (70 kB, no signup):**

https://raw.githubusercontent.com/Pastor0fMuppets/plastic-surgery-info/v2507/data/plastic_cost_v2507.csv

---

### What’s inside?

| Column | Notes |
|--------|-------|
| `City` | Top 100 U.S. metros |
| `Procedure` | Rhinoplasty, Breast Augmentation, Liposuction, Tummy Tuck, Facelift, Breast Reduction |
| `Avg_Cost_USD` | RealSelf “Worth‑It” averages (rounded) |
| `Cost_Range_USD` | 25th–75th percentile |
| `Board_Cert_Surgeons` | Count of individual NPIs with plastic‑surgery taxonomy (`2082*`) |
| `Earliest_Consult_Days` | Days until next open slot (from AestheticMatch feed) |
| `Financing?` | Yes / No flag (CareCredit / Alpheon accepted) |
| `Consult_Link` | Branded redirect to booking form **inside the CSV rows only** |
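
A minimal sketch of querying these columns with Python’s standard `csv` module. The sample row is built from the Phoenix example quoted above and covers only a subset of the columns; the exact cell formatting in the real file is an assumption, and `load_rows` and `find` are hypothetical helper names.

```python
import csv
import io

# Hypothetical sample: one row built from the Phoenix example in this post.
# The real file has more columns, and its cell formatting may differ.
SAMPLE_CSV = """City,Procedure,Avg_Cost_USD,Board_Cert_Surgeons,Earliest_Consult_Days
Phoenix,Rhinoplasty,10250,38,14
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def find(rows, city, procedure):
    """Return rows matching a city and procedure, ignoring case."""
    return [
        row for row in rows
        if row["City"].lower() == city.lower()
        and row["Procedure"].lower() == procedure.lower()
    ]

rows = load_rows(SAMPLE_CSV)
match = find(rows, "phoenix", "rhinoplasty")[0]
avg_cost = int(match["Avg_Cost_USD"])  # csv yields strings, so convert explicitly
```

To run this against the downloaded file, read its contents into a string and pass that to `load_rows` in place of `SAMPLE_CSV`.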

### Data sources

* RealSelf Cost API (CC BY 4.0) – scraped 2025‑07‑03

* CMS NPPES (2025‑06 dump) – public domain

* AestheticMatch availability feed

### Disclaimer

Prices are averages for information only and may vary.

Not medical advice. Verify costs and credentials with a board‑certified surgeon.

submitted by /u/Haunting_Photo_9361

Datasets For Cognitive Biases Impact

Bit of an odd request: I want a dataset I can use in Power BI to illustrate the impact of behavioral analytics.

Any idea where I can find one? I am open to any industry, but D2C industries would be preferable, I guess.

submitted by /u/skap24

Alternatives To The X API For A Student Project?

Hi community,

I’m a student working on my undergraduate thesis, which involves mapping the narrative discourses on the environmental crisis on X. To do this, I need to scrape public tweets containing keywords like “climate change” and “deforestation” for subsequent content analysis.

My biggest challenge is the new API limitations, which have made access very expensive and restrictive for academic projects without funding.

So, I’m asking for your help: does anyone know of a viable way to collect this data nowadays? I’m looking for:

  1. Python code or libraries that can still effectively extract public tweets.
  2. Web scraping tools or third-party platforms (preferably free) that can work around the API limitations.
  3. Any strategy or workaround that would allow access to this data for research purposes.

Any tip, tutorial link, or tool name would be a huge help. Thank you so much!

TL;DR: Student with zero budget needs to scrape X for a thesis. Since the API is off-limits, what are the current best methods or tools to get public tweet data?

submitted by /u/letucas

Looking For A Reliable Source Of Player Tackles Odds — Any Leads?

Hey folks, we’re working on a prop-focused betting analytics tool, and we’ve run into a wall trying to consistently source player tackles odds across major leagues (especially Premier League, La Liga, MLS, etc.).

We’re NOT looking for final match stats (we already have those), and we’re not scraping bookies directly due to all the anti-bot measures.

What we’re looking for:

  • A data provider/API that reliably includes pre-match odds for player tackles
  • Ideally with some sort of subscription or monthly fee (we want stability, not hacks)
  • Doesn’t have to be Opta-tier, just accurate and consistent

We’re happy to pay if it saves us the headache and keeps things running clean on the backend. If anyone’s using or knows of a source (public or private), I’d love to hear from you.

Thanks in advance for any help — and if anyone’s building something similar, always open to connect!

submitted by /u/hildegrim17

Request: Reddit Posts And Comments From R/endometriosis (April–May 2025) For Academic Research

Hello! I am conducting academic research on discussions in r/endometriosis from April through May 2025 and January 2023. I’m looking for datasets containing posts and comments from that subreddit during these periods. I’ve tried the Reddit API and Pushshift but haven’t been able to access the full historical data. If anyone has such a dataset or can point me to where I can find it, I’d really appreciate your help! Thanks so much!

submitted by /u/LordofRinger