How Do You Stay Sane While Working With Messy Or Incomplete Data?

Dealing with inconsistent, missing, or messy data is a daily struggle for many data professionals. What’s your go-to strategy for handling chaotic datasets without losing your mind? Do you have any personal tricks, mindset shifts, or even funny coping mechanisms that help you push through frustrating moments?

submitted by /u/Pangaeax_

Any Databases To Pull A Simple Random Sample Of US Addresses?

I apologize if this belongs on r/askstatistics (I posted here since I am inquiring about a dataset). I’m developing a mapping algorithm and need a random sample of US addresses to validate the tool. Does anyone have tips on free databases that would be a statistically sound source for drawing a simple random sample? Do you think openaddresses.io would be adequate? Alternatively, I was thinking of randomly generating a latitude and longitude within the United States and then using a reverse geocoding algorithm to get an address, though I’m not sure that would be a statistically sound method.
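
On the sampling question: one way to draw a simple random sample from an address list too large to load at once is reservoir sampling. The sketch below assumes a CSV export with a header row; the column names in the demo (number, street, city, postcode) are hypothetical and not taken from any particular source's schema. The sample is only as representative as the file's coverage. The random latitude/longitude alternative picks points uniformly by area, so addresses in sparsely built regions would tend to be selected more often than addresses in dense ones, which is not a simple random sample of addresses.

```python
import csv
import io
import random
from typing import Iterable, List, Optional


def reservoir_sample(rows: Iterable[dict], k: int, seed: Optional[int] = None) -> List[dict]:
    """Return up to k rows chosen uniformly at random, without replacement.

    Uses reservoir sampling (Algorithm R), so the input is read once and
    only k rows are held in memory at any time.
    """
    rng = random.Random(seed)
    sample: List[dict] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def sample_addresses(csv_stream, k: int, seed: Optional[int] = None) -> List[dict]:
    """Sample k address rows from an open CSV text stream with a header row."""
    return reservoir_sample(csv.DictReader(csv_stream), k, seed)


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    demo = io.StringIO(
        "number,street,city,postcode\n"
        "1,Main St,Springfield,00001\n"
        "2,Oak Ave,Springfield,00001\n"
        "3,Elm Rd,Shelbyville,00002\n"
        "4,Pine Ln,Shelbyville,00002\n"
    )
    for row in sample_addresses(demo, k=2, seed=42):
        print(row)
```

Fixing the seed makes the draw reproducible, which helps when the validation run needs to be repeated on the same addresses.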

submitted by /u/Khianea

Want: Video Footage Of A Roulette Wheel Spinning With Ball

Hi, I’m going to start working on a project regarding object detection and roulette. Does anybody know where I can find sources of roulette being played?

submitted by /u/Dirty_Wanderer

Sources For Weapons Impact Data In War

Hi all,

Would anyone have insight into a dataset of recent war incidents (ideally the last 25 years, not historical) which tracks specific munitions use and impacts?

Platforms like ACLED, S&P Global, and LiveUAMap have good records of specific incidents (a drone strike here, a tank shelling there), but they don’t focus on the consequences.

My ideal dataset would have date, location, weapon type and some measurement of destruction. The idea is to abstract different ‘types’ of war – Sudan vs Ukraine vs Gaza – in order to examine what would happen if these ‘war’ types hit elsewhere.

Grateful for any insights!

submitted by /u/Trebia218

Looking For A Good Phishing Email Dataset, The Latest The Better

I am looking for a phishing email dataset to train a classification model. I need the email body as well. If possible, please point me to the most recent dataset available.

submitted by /u/Glittering_Item5396

Need Customer Feedback / Support Ticket Dataset That Also Shows The Unmet Needs Of The Customer.

I need help with finishing such a dataset ASAP; it’s urgent.

submitted by /u/AdityaxReddy

I Need Help Finishing My Stats Project

Guys I only need like 20 more responses for my stats project due tomorrow. If you’re a college student this form only takes like 5 seconds to fill out. Please help a pimp in distress!

submitted by /u/Swimming-Smoke4722

Does Anyone Have The Volvo GTT Dataset?

It was used in Volvo Challenge ECML PKDD 2024. I have searched the entire internet but I am yet to find it anywhere. If someone happens to have it please do share.

submitted by /u/Handicapped_banana

Datasets/where To Look For Wide Range Of Company Data

Hi All, I am a data scientist trying to run an analysis on companies to identify potential new clients for the company I work for. Currently, we have one very large client (think millions of workers) that we do most of our reporting work on, plus 3-5 smaller clients (think 10k workers or less).

I can’t get too far into specifics, but we are essentially an add-on service to a company’s medical plan (free for the employees to use, but we bill the company). We do outreach to offer our services, but obviously the list of people we can contact is finite and will shrink quickly over time. Our main goal is to identify workplace troubles and situations where work environments affect a worker’s mental health, then provide them with resources to help with whatever they are struggling with. Our business model is that we can prove that providing these services proactively saves companies millions of dollars in medical spend in the long run (spend a little now to keep employees mentally healthy vs. wait for problems to compound into more serious ones, resulting in more medical claims spend in the future).

I have been looking for an impactful project to work on, and the one I keep wanting to explore is building some sort of clustering algorithm to 1) identify companies similar to the ones we currently work with, and 2) identify other companies where we could have the most impact. I would greatly appreciate any recommendations on what resources I can use to compile the data I’m looking for, where to start, or any other ideas to help refine my approach.

Thanks so much!

submitted by /u/CollectionShoddy8445

Request For MRI Brain Tumor Images (Meningioma, Pituitary, Glioma)

Hi everyone,

I’m working on my undergraduate thesis in statistics and need MRI images of brain tumors (meningioma, pituitary, and glioma) to apply machine learning techniques. I’m looking for reliable datasets, preferably from institutional sources, hospitals, or public databases.

If anyone knows where I can find these images, I would really appreciate your help!

Thanks in advance to anyone who can assist! 🙌

submitted by /u/ThomKm

Life Expectancy Dataset 1960 To Present

Hi, I want to share with you this new dataset that I have created on Kaggle. If you like it, please upvote.

https://www.kaggle.com/datasets/fredericksalazar/life-expectancy-1960-to-present-global

submitted by /u/Electronic-Reason582

Need Help Creating A Research Question

Hi all!

I’m taking a statistics class and the assignment is to create a quantitative manuscript. The prof wants us to use a publicly available dataset and then create a research question, do the stats/analysis and write the manuscript (instructions: Choose a research question that aligns with the available data in the selected dataset and is relevant to your chosen context). I’m thinking of using this database:

Hospitalization and Childbirth, 1995–1996 to 2023-2024 — Supplementary Statistics

https://www.cihi.ca/en/access-data-and-reports/data-tables?keyword=birth&published_date=All&acronyms_databases=All&type_of_care=All&place_of_care=All&population_group=All&health_care_quality=All&health_conditions_outcomes=All&health_system_overview=All&sort_by=field_published_date_value&items_per_page=10&page=0

I’m interested in maternal health, but I’m really struggling with creating a research question. I just don’t understand how you can do it from a database. I’m a qualitative researcher, so I’m used to always doing my own data collection. Any help would be greatly appreciated.

submitted by /u/CupcakeCapital9519

Help Me With My Data Collection On Vehicle Data Using Simulator.

I’m doing an ML project studying various vehicle accident scenarios, so I need to collect data such as speed and steering wheel angle in time-series format. At first I used Euro Truck Simulator to collect some data, but I have now reached a point where I need to collect data from two vehicles at a time. Can someone help me with this? CARLA is too heavy and cannot be supported.

submitted by /u/The_Tropicals

Are There Any Recommended Datasets I Could Possibly Use For A School Project

I’m just looking for an easy-to-understand dataset because I don’t really know what my project should be about. Could someone help me decide?

submitted by /u/Some_guy-yt

Web Server Logs – 4,091,155 Requests, 27,061 IP Addresses, 3,441 User-agent Strings (March 2019)

submitted by /u/PaperMoonsOSINT

The Kaggle Dataset Has Over 10,000 Data Points On Question-and-answer Topics.

I’ve scraped over 10,000 Kaggle posts and over 60,000 comments on those posts from the Kaggle site, specifically the questions and answers section.

My first try: Kaggle dataset

I’m sure that the information from Kaggle discussions is very useful.

I’m looking for advice on how to better organize the data so that I can scrape it faster and store more of it on many different topics.

The goal is to use this data to group together fine-tuning, RAG, and other interesting topics.
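
One possible way to organize scraped posts and comments so a scrape can be resumed and later grouped by topic is a small SQLite store keyed on the source IDs. This is only a sketch: the table and column names (posts, comments, post_id, topic, and so on) are hypothetical and not the layout of the dataset mentioned above.

```python
import sqlite3

# Hypothetical schema: one row per post, one row per comment.
# The REFERENCES clause documents the relationship; SQLite only enforces
# it when the foreign_keys pragma is switched on.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    post_id TEXT PRIMARY KEY,
    topic   TEXT NOT NULL,
    title   TEXT NOT NULL,
    body    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS comments (
    comment_id TEXT PRIMARY KEY,
    post_id    TEXT NOT NULL REFERENCES posts(post_id),
    body       TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_posts_topic ON posts(topic);
CREATE INDEX IF NOT EXISTS idx_comments_post ON comments(post_id);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_post(conn, post_id: str, topic: str, title: str, body: str) -> bool:
    """Store a post; return False if it was already saved (safe to re-run)."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?)",
        (post_id, topic, title, body),
    )
    conn.commit()
    return cur.rowcount == 1


def save_comment(conn, comment_id: str, post_id: str, body: str) -> bool:
    """Store a comment; return False if it was already saved."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO comments VALUES (?, ?, ?)",
        (comment_id, post_id, body),
    )
    conn.commit()
    return cur.rowcount == 1


def comment_counts_by_topic(conn) -> dict:
    """Return {topic: number of comments} across all stored posts."""
    rows = conn.execute(
        "SELECT p.topic, COUNT(c.comment_id) FROM posts p "
        "LEFT JOIN comments c ON c.post_id = p.post_id GROUP BY p.topic"
    )
    return dict(rows.fetchall())
```

Because inserts are keyed on the source IDs, a scraper that is interrupted can revisit the same pages without creating duplicates, and the topic index keeps per-topic grouping queries cheap.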

Have a great day.

submitted by /u/nieuver

Web Browser Useragent And Activity Tracking Data – 600,000,000 Web Traffic Records

submitted by /u/PaperMoonsOSINT

LogHub – A Large Collection Of System Log Datasets For AI-driven Log Analytics

submitted by /u/PaperMoonsOSINT

Need Help‼️ Urgently Looking For An Accurate Indian Stock Market Dataset With Buy/Sell Ratios 🚨

My team and I are currently developing a financial software solution. Our primary goal is to deliver clean, structured, and highly accurate data to users, not just stock market predictions.

We are currently focused on the Indian stock market and urgently need a reliable dataset. While multiple datasets are available online, they lack accuracy and do not fulfill the requirements for our application. Specifically, we need data in a structured format like this:

📊 Stock Analysis for RELIANCE
➡ Last Price: ₹1247.25
🔄 Change: ₹8.85 (0.71%)
🔹 Open Price: ₹0 | Close Price: ₹0
📉 Day Low: ₹0 | 📈 Day High: ₹0
📆 52-Week Low: ₹0 | 52-Week High: ₹0
📊 VWAP: ₹0 | Above VWAP ✅ (Bullish)
📢 Trend: 📈 Uptrend
🔥 Near 52-week high! Possible breakout

The challenge we face is that most available datasets do not include crucial metrics like the buying and selling ratio, which makes precise analysis difficult.

If anyone has access to a dataset that includes this information or knows a reliable source where we can obtain it, please share the details. This is extremely urgent, and we would be very grateful for any help or guidance.

submitted by /u/Puzzle_Age555

Bitter DB: A Database Of Bitter Things

submitted by /u/cavedave

Where Can I Find A Macroeconomic Dataset For ML

Where can I find a macroeconomic dataset for ML? I looked at Kaggle and couldn’t find anything promising.

submitted by /u/Clean_Elevator_2247

Most Useful Datasets For Analyzing Residential Real Estate Sales

I’m looking for the most useful datasets for analyzing residential real estate sales to help determine property values. Ideally, I’d like datasets that include:

Historical sales prices
Property characteristics (square footage, lot size, bedrooms/bathrooms, etc.)
Location data (ZIP code, neighborhood, proximity to amenities)
Market trends (price appreciation, days on market, supply/demand)
Tax assessments and mortgage data (if available)

I’m especially interested in open/public datasets but would also appreciate recommendations on high-quality paid sources. Bonus points for datasets that provide nationwide coverage in the U.S. or strong local-level granularity (county or ZIP code level).

submitted by /u/Ykohn

Would There Be A Way To Automate Data Creation With Hugging Face + MCP Servers? Is Someone Already Working On This?

I’m curious if anyone has explored using Hugging Face datasets + MCP servers to automate data generation and augmentation. The idea is to leverage AI agents that interact with MCP-connected tools to synthesize or transform datasets dynamically. Has anyone tried this? What challenges do you see in scaling such a setup? Would love to hear if someone is already building something similar!

submitted by /u/metalvendetta

Computer Science University In USA For Masters

Hello, I’m an international student from India, currently studying in the USA. I’m living in a small town where everything is quite affordable, including tuition fees and living costs. However, the town doesn’t have many companies offering internship opportunities, and the university’s ranking in computer science is not very high.

I’m now looking to transfer to a different university that is still affordable but located near a larger city, where I can find better opportunities for internships in the computer science field. Ideally, I’m looking for a school with a good reputation in computer science and a tuition fee range of $4,000 to $5,000 per semester.

If anyone has any recommendations or knows of any universities that fit these criteria, I would greatly appreciate it!

submitted by /u/Haunting-Low-5269

Data Set For Econometrics Project!!!

Hello, I have a project due tonight and I have not started yet. The project requires a dataset with at least 50 observations on three variables. Our professor says we get bonus points for finding a creative/unique dataset, so I am hereby asking for help with some creative datasets that y’all might know 🙂

submitted by /u/External_Ad_5677

Desperately Need Help Finding A Dataset With Lots Of Columns

I need a larger dataset to practice on for my internship. I worked on a smaller dataset, but I’ve been asked to find a bigger one. So I need a bigger dataset with lots of columns so I can make plenty of dimensions etc.

I’ve looked at so many datasets and they don’t even come close to column M. I need to make a lot of dimensions and need something that goes up to at least column Y or Z, which is about 25 columns at least. Can y’all share a bigger dataset you’ve come across, or tell me where I can find something like that? I’ve tried Kaggle and looked at so many datasets everywhere, but there aren’t enough columns. Is there a way to filter a Kaggle search for datasets with a certain number of columns?

If you happen to know/find a dataset with a lot of columns, please, please let me know!!
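
For candidates that have already been downloaded, the column check can be done locally rather than through a site's search. The sketch below assumes the files are CSVs with a header row sitting in one folder; the folder layout, file names, and the 25-column threshold are illustrative assumptions, and it does not use any Kaggle feature.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict


def count_columns(csv_path: Path) -> int:
    """Count the fields in the header row of a CSV file (0 if the file is empty)."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return len(header)


def wide_datasets(folder: Path, min_columns: int = 25) -> Dict[str, int]:
    """Return {file name: column count} for CSVs with at least min_columns columns."""
    widths = {path.name: count_columns(path) for path in sorted(folder.glob("*.csv"))}
    return {name: n for name, n in widths.items() if n >= min_columns}


if __name__ == "__main__":
    # Build two throwaway files to show the filter; real use would point
    # `folder` at a directory of downloaded datasets.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "narrow.csv").write_text("a,b,c\n1,2,3\n", encoding="utf-8")
        header = ",".join(f"col{i}" for i in range(30))
        (folder / "wide.csv").write_text(header + "\n", encoding="utf-8")
        print(wide_datasets(folder))
```

Only the header row is read, so the check stays fast even on large files.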

submitted by /u/tellswe

Need A Good Dataset For Machine Learning

I need to find a good dataset for a university project, but we aren’t allowed to use Kaggle.

Any leads?

submitted by /u/sleepyy_turtle

In Search Of Datasets For Meal/diet Plan Generator Application

I am working on an application for my university project that lets users create a customised diet plan (based on age, diet preferences, diseases, etc.), and I am looking for datasets that could be useful for this purpose. I have found one that provides a nutritional breakdown of individual food ingredients, but haven’t had any luck finding one related to meal plan generation.

submitted by /u/Shoddy_Ad7179

YouTube Channels With Over 1M Subscribers

Hello, does anyone here have a large dataset of YouTube channels and their subscriber counts?

submitted by /u/Playful-Total9092

I Need A Dataset Of Online E-commerce Sales And Returns

Are there any known e-commerce datasets about sales and product returns? Any help is immensely appreciated

submitted by /u/psyduckscar4