Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

I Made A Recursive Discord Image Scraper

The GitHub description says it all really: it’s a recursive search algorithm that generates and validates Discord media links until it finds a real one – an image scraper.

Collaboration is kindly accepted, just open an issue or pull request. Using the program is pretty fun; it looks like you’re hacking if you change the console to green text ($ COLOR 0a before /path/to/file).

Be aware that it takes a WHILE to get an image, 68 million links were searched before we got our first one. The speed of searching the links is what we aim to focus our efforts on going into future updates.

submitted by /u/Obvious-Luck-6548

Any Datasets For Employee Emails Or Exchanges?

Hello! I’m trying to train an RNN to classify employee responses as negative or positive. I initially trained it on the Yelp polarity dataset, and while the test accuracy was high, it doesn’t seem suitable for what I’m looking for. The main issue is that it classifies negative interactions as positive.

My guess is the more formal nature of these conversations makes them look more neutral compared to negative yelp user reviews. I’ve searched quite a bit online but I don’t seem to find any datasets that match what I need.

submitted by /u/UnfriendlyMOAB

How To Find Phishing/spam/safe Email Dataset

Hey, for a work project, I’m looking for an email dataset that contains phishing emails, spam emails, and “safe” emails. Any idea where to find one? The main problem is that all the datasets I found confuse phishing and spam (spam: unwanted email, phishing: malicious email).

Thanks for your help!

submitted by /u/EstebanbanC

Searchable Online Database That Contains Prevalence Of Different Health Conditions In The US?

Hi, I’m looking for a dataset that includes prevalence of health conditions in the US. Sort of A to Z of health conditions, not just most fatal ones. So it would include not only heart disease and various cancers but also hernias and hemorrhoids and the flu (random examples). Even better if prevalence can be organized by age groups.

Prevalence rates for individual conditions, of course, are fairly easy to find online. The problem is finding a database that allows me to compare prevalence rates. For instance, to make a list of the top 1000 most prevalent health conditions in the US.

I’ve looked at CDC and healthdata.org but wasn’t able to find such info. Wonder if some insurance companies have this information…..

Would much appreciate any help or suggestions.

submitted by /u/big-enchilada

Cryptocurrency Datasets TOP 100 For The Last 8 Years

Hello,

I am currently working on a website to indicate if we are in an altcoin season or not. I wanted to backtest my indicators. However, I would need the top 100 (or 50 will do) cryptocurrencies by market cap every day for the last 8 years.

I can get this data if I use the CoinGecko API but that would require me to pay 700 dollars lmao.

Does anyone have this data? I tried Kaggle and couldn’t find anything.

Also my website: https://www.thealtsignal.com

Thanks!

submitted by /u/BugSpatula0

Input From Community On What Analytics And Metrics They Would Be Interested To See With Nationwide Property Data

Hey everyone!

My friend and I spent the last year collecting parcel information for nearly the entire United States—roughly 170 million properties—across over 3,000 counties. We’re launching a free analytics feature and would love to get your thoughts on what you’d like to see.

You can check out our attribute list here: docs.realie.ai/api-reference/property-data. We’re also working on using machine learning to build out an AVM, but we’d like the analytics feature to be more robust before we launch it.

Right now, we’re planning quarterly data updates, potentially moving to monthly updates if there’s enough interest. Our analytics can be filtered at the state, county, or even town level (for example: Baltimore Analytics).

Let us know in the comments if there are specific features, metrics, or insights you’d like us to include!

submitted by /u/Equivalent-Size3252

Searching For Dataset On Total Fertility Rate In US Counties, 2012-24

A recent report evaluates the relationship between the TFR (total fertility rate) and the political tendency across time and counties. I am trying to replicate the statistical analysis, but I have not been able to find the data for the Total Fertility Rate (TFR is not the General Fertility Rate). I guess it comes from CDC, but my multiple searches have not been successful (link1, link2, link3).

Any idea where to find the TFR data at county level since 2012? If not, at least for the General Fertility Rate?

submitted by /u/RiGonz

Need Help Regarding The Project And Its Data

I am making a personalised learning pathways project. For that I needed data such as users’ preferred learning style, exam scores, and things like that, but I didn’t find any after searching (Kaggle, UCI, etc.), so I made my own synthetic data. Is it okay to use synthetic data? When I change its distribution from uniform to normal, the prediction accuracy decreases. If it is not okay, then please help me find some data for this.

submitted by /u/harsh1004

Real Interest Rates For Non-US Countries

The US has some pretty great data on TIPS bonds https://fred.stlouisfed.org/series/DFII10 and inflation expectations can be calculated by subtracting this real yield from the nominal interest rate of the same maturity. Where can I find similar data for other countries?

I know the UK, Germany, Japan, etc all have inflation protected bonds but I can’t seem to find the associated data with these. Can anyone point me in the right direction?
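
For anyone following along, the calculation described above (often called the breakeven inflation rate) can be sketched in a few lines of Python. This is only a minimal illustration: the function name, the dictionary layout, and the yield values are all assumptions made up for the example, not real market data or any particular API.

```python
def breakeven_inflation(nominal, real):
    """Return {date: nominal - real} for dates present in both series.

    Both inputs map a date string to a yield in percent. Dates missing
    from either series are skipped.
    """
    common = sorted(nominal.keys() & real.keys())
    return {d: round(nominal[d] - real[d], 4) for d in common}


# Made-up illustrative values, NOT real market data.
nominal_10y = {"2024-01-02": 4.00, "2024-01-03": 4.10}
real_10y = {"2024-01-02": 1.75, "2024-01-03": 1.80, "2024-01-04": 1.85}

result = breakeven_inflation(nominal_10y, real_10y)
for day, rate in result.items():
    print(day, rate)
```

The same function would apply to any country where both a nominal and an inflation-linked yield series of matching maturity are available.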

submitted by /u/itsmyfirstday69

Open Source, Cross Platform, Lightweight – CSV File Viewer & Editor

I’m launching Nanocell-csv, an open source, cross platform, lightweight, CSV file viewer & editor.
[self-promotion]

As many of this community’s dataset sources seem to be CSV files, I thought it would find its target audience here.

Looking for feedback to grow the project!

I’d also be curious to know your workflow when receiving a new CSV file. What is the first tool you use to open it? What for?

submitted by /u/cbjr77

What’s Your Biggest Challenge With Searching The Web For Data?

Hi everyone! 👋

I’m conducting research to better understand the pain points devs face when it comes to searching and querying data from the web. Whether you’re building scrapers, automating tasks, or simply trying to get structured data from unstructured sources, I want to learn from you!

If you have a minute, please share your thoughts on any of these questions:

What kind of data do you often need to extract or query from the web?
Are there specific challenges or frustrations you encounter (e.g., anti-bot measures, unstructured formats, incomplete data)?
How do you currently handle these challenges (e.g., tools, frameworks, or DIY solutions)?
What features or tools would make your life easier when it comes to querying and automating data retrieval?

This is purely for research purposes—no promotions, no sales pitch. Your insights will help shape how developers approach these problems in the future.

I’m also a dev and have some thoughts on this but want to hear other perspectives as well.

submitted by /u/spacespacespapce

I Need Help Finding Data Sets In Spanish

Hi, I’m thinking about doing my dissertation on a topic that requires data sets of social media comments or posts labelled as sexist or not. I’ve found some examples in English, but the problem is that I need data sets in Spanish (I know that I can just use an ML model to translate them to Spanish, but I’d like to know if anyone has any idea of where to find them). So far I’ve only found one and it has very few entries. If anyone can help me I’d really appreciate it. T-T

submitted by /u/valent_iina

Are There Any Substance Abuse Usage Datasets?

Hey folks! I’m required to fetch some data (textual) on “conversations”, and “messages” on substance use.
e.g. “Smoking crack hits me with an intense wave of euphoria.”, “I enjoy doing cocaine”, etc.

I’ve been trying to find such data but have failed so far, what I’ve discovered mostly relates to datasets on an individual addict or drug being used, but none of them matches the requirement above.

I would really appreciate it if you guys could suggest a dataset from any repository, Kaggle/Hugging Face, or anything else that could help me.

submitted by /u/Kian5658

Looking For Global Political Tension Data

Hi all, I’m doing a research project on global conflicts and in particular the cyber impact. I am looking for a dataset which I can use to create a matrix of which countries have ‘political issues’ with each other.
I can find a lot of information on the major conflicts, but getting outside the top 10 gets a bit challenging.

Has anyone seen any data I could use to summarise global political tensions by country?
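
Whatever the source ends up being, the matrix described above could be built from a plain list of country pairs. The sketch below is only an assumption about how that might look: the placeholder codes "A", "B", "C" and the function name are hypothetical, and the pair list stands in for whatever dataset is eventually found.

```python
def tension_matrix(countries, pairs):
    """Build a symmetric 0/1 matrix marking which countries have issues with each other.

    countries: ordered list of country codes (rows and columns follow this order).
    pairs: iterable of (code_a, code_b) tuples flagged as having tensions.
    """
    index = {code: i for i, code in enumerate(countries)}
    matrix = [[0] * len(countries) for _ in countries]
    for a, b in pairs:
        # Tension is treated as mutual, so mark both directions.
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


# Placeholder codes and pairs, purely illustrative.
countries = ["A", "B", "C"]
matrix = tension_matrix(countries, [("A", "B")])
for code, row in zip(countries, matrix):
    print(code, row)
```

Replacing the 0/1 flag with a count or score would work the same way if the source data carries an intensity measure.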

submitted by /u/fred_t_d

Search For A Cool Dataset For Learning Analysis With Python

Hey, I have to write a paper about applied data analysis and for that I am searching for an interesting dataset. Interestingly, I cannot think of any data by myself. I tried random Google searches but haven’t found any cool data so far. I think the one prerequisite my professor set (he wants to learn something new from the analysis) made me weirdly judge all datasets as ‘unworthy’, if you know what I mean.

Are there any cool datasets from which my professor, with a background in data science, can learn? (Optionally, it would be nice if they were fun to work with and not a literal pain to normalize, but yeah, just optionally xD)

submitted by /u/matth_l

Where Can I Find A Company’s Financial Data FOR FREE? (if It’s Legally Possible)

I’m trying my best to find a company’s financial data for my research, specifically its Profit and Loss statement, Cash Flow Statement, and Balance Sheet. I already found one source, but it requires me to pay $100 first. I’m just curious if there’s any website you can suggest so I don’t have to spend that much (or maybe can get it for free) for a company’s financial data. Thanks…

submitted by /u/C0deit-Michael

Looking For A YOLO/Darknet-compatible Dataset That Can Be Used To Scan An Image/video And Identify Specific Body Parts

Hey all,

I’m working on a number of devices where I’d like to use machine learning and live video to identify specific parts of the human body.

This is a sex-positive project, and therefore rather than have a classifier that censors anything it thinks might be nudity, I’m looking for a dataset that will enable me to identify nipples, penises, vaginas, and other potentially erogenous zones on people of all genders, colours, and body shapes.

It feels to me that it should be possible, but I’m new to creating/training models and not sure where to start, so figure standing on the shoulders of others is probably a good place!

submitted by /u/No-Art1323

Song Dataset With Mood/Vibe Parameters

I have an idea for a personal project and I could use some help finding a dataset.

Project:

I would like to make a playlist generator where I can specify different moods at different points of time in the playlist. So something along the lines of 1h Chill, 1h Pop, 1h Dance. Obviously I would like much more refinement than I showed in the example. My thought was that I could find paths between different song types so that the genre transitions are smooth.

Maybe this already exists?

Dataset:

What I am looking for is a long dataset with, obviously, the main parameters (name, artist, year, etc.) but also things like popularity, danceability, singability, nostalgia factor, high vs low energy, happiness, tempo, and more.

Does a dataset like this exist? I also thought it could be possible to use sentiment analysis on the lyrics to generate some of these parameters.

Let me know if you have any ideas.
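
One way to picture the "paths between song types" idea from the project description is to treat each song as a point in feature space and pick songs along a straight line between a start and an end track. The sketch below assumes just two features (energy and happiness) and uses a made-up toy catalogue. The titles, values, and function name are all hypothetical, and a real version would need many more features and songs.

```python
import math

# Hypothetical toy catalogue: (title, energy, happiness), each in [0, 1].
SONGS = [
    ("chill_a", 0.10, 0.40),
    ("chill_b", 0.25, 0.50),
    ("pop_a", 0.50, 0.70),
    ("pop_b", 0.60, 0.80),
    ("dance_a", 0.85, 0.75),
    ("dance_b", 0.95, 0.90),
]


def smooth_path(songs, start, end, steps):
    """Pick `steps` distinct songs whose features follow a line from start to end.

    Requires steps >= 2 and steps <= len(songs).
    """
    features = {title: (energy, happy) for title, energy, happy in songs}
    (e0, h0), (e1, h1) = features[start], features[end]
    unused = dict(features)
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start song, 1.0 at the end song
        target = (e0 + t * (e1 - e0), h0 + t * (h1 - h0))
        # Greedy choice: the closest song that has not been used yet.
        title = min(unused, key=lambda s: math.dist(unused[s], target))
        path.append(title)
        del unused[title]
    return path


path = smooth_path(SONGS, "chill_a", "dance_b", 4)
print(path)
```

The greedy nearest-neighbour choice is a simplification. A shortest-path search over a similarity graph would be another option if transitions need to be smoother.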

submitted by /u/hindenboat

Is There A Dataset Listing Death/birth Dates?

Is there a dataset that contains both the birth and death dates of real people?

This may be a bit of a morbid topic, but I’ve been talking to my wife about people dying close to their birthdays, and since I tend to do silly projects as a way to keep my knowledge alive, I figured an analysis of this data might tell us something (preferably that there’s no correlation lol).

However, all government databases I found only provide aggregated data, such as death and birth rates, unfortunately. I know this may involve some data security and privacy concerns, but I would really just need these two linked dates to do the analysis, no names or anything.

If anyone has access to a structure like this, or perhaps an API that can make this data available, I would be very grateful. I promise to bring this complete study to reddit as soon as I finish it.
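
As a starting point for the analysis described above, the quantity of interest is probably the signed number of days between a death date and the nearest birthday. The sketch below is one hedged way to compute it with the standard library only. The function names are made up for illustration, and treating a Feb 29 birthday as Feb 28 in non-leap years is an assumption, not an established convention.

```python
from datetime import date


def birthday_in_year(birth, year):
    """Birthday anniversary in `year`; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return birth.replace(year=year)
    except ValueError:
        return date(year, 2, 28)


def days_from_nearest_birthday(birth, death):
    """Signed days from the nearest birthday to the death date.

    Negative means the person died shortly before a birthday,
    positive means shortly after.
    """
    candidates = [birthday_in_year(birth, death.year + k) for k in (-1, 0, 1)]
    nearest = min(candidates, key=lambda b: abs((death - b).days))
    return (death - nearest).days


# Made-up example dates, not real people.
print(days_from_nearest_birthday(date(1950, 6, 15), date(2020, 6, 10)))
print(days_from_nearest_birthday(date(1950, 12, 30), date(2021, 1, 2)))
```

With only the two linked dates per person, a histogram of this value over the whole dataset would show whether deaths cluster around birthdays.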

submitted by /u/alchamiwa

Dataset With Categorical And Numerical Variables Both

Hi, I’m looking for a dataset with at least three numerical variables and two categorical variables. It should be easy enough to look for, but I’m having trouble finding any that match the requirements. Any suggestions for resources where I can look?

The dataset is for a project; we aren’t allowed to use built-in or made-up data, or data from places like Kaggle, etc.

submitted by /u/viridiancityy