Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Looking For A Dataset Of Coffees And Flavour Notes

Hey everyone!

I’m building a Swift app that recommends the next coffee a user should buy based on how they rated previous coffees (e.g. how much acidity they like, or whether they prefer chocolate notes).

What kind of dataset might I need for this? Do you have any idea where to find this?

Thanks for any help; I’m early in my development journey!
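Since the app scores coffees on attributes such as acidity or chocolate notes, the dataset mainly needs one row per coffee with numeric flavour scores. The sketch below is a minimal content-based recommender in Python (the app itself is Swift, but the logic ports directly). It assumes hypothetical attribute names, flavour scores on a 0 to 1 scale, and user ratings on any numeric scale:

```python
import math

# Hypothetical flavour attributes; a real dataset would define its own columns.
ATTRIBUTES = ("acidity", "chocolate", "fruit", "body")


def vector(coffee: dict) -> list[float]:
    """Turn a coffee's flavour scores (0 to 1 assumed) into a vector; missing = 0."""
    return [float(coffee.get(a, 0.0)) for a in ATTRIBUTES]


def user_profile(rated: list[tuple[dict, float]]) -> list[float]:
    """Weight each rated coffee by how far its rating is from the user's mean.

    Coffees rated above average pull the profile towards their flavours;
    coffees rated below average push it away.
    """
    mean = sum(r for _, r in rated) / len(rated)
    profile = [0.0] * len(ATTRIBUTES)
    for coffee, rating in rated:
        weight = rating - mean
        for i, value in enumerate(vector(coffee)):
            profile[i] += weight * value
    return profile


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(rated, candidates, k=3):
    """Return the k candidate coffees most similar to the user's profile."""
    profile = user_profile(rated)
    ranked = sorted(candidates, key=lambda c: cosine(profile, vector(c)), reverse=True)
    return ranked[:k]
```

Weighting by distance from the user's mean rating means disliked coffees actively steer recommendations away from their flavours instead of being ignored. Free-text tasting notes would first need to be mapped to numeric scores like these.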

submitted by /u/CodesMacabre

Searching For Datasets By Number Of Records?

Hi all! This is my first time making a Reddit post; I looked around but couldn’t find an answer. I have a final assignment for a data analytics course that requires finding datasets with a specific number of entries: two datasets on a similar topic, each with between 7,000 and 10,000 entries (rows), to analyze. Any recommendations on where to look for datasets where I could filter my search by the number of records?
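Whatever catalogue is used, it is worth confirming the row count locally once a candidate file is downloaded. A minimal Python sketch, assuming the candidate datasets are CSV files in one folder; it uses the csv module rather than counting lines, because quoted fields may contain line breaks:

```python
import csv
from pathlib import Path


def count_records(path: Path) -> int:
    """Count non-blank data rows in a CSV file, excluding the header row."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header, if any
        return sum(1 for row in reader if row)  # blank lines parse as []


def files_in_range(folder: Path, low: int = 7_000, high: int = 10_000) -> list[Path]:
    """Return the CSV files in `folder` whose record count lies in [low, high]."""
    return [p for p in sorted(folder.glob("*.csv")) if low <= count_records(p) <= high]
```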

submitted by /u/Unable-Date4212

Looking For Information About How Much Each Productive Area Contributes To A Country’s GDP

Hi, I’m currently working on a personal project where I’m trying to get insights into different countries over the years: poverty, GDP, population.

Right now I’m looking for a dataset that shows the main activities each country gets its GDP from, for example: mining, agriculture, petroleum production, industry, construction, fishing, etc.

Do you know of any reliable sources where I can get these? I know each individual country may have its own public information, but it is unstructured, and looking it up for every country across the years (let’s say the past 30) would mean more than 6,180 individual searches, which is practically impossible.
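A single structured source helps because one long-format table (one row per country, year and sector) replaces thousands of individual lookups. As a hedged sketch, assuming such a table of sector value added with hypothetical field layout (country, year, sector, value_added), each sector's share for a country and year is its value divided by that country-year's total:

```python
from collections import defaultdict


def sector_shares(rows):
    """Compute each sector's share of total value added per (country, year).

    `rows` is an iterable of (country, year, sector, value_added) tuples in
    long format, i.e. one row per country-year-sector.
    Returns {(country, year): {sector: share}}; country-years whose total
    is zero are left out to avoid dividing by zero.
    """
    totals = defaultdict(float)
    by_key = defaultdict(dict)
    for country, year, sector, value in rows:
        key = (country, year)
        totals[key] += value
        by_key[key][sector] = by_key[key].get(sector, 0.0) + value
    return {
        key: {sector: value / totals[key] for sector, value in sectors.items()}
        for key, sectors in by_key.items()
        if totals[key]
    }
```

Shares of value added are not exactly shares of GDP: GDP at market prices also includes taxes less subsidies on products, so the sector figures will not sum precisely to reported GDP.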

submitted by /u/PanchoZansa

Used This Dataset For A Paper, But Cannot Find The Source

Hello! I am using a dataset from Kaggle.com that deals with credit card fraud. I unknowingly used this dataset:

https://www.kaggle.com/datasets/nelgiriyewithana/credit-card-fraud-detection-dataset-2023/data

and I cannot find a source for this one specifically anywhere.
It seems to be based on the popular one here:
https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud/data

Has anyone worked with the first one?
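One quick sanity check for the "based on" question is to compare the two files' columns. A minimal Python sketch, assuming both datasets have been downloaded as CSV files (the paths passed in are placeholders):

```python
import csv


def compare_headers(path_a: str, path_b: str) -> dict:
    """Compare the column names of two CSV files, preserving column order."""
    def header(path):
        with open(path, newline="", encoding="utf-8") as f:
            return next(csv.reader(f), [])  # first row, or [] for an empty file

    a, b = header(path_a), header(path_b)
    return {
        "shared": [c for c in a if c in b],
        "only_in_a": [c for c in a if c not in b],
        "only_in_b": [c for c in b if c not in a],
    }
```

Shared column names are consistent with one dataset being derived from the other but do not prove it, so the citation question still depends on the uploader's documentation.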

submitted by /u/WaffleBoi014

I Am Looking For Two Different Picture Datasets For A Neural Network Classification Task. One That Is Very Context-dependent And One That Isn’t.

For an assignment, I would like to compare a neural network with a CNN backbone and one with a Vision Transformer (ViT) backbone on two different datasets. The idea is that context is really important for one dataset and less so for the other. The hypothesis is that ViT will perform better when context is important and the CNN when it’s less important. It’s kinda hard to define context in a picture, but one example of pictures with less context might be these facial expressions (https://www.kaggle.com/datasets/juniorbueno/rating-opencv-emotion-images), and an example where context is more important would be these more generic emotion pictures (https://www.kaggle.com/datasets/sanidhyak/human-face-emotions). This combo seems perfect, but the second dataset is too small. Do you know of any datasets that capture the same idea but are larger?

submitted by /u/Limp_Award2427

HELP: I Need Yearly Precipitation/Temperature Data By Country (Focus On The EU Member States) For Approximately The Last 10 Years

I am doing an important school assignment and I’m struggling to find the data. I thought weather data would be accessible and easy to find, especially for Europe, but apparently not. I need “raw” data, that is, not already summed up for the whole decade, but values for each year that I can do calculations with. Any help would be greatly appreciated. Thanks!
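If a source only offers monthly values per country, yearly figures can be derived from them. A minimal Python sketch, assuming hypothetical records of (country, year, month, precipitation in mm, mean temperature in °C): precipitation is summed over the year, temperature is averaged, and incomplete years are skipped:

```python
from collections import defaultdict


def yearly_summary(monthly):
    """Aggregate monthly records into yearly values per (country, year)."""
    groups = defaultdict(dict)
    for country, year, month, precip, temp in monthly:
        # Keyed by month, so a duplicated month overwrites instead of double-counting.
        groups[(country, year)][month] = (precip, temp)

    result = {}
    for key, months in groups.items():
        if len(months) < 12:
            continue  # incomplete year: skip rather than report a biased total
        result[key] = {
            "precip_mm": sum(p for p, _ in months.values()),
            # Unweighted mean of monthly means; ignores differing month lengths.
            "mean_temp_c": sum(t for _, t in months.values()) / len(months),
        }
    return result
```

Weighting each monthly temperature by its number of days would give a slightly more accurate annual mean.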

submitted by /u/necichan

Datasets For Cleaning Practice – Specific Topics

Hi, I hope someone here can help. I am looking for a messy dataset for my assignment and I am hitting a wall.
It’s not as simple as finding any set: the assignment is to find a clean dataset and a messy one, join them on a common variable, and then perform analysis. So the two need to be on related topics and share a common variable.
I would like to work on the subject of gender representation, and I already have several clean sets with general demographic info, but I just cannot find anything messy enough (this is still beginner level, so they want more than just standardisation: I need something that includes observations stored as variables, missing data, etc.). I was hoping to find something on gender representation in politics by country, to then join to my clean sets on the country variable. Any help much much much appreciated!!
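"Observations as variables" typically means a wide table, for example one column per year, which has to be reshaped to one row per country and year before joining. A minimal standard-library Python sketch, assuming hypothetical column names (a `country` column plus year columns) and values that may carry stray "%" signs or blanks:

```python
def normalise(name) -> str:
    """Normalise a country name (collapse whitespace, title-case) for joining."""
    return " ".join(str(name).split()).title()


def tidy_messy(rows, id_col="country"):
    """Reshape wide rows (one column per year) into long country/year/value records."""
    long_rows = []
    for row in rows:
        country = normalise(row[id_col])
        for col, raw in row.items():
            if col == id_col:
                continue
            try:
                value = float(str(raw).strip().rstrip("%"))
            except ValueError:
                value = None  # blank, "n/a" or similar: keep as missing, don't drop
            long_rows.append({"country": country, "year": int(col), "value": value})
    return long_rows


def left_join(clean, long_rows):
    """Attach messy-set values to every clean row with the same normalised country."""
    by_country = {}
    for r in long_rows:
        by_country.setdefault(r["country"], []).append(r)
    joined = []
    for row in clean:
        matches = by_country.get(normalise(row["country"]), [{"year": None, "value": None}])
        for m in matches:
            joined.append({**row, "year": m["year"], "value": m["value"]})
    return joined
```

Title-casing is a crude key normalisation (it will not reconcile "Czechia" with "Czech Republic"); a lookup table of ISO country codes is more robust.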

submitted by /u/MagdaMc85

Looking For A Financial Statements Data Set

Hello! As a training project, I want to build several demo dashboards:
– financial statements: profit and loss, cashflow, balance sheet;
– sales report.
For this, I’m looking for a high-quality dataset. If you have data you can share for these purposes, or know where such data can be found or how it could be generated, I’d be grateful.
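On the "how it could be generated" option: a demo sales table can be synthesised with a seeded random generator and rolled up into a simple monthly profit and loss view. A minimal Python sketch; the product names, price ranges and cost ratios are entirely made up for illustration:

```python
import random
from collections import defaultdict
from datetime import date


def synthetic_sales(n: int, seed: int = 42) -> list[dict]:
    """Generate n fake sales lines (hypothetical products and price ranges)."""
    rng = random.Random(seed)  # fixed seed -> reproducible demo data
    products = {"Widget": (10, 25), "Gadget": (40, 90), "Gizmo": (5, 12)}
    rows = []
    for _ in range(n):
        name = rng.choice(list(products))
        low, high = products[name]
        price = round(rng.uniform(low, high), 2)
        qty = rng.randint(1, 5)
        rows.append({
            "date": date(2023, rng.randint(1, 12), rng.randint(1, 28)),
            "product": name,
            "revenue": round(price * qty, 2),
            # Cost of goods sold as 40-70% of revenue (an arbitrary assumption).
            "cogs": round(price * qty * rng.uniform(0.4, 0.7), 2),
        })
    return rows


def monthly_pnl(rows):
    """Roll sales lines up into monthly revenue, COGS and gross profit."""
    pnl = defaultdict(lambda: {"revenue": 0.0, "cogs": 0.0})
    for r in rows:
        month = r["date"].strftime("%Y-%m")
        pnl[month]["revenue"] += r["revenue"]
        pnl[month]["cogs"] += r["cogs"]
    return {
        m: {**v, "gross_profit": round(v["revenue"] - v["cogs"], 2)}
        for m, v in sorted(pnl.items())
    }
```

This only covers revenue, cost of goods sold and gross profit; a cash flow statement and balance sheet would need further accounts (receivables, payables, assets) generated consistently with these sales.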

submitted by /u/According_Scheme_553

Introducing CCI: A High-Quality Chinese Internet Language Dataset For AI

Hello r/datasets Community,

We’re excited to introduce the Chinese Corpora Internet (CCI) dataset v1.0.0, a high-quality Chinese internet language dataset, meticulously developed by BAAI with the support of leading institutions and tech partners. CCI is designed to be the cornerstone of AI research requiring high-quality Chinese language data.

CCI’s standout features:

– Vast Scale: CCI offers an impressive 104GB of data, providing a broad spectrum of linguistic information.
– Time Span: The dataset encompasses over two decades of data, from January 2001 to November 2023, offering historical depth and contemporary relevance.
– Quality Sources: Data is sourced from trusted and authoritative Chinese internet platforms, ensuring high fidelity and relevance.
– Rigorous Processing: CCI has undergone extensive cleaning, deduplication, and quality checks to ensure the highest standards of data integrity.
– Safe and Reliable: With a focus on safety and reliability, CCI has been filtered through advanced techniques to remove any sensitive or inappropriate content.
– Benchmark Filtering: Unique to CCI, we’ve implemented stringent checks against mainstream Chinese benchmark datasets to prevent “teaching to the test” in model training.
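For readers unfamiliar with the deduplication and benchmark-filtering steps listed above, both can be illustrated with a generic sketch. This is not BAAI's pipeline (the post does not describe it); it assumes exact-match hashing for deduplication and character n-gram overlap for contamination, with an arbitrary default of n = 13. Character n-grams suit Chinese text, which has no spaces between words:

```python
import hashlib


def char_ngrams(text: str, n: int = 13) -> set[str]:
    """Character n-grams of `text` with all whitespace removed."""
    text = "".join(text.split())  # drop whitespace so layout does not matter
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def exact_dedup(docs: list[str]) -> list[str]:
    """Keep the first copy of each document, comparing SHA-256 hashes."""
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def drop_contaminated(docs: list[str], benchmark_items: list[str], n: int = 13) -> list[str]:
    """Remove documents sharing any character n-gram with a benchmark item."""
    bench: set[str] = set()
    for item in benchmark_items:
        bench |= char_ngrams(item, n)
    return [d for d in docs if not (char_ngrams(d, n) & bench)]
```

Documents shorter than n characters yield no n-grams and are never dropped, and exact hashing misses near-duplicates, which techniques such as MinHash are designed to catch.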

Download CCI and join us in shaping the future of AI:

BAAI Open Data Repository: https://data.baai.ac.cn/details/BAAI-CCI
HuggingFace: https://huggingface.co/datasets/BAAI/CCI-Data

We’re eager to see the innovative applications and research that will emerge from the community’s use of CCI. Your participation and feedback are crucial to the continuous improvement of this dataset.

Cheers,

The BAAI Team

Supported by: CSAC, Beijing Municipal Cyberspace Administration, Beijing Municipal Science & Technology Commission, Zhongguancun Administrative Committee, Haidian District Government, our tech partners TRS and Wenge.

submitted by /u/lukai-baai

Where To Start On An Offensive Cybersecurity Dataset?

Looking to create a dataset of offensive and defensive cybersecurity techniques. The dataset would be used in a class project to teach an AI model and refine its chat responses. Can anyone recommend some sources, and what would a row/column look like? I know quantitative data is usually preferred, so how would this work with qualitative data? Also, any recommendations for a web scraping application, rather than developing my own script? Thanks
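On what a row might look like: qualitative data can still be tabular if each row pairs free text with a few categorical labels. A hypothetical Python schema, with field names chosen for illustration only, written out as JSON Lines (one JSON object per line), a format commonly used for chat fine-tuning data:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TechniqueRecord:
    """One row of a hypothetical techniques dataset (all field names assumed)."""
    technique: str   # short name of the technique
    side: str        # "offensive" or "defensive"
    category: str    # free-form grouping label, e.g. "reconnaissance"
    prompt: str      # question a user might ask the chat model
    response: str    # reference answer for fine-tuning or evaluation
    source_url: str  # where the information came from, for citation

    def __post_init__(self):
        # Reject labels outside the two allowed values so the column stays clean.
        if self.side not in ("offensive", "defensive"):
            raise ValueError(f"unexpected side: {self.side!r}")


def write_jsonl(records, path):
    """Write one JSON object per line (the JSON Lines format)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(asdict(rec), ensure_ascii=False) + "\n")
```

The free-text `prompt` and `response` fields carry the qualitative content, while `side` and `category` make it possible to count and filter rows quantitatively.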

submitted by /u/Aisechopeful

How Do I Go About Selling My Personal Data?

Hey guys,

Quick question – how does an individual go about selling their personal data at a strictly individual level (e.g. browsing history, shopping habits, location, etc.)?

Also what data can be sold at this level?

Thinking of starting a super user-friendly app for individuals to sell their data and make a few extra dollars per month.

submitted by /u/AsadExec

Need To Improve My Skills And Need A Data Set For Research

Ideally, I’d like a dataset on higher education, community education, emergencies, emergency medical technicians, films, and/or anything to do with social gerontology. I am supposed to be improving my SAS, Stata and SPSS skills. I’m supposed to be working with data for my research project, but the data I have is either too big for me to open, restricted so that I can’t get approval to use it, or not big enough. I am trying to get better at working with datasets, but I need ones that are free to use. Please save me from the failure that is writing my own dataset.
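For the "too big to open" case, one workaround is to extract a random sample small enough for SAS, Stata or SPSS to load. A minimal Python sketch, assuming the data is a CSV file with a header row; it uses reservoir sampling, so the file is streamed once and only k rows are held in memory:

```python
import csv
import random


def sample_csv(src: str, dst: str, k: int, seed: int = 0) -> int:
    """Write a uniform random sample of k data rows from src to dst.

    Reservoir sampling: the i-th row replaces a random reservoir slot with
    probability k/i, so every row ends up in the sample with equal chance.
    Returns the total number of data rows seen.
    """
    rng = random.Random(seed)  # fixed seed -> the same sample every run
    reservoir = []
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        seen = 0
        for row in reader:
            seen += 1
            if len(reservoir) < k:
                reservoir.append(row)
            else:
                j = rng.randrange(seen)  # uniform in 0 .. seen-1
                if j < k:
                    reservoir[j] = row
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(reservoir)
    return seen
```

Sampling keeps the file's structure intact for practising with the statistics packages, though any analysis on the sample should be reported as such.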

submitted by /u/Rajah_1994