Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

I Am Looking For Two Different Picture Datasets For A Neural Network Classification Task. One That Is Very Context-dependent And One That Isn’t.

For an assignment, I would like to compare a neural network with a CNN base and a neural network with a Vision Transformer (ViT) base on two different datasets. The idea is that context is really important for one dataset and less so for the other. The hypothesis is that the ViT will perform better when context is important and the CNN when it’s less important. It’s kinda hard to define context in a picture, but one example of pictures with less context might be these facial expressions (https://www.kaggle.com/datasets/juniorbueno/rating-opencv-emotion-images), and an example where context is more important would be these generic emotion pictures (https://www.kaggle.com/datasets/sanidhyak/human-face-emotions). This combo seems perfect, but the second dataset is too small. Do you know any datasets that capture the same idea but are larger?

submitted by /u/Limp_Award2427

HELP I Need Yearly Precipitation/temperature Data By Country (focus On The EU Member States) For Approximately Last 10 Years

I am doing an important school assignment and I’m struggling to find the data. I thought weather data would be accessible and easy to find, especially for Europe, but apparently not. I need “raw” data, that is, data not already summed up for the whole decade, but rather data for each year I can do calculations with. Any help would be greatly appreciated. Thanks!

submitted by /u/necichan

Datasets For Cleaning Practice – Specific Topics

Hi, I hope someone here can help. I am looking for a messy dataset for my assignment and I am hitting a wall.
It’s not as simple as just any set: the assignment is to find a clean one and a messy one, join them on a common variable, and then perform analysis. So they need to be somewhat related topic-wise and include a common variable.
I would like to work on the subject of gender representation and I already have several clean sets with general demographic info, but I just cannot find anything messy enough (this is still beginner level, so they don’t just want me to do standardisation etc.; I need something that includes observations as variables, missing data and so on). I was hoping to find something on gender representation in politics by country to then join to my clean sets on the country variable. Any help much much much appreciated!!
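For what it’s worth, here is a minimal sketch of the clean-up-and-join step described above, using only the Python standard library. Everything in it is an assumption made for illustration: the column names, the `share_women_pct` measure, and all the values are invented, not taken from any real dataset. It only shows the general shape of the task: year columns ("observations as variables") are turned into rows, missing cells are kept as missing, and the result is joined to a clean table on the country name.

```python
# Hypothetical messy table: years (observations) are spread across columns,
# country names are inconsistently cased/padded, and some cells are empty.
# All values are invented for illustration.
messy_rows = [
    {"Country": " france ", "2019": "39.7", "2020": "", "2021": "39.5"},
    {"Country": "GERMANY", "2019": "30.9", "2020": "31.2", "2021": ""},
]

# Hypothetical clean table keyed by a tidy country name (values invented).
clean_rows = [
    {"country": "France", "population_millions": 67.8},
    {"country": "Germany", "population_millions": 83.2},
]


def to_long(rows, id_column):
    """Turn one row per country with year columns into one row per country-year."""
    long_rows = []
    for row in rows:
        country = row[id_column].strip().title()  # normalise the join key
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append({
                "country": country,
                "year": int(column),
                # Keep missing cells as None instead of guessing a value.
                "share_women_pct": float(value) if value.strip() else None,
            })
    return long_rows


def join_on_country(long_rows, clean_rows):
    """Left-join the reshaped rows to the clean table on 'country'."""
    lookup = {row["country"]: row for row in clean_rows}
    joined = []
    for row in long_rows:
        extra = lookup.get(row["country"], {})
        joined.append({**extra, **row})
    return joined


tidy = to_long(messy_rows, "Country")
merged = join_on_country(tidy, clean_rows)
for record in merged:
    print(record)
```

A dataset counts as "messy enough" for this kind of exercise if it needs at least the reshaping and key-normalisation steps before the join works.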

submitted by /u/MagdaMc85

Looking For A Financial Statements Data Set

Hello! As a training project, I want to build several demo dashboards:
– financial statements: profit and loss, cashflow, balance sheet;
– sales report.
In this regard, I’m looking for a high-quality data set. If you have data that you can provide for my purposes or information about sources where it can be found or how it can be generated, I’ll be grateful.

submitted by /u/According_Scheme_553

Introducing CCI: A High-Quality Chinese Internet Language Dataset For AI

Hello r/datasets Community,

We’re excited to introduce the Chinese Corpora Internet (CCI) dataset v1.0.0, a high-quality Chinese internet language dataset, meticulously developed by BAAI with the support of leading institutions and tech partners. CCI is designed to be the cornerstone of AI research requiring high-quality Chinese language data.

CCI’s standout features:

– Vast Scale: CCI offers an impressive 104GB of data, providing a broad spectrum of linguistic information.
– Time Span: The dataset encompasses over two decades of data, from January 2001 to November 2023, offering historical depth and contemporary relevance.
– Quality Sources: Data is sourced from trusted and authoritative Chinese internet platforms, ensuring high fidelity and relevance.
– Rigorous Processing: CCI has undergone extensive cleaning, deduplication, and quality checks to ensure the highest standards of data integrity.
– Safe and Reliable: With a focus on safety and reliability, CCI has been filtered through advanced techniques to remove any sensitive or inappropriate content.
– Benchmark Filtering: Unique to CCI, we’ve implemented stringent checks against mainstream Chinese benchmark datasets to prevent “teaching to the test” in model training.

Download CCI and join us in shaping the future of AI:

BAAI Open Data Repository: https://data.baai.ac.cn/details/BAAI-CCI
HuggingFace: https://huggingface.co/datasets/BAAI/CCI-Data

We’re eager to see the innovative applications and research that will emerge from the community’s use of CCI. Your participation and feedback are crucial to the continuous improvement of this dataset.

Cheers,

The BAAI Team

Supported by: CSAC, Beijing Municipal Cyberspace Administration, Beijing Municipal Science & Technology Commission, Zhongguancun Administrative Committee, Haidian District Government, our tech partners TRS and Wenge.

submitted by /u/lukai-baai

Where To Start For Offensive Cybersecurity Dataset?

Looking to create a dataset of offensive and defensive cybersecurity techniques. The dataset would be used in a class project for teaching and refining an AI model’s chat responses. Can anyone recommend some sources and what a row/column would look like? I know the preferred method is quantitative data, so how would this work with qualitative data? Also, any recommendations for a web scraping application, other than developing a script myself? Thanks

submitted by /u/Aisechopeful

How Do I Go About Selling My Personal Data?

Hey guys,

Quick question – how does an individual go about selling their personal data at a strictly individual level (e.g. browsing history, shopping habits, location etc.)?

Also, what data can be sold at this level?

Thinking of starting a super user-friendly app for individuals to sell their data and make a few extra dollars per month.

submitted by /u/AsadExec

Need To Improve My Skills And Need A Data Set For Research

I would prefer a data set covering higher education, community education, emergencies, emergency medical technicians, films, and/or anything to do with social gerontology. I am supposed to be improving my SAS, Stata and SPSS skills and working with data for my research project, but the data I have is either too big for me to open, requires approval I can’t get, or isn’t big enough. I am trying to get better at using datasets, but I need ones that are free to use. Please save me from the failure that is writing my own dataset.

submitted by /u/Rajah_1994

Free Platform For Finding Any Data Using LLM

Hi Everyone,

I created a platform that aggregates and stores data from across the web and has an LLM chat assistant to help you find the data best suited to your use case.

I would be happy to hear any feedback you have, including how it compares to more traditional methods of finding data through a search bar.

Feel free to use it below and let me know :), hope it helps:

https://www.cognidex.net/

submitted by /u/XhoniShollaj

Looking For Specific Data Set For Multiple Regression

I need to find a data set that has variables that lend themselves to analysis by some form of multiple regression; it must have at least 15 cases per predictor; it must have at least 3 predictor variables; it should have both quantitative and categorical predictors; and it should have at least one quantitative dependent variable.

Is there a site where I can filter all these specifics?
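In case no site offers such a filter, the criteria above can be checked by hand on any candidate dataset. The following is a rough sketch under stated assumptions: the function name `check_regression_dataset`, the column names, and the example rows are all hypothetical, and "quantitative" is approximated as "every value in the column is a number".

```python
def check_regression_dataset(rows, outcome, predictors, min_cases_per_predictor=15):
    """Report whether a list-of-dicts dataset meets the criteria listed above."""

    def is_quantitative(column):
        # Treat a column as quantitative if every value is a number (bool excluded).
        return all(
            isinstance(row[column], (int, float)) and not isinstance(row[column], bool)
            for row in rows
        )

    quantitative = [p for p in predictors if is_quantitative(p)]
    categorical = [p for p in predictors if p not in quantitative]
    return {
        "at_least_3_predictors": len(predictors) >= 3,
        "enough_cases": len(rows) >= min_cases_per_predictor * len(predictors),
        "has_quantitative_predictor": bool(quantitative),
        "has_categorical_predictor": bool(categorical),
        "quantitative_outcome": is_quantitative(outcome),
    }


# Hypothetical example: 45 invented cases, two numeric predictors, one category.
rows = [
    {"salary": 30000 + 500 * i, "age": 20 + i, "years_experience": i % 10,
     "sector": "public" if i % 2 else "private"}
    for i in range(45)
]
report = check_regression_dataset(rows, "salary", ["age", "years_experience", "sector"])
print(report)
print("Suitable:", all(report.values()))
```

With three predictors, the 15-cases-per-predictor rule means at least 45 rows, which is why the invented example uses exactly 45.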

submitted by /u/kevinalways

Data On EU Countries. Something Other Than UNdata And Eurostat?

Hello.
My goal is to find statistical information about the countries of the European Union (employment, crime, cost of living, immigration, social safety nets, etc.). However, I’m quite new to this and have no idea where to look.
I have found two major sources of data, Eurostat and UNdata, but I was wondering whether there are other sources out there that I couldn’t find on Google.

submitted by /u/420-big-chungus-kean

Is Searching For Datasets The Hardest Part? Looking For CSV Paired Real-world Dataset So I Can Run Some Python Analysis.

Hello, so I’ve been searching for over an hour on various repositories. I’m looking for a dataset that has before-and-after numerical results. It could be test grades before and after an intervention, blood pressure before and after an intervention, etc.; anything like that. I feel like I just don’t know how to properly search for this.
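To make the target shape concrete: a paired dataset has one row per subject with a "before" and an "after" value, and the analysis works on the per-subject differences. Here is a minimal sketch using only the Python standard library; the numbers are invented purely for illustration and do not come from any real study.

```python
import math
import statistics

# Hypothetical paired measurements: the same five people before and after.
before = [140, 152, 138, 147, 160]
after = [132, 148, 135, 140, 151]

# Paired analysis looks at each subject's change, not at the two groups separately.
differences = [a - b for a, b in zip(after, before)]
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample standard deviation
t_statistic = mean_diff / (sd_diff / math.sqrt(len(differences)))

print(f"mean change: {mean_diff:.2f}")
print(f"paired t statistic: {t_statistic:.2f} (df = {len(differences) - 1})")
```

Search terms such as "paired", "pre-test/post-test", or "repeated measures" tend to describe datasets with this layout.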

submitted by /u/Enochwel

Non Aggregated Individual Level Dataset Needed Urgently

Hi all,

I need a non-aggregated, individual-level, non-synthesized dataset, in English and from a credible source, with a combination of qualitative and quantitative data.

This is for an assignment and the lecturer is not amenable to any deviations from the above.

I thought I could use census data but a lot of the data I found is aggregated. Surveys are often simulated.

Any help at all would be appreciated. Thank you!

submitted by /u/reader20not