Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Datasets For Cleaning Practice – Specific Topics

Hi, I hope someone here can help – I am looking for a messy dataset for my assignment and I am hitting a wall.
It is not as simple as just any set – the assignment is to find a clean one and a messy one, join them on a common variable, and then perform analysis. So the two need to be somewhat related topic-wise and share a common variable.
I would like to work on the subject of gender representation, and I already have several clean sets with general demographic info, but I just cannot find anything messy enough (this is still beginner level, so they don’t want me to do only standardisation etc.; I need something that includes observations as variables, missing data and so on). I was hoping to find something on gender representation in politics by country, to then join to my clean sets on the country variable. Any help much, much, much appreciated!!
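
For readers with the same assignment, here is a minimal sketch of the two steps described above: reshaping a table whose observations (years) are stored as variables, then joining it to a clean table on country. All rows, column names and values are made-up placeholders, not real figures, and the helper names are hypothetical. It uses only the standard library; pandas `melt` and `merge` cover the same ground.

```python
# Hypothetical toy data: none of these values come from a real dataset.
messy_rows = [
    {"country": " sweden ", "2019": "47.3", "2020": "", "2021": "46.1"},
    {"country": "JAPAN", "2019": "10.2", "2020": "9.9", "2021": "n/a"},
]
clean_rows = [
    {"country": "Sweden", "population_millions": 10.4},
    {"country": "Japan", "population_millions": 125.7},
]


def to_float(text):
    """Return a float, or None when the cell is blank or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def tidy(rows, id_column="country"):
    """Turn the year columns into rows: one (country, year, value) record each."""
    records = []
    for row in rows:
        # Standardise the join key so " sweden " and "JAPAN" match the clean table.
        key = row[id_column].strip().title()
        for column, cell in row.items():
            if column == id_column:
                continue
            records.append(
                {id_column: key, "year": int(column), "pct_women": to_float(cell)}
            )
    return records


def left_join(left, right, on="country"):
    """Keep every left record and add the matching right-hand columns, if any."""
    lookup = {record[on]: record for record in right}
    return [{**record, **lookup.get(record[on], {})} for record in left]


joined = left_join(tidy(messy_rows), clean_rows)
for record in joined:
    print(record)
```

Missing cells are kept as `None` rather than dropped, so the decision about how to handle them stays visible in the analysis step.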

submitted by /u/MagdaMc85

Looking For A Financial Statements Data Set

Hello! As a training project, I want to build several demo dashboards:
– financial statements: profit and loss, cashflow, balance sheet;
– sales report.
In this regard, I’m looking for a high-quality data set. If you have data that you can provide for my purposes or information about sources where it can be found or how it can be generated, I’ll be grateful.

submitted by /u/According_Scheme_553

Introducing CCI: A High-Quality Chinese Internet Language Dataset For AI

Hello r/datasets Community,

We’re excited to introduce the Chinese Corpora Internet (CCI) dataset v1.0.0, a high-quality Chinese internet language dataset, meticulously developed by BAAI with the support of leading institutions and tech partners. CCI is designed to be the cornerstone of AI research requiring high-quality Chinese language data.

CCI’s standout features:

– Vast Scale: CCI offers an impressive 104GB of data, providing a broad spectrum of linguistic information.
– Time Span: The dataset encompasses over two decades of data, from January 2001 to November 2023, offering historical depth and contemporary relevance.
– Quality Sources: Data is sourced from trusted and authoritative Chinese internet platforms, ensuring high fidelity and relevance.
– Rigorous Processing: CCI has undergone extensive cleaning, deduplication, and quality checks to ensure the highest standards of data integrity.
– Safe and Reliable: With a focus on safety and reliability, CCI has been filtered through advanced techniques to remove any sensitive or inappropriate content.
– Benchmark Filtering: Unique to CCI, we’ve implemented stringent checks against mainstream Chinese benchmark datasets to prevent “teaching to the test” in model training.

Download CCI and join us in shaping the future of AI:

– BAAI Open Data Repository: https://data.baai.ac.cn/details/BAAI-CCI
– HuggingFace: https://huggingface.co/datasets/BAAI/CCI-Data

We’re eager to see the innovative applications and research that will emerge from the community’s use of CCI. Your participation and feedback are crucial to the continuous improvement of this dataset.

Cheers,

The BAAI Team

Supported by: CSAC, Beijing Municipal Cyberspace Administration, Beijing Municipal Science & Technology Commission, Zhongguancun Administrative Committee, Haidian District Government, and our tech partners TRS and Wenge.

submitted by /u/lukai-baai

Where To Start For Offensive Cybersecurity Dataset?

I’m looking to create a dataset of offensive and defensive cybersecurity techniques. It would be used in a class project for teaching and refining an AI model’s chat responses. Can anyone recommend some sources and describe what a row/column would look like? I know the preferred method is quantitative data, so how would this work with qualitative data? Also, are there any recommendations for a web-scraping application, other than me developing a script? Thanks

submitted by /u/Aisechopeful

How Do I Go About Selling My Personal Data?

Hey guys,

Quick question – how does an individual go about selling their personal data at a strictly individual level (e.g. browsing history, shopping habits, location etc.)?

Also what data can be sold at this level?

I’m thinking of starting a super user-friendly app for individuals to sell their data and make a few extra dollars per month.

submitted by /u/AsadExec

Need To Improve My Skills And Need A Data Set For Research

I would prefer a data set on higher education, community education, emergencies, emergency medical technicians, films, and/or anything to do with social gerontology. I am supposed to be improving my SAS, Stata and SPSS skills. I’m supposed to be working with data for my research project, but the data I have is either too big for me to open, something I can’t get approval to use, or not big enough. I am trying to get better at using datasets, but I need ones that are free to use. Please save me from the failure that is writing my own dataset.

submitted by /u/Rajah_1994

Free Platform For Finding Any Data Using LLM

Hi Everyone,

I created a platform that aggregates and stores data from across the web, and has an LLM chat assistant to help you find the data best suited to your use case.

I would be happy if you have any feedback to share, and let me know how that would compare to more traditional methods of finding data through a search bar.

Feel free to use it below and let me know :), hope it helps:

https://www.cognidex.net/

submitted by /u/XhoniShollaj

Looking For Specific Data Set For Multiple Regression

I need to find a data set that has variables that lend themselves to analysis by some form of multiple regression; it must have at least 15 cases per predictor; it must have at least 3 predictor variables; it should have both quantitative and categorical predictors; and it should have at least one quantitative dependent variable.

Is there a site where I can filter all these specifics?
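
No filter for these criteria is named in the post, but once a candidate dataset is loaded they are easy to check by hand. A minimal sketch, assuming the data is a list of dictionaries and that "quantitative" simply means every value in the column is a number; the function names and the toy rows are hypothetical.

```python
from numbers import Number


def is_quantitative(values):
    """Treat a column as quantitative when every value is a non-boolean number."""
    return all(isinstance(v, Number) and not isinstance(v, bool) for v in values)


def check_regression_fit(rows, outcome, predictors, cases_per_predictor=15):
    """Report which of the criteria listed in the post a dataset satisfies."""
    columns = {name: [row[name] for row in rows] for name in [outcome, *predictors]}
    quantitative = [p for p in predictors if is_quantitative(columns[p])]
    categorical = [p for p in predictors if p not in quantitative]
    return {
        "at_least_3_predictors": len(predictors) >= 3,
        "enough_cases": len(rows) >= cases_per_predictor * len(predictors),
        "has_quantitative_predictor": bool(quantitative),
        "has_categorical_predictor": bool(categorical),
        "quantitative_outcome": is_quantitative(columns[outcome]),
    }


# Made-up rows purely to exercise the check: 45 cases, 3 predictors.
rows = [
    {"score": 50.0 + i, "hours": i % 10, "age": 18 + i % 30, "school": "A" if i % 2 else "B"}
    for i in range(45)
]
report = check_regression_fit(rows, "score", ["hours", "age", "school"])
print(report)
```

With three predictors the 15-cases rule needs at least 45 rows, which is why the toy data stops exactly there.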

submitted by /u/kevinalways

Data On EU Countries. Something Other Than UNdata And Eurostat?

Hello.
My goal is to find certain statistical information about different countries of the European Union (related to things like employment, crime, cost of living, immigration, social safety nets etc.); however, I’m quite new to this and I have no idea where to look.
I have found two major sources of data, Eurostat and UNdata, but I was wondering if there are other sources out there that I couldn’t find on Google?

submitted by /u/420-big-chungus-kean

Is Searching For Datasets The Hardest Part? Looking For CSV Paired Real-world Dataset So I Can Run Some Python Analysis.

Hello, so I’ve been searching for over an hour on various repositories. I’m looking for a dataset that has before-and-after numerical results. It can be test grades before and after an intervention, blood pressure before and after an intervention, etc… anything like that. I feel like I just don’t know how to search for this properly.
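
For context, the usual analysis for data of this shape is a paired comparison on the per-subject differences. A minimal sketch using only the standard library, assuming two equal-length lists where position i is the same subject before and after; the readings below are invented for illustration. `scipy.stats.ttest_rel` performs the same test and also returns a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Work on per-subject differences (after minus before).
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation; a value of zero would make t undefined.
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat, len(diffs) - 1


# Invented blood-pressure readings for five hypothetical subjects.
before = [140, 152, 138, 147, 160]
after = [132, 148, 135, 140, 151]

mean_diff, t_stat, df = paired_t(before, after)
print(f"mean change = {mean_diff:.1f}, t = {t_stat:.2f}, df = {df}")
```

Searching repositories for terms such as "paired", "pre-post" or "repeated measures" tends to surface datasets with this structure.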

submitted by /u/Enochwel

Non Aggregated Individual Level Dataset Needed Urgently

Hi all,

I need a non-aggregated, individual-level, non-synthesized dataset, in English and from a credible source, with a combination of qualitative and quantitative data.

This is for an assignment and the lecturer is not amenable to any deviations from the above.

I thought I could use census data but a lot of the data I found is aggregated. Surveys are often simulated.

Any help at all would be appreciated. Thank you!

submitted by /u/reader20not

Looking For Survey With +100 Questions

Hey guys,

I’m looking for a finished survey with over 100 questions. It doesn’t have to have a lot of participants, but the more, the better of course. It’s for my thesis in mathematics: there is a new theory we are trying to use in practice, so I don’t care what field it is in or how old it is. Any hint or dataset would be appreciated.

Thanks

submitted by /u/juggerjaxen

Looking For Australian Stock Dataset

Hello,

I am looking for an Australian stock market dataset covering all companies, for a client project. They gave me a link to the Yahoo Finance website, as they need company stock data from there. At first I thought of scraping, but the site may change and I need dynamic data. Is there an API for stock data on all Australian companies?

submitted by /u/Turbulent_Setting_59

I Need A List Of Offensive Words And Slurs

So, the thing is, I want a little bit of code that checks what the user enters as their player character’s name. If it matches an offensive word, the game will throw a secret easter egg, commenting funny things and then basically saying, you can’t do that bro.
I have the code set up and working. But the thing is, it’s so hard to manually input everything I can think of.
I just need a list of those words. I found a list on the internet, but the sad thing is… well…….. according to the list, ‘arab’ is an offensive word. So is ‘black’ or ‘whites’.
I just need a good list, with solid words, that will NOT cause any controversies.
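
Whatever list ends up being used, how the match is done matters as much as the words. A minimal sketch, assuming a small curated blocklist and whole-word matching after normalisation; the entries below are harmless placeholders rather than a real list, and the function names are hypothetical.

```python
import re

# Placeholder entries only; swap in a curated list reviewed for your audience.
BLOCKLIST = {"badword", "rudeterm"}


def tokens(name):
    """Lower-case the name and split it into runs of letters."""
    return re.findall(r"[a-z]+", name.lower())


def is_blocked(name, blocklist=BLOCKLIST):
    """Block a name only when one of its whole words is on the list."""
    return any(token in blocklist for token in tokens(name))


for candidate in ["BadWord", "xX_badword_Xx", "Black Knight", "Arab"]:
    verdict = "blocked" if is_blocked(candidate) else "allowed"
    print(f"{candidate!r}: {verdict}")
```

Matching whole words instead of substrings keeps innocent names that merely contain a listed string from tripping the check, and keeping the list short and hand-reviewed avoids flagging neutral words such as those mentioned above.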

submitted by /u/INGENAREL