Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

[self-promo] Aviation Safety Network (ASN) Dataset

If you’re looking for reliable and up-to-date information on civil aviation accidents and incidents, the Aviation Safety Network (ASN) dataset may be just what you need. This global database has information on more than 100,000 accidents and incidents that have happened since 1919. The dataset can be downloaded as a CSV file for further analysis. The CSV file has the following columns:

Date – Date of the accident
Type – Type of aircraft
registration – Registration of the aircraft
operator – Operator of the aircraft
fatalities – Number of fatalities
location – Location of the accident
country – Country of the accident
cat – Category of the accident described by ASN
year – Year of the accident

It is available for download at the GitHub link below:
https://github.com/alsonpr/Aviation-Safety-Network-Dataset

submitted by /u/woolly-mamoth

Old GAN Site – Thispersondoesnotexist

There used to be a site, thispersondoesnotexist.com, that generated artificial human face images with a GAN (originally a project done at NVIDIA). That site has been replaced by another one, https://this-person-does-not-exist.com/en, which adds watermarks etc.

Does anyone have the dataset of those AI-generated images (1024×1024 px)? I found a few on Kaggle datasets, but they are not at the same resolution as the original images generated by the site. If so, can you please share the links to the dataset?

submitted by /u/pythoslabs

How To Represent Large Categorical Data?

I have 10 large numerical datasets, each with 3 generic categories. Each row contains unique data. The last column of each dataset contains the category label. The categories are not tied to particular rows, so any row may belong to any of the 3 categories.

e.g.

Date      Value    Category
1/1/2010  1.11111  Alpha
2/1/2010  2.11111  Beta
3/1/2010  2.00009  Alpha
4/1/2010  0.00000  Charlie

But the 10 datasets have different volumes of data. E.g. dataset A may have 10K rows, dataset B around 100K, dataset C 1 million, etc.

I couldn’t process all the data as it’s too large.

What would be the best way to sample each dataset? I’d like the sample to contain a fair representation of the 3 categories.
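One possible approach, offered only as a sketch, is stratified reservoir sampling: stream each file once and keep up to k randomly chosen rows per category, so memory use stays fixed no matter how many rows the dataset has. The function name, the CSV layout, and the value of k below are assumptions for illustration; the column name "Category" is taken from the example above.

```python
import csv
import io
import random
from collections import defaultdict


def stratified_reservoir_sample(rows, label_key, k, seed=0):
    """Keep up to k uniformly chosen rows per category in a single pass."""
    rng = random.Random(seed)
    reservoirs = defaultdict(list)  # category -> sampled rows
    seen = defaultdict(int)         # category -> number of rows seen so far
    for row in rows:
        label = row[label_key]
        seen[label] += 1
        bucket = reservoirs[label]
        if len(bucket) < k:
            # The reservoir for this category is not full yet.
            bucket.append(row)
        else:
            # Keep the new row with probability k / seen[label].
            j = rng.randrange(seen[label])
            if j < k:
                bucket[j] = row
    return dict(reservoirs)


# Toy data in the layout from the post (assumed to be comma-separated).
# A real file would be opened with open(path, newline="") instead.
demo_csv = io.StringIO(
    "Date,Value,Category\n"
    "1/1/2010,1.11111,Alpha\n"
    "2/1/2010,2.11111,Beta\n"
    "3/1/2010,2.00009,Alpha\n"
    "4/1/2010,0.00000,Charlie\n"
)
sample = stratified_reservoir_sample(csv.DictReader(demo_csv), "Category", k=1)
```

Using the same k for every category gives a balanced sample. If the sample should instead mirror each dataset's own category proportions, a plain uniform reservoir over all rows would be the simpler choice.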

submitted by /u/runnersgo

Cannot Find The I2R Dataset On The Internet.

I have been studying a paper and I noticed that they were using video from a dataset called I2R. I tried searching for this dataset but wasn’t able to find it. Does it have a different name or is this dataset not available publicly?

Specifically, the paper mentioned the WaterSurface dataset, Campus, Waving trees, fountain, curtain and switch light datasets.

I am looking for these datasets to apply a background/foreground separation algorithm.

submitted by /u/Curious_Analyst986

Daily Cash And Debt Operations Of The U.S. Treasury 2005-2023

The Daily Treasury Statement (DTS) dataset contains a series of tables showing the daily cash and debt operations of the U.S. Treasury. The data includes operating cash balance, deposits and withdrawals of cash, public debt transactions, federal tax deposits, income tax refunds issued (by check and electronic funds transfer (EFT)), short-term cash investments, and issues and redemptions of securities. All figures are rounded to the nearest million.

Source: https://fiscaldata.treasury.gov/datasets/daily-treasury-statement/deposits-and-withdrawals-of-operating-cash

Explore the data online: https://app.gigasheet.com/spreadsheet/U-S–Treasury-Daily-Cash-Debt–Oct-2005–Apr-2023-/820a1527_c8f0_4ae6_a8a6_b841d327c093

submitted by /u/n1nja5h03s

Creating A Network Of Reddit 2013 & 2023

Hello, I am working on a project for graduate school on Reddit as a social network from 2013 to 2023. I am using an existing database of 2,500 subreddits and the top 1,000 posts from each in 2013, and I am re-collecting the same data for 2023. For each post I have the uploader, the post score, the list of all commenters, and each commenter’s combined comment score in that post.

Each node will be a subreddit and the ties will be based on the commenters they have in common. How should I measure this?

Option 1: each tie is undirected and weighted by the number of commenters who have ever left comments in both subreddits.
Option 2: each tie is undirected and weighted by the total score of all comments those shared commenters have posted in either subreddit.

The second option sounds more substantial but raises a few concerns. For example, what if Sub A is a huge subreddit and Sub B is relatively small? In Sub A the same commenter has, say, 2K upvotes, but in Sub B they have 300 upvotes, which is more than anyone else on that sub.
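As a rough sketch of the shared-commenter weighting, the snippet below computes two weights for every pair of subreddits: the raw count of shared commenters and a Jaccard ratio (shared commenters divided by all commenters in either subreddit). The Jaccard ratio is not from the post; it is just one common way to reduce the size imbalance between a huge and a small subreddit. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def shared_commenter_edges(commenters_by_sub):
    """Return {(sub_a, sub_b): (shared_count, jaccard)} for pairs that overlap."""
    edges = {}
    # Each unordered pair is visited once, so the ties are undirected.
    for a, b in combinations(sorted(commenters_by_sub), 2):
        shared = commenters_by_sub[a] & commenters_by_sub[b]
        if not shared:
            continue  # no tie between subreddits with no common commenters
        union = commenters_by_sub[a] | commenters_by_sub[b]
        edges[(a, b)] = (len(shared), len(shared) / len(union))
    return edges


# Hypothetical toy data: subreddit -> set of commenter names.
toy = {
    "big_sub": {"u1", "u2", "u3", "u4", "u5", "u6"},
    "small_sub": {"u1", "u2"},
    "other_sub": {"u6", "u7"},
}
edges = shared_commenter_edges(toy)
```

A score-based weight could be handled in a similar spirit, for example by dividing each commenter's score by the subreddit's total score before summing, though whether that suits the research question is a judgment call.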

submitted by /u/admaciaszek

[self-promo] Sales & Ads Data Benchmarks For Shopify

This Shopify Benchmarks data includes a cohort of Shopify store sales, website engagement, and advertising metrics at the store category and subcategory level. This eCommerce data is made up of aggregated sales and web analytics for thousands of Shopify stores globally. Additionally, the dataset includes stores’ total Google Ad spend on search ads, embedded display ads, and more from Google Ad Manager.

Sales and engagement metrics:

Revenue
Transaction count
Website sessions
Website page views

Advertising metrics:

Ad spending
Ad clicks
Ad views (impressions)

https://app.snowflake.com/marketplace/listing/GZTSZAS2KDH/cybersyn-inc-shopify-sales-advertising-benchmarks-by-category

Free trial available if you have a Snowflake account.

submitted by /u/aiatco2

Is This Ethical? Our AI Celebrity Voice Bot App Is Incredibly Realistic, But It Raises Concerns.

We’ve developed an AI voice bot app that can mimic the voices of celebrities like Joe Biden, Donald Trump, Alex Jones, Elon Musk, and Scarlett Johansson. While we’re proud of the technology, we’re also concerned about the potential ethical issues when users have conversations that might trick real people and lead to controversial outcomes.

For example, a user recently shared their experience using the app to imitate Joe Biden in a conversation with a friend. The AI-generated “Biden” endorsed a bizarre policy, like replacing the U.S. national anthem with the “Baby Shark” song. The friend was genuinely convinced they were speaking with the President, and in shock, they shared the call on social media. The story quickly gained traction, leading to heated debates and confusion among people who didn’t realize it was an AI-generated conversation.

These incidents have raised questions about the ethical implications of using such an accurate AI technology to impersonate living people, particularly when it can deceive others and potentially create controversial situations.

As AI enthusiasts, we’re eager to hear your thoughts on the ethical boundaries of AI-generated celebrity voices. We want to ensure that we’re using this technology responsibly and respecting the boundaries of both users and the individuals being impersonated.

TL;DR: Our AI voice bot app can convincingly mimic celebrity voices and has caused controversial situations by fooling real people in conversations, raising ethical concerns.

What are your opinions on the ethical limits of AI when it comes to impersonating living people and potentially creating controversial situations?

submitted by /u/malaika109

Census Data On Crime Rates For Major US Cities

Does anyone understand census data enough to help me pull income and crime rates at the zip code level or even at the census tract level? I’m writing a paper on the relationship between crime and income and I want the data to be as granular as possible.

Alternatively, does the Census Bureau have a department to help with these kinds of requests? Thanks!

submitted by /u/Guavifo

Google Trends But For Social Media/News Publications?

Greetings all! I’m working on something that requires me to look up a specific search term and track how that term has grown in popularity over time. Google Trends makes something like that very easy, but I’m wondering if there is something that I can use to look at the popularity of a search term over time in social media or in press articles in the same manner (e.g. tweets per day/week on orange juice, or number of articles published daily on telescopes). Thank you all!

submitted by /u/rocket__man_

How To Buy/request Data From The DMV?

Many states make millions of dollars selling their databases of driver’s licenses and car registrations. Some databases can be purchased for more than $100K.

My brother is doing some research for uni, got a grant, and is interested in buying some data from his local DMV. He tried calling the DMV but got nowhere; he’ll try again next week. The process is definitely not transparent, and there are no clear instructions on how it works.

Does anyone have some experience purchasing data from the DMV?

submitted by /u/nobilis_rex_