Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Is It Ethical Or I Guess Allowed For Me To Use A Prior Data Set For Practice?

I think I already know the answer but want to get other opinions.

I have two large data sets that I had access to in the past. One was shared with me on GitHub and is still available on their profile. It’s real data but redacted for HIPAA reasons.

The other is a data set I was given access to during my Capstone project. It’s also redacted and does not have any direct patient identifiers (it has medical record numbers, but these mean nothing to me; this is the only thing I’m worried about).

Would it be appropriate for me to re-use these data sets and put them up on my portfolio with data visualizations and as ‘data cleaning’ projects?

Any advice is appreciated

submitted by /u/Potential_Lettuce

Does Any World Beaches Dataset Exist?

I’ve been searching for one, but all I’ve found are a couple of datasets for specific countries, nothing global, neither free nor paid.

What I need is something like: “country – city name – beach name”, it doesn’t have to be a perfect list of world beaches, but at least it should serve as a starting point.

submitted by /u/montesremotedev

Reported Chemicals In Makeup Dataset

The information provided in these data has been submitted to the California Safe Cosmetics Program (CSCP) at the California Department of Public Health (CDPH). The primary goal of the CSCP is to gather data on unsafe and potentially hazardous components in cosmetic products available for sale in California and make this information accessible to the public.

Under the California Safe Cosmetics Act, the manufacturer, packer, and/or distributor named on the product label is required to submit to the CSCP a list of all cosmetic products sold in California that contain any ingredients known or suspected to cause cancer, birth defects, or other developmental or reproductive harm.

Companies with reportable ingredients in their products must provide information to the CSCP if they meet the following criteria:

They have annual aggregate sales of cosmetic products of one million dollars or more.

They have sold cosmetic products in California on or after January 1, 2007.

To view the data: https://app.gigasheet.com/spreadsheet/Cosmetic-Company-Chemicals/26ed23e9_77da_4708_b5da_8bb23c6efcff

Source: https://catalog.data.gov/dataset/chemicals-in-cosmetics-7d6ab

submitted by /u/sheetheadd

Cyber Security Related Data Set For A Project

I have a project due where I need to make 5 different linear regressions in Python on a cyber security topic such as cyberattacks, fake news, cyber intrusions, identity theft, malware, etc. I need a dataset with 200 rows as a CSV file. I know how to write the code, but finding a good data set with numeric values is so hard!

submitted by /u/AmericanArsenal17

Can Anyone Point Me To A Database Of Worldwide Lux Records?

This might not be the best place for this question. Pointing me to a better forum would be appreciated if that’s true.

I live in Seattle, WA, which has a reputation for being rainy. But it’s not a well deserved one. There are cities in Florida that get more rain than us, for example.

After living here for 20 years, I’m convinced that what makes Seattle noteworthy is rather how dark it is. But any time I try to research this, it’s a dead end. All sources of data break things down into the binary of cloudy / sunny. Usually by day. One infographic I found at least had the nuance to use hours of sunshine.

I’m looking for a source to break cities down by average lux over the course of a year. With a smooth range from 120,000 lux to 10,000 for full daylight, and a range of 1,000 to 5 lux for cloud cover, and presumably 10,000 to 1,000 for some sort of partial cloud cover, it seems like there’s a ton of nuance possible here beyond “sunny” or “cloudy”.
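As a rough illustration of the breakdown described here, a minimal Python sketch could bucket lux readings using the thresholds quoted above and then average them. The readings are made up, and the band labels and function names are assumptions for illustration only; they do not come from any existing data source.

```python
from statistics import mean


# Thresholds follow the ranges quoted in the post; the band labels are
# illustrative assumptions, not an established standard.
def lux_band(lux):
    """Return a coarse label for a single lux reading."""
    if lux >= 10_000:
        return "full daylight"
    if lux >= 1_000:
        return "partial cloud cover"
    if lux >= 5:
        return "cloud cover"
    return "below overcast range"


def summarize(readings):
    """Average a list of lux readings and count how many fall in each band."""
    counts = {}
    for value in readings:
        band = lux_band(value)
        counts[band] = counts.get(band, 0) + 1
    return mean(readings), counts


# Hypothetical noon readings for one city (made-up numbers, not real data).
noon_readings = [25_000, 8_000, 900, 10, 450, 60_000]
average, by_band = summarize(noon_readings)
print(f"average noon lux: {average:.0f}")
print(by_band)
```

Averaging raw lux lets a few very bright days dominate the figure, so the per-band counts are kept alongside the mean.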

With 10% or so of Americans being impacted by seasonal affective disorder, I’m confused why this information isn’t more in demand. I want to look at the big picture of average yearly light exposure.

But I also want my weather app to predict lux for tomorrow. How bright will it be at noon? I want people to have access to the vocabulary of lux like we’ve recently developed the vocabulary of air quality. “Wow, yesterday only got up to 10 lux in Seattle!”

It seems more significant to me than what time sunrise and sunset are, or what the humidity is, but I can’t find evidence that anyone is tracking this information at all 🫤

Can anyone point me to the secret database of global lux records?

submitted by /u/tigerproofrock

Historical 1-min OHLC Crypto Prices 1900+ Coins Dataset With Code

I created a dataset for analyzing crypto price data across a large number of coins traded on Ethereum.

The dataset can be viewed and downloaded from Kaggle here: https://www.kaggle.com/datasets/martkir/historical-ohlc-crypto-price-data-for-1900-coins

I also uploaded the code on GitHub if you want to reproduce the dataset and/or download fresh data. Link here: https://github.com/martkir/crypto-prices-download

I created the dataset because I couldn’t find a good / free place to download historical price data that was granular (1 min resolution) for a large enough cross section of coins.

Centralized exchanges (e.g. Binance, Kraken) have APIs but only for a small subset of tokens – which misses a lot of the small-cap coins traded on DEXs with interesting statistical properties.
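For anyone who wants coarser bars than the 1-minute resolution, a minimal sketch of rolling 1-minute OHLC rows up into wider intervals might look like the following. The tuple layout `(timestamp, open, high, low, close)` and the function name `resample_ohlc` are assumptions for illustration; the actual column names in the Kaggle files may differ.

```python
def resample_ohlc(bars, minutes):
    """Aggregate 1-minute OHLC bars into `minutes`-wide bars.

    bars: iterable of (unix_ts_seconds, open, high, low, close), sorted by time.
    Returns a list of tuples in the same layout, one per time bucket.
    """
    width = minutes * 60
    buckets = {}  # dicts keep insertion order, so output stays time-sorted
    for ts, o, h, l, c in bars:
        start = ts - ts % width          # start of the bucket this bar falls in
        if start not in buckets:
            buckets[start] = [start, o, h, l, c]   # first bar sets the open
        else:
            agg = buckets[start]
            agg[2] = max(agg[2], h)      # highest high seen so far
            agg[3] = min(agg[3], l)      # lowest low seen so far
            agg[4] = c                   # last bar sets the close
    return [tuple(row) for row in buckets.values()]


# Made-up 1-minute bars (not taken from the dataset).
one_minute = [
    (0, 1.00, 1.10, 0.95, 1.05),
    (60, 1.05, 1.20, 1.00, 1.15),
    (120, 1.15, 1.18, 0.90, 0.98),
    (300, 0.98, 1.02, 0.97, 1.00),
    (360, 1.00, 1.30, 0.99, 1.25),
]
five_minute = resample_ohlc(one_minute, 5)
for bar in five_minute:
    print(bar)
```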

Anyway, hope some of you find this dataset useful 🙂

submitted by /u/112129

[self-promo] Cybersyn: Snowflake Funded Data-as-a-Service Provider

This post is self-promotional, but I genuinely feel it can offer value to this community to discuss our plans, expose our free datasets, and take feedback on what datasets you would like to see on Snowflake:

https://www.snowflake.com/blog/snowflake-invests-cybersyn-bringing-unique-data-products-to-marketplace/
https://www.cybersyn.com/blog-series-a/

Find all of our products directly here: https://app.snowflake.com/marketplace/listings/Cybersyn%2C%20Inc

submitted by /u/aiatco2

{Academic Study} Play A Quick Memory Game To Test Your Short Term Recall. Shouldn’t Take More Than 5 Minutes.

Hi, we are currently testing the effect of circadian rhythms on short term recall. The instructions are pretty simple. Download this app (https://apps.apple.com/us/app/short-term-memory/id804088277), play levels 4 and 8 using only 15 seconds to memorize the items. Record how many items you were able to recall for each level. The caveat is that you need to do this once in the evening, and once in the morning. That is the whole purpose of the experiment. Thank you for the participation! You can post your results in the comments or DM me.

submitted by /u/Trevor-Dustin

Wooldridge “nbasal” Dataset Analysis Issues

I’m trying to analyze the “nbasal” dataset based on position.

When I run this line:

    model1 = lm(wage ~ exper, data = center_players)  # regression on center players
    summary(model1)

the output is this:

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  1184.64     355.26   3.335  0.00174 **
    exper          80.21      51.22   1.566  0.12450

When I run this:

    model2 = lm(wage ~ exper + points, data = center_players)
    summary(model2)

the output is this:

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   -33.43     244.90  -0.136   0.8921
    exper          71.13      29.86   2.382   0.0217 *
    points        149.05      16.02   9.306  7.3e-12 ***

I don’t understand how each point increases salary by 149.05 and the intercept becomes -33. Can someone explain this to me?
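One way to read this: the intercept is the predicted wage for a center with zero years of experience and zero points, so it changes whenever the set of regressors changes. In model1 the intercept absorbs the average contribution of everything left out, including points. Once points enters the model, that contribution moves to the points coefficient and the intercept drops. With a standard error of 244.90 and p = 0.8921, the estimate of -33.43 is not distinguishable from zero, and it describes a player far outside the observed data. The points coefficient says that, holding exper fixed, one more unit of points is associated with 149.05 more units of wage. The sketch below reproduces the same pattern in Python on synthetic data (made-up numbers, not the nbasal data) using a small hand-written least-squares helper; the names `ols`, `exper`, `points`, and `wage` are illustrative.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of regressor tuples (without the constant); y: list of responses.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0, *r] for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic data (not nbasal): wage is built as -30 + 70*exper + 150*points.
exper = [1, 2, 3, 4, 5, 6, 7, 8]
points = [12, 5, 9, 15, 7, 11, 6, 14]
wage = [-30 + 70 * e + 150 * p for e, p in zip(exper, points)]

simple = ols([(e,) for e in exper], wage)   # wage ~ exper
full = ols(list(zip(exper, points)), wage)  # wage ~ exper + points

print("wage ~ exper          :", [round(b, 2) for b in simple])
print("wage ~ exper + points :", [round(b, 2) for b in full])
```

In the one-regressor fit the intercept is large and positive because it carries the average effect of the omitted points variable; in the two-regressor fit it falls back to the small negative constant used to build the data.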

submitted by /u/Expensive-Still7318

Require Urgent Help Regarding PPMI Dataset

Hi all, I’m new to working with the PPMI dataset for my research project and require SBR values (LC, RC, LP, RP) and CSF markers (ptau, total tau, beta, alpha-syn). I’m finding it really confusing to figure out where I can get the CSV files for these. Could someone help me out? It’s kinda urgent.

submitted by /u/unicorn262001

Looking For Someone With A Statista Premium Subscription

Hey everyone,

I hope you’re all doing well. I’m currently working on a startup in the gaming industry and I’m looking for some specific data that is available on Statista. However, I don’t have a premium subscription and unfortunately, the data I need is not available with the free version.

So, I was wondering if anyone here has a Statista Premium subscription and would be willing to help me out. I know it’s a long shot, but I thought I’d give it a try.

I don’t want to take up too much of your time, but if you’re able to help, I would be extremely grateful.

Thank you for reading this far, and I hope you have a great day!

submitted by /u/saltpeppermint

Looking For A Dataset With Both Book ISBNs And Genre(s)

I need to do some data visualization work with books, and the dataset from Goodreads is almost perfect for what I need to do.

However, it doesn’t have any genre(s) listed. Is there an existing dataset, which I can use in conjunction with this one, that also has a list of genres? I don’t need it to line up with all 10,000 books in the Goodreads set, but a decent amount.
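If a genre source keys its records by ISBN-13 while the book rows carry ISBN-10, the two can still be joined after normalizing the identifier. The Python sketch below is a minimal illustration with made-up rows; the field names (`isbn`, `title`, `genres`) and the helper names are assumptions, not the schema of any particular dataset.

```python
def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to ISBN-13: prefix 978 and recompute the check digit."""
    core = "978" + isbn10.replace("-", "")[:9]
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize(isbn):
    """Return an ISBN-13 string for either an ISBN-10 or an ISBN-13 input."""
    digits = isbn.replace("-", "").strip()
    return digits if len(digits) == 13 else isbn10_to_isbn13(digits)


def attach_genres(books, genres_by_isbn13):
    """Left-join genres onto book rows; books without a match get an empty list."""
    return [
        {**book, "genres": genres_by_isbn13.get(normalize(book["isbn"]), [])}
        for book in books
    ]


# Made-up example rows (hypothetical field names, not real catalog data).
books = [
    {"isbn": "0306406152", "title": "Example A"},
    {"isbn": "9780000000002", "title": "Example B"},
]
genres = {"9780306406157": ["science"]}

joined = attach_genres(books, genres)
for row in joined:
    print(row)
```

A left join keeps every book row, which makes it easy to count how many of the 10,000 titles actually matched a genre record.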

Any help would be greatly appreciated

Edit: An English equivalent of this is what I’m trying to find.

submitted by /u/jakehenderson01