Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Cyber Security Related Data Set For A Project

I have a project due where I need to run 5 different linear regressions in Python on a cyber security topic such as cyberattacks, fake news, cyber intrusions, identity theft, malware, etc. I need a dataset with 200 rows in CSV format. I know how to write the code, but finding a good dataset with numeric values is so hard!

submitted by /u/AmericanArsenal17

Can Anyone Point Me To A Database Of Worldwide Lux Records?

This might not be the best place for this question. Pointing me to a better forum would be appreciated if that’s true.

I live in Seattle, WA, which has a reputation for being rainy. But it’s not a well-deserved one. There are cities in Florida that get more rain than we do, for example.

After living here for 20 years, I’m convinced that what makes Seattle noteworthy is rather how dark it is. But any time I try to research this, it’s a dead end. All sources of data break things down into the binary of cloudy / sunny. Usually by day. One infographic I found at least had the nuance to use hours of sunshine.

I’m looking for a source to break cities down by average lux over the course of a year. With a smooth range from 120,000 lux to 10,000 for full daylight, a range of 1,000 to 5 lux for cloud cover, and presumably 10,000 to 1,000 for some sort of partial cloud cover, it seems like there’s a ton of nuance possible here beyond “sunny” or “cloudy”.

With 10% or so of Americans being impacted by seasonal affective disorder, I’m confused why this information isn’t more in demand. I want to look at the big picture of average yearly light exposure.

But I also want my weather app to predict lux for tomorrow. How bright will it be at noon? I want people to have access to the vocabulary of lux like we’ve recently developed the vocabulary of air quality. “Wow, yesterday only got up to 10 lux in Seattle!”

It seems more significant to me than what time sunrise and sunset are, or what the humidity is, but I can’t find evidence that anyone is tracking this information at all 🫤

Can anyone point me to the secret database of global lux records?

submitted by /u/tigerproofrock

Historical 1-min OHLC Crypto Prices 1900+ Coins Dataset With Code

I created a dataset for analyzing crypto price data across a large number of coins traded on Ethereum.

The dataset can be viewed and downloaded from Kaggle here: https://www.kaggle.com/datasets/martkir/historical-ohlc-crypto-price-data-for-1900-coins

I also uploaded the code to GitHub if you want to reproduce the dataset and/or download fresh data. Link here: https://github.com/martkir/crypto-prices-download

I created the dataset because I couldn’t find a good / free place to download historical price data that was granular (1 min resolution) for a large enough cross section of coins.

Centralized exchanges (e.g. Binance, Kraken) have APIs but only for a small subset of tokens – which misses a lot of the small-cap coins traded on DEXs with interesting statistical properties.

Anyway, hope some of you find this dataset useful 🙂

submitted by /u/112129

[self-promo] Cybersyn: Snowflake Funded Data-as-a-Service Provider

This post is self-promotional, but I genuinely feel it can offer value to this community to discuss our plans, expose our free datasets, and take feedback on what datasets you would like to see on Snowflake:

https://www.snowflake.com/blog/snowflake-invests-cybersyn-bringing-unique-data-products-to-marketplace/
https://www.cybersyn.com/blog-series-a/

Find all of our products directly here: https://app.snowflake.com/marketplace/listings/Cybersyn%2C%20Inc

submitted by /u/aiatco2

{Academic Study} Play A Quick Memory Game To Test Your Short Term Recall. Shouldn’t Take More Than 5 Minutes.

Hi, we are currently testing the effect of circadian rhythms on short term recall. The instructions are pretty simple. Download this app (https://apps.apple.com/us/app/short-term-memory/id804088277) and play levels 4 and 8, using only 15 seconds to memorize the items. Record how many items you were able to recall for each level. The caveat is that you need to do this twice: once in the evening and once in the morning. That is the whole purpose of the experiment. Thank you for participating! You can post your results in the comments or DM me.

submitted by /u/Trevor-Dustin

Wooldridge “nbasal” Dataset Analysis Issues

I’m trying to analyze the “nbasal” dataset by position.

When I run these lines:

model1 = lm(wage ~ exper, data = center_players) # regression on center players
summary(model1)

The output is this:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1184.64     355.26   3.335  0.00174 **
exper          80.21      51.22   1.566  0.12450

When I run this:

model2 = lm(wage ~ exper + points, data = center_players)
summary(model2)

The output is this:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -33.43     244.90  -0.136   0.8921
exper          71.13      29.86   2.382   0.0217 *
points        149.05      16.02   9.306  7.3e-12 ***

I don’t understand how each point increases salary by 149.05 while the intercept becomes -33.43. Can someone explain this to me?
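One way to read the two fits: the intercept is the predicted wage when every predictor is zero. In the first model that is a center with no experience. In the second it is a center with no experience who also scores zero points, which is far outside the data, so the negative value is an extrapolation (and with a standard error of 244.90 and p = 0.89 it is not distinguishable from zero). The points coefficient is the change in wage per additional unit of points with exper held fixed, in whatever units the dataset uses for wage. The Python sketch below uses made-up numbers, not the nbasal data, and a small hand-rolled least-squares helper (`ols` is a hypothetical name) to show the same pattern: leaving points out pushes its average contribution into the intercept.

```python
def ols(predictors, y):
    """Return [intercept, b1, b2, ...] from ordinary least squares (normal equations)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix (X'X | X'y).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                factor = a[r][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][k] for i in range(k)]


# Made-up numbers, NOT the nbasal data: wage is built with a true
# intercept of -30, 70 per year of experience and 150 per point.
exper = [1, 2, 3, 4, 5, 6, 7, 8]
points = [12, 5, 20, 8, 15, 3, 18, 10]
wage = [-30 + 70 * e + 150 * p for e, p in zip(exper, points)]

simple = ols([exper], wage)          # wage ~ exper
full = ols([exper, points], wage)    # wage ~ exper + points

print("wage ~ exper:         ", [round(b, 2) for b in simple])
print("wage ~ exper + points:", [round(b, 2) for b in full])
```

In this toy data the exper-only fit has an intercept of about 1620, while the fit with points recovers the -30 used to build the data. Nothing about the players changed; the intercept just stopped carrying the average effect of points.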

submitted by /u/Expensive-Still7318

Require Urgent Help Regarding PPMI Dataset

Hi all, I’m new to working with the PPMI dataset for my research project and need SBR values (LC, RC, LP, RP) and CSF markers (ptau, total tau, beta, alpha-syn). I can’t figure out where to get the CSV files for these. Could someone help me out? It’s kind of urgent.

submitted by /u/unicorn262001

Looking For Someone With A Statista Premium Subscription

Hey everyone,

I hope you’re all doing well. I’m currently working on a startup in the gaming industry and I’m looking for some specific data that is available on Statista. However, I don’t have a premium subscription and unfortunately, the data I need is not available with the free version.

So, I was wondering if anyone here has a Statista Premium subscription and would be willing to help me out. I know it’s a long shot, but I thought I’d give it a try.

I don’t want to take up too much of your time, but if you’re able to help, I would be extremely grateful.

Thank you for reading this far, and I hope you have a great day!

submitted by /u/saltpeppermint

Looking For A Dataset With Both Book ISBNs And Genre(s)

I need to do some data visualization work with books, and the dataset from Goodreads is almost perfect for what I need to do.

However, it doesn’t have any genre(s) listed. Is there an existing dataset, which I can use in conjunction with this one, that also has a list of genres? I don’t need it to line up with all 10,000 books in the Goodreads set, but a decent amount.

Any help would be greatly appreciated.

Edit: An English equivalent of this is what I’m trying to find.
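Once a second dataset with genres turns up, lining it up with the Goodreads set could look roughly like the Python sketch below. It is only an illustration under assumptions: the column names (`isbn`, `title`, `genres`), the helper names, and the sample rows are all made up, and real files may format ISBNs differently.

```python
import csv
import io


def normalize_isbn(raw):
    """Strip hyphens and spaces and uppercase the check digit so ISBNs compare equal."""
    return raw.replace("-", "").replace(" ", "").upper()


def join_on_isbn(books_file, genres_file):
    """Return the book rows that have a genre match, each extended with a 'genres' field."""
    genres_by_isbn = {
        normalize_isbn(row["isbn"]): row["genres"]
        for row in csv.DictReader(genres_file)
    }
    joined = []
    for row in csv.DictReader(books_file):
        key = normalize_isbn(row["isbn"])
        if key in genres_by_isbn:
            joined.append({**row, "genres": genres_by_isbn[key]})
    return joined


# Tiny made-up samples standing in for the two real CSV files.
books_csv = io.StringIO(
    "isbn,title\n"
    "0-306-40615-2,Book A\n"
    "9780306406157,Book B\n"
    "123456789X,Book C\n"
)
genres_csv = io.StringIO(
    "isbn,genres\n"
    "0306406152,fantasy|adventure\n"
    "123456789x,mystery\n"
)

matched = join_on_isbn(books_csv, genres_csv)
print(f"{len(matched)} of 3 books matched")
```

The join keeps only books found in both files, which fits the "decent amount" goal rather than full coverage. Note that this sketch does not convert between ISBN-10 and ISBN-13, so the same book listed under both forms (as with Book A and Book B here) only matches on the form present in the genre file.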

submitted by /u/jakehenderson01

Datasets With Notes, Quick Thoughts, Reminders?

I’m participating in a study on ways in which different people write their thoughts, lecture notes, reminders, and other short-form texts that are usually not meant to be shared.

Does anyone know of datasets that could be helpful here? One of our goals is to do some clustering analysis and determine the main “forms” of notes people use. We also want to find out how often people write multiple notes related to the same topic and obtain other interesting results.

Any suggestions are appreciated!

submitted by /u/smthamazing