Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn't interesting; I'm interested. Maybe they know where the chix are. But what do they need it all for? World domination?

I Have Real Estate Data For Sale – Buyers' And Sellers' Data, 200k+ Lines, Including Names And Telephone/Email

Hi there, people. I'm wondering if anyone can point me in the right direction as to where I can sell data I have obtained.

I have a comprehensive set of data for sale but don't know where to sell it.

I have data relating to the purchase and sale of real estate in Dubai: a buyer and seller database including names of buyers and sellers and all the details needed to prospect leads.

The data contains area, property/building name, seller's/buyer's name, unit number, sub-region, listing/purchase price, date of purchase/sale, seller's/buyer's contact number, email address, and ID details.

Data available: 200k+ total lines, in Excel or Google Sheets format, covering:

ALL DUBAI MARINA

JVC

JVT

BUSINESS BAY

PALM JUMEIRAH

All DAMAC

SPRINGS

ALL MAJOR APARTMENT/VILLA COMPLEXES, INCLUDING SIGNATURE VILLAS

Up to date as of July 2023.

Regards

submitted by /u/naughtynatasha93

Large Retail Or Manufacturing Datasets

Does anyone here know of any large datasets containing mostly transactional retail or manufacturing data? Preferably multiple tables that are related to each other by primary and foreign keys.

I'm assuming there must be companies that sell this kind of data to market research firms, so if nothing is available for free, could we buy it from one of them?
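For anyone unsure what "multiple tables related by primary and foreign keys" looks like in a transactional retail dataset, here is a minimal sketch. All table and column names are illustrative inventions, not drawn from any real dataset:

```python
import sqlite3

# Toy relational retail schema: customers, products, orders, and order lines
# linked by primary/foreign keys, as described in the request above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT
    );
    CREATE TABLE products (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT,
        unit_price  REAL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        order_date  TEXT
    );
    CREATE TABLE order_lines (
        order_id    INTEGER REFERENCES orders(order_id),
        product_id  INTEGER REFERENCES products(product_id),
        quantity    INTEGER
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO products VALUES (10, 'Widget', 2.50)")
conn.execute("INSERT INTO orders VALUES (100, 1, '2023-07-01')")
conn.execute("INSERT INTO order_lines VALUES (100, 10, 4)")

# Join across the foreign keys to reconstruct one transaction line.
row = conn.execute("""
    SELECT c.name, p.name, ol.quantity * p.unit_price
    FROM order_lines ol
    JOIN orders    o ON o.order_id    = ol.order_id
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = ol.product_id
""").fetchone()
print(row)  # ('Acme Corp', 'Widget', 10.0)
```

A dataset shaped like this (rather than one flat denormalized table) is what makes join, aggregation, and data-modeling exercises realistic.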

submitted by /u/khaili109

Can I Do An Analysis For You For Free?

Does anyone have data they would like a free Power BI report built from? One stipulation: I want to record a tutorial during the process, so you shouldn't mind the data being shown publicly. I'd also like the dataset to come with a statement along the lines of "The question I am trying to answer, or the insight I hope to gain, is…"

submitted by /u/Bombdigitdy

Global Dataset For Air Quality Index And Pollutant By Country (and City/state If Possible) Over The Years

Hi! I’m trying to look for a dataset for my university assignment.

I'd prefer a dataset that contains the individual pollutants such as PM2.5, PM10, O3, NO2, SO2, CO, etc. The ones I've found usually contain either pollutants or AQI only, and come in different formats, so I can't combine them easily.

(Optional) Would also be great if the dataset includes contextual data like Temperature, Wind Speed, Humidity, Source of Pollution etc
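If the only option turns out to be separate pollutant and AQI tables in different formats, combining them is usually a matter of normalizing the join keys (country, date) and merging. A sketch with pandas, using made-up fragments standing in for the two sources:

```python
import pandas as pd

# Hypothetical fragments standing in for two differently formatted sources:
# one with pollutant concentrations, one with AQI values.
pollutants = pd.DataFrame({
    "Country": ["India", "India", "Japan"],
    "date": ["2022-01-01", "2022-01-02", "2022-01-01"],
    "pm25": [180.0, 165.0, 12.0],
    "no2": [40.0, 38.0, 15.0],
})
aqi = pd.DataFrame({
    "country": ["india", "india", "japan"],
    "Date": ["2022-01-01", "2022-01-02", "2022-01-01"],
    "aqi": [290, 270, 48],
})

# Normalize the join keys so the two formats line up.
pollutants["country"] = pollutants.pop("Country").str.lower()
aqi = aqi.rename(columns={"Date": "date"})

# Inner merge keeps only (country, date) pairs present in both sources.
merged = pollutants.merge(aqi, on=["country", "date"], how="inner")
print(merged[["country", "date", "pm25", "aqi"]])
```

The same pattern extends to the optional contextual data (temperature, wind speed, humidity): each extra source becomes another merge on the same keys.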

This would be a great help, thank you so much!!

submitted by /u/jyvenyu

How To Create An Image Dataset For Indian Railways Signals?

Hi everyone, I am working on a project that involves machine learning and computer vision. I want to train a model that can recognize and classify different types of signals used by the Indian railways. For this, I need a large and diverse image dataset of railway signals from various locations, angles, lighting conditions, etc.
I have searched online for existing datasets, but I could not find any that suit my needs. So I wish to create my own dataset from scratch. However, I am not sure how to go about it. What are the best practices and tools for creating an image dataset? How do I collect, label, and organize the images? How do I ensure the quality and consistency of the data?
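One common organization scheme for a classification dataset like this is a folder per class plus a CSV manifest, so labels survive file moves and train/val splits are reproducible. A sketch using only the standard library (all paths and class names below are illustrative, not a real taxonomy of Indian railway signals):

```python
import csv
import tempfile
from pathlib import Path

# One folder per signal class, plus a CSV manifest of path/label/split.
root = Path(tempfile.mkdtemp()) / "railway_signals"
for cls in ["red", "yellow", "green", "double_yellow"]:
    (root / cls).mkdir(parents=True, exist_ok=True)

# Pretend a few images were collected (empty files stand in for photos).
(root / "red" / "img_0001.jpg").touch()
(root / "green" / "img_0002.jpg").touch()

# Build the manifest: relative path, label (from the parent folder),
# and a deterministic train/val split column.
rows = []
for i, img in enumerate(sorted(root.rglob("*.jpg"))):
    rows.append({
        "path": str(img.relative_to(root)),
        "label": img.parent.name,
        "split": "train" if i % 5 else "val",  # crude 80/20 split
    })

with open(root / "manifest.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["path", "label", "split"])
    writer.writeheader()
    writer.writerows(rows)

print(rows)
```

For the labeling itself, annotation tools such as Label Studio or CVAT are commonly used; whatever tool you pick, exporting to a simple manifest like the one above keeps the dataset portable between training frameworks.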

submitted by /u/Responsible-Diver226

PubMed Papers & Annotated MESH Terms Dataset?

I'm interested in working on PubMed/NIH data. I'm looking for a dataset of the Medical Subject Headings (MeSH) terms associated with each PubMed article, over all available articles (or at least the past few decades of indexed citations), at the level of individual articles. Is this available? Preferably without needing to download and write parsing code for the full PubMed XML dump, which is huge and complex to parse; querying the API per article or term would take forever and be incredibly inefficient.

The ideal would be a CSV file or DB dump with the associated terms, article ID, and publication date. Large-scale coverage is crucial.

Bonus points if it includes other structured ontology sources per paper, e.g. the associated GO terms.
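If no prebuilt dump turns up, extracting just PMID plus MeSH descriptors from the baseline XML is a fairly small parse. The element names below (MedlineCitation, MeshHeadingList, DescriptorName with a UI attribute) reflect the PubMed XML as I understand it; verify against the current NLM DTD before relying on this:

```python
import xml.etree.ElementTree as ET

# A tiny inline sample mimicking the PubMed XML structure.
sample = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345678</PMID>
      <MeshHeadingList>
        <MeshHeading><DescriptorName UI="D000818">Animals</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName UI="D006801">Humans</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""

# Flatten to (pmid, descriptor_ui, descriptor_name) rows, ready for CSV.
rows = []
for article in ET.fromstring(sample).iter("PubmedArticle"):
    pmid = article.findtext(".//PMID")
    for desc in article.iter("DescriptorName"):
        rows.append((pmid, desc.get("UI"), desc.text))

print(rows)
```

For the real multi-gigabyte baseline files you would stream with `ET.iterparse` and clear elements as you go rather than loading whole files, but the per-article extraction logic stays this small.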

Thanks very much!

submitted by /u/ddofer

Exploring Opportunities: How To Utilize A 25 Million-Product E-commerce Dataset For Tools And Dashboards?

As a back-end developer, I've scraped a dataset of 25 million products, with no duplicates, from the largest e-commerce websites in the Middle East. It includes basic information on each product: price history, descriptions, specifications, image links, category and breadcrumbs, recommended products, and more. How can I leverage this data, and what tools and dashboards could I develop and potentially offer to other e-commerce websites?
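One concrete tool idea that the price-history field enables is a price-drop detector, the core of a deals dashboard or price-alert API. The field names in this sketch are my assumptions about the dataset's shape, not its actual schema:

```python
# Flag products whose latest price sits well below their historical average.
def price_drop_alerts(products, threshold=0.2):
    """Return (name, drop_fraction) for products whose latest price is at
    least `threshold` below the average of their earlier prices."""
    alerts = []
    for p in products:
        history = p["price_history"]  # assumed ordered oldest -> newest
        if len(history) < 2:
            continue
        avg = sum(history[:-1]) / len(history[:-1])
        latest = history[-1]
        if latest <= avg * (1 - threshold):
            alerts.append((p["name"], round(1 - latest / avg, 2)))
    return alerts

catalog = [
    {"name": "Phone X", "price_history": [1000, 980, 1020, 700]},
    {"name": "Cable",   "price_history": [10, 10, 9]},
]
print(price_drop_alerts(catalog))  # [('Phone X', 0.3)]
```

Other obvious directions on the same data: cross-site price comparison, category-level price indices over time, and assortment-gap reports for merchants.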

submitted by /u/HajiIman

[Request] Big Dataset Of Fiction With Titles?

I'm looking for a dataset of the full texts of short stories or novellas together with their titles (clearly delimited, and everything in English) to train a model for title generation by abstractive summarization. The bigger the better.

Preferably erotica, thriller, or drama, but anything that isn't sci-fi would work. Any ideas of where I could find that?

submitted by /u/SCP_radiantpoison

[self-promotion] Company Index Mapped To Public Identifiers (CIKs, LEIs, EINs) And Identifiers From Market Data Providers (PermID, OpenFIGI)

Cybersyn is building a Company Index (“security master” for finance nerds) to support joining companies, subsidiaries, and their brands together in a hierarchy. This is a persistent problem across companies and a major missing join key.

Our recent SEC Filings release on Snowflake Marketplace marks a first, small step towards building a reference spine, which we refer to as our Company Index. We map our Company Index to public identifiers (e.g. CIKs, LEIs, EINs) and identifiers from market data providers (PermID, OpenFIGI).

To start, we’re working with public companies but this will soon extend.

submitted by /u/aiatco2

Seeking Dataset: NAICS Codes Vs. Business Descriptions

I’m in search of a dataset that pairs NAICS codes with business descriptions, but not the standard generic descriptions. I’m interested in how businesses describe themselves in relation to NAICS codes. Ideally, I’d like around 500 descriptions for each NAICS code. I’ve scoured various sources without success. Does anyone know where I can find such a dataset? Any leads or suggestions would be greatly appreciated!

submitted by /u/coder903

I Have A Massive Dataset Of Flirting / Dating-app Messages. What To Do?

Without going into specifics, my company has legally, internally (through our app) acquired a massive dataset of millions of flirting-related conversations through dating apps / Instagram DMs / text messages.

How much do you think these transcripts are worth? What interesting projects / AI models could I train with this data? Let me know if you have any other recommendations about what to do with this dataset!

***not interested in any nefarious, illegal, or immoral recommendations***

Thanks!

submitted by /u/Blake_CS_Fit

I Built A Free Tool That Auto-generates Scrapers For Any Website With AI

I got frustrated with the time and effort required to code and maintain custom web scrapers for collecting data, so my friends and I built an LLM-based solution for data extraction from websites. AI should automate tedious and uncreative work, and web scraping definitely fits that description.

Try it out for free on our playground https://kadoa.com/playground and let me know what you think!

We’re leveraging LLMs to understand the website structure and generate the DOM selectors for it. Using LLMs for every data extraction, as most comparable tools do, would be way too expensive and very slow, but using LLMs to generate the scraper code and subsequently adapt it to website modifications is highly efficient and maintenance-free.

How it works (the playground uses a simplified version of this):

1. Loading the website: automatically decide what kind of proxy and browser we need
2. Analyzing network calls: try to find the desired data in the network calls
3. Preprocessing the DOM: remove all unnecessary elements and compress it into a structure that GPT can understand
4. Selector generation: use an LLM to find the desired information with the corresponding selectors
5. Data extraction in the desired format
6. Validation: hallucination checks and verification that the data is actually on the website and in the right format
7. Data transformation: clean and map the data (e.g. if we need to aggregate data from multiple sources into the same format). LLMs are great at this task too
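To make the DOM-preprocessing step concrete, here is a toy version of the idea using only the standard library: drop script/style noise and compress the page into a compact text outline an LLM could read. This is my own sketch of the general technique, not Kadoa's actual implementation:

```python
from html.parser import HTMLParser

class DomCompressor(HTMLParser):
    """Strip script/style subtrees and collect the remaining visible text."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth_skip = 0  # >0 while inside a skipped subtree
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth_skip += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth_skip:
            self.depth_skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.depth_skip:
            self.out.append(text)

html = """
<html><head><style>.x{color:red}</style></head>
<body><script>track();</script>
<h1>Acme Widget</h1><span class="price">$19.99</span></body></html>
"""
c = DomCompressor()
c.feed(html)
print(c.out)  # ['Acme Widget', '$19.99']
```

The compressed outline is what gets handed to the LLM for selector generation; the generated selectors are then applied cheaply on every subsequent extraction run without further LLM calls.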

The vision is fully autonomous and maintenance-free data processing from sources like websites or PDFs, basically “prompt-to-data” 🙂 It’s far from perfect yet, but we’ll get there.

submitted by /u/madredditscientist

Spanish LaLiga And Premier League Historical Dataset

Is anyone aware of places that have a complete dataset of matches, players, and their in-match actions, such as goal kicks, shots that resulted in goals, yellow and red cards, etc.?

It can be websites where the data is readily available, APIs, or blogs. I would prefer La Liga over the Premier League.

I've been searching around but could only reliably find Sofascore and Marca as sources of information.

Thanks!

submitted by /u/Technopulse