Dataset Required For Model Creation In Hackathon

Can anyone provide me with these datasets: NPDI Dataset, Pornography 2K Dataset, and GGOI Dataset?

submitted by /u/ank_007____

Project On Sports Betting/DFS (Maybe Focusing On NBA And Soccer)

I’m trying to do a simple project in R on ball games and sportsbook projections, looking at trends that usually make props go up or down and how the results usually turn out. Any ideas? Any pointers to datasets that would fit?

I could focus on basketball and soccer to make the project simpler since they are the two sports I watch the most.

submitted by /u/Tight_Investigator14

Are There Gov Sites I Can Download/scrape Real Estate Data From? Sale Prices, Property Taxes Etc?

I was looking at the njparcels website. I’d like similar data for Texas. Is it possible to get it from government websites legally?

submitted by /u/svg_12345

New Datasets With Job Postings On Kaggle

Techmap.io just published new datasets of Job Postings from Ireland on Kaggle. You can find them here: https://www.kaggle.com/techmap/datasets

Job Postings from Ireland (October 2020) – 58MB

Job Postings from Ireland (October 2021) – 56MB

Job Postings from Ireland (October 2022) – 101MB

submitted by /u/Techmap_io

Can Someone Please Help Me Compile Klay Thompson Data Into A CSV

Hey everyone, I’m taking a machine learning class in college and I want to build an R model that predicts Klay Thompson’s performance in NBA games. The problem is I can’t find a cleaned dataset with data from all 716 NBA games he’s played, with all the covariates such as 3-pointers, rebounds, assists, free throws, etc. I found all this info on statmuse.com, which has a record of all the games he’s played, but I need help compiling it into a CSV. Can anyone help me do this?

submitted by /u/driftqueenjulie

Looking For Accessible ESG Datasets For School Project

Hi /r/datasets

For a school project I’m working on, I need data about ESG scores (preferably detailed for each pillar) for several companies (particularly European ones, but anything goes). Supplementary data about different ESG criteria would be useful too. Unfortunately, most data sources for this are very expensive or hardly useful, so any suggestions of accessible datasets like these would be very appreciated! Thanks in advance for any help!

PS : datasets about operational risks for companies can be interesting too

submitted by /u/floflo79

My First Trial To Find A Data Set For Thesis

Is it always a riddle to find the data sets behind a research paper?

Or is it that some authors don’t share them?

For example, here: https://encyclopedia.pub/entry/2267

Shouldn’t they mention whether the data is available or not?

submitted by /u/Professional_Yak9979

Looking For Dataset Of Correct And Incorrect Electronic Invoices

Looking for a dataset of electronic invoices with the following specs:

Type: Electronic invoices, not scanned docs, US invoices preferably

File Type: PDF or JPG/PNG…

Quantity: At least 500 total invoices, preferably over 1,000

Additional details: The dataset needs to contain both correct and incorrect invoices. Incorrect invoices would be invoices that contain errors, inaccuracies or issues in them. Correct invoices need to have a tag in the name that indicates they are correct, same thing for the incorrect invoices. Not sure if this is the best move but I would be ok with having 2 separate datasets, 1 dataset of correct invoices and another dataset of incorrect invoices.
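A collection tagged this way could be split by file name along the following lines. This is only a sketch in Python: the tag words `correct` and `incorrect` and the separator characters are assumptions, since the post does not specify a naming scheme. Tags are matched as whole tokens because the word "incorrect" contains "correct", so a plain substring check would misfile every incorrect invoice.

```python
from pathlib import PurePath

# Hypothetical tag words; the post does not say what the labels look like.
TAGS = ("correct", "incorrect")


def label_invoice(filename: str) -> str:
    """Return 'correct', 'incorrect', or 'unknown' based on the file name."""
    stem = PurePath(filename).stem.lower()
    # Split on common separators so that tags only match whole tokens.
    tokens = stem.replace("-", "_").replace(" ", "_").split("_")
    for tag in TAGS:
        if tag in tokens:
            return tag
    return "unknown"


def split_by_label(filenames):
    """Group file names into a dict keyed by label."""
    groups = {"correct": [], "incorrect": [], "unknown": []}
    for name in filenames:
        groups[label_invoice(name)].append(name)
    return groups
```

For example, `split_by_label(["inv_001_incorrect.pdf", "INV-002-Correct.png", "scan.jpg"])` puts one file in each of the three groups.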

I am also open to suggestions of sites or resources that have invoices for web scraping purposes.

I am willing to provide additional details if it helps.

Thanks in advance!

submitted by /u/souley16

Looking For Obesity Rate By Zip Or County

Does anyone know where I could get obesity rates by zip or county? I need them at a level more detailed than the state level. Thank you

submitted by /u/jbr2811

Looking For A Good Fraud Data Set For A Class Project, Not Very Knowledgeable.

I somehow ended up in a data analytics class where I need to prepare a proposal for an investigation related to fraud, and the prof has basically given us no insight. I need a data set that I can run at least three different supervised or semi-supervised analytical techniques on. I was thinking of something related to spam email, but I really don’t know what I’m looking for, and I’m struggling to come up with good ideas. Preferably something simple. Any help is greatly appreciated.

submitted by /u/xnickg77

Is It Ethical Or I Guess Allowed For Me To Use A Prior Data Set For Practice?

I think I already know the answer but want to get other opinions.

I have two large data sets that I had access to in the past. One was shared with me on GitHub and is still available on their profile. It’s real data but redacted for HIPAA reasons.

The other data set I was given access to during my capstone project. It’s also redacted and does not have any direct patient identifiers (it has medical record numbers, which mean nothing to me, but this is the only thing I’m worried about).

Would it be appropriate for me to re-use these data sets and put them up on my portfolio with data visualizations and as ‘data cleaning’ projects?

Any advice is appreciated

submitted by /u/Potential_Lettuce

Historical Data On UFC Fighters And Their Opponents

E.g. I’d like data of all of Khabib’s fights in the UFC, and data on his opponents. Most notably what their rank was in their respective weight class at the time of the fight, their record at the time, etc

submitted by /u/alpachino4

Metadata On US Or International Boycotts

Does anyone know of datasets that provide data on boycotts? Things like start/end dates, financial impact, industries/companies impacted, scope of boycott (sq. miles or # of people), type of product, and/or reason for boycott.

submitted by /u/Neighborhooddataguy

Does Any World Beaches Dataset Exist?

I’ve been searching for it, but all I’ve found are a couple of datasets for specific countries and nothing global, neither free nor paid.

What I need is something like: “country – city name – beach name”, it doesn’t have to be a perfect list of world beaches, but at least it should serve as a starting point.

submitted by /u/montesremotedev

Reported Chemicals In Makeup Dataset

The information provided in these data has been submitted to the California Safe Cosmetics Program (CSCP) at the California Department of Public Health (CDPH). The primary goal of the CSCP is to gather data on unsafe and potentially hazardous components in cosmetic products available for sale in California and make this information accessible to the public.

Under the California Safe Cosmetics Act, manufacturers, packers, and/or distributors are required to submit to the CSCP a list of all cosmetic products sold in California that contain any ingredients known or suspected to cause cancer, birth defects, or other developmental or reproductive harm, as indicated on the product label.

Companies with reportable ingredients in their products must provide information to the CSCP if they meet the following criteria:

They have annual aggregate sales of cosmetic products of one million dollars or more.

They have sold cosmetic products in California on or after January 1, 2007.

To view the data: https://app.gigasheet.com/spreadsheet/Cosmetic-Company-Chemicals/26ed23e9_77da_4708_b5da_8bb23c6efcff

Source: https://catalog.data.gov/dataset/chemicals-in-cosmetics-7d6ab

submitted by /u/sheetheadd

Need Help With The IMDb Movie Datasets.

The data sets I have right now are too big to load into Google Sheets and RStudio. Please suggest ways to load and work with the data.
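One common approach to files that are too large to open whole is to stream them row by row and write out only the rows of interest. The sketch below uses Python's standard `csv` module and assumes the files are tab-delimited text with a header row. The column names `titleType` and `startYear` and the year cutoff are assumptions for illustration and should be adjusted to whatever the actual files contain.

```python
import csv


def filter_rows(src_path, dst_path, keep, delimiter="\t"):
    """Stream src_path row by row and write rows where keep(row) is true.

    Only one row is held in memory at a time, so the input size does not
    matter. The output is an ordinary comma-separated file.
    Returns the number of data rows written.
    """
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # QUOTE_NONE: treat quote characters in titles as plain text.
        reader = csv.DictReader(src, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            if keep(row):
                writer.writerow(row)
                written += 1
    return written


def is_recent_movie(row):
    """Example filter; column names and the cutoff year are hypothetical."""
    year = row.get("startYear") or ""
    return row.get("titleType") == "movie" and year.isdigit() and int(year) >= 2000
```

Calling `filter_rows("big_input.tsv", "subset.csv", is_recent_movie)` with placeholder file names would produce a smaller file that a spreadsheet or RStudio may be able to open.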

submitted by /u/Easy-Inflation3123

Cyber Security Related Data Set For A Project

I have a project due where I need to make 5 different linear regressions in Python on a cyber security topic such as cyberattacks, fake news, cyber intrusions, identity theft, malware, etc. I need a dataset with 200 rows in CSV format. I know how to write the code, but finding a good data set with numeric values is so hard!

submitted by /u/AmericanArsenal17

Historical 1-min OHLC Crypto Prices 1900+ Coins Dataset With Code

I created a dataset for analyzing crypto price data across a large number of coins traded on Ethereum.

The dataset can be viewed and downloaded from Kaggle here: https://www.kaggle.com/datasets/martkir/historical-ohlc-crypto-price-data-for-1900-coins

I also uploaded the code on Github if you want to reproduce the dataset and/or download fresh data. Link here: https://github.com/martkir/crypto-prices-download

I created the dataset because I couldn’t find a good / free place to download historical price data that was granular (1 min resolution) for a large enough cross section of coins.

Centralized exchanges (e.g. Binance, Kraken) have APIs but only for a small subset of tokens – which misses a lot of the small-cap coins traded on DEXs with interesting statistical properties.

Anyway, hope some of you find this dataset useful 🙂

submitted by /u/112129

Can Anyone Point Me To A Database Of Worldwide Lux Records?

This might not be the best place for this question. Pointing me to a better forum would be appreciated if that’s true.

I live in Seattle, WA, which has a reputation for being rainy. But it’s not a well deserved one. There are cities in Florida that get more rain than us, for example.

After living here for 20 years, I’m convinced that what makes Seattle noteworthy is rather how dark it is. But any time I try to research this, it’s a dead end. All sources of data break things down into the binary of cloudy / sunny. Usually by day. One infographic I found at least had the nuance to use hours of sunshine.

I’m looking for a source that breaks cities down by average lux over the course of a year. With a smooth range from 120,000 lux to 10,000 for full daylight, a range of 1,000 to 5 lux for cloud cover, and presumably 10,000 to 1,000 for some sort of partial cloud cover, it seems like there’s a ton of nuance possible here beyond “sunny” or “cloudy”.
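If hourly lux readings were available, the ranges above could be turned into bands and summarized per city. This is a minimal Python sketch. The band names, the choice to treat each lower bound as inclusive, and the `dark` label for readings under 5 lux are all assumptions, and the readings in the usage note are made up.

```python
from collections import Counter

# Lower bounds (in lux) taken from the ranges in the post, highest first.
# Band names and inclusive lower bounds are assumptions.
BANDS = [
    (10_000, "full daylight"),
    (1_000, "partial cloud"),
    (5, "overcast"),
]


def lux_band(lux):
    """Return the name of the band a single lux reading falls into."""
    for lower, name in BANDS:
        if lux >= lower:
            return name
    return "dark"


def band_shares(readings):
    """Return the fraction of readings that fall into each observed band."""
    if not readings:
        return {}
    counts = Counter(lux_band(x) for x in readings)
    return {name: count / len(readings) for name, count in counts.items()}
```

For the made-up readings `[20000, 500, 500, 3]`, `band_shares` reports a quarter of the time in full daylight, half overcast, and a quarter dark.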

With 10% or so of Americans being impacted by seasonal affective disorder, I’m confused why this information isn’t more in demand. I want to look at the big picture of average yearly light exposure.

But I also want my weather app to predict lux for tomorrow. How bright will it be at noon? I want people to have access to the vocabulary of lux like we’ve recently developed the vocabulary of air quality. “Wow, yesterday only got up to 10 lux in Seattle!”

It seems more significant to me than what time sunrise and sunset are, or what the humidity is, but I can’t find evidence that anyone is tracking this information at all 🫤

Can anyone point me to the secret database of global lux records?

submitted by /u/tigerproofrock

Time Series For Climate Change: Forecasting Wind Power

submitted by /u/cavedave

PRESTO – A Multilingual Dataset For Parsing Realistic Task-oriented Dialogues

submitted by /u/cavedave

15,000 Human-generated Prompt Response Pairs Specifically Designed For Instruction Tuning Large Language Models

submitted by /u/cavedave

[self-promo] Cybersyn: Snowflake Funded Data-as-a-Service Provider

This post is self-promotional, but I genuinely feel it can offer value to this community to discuss our plans, expose our free datasets, and take feedback on what datasets you would like to see on Snowflake:

https://www.snowflake.com/blog/snowflake-invests-cybersyn-bringing-unique-data-products-to-marketplace/

https://www.cybersyn.com/blog-series-a/

Find all of our products directly here: https://app.snowflake.com/marketplace/listings/Cybersyn%2C%20Inc

submitted by /u/aiatco2

{Academic Study} Play A Quick Memory Game To Test Your Short Term Recall. Shouldn’t Take More Than 5 Minutes.

Hi, we are currently testing the effect of circadian rhythms on short term recall. The instructions are pretty simple. Download this app (https://apps.apple.com/us/app/short-term-memory/id804088277), play levels 4 and 8 using only 15 seconds to memorize the items. Record how many items you were able to recall for each level. The caveat is that you need to do this once in the evening, and once in the morning. That is the whole purpose of the experiment. Thank you for the participation! You can post your results in the comments or DM me.

submitted by /u/Trevor-Dustin

USA County Data

submitted by /u/cavedave

Wooldridge “nbasal” Dataset Analysis Issues

I’m trying to analyze the “nbasal” dataset based on position.

When I run this line:

model1 = lm(wage ~ exper, data = center_players) # regression on center players

summary(model1)

The output is this:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1184.64     355.26   3.335  0.00174 **
exper          80.21      51.22   1.566  0.12450

When I run this:

model2 = lm(wage ~ exper + points, data = center_players)

summary(model2)

The output is this:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -33.43     244.90  -0.136   0.8921
exper          71.13      29.86   2.382   0.0217 *
points        149.05      16.02   9.306  7.3e-12 ***

I don’t understand how each point increases salary by 149.05 and the intercept becomes -33. Can someone explain this to me?
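A small synthetic example (not the nbasal data) may help show why the intercept moves so much. In a model with only `exper`, the intercept has to absorb the average contribution of the omitted `points` variable. Once `points` is included, the intercept becomes the predicted wage for a player with zero experience and zero points, which is an extrapolation. In the output above that intercept has a standard error of 244.90 and a p-value of 0.89, so it is not distinguishable from zero. The 149.05 is the estimated difference in wage associated with one more point, holding `exper` fixed. The Python sketch below fits both models by ordinary least squares on made-up numbers. All data values and the `ols` helper are assumptions for illustration.

```python
def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Returns [intercept, b1, b2, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for p in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        pivot = A[p][p]
        A[p] = [v / pivot for v in A[p]]
        for r in range(k):
            if r != p:
                factor = A[r][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    return [A[i][k] for i in range(k)]


# Made-up data: wage is built exactly as 50 + 70*exper + 150*points.
exper = [1, 2, 3, 4, 5, 6]
points = [12, 4, 9, 15, 6, 11]
wage = [50 + 70 * e + 150 * p for e, p in zip(exper, points)]

one_predictor = ols([exper], wage)          # intercept absorbs the points effect
two_predictors = ols([exper, points], wage)  # recovers the generating coefficients
print(one_predictor)
print(two_predictors)
```

In this synthetic data the one-predictor intercept comes out near 1370, while the two-predictor fit recovers an intercept near 50 with slopes near 70 and 150. The large drop in the intercept mirrors the pattern in the two summaries.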

submitted by /u/Expensive-Still7318

Require Urgent Help Regarding PPMI Dataset

Hi all, I’m new to working with the PPMI dataset for my research project and need SBR values (LC, RC, LP, RP) and CSF markers (ptau, total tau, beta, alpha-syn). I’m finding it really confusing to work out where I can get the CSV files for these. Could someone help me out? It’s kind of urgent.

submitted by /u/unicorn262001

What Are The Best Tools For Web Scraping And Analysis Of Natural Language To Populate A Dataset?

submitted by /u/adjectivenounnr

We Made A Newsfeed For Tracking New And Deleted Datasets Across 200+ Open Data Portals (and They’re All Queryable With SQL)

submitted by /u/chatmasta

Unlimited Data For Creating Dataset For Intent Recognition And Other NLU Models

Nice idea to use ChatGPT. It would be great if someone took on the task of creating an open dataset, so that resources wouldn’t be wasted on work that has already been done.

Breaking Through the Limits: How Unlimited Data Collection and Generation Can Overcome Traditional Barriers in Intent Recognition

submitted by /u/KMiNT21