Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

School Assignment: Needs Dataset About Road Quality In Europe

Does anyone know of a dataset about road quality that I can use for free?

I am working on a school assignment about the traffic situation in Europe, and currently the best website I have found is this: https://www.theglobaleconomy.com/rankings/roads_quality/Europe/

However, this dataset isn’t free to use. Maybe this community has some datasets available for the task i want to perform.

Thanks in advance!

submitted by /u/Just_Presence_1414

Anthropic RLHF Dataset: Human Preference Data (+ Errors I Found)

Hello friends!

I recently found this RLHF-style dataset while browsing Hugging Face Datasets. With Reinforcement Learning from Human Feedback (RLHF) becoming the primary way to train AI assistants, it’s great to see organizations like Anthropic making their RLHF dataset publicly available (released as: hh-rlhf).

Like other RLHF datasets, every example in this one includes an input prompt and two outputs generated by the LLM: a chosen output and a rejected output, where a human rater preferred the former over the latter.
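A minimal sketch of pulling one such pair apart, assuming each record stores the full transcript under the keys `chosen` and `rejected` and marks turns with `Human:` / `Assistant:`. Those key names and markers are an assumption about the layout, `split_pair` is a hypothetical helper, and the example text is invented for illustration:

```python
# Hypothetical record in the chosen/rejected layout described above.
# The dialogue is made up for illustration, not copied from the dataset.
example = {
    "chosen": "\n\nHuman: How do I boil an egg?\n\nAssistant: Simmer it for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?\n\nAssistant: I don't know.",
}

# Assumed marker that opens each assistant turn in a transcript.
MARKER = "\n\nAssistant:"


def split_pair(record):
    """Return (prompt, chosen_reply, rejected_reply) for one preference record."""
    chosen = record["chosen"]
    rejected = record["rejected"]
    # The final assistant turn is the part the rater compared.
    c_cut = chosen.rfind(MARKER)
    r_cut = rejected.rfind(MARKER)
    if c_cut == -1 or r_cut == -1:
        raise ValueError("no assistant turn found in record")
    prompt = chosen[: c_cut + len(MARKER)]
    chosen_reply = chosen[c_cut + len(MARKER):].strip()
    rejected_reply = rejected[r_cut + len(MARKER):].strip()
    return prompt, chosen_reply, rejected_reply


prompt, good, bad = split_pair(example)
print(repr(good), "was preferred over", repr(bad))
```

Splitting at the last assistant marker keeps any earlier turns of a multi-turn conversation inside the prompt, so only the two compared replies are separated out.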

submitted by /u/cmauck10

Dataset For Hyper-partisan Or Politically Valenced Misinformation Articles In The UK.

I’m looking for a dataset containing fact-checked news articles relevant to the UK political context. This is for a study manipulating politically congruent vs incongruent misinformation and attitudes towards it.

I’ve been looking around for ages and am pretty sure that none exists (this kind of thing is understudied in the UK) but would very much appreciate suggestions of places to look, thanks 🙂

submitted by /u/Grouchy_Preparation1

Real World Sales Datasets? Any Good Datasets That I Could Use For My Power BI Portfolio As I Interview For Jobs?

I want to create a few Power BI dashboards for my public analytics portfolio site and am looking for sales datasets. I want to use real-world sales data (not mock data) that would interest a wide range of audiences, since I’ll be interviewing at a variety of companies/organizations for my first official data analytics job. Ideally the dataset would be fairly “generic” and straightforward, so it won’t require a lot of explanation ahead of time (for example, something like Amazon sales data, except I assume Amazon doesn’t release their confidential sales data LOL).

I’m also looking at a lot of datasets on Kaggle, GitHub, etc, but I wanted to check if there were any other good sales datasets that you would recommend for this purpose (an entry-level analytics portfolio). I would greatly appreciate it! 😊

Any ideas?

submitted by /u/Expert-Rhubarb-987

Can Someone Please Help Me Compile Klay Thompson Data Into A CSV

Hey everyone, I’m taking a machine learning class in college and I want to build an R model that predicts Klay Thompson’s performance in NBA games. The problem is I can’t find a cleaned dataset with data from all 716 NBA games he’s played, with all the covariates such as 3 pointers, rebounds, assists, free throws, etc. I found all this info on statmuse.com, which has a record of all the games he’s played, but I need help compiling it into a CSV. Can anyone help me do this?
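One way the compiling step could look, as a rough sketch only: it assumes the per-game stats have already been collected by hand into a list of dictionaries. The column names, the `games_to_csv` helper, and the placeholder numbers are all hypothetical and are not real box scores.

```python
import csv
import io

# Hypothetical per-game rows; the values are placeholders, not real stats.
games = [
    {"date": "2019-01-03", "opponent": "BBB", "three_pointers": 5, "rebounds": 2},
    {"date": "2019-01-01", "opponent": "AAA", "three_pointers": 3,
     "rebounds": 4, "assists": 2, "free_throws": 1},
]

# Assumed column order for the output file.
FIELDS = ["date", "opponent", "three_pointers", "rebounds", "assists", "free_throws"]


def games_to_csv(rows, fields=FIELDS):
    """Write the rows as CSV text, sorted by date, leaving missing stats blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        restval="",             # blank cell when a stat is missing
        extrasaction="ignore",  # drop columns that are not in `fields`
        lineterminator="\n",
    )
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["date"]):
        writer.writerow(row)
    return buf.getvalue()


print(games_to_csv(games))
```

Writing the returned text to a `.csv` file gives one row per game, which R can then read in directly.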

submitted by /u/driftqueenjulie

Looking For Accessible ESG Datasets For School Project

Hi /r/datasets

For a school project I’m working on, I need data about ESG scores (preferably detailed for each pillar) for several companies (particularly European ones, but anything goes). Supplementary data about different ESG criteria can be useful too. Unfortunately, most data sources about this are very expensive or hardly useful… So any suggestions of accessible datasets like these would be very appreciated! Thanks in advance for any help!

PS: datasets about operational risks for companies can be interesting too

submitted by /u/floflo79

Looking For Dataset Of Correct And Incorrect Electronic Invoices

Looking for a dataset of electronic invoices with the following specs:

Type: Electronic invoices, not scanned docs, US invoices preferably

File Type: Pdf or jpg/png…

Quantity: At least 500 total invoices, preferably over 1,000

Additional details: The dataset needs to contain both correct and incorrect invoices. Incorrect invoices would be invoices that contain errors, inaccuracies or issues in them. Correct invoices need to have a tag in the name that indicates they are correct, same thing for the incorrect invoices. Not sure if this is the best move but I would be ok with having 2 separate datasets, 1 dataset of correct invoices and another dataset of incorrect invoices.
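To make the naming requirement concrete, here is a small sketch of how a label could be read off a file name. It assumes the tag is simply the word "correct" or "incorrect" somewhere in the name; the `label_from_name` helper and the example file names are hypothetical.

```python
from pathlib import PurePath


def label_from_name(filename):
    """Return 'correct', 'incorrect', or None based on a tag in the file name."""
    stem = PurePath(filename).stem.lower()
    # Check "incorrect" first, because it contains "correct" as a substring.
    if "incorrect" in stem:
        return "incorrect"
    if "correct" in stem:
        return "correct"
    return None  # untagged file, needs manual review


# Hypothetical file names, for illustration only.
for name in ["invoice_0001_correct.pdf", "invoice_0002_incorrect.png", "invoice_0003.pdf"]:
    print(name, "->", label_from_name(name))
```

The same function works whether the invoices arrive as one mixed dataset or as two separate ones, since the label comes from the name and not from the folder.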

I am also open to suggestions of sites or resources that have invoices for web scraping purposes.

I am willing to provide additional details if it helps.

Thanks in advance!

submitted by /u/souley16

Looking For A Good Fraud Data Set For A Class Project, Not Very Knowledgeable.

I somehow ended up in a data analytics class where I need to prepare a proposal for an investigation related to fraud, and the prof has given us basically no guidance. I need a data set that I can run at least three different supervised or semi-supervised analytical techniques on. I was thinking of something related to spam email, but I really don’t know what I’m looking for, and I’m struggling to come up with good ideas. Preferably something simple. Any help is greatly appreciated.

submitted by /u/xnickg77