Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Historical Financial Data Available On DoltHub

You can find the following repositories on DoltHub:

Earnings: Financial statements (balance sheets, income statements, cash flow statements), with annual figures back to 2012 and quarterly figures back to 2016. Also analyst estimates (sales and earnings per share), recorded weekly, with data back to 2018. Covers stocks listed in the US.

Options: Option prices, vols, and greeks for SPDR ETFs and ETF components. Recorded Monday, Wednesday, and Friday. Saves 2-, 4-, and 8-week expirations and does not save all strikes. Data goes back to 2019. Records 30 ATM volatility history for easy computation of implied volatility rank.

Rates: US Treasury interpolated yield curve as published by the Fed. Recorded daily. Data goes back to 1990.

Stocks: Daily prices, splits, dividends, and symbol info for US-listed stocks. Data goes back to 2018. Symbols that have been delisted are still present in the data set.
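To show what "implied volatility rank" means once you have a volatility history, here is a minimal sketch using the common definition: where today's reading sits between the lowest and highest readings of a lookback window (often about one year), on a 0-100 scale. The function name and the sample readings are hypothetical. The sketch does not read from the DoltHub repository, whose table and column names are not given here.

```python
def iv_rank(history: list[float], current: float) -> float:
    """Implied volatility rank on a 0-100 scale (hypothetical helper).

    history: past ATM implied volatility readings over the lookback window.
    current: today's reading; include it in history to keep the result in 0-100.
    """
    low, high = min(history), max(history)
    if high == low:
        raise ValueError("history has no range; IV rank is undefined")
    return 100.0 * (current - low) / (high - low)


# Made-up readings, standing in for values pulled from a recorded history.
print(iv_rank([20.0, 30.0, 25.0, 40.0], current=30.0))  # 50.0
```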

DoltHub is a web interface to Dolt, where you can query data using the same SQL you would use in MySQL. This allows for much more flexible and powerful querying across datasets than extracting data from multiple CSVs.
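With a SQL interface, a question that spans two datasets becomes a single JOIN instead of matching rows across CSV files by hand. The sketch below uses Python's built-in `sqlite3` as a stand-in so it runs without a Dolt server. The query itself is plain SQL that should also work from a MySQL-compatible client. The `prices` and `dividends` tables, their columns, and the rows are hypothetical placeholders, not the actual schema or contents of the DoltHub repositories.

```python
import sqlite3

# Stand-in database: table/column names are illustrative, not DoltHub's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (symbol TEXT, date TEXT, close REAL);
CREATE TABLE dividends (symbol TEXT, ex_date TEXT, amount REAL);
""")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    ("AAA", "2021-01-04", 100.0),
    ("AAA", "2021-01-05", 102.0),
    ("BBB", "2021-01-04", 50.0),
])
conn.executemany("INSERT INTO dividends VALUES (?, ?, ?)", [
    ("AAA", "2021-01-05", 1.02),
])

# One JOIN replaces manually lining up two CSVs: the dividend yield
# on each ex-dividend date, using that day's closing price.
rows = conn.execute("""
SELECT p.symbol, p.date, d.amount / p.close AS div_yield
FROM prices AS p
JOIN dividends AS d
  ON d.symbol = p.symbol AND d.ex_date = p.date
ORDER BY p.symbol, p.date
""").fetchall()
print(rows)
```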

Note: This is not self-promotion, as I am not affiliated with DoltHub.

submitted by /u/funkinaround

Searching For A Dataset For NFL Salaries Going Back As Far As 1967

I know this may be a pipe dream, but I’ve been searching for salary information on 10-15 NFL players, most of whom played in the ’90s, going as far back as 1967. Does anyone have any idea where I could find this? nflfastR only dates back to 1999, and sportsdata.io only goes back a few years.

For reference, I am doing a draft analysis and trying to compare previous draft pick trades. I have a “draft pick value” chart, which gives me a numeric value for each pick in the draft. I want to compare each side of a trade and say whether it was a “good deal” or a “bad deal” based on the total value on each side. This works fine when only picks are traded, but when players are involved, I am having a hard time comparing the two sides objectively. My thought is that the first overall pick is paid a certain amount of money when drafted, so I can value a player at the first overall pick’s value multiplied by how many first-overall-pick salaries fit within that player’s contract (e.g. pick 1.01 is worth 3000 and is paid $7M/year; a player paid $14M/year is worth 6000, because 3000*2=6000). Any ideas on a different way to approach this would help too.
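The valuation rule described above can be written down in a few lines. A player is worth the first overall pick's value times the ratio of the player's salary to that pick's salary. Summing each side then gives a trade balance. This is a minimal sketch. The function names are hypothetical, and the pick values in the usage line are placeholders rather than figures from any real chart.

```python
def player_value(player_salary: float, first_pick_salary: float,
                 first_pick_value: float) -> float:
    """Value a player as a multiple of the first overall pick.

    The multiple is how many first-overall-pick salaries fit in the player's
    salary; both salaries must be in the same unit (e.g. dollars per year).
    """
    if first_pick_salary <= 0:
        raise ValueError("first_pick_salary must be positive")
    return first_pick_value * (player_salary / first_pick_salary)


def trade_balance(side_a: list[float], side_b: list[float]) -> float:
    """Total value received by side A minus side B; positive favors A."""
    return sum(side_a) - sum(side_b)


# Worked example from the post: pick 1.01 is worth 3000 and paid $7M/year,
# so a player paid $14M/year is worth 3000 * (14 / 7) = 6000.
value = player_value(14_000_000, 7_000_000, 3000)
print(value)  # 6000.0

# Hypothetical trade: side A gets that player, side B gets picks worth 3000 and 2500.
print(trade_balance([value], [3000, 2500]))  # 500.0
```

A limitation of this design is that salary alone ignores contract length and the player's age, so it may be worth weighting by years remaining if that data turns up.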

submitted by /u/jordanar189

Inorganic Chlorides And Their Associated Boiling Points

Hey!

I’m not sure if this is the right place to ask, but I was curious whether anyone here knows of a list of inorganic chlorides and their associated boiling points.

I have been working on a project to “distill” specific chlorides from a mixed group of salts. I am looking for a list of inorganic chlorides and their boiling points so I can determine the temperature ranges and the elements that are in each temperature band.

I’d also appreciate it if folks know of a place to find phase diagrams for inorganic chlorides.
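Once a boiling point list is in hand, sorting compounds into temperature bands is a simple bucketing step. The sketch below groups compounds into fixed-width, half-open bands. The compound names and temperatures are placeholders only, not real chloride data. The sketch also ignores that some chlorides sublime or decompose rather than boil, and that mixtures do not separate cleanly at pure-compound boiling points.

```python
from collections import defaultdict


def band_by_boiling_point(boiling_points: dict[str, float],
                          width: float) -> dict[tuple[float, float], list[str]]:
    """Group compounds into fixed-width bands [low, low + width).

    boiling_points maps a compound name to its boiling point, all in one
    unit (e.g. degrees C). Names within a band are ordered by boiling point.
    """
    bands: dict[tuple[float, float], list[str]] = defaultdict(list)
    for name, bp in sorted(boiling_points.items(), key=lambda item: item[1]):
        low = (bp // width) * width
        bands[(low, low + width)].append(name)
    return dict(bands)


# Placeholder names and values; substitute real data from a reference table.
example = {"chloride_A": 120.0, "chloride_B": 180.0, "chloride_C": 1400.0}
for (low, high), names in sorted(band_by_boiling_point(example, 250.0).items()):
    print(f"{low:.0f}-{high:.0f}: {names}")
```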

submitted by /u/minimalweirdness

Shapefile For 1987 Westminster Constituencies

I am struggling to find a single source for a shapefile of the England, Wales and Scotland Westminster constituency boundaries for 1987 (which I believe are the same as for 1983). I want to make a choropleth of some data I have on MPs, but I can only find separate shapefiles. I would piece them together, but I’m a beginner with this stuff, so I would like to avoid that if I can. Many thanks!

submitted by /u/MacAnBhacaigh

[Q] Need A Free Dataset On Businesses That Applied “Industry 4.0” Innovations

Hi,

As the headline says, I need guidance on where to find data for my homework. I have 15 days to complete it and the teacher is no help; apparently I should work on my own.

I am working on a paper in which I should analyze Industry 4.0 (what it is, what technologies are involved, what the potential applications are, etc.; the EASY PART) and then pick specific showcases and analyze how the newly implemented technologies improved their performance compared to the competition.

I tried to cite some annual reports but was told that’s not what I should deliver. Instead, I should run my own OLS regression on my own data source and do a multivariable analysis (e.g. assembly-line productivity explained by a trend, the other variables usually involved, and gross fixed capital formation, justified as a proxy for the new technology investment).

To be honest, I am lost, and school is not much help. If I don’t finish this in 15 days, I will be expelled… I also work full-time and pay my own tuition, so this lack of guidance is not what I need right now. You might see this posted all over Reddit.

Does anyone know how I can actually get data for this?
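For the OLS part of the assignment described above, here is a minimal sketch of a multivariable regression with an intercept. It explains productivity by a time trend plus gross fixed capital formation (GFCF) as an investment proxy. The numbers are synthetic and constructed to fit exactly, purely to show that the fit recovers the coefficients. They are not real data, and the variable choice is only the example from the post. This gives point estimates only. For standard errors and p-values, a library such as statsmodels (`statsmodels.api.OLS`) reports the full regression table.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept.

    X has one row per observation and one column per regressor.
    Returns [intercept, coef_1, ..., coef_k].
    """
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


# Synthetic illustration only: productivity = 2 + 0.5 * trend + 3 * GFCF.
t = np.arange(10, dtype=float)
gfcf = np.array([5.0, 5.5, 6.1, 6.0, 7.2, 7.9, 8.1, 9.0, 9.4, 10.2])
productivity = 2.0 + 0.5 * t + 3.0 * gfcf
print(fit_ols(np.column_stack([t, gfcf]), productivity))  # approx. [2.0, 0.5, 3.0]
```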

submitted by /u/-Belon-

Need A Free, Public, Very Easy Data Set For An R Ggplot Exercise, Please

Hello redditors,
For a university task on R visualizations with ggplot2 and Shiny, I’m working on a data set about COVID in Spain, but I’m running into many difficulties: RStudio runs slowly because of the 2 million rows, and the charts don’t work completely… So I’m giving up and would like to start over with an easier, smaller data set.
The only requirement is that the data set be free and publicly accessible.

Could you recommend a data set, maybe related to cars, sales, or tourist destinations (something easy to analyze), with at most 5-8 columns and no more than a few thousand rows? The topic is open.

Thank you 🙂

submitted by /u/aquakeyblademaster

Dataset On The Arts & Culture Sector Of United States

SMU DataArts offers detailed financial, operational, and programmatic information from thousands of nonprofit arts and cultural organizations nationwide. Files contain disaggregated unprocessed data fields in Comma Separated Value (CSV) format, and are intended for academics, students, and independent researchers with experience using raw structured data to perform calculations and analyses. Data access fee is waived for those using data for academic purposes.

https://www.culturaldata.org/what-we-do/for-researchers-advocates/access-the-dataset/

submitted by /u/planbecca

I Want A Dataset For Topic Modeling In JSON Format

I need a dataset for this assignment:
Using the concept of topic modeling, implement it using:

(i) Rule-based method

(ii) Latent Dirichlet Allocation (LDA) method

For your convenience, take any unlabeled dataset.

Perform data cleaning.

For the rule-based method, use a TF-IDF vectorizer and any clustering method.

For the LDA method, fit a LatentDirichletAllocation estimator.

submitted by /u/Particular-Pie-1640

School Assignment: Needs Dataset About Road Quality In Europe

Does anyone know of a free dataset on road quality that I can use?

I am working on a school assignment about the traffic situation in Europe, and the best website I have found so far is this one: https://www.theglobaleconomy.com/rankings/roads_quality/Europe/

However, this dataset isn’t free to use. Maybe someone in this community has a dataset suited to the task I want to perform.

Thanks in advance!

submitted by /u/Just_Presence_1414

Anthropic RLHF Dataset: Human Preference Data (+ Errors I Found)

Hello friends!

I recently found this RLHF-style dataset while browsing Hugging Face Datasets. With Reinforcement Learning from Human Feedback (RLHF) becoming the primary way to train AI assistants, it’s great to see organizations like Anthropic making their RLHF dataset publicly available (released as hh-rlhf).

Like other RLHF datasets, every example in this one includes an input prompt and two outputs generated by the LLM: a chosen output and a rejected output, where a human rater preferred the former over the latter.
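To work with such preference pairs, it helps to separate the shared prompt from the two responses. This sketch assumes each record stores two full dialogue transcripts under `chosen` and `rejected`, with turns introduced by `Human:` and `Assistant:` markers and the prompt repeated in both. That is my understanding of the hh-rlhf layout, so check it against the actual files. The helper name and the sample record are made up for illustration.

```python
ASSISTANT_TAG = "\n\nAssistant:"


def split_preference_pair(record: dict[str, str]) -> dict[str, str]:
    """Split a chosen/rejected pair into a shared prompt and two responses.

    Assumes the final turn of each transcript starts with ASSISTANT_TAG and
    that both transcripts are identical up to that point.
    """
    chosen, rejected = record["chosen"], record["rejected"]
    idx = chosen.rfind(ASSISTANT_TAG)
    if idx == -1:
        raise ValueError("no assistant turn found in 'chosen'")
    cut = idx + len(ASSISTANT_TAG)
    prompt = chosen[:cut]  # everything up to and including the last tag
    if not rejected.startswith(prompt):
        raise ValueError("chosen and rejected do not share a prompt")
    return {
        "prompt": prompt,
        "chosen": chosen[cut:].strip(),
        "rejected": rejected[cut:].strip(),
    }


# Made-up record in the assumed format.
example = {
    "chosen": "\n\nHuman: What is 2+2?\n\nAssistant: 4.",
    "rejected": "\n\nHuman: What is 2+2?\n\nAssistant: 5.",
}
print(split_preference_pair(example))
```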

submitted by /u/cmauck10

Real World Sales Datasets? Any Good Datasets That I Could Use For My Power BI Portfolio As I Interview For Jobs?

I want to create a few Power BI dashboards for my public analytics portfolio site and am looking for sales datasets. I want to use real-world sales data (not mock data) and am trying to find sales data that would interest a wide variety of audiences, since I’ll be interviewing at a variety of companies and organizations for my first official data analytics job. Ideally, a dataset that is fairly “generic” and straightforward and won’t require a lot of explanation ahead of time (for example, something “generic” like Amazon sales data, except I assume Amazon doesn’t release their confidential sales data LOL).

I’m also looking at a lot of datasets on Kaggle, GitHub, etc., but I wanted to check whether there were any other good sales datasets you would recommend for this purpose (an entry-level analytics portfolio). I would greatly appreciate it! 😊

Any ideas?

submitted by /u/Expert-Rhubarb-987