Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Good APIs For Financial/trading Data (OHLC, Volume Etc.)

Hi, I am planning to create a data science-related portfolio project, and I want it to be focused on finance. So, I am considering using a free Python API where I can access OHLC data, volume, etc., enabling me to create indicators, conduct modeling, perform price prediction, sentiment analysis, and more. It can be stocks, options, or cryptocurrencies; I am indifferent, as long as the API is reliable. A few months ago, I utilized the yfinance Python library, but it appears that Yahoo Finance is reluctant to share their data, as I encountered numerous issues with blocked requests, etc. Currently, I am contemplating the Binance API. Although I have not yet used it, I have heard that it provides an extensive amount of data. Can anyone confirm this? Thanks in advance.
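As a rough starting point for the indicator work described above, here is a minimal sketch that turns candlestick rows into typed records and computes a simple moving average. It assumes rows shaped like `[open time (ms), open, high, low, close, volume, ...]` with prices as strings, which is how Binance's public klines endpoint is generally documented; check the current API docs before relying on that layout. The `Candle` class, the helper names, and the sample rows are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candle:
    open_time_ms: int
    open: float
    high: float
    low: float
    close: float
    volume: float


def parse_klines(rows: Sequence[Sequence]) -> List[Candle]:
    """Convert kline-style rows into Candle objects.

    Assumed layout per row: [open time (ms), open, high, low, close, volume, ...],
    with prices and volume given as strings.
    """
    return [
        Candle(int(r[0]), float(r[1]), float(r[2]), float(r[3]), float(r[4]), float(r[5]))
        for r in rows
    ]


def simple_moving_average(values: Sequence[float], window: int) -> List[float]:
    """Return the SMA for each position where a full window is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical sample rows, made up for illustration (not real market data).
sample_rows = [
    [1700000000000, "100.0", "105.0", "99.0", "102.0", "10.5"],
    [1700000060000, "102.0", "106.0", "101.0", "104.0", "8.0"],
    [1700000120000, "104.0", "107.0", "103.0", "106.0", "12.0"],
]

candles = parse_klines(sample_rows)
closes = [c.close for c in candles]
sma_2 = simple_moving_average(closes, window=2)
```

Keeping the parsing separate from the indicator code means the same `simple_moving_average` can be reused if the data source changes later.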

submitted by /u/-Oake

Make Graphs With Large Data Sets In Excel?

Hello data experts! I recently graduated as an analytical chemist and started working for a systems integration company as an R&D specialist. I test and validate instrumentation, and develop applications for specific analyses, among other activities.
In my latest project I collect data every ten seconds, 24/7, from multiple inputs, which by the end of the week leaves me with hundreds of thousands of data points. Graphing these data sets with Excel has become almost impossible, even after reducing the number of points. What programs/procedures would you recommend to make these graphs and analyse trends without the program crashing on me every time I change anything? I haven’t used anything other than Excel up to this point, and my programming experience is nonexistent. I am definitely willing to explore options if it means fast and efficient data analysis. Help is much appreciated. A starting data analyst
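One common way to make ten-second logs plottable is to average them into coarser time buckets before graphing. The sketch below shows the idea using only the Python standard library; the `downsample` function and the sample readings are hypothetical, and it assumes each reading is a pair of (timestamp in seconds, value).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def downsample(readings: Iterable[Tuple[int, float]], bucket_seconds: int) -> List[Tuple[int, float]]:
    """Average readings into fixed-width time buckets.

    Each reading is (timestamp in seconds, value). The result has one
    (bucket start, mean value) pair per bucket, sorted by time.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in readings:
        # Round the timestamp down to the start of its bucket.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical readings taken every 10 seconds (values made up for illustration).
readings = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 5.0), (40, 6.0), (50, 7.0), (60, 10.0)]
per_half_minute = downsample(readings, bucket_seconds=30)
```

For scale: one input sampled every ten seconds produces 60,480 points per week, and averaging into five-minute buckets (`bucket_seconds=300`) reduces that to 2,016 points, which most charting tools handle comfortably.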

submitted by /u/Leading-Click-7558

Looking For Zapier Datasets On Industries Or Companies That Use Zapier

I have a new startup that uses Zapier, and I am searching for other small business owners and startup clients.

I came across this post on https://www.usesignhouse.com/blog/zapier-stats which breaks down the top industries that use Zapier, and it led me here.

I would like to ask if you can share the dataset you used for the analysis, or if anyone can point me in the right direction, so I can get the list and distribution of the various types of companies that use Zapier and target similar companies in my marketing.

I am looking for datasets in CSV format that I can analyze further by industry or company, to find a good niche that is underserved but needs Zapier automations, so I can find clients.

Any help would be appreciated.

submitted by /u/cool-pop

Speech Datasets That Capture Numeral Errors (i.e., 57 > 75)?

Hi everyone.

Not sure if this exists, since things are usually cleaned up quite a bit before going public, but are there any data sources that could be used to study common numeral errors?

Mainly interested in instances of leading-digit bias (e.g., reading 9.88 as 9 instead of 10), but that’s even weirder and harder to track down in speech. No way of filtering out ‘misspeaks’ in major corpora like ANC or COCA, AFAIK. Any recommendations or leads?

submitted by /u/dennu9909

Looking For Advice On Creating A Dataset Of Exam Questions From A Set Of Exam Papers

Hi,

I’m trying to create a dataset of exam questions from the A-level Edexcel Physics question papers.

Here’s a sample paper, for example.

Ideally, I’d want to extract all the text, the equations (properly formatted), and the images (mainly graphs and diagrams) just by uploading the file, but as far as I know this isn’t feasible.

What I’m doing right now is just using PyPDF to extract the text alone, and I’m ignoring possible errors in the format in which equations are extracted (which puts me in a difficult position when there are more complex equations involved than just straight one-line formulas). I’m then just manually cleaning it up, using regular expressions where I can to simplify the process. After that, I plan on just manually ‘snipping’ the images out and putting all of this into a MySQL database.
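The clean-up and storage steps described above could look roughly like the sketch below. It assumes the text has already been extracted into a string, and that each question starts on its own line with its number; real papers will need a pattern tuned to their actual layout. It also swaps in the standard-library `sqlite3` module for MySQL so the example stays self-contained. The function names, table layout, sample text, and the paper id `sample-paper` are all hypothetical, and equations and images are not handled here.

```python
import re
import sqlite3
from typing import List, Tuple

# Assumed format: a question starts at the beginning of a line with its number.
QUESTION_START = re.compile(r"^(\d+)\s+", re.MULTILINE)


def split_questions(text: str) -> List[Tuple[int, str]]:
    """Split extracted text into (question number, question body) pairs."""
    parts = QUESTION_START.split(text)
    # parts = [preamble, number, body, number, body, ...]
    pairs = zip(parts[1::2], parts[2::2])
    # Collapse line breaks and repeated spaces inside each body.
    return [(int(number), " ".join(body.split())) for number, body in pairs]


def store_questions(conn: sqlite3.Connection, paper: str, questions: List[Tuple[int, str]]) -> None:
    """Write the questions for one paper into a simple table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS questions ("
        "paper TEXT, number INTEGER, body TEXT, PRIMARY KEY (paper, number))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO questions VALUES (?, ?, ?)",
        [(paper, number, body) for number, body in questions],
    )
    conn.commit()


# Hypothetical extracted text, made up for illustration.
extracted = (
    "Answer ALL questions.\n"
    "1 State Newton's second law.\n(2)\n"
    "2 A car accelerates from rest.\nCalculate its final speed.\n(3)\n"
)

questions = split_questions(extracted)
conn = sqlite3.connect(":memory:")
store_questions(conn, "sample-paper", questions)
```

A relational table like this can always be exported to .csv later, so the choice of database does not lock the project out of the usual ML tooling.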

The project I’m working on rn is a question suggestion system based on content and question difficulty and I’m using a very specific subset of questions, as I mentioned earlier, just because I’m not too committed atm to tediously creating a dataset. I’m not even sure if storing this in MySQL is a good idea and I’ve personally never worked on any ML projects that don’t involve .csv files or aren’t image datasets, so I am pretty lost on this.

Any advice would be super highly appreciated! Wish you a great day 🙂

submitted by /u/cakeandflowers2202

Does Discogs Allow A Data Export From The “whole” Marketplace?

I’m working on building a dashboard for an interview at a sales org. I figured sales data/price tracking with Discogs would be a good fit, and is something I can speak to and am interested in.

It looks like the only exportable data they can give is from my own marketplace/sales data. Unfortunately, I haven’t done enough sales on Discogs to build anything interesting/compelling. Is there a way I can get a CSV with the sales data across the whole Discogs marketplace? Or another way to gather a wider swath of metrics from their sales?

(I’m also open to hearing other suggestions, if you can think of a place that has CSV data in multiple tables pertaining to sales that I can easily download/import.)

submitted by /u/eulerpony

GPS Dataset Columns Interpretations.

Hey Data Scientists, I’ve been working with a GPS dataset for vehicle routing, but I’m having trouble interpreting some of the columns. The dataset doesn’t have column names, but I’ve managed to figure out some of them:

First column: Vehicle ID
Second column: Timestamps
Third column: Longitude
Fourth column: Latitude
Seventh column: Speed (I’ve determined this through patterns in the data)

However, I’m still unsure about the remaining columns:

Fifth column: This column starts with a value of 319 and generally keeps increasing, even when the vehicle is stationary. I noticed that the value stays constant when the speed is constant.
Sixth column: This column starts at 0 (the vehicle is stationary), rises to 303 once the vehicle starts moving slightly, and goes back to 0 when the vehicle is stationary. It also stays constant when the speed is constant.
Eighth column: This column changes when the location changes, similar to the speed column. However, when the longitude and latitude remain constant, the values are 0. Any ideas on what this column signifies?
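One way to test guesses about unnamed columns is to derive candidate quantities from the known ones and compare them row by row. The sketch below computes the great-circle distance and initial compass bearing between two consecutive fixes, which could then be checked against the unknown columns. It makes no claim about what those columns actually are; the function name is hypothetical, and it assumes coordinates in decimal degrees.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_and_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> Tuple[float, float]:
    """Return (distance in metres, initial bearing in degrees) from fix 1 to fix 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine formula for great-circle distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing, normalised to the range [0, 360).
    y = math.sin(dlambda) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlambda)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return distance, bearing


# One degree of longitude along the equator, heading due east.
east_distance, east_bearing = distance_and_bearing(0.0, 0.0, 0.0, 1.0)
```

Note that this dataset lists longitude (third column) before latitude (fourth column), while the function takes latitude first. Dividing the distance by the time difference between fixes also gives a derived speed to compare against the seventh column.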

submitted by /u/ziade_e

Casia NIR-VIS 2.0 Dataset For Facial Images

The Casia NIR-VIS 2.0 dataset contains infrared images and the corresponding visible images. But when I emailed the address given to request access, it said the domain is not available.

Does anyone know about its availability and how to access it?

Or is there any other large-scale, publicly available dataset of NIR-VIS images?

Thanks in advance!

submitted by /u/Ok_Championship9256

Could Someone Please Recommend A Source For Data On Dietary Intake For Many Different Countries?

I want to analyze the prevalence of dementia across different countries (compared with the proportion of elderly for reference) and thought it would be interesting if I compared it with something dietary, just to see if there is any correlation at all that I could look into further. Where can I find datasets on specific dietary aspects for many different countries? For example: saturated fat or omega-3 fatty acid intake. Anything is good! It doesn’t have to be relating to fat specifically, though that would be cool. The most important thing I think is that it covers many different countries.

submitted by /u/Relevant_Engineer442

Looking For Dataset Of Songs Sorted By Repetitiveness

Hi. I’m a desperate psychology PhD student looking for experimental stimuli for one of my experiments. I am studying how repetition in music is linked to cognitive mechanisms and how it affects aesthetic appraisal.
As the title says, I am looking for a dataset or database of songs/melodies/auditory stimuli that can be sorted from the most repetitive to the least repetitive. Looked everywhere but could not find one that suits my needs. Stumbled upon FMA but I am a bit lost in all the programming lingo and I don’t seem to find what I need in there.
Any lead would be appreciated, thanks in advance!
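If no ready-made ranking turns up, one rough proxy for repetitiveness is how well a symbolic version of a melody compresses: the more a sequence repeats itself, the smaller it gets. The sketch below assumes each melody is available as a string of note names, which is a simplification, and the `repetitiveness` function and sample melodies are hypothetical. It is a heuristic only, not a validated psychological measure.

```python
import random
import zlib


def repetitiveness(sequence: str) -> float:
    """Return 1 - compressed size / raw size; higher means more repetitive.

    Very short sequences can score below zero because of compression overhead.
    """
    raw = sequence.encode("utf-8")
    if not raw:
        raise ValueError("sequence must not be empty")
    compressed = zlib.compress(raw, 9)
    return 1 - len(compressed) / len(raw)


# Hypothetical melodies as note-name strings, made up for illustration.
rng = random.Random(0)
melodies = {
    "loop": "C D E C " * 50,
    "varied": " ".join(rng.choice("CDEFGAB") for _ in range(200)),
}

# Sort melody names from most to least repetitive.
ranked = sorted(melodies, key=lambda name: repetitiveness(melodies[name]), reverse=True)
```

Scores are only comparable between sequences of similar length, so it may help to cut every stimulus to the same number of notes first.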

submitted by /u/AlexrooXell

Looking For A Simple Country/territory Dataset With Population And Area

Hi all, subject says it all. I was using the countryinfo package in Python but it is incomplete and outdated, I think. Any format would be fine, REST API form of some kind would be the very best though. Thank you in advance. I apologize if this is common knowledge or asked frequently.

edit: when I say territory, I mean that distinct/separate areas should be separate in the dataset, for instance Puerto Rico should be separate from the 50 states; British territories in the Caribbean should be separate, etc

submitted by /u/Beneficial_Order1050

Looking For A Dataset Of Photoshopped Images Of People

I’m looking for an image dataset of photos of people that have been photoshopped. The quality of the photoshopping is not particularly important, but ideally the dataset would include a range of good and bad photoshopping. Does such a dataset exist? I’m new to this area and to this sub, so apologies in advance if I am somehow breaching expected conduct! Thank you so much

submitted by /u/spring_beauty

Looking For A Dataset For Business Scheduling

I’m currently hitting a roadblock on a project of mine: I can’t seem to find a dataset of business schedules. The precise data I need is the times and dates of business meetings and/or employee tasks.

A potential solution I’m starting to think of falling back on is mocking the data. In that case, I’d love to hear about any constraints I should introduce into the process to make the data at least quasi-realistic (like avoiding weekends and off-work hours, concurrent meetings/tasks, etc.).
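A mock generator with the constraints mentioned above (no weekends, no off-work hours, no concurrent meetings) might be sketched as follows. The 9:00 to 17:00 working day, the half-hour slots, the 30 or 60 minute meeting lengths, and the function name are all assumptions chosen for illustration, and the schedule is for a single person.

```python
import random
from datetime import date, datetime, time, timedelta
from typing import List, Tuple

WORK_START_HOUR = 9   # assumed start of the working day
WORK_END_HOUR = 17    # assumed end of the working day
SLOT_MINUTES = 30     # meetings start on half-hour boundaries


def mock_meetings(start_day: date, days: int, per_day: int, seed: int = 0) -> List[Tuple[datetime, datetime]]:
    """Generate non-overlapping weekday meetings within working hours."""
    rng = random.Random(seed)
    slots_per_day = (WORK_END_HOUR - WORK_START_HOUR) * 60 // SLOT_MINUTES
    meetings = []
    for offset in range(days):
        day = start_day + timedelta(days=offset)
        if day.weekday() >= 5:  # skip Saturday and Sunday
            continue
        free = [True] * slots_per_day
        for _ in range(per_day):
            length = rng.choice([1, 2])  # 30 or 60 minutes
            # Start slots where the whole meeting fits into free time.
            candidates = [
                s for s in range(slots_per_day - length + 1) if all(free[s : s + length])
            ]
            if not candidates:
                break
            first = rng.choice(candidates)
            for s in range(first, first + length):
                free[s] = False
            begin = datetime.combine(day, time(WORK_START_HOUR)) + timedelta(
                minutes=first * SLOT_MINUTES
            )
            meetings.append((begin, begin + timedelta(minutes=length * SLOT_MINUTES)))
    return sorted(meetings)


# Two weeks starting on Monday, 1 January 2024, with up to three meetings a day.
schedule = mock_meetings(date(2024, 1, 1), days=14, per_day=3)
```

Other constraints that tend to make mock schedules look more realistic include a blocked lunch hour, public holidays, and recurring weekly meetings; these would be extra rules layered on the same slot grid.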

Any help would be highly appreciated!

submitted by /u/pyrocitor02

Dataset For Customer Acceptance Rates On Financial Offers For Master’s Thesis

Hi everyone!
I’m working on my master’s thesis and urgently need a dataset on customer acceptance rates for insurance/private loans/mortgage offers. My collaboration with a bank has been delayed, and I’m exploring other options to test my models. I’ve checked Kaggle and Google Dataset Search with no luck.
Does anyone have leads on where I can find real data from banks or insurance companies? Any help or direction would be greatly appreciated!
Thanks!

submitted by /u/Intrepid_Patience_37

Dataset Needed For My Tableau Project

Hey everyone, I have been taking a Data Visualisation & Storytelling course. The curriculum covered data cleaning techniques, presenting data to build a story around it, and using Tableau & PowerBI. Now that we are at the end of the course, we need to choose a dataset and work on it using Tableau. I would say we were taught Tableau at an intermediate level. Please suggest a few datasets that you think would be well suited to working with in Tableau. TIA!

submitted by /u/riteshzd