Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

ISP/TELCO Internet Speed Test And Usage Data

Looking for sources, particularly for the West African market, but happy with other sources too. Not too concerned about granularity (by IP, location, etc.); anything useful works. I know Facebook and Netflix collect some of the data, but I’m not sure how to buy or get it from them. I assume other platforms, such as Google (YouTube), do as well. Thanks in advance!

submitted by /u/cr_re

Images Of Vehicles From An Angle Where All Of Their Wheels Are Visible

Hello everyone.

I’ve been tasked with finding a suitable dataset for our latest project, which involves training a model to recognize the number of wheels on vehicles. Despite extensive research, I haven’t been able to locate a suitable dataset online.
If anyone knows of a dataset or has access to one that could fulfill our requirements, I would greatly appreciate it if you could share the link or provide any assistance in obtaining it.

submitted by /u/Otherwise-Big-5537

Monthly Average Temperatures At Different Latitudes

Hello! I need a dataset that contains monthly average temperatures at different latitudes, going as far back as the 1900s. Where can I find something like this?

Also, I saw monthly temperature anomaly data on NOAA’s Climate at a Glance tool, which are given relative to the 1901-2000 average. However, I cannot seem to find the 1901-2000 average data itself. Do any of you know where I can find it? (https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series)

I really appreciate the help!

submitted by /u/Many_Flatworm_1372

Can Anyone Help Me Find A Historical GPU Price Dataset?

Hello, I am currently working on a university project and need historical price data for graphics cards from Jan 1st 2019 to Mar 1st 2024. I want to compare the prices of Nvidia and AMD cards over those years. I am looking for daily data so I can look at price increases between specific cards. Does anyone know where I can find such datasets, or can anyone point me in the right direction? I have wanted to avoid web scraping up to this point, but won’t dismiss the idea if it comes to it. Thank you!

submitted by /u/DrGoneDirty

How To Create Bins And All Permutations And Combinations To Analyse?

I have 10,000 records with fields such as CashAdvance, Interest Rate, Credit Score and Loan Term, plus a boolean (1/0) for whether the loan defaulted. How do I find all combinations of ranges of these attributes where the default rate was below 10%? For example:

Bin 1: Credit score 652-673, AdvAmt 23-27K, interest rate 12-15%, term 3-7 months: 8% of loans defaulted.
Bin 2: Credit score 625-632, AdvAmt 32-42K, interest rate 2-5%, term 6-9 months: 5% of loans defaulted.
Bin 3: Credit score 682-693, AdvAmt 13-17K, interest rate 2-4%, term 1-2 months: 4% of loans defaulted.
Bin 4: Credit score 692-721, AdvAmt 74-95K, interest rate 15-17%, term 8-10 months: 9% of loans defaulted.

And so on and so forth. My question is: how do I find these low-default ranges for all of the attributes above without creating the bins manually?
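
One way to approach this, as a rough sketch rather than a definitive method: cut each attribute into quantile bins, group the loans by their combination of bins, and keep the combinations whose default rate is below 10%. The field names (`credit_score`, `adv_amt`, `rate`, `term`, `default`), the number of bins, and the minimum group size are all assumptions for illustration. With four attributes, many combinations will contain only a handful of loans, so the minimum count guards against rates that look low purely by chance.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import quantiles

FIELDS = ["credit_score", "adv_amt", "rate", "term"]  # hypothetical column names


def bin_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points that split values into quantile bins."""
    return quantiles(values, n=n_bins)


def low_default_bins(records, n_bins=3, max_rate=0.10, min_count=30):
    """Group records by their combination of quantile bins and return the
    combinations whose observed default rate is below max_rate.

    records: list of dicts with the FIELDS keys plus "default" (1 or 0).
    """
    edges = {f: bin_edges([r[f] for r in records], n_bins) for f in FIELDS}

    counts = defaultdict(lambda: [0, 0])  # bin combination -> [loans, defaults]
    for r in records:
        # Bin index = number of cut points that are <= the value.
        key = tuple(bisect_right(edges[f], r[f]) for f in FIELDS)
        counts[key][0] += 1
        counts[key][1] += r["default"]

    results = []
    for key, (n, d) in counts.items():
        if n >= min_count and d / n < max_rate:
            results.append(
                {"bins": dict(zip(FIELDS, key)), "loans": n, "default_rate": d / n}
            )
    # Lowest default rate first; edges map bin indices back to value ranges.
    return sorted(results, key=lambda row: row["default_rate"]), edges
```

The returned `edges` translate each bin index back into a range: bin 0 lies below the first cut point, bin 1 between the first and second, and so on. A decision tree fitted on the same fields would be another way to let the data choose the cut points.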

submitted by /u/southbeacher

80 Million Tiny Images Dataset Image Decoding Problem

I can’t get the dataset to display correctly. I’ve tried to convert the MATLAB script into a Python script, but this is the result:

https://drive.google.com/file/d/1kzA7mNC4th8nbJh4iGoaZJB_xV4HO7r_/view?usp=sharing

and this is the adapted script:

    import numpy as np
    import os
    import matplotlib.pyplot as plt


    def load_tiny_images(ndx, filename=None):
        if filename is None:
            filename = 'Z:/Tiny_Images_Dataset/data/tiny_images.bin'
            # filename = 'C:/atb/Databases/Tiny Images/tiny_images.bin'

        sx = 32  # side size
        Nimages = len(ndx)
        nbytes_per_image = sx * sx * 3
        img = np.zeros((sx * sx * 3, Nimages), dtype=np.uint8)
        pointer = (np.array(ndx) - 1) * nbytes_per_image
        # read data
        with open(filename, 'rb') as f:
            for i in range(Nimages):
                f.seek(pointer[i])  # moves the pointer to the beginning of the image
                img[:, i] = np.frombuffer(f.read(nbytes_per_image), dtype=np.uint8)
        img = img.reshape((sx, sx, 3, Nimages))
        return img


    def show_images(images):
        N = images.shape[3]
        fig, axes = plt.subplots(1, N, figsize=(N, 1))
        if N == 1:
            axes = [axes]
        for i, ax in enumerate(axes):
            ax.imshow(images[:, :, :, i])
            ax.axis('off')
        plt.show()


    # load the first 10/79302017 imgs
    img = load_tiny_images(list(range(1, 11)))
    show_images(img)

What am I missing? Is anyone able to open it correctly with Python?

Just for completeness, this is the original MATLAB code (I’m a total zero in MATLAB):


    function img = loadTinyImages(ndx, filename)
    %
    % Random access into the file of tiny images.
    %
    % It goes faster if ndx is a sorted list
    %
    % Input:
    %   ndx = vector of indices
    %   filename = full path and filename
    % Output:
    %   img = tiny images [32x32x3xlength(ndx)]

    if nargin == 1
        filename = 'Z:\Tiny_Images_Dataset\data\tiny_images.bin';
        % filename = 'C:\atb\Databases\Tiny Images\tiny_images.bin';
    end

    % Images
    sx = 32;
    Nimages = length(ndx);
    nbytesPerImage = sx*sx*3;
    img = zeros([sx*sx*3 Nimages], 'uint8');

    % Pointer
    pointer = (ndx-1)*nbytesPerImage;
    offset = pointer;
    offset(2:end) = offset(2:end)-offset(1:end-1)-nbytesPerImage;

    % Read data
    [fid, message] = fopen(filename, 'r');
    if fid == -1
        error(message);
    end
    frewind(fid)
    for i = 1:Nimages
        fseek(fid, offset(i), 'cof');
        tmp = fread(fid, nbytesPerImage, 'uint8');
        img(:,i) = tmp;
    end
    fclose(fid);

    img = reshape(img, [sx sx 3 Nimages]);

    % load in first 10 images from 79,302,017 images
    img = loadTinyImages([1:10]);

Needless to say, nothing works in MATLAB: it gives me a path error that I have no idea how to resolve, and it shows no image. I can’t learn MATLAB right now, so I’d like to read this huge bin file with Python. Am I that foolish?

Thanks a lot in advance for any help, and sorry about my English.
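
A possible explanation for the scrambled output, offered as a hedged sketch: MATLAB’s `reshape` fills arrays in column-major order, whereas NumPy’s `reshape` defaults to row-major order, so the same 3072 bytes land on different pixels. In the adapted script, passing `order='F'` to `img.reshape(...)` should reproduce MATLAB’s layout. The pure-Python functions below spell out the same index arithmetic explicitly. `decode_tiny_image` and `read_tiny_image` are hypothetical helper names, and whether the result still needs a transpose to appear upright is not confirmed by the post.

```python
SIDE = 32
NBYTES = SIDE * SIDE * 3  # 3072 bytes per image, as in the MATLAB code


def decode_tiny_image(buf, side=SIDE):
    """Decode one image stored in MATLAB (column-major) order.

    Byte k of the buffer belongs to row k % side, column (k // side) % side,
    channel k // (side * side). Returns rows[r][c] = (R, G, B).
    """
    plane = side * side
    if len(buf) != 3 * plane:
        raise ValueError(f"expected {3 * plane} bytes, got {len(buf)}")
    return [
        [
            tuple(buf[r + side * c + plane * ch] for ch in range(3))
            for c in range(side)
        ]
        for r in range(side)
    ]


def read_tiny_image(path, index):
    """Read the image with 1-based index (as in loadTinyImages) from the .bin file."""
    with open(path, "rb") as f:
        f.seek((index - 1) * NBYTES)  # jump to the first byte of this image
        return decode_tiny_image(f.read(NBYTES))
```

The nested list returned by `read_tiny_image(path, 1)` is array-like, so it should be usable with `plt.imshow` in the same way as the NumPy array in the adapted script.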

submitted by /u/AstroGippi

PISA Data Set Results Score, Not Found.

Hello everyone.

I want to write my master’s thesis on different ethnicities and their scores on the PISA test. In the SPSS data set file from 2022 I can’t seem to find the test results, which makes doing regression analysis a bit hard. Does anybody know where I can find them?

submitted by /u/raceb4

Medical Dataset For Health Information Like Blood Pressure And Prescription Of Medicine. NO PII Needed

I’m a CS student aiming to use health information for ML purposes. I’ve found that MIMIC contains the information I want, and I wonder whether any other data sets contain it too. I only need health information such as blood pressure, weight or other parameters, together with the prescribed medicine and dosage. No personally identifiable information is needed.
Much appreciated!

submitted by /u/Humble_Dark_2107

Monthly Movie Aggregator: Sources For Reviews, Ratings, And Descriptions?

I’m a big movie person and my local theaters usually put out a schedule for the month. I’d like to compile this schedule and build an aggregator newsletter tailored to my interests.

I’m seeking sources for information on a movie’s reviews, ratings, and descriptions. Letterboxd is def a source I want to use, but it seems their API is not public. Is scraping the only way? That wouldn’t be too bad. Has anyone seen potential sources to use for a project similar to this?

submitted by /u/raz_the_kid0901

What Are Your Data Enhancement Dream Licenses

Hi there, I work for a large national non-profit as a data analyst for our fundraising campaign. I’ve been asked to provide a “dream budget” for licensing third-party data. B2B is the main focus, but understanding consumer behaviors with a place-based focus is very useful as well. Wealth, income, employment, philanthropic giving and executive networks are all of interest. I’ve always wanted full access to things like Experian and Dun & Bradstreet, but are there other sets, lists or databases that I should consider?

submitted by /u/xiancaldwell

How Do I Create A Database Of Restaurant Menus?

I’m currently trying to compile a database of the foodstuff restaurants offer, with my main focus being Melbourne – something of the form [restaurant, location, menuObject], where menuObject is an object containing the items on the menu. I have identified restaurants and extracted metadata using the Google Maps API.
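
As a small illustration of the [restaurant, location, menuObject] shape described above, the record could be modeled along these lines. This is only a sketch: the class and field names (`MenuItem`, `Restaurant`, `price_aud`, `place_id`) and the sample values are assumptions, not part of any existing schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MenuItem:
    name: str
    price_aud: Optional[float] = None  # price may be missing on scraped menus
    section: Optional[str] = None      # e.g. "Mains", "Desserts"


@dataclass
class Restaurant:
    name: str
    location: str                      # address or "lat,lng" string
    place_id: Optional[str] = None     # hypothetical key linking back to the Maps metadata
    menu: List[MenuItem] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record, nested menu included, for storage."""
        return json.dumps(asdict(self), sort_keys=True)


# Made-up example record
example = Restaurant(
    name="Example Cafe",
    location="Melbourne VIC",
    menu=[MenuItem("Flat white", 5.0, "Drinks"), MenuItem("Avocado toast")],
)
```

Keeping price and section optional lets partially extracted menus be stored as they are and completed later.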

Any ideas for compiling the menu part? I do need fairly good coverage for my study.

submitted by /u/Wackome

Calling All Data Wizards: Help Us Craft The Ultimate Amazon Seller Dataset!

Hey everyone!

Our organization is gearing up to create some awesome business intelligence solutions tailored specifically for Amazon sellers. We’re currently in the process of putting together a demo architecture, complete with a database and dashboard.

I’ve been assigned the task of sourcing a dataset containing information on Amazon sellers, with a primary focus on orders, returns, and product reviews.

I’ve already taken a look on Kaggle, but unfortunately, I’ve only managed to find datasets related to reviews.

Does anyone happen to have a sample dataset they could share, or perhaps some ideas on where else I might be able to find the data I need? Any help would be greatly appreciated!

submitted by /u/Fun_Signature_9812

Easy Dataset For General Linear Modeling?

I’m a senior stats major and am utterly burnt out, but my professor wants us to find an interesting dataset that we can apply a GLM to, which I just can’t fathom doing. If anyone knows an easy dataset that would work, you would be a lifesaver :) Extra brownie points if it’s music related, because I might actually have some fun working with it lol

submitted by /u/makurroon_

Need 2 Datasets: One For Studying The Tradeoff Between Data Utility And Data Privacy, One To Study Investment In Clean Technology

Hello, I am writing a thesis (I am a student at CEMFI, Madrid.)

I have 2 projects to do:

Project 1: Use text data and do something fancy. I would like to study the tradeoff between data privacy and utility, but I have not found any useful datasets.

Project 2:
I am writing a macroeconomic model about the optimal transitional dynamics towards more sustainable energy production. I am looking for a dataset with granular data where I could exploit some variation over the years in some interesting measures in order to calibrate my model.

I’d greatly appreciate any leads or suggestions on where to find relevant datasets for these projects. Thank you!

submitted by /u/Inevitable_Counter94