Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need them for? World domination?

Can Anyone Help Me Find A Historical GPU Price Dataset?

Hello, I am currently working on a university project and need historical price data for graphics cards from Jan 1st 2019 to Mar 1st 2024. I want to compare the prices of Nvidia and AMD cards over those years, and I am looking for day-to-day data so I can look at price increases between specific cards. Does anyone know where I can find such datasets, or can anyone point me in the right direction? I have wanted to avoid web scraping up to this point, but won’t dismiss the idea if it comes to it. Thank you!

submitted by /u/DrGoneDirty
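Once a day-by-day price history is found (or scraped), the comparison described in this post comes down to computing each card's percentage change between two dates. A minimal sketch, assuming the data is a list of (date, card, price) rows; the card names and prices below are made up for illustration, not real market data:

```python
from datetime import date


def percent_change(rows, card, start, end):
    """Percentage price change of one card between two dates.

    rows: iterable of (date, card_name, price) tuples, one per card per day.
    Returns None if the card has no price recorded on either date.
    """
    prices = {d: p for d, c, p in rows if c == card}
    if start not in prices or end not in prices:
        return None
    return 100.0 * (prices[end] - prices[start]) / prices[start]


def compare_cards(rows, cards, start, end):
    """Map each card name to its percentage change over [start, end]."""
    return {card: percent_change(rows, card, start, end) for card in cards}


# Hypothetical example rows (not real prices)
rows = [
    (date(2019, 1, 1), "Card A", 400.0),
    (date(2024, 3, 1), "Card A", 500.0),
    (date(2019, 1, 1), "Card B", 300.0),
    (date(2024, 3, 1), "Card B", 270.0),
]
print(compare_cards(rows, ["Card A", "Card B"], date(2019, 1, 1), date(2024, 3, 1)))
```

With daily rows, the same function works for any pair of dates, so it can be looped over consecutive days to get day-to-day changes.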

How To Create Bins And All Permutations And Combinations To Analyse?

If I have 10,000 records with fields like CashAdvance, Interest Rate, Credit Score and Loan Term, plus whether the loan defaulted or not (boolean 1/0), how do I find all permutations and combinations of ranges of these attributes where the default rate is below 10%? For example:

Bin 1: Credit score 652-673, AdvAmt 23-27K, interest rate 12-15%, term 3-7 months: 8% of loans defaulted.
Bin 2: Credit score 625-632, AdvAmt 32-42K, interest rate 2-5%, term 6-9 months: 5% defaulted.
Bin 3: Credit score 682-693, AdvAmt 13-17K, interest rate 2-4%, term 1-2 months: 4% defaulted.
Bin 4: Credit score 692-721, AdvAmt 74-95K, interest rate 15-17%, term 8-10 months: 9% defaulted.

And so on. My question is: how do I find these low-default ranges for all of the attributes above without creating them manually?

submitted by /u/southbeacher
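One way to avoid hand-picking ranges is to cut each attribute into quantile bins automatically, then compute the default rate of every combination of bins across every subset of attributes and keep the cells under 10%. A minimal standard-library sketch, assuming each loan is a dict with numeric fields and a 0/1 `default` field; the field names, `n_bins`, and `min_count` (which skips tiny, noisy cells) are illustrative choices, not from the post:

```python
import bisect
import itertools
from collections import defaultdict


def quantile_edges(values, n_bins):
    """Cut points that split the values into roughly equal-count bins."""
    s = sorted(values)
    return [s[len(s) * k // n_bins] for k in range(1, n_bins)]


def bin_range(b, edges):
    """Half-open range [lo, hi) covered by bin index b."""
    lo = edges[b - 1] if b > 0 else float("-inf")
    hi = edges[b] if b < len(edges) else float("inf")
    return (lo, hi)


def low_default_segments(records, attrs, target="default",
                         n_bins=4, max_rate=0.10, min_count=30):
    """Return (ranges, n_loans, default_rate) for every bin combination,
    over every subset of attrs, whose default rate is below max_rate."""
    edges = {a: quantile_edges([r[a] for r in records], n_bins) for a in attrs}
    results = []
    for k in range(1, len(attrs) + 1):
        for combo in itertools.combinations(attrs, k):
            cells = defaultdict(lambda: [0, 0])  # bin key -> [loans, defaults]
            for r in records:
                # bisect_right gives the bin index: edges[b-1] <= x < edges[b]
                key = tuple(bisect.bisect_right(edges[a], r[a]) for a in combo)
                cells[key][0] += 1
                cells[key][1] += r[target]
            for key, (n, d) in cells.items():
                rate = d / n
                if n >= min_count and rate < max_rate:
                    ranges = {a: bin_range(b, edges[a]) for a, b in zip(combo, key)}
                    results.append((ranges, n, rate))
    return sorted(results, key=lambda t: t[2])


# Usage with hypothetical field names:
# segments = low_default_segments(
#     loans, ["CreditScore", "CashAdvance", "InterestRate", "LoanTerm"])
# for ranges, n, rate in segments[:10]:
#     print(ranges, n, f"{rate:.1%}")
```

The number of cells grows as `n_bins ** k` for each subset of `k` attributes, which is still small for four attributes and a handful of bins; `min_count` keeps a cell with, say, three loans and zero defaults from dominating the list.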

80 Million Tiny Images Dataset Image Decoding Problem

I can’t get the dataset to display correctly. I’ve tried converting the MATLAB script into a Python script, but this is the result:

https://drive.google.com/file/d/1kzA7mNC4th8nbJh4iGoaZJB_xV4HO7r_/view?usp=sharing

And this is the adapted script:

import matplotlib.pyplot as plt
import numpy as np


def load_tiny_images(ndx, filename=None):
    # Random access into the tiny images file; ndx holds 1-based indices
    if filename is None:
        filename = 'Z:/Tiny_Images_Dataset/data/tiny_images.bin'
        # filename = 'C:/atb/Databases/Tiny Images/tiny_images.bin'

    sx = 32  # side size
    Nimages = len(ndx)
    nbytes_per_image = sx * sx * 3
    img = np.zeros((sx * sx * 3, Nimages), dtype=np.uint8)
    pointer = (np.array(ndx) - 1) * nbytes_per_image

    # read data
    with open(filename, 'rb') as f:
        for i in range(Nimages):
            f.seek(int(pointer[i]))  # moves the pointer to the beginning of the image
            img[:, i] = np.frombuffer(f.read(nbytes_per_image), dtype=np.uint8)

    # MATLAB arrays are column-major, so reshape in Fortran order to match
    img = img.reshape((sx, sx, 3, Nimages), order='F')
    return img


def show_images(images):
    N = images.shape[3]
    fig, axes = plt.subplots(1, N, figsize=(N, 1))
    if N == 1:
        axes = [axes]
    for i, ax in enumerate(axes):
        ax.imshow(images[:, :, :, i])
        ax.axis('off')
    plt.show()


# load the first 10 of 79,302,017 images
img = load_tiny_images(list(range(1, 11)))
show_images(img)

What am I missing? Is anyone able to open it correctly with Python?

Just for completeness, this is the original MATLAB code (I know next to nothing about MATLAB):

function img = loadTinyImages(ndx, filename)
%
% Random access into the file of tiny images.
%
% It goes faster if ndx is a sorted list
%
% Input:
%   ndx = vector of indices
%   filename = full path and filename
% Output:
%   img = tiny images [32x32x3xlength(ndx)]

if nargin == 1
    filename = 'Z:\Tiny_Images_Dataset\data\tiny_images.bin';
    % filename = 'C:\atb\Databases\Tiny Images\tiny_images.bin';
end

% Images
sx = 32;
Nimages = length(ndx);
nbytesPerImage = sx*sx*3;
img = zeros([sx*sx*3 Nimages], 'uint8');

% Pointer
pointer = (ndx-1)*nbytesPerImage;
offset = pointer;
offset(2:end) = offset(2:end)-offset(1:end-1)-nbytesPerImage;

% Read data
[fid, message] = fopen(filename, 'r');
if fid == -1
    error(message);
end
frewind(fid)
for i = 1:Nimages
    fseek(fid, offset(i), 'cof');
    tmp = fread(fid, nbytesPerImage, 'uint8');
    img(:,i) = tmp;
end
fclose(fid);

img = reshape(img, [sx sx 3 Nimages]);

% load in first 10 images from 79,302,017 images
img = loadTinyImages([1:10]);

Needless to say, nothing works in MATLAB either: it gives me a path error I have no idea how to resolve and shows no images. I can’t learn MATLAB right now, so I’d like to read this huge .bin file with Python. Am I being that foolish?

Thanks a lot in advance for any help, and sorry about my English.

submitted by /u/AstroGippi
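For a file this size, a NumPy memory map avoids reading it into RAM and gives array-style random access. A minimal sketch, assuming the layout implied by the MATLAB loader above (3,072 consecutive uint8 bytes per 32x32x3 image, no header) and that each image's bytes follow MATLAB's column-major order, hence `order="F"` in the reshape. `open_tiny_images` and `get_image` are hypothetical helper names, and indices here are 0-based, unlike MATLAB's 1-based `ndx`:

```python
import numpy as np

SIDE = 32
BYTES_PER_IMAGE = SIDE * SIDE * 3  # 3072 uint8 values per image


def open_tiny_images(path):
    """Memory-map the .bin file as an (N, 3072) uint8 array without loading it."""
    flat = np.memmap(path, dtype=np.uint8, mode="r")
    return flat.reshape(-1, BYTES_PER_IMAGE)


def get_image(images, idx):
    """Return image idx (0-based) as a (32, 32, 3) array ready for imshow.

    The bytes are stored in MATLAB's column-major order, so reshape with order="F".
    """
    return np.asarray(images[idx]).reshape(SIDE, SIDE, 3, order="F")


# Usage (path taken from the post; adjust to your copy of the file):
# images = open_tiny_images("Z:/Tiny_Images_Dataset/data/tiny_images.bin")
# import matplotlib.pyplot as plt
# plt.imshow(get_image(images, 0))  # first image, MATLAB index 1
# plt.show()
```

Only the pages that are actually indexed get read from disk, so pulling a handful of images out of the full file stays fast.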

PISA Data Set Result Scores Not Found

Hello everyone.

I want to write my master’s thesis on different ethnicities and their scores on the PISA test. In the 2022 SPSS data file I can’t seem to find the test results, which makes regression analysis a bit hard. Does anybody know where I can find them?

submitted by /u/raceb4
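A likely reason the results seem to be missing: PISA does not store a single test score per student. Performance is reported as plausible values, columns such as PV1MATH to PV10MATH (and the READ and SCIE equivalents) in the student file, and the OECD's analysis guidance is to compute a statistic once per plausible value, using the final student weight W_FSTUWT, and then average the estimates. A minimal standard-library sketch of that averaging for weighted group means, assuming rows are dicts keyed by those column names plus a hypothetical grouping column; proper standard errors also need the replicate weights, which this sketch omits:

```python
from collections import defaultdict

N_PV = 10  # PISA 2022 reports ten plausible values per domain


def weighted_group_means(rows, group_col, domain="MATH", weight_col="W_FSTUWT"):
    """Average, over plausible values, of the weighted mean score per group.

    rows: iterable of dicts with keys 'PV1<domain>'..'PV10<domain>',
    weight_col and group_col (group_col is whatever variable you group by).
    """
    rows = list(rows)
    per_pv = []
    for k in range(1, N_PV + 1):
        col = f"PV{k}{domain}"
        sums = defaultdict(float)
        weights = defaultdict(float)
        for r in rows:
            g, w = r[group_col], r[weight_col]
            sums[g] += w * r[col]
            weights[g] += w
        per_pv.append({g: sums[g] / weights[g] for g in sums})
    # Final estimate: mean of the ten per-plausible-value estimates
    groups = per_pv[0].keys()
    return {g: sum(est[g] for est in per_pv) / N_PV for g in groups}
```

A regression follows the same pattern: fit the model ten times, once per PV column as the outcome, and average the coefficients.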

Medical Dataset With Health Information Like Blood Pressure And Medication Prescriptions (No PII Needed)

I’m a CS student aiming to use health information for machine learning. I’ve found that MIMIC contains the information I want, and I wonder whether any other datasets contain it too (e.g., blood pressure together with dosage/prescription). I only need health measurements such as blood pressure, weight, or other parameters, plus medication prescriptions and dosages. No personally identifiable information is needed.
Much appreciated!

submitted by /u/Humble_Dark_2107

Monthly Movie Aggregator: Sources For Reviews, Ratings, And Descriptions?

I’m a big movie person, and my local theaters usually put out a schedule for the month. I’d like to compile this schedule and build an aggregator newsletter tailored to my interests.

I’m seeking sources for information on a movie’s reviews, ratings, and descriptions. Letterboxd is definitely a source I want to use, but their API does not seem to be public. Is scraping the only way? That isn’t too bad. Has anyone seen potential sources to use for a project similar to this?

submitted by /u/raz_the_kid0901

What Are Your Data Enhancement Dream Licenses?

Hi there, I work for a large national non-profit as a data analyst for our fundraising campaign. I’ve been asked to provide a “dream budget” for licensing third-party data. B2B is the main focus, but understanding consumer behaviors with a place-based focus is very useful as well. Wealth, income, employment, philanthropic giving, and executive networks are all of interest. I’ve always wanted full access to things like Experian and Dun & Bradstreet, but are there other datasets, lists, or databases I should consider?

submitted by /u/xiancaldwell

How Do I Create A Database Of Restaurant Menus?

I’m currently trying to compile a database of the food items restaurants offer, with my main focus being Melbourne. I want records of the form [restaurant, location, menuObject], where menuObject is an object containing the items on the menu. I have already identified restaurants and extracted their metadata using the Google Maps API.

Any ideas for compiling the menu part? I do need fairly good coverage for my study.

submitted by /u/Wackome
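The [restaurant, location, menuObject] record described in this post maps naturally onto a couple of small types. A minimal sketch, assuming hypothetical menu fields (item name, price in AUD, optional section) and plain JSON for storage; the `place_id` field is only a suggested join key back to the Google Maps metadata already collected, not something the post specifies:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MenuItem:
    name: str
    price_aud: Optional[float] = None  # None when the menu lists no price
    section: Optional[str] = None  # e.g. "Mains", "Drinks"


@dataclass
class Restaurant:
    name: str
    place_id: str  # assumed key back to the Google Maps metadata
    suburb: str
    lat: float
    lng: float
    menu: List[MenuItem] = field(default_factory=list)


def to_json(restaurants: List[Restaurant]) -> str:
    """Serialise a list of restaurants, menus included, to a JSON string."""
    return json.dumps([asdict(r) for r in restaurants], indent=2)


def from_json(text: str) -> List[Restaurant]:
    """Rebuild Restaurant objects (with their MenuItem lists) from to_json output."""
    out = []
    for d in json.loads(text):
        items = [MenuItem(**m) for m in d.pop("menu")]
        out.append(Restaurant(**d, menu=items))
    return out
```

Keeping menu items as their own records, rather than a free-text blob, makes it easier to compare coverage and prices across restaurants later.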

Calling All Data Wizards: Help Us Craft The Ultimate Amazon Seller Dataset!

Hey everyone!

Our organization is gearing up to create some awesome business intelligence solutions tailored specifically for Amazon sellers. We’re currently in the process of putting together a demo architecture, complete with a database and dashboard.

I’ve been assigned the task of sourcing a dataset containing information on Amazon sellers, with a primary focus on orders, returns, and product reviews.

I’ve already taken a look at Kaggle, but unfortunately I’ve only managed to find datasets related to reviews.

Does anyone happen to have a sample dataset they could share, or perhaps some ideas on where else I might be able to find the data I need? Any help would be greatly appreciated!

submitted by /u/Fun_Signature_9812

Easy Dataset For General Linear Modeling?

I’m a senior stats major and so utterly burnt out, but my professor wants us to find an interesting dataset to which we can apply a GLM, which I just can’t fathom doing. If anyone knows an easy dataset that would work, you would be a lifesaver :) Extra brownie points if it’s music related, because I might actually have some fun working with it lol

submitted by /u/makurroon_

Need 2 Datasets: One For Studying The Tradeoff Between Data Utility And Data Privacy, One To Study Investment In Clean Technology

Hello, I am writing a thesis (I am a student at CEMFI in Madrid).

I have 2 projects to do:

Project 1: Use text data and do something fancy. I would like to study the tradeoff between data privacy and utility, but I have not found any useful datasets.

Project 2:
I am writing a macroeconomic model about the optimal transitional dynamics towards more sustainable energy production. I am looking for a dataset with granular data where I could exploit some variation over the years in some interesting measures in order to calibrate my model.

I’d greatly appreciate any leads or suggestions on where to find relevant datasets for these projects. Thank you!

submitted by /u/Inevitable_Counter94

Is There A Good Up-to-date Rotten Tomatoes Dataset?

I’m looking for a Rotten Tomatoes dataset that has user reviews, critic reviews, and movies (it doesn’t necessarily need metadata, but that would be preferred) for a recommendation system I’m trying to build. Are there any good datasets that would work for this, or would I need to attempt to scrape it myself (I have zero experience with web scraping)?

submitted by /u/RealHellcharm