Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

80 Million Tiny Images Dataset Image Decoding Problem

I can’t get the dataset to display correctly. I’ve tried converting the MATLAB script into a Python script, but this is the result:

https://drive.google.com/file/d/1kzA7mNC4th8nbJh4iGoaZJB_xV4HO7r_/view?usp=sharing

and this is the adapted script:

import numpy as np
import os
import matplotlib.pyplot as plt


def load_tiny_images(ndx, filename=None):
    if filename is None:
        filename = 'Z:/Tiny_Images_Dataset/data/tiny_images.bin'
        # filename = 'C:/atb/Databases/Tiny Images/tiny_images.bin'

    sx = 32  # side size
    Nimages = len(ndx)
    nbytes_per_image = sx * sx * 3
    img = np.zeros((sx * sx * 3, Nimages), dtype=np.uint8)
    pointer = (np.array(ndx) - 1) * nbytes_per_image

    # read data
    with open(filename, 'rb') as f:
        for i in range(Nimages):
            f.seek(pointer[i])  # moves the pointer to the beginning of the image
            img[:, i] = np.frombuffer(f.read(nbytes_per_image), dtype=np.uint8)

    img = img.reshape((sx, sx, 3, Nimages))
    return img


def show_images(images):
    N = images.shape[3]
    fig, axes = plt.subplots(1, N, figsize=(N, 1))
    if N == 1:
        axes = [axes]
    for i, ax in enumerate(axes):
        ax.imshow(images[:, :, :, i])
        ax.axis('off')
    plt.show()


# load the first 10/79302017 imgs
img = load_tiny_images(list(range(1, 11)))
show_images(img)

What am I missing? Is anyone able to open it correctly with Python?

Just for completeness, this is the original MATLAB code (I’m a total zero in MATLAB):

function img = loadTinyImages(ndx, filename)
%
% Random access into the file of tiny images.
%
% It goes faster if ndx is a sorted list
%
% Input:
%   ndx = vector of indices
%   filename = full path and filename
% Output:
%   img = tiny images [32x32x3xlength(ndx)]

if nargin == 1
    filename = 'Z:\Tiny_Images_Dataset\data\tiny_images.bin';
    % filename = 'C:\atb\Databases\Tiny Images\tiny_images.bin';
end

% Images
sx = 32;
Nimages = length(ndx);
nbytesPerImage = sx*sx*3;
img = zeros([sx*sx*3 Nimages], 'uint8');

% Pointer
pointer = (ndx-1)*nbytesPerImage;
offset = pointer;
offset(2:end) = offset(2:end)-offset(1:end-1)-nbytesPerImage;

% Read data
[fid, message] = fopen(filename, 'r');
if fid == -1
    error(message);
end
frewind(fid)
for i = 1:Nimages
    fseek(fid, offset(i), 'cof');
    tmp = fread(fid, nbytesPerImage, 'uint8');
    img(:,i) = tmp;
end
fclose(fid);

img = reshape(img, [sx sx 3 Nimages]);

% load in first 10 images from 79,302,017 images
img = loadTinyImages([1:10]);

Needless to say, nothing works in MATLAB: it gives me a path error I have no idea how to resolve, and it shows no image. I can’t learn MATLAB now, so I’d like to read this huge bin file with Python. Am I that much of a fool?

Thanks a lot in advance for any help, and sorry about my English.

submitted by /u/AstroGippi
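
A likely cause, judging only from the two scripts in the post: MATLAB's reshape fills arrays column by column (column-major), while NumPy's reshape defaults to row-major, so the same bytes end up at different pixel positions. Below is a minimal sketch of a loader that reshapes each image with order="F" to mimic the MATLAB layout. It assumes the file really is a flat sequence of 32x32x3 uint8 images, as the MATLAB script implies; the helper name decode_tiny_image is made up for this example, and the sketch has not been run against the real file.

```python
import numpy as np

SIDE = 32
BYTES_PER_IMAGE = SIDE * SIDE * 3  # 3072 bytes per RGB image


def decode_tiny_image(raw):
    """Turn 3072 raw bytes into a (32, 32, 3) uint8 image."""
    flat = np.frombuffer(raw, dtype=np.uint8)
    # order="F" reproduces MATLAB's column-major reshape; NumPy's default
    # order="C" fills the last axis first and scrambles the pixels.
    return flat.reshape((SIDE, SIDE, 3), order="F")


def load_tiny_images(indices, filename):
    """Read the images at the given 1-based indices, as in the MATLAB script."""
    images = []
    with open(filename, "rb") as f:
        for idx in indices:
            f.seek((idx - 1) * BYTES_PER_IMAGE)  # jump to the start of image idx
            images.append(decode_tiny_image(f.read(BYTES_PER_IMAGE)))
    return np.stack(images, axis=-1)  # shape (32, 32, 3, N)
```

The returned array has the same (32, 32, 3, N) shape as in the original script, so the show_images function from the post can be reused unchanged.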

PISA Data Set Results Score, Not Found.

Hello everyone.

I want to write my master’s thesis about different ethnicities and their scores on the PISA test. In the SPSS data set file from 2022 I can’t seem to find the test results, which makes regression analysis a bit hard. Does anybody know where I can find them?
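
One thing worth checking: as far as I know, PISA does not store a single score per student. Achievement is recorded as several "plausible values" per domain, in columns named like PV1MATH, PV2MATH, PV1READ or PV1SCIE in the student file. The sketch below only filters a list of column names for that pattern. The regular expression is an assumption about the naming, and the column list is a stand-in for the real file's header.

```python
import re

# Assumed naming pattern for plausible-value columns, e.g. PV1MATH, PV10READ.
PV_PATTERN = re.compile(r"^PV(\d+)(MATH|READ|SCIE)$")


def plausible_value_columns(column_names):
    """Group plausible-value column names by domain, ordered by PV number."""
    found = {}
    for name in column_names:
        match = PV_PATTERN.match(name.upper())
        if match:
            found.setdefault(match.group(2), []).append((int(match.group(1)), name))
    return {domain: [n for _, n in sorted(cols)] for domain, cols in found.items()}


# Stand-in column list; replace it with the header of the real student file.
columns = ["CNT", "ST004D01T", "PV2MATH", "PV1MATH", "PV1READ", "W_FSTUWT"]
print(plausible_value_columns(columns))
```

If columns like these turn up, a regression would normally be run once per plausible value and the estimates combined, rather than treating one of them as "the" score.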

submitted by /u/raceb4

Medical Dataset For Health Information Like Blood Pressure And Prescription Of Medicine. No PII Needed

I’m a CS student aiming to use health information for ML purposes. I’ve found that MIMIC contains the information I want, and I wonder if there are any other data sets that contain it. I only need health information such as blood pressure, weight or other parameters, together with the medicine prescribed and its dosage. No personally identifiable information is needed.
Much appreciated!

submitted by /u/Humble_Dark_2107

Monthly Movie Aggregator: Sources For Reviews, Ratings, And Descriptions?

I’m a big movie person and my local theaters usually put out a schedule for the month. I’d like to compile this schedule and build an aggregator newsletter tailored to my interests.

I’m seeking sources for information on a movie’s reviews, ratings, and descriptions. Letterboxd is def a source I want to use, but it seems their API is not public. Is scraping the only way? That wouldn’t be so bad. Has anyone seen other potential sources for a project like this?

submitted by /u/raz_the_kid0901

What Are Your Data Enhancement Dream Licenses

Hi there, I work for a large national non-profit as a data analyst for our fundraising campaign. I’ve been asked to provide a “dream budget” for licensing third-party data. B2B is the main focus, but understanding consumer behaviors with a place-based focus is very useful as well. Wealth, income, employment, philanthropic giving, and executive networks are all of interest. I’ve always wanted full access to things like Experian and Dun & Bradstreet, but are there other sets, lists, or databases that I should consider?

submitted by /u/xiancaldwell

How Do I Create A Database Of Restaurant Menus?

I’m currently trying to compile a database of the foodstuff restaurants offer, with my main focus being Melbourne – something of the form [restaurant, location, menuObject], where menuObject is an object containing the items on the menu. I have identified restaurants and extracted metadata using the Google Maps API.
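
For illustration, the [restaurant, location, menuObject] record described above could be sketched in Python as follows. The class names and the price field are hypothetical and not part of the original description; the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MenuItem:
    name: str
    price: Optional[float] = None  # hypothetical field; not every menu lists prices


@dataclass
class RestaurantRecord:
    restaurant: str
    location: str
    menu: List[MenuItem] = field(default_factory=list)  # the "menuObject"


# Invented sample record, only to show the shape.
record = RestaurantRecord("Example Cafe", "Melbourne VIC")
record.menu.append(MenuItem("Flat white", 4.5))
```

Keeping the menu as a list of small item objects makes it easy to add more fields later without changing the restaurant-level metadata.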

Any ideas for compiling the menu part? I do need fairly good coverage for my study.

submitted by /u/Wackome

Calling All Data Wizards: Help Us Craft The Ultimate Amazon Seller Dataset!

Hey everyone!

Our organization is gearing up to create some awesome business intelligence solutions tailored specifically for Amazon sellers. We’re currently in the process of putting together a demo architecture, complete with a database and dashboard.

I’ve been assigned the task of sourcing a dataset containing information on Amazon sellers, with a primary focus on orders, returns, and product reviews.

I’ve already taken a look on Kaggle, but unfortunately, I’ve only managed to find datasets related to reviews.

Does anyone happen to have a sample dataset they could share, or perhaps some ideas on where else I might be able to find the data I need? Any help would be greatly appreciated!

submitted by /u/Fun_Signature_9812

Easy Dataset For General Linear Modeling?

I’m a senior stats major and am utterly burnt out, but my professor wants us to find an interesting dataset that we can apply a GLM to, which I just can’t fathom doing. If anyone knows an easy dataset that would work, you would be a lifesaver :) Extra brownie points if it’s music related, because I might actually have some fun working with it lol

submitted by /u/makurroon_

Need 2 Datasets: One For Studying The Tradeoff Between Data Utility And Data Privacy, One To Study Investment In Clean Technology

Hello, I am writing a thesis (I am a student at CEMFI, Madrid.)

I have 2 projects to do:

Project 1: Use text data and do something fancy. I would like to study the tradeoff between data privacy and utility, but I have not found any useful datasets.

Project 2: I am writing a macroeconomic model about the optimal transitional dynamics towards more sustainable energy production. I am looking for a dataset with granular data where I could exploit some variation over the years in some interesting measures in order to calibrate my model.

I’d greatly appreciate any leads or suggestions on where to find relevant datasets for these projects. Thank you!

submitted by /u/Inevitable_Counter94

Is There A Good Up-to-date Rotten Tomatoes Dataset?

I’m looking for a Rotten Tomatoes dataset that has user reviews, critic reviews and movies (metadata isn’t strictly necessary but would be preferred) for a recommendation system I’m trying to build. Are there any good datasets that would work for this, or would I need to attempt to scrape it myself? (I have zero web-scraping experience.)

submitted by /u/RealHellcharm

AI Datasets Built By Community – Need Feedback

hey there,

after 5 years of building AI models from scratch, I know to the bone how much the dataset matters for model quality. that’s why openai is where it is: solely bc of a high-quality dataset.

haven’t seen a good “service” that offers a way to build a dataset (any task: chat, instruct, qa, speech, etc) that’s backed by the community.

thinking of starting a service that will help companies & individuals build a dataset by rewarding people w/ a crypto coin as an incentive mechanism. after the dataset is built (data collection finalized), it could be sent to HF or any other service for model training / finetuning.

what’s your feedback, folks? what do you think about this? does the market exist?

submitted by /u/betimd