Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Looking For A Reliable Data Set For Representation Learning On Large-Scale Dynamic Networks

Hi, I am working on a project that uses representation learning for large-scale dynamic networks. My work is currently based on the Reddit data set. I am trying to find a data set that includes timestamps, so that I can treat each user as a node of the graph and build edges between the nodes.

I have tried some sources, but the data did not meet my requirements in some respects:

https://snap.stanford.edu/graphsage/ (the nodes are not users)

https://github.com/dingidng/reddit-dataset (only very few edges can be built)

I am also confused about how to get access to the Reddit data set through Pushshift, since I have been rejected by the bot many times, and about how to use the data set on the Pushshift platform. If anyone can help me find a reliable and useful data source, thank you so much!

Thank you in advance for any help and suggestions.
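For readers with the same goal, here is a minimal sketch of how timestamped comment records could be turned into a user-to-user temporal edge list. It assumes each record carries `id`, `author`, `parent_id` and `created_utc` fields; those names and the sample records are assumptions for illustration, not the schema of any particular dump.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    src: str        # user who wrote the reply
    dst: str        # user who wrote the comment being replied to
    timestamp: int  # time of the reply (assumed to be a Unix timestamp)


def build_temporal_edges(comments: list[dict]) -> list[Edge]:
    """Create one directed edge per reply, ordered by time."""
    # Map each comment id to its author so replies can be resolved to users.
    author_by_id = {c["id"]: c["author"] for c in comments}
    edges = []
    for c in comments:
        parent_author = author_by_id.get(c["parent_id"])
        # Skip replies whose parent is not in the data, and self-replies.
        if parent_author is None or parent_author == c["author"]:
            continue
        edges.append(Edge(c["author"], parent_author, c["created_utc"]))
    edges.sort(key=lambda e: e.timestamp)
    return edges


# Hypothetical sample records, only to show the shape of the input.
sample_comments = [
    {"id": "c1", "author": "alice", "parent_id": "post1", "created_utc": 100},
    {"id": "c2", "author": "bob", "parent_id": "c1", "created_utc": 160},
    {"id": "c3", "author": "alice", "parent_id": "c2", "created_utc": 220},
    {"id": "c4", "author": "alice", "parent_id": "c3", "created_utc": 230},
]

edges = build_temporal_edges(sample_comments)
for edge in edges:
    print(edge)
```

Defining an edge as "user A replied to user B" is only one possible choice; co-posting in the same thread or subreddit would give a denser graph.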

submitted by /u/Terrible_Band6290

REDD Dataset – How Do We Get It Now?

I am looking for access to the REDD dataset. The link, redd.csail.mit.edu, has been dead for a few months. How do we download it now? The archive [page](https://web.archive.org/web/20220812015008/http://redd.csail.mit.edu/) prevents me from downloading it because it requires a password. Could I pass a username/password as a query parameter or cookie and download it? If so, what are the credentials? Are there other alternatives, or ideas for how to acquire it?

submitted by /u/DuckHunterZx

Two Sizes Given For MIMIC-III Files On Its Webpage

The webpage for MIMIC-III shows that the full zip file for download is 6.2 GB. In particular, the chartevents.csv.gz file is listed as 4.0 GB. The download in the browser shows 6.2 GB to be downloaded, but it is very slow.

The webpage also gives a wget command for downloading from the command line, and this command says the total data size is 4.2 GB. In that download, the chartevents.csv.gz file is 2.3 GB. BTW, this method is about 8 times faster than the browser-based download.

I would appreciate any insight into this difference. Has anyone encountered this before?
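This does not explain the discrepancy, but one way to check whether two downloads of the same file actually hold the same bytes is to compare their sizes and SHA-256 digests. Below is a minimal sketch using only the Python standard library; the temporary files are stand-ins for real downloads, and comparing against a published checksum only works if the data provider lists one.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_file(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have identical size and SHA-256 digest."""
    return describe_file(path_a) == describe_file(path_b)


# Demonstration with small temporary files standing in for two downloads.
with tempfile.TemporaryDirectory() as tmp:
    browser_copy = Path(tmp) / "browser.bin"
    wget_copy = Path(tmp) / "wget.bin"
    truncated_copy = Path(tmp) / "truncated.bin"
    browser_copy.write_bytes(b"example payload" * 1000)
    wget_copy.write_bytes(b"example payload" * 1000)
    truncated_copy.write_bytes(b"example payload" * 500)

    identical = same_content(browser_copy, wget_copy)
    mismatch = same_content(browser_copy, truncated_copy)
    demo_size, demo_digest = describe_file(browser_copy)

print("browser vs wget identical:", identical)
print("browser vs truncated identical:", mismatch)
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte archives.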

submitted by /u/Far-Cantaloupe4144

Reliable Sources For Population By Country?

Hi all,

I recently started a project where I’d need to collect the following data:

- the population of various countries across the world
- cost of electricity per use in said country
- total hotels in x country
- total grocery stores in x country
- average hotel size (sq ft)
- average grocery store size (sq ft)

As a college freshman, this is my first research project, and I would like to know which steps and sources would be most useful for collecting this data. My first instinct is to just do Google searches, but I don’t know if there is a database or a more professional method.

submitted by /u/dumbbitch44

In Search Of Raw Dataset For PGA Golf Courses

I’m wondering if there are advanced metrics for specific golf courses available somewhere online.

For example, if I wanted to know what percentage of the time PGA golfers, during tournament play, hit into the fairway bunker on the 1st hole of Augusta, where could I go to find that info? These types of stats have to exist somewhere, right?

ShotLink appears to be a relatively new technology that keeps this data, but I haven’t had any luck finding access to their database. Datagolf.com has a lot of good info, but it’s player-based, not course-based. Appreciate any help!

submitted by /u/mmckeever23

Clean Beer Bottle Images For Dataset

Hey everyone,

I’m working on a beer database project and need clean images of beer bottles. Does anyone know of any websites or places where I can find these? Either URLs or the actual database where they are all stored would work.

I’ve been struggling to find good sources. Any help would be greatly appreciated. Thanks!

submitted by /u/Candid_Muscle_4654

I Need Help With The MIMIC-III Dataset

Hello, I’m not sure if this is the right place, but I urgently need access to the MIMIC-III database for my thesis. I thought access was free, but it turns out you have to pay for a course. I’m a postgraduate student facing financial difficulties, so I wanted to ask if anyone knows how to access that database for free. Please.

submitted by /u/captain_77destroyer

Comprehensive US Election Candidate Dataset

I’m trying to find a dataset of every candidate who has contested any form of election (federal, state, local) in the US over the past 3-4 years, for a political video ad classification project. Does anyone know if this sort of database exists, or how I could compile it? I just need name, party, state, and contested office.

submitted by /u/moul1k

Dataset For Companies And Their Respective Categories

I’m trying to build an analyzer of my spending habits and I would like to know what various categories of expenses I have.

For example, I have a CSV of all my transactions. One transaction might say “Chipotle”, and I would like that to be categorized as a restaurant. My approach is to have a dataset of popular companies and their respective types in order to categorize them into “genres”. I’m currently using OpenStreetMap’s Overpass API because it has tags on each company or store classifying what type it is. If anyone has a dataset like this or suggestions for a different approach, please let me know.

TLDR: Looking for a dataset that has companies that people ordinarily buy from and their categories, e.g. “Chipotle: Restaurant”, “Nike: Fashion” …
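The lookup approach described above could be sketched roughly like this. The category table, the CSV column names (`description`, `amount`) and the sample rows are all hypothetical; a real table would come from whatever company dataset turns up.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical lookup table; in practice this would be loaded from a dataset.
MERCHANT_CATEGORIES = {
    "chipotle": "Restaurant",
    "nike": "Fashion",
}


def categorize(description: str, table: dict[str, str] = MERCHANT_CATEGORIES) -> str:
    """Return the category of the first known merchant named in the description."""
    # Lowercase and replace anything that is not a letter with a space, so
    # "CHIPOTLE 1234 NEW YORK" and "Nike.com" both split into plain words.
    words = re.sub(r"[^a-z]+", " ", description.lower()).split()
    for merchant, category in table.items():
        # Word-level match only; multi-word merchant names would need more work.
        if merchant in words:
            return category
    return "Uncategorized"


# Hypothetical transactions standing in for a real bank export.
SAMPLE_CSV = """description,amount
CHIPOTLE 1234 NEW YORK,12.50
Nike.com,80.00
Chipotle Online,9.25
Corner Shop,3.00
"""

totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    totals[categorize(row["description"])] += float(row["amount"])

for category, amount in sorted(totals.items()):
    print(f"{category}: {amount:.2f}")
```

Anything that lands in "Uncategorized" is a candidate for a new row in the table, so the lookup can grow from your own transaction history.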

submitted by /u/AimBot_4000

I Have Made A Queryable MySQL And JSON Dataset From The DSM-V

I have published a FREE MySQL and JSON version of the DSM-V. I am working on developing my own AI-powered semi-private healthcare app, and I am doing it all 100% myself, so if you wish to use my dataset, please consider donating to help me with my own project if you’re willing and able! It would really help me out with the development of my app. If you are willing to donate, please see the readme in the GitHub repo. TYSM in advance.

So anyway, this dataset contains all of the DSM-V disorders, their diagnostic criteria (organized into categories and subcategories, as laid out in the DSM-V), culture and gender-related considerations for diagnosis, prevalence data, recording procedures, and any other information provided about the disorder, conveniently organized and queryable, written in MySQL with a JSON export copy included as well.

Here’s the link! https://github.com/Danm998/DSM-V

This took me a fair bit of work, so please consider donating if it helps you with a project of your own. Thanks in advance, I hope you enjoy!

submitted by /u/Danm998

RSpace Data Management Platform Is Now Open Source

RSpace is an all-in-one ELN, sample manager and Research Data Management (RDM) platform that integrates with many other data tools. RSpace is designed to act as a central data hub and pipeline for large academic institutes who want to support open science and FAIR data principles. RSpace already has good open APIs, but to encourage the data community to build even more integrations to allow better flow of data, RSpace is now fully open source. Learn more here: https://github.com/rspace-os

submitted by /u/invasifspecies

Buying Customer Parent-Child Relationship Data

We need to build parent/child relationships between customers in our system.

We have about 300,000 customers—many are mid-sized companies, a few are very large, and a number are very small outfits.

About 3% of our customers have “children,” meaning they own other customers in our database. We are unsure how many customers fall into that ‘child’ category, but we estimate it may be around 10% of the total customer population.
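To put rough numbers on those percentages, and to show one simple way such a hierarchy could be stored, here is a small sketch. The 10% child share is the estimate above, not a measured value, and the customer IDs are made up for illustration.

```python
TOTAL_CUSTOMERS = 300_000
PARENT_PERCENT = 3   # share of customers that own other customers
CHILD_PERCENT = 10   # estimated share of customers owned by another customer

# Integer arithmetic avoids floating-point rounding on the counts.
parents = TOTAL_CUSTOMERS * PARENT_PERCENT // 100
children = TOTAL_CUSTOMERS * CHILD_PERCENT // 100
avg_children_per_parent = children / parents

print(f"parents: {parents}, children: {children}")
print(f"average children per parent: {avg_children_per_parent:.2f}")


def ultimate_parent(customer_id: str, parent_of: dict[str, str]) -> str:
    """Follow child -> parent links until a customer with no parent is reached."""
    seen = {customer_id}
    current = customer_id
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            # Purchased relationship data can contain loops; fail loudly.
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
    return current


# Hypothetical child -> parent mapping with made-up customer IDs.
parent_of = {"store-17": "regional-co", "regional-co": "holding-co"}
top = ultimate_parent("store-17", parent_of)
print("ultimate parent of store-17:", top)
```

Storing only the direct parent per customer keeps updates simple; the ultimate parent can be derived on demand, as above.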

We have an enterprise MDM system connected to our CRM, which can help us manage the data.

Our challenge is finding a reliable source for parent-child relationship data. We are a large national company (USA) and we have a reasonable budget to purchase this data, but I am unsure where to start looking. We currently buy some data from D&B, but their parent-child values are unreliable at best, making it difficult to depend on them.

If anyone has suggestions on where we can obtain accurate parent-child relationship data, please share. It would be much appreciated.

submitted by /u/HuronChief

Dataset Of Clinical Indicators For Benign Stomach Tumors

Request:
I’m looking for a dataset that contains clinical indicators for various benign stomach tumors. The dataset should include one or more of the following tumor types: gastric polyps, gastrointestinal stromal tumors (GISTs), lipomas, leiomyomas, schwannomas, neurofibromas, pancreatic heterotopia, hemangiomas, lymphangiomas, glomus tumors, fibromas.

Metrics I’d like it to include:

age, gender, blood cell count, liver function, kidney function, blood lipids, tumor markers, and pathological results

I’m not too picky about the format as long as the diagnosis is separate from the clinical indicators and the formatting is consistent.

Artificial datasets are okay, maybe even preferred, as long as they’re accurate.

submitted by /u/TodayAshamed6095