Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Reliable Data Set For The Reddit Dataset

I am working on a project on representation learning for large-scale dynamic networks, and I am looking for a reliable Reddit dataset (the data should include post_id, user_id, time, and comment). I want to build a graph with users as nodes, adding an edge between two users whenever they comment on the same post.
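The graph construction described above could be sketched roughly as follows. This is only an illustration, assuming the data arrives as (post_id, user_id, time, comment) rows; the sample rows and the function name `build_co_comment_edges` are made up for the example and do not come from any real dataset.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows in the shape (post_id, user_id, time, comment).
comments = [
    ("p1", "alice", 1700000000, "first"),
    ("p1", "bob", 1700000050, "second"),
    ("p1", "carol", 1700000100, "third"),
    ("p2", "alice", 1700000200, "hello"),
    ("p2", "bob", 1700000300, "hi"),
    ("p3", "dave", 1700000400, "alone"),
]


def build_co_comment_edges(rows):
    """Return {(user_a, user_b): number of shared posts}, with user_a < user_b."""
    # Collect the distinct users who commented on each post.
    users_by_post = defaultdict(set)
    for post_id, user_id, _time, _comment in rows:
        users_by_post[post_id].add(user_id)

    # Every pair of users on the same post gets an undirected edge;
    # the weight counts how many posts the pair has in common.
    edge_weights = defaultdict(int)
    for users in users_by_post.values():
        for user_a, user_b in combinations(sorted(users), 2):
            edge_weights[(user_a, user_b)] += 1
    return dict(edge_weights)


edges = build_co_comment_edges(comments)
```

Note that a post with n commenters produces n(n-1)/2 edges, so very popular posts may need to be capped or sampled on a large dump.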

The overall goal of the project is to learn a good representation for community search on a graph built from social network data. I want to use Reddit data for this, and I have some requirements: users should be the nodes, and an edge should connect different users who comment on the same post. I tried a few datasets, but none of them meets my needs. Does anyone have a link to a Reddit dataset that does? Here is what I have tried:

https://github.com/dingidng/reddit-dataset (I can only create a few edges from this data, which does not give a meaningful graph)

https://snap.stanford.edu/graphsage/#datasets (the nodes are not users)

I also have a problem using Pushshift to access Reddit data: whenever I submit an access request, it is automatically rejected by the bot. If anyone knows how to get access permission and use Pushshift to obtain the data, please let me know.
https://pushshift.io/signup

This is my first time posting for help, thank you for any help you can provide!

submitted by /u/Terrible_Band6290

Searching For Social Media Screenshot Dataset

I have been searching for a dataset that contains screenshots of social media posts from various platforms (Twitter, Instagram, Truth Social, Facebook, etc.). I have been able to find datasets that contain URLs of social media posts, but none of sufficient size that include screenshots. I would like at least 1,000 images per platform. Please let me know if there are any datasets that you know of or if you have any advice.

submitted by /u/ImpossibleBear6458

Chatbot Datasets That Are Used For RNN And NLP

Hello everyone,

I recently started to learn about AI and RNNs, and about how models work. Now I want to try something else: I thought I could build my first NLP model from scratch, but the main problem is that there is little to no information on how to build a rich dataset to train the model.

I’ve looked everywhere, but whenever I put the model to the test the results are very bad.

Can someone help me or refer me to example datasets that are used for training a chatbot model? Thanks

submitted by /u/InfiniteAd328

How Reliable Is Data On Wikipedia (War Casualties)?

Interested in working with data on war casualties. Wikipedia has an interesting page (List of battles by casualties), but the data seems implausible/lacking evidence/sources.

E.g., the Battle of Stalingrad is listed with 1,250,000 to 4,172,000 casualties while the Battle of Berlin is listed with 1,286,367 casualties.

These numbers fall outside the ranges I have read elsewhere. Is there a more reliable list/dataset to be found online?

submitted by /u/Vylerios

Looking For The Reliable Data Set For Representation Learning For The Large Scale Dynamic Network

Hi, I am working on a project that uses representation learning for large-scale dynamic networks, and my work is currently based on Reddit data. I am trying to find a dataset that includes timestamps, so that I can make users the nodes of the graph and build edges between different nodes.
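One way the timestamps might be used to get a dynamic graph is to cut the comments into fixed time windows and build one edge set per window. The sketch below assumes rows of (post_id, user_id, timestamp) with integer Unix-style timestamps; the function name `snapshot_edges`, the window length, and the sample rows are all hypothetical. It also assumes an edge only counts when both comments fall in the same window, which is a design choice and not the only option.

```python
from collections import defaultdict
from itertools import combinations


def snapshot_edges(rows, window_seconds):
    """Return {window_index: set of (user_a, user_b) edges} for fixed-size windows."""
    # window index -> post_id -> users who commented on that post in that window
    users = defaultdict(lambda: defaultdict(set))
    for post_id, user_id, timestamp in rows:
        users[timestamp // window_seconds][post_id].add(user_id)

    snapshots = {}
    for window, posts in users.items():
        edges = set()
        for members in posts.values():
            # Users sharing a post inside the same window are connected.
            edges.update(combinations(sorted(members), 2))
        snapshots[window] = edges
    return snapshots


# Hypothetical sample rows; timestamps are small integers for readability.
rows = [
    ("p1", "alice", 10),
    ("p1", "bob", 20),
    ("p1", "carol", 130),
    ("p2", "bob", 140),
    ("p2", "carol", 150),
]
snaps = snapshot_edges(rows, window_seconds=100)
```

Here carol's late comment on p1 lands in a different window from alice's and bob's, so she is only linked to bob through p2 in the second snapshot.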

I have tried some sources, but the data did not meet my requirements in some respects:

https://snap.stanford.edu/graphsage/ (the nodes are not users)

https://github.com/dingidng/reddit-dataset (can only build very few edges)

I am also confused about how to get access to Reddit data through Pushshift, since I have been rejected by the bot many times, and about how to use the data on the Pushshift platform. If anyone can help me find a reliable and useful data source, thank you so much!

Thank you in advance for any help and suggestions.

submitted by /u/Terrible_Band6290

REDD Dataset – How Do We Get It Now?

I am looking for access to the REDD dataset. The link (redd.csail.mit.edu) has been dead for a few months. How do we download it now? The archive [page](https://web.archive.org/web/20220812015008/http://redd.csail.mit.edu/) prevents me from downloading it because it requires a password. Could I pass a username/password as a query parameter/cookie and download it? If so, what are the credentials? Are there other alternatives, or ideas for how to acquire it?

submitted by /u/DuckHunterZx

Two Sizes Given For MIMIC-III Files On Its Webpage

The webpage for MIMIC-III shows that the full zip file for download is 6.2 GB. In particular, the chartevents.csv.gz file is listed as 4.0 GB. The download in the browser shows 6.2 GB to be downloaded, but it is very slow.

The webpage also gives a wget command to download on the command line, and this command says the total data size is 4.2 GB. In its download, the chartevents.csv.gz file is 2.3 GB. BTW, this method is about 8 times faster than the browser-based download.

I would appreciate insight into this difference. Has anyone encountered this before?

submitted by /u/Far-Cantaloupe4144

Reliable Sources For Population By Country?

Hi all,

I recently started a project where I’d need to collect the following data:

- the population of various countries across the world

- cost of electricity per use in said country

- total hotels in x country

- total grocery stores in x country

- average hotel size (sq ft)

- average grocery store size (sq ft)

As a college freshman, this is my first research project, and I would like to know what steps/sources would be most useful for collecting this data. My first instinct is to just do Google searches, but I don’t know if there is a database or a more professional method.

submitted by /u/dumbbitch44

In Search Of Raw Dataset For PGA Golf Courses

I’m wondering if there are advanced metrics for specific golf courses available somewhere online.

For example, if I wanted to know what percentage of time PGA golfers, during tournament play, hit into the fairway bunker on the 1st hole of Augusta, where could I go to find that info? These types of stats have to exist somewhere, right?

Shotlink appears to be a relatively new technology that keeps this data, but I haven’t had any luck finding access to their database. Datagolf.com has a lot of good info, but it’s player-based, not course-based. Appreciate any help!

submitted by /u/mmckeever23

Clean Beer Bottle Images For Dataset

Hey everyone,

I’m working on a beer database project and need clean images of beer bottles. Does anyone know of any websites or places where I can find these? URLs, or the actual database where they are all stored, would work.

I’ve been struggling to find good sources. Any help would be greatly appreciated. Thanks!

submitted by /u/Candid_Muscle_4654

I Need Help With The MIMIC-III Dataset

Hello, I’m not sure if this is the right place, but I urgently need access to the MIMIC-III database for my thesis. I thought access was free, but it turns out you have to pay for a course. I’m a postgraduate student facing financial difficulties, so I wanted to ask if anyone knows how to access that database for free. Please.

submitted by /u/captain_77destroyer

Comprehensive US Election Candidate Dataset

I’m trying to find a dataset of every single candidate contesting any form of election (federal, state, local) in the US over the past 3-4 years, for a political video ad classification project. Does anyone have any idea if this sort of database exists or how I can compile it? I just need name, party, state, and contested office.

submitted by /u/moul1k

Dataset For Companies And Their Respective Categories

I’m trying to build an analyzer of my spending habits and I would like to know what various categories of expenses I have.

For example, I have a CSV of all my transactions. One transaction might say “Chipotle”, and I would like that to be categorized as a restaurant. My approach is to have a dataset of popular companies and their respective types in order to sort them into “genres”. I’m currently using the OpenStreetMap Overpass API, because it has tags on each company or store classifying what type it is. If anyone has a dataset like this or suggestions for a different approach, please let me know.

TLDR: Looking for a dataset that has companies that people ordinarily buy from and their category “Chipotle: Restaurant” “Nike: fashion” …
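The lookup approach described above might look something like this once a company-to-category table exists. It is a minimal sketch: the `MERCHANT_CATEGORIES` table, the CSV column names, and the sample transactions are all invented for illustration, and a real table would have to come from a dataset such as the OpenStreetMap tags mentioned in the post.

```python
import csv
import io
import re

# Hypothetical lookup table; a real one would be far larger.
MERCHANT_CATEGORIES = {
    "chipotle": "Restaurant",
    "nike": "Fashion",
}


def normalize(description):
    """Lowercase and replace digits/punctuation with spaces."""
    return re.sub(r"[^a-z ]+", " ", description.lower()).strip()


def categorize(description, table=MERCHANT_CATEGORIES):
    """Return the category of the first known merchant found in the description."""
    cleaned = normalize(description)
    for merchant, category in table.items():
        # Whole-word match so that short names do not match inside longer words.
        if re.search(rf"\b{re.escape(merchant)}\b", cleaned):
            return category
    return "Uncategorized"


# Hypothetical bank export; the column names are an assumption.
sample_csv = """date,description,amount
2024-01-03,CHIPOTLE #1234 AUSTIN TX,12.50
2024-01-05,Nike.com,89.99
2024-01-07,Unknown Vendor LLC,40.00
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
categories = [categorize(row["description"]) for row in rows]
```

Exact-name matching like this will miss abbreviations and payment-processor prefixes that banks often add, so fuzzy matching may be needed on top of it.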

submitted by /u/AimBot_4000

I Have Made A Queryable MySQL And JSON Dataset From The DSM-V

I have published a FREE MySQL and JSON version of the DSM-V. I am working on developing my own AI-powered semi-private healthcare app, and I am doing it all 100% myself, so if you wish to use my dataset, please consider donating to help me with my own project if you’re willing and able! It would really help me out with the development of my app. If you are willing to donate, please see the readme in the GitHub repo. TYSM in advance.

So anyway, this dataset contains all of the DSM-V disorders, their diagnostic criteria (organized into categories and subcategories, as laid out in the DSM-V), culture and gender-related considerations for diagnosis, prevalence data, recording procedures, and any other information provided about the disorder, conveniently organized and queryable, written in MySQL with a JSON export copy included as well.

Here’s the link! https://github.com/Danm998/DSM-V

This took me a fair bit of work, so please consider donating if it helps you with a project of your own. Thanks in advance, I hope you enjoy!

submitted by /u/Danm998