Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

School Directory Data – What Can/Can’t I Do?

Several years ago, my college accidentally emailed out the master Excel sheet of the entire faculty and student directory. I can’t remember who they sent it to, or whether they rescinded it moments later, but I was looking at my email when it arrived. I opened it and downloaded it. It contains over 5,000 email addresses, majors, home phone numbers, and cell phone numbers. Now I am curious about what I could do with this data. I understand it’s usually very hard to come across something like this unless it is sold to you. Are there legal aspects? Could these be email marketing leads? Obviously scammers, etc. would love this, but I’d like to be ethical about it.

Thanks…

submitted by /u/Taziot7

U.S. Consumer Expenditures Data By County

I’m looking for public datasets on consumer/household expenditures in the US by county and household size. I know the BLS’s Consumer Expenditure Survey provides this data, but it’s not available on a county level. Does anyone know where this information is available? I’d like to see mean values for rent/mortgage, food (both store-bought groceries and delivery/restaurant), and other household expenses for Manhattan (NY County) specifically. Thank you!

submitted by /u/nd9760

Need To Migrate A SAS Database To A New Software

Hey, I just started a new job as a Data Manager with little to no experience in the field, and they told me that they want to move away from SAS for the database.

As I said, I have almost no experience in this field, and they are looking for my input on where we can migrate to. It is a fairly big database with (I think) about 1 TB of medical information on different studies and patients (we are studying sleep apnea and other sleep illnesses).

Does anyone have suggestions or ideas on what I could propose the team switch to?

I don’t know the exact structure, but we seem to be using SAS for generating queries and storing the database, and we use MySQL to look at the different tables and gather the necessary info.

submitted by /u/Yottarro

Reliable Data Set For The Reddit Dataset

I am working on a project on representation learning for large-scale dynamic networks, and I am looking for a reliable Reddit dataset (the data should include post_id, user_id, time, and comment), so that I can build a graph with users as nodes and an edge between two users whenever they comment on the same post.
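A minimal sketch of the graph construction described above, assuming the data arrives as (post_id, user_id, time, comment) tuples. The function name `build_user_graph` and the sample rows are hypothetical, and weighting each edge by the number of shared posts is an added assumption rather than a stated requirement:

```python
from collections import defaultdict
from itertools import combinations


def build_user_graph(comments):
    """Build an undirected co-comment graph.

    comments: iterable of (post_id, user_id, time, comment) tuples.
    Returns {(user_a, user_b): shared_post_count} with user_a < user_b.
    """
    # Collect the distinct users who commented on each post.
    users_by_post = defaultdict(set)
    for post_id, user_id, _time, _text in comments:
        users_by_post[post_id].add(user_id)

    # Link every pair of users that share a post; count shared posts as weight.
    edges = defaultdict(int)
    for users in users_by_post.values():
        for a, b in combinations(sorted(users), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical sample rows, for illustration only.
sample = [
    ("p1", "alice", 1, "hi"),
    ("p1", "bob", 2, "hello"),
    ("p1", "carol", 3, "hey"),
    ("p2", "alice", 4, "again"),
    ("p2", "bob", 5, "yes"),
    ("p3", "dave", 6, "alone"),
]
edges = build_user_graph(sample)
```

Posts with a single commenter contribute no edges, and very popular posts produce a quadratic number of pairs, so a cap on commenters per post may be needed on real data.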

The overall task is to learn good representations that can be used for community search on a graph of social network data. I want to use Reddit data for this project, and I have some requirements: the dataset should let me use users as nodes and build edges between different users who comment on the same post. I tried a few datasets, but none of them meets my needs. Does anyone have a link to a Reddit dataset that does? The following are what I have tried:

https://github.com/dingidng/reddit-dataset (I can only create a few edges from this data, which is not useful)

https://snap.stanford.edu/graphsage/#datasets (the nodes are not users)

I also have a problem using Pushshift to access Reddit data: whenever I submit a request for access, it is automatically rejected by the bot. Does anyone know how to get access permission and use Pushshift to retrieve the data?
https://pushshift.io/signup

This is my first time posting for help, thank you for any help you can provide!

submitted by /u/Terrible_Band6290

Searching For Social Media Screenshot Dataset

I have been searching for a dataset that contains screenshots of social media posts from various platforms (Twitter, Instagram, Truth Social, Facebook, etc.). I have been able to find datasets that contain URLs of social media posts, but none of sufficient size that include screenshots. I would like at least 1,000 images per platform. Please let me know if there are any datasets that you know of or if you have any advice.

submitted by /u/ImpossibleBear6458

Chatbot Datasets That Are Used For RNN And NLP

Hello everyone,

I recently started learning about AI and RNNs and how models work. I thought I could build my first NLP model from scratch, but the main problem is that there is little to no information on how to build a rich dataset to train the model.

I’ve looked everywhere, but whenever I put the model to the test, the results are very bad.

Can someone help me or point me to examples of datasets used for training a chatbot model? Thanks

submitted by /u/InfiniteAd328

How Reliable Is Data On Wikipedia (war Casualties)?

Interested in working with data on war casualties. Wikipedia has an interesting page (List of battles by casualties), but the data seems implausible/lacking evidence/sources.

E.g., the Battle of Stalingrad is listed with 1,250,000 to 4,172,000 casualties while the Battle of Berlin is listed with 1,286,367 casualties.

These numbers fall outside the figures I have read elsewhere. Is there a more reliable list/dataset to be found online?

submitted by /u/Vylerios

Looking For The Reliable Data Set For Representation Learning For The Large Scale Dynamic Network

Hi, I am working on a project that uses representation learning for large-scale dynamic networks, and my work is currently based on Reddit data. I am trying to find a dataset that includes timestamps, so that I can use users as the nodes of the graph and build edges between different nodes.
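One way to turn timestamped comments into a dynamic network is to cut them into fixed time windows and build one edge set per window. The sketch below assumes numeric timestamps and (post_id, user_id, timestamp) rows; `snapshot_edges`, the `window` parameter, and the sample rows are hypothetical names and values chosen for illustration:

```python
from collections import defaultdict
from itertools import combinations


def snapshot_edges(comments, window):
    """Group comments into fixed-size time windows and build edges per window.

    comments: iterable of (post_id, user_id, timestamp), numeric timestamps.
    window:   window length in the same unit as the timestamps.
    Returns {window_index: set of (user_a, user_b)}; two users are linked in
    a window if both commented on the same post within that window.
    """
    # window index -> post_id -> users who commented in that window
    buckets = defaultdict(lambda: defaultdict(set))
    for post_id, user_id, ts in comments:
        buckets[int(ts // window)][post_id].add(user_id)

    snapshots = {}
    for idx, posts in buckets.items():
        pairs = set()
        for users in posts.values():
            pairs.update(combinations(sorted(users), 2))
        snapshots[idx] = pairs
    return snapshots


# Hypothetical sample rows, for illustration only.
rows = [
    ("p1", "alice", 0),
    ("p1", "bob", 30),
    ("p1", "carol", 130),
    ("p2", "alice", 110),
    ("p2", "carol", 150),
]
snaps = snapshot_edges(rows, window=100)
```

Here carol is not linked to alice or bob through p1, because her comment on it falls in a later window than theirs. Whether that is the right rule depends on how the dynamic network is defined for the project.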

I have tried some sources, but the data did not meet my requirements in some respects:

https://snap.stanford.edu/graphsage/ (the nodes are not users)

https://github.com/dingidng/reddit-dataset (can only build very few edges)

I am also confused about how to get access to Reddit data through Pushshift, since my requests have been rejected by the bot many times, and about how to use the data on the Pushshift platform. If anyone can help me find a reliable and useful data source, thank you so much!

Thank you in advance for any help and suggestions.

submitted by /u/Terrible_Band6290

REDD Dataset – How Do We Get It Now?

I am looking for access to the REDD dataset. The link, redd.csail.mit.edu, has been dead for a few months. How do we download it now? The archive [page](https://web.archive.org/web/20220812015008/http://redd.csail.mit.edu/) prevents me from downloading it because it requires a password. Could I pass a username/password as a query parameter/cookie and download it? If so, what are the credentials? Are there other alternatives, or ideas for how to acquire it?

submitted by /u/DuckHunterZx

Two Sizes Given For MIMIC-III Files On Its Webpage

The webpage for MIMIC-III shows that the full zip file for download is 6.2 GB. In particular, the chartevents.csv.gz file is listed as 4.0 GB. The download in the browser shows 6.2 GB to be downloaded, but it is very, very slow.

The webpage also gives a wget command to download from the command line, and this command says the total data size is 4.2 GB. In its download, the chartevents.csv.gz file is 2.3 GB. BTW, this method is about 8 times faster than the browser-based download.

I would appreciate any insight into this difference. Has anyone encountered this before?

submitted by /u/Far-Cantaloupe4144

Reliable Sources For Population By Country?

Hi all,

I recently started a project where I’d need to collect the following data:

-the population of various countries across the world

-cost of electricity per use in said country

-total hotels in x country

-total grocery stores in x country

-average hotel size (sq ft)

-average grocery store size (sq ft)

As a college freshman, this is my first research project, and I would like to know what steps/sources would be most useful for collecting this data. My first instinct is to just do Google searches, but I don’t know if there is a database or a more professional method.

submitted by /u/dumbbitch44

In Search Of Raw Dataset For PGA Golf Courses

I’m wondering if there are advanced metrics for specific golf courses available somewhere online.

For example, if I wanted to know what percentage of time PGA golfers, during tournament play, hit into the fairway bunker on the 1st hole of Augusta, where could I go to find that info? These types of stats have to exist somewhere, right?

Shotlink appears to be a relatively new technology that keeps this data, but I haven’t had any luck finding access to their database. Datagolf.com has a lot of good info, but it’s player-based, not course-based. Appreciate any help!

submitted by /u/mmckeever23

Clean Beer Bottle Images For Dataset

Hey everyone,

I’m working on a beer database project and need clean images of beer bottles. Does anyone know of any websites or places where I can find these? Either URLs or the actual database where they are all stored would work.

I’ve been struggling to find good sources. Any help would be greatly appreciated. Thanks!

submitted by /u/Candid_Muscle_4654

I Need Help With The MIMIC-III Dataset

Hello, I’m not sure if this is the right place, but I urgently need access to the MIMIC-III database for my thesis. I thought access was free, but it turns out you have to pay for a course. I’m a postgraduate student facing financial difficulties, so I wanted to ask if anyone knows how to access that database for free. Please.

submitted by /u/captain_77destroyer

Comprehensive US Election Candidate Dataset

I’m trying to find a dataset of every single candidate contesting any form of election (federal, state, local) in the US over the past 3-4 years, for a political video ad classification project. Does anyone have any idea if this sort of database exists or how I can compile it? I just need name, party, state, and contested office.

submitted by /u/moul1k