Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Anyone Have Experience With Working With The NIS/HCUP Datasets In R?

Hi all, I’m trying to load NIS data into R since I don’t have access to SAS/Stata/SPSS. They provide load programs for those but, unsurprisingly, nothing for R. No matter what I try, I can’t seem to load it into R; I constantly get column mismatches. The file is several GB, so I can’t open it in a text editor to view it. Anyone have experience with this?

The link to their load programs: https://hcup-us.ahrq.gov/db/nation/sasloadprog.jsp?year=2016&db=NIS
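Column mismatches like this often mean a fixed-width file is being read as if it were delimited. A minimal sketch of how one might check, in Python for illustration (the same idea should carry over to R, for example with readr's read_fwf): read only the first few lines of the large file, then cut a record by character position. The field names and positions in EXAMPLE_COLSPECS are hypothetical placeholders, not the real NIS layout; the actual positions would have to be copied from the load program for the right year.

```python
from itertools import islice


def peek(path, n=5):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r", encoding="ascii", errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


def slice_fixed_width(line, colspecs):
    """Cut one fixed-width record into named fields.

    colspecs maps a field name to (start, end), 1-based and inclusive,
    which is how SAS-style load programs usually state positions.
    """
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in colspecs.items()
    }


# Hypothetical layout for illustration only, NOT the real NIS columns.
EXAMPLE_COLSPECS = {"AGE": (1, 3), "DIED": (4, 5), "KEY": (6, 15)}
```

If the peeked lines show no commas or tabs and every line has the same length, the file is almost certainly fixed-width and needs explicit column positions rather than a delimiter.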

submitted by /u/OBO786

Is There A Quick And/or Easy Way For Me To Fill In Data In A Csv File Based On Other Data Already There?

My CSV file is an export from a program I am using. With some code I found online, plus help from the Python Discord with changing a few details, I can now turn it into a bunch of md files with YAML frontmatter for Obsidian. So far so good…

The program is Aeon Timeline, and it allows you to create intricate detailed timelines of either real or fictional worlds, with sections for things like Person, Location, Event, etc.

It gives you a nice info panel on the right that can display all of this and how many of the things relate to each other. The best part is you can denote a relationship between items as either Bi-directional or Inverse and it will update both items at the same time if you modify either.

So let’s say my designated relationship is called Siblings. I mark down that Casey has Denise and Eugene in their Siblings relationship. If I then go to Denise’s panel, it will already have Casey and Eugene linked as Siblings. Ditto Eugene’s panel.

I can also make a relationship work inversely. If I put in the relationship Parents/Children, I can add Amy and Ben as Casey’s Parents. In doing so, in Amy’s panel, it will have Casey under Children. I do have to do this for every ‘sibling’ and add the parents manually, as they don’t connect outside of the two-way direction, but that’s alright.

I can also mark down Amy as Ben’s Spouse so Ben is also Amy’s Spouse.

I had assumed when I exported a CSV of all of this data that it would give me all of said relationships in both directions, as that is what the program does. That is part of the reason why I use it, so I don’t have to go back and re-add things so they’re ‘linked’ from both sides. But I annoyingly realized it only gives half.

So instead of this:

Label    Parents    Children               Siblings         Spouse
Amy      –          Casey,Denise,Eugene    –                Ben
Ben      –          Casey,Denise,Eugene    –                Amy
Casey    Amy,Ben    –                      Denise,Eugene    –
Denise   Amy,Ben    –                      Casey,Eugene     –
Eugene   Amy,Ben    –                      Casey,Denise     –

I get this:

Label    Parents    Siblings         Spouse
Amy      –          –                Nothing
Ben      –          –                Amy
Casey    Amy,Ben    Denise,Eugene    –
Denise   Amy,Ben    Nothing          –
Eugene   Amy,Ben    Nothing

‘Children’ doesn’t even show up in the export as a column at all, just ‘Parents.’ For relationships that were bi-directional and one word, it only gives me the first half, the one I typed in. So Amy is Ben’s Spouse, but Ben isn’t Amy’s.

This makes the metadata in the YAML worthless, as I have to go through and fill it out again from the other side of things anyway.

I therefore need a script, program, or way that I could fill in these missing data points. Ideally I would love to be able to:

1) Define which relationships need new columns, and which can be filled in in an existing column (because I have many varying people/events/locations with different relationships and ways they relate to each other):

same_relationship = ('Extended Family', 'Romances', 'Siblings')

opposite_relationship = ('Parents', 'Birthplace')

2) And then be able to write in something like

if {a Label} is in Column/Header “same_relationship” for certain Rows, add {the Label of this new Row} into the same column for {original Label}.

if Denise is in Column/Header “Siblings” for certain Rows (the rows where the labels are Casey and Eugene), add ‘Casey’ and ‘Eugene’ into “Siblings” for Denise.

if {Label} is in Column/Header “opposite_relationship” for certain Rows, add {the Label of this new Row} into new column ‘opposite_relationship Opposite’ for {original Label}.

if Amy is in Column/Header “Parents” for certain Rows (the rows where the labels are Casey, Denise, and Eugene), add ‘Casey,Denise,Eugene’ into new column ‘Parents Opposite’ for Amy.

Then I could manually change ‘Parents Opposite’ into ‘Children’ along with all of the other new columns I need to change.
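The rules above can be sketched with nothing but Python's standard csv module. This is a rough sketch under some assumptions, not a finished tool: it assumes the export has a "Label" column, that names inside a cell are comma-separated, and that "–", "-", "Nothing", and empty cells all mean "no value". The constants SAME and OPPOSITE and the function names are made up for illustration.

```python
import csv

SAME = ("Siblings", "Spouse")      # filled in both directions, same column
OPPOSITE = ("Parents",)            # reverse side goes to "<column> Opposite"
BLANK = {"", "-", "–", "Nothing"}  # assumed markers for an empty cell


def split_names(cell):
    """Turn a cell such as 'Amy,Ben' into ['Amy', 'Ben']."""
    cell = (cell or "").strip()
    if cell in BLANK:
        return []
    return [name.strip() for name in cell.split(",") if name.strip()]


def fill_relationships(rows, same=SAME, opposite=OPPOSITE):
    """Return copies of rows with each relationship recorded on both sides."""
    links = {row["Label"]: {} for row in rows}

    def add(label, column, name):
        # Skip self-links and targets that have no row of their own.
        if label not in links or label == name:
            return
        names = links[label].setdefault(column, [])
        if name not in names:
            names.append(name)

    for row in rows:
        label = row["Label"]
        for column in same:
            for name in split_names(row.get(column)):
                add(label, column, name)
                add(name, column, label)
        for column in opposite:
            for name in split_names(row.get(column)):
                add(label, column, name)
                add(name, column + " Opposite", label)

    managed = list(same) + [c for col in opposite for c in (col, col + " Opposite")]
    filled = []
    for row in rows:
        out = dict(row)  # keep every other exported column untouched
        for column in managed:
            out[column] = ",".join(links[row["Label"]].get(column, []))
        filled.append(out)
    return filled


def fill_csv(src_path, dst_path, **kwargs):
    """Read the export, fill in the missing links, write a new CSV."""
    with open(src_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    filled = fill_relationships(rows, **kwargs)
    if not filled:
        return
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(filled[0].keys()))
        writer.writeheader()
        writer.writerows(filled)
```

One limitation: this only mirrors links that exist somewhere in the export. If Casey lists Denise and Eugene as Siblings, Denise and Eugene each get Casey, but Denise and Eugene are not linked to each other unless one of their own rows says so.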

I feel like there has to be an easy-ish way to somehow do this with something, I just have no idea how. Or where to start. I just want to be able to fill in my data and continue with my worldbuilding.

submitted by /u/Faustyna

Datasets On Age-Related Macular Degeneration (AMD) Eye Disease

Hello, I’m doing an ML project for my 3rd academic year at university. For this I need images of “Age-Related Macular Degeneration (AMD) Eye Disease” in 3 categories.

Normal
Wet
Dry

I have enough images for the Normal condition, but I can’t find enough data for the Wet and Dry conditions. I need at least 1,000 images per category. Does anyone know where to find datasets for this specific eye disease?

submitted by /u/Slanomatic

Request: Dataset Of 80s Movies With Information On Smoking, Drugs, Etc. (like Found On Commonsensemedia)

Hello. I’m taking a data science course in Python. To practice classification, I wanted to take movies from the 80s from before and after the PG-13 rating came into effect. The idea is to use the movies released after the PG-13 rating was in effect to create a model that reclassifies the earlier movies, to see which of the PG ones would have been PG-13. I tried https://www.commonsensemedia.org/ as it has 5-star ratings for things like drinking, swearing, drugs, nudity, etc. However, the number of 80s movies seems to be limited to the ones that are still popular/watched (not surprisingly). Are there any datasets out there that have a lot of 80s movies with this info?

submitted by /u/Mcletters

A Particular Dataset I Want, On Drug Policy Can Only Be Accessed By Those With A British University Email Address. I Would Be Extremely Grateful If Someone Could Get It For Me!

A quick request that I would be very grateful if someone could fulfill. A particular dataset I want, on drug policy voices, can only be accessed by those with a British university email address. I would be extremely grateful if someone could get it for me!

The dataset can be found here:

https://reshare.ukdataservice.ac.uk/856279/

It’s concerned with the political beliefs of drug users in the UK.

If you manage to get it, let me know: DM me, or say so in the comments and I’ll DM you.

Thank you!

submitted by /u/philbearsubstack

Recommendations For Beginner Friendly Dataset For Learning R

Hello! I am learning R and I need a dataset to practice doing regression. I wanted to use data from IPUMS but it is not loading properly, and I don’t want to lose any more time playing with it. Can anyone suggest any social science datasets in R that are easy to work with? I’m interested in inequality but any topic is probably okay. In class we used Boston Housing so probably not that exact one, but something similarly beginner friendly would be good. Thanks in advance for any suggestions!

submitted by /u/blksquare

How Important Is Demonstrating That You Know JOINs In Your Data Analyst Portfolio For Entry Level Roles? What Is The Best Approach To Showcase This Knowledge?

Hi guys,

My typical approach when creating portfolio projects is finding a public dataset online (most of which are already cleaned and ready to go). I then come up with specific problems I would like to investigate. I write SQL queries to solve these problems. I then visualize the solutions on a Tableau dashboard to tell a story.

Every job is different, but I assume that most will require you to join multiple tables together prior to analysis. The issue I’ve come across during portfolio creation is that most datasets that are publicly available online are already put together.

I’ve come up with the idea of finding two completely unrelated datasets and trying to join them together with a common column but completely struggle with execution due to the complexity of the datasets and a common column not always being available. Ex: Amazon package delivery speeds vs weather and joining on DATES.
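For the deliveries-versus-weather idea, the join itself can stay small once both tables are reduced to one shared date column. Below is a minimal sketch using Python's built-in sqlite3 module so the SQL can be run anywhere. The table names, column names, and the handful of rows are all invented toy values for illustration, not real Amazon or weather data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE deliveries (order_id INTEGER, delivered_on TEXT, days_in_transit INTEGER);
    CREATE TABLE weather (obs_date TEXT, precipitation_mm REAL);

    -- Invented toy rows, stored as ISO dates so text equality works for the join.
    INSERT INTO deliveries VALUES
        (1, '2024-01-01', 2), (2, '2024-01-01', 4),
        (3, '2024-01-02', 5), (4, '2024-01-03', 3);
    INSERT INTO weather VALUES ('2024-01-01', 0.0), ('2024-01-02', 12.5);
    """
)

# LEFT JOIN keeps delivery dates that have no weather observation,
# which shows up as NULL (None in Python) instead of silently vanishing.
QUERY = """
SELECT d.delivered_on,
       w.precipitation_mm,
       AVG(d.days_in_transit) AS avg_days
FROM deliveries AS d
LEFT JOIN weather AS w ON w.obs_date = d.delivered_on
GROUP BY d.delivered_on, w.precipitation_mm
ORDER BY d.delivered_on
"""

rows = conn.execute(QUERY).fetchall()
for delivered_on, precipitation_mm, avg_days in rows:
    print(delivered_on, precipitation_mm, avg_days)
```

Most of the work in a real project is getting both sources to the same date format and granularity (one row per day, for example) before the join; showing that cleanup step in the portfolio is arguably as useful as the JOIN itself.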

I know what joins are and can solve easy to maybe medium SQL LeetCode join questions without much difficulty, but I completely struggle with the hard problems as well as my scenario in the previous paragraph. So, a few questions:

How important is demonstrating that you know joins in a data analyst portfolio for entry level roles? That is, showing the SQL code of joining 2+ tables and doing your analysis on that?

If it is needed, how can I demonstrate this? I struggle with joining two completely unrelated datasets together. Is there a better way to do this while still showing that I know joins, or should I just keep on doing analysis on fully completed datasets that are already available online?

Thanks so much, greatly appreciate any advice I can get in regards to this!! Located in big city in midwest, USA btw.

submitted by /u/believeinriven

Women’s Health Clinic Or Center Patient Data?

Howdy folks,

Was wondering if someone might possibly have an example data set of a woman’s health clinic or center patient data set?

I’m interviewing for an org that specializes in customer acquisition for women’s health clinics and trying to find any example datasets to build out a portfolio. I know customer acquisition is a bit different than patient care, but I’d still like to show I could transform this type of data for operations.

I looked on Kaggle and didn’t see anything pertaining to this exactly. Maybe some type of clinic data, but not any focused on women in particular.

If you know of anything that might fit, please let me know.

Thank you.

submitted by /u/WhatsTheAnswerDude

HELP!!! NEED DATASET FOR NETWORK ANALYSIS

My final paper is on binge drinking in college and I need data to perform a network analysis.

I need a dataset of the top 2,000 tweets, with related network node and edge data points, relating to #alcohol, and another one for #party (or any other # that could relate to this topic). Please, I am literally begging.

submitted by /u/Valuable_Dig9324

Dataset On Global Plants And Native Area

I’m looking for a dataset connecting global native plants with their natural locations (countries, regions, cities, etc). I’ve found a few datasets that don’t have locations, but cover tons of plants!

GlobalUsefulNativeTrees: https://zenodo.org/records/7994433
World Checklist of Useful Plant Species: https://kew.iro.bl.uk/concern/datasets/7243d727-e28d-419d-a8f7-9ebef5b9e03e?locale=en
All global flora: https://www.worldfloraonline.org/
Trees and location, but no plants: https://www.bgci.org/resources/bgci-databases/globaltreesearch/

Any other datasets you all have used? Thanks!

submitted by /u/teenwent11

HELP FOR MY STATA PROJECT (FINDING DATASETS)

Hi guys, I would like to ask about datasets for Stata. Does anyone know where I can download a .dta file or an Excel file to use for a project? Official data would be best. I was searching in particular for health data, such as drug abuse and the use of drugs in medicine. Otherwise, I’m looking for anything interesting, as long as it makes the professor evaluate the project well! Thanks in advance.

submitted by /u/Academic-Muffin-5119

Seeking Data On Historical University Protests In The US

I am interested in conducting a statistical analysis comparing current protests to historical ones at universities in the US. Specifically, I would like to examine the timeline and organization of these protests using a statistical approach.

Does anyone know of an open source dataset that can be used for this analysis? Alternatively, has anyone already conducted a similar analysis that I can reference?

Thank you for any assistance!

submitted by /u/Tolure

Looking For Purchase Orders Dataset Of PDFs Provided By Procurement Managers.

I couldn’t find a dataset online, be it fictional or real (obviously because of privacy reasons).

If there is a fictional PO dataset made up of PDFs with a corresponding table of data keyed by PO number, it would be helpful.

Otherwise, I’m looking to create my own dataset with fictional items generated by GPT and populated into a PDF purchase order template. Is there any GitHub code that does something like this?

submitted by /u/adhadse

Seeking Data Sets On Power Grids For Machine Learning Projects

Hi everyone,

I’m currently exploring machine learning applications related to power grids and am in search of relevant data sets. Specifically, I’m looking for any of the following:

Labeled Image Data: Images of power grid components such as distribution poles, power lines, substations, etc., that are labeled for machine learning models.
Failure Data: Information on failures or malfunctions within power grid elements, which could be used for predictive maintenance models.
Operational Data: Any data that captures the operational aspects of power grids, including load, demand, flow, etc. (not so much for generation).

For any dataset, the higher spatial/temporal resolution, the better, but I’m not too picky about that. I have already found some resources but I want to learn about any other datasets that might be out there, especially ones that might not be widely known. If you have or know of datasets that could fit these needs, could you please share them?

If you think that me sharing the datasets I found so far could make the post more informative, I would be happy to do that. Thanks in advance for your help!

submitted by /u/Zarashi00

Seeking Datasets For Cancer Research Project In The UK

I’m currently working on a cancer research project focusing on analyzing factors influencing cancer outcomes in the UK. As part of my project, I’m in need of datasets containing information related to cancer incidence, demographics, healthcare utilization, socioeconomic factors, environmental variables, and other relevant factors specific to the UK.

I was wondering if anyone in the community is aware of any websites or resources where I can find such datasets? Any leads or suggestions would be greatly appreciated.

submitted by /u/Blue-Croissant

Help Required In Opening Files Of A Dataset (.phys, .thermal, .pts, .ass Extensions)

We have received a dataset that consists of audio, visual, thermal, and physiological modalities. Upon exploring the dataset, we encountered some challenges in opening the following file types:

.phys with the physiological information
.thermal, .hist and .stat with the thermal information
.pts with the visual information
.ass with the auditory information

We have attempted various approaches to open these files, but unfortunately, none have proven successful thus far. We are not aware of the extensions used, and despite our persistent and thorough efforts, we have been unable to open these files. Please help us by guiding us on how to open files with these extensions.
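When an extension is unfamiliar, a common first step is to ignore it and look at the first bytes of the file instead. The sketch below (the function name and the short signature table are illustrative choices, and the table is far from complete) reports whether a file starts with a well-known signature, looks like plain text, or is some other binary format. Files that turn out to be text can simply be opened in an editor to read their layout.

```python
import codecs

# A few widely documented file signatures; many formats are not listed here.
SIGNATURES = [
    (b"PK\x03\x04", "zip archive"),
    (b"\x1f\x8b", "gzip"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x89HDF\r\n\x1a\n", "HDF5"),
]


def sniff(path, n=512):
    """Guess what kind of file this is from its first n bytes."""
    with open(path, "rb") as fh:
        head = fh.read(n)
    if not head:
        return "empty"
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    if b"\x00" in head:
        return "binary"  # NUL bytes almost never appear in plain text
    try:
        # final=False tolerates a multi-byte character cut off at byte n.
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "binary"
    return "text"
```

Running sniff over one file of each extension would at least split the problem into files that can be read directly and files that need the dataset authors' own reader or documentation.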

submitted by /u/AnupKumarGupta_