Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Question About Complex Sampling Designs

Hello all. I am working with a large CDC survey, combining multiple years of data, and I am required to use complex sampling procedures to analyze it. Since this is a national survey and I’m analyzing multiple years combined, the sample size is quite large when raw and even larger when weighted (obviously!). I’m worried about being overpowered when I apply weights; however, weighting is required by the CDC for more accurate interpretation of the findings, and complex sampling procedures in SPSS require the weights to be entered into the plan file. My questions after all of this are 1) whether anyone has general advice on what I described and 2) whether weighting is always required when analyzing data that uses a complex sampling design. Thank you!!!
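One way to look at the "overpowered" worry is Kish's approximate effective sample size, which depends on how much the weights vary and not on the weighted population total. The following is a minimal Python sketch, not an SPSS procedure: the values and weights are made up for illustration (they do not come from any CDC survey), and the sketch ignores strata and clusters, which a full complex-samples analysis would also take into account.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical example: 4 respondents, each standing in for a different
# number of people in the population (illustrative values only).
values = [1.0, 0.0, 1.0, 1.0]
weights = [1000.0, 3000.0, 500.0, 500.0]

print(weighted_mean(values, weights))   # weighted proportion
print(kish_effective_n(weights))        # never larger than len(weights)
```

With equal weights the effective sample size equals the raw respondent count; the more the weights vary, the smaller it gets. Under this approximation the weighted total (5,000 here) does not act as the sample size.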

submitted by /u/PharmaNerd1921

Looking For Data On Hospital Equipment Usage

I did a quick search on this sub and can see that hospital data is frequently requested and can be tricky to access. But as I understand it, this is mostly the case with patient information and the like. I’m looking for things like operating room, radiology, and X-ray usage rates.

I’ve looked around without much success so any help would be great. Thank you.

submitted by /u/TheShrlmp

Looking For A Dataset Breaking Down The Details For The Happiest People In The World

The World Happiness Report 2017’s Figure 2.1 charts population-weighted distributions of happiness for various world regions, where ‘happiness’ is self-reported happiness on a 0-10 scale.

In every world region there are respondents who report 10/10, and I’ve always been interested in these people. Who are they? What can I learn about them regarding the other factors examined in the World Happiness Reports, from GDP per capita to healthy life expectancy to social support to generosity to perceptions of corruption to freedom to make life choices? What can I calculate?

Unfortunately, the accompanying dataset linked for Figure 2.1 doesn’t break down the data more granularly; it only reports the summarized chart values. Do any of you know of a more granular breakdown? The precise year (2017) and chart (Figure 2.1) don’t really matter to me; I really just want to see the data for the other factors corresponding to these self-reported happiness = 10/10 people. Thanks 🙂

submitted by /u/MoNastri

Looking For A Free Use Disease Data Set With Medical And Lifestyle Features

Hi all,

I am looking for a free-use dataset with an outcome variable such as has heart disease, diabetes, stroke, etc. I would like the dataset to have as many features as possible, including medical results such as blood pressure (things only a doctor could measure) as well as lifestyle features like exercise, smoking, etc. (things anyone could measure). Unfortunately, most datasets only seem to have medical or lifestyle features and not both. I would hope to have around 10+ medical and 20+ lifestyle features. Does anyone know of any such datasets?

Many thanks,

submitted by /u/Josh_Bonham

Seeking Guidance On Extracting And Analyzing Subreddit/Post Comments Using ChatGPT-4?

Hello! While I have basic programming knowledge and a fair understanding of how it works, I wouldn’t call myself an expert. However, I am quite tech-savvy.

For research, I’m interested in downloading all the comments from a specific Subreddit or Post and then analyzing them using ChatGPT-4. I realize that there are likely some challenges in both collecting and storing the comments, as well as limitations in ChatGPT-4’s ability to analyze large datasets.
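The size limitation mentioned above is commonly handled by splitting the comments into batches before sending each batch to the model. The following is a minimal Python sketch of that batching step only; it assumes the comments have already been collected as a list of strings, and the function name `chunk_comments` and the character budget `max_chars` are hypothetical stand-ins (real limits are measured in tokens, not characters).

```python
def chunk_comments(comments, max_chars=8000):
    """Group comments into batches whose newline-joined text is at most max_chars.

    A single comment longer than max_chars is placed in a batch of its own.
    """
    batches, current, current_len = [], [], 0
    for text in comments:
        # +1 accounts for the newline that joins this comment to the batch.
        added = len(text) + (1 if current else 0)
        if current and current_len + added > max_chars:
            batches.append(current)
            current, current_len = [], 0
            added = len(text)
        current.append(text)
        current_len += added
    if current:
        batches.append(current)
    return batches


# Illustrative usage with made-up comments and a tiny budget.
sample = ["aaaa", "bbbb", "cc"]
for batch in chunk_comments(sample, max_chars=9):
    print("\n".join(batch))
    print("---")
```

Each batch can then be analyzed separately and the per-batch results combined afterwards.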

If someone could guide me through the process of achieving this, I would be extremely grateful. I am even willing to offer payment via PayPal for the assistance. Thank you!

submitted by /u/JackJackCreates

I Have An Issue With Importing A Dataset From Kaggle. I Am A Novice And Want Tips To Learn ML Through AWS Sagemaker.

I am a novice at ML and want tips on where I should upload an image dataset. There are some medical image datasets named ODIR-5K on Kaggle, and I can’t get the Kaggle API to work with AWS SageMaker notebooks. I tried Google Colaboratory and it works fine there, but for the sake of my own wallet, I prefer to use SageMaker on the free tier. Is there any way to import a dataset from Kaggle without issues in a Jupyter Notebook / AWS SageMaker notebook? Or is it best to change where I store this dataset?
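A common cause of Kaggle API trouble in a fresh notebook environment is a missing credentials file. The following is a minimal Python sketch of setting one up, assuming the `kaggle` package is installed and that you have an API token from your Kaggle account page. The helper name `write_kaggle_credentials` is hypothetical, and the username, key, and dataset slug shown in the comments are placeholders, not real values.

```python
import json
import os
from pathlib import Path


def write_kaggle_credentials(username, key, config_dir):
    """Write kaggle.json into config_dir with owner-only permissions."""
    config_path = Path(config_dir)
    config_path.mkdir(parents=True, exist_ok=True)
    cred_file = config_path / "kaggle.json"
    cred_file.write_text(json.dumps({"username": username, "key": key}))
    # The Kaggle client warns if the key file is readable by other users.
    os.chmod(cred_file, 0o600)
    # Point the Kaggle client at this directory for the current process.
    os.environ["KAGGLE_CONFIG_DIR"] = str(config_path)
    return cred_file


# In a notebook, after `pip install kaggle` (placeholder values):
#   write_kaggle_credentials("my-user", "my-api-key", Path.home() / ".kaggle")
#   !kaggle datasets download -d <owner>/<dataset-slug> -p data --unzip
```

Whether this resolves the issue depends on what the actual error is; network restrictions on the notebook instance would need a different fix.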

submitted by /u/MemeH4rd

What Is The Difference Between Apache Airflow And Apache NiFi

Are you confused between Apache Airflow and Apache NiFi? 🤔 Both are popular open-source data integration tools, but they serve different purposes. 🤷‍♂️
✅ Apache Airflow is a platform for programmatically defining, scheduling, and monitoring workflows. It’s great for data engineering tasks like ETL, data warehousing, and data processing. 📊
✅ Apache NiFi is a data integration tool for real-time data processing and event-driven architecture. It’s designed for stream processing, data routing, and data transformation. 🌊
If you want to learn more about the differences between Apache Airflow and Apache NiFi, check out this article. 📄
In this article, you’ll get a detailed comparison of the two tools, including their features, use cases, and architecture. 🏗️
https://devblogit.com/what-is-the-difference-between-a-data-lake-and-a-delta-lake/
#ApacheAirflow #ApacheNiFi #DataIntegration #DataEngineering #ETL #DataWarehousing #DataProcessing #StreamProcessing #EventDrivenArchitecture #DataScience #DataEngineer #ITPro

submitted by /u/Bubbly_Bed_4478

IMDB Dataset – How Do I Get Film Posters?

I’m developing a film recommendation system using the IMDB datasets, with around 350,000 films after pre-processing. Does IMDB offer a way to access the relevant film poster for each item in its dataset, or does anyone know a different source or method to import these?

Any help would be appreciated

submitted by /u/wobowizard

Need Help With Physionet Databases…

Hey there!
I am a freshman currently working on an independent project that requires data from MIMIC-III; however, I do not have PhysioNet credentials, and I literally have no one who can refer me. Is there any other way to get access to the database? If you could refer me, I can provide you with a brief description of what I am building.

submitted by /u/Global_Landscape1119

How To Get A GDP Breakdown For Sub-industries?

Hi guys,

For a project, I need GDP data for countries broken down by sub-industry, ideally using the Global Industry Classification Standard (or another detailed standard that shows sub-industries).

I wasn’t able to find data that precise (mostly GDP by sector, or for a few big sectors without going into industries and sub-industries). So maybe the data I need is on a specialized website that I don’t know about, or is hard to find with a simple Google search.

Thanks for any response / upvote / help.

submitted by /u/Haunting_Taste6349

Is There A Longitudinal Dataset On US Newspaper Ownership Such That I Can Track Changes In The Ownership Of Any Given US Newspaper/daily Over A Period Of Time?

I want to look at how change in ownership affects the type of information conveyed by a newspaper, especially in cases where the acquirer may have a vested commercial motive. For example, there has been a significant uptick in the number of US newspapers acquired by private equity players. I’d like to see if such acquisitions affect the choice and delivery of content that may have direct commercial implications for the private equity owner.

submitted by /u/Charming-Incident600

Anyone Have/know Where To Find A Dataset For The Following:

Hi, so for my AP statistics project, I have to analyze two quantitative variables (separately). I am trying to answer the following question: How does the annual enrollment rate in STEM courses at educational institutions correlate with the annual increase in women pursuing careers in STEM fields over the past years?

Additionally, here are more specifications:

Response Variable: The annual increase in the number of women in STEM.

Explanatory Variable: The annual enrollment rate in STEM courses at different American institutions.

Parameter: The population correlation coefficient between the annual enrollment rate in STEM courses and the number of women pursuing careers in STEM fields over the past years.

Null Hypothesis (H0):

“There is no significant correlation between the annual enrollment rate in STEM courses and the annual increase in the number of women pursuing careers in STEM fields over the past years (ρ = 0).”

Alternative Hypothesis (Ha):

“There is a significant correlation between the annual enrollment rate in STEM courses and the annual increase in the number of women pursuing careers in STEM fields over the past years (ρ ≠ 0).”
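The test of ρ = 0 against ρ ≠ 0 stated above is usually carried out with the sample correlation r and the statistic t = r·sqrt(n − 2)/sqrt(1 − r²), which has n − 2 degrees of freedom. The following is a minimal Python sketch of that calculation; the two lists are made-up illustrative numbers, not real enrollment or workforce data, and the variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Made-up yearly values, for illustration only.
enrollment_rate = [10.0, 12.0, 13.0, 15.0, 18.0]
women_increase = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(enrollment_rate, women_increase)
t = t_statistic(r, len(enrollment_rate))
print(r, t)
```

The resulting t would then be compared with a t-table at n − 2 degrees of freedom to decide whether to reject H0.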

Please comment with any links to places where I can find raw quantitative datasets in CSV format.

submitted by /u/Aloeiq