Hey Everyone!
I’m having trouble decoding the data provided by the CDC. I downloaded the Mortality Multiple Cause File for 2021, and the .txt file is not only over 2GB but also incomprehensible. I followed the accompanying .pdf file and was even more confused by its “List of File Data Elements and Tape Locations”. How am I supposed to use it to make sense of codes upon codes upon codes? The .txt file has no visible structure, and when I try to follow a top-down approach, the codes don’t seem to match.
Is there a common approach to this, or am I missing something?
Additional Info:
I am using R for statistical analysis, which is why I wanted the raw data. I attempted to convert the .txt file to .csv using Python, which gave the data a little structure, but I still don’t know what the values mean.
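For context, the “tape locations” in the PDF read like 1-based, inclusive character positions within each line, which would make the file fixed-width rather than delimited. Below is a minimal sketch of slicing by such positions while streaming line by line, so the 2GB file never has to sit in memory. The field names and positions in `LAYOUT` are illustrative assumptions that need to be checked against the 2021 PDF, `slice_record`, `fixed_width_to_csv` and `make_demo_line` are made-up helper names, and the demo line is synthetic (padded with blanks), not a real CDC record.

```python
import csv
import io

# ASSUMPTION: illustrative (start, end) tape locations, 1-based and inclusive.
# Verify every entry against the documentation PDF before relying on it.
LAYOUT = {
    "sex": (69, 69),
    "detail_age": (70, 73),
    "data_year": (102, 105),
    "underlying_cause": (146, 149),
}


def slice_record(line, layout=LAYOUT):
    """Cut one fixed-width line into named fields."""
    # Python slices are 0-based and end-exclusive, so a 1-based inclusive
    # range (start, end) maps to line[start - 1:end].
    return {name: line[start - 1:end].strip()
            for name, (start, end) in layout.items()}


def fixed_width_to_csv(src, dst, layout=LAYOUT):
    """Stream lines from src into dst as CSV; return the number of rows written."""
    writer = csv.DictWriter(dst, fieldnames=list(layout))
    writer.writeheader()
    count = 0
    for line in src:  # one line at a time, never the whole file
        writer.writerow(slice_record(line.rstrip("\r\n"), layout))
        count += 1
    return count


def make_demo_line():
    """Build a synthetic 160-character line with values at the assumed positions."""
    chars = [" "] * 160
    values = ["F", "1080", "2021", "C851"]
    for (start, end), value in zip(LAYOUT.values(), values):
        chars[start - 1:end] = value
    return "".join(chars)


if __name__ == "__main__":
    out = io.StringIO()
    rows = fixed_width_to_csv(io.StringIO(make_demo_line() + "\n"), out)
    print(rows, "row(s) written")
    print(out.getvalue())
```

With the real file, `src` would be the opened .txt and `dst` a file opened with `newline=""`. On the R side, reading every column as character should keep leading zeros in the codes intact.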
This is how the rows look now: 11 7101 F1080 422210 4D1 2021U7CN C851129 039 13 0511I509 21I518 31I513 41C851 61M481 05 C851 I509 I513 I518 M481 100 01 184005949020
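Reading a few tokens from that row by hand suggests what some of the codes may be. The sketch below is rough, and every interpretation in it is an assumption to confirm in the PDF: that `1080` is a detail-age field whose first digit is the unit and whose last three digits are the count, that `C851`, `I509` and the like are ICD-10 codes stored without the decimal point, and that tokens such as `11I509` carry a certificate line number, a position within that line, and then the ICD-10 code. The function names are made up for illustration.

```python
# ASSUMPTION: unit codes for the first digit of the detail-age field.
# Check these against the documentation PDF.
AGE_UNITS = {
    "1": "years",
    "2": "months",
    "4": "days",
    "5": "hours",
    "6": "minutes",
    "9": "not stated",
}


def format_icd10(code):
    """Insert the decimal point after the third character, e.g. C851 -> C85.1."""
    code = code.strip()
    return code if len(code) <= 3 else f"{code[:3]}.{code[3:]}"


def decode_detail_age(field):
    """Split a 4-character age field into (count, unit)."""
    unit = AGE_UNITS.get(field[0], "unknown unit")
    return int(field[1:]), unit


def decode_entity_axis(token):
    """Split a token like 11I509 into line, position and ICD-10 code."""
    return {
        "line": int(token[0]),       # assumed: line on the death certificate
        "position": int(token[1]),   # assumed: order of the condition on that line
        "icd10": format_icd10(token[2:]),
    }


if __name__ == "__main__":
    # Tokens copied from the sample row above.
    for token in ["11I509", "21I518", "31I513", "41C851", "61M481"]:
        print(decode_entity_axis(token))
    print(decode_detail_age("1080"))
    print(format_icd10("C851"))
```

If those readings hold, the formatted codes can be looked up in any ICD-10 reference, and the numeric recode fields would still need the tables from the PDF.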
I would appreciate any and all help. Thank you all very much in advance.
submitted by /u/Meece156