FDA Novel Drug Approvals (2021–2024) + US Nonprofit Hospital Charity-care Reporting — Parquet/JSON/CSV, Public Domain

Disclosure: I’m the author of the open-source project (trove) that parses and repackages these datasets. Original government sources are linked below; my bundles are at the end. The code is MIT-licensed, the data is public domain, and nothing is paid.

Two public-domain US healthcare datasets that get cited constantly but are painful to use in raw form:

  1. FDA novel drug approvals, 2021–2024 — 218 drugs (192 CDER NMEs + 26 CBER cell & gene therapies). Each row: application number, sponsor, approval date, indication, regulatory center, and a deep link to the approval-package docs.

Original sources:

– CDER Novel Drug Approvals: https://www.fda.gov/drugs/development-approval-process-drugs/novel-drug-approvals-fda

– CBER Approved Cellular and Gene Therapy Products: https://www.fda.gov/vaccines-blood-biologics/cellular-gene-therapy-products/approved-cellular-and-gene-therapy-products

– Drugs@FDA: https://www.fda.gov/drugsatfda
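
As a rough illustration of working with rows in the shape described above, here is a minimal Python sketch that counts approvals per regulatory center and year. The column names (`application_number`, `center`, `approval_date`, and so on) and every sample value are assumptions made up for the example, not the bundle's actual schema or real FDA rows.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape described in the post. Column names and
# values are placeholders, not real rows from the dataset.
SAMPLE_CSV = """application_number,sponsor,approval_date,indication,center,approval_package_url
NDA000001,Example Pharma,2021-03-15,example indication A,CDER,https://example.org/a
BLA000002,Example Bio,2022-07-01,example indication B,CBER,https://example.org/b
NDA000003,Example Labs,2022-11-20,example indication C,CDER,https://example.org/c
"""


def count_by_center_and_year(csv_text):
    """Count approvals per (regulatory center, approval year)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["approval_date"][:4])  # assumes ISO 8601 dates
        counts[(row["center"], year)] += 1
    return counts


counts = count_by_center_and_year(SAMPLE_CSV)
for (center, year), n in sorted(counts.items()):
    print(center, year, n)
```

To run this against a real file, swap the `io.StringIO` wrapper for an opened CSV file and adjust the column names to whatever the bundle actually uses.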

  2. Nonprofit hospital charity-care reporting, TY2022: 1,295 nonprofit hospital systems, with CMS HCRIS Worksheet S-10 and IRS Form 990 Schedule H side by side. Both figures are meant to capture the cost of care for patients who couldn’t pay, but the reporting rules diverge, so the two numbers often disagree. Each row also carries a CDC Social Vulnerability Index county percentile and a deep link to the 990 on ProPublica.

Original sources:

– CMS HCRIS (Hospital 2552-10 cost reports): https://www.cms.gov/data-research/statistics-trends-and-reports/cost-reports/hospital-2552-2010-form

– IRS Form 990 series XML downloads: https://www.irs.gov/charities-non-profits/form-990-series-downloads

– CDC Social Vulnerability Index 2022: https://www.atsdr.cdc.gov/place-health/php/svi/index.html

– ProPublica Nonprofit Explorer (where the 990 deep links point): https://projects.propublica.org/nonprofits/

What I added on top: parsing the raw formats (headerless 100k-row HCRIS CSVs, IRS bulk-XML ZIPs, hundreds of FDA PDF directories) into tidy Parquet/JSON/CSV, plus a CCN↔EIN crosswalk that joins the two hospital filings.
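
To make the crosswalk idea concrete, here is a minimal sketch of how a CCN↔EIN mapping could line up the two filings and expose the gap between them. It is not the project's actual code: the identifiers, field names, and dollar amounts are hypothetical, and it assumes several CCNs (individual hospitals) can roll up to one EIN (the filing organization), which may not match how trove handles it.

```python
from collections import defaultdict

# Hypothetical inputs; every identifier and amount is a placeholder.
crosswalk = [  # (CCN, EIN) pairs
    ("000001", "00-0000001"),
    ("000002", "00-0000001"),
    ("000003", "00-0000002"),
]
s10_cost_by_ccn = {  # charity-care cost from HCRIS Worksheet S-10
    "000001": 4_000_000.0,
    "000002": 2_000_000.0,
    "000003": 1_500_000.0,
}
schedule_h_cost_by_ein = {  # charity-care cost from Form 990 Schedule H
    "00-0000001": 5_000_000.0,
}


def compare_filings(crosswalk, s10_cost_by_ccn, schedule_h_cost_by_ein):
    """Sum S-10 costs per EIN, then pair each total with its Schedule H figure."""
    s10_by_ein = defaultdict(float)
    for ccn, ein in crosswalk:
        if ccn in s10_cost_by_ccn:
            s10_by_ein[ein] += s10_cost_by_ccn[ccn]

    rows = []
    for ein, s10_total in sorted(s10_by_ein.items()):
        schedule_h = schedule_h_cost_by_ein.get(ein)
        if schedule_h is None:
            continue  # no matching 990 in this sample, so nothing to compare
        rows.append({
            "ein": ein,
            "s10_cost": s10_total,
            "schedule_h_cost": schedule_h,
            "gap": s10_total - schedule_h,
        })
    return rows


rows = compare_filings(crosswalk, s10_cost_by_ccn, schedule_h_cost_by_ein)
for row in rows:
    print(row)
```

Dropping unmatched EINs keeps the comparison honest; a real pipeline would probably want to report those separately rather than silently skip them.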

My packaged bundles + parsers (self-promo — I built this): https://github.com/cbetz/trove — browsable lookup at https://troveproject.com

Happy to answer questions about the parsing or add fields people want!

submitted by /u/scrapdog