Looking For Datasets That Resemble Real Medical Record Packets (for Chronology Extraction)

I’m working on a system that processes large medical record packets and generates a chronological timeline with evidence citations (think: turning hundreds or thousands of pages of medical records into a structured chronology).

Right now I’m trying to find datasets that resemble real world medical record packets so I can test robustness. Most of the datasets I’ve found so far are either:

• purely structured EHR tables (diagnoses, labs, etc.)
• small sets of individual clinical notes
• synthetic datasets

What I’m ideally looking for:

• Long clinical documents (discharge summaries, physician notes, operative reports)
• Multi-document patient records
• Collections of clinical PDFs or reports
• Narrative-heavy hospital documentation
• Anything resembling actual chart records rather than isolated notes

Datasets I already know about:

• MIMIC-IV / MIMIC-IV-Note (waiting for credentials, anyone have a mirror?)
• i2b2 / n2c2 clinical NLP datasets (registration to download it is closed?)
• MTSamples medical transcription dataset

submitted by /u/deputy1389
[link] [comments]

Leave a Reply

Your email address will not be published. Required fields are marked *