Disclosure: I created and host this dataset.
I compiled a dataset of 80 cybersecurity incident disclosures from SEC filings (primarily 8-K reports) and labeled them using a structured taxonomy.
The goal was to create a more usable dataset for analyzing real-world cyber incidents based on public disclosures.
The dataset includes:
- Threat type classification (ransomware, data theft, insider, supply chain, etc.)
- Indicators of business impact (operational disruption, recovery status)
- Sector categorization (e.g., financial services)
- Whether cyber insurance was mentioned
- Source filing references (SEC EDGAR)
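As a rough illustration of how the fields listed above could fit together in a single record, here is a minimal Python sketch. The class name, field names, enum members, and example values are all hypothetical; they are not the dataset's actual column names or contents.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ThreatType(Enum):
    """Hypothetical threat categories, mirroring the examples in the list above."""
    RANSOMWARE = "ransomware"
    DATA_THEFT = "data_theft"
    INSIDER = "insider"
    SUPPLY_CHAIN = "supply_chain"
    OTHER = "other"


@dataclass(frozen=True)
class IncidentRecord:
    """One labeled disclosure. Field names are assumptions, not the real schema."""
    company: str
    sector: str                        # e.g. "financial services"
    threat_type: ThreatType
    operational_disruption: bool       # business impact indicator
    recovery_complete: Optional[bool]  # None when the filing does not say
    insurance_mentioned: bool
    filing_reference: str              # pointer to the SEC EDGAR filing


# Made-up example record, not taken from the dataset.
example = IncidentRecord(
    company="Example Corp",
    sector="financial services",
    threat_type=ThreatType.RANSOMWARE,
    operational_disruption=True,
    recovery_complete=None,
    insurance_mentioned=False,
    filing_reference="EDGAR-REFERENCE-PLACEHOLDER",
)
```

Using `Optional[bool]` for recovery status leaves room for filings that never state whether recovery finished, which matters given how vague many disclosures are.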
Some high-level observations from the dataset:
- ~72% of cases indicate incomplete recovery or significant disruption
- 50% involve data theft or exposure
- Financial services is the most represented sector
- ~18% mention cyber insurance
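Shares like the ones above are simple proportions over the labeled rows. The sketch below shows one way to compute them in Python. The four rows and their keys (`sector`, `data_exposed`, `insurance_mentioned`) are invented for illustration and are not rows from the dataset.

```python
from collections import Counter

# Invented toy rows; keys and values are assumptions, not real dataset content.
rows = [
    {"sector": "financial services", "data_exposed": True,  "insurance_mentioned": True},
    {"sector": "financial services", "data_exposed": True,  "insurance_mentioned": False},
    {"sector": "healthcare",         "data_exposed": False, "insurance_mentioned": False},
    {"sector": "manufacturing",      "data_exposed": False, "insurance_mentioned": False},
]


def share(records, predicate):
    """Fraction of records for which predicate(record) is true."""
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r)) / len(records)


data_share = share(rows, lambda r: r["data_exposed"])
insurance_share = share(rows, lambda r: r["insurance_mentioned"])

# Most represented sector and its count.
top_sector, top_count = Counter(r["sector"] for r in rows).most_common(1)[0]

print(f"data theft or exposure: {data_share:.0%}")
print(f"insurance mentioned:    {insurance_share:.0%}")
print(f"top sector:             {top_sector} ({top_count} cases)")
```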
Methodology:
- Source: SEC EDGAR (8-K incident disclosures)
- Manual review of each case
- Consistent tagging using a predefined taxonomy
- AI was used to help keep classifications consistent; labeling was not fully automated
Limitations:
- Disclosure quality varies significantly
- Many filings are intentionally vague
- Sample size is still relatively small (n=80)
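To give a sense of what n=80 means for the percentages above, here is a small Python sketch of a Wilson score interval for a proportion. The post does not report intervals; this is an illustration only, and it assumes the 50% figure corresponds to exactly 40 of 80 cases and that the cases can be treated as a simple random sample, which a set of public disclosures may not be.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumption: 50% of 80 cases means 40 cases.
low, high = wilson_interval(40, 80)
print(f"50% of n=80 -> roughly {low:.0%} to {high:.0%}")
```

Under those assumptions the interval runs from roughly 39% to 61%, so the headline shares are best read as rough indications rather than precise rates.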
submitted by /u/LordKittyPanther