Full disclosure [self-promotion]: I’m the solo builder. Happy to answer questions about the data, methodology, or entity resolution approach.
I built FastDOL, a platform that links federal workplace enforcement records across agencies into a single employer profile. The government publishes this data, but each agency has its own database, its own identifiers, and its own terrible search UI.
The cross-agency dataset links enforcement records from OSHA, WHD, MSHA, EPA, EEOC, OFCCP, OFLC, and others at the employer level with parent-company rollup. The interesting finding: employers cited by 3+ agencies have a 3.4x higher worker fatality rate than employers cited by 1-2 agencies.
Four open datasets available so far, all CC BY 4.0:
- Cross-Agency Federal Violations by Employer (~2.3M rows)
- OSHA Construction Enforcement by Employer (377K rows)
- OSHA Citations Q1 2026 (28,827 rows, citation-level)
- WHD Wage Theft Enforcement Actions by Employer
All hosted on Hugging Face, Kaggle, and Zenodo with DOIs. Full schema, methodology, and BibTeX on the canonical pages: https://www.fastdol.com/datasets
submitted by /u/chill-botulism
[link] [comments]