Came across this dataset paper that I think deserves more attention.
RIR-Mega is a large-scale collection of simulated Room Impulse Responses (RIRs) designed specifically for ML workflows. What makes it stand out from older RIR datasets:
- 50,000 RIRs with a clean, flat Parquet metadata schema (RT60, DRR, C50, C80, band RT60s)
- Three evaluation splits: random, unseen_room, and unseen_distance — so you can actually test generalization
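
For anyone unfamiliar with the metadata columns, here is a rough sketch of how RT60, DRR, and C50/C80 are commonly derived from a single impulse response. This is not the dataset's own code, and the paper may use different windows or fit ranges. The function names, the 2.5 ms direct-path window, the -5 to -25 dB fit range, and the synthetic decaying-noise RIR are all my assumptions for illustration.

```python
import numpy as np

def schroeder_edc_db(h):
    """Schroeder backward-integrated energy decay curve, in dB relative to total energy."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0] + 1e-30)

def rt60_from_edc(h, fs, lo_db=-5.0, hi_db=-25.0):
    """T20-style estimate: fit a line to the EDC between lo_db and hi_db, extrapolate to -60 dB."""
    edc = schroeder_edc_db(h)
    t = np.arange(len(h)) / fs
    mask = (edc <= lo_db) & (edc >= hi_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # slope in dB per second (negative)
    return -60.0 / slope

def clarity_db(h, fs, ms):
    """C50 (ms=50) or C80 (ms=80): early-to-late energy ratio in dB."""
    onset = int(np.argmax(np.abs(h)))            # assume the strongest peak is the direct sound
    split = onset + int(round(ms * 1e-3 * fs))
    early = np.sum(h[:split] ** 2)
    late = np.sum(h[split:] ** 2)
    return 10.0 * np.log10(early / late)

def drr_db(h, fs, window_ms=2.5):
    """Direct-to-reverberant ratio: energy near the direct peak vs. everything else."""
    onset = int(np.argmax(np.abs(h)))
    w = int(round(window_ms * 1e-3 * fs))
    lo, hi = max(onset - w, 0), onset + w + 1
    direct = np.sum(h[lo:hi] ** 2)
    reverb = np.sum(h[:lo] ** 2) + np.sum(h[hi:] ** 2)
    return 10.0 * np.log10(direct / reverb)

# Synthetic stand-in for a real RIR: a unit direct path plus exponentially decaying noise.
fs = 16000
rt60_true = 0.5                                   # seconds (assumed, for the toy example)
rng = np.random.default_rng(0)
t = np.arange(fs) / fs                            # 1 second of samples
h = 0.1 * rng.standard_normal(fs) * np.exp(-6.9078 * t / rt60_true)  # 60 dB decay over rt60_true
h[0] = 1.0                                        # direct sound

rt60_est = rt60_from_edc(h, fs)
c50 = clarity_db(h, fs, 50)
c80 = clarity_db(h, fs, 80)
drr = drr_db(h, fs)
print(f"RT60 ~ {rt60_est:.2f} s, C50 = {c50:.1f} dB, C80 = {c80:.1f} dB, DRR = {drr:.1f} dB")
```

Running the same functions on a real RIR from the dataset and comparing against the stored metadata would be a quick sanity check, keeping in mind that small differences in windowing conventions can shift DRR and clarity values by a dB or so.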
The HF dataset is at: https://huggingface.co/datasets/mandipgoswami/rirmega
Paper: https://arxiv.org/abs/2510.18917
Has anyone used this for dereverberation or acoustic parameter estimation? Curious how it holds up against BUT-ReverbDB or OpenRIR for downstream ASR robustness tasks.
submitted by /u/Stellar_Bluebird