Has Anyone Used the RIR-Mega Dataset in the Audio ML Space? [Synthetic]

Came across this dataset paper that I think deserves more attention.

RIR-Mega is a large-scale collection of simulated Room Impulse Responses (RIRs) designed specifically for ML workflows. What makes it stand out from older RIR datasets:

  • 50,000 RIRs with a clean, flat Parquet metadata schema (RT60, DRR, C50, C80, band RT60s)
  • Three evaluation splits: random, unseen_room, and unseen_distance — so you can actually test generalization
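For anyone who wants to sanity-check metadata values like these against the raw audio, here is a minimal sketch of how RT60, C50/C80, and DRR are commonly estimated from a single RIR. It is not the dataset's own pipeline: the function names are hypothetical, the RIR is a synthetic exponentially decaying noise burst, and the choices of a T30 fit and a 2.5 ms direct-path window are my assumptions. The paper may define these quantities differently.

```python
import numpy as np

def schroeder_db(rir):
    """Backward-integrated energy decay curve in dB, 0 dB at t = 0."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0] + 1e-12)

def rt60_from_t30(rir, fs):
    """Fit the -5 dB to -35 dB span of the decay curve, extrapolate to -60 dB."""
    edc = schroeder_db(rir)
    idx = np.where((edc <= -5.0) & (edc >= -35.0))[0]
    slope, _ = np.polyfit(idx / fs, edc[idx], 1)  # dB per second, negative
    return -60.0 / slope

def clarity_db(rir, fs, early_ms=50.0):
    """Early-to-late energy ratio in dB: C50 by default, C80 with early_ms=80."""
    onset = int(np.argmax(np.abs(rir)))
    split = onset + int(fs * early_ms / 1000.0)
    early = np.sum(rir[onset:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early / (late + 1e-12))

def drr_db(rir, fs, window_ms=2.5):
    """Direct-to-reverberant ratio in dB, using a short window around the peak."""
    onset = int(np.argmax(np.abs(rir)))
    half = int(fs * window_ms / 1000.0)
    lo, hi = max(onset - half, 0), onset + half + 1
    direct = np.sum(rir[lo:hi] ** 2)
    reverb = np.sum(rir[:lo] ** 2) + np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct / (reverb + 1e-12))

# Synthetic stand-in for a real RIR: a direct-path spike plus noise whose
# amplitude falls by 60 dB over true_rt60 seconds.
rng = np.random.default_rng(0)
fs = 16000
true_rt60 = 0.4
t = np.arange(fs) / fs
rir = 0.1 * rng.standard_normal(fs) * np.exp(-np.log(1000.0) * t / true_rt60)
rir[0] = 1.0

est_rt60 = rt60_from_t30(rir, fs)
c50 = clarity_db(rir, fs, early_ms=50.0)
c80 = clarity_db(rir, fs, early_ms=80.0)
drr = drr_db(rir, fs)
print(f"RT60 ~ {est_rt60:.2f} s, C50 = {c50:.1f} dB, C80 = {c80:.1f} dB, DRR = {drr:.1f} dB")
```

On a real file you would load the waveform into `rir` and compare the printed values with the corresponding metadata row. Band RT60s would need a band-pass filter in front of the same fit.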

The HF dataset is at: https://huggingface.co/datasets/mandipgoswami/rirmega
Paper: https://arxiv.org/abs/2510.18917

Has anyone used this for dereverberation or acoustic parameter estimation? Curious how it holds up against BUT-ReverbDB or OpenRIR for downstream ASR robustness tasks.
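For context, the usual way an RIR set feeds into ASR robustness work is as reverberation augmentation: convolve clean speech with an RIR and train on the result. A rough sketch of that step follows, using a sine tone and a synthetic decaying-noise RIR as stand-ins. `reverberate` is a hypothetical helper and the peak-matching normalization is just one reasonable choice, not anything prescribed by the paper.

```python
import numpy as np

def reverberate(dry, rir):
    """Convolve a dry signal with an RIR, keeping the dry length and peak level."""
    wet = np.convolve(dry, rir)[: len(dry)]  # truncate the reverb tail
    peak = np.max(np.abs(wet))
    if peak == 0.0:
        return wet
    return wet * (np.max(np.abs(dry)) / peak)

fs = 16000
t = np.arange(fs) / fs
dry = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # stand-in for a clean utterance

# Stand-in RIR: direct-path spike plus noise decaying by 60 dB over 0.3 s.
rng = np.random.default_rng(0)
n_rir = int(0.3 * fs)
decay = np.exp(-np.log(1000.0) * np.arange(n_rir) / fs / 0.3)
rir = 0.1 * rng.standard_normal(n_rir) * decay
rir[0] = 1.0

wet = reverberate(dry, rir)
```

For long signals, `scipy.signal.fftconvolve` is the usual faster replacement for `np.convolve`. With the unseen_room split, the RIRs used for augmentation at training time would come only from rooms that never appear at test time.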

submitted by /u/Stellar_Bluebird