Speech Datasets That Capture Numeral Errors (e.g., 57 > 75)?

Hi everyone.

Not sure if this exists, since things are usually cleaned up quite a bit before going public, but are there any data sources that could be used to study common numeral errors?

Mainly interested in instances of leading-digit bias (e.g., reading 9.88 as 9 instead of 10), though that’s even weirder and harder to track down in speech. There's no way of filtering out ‘misspeaks’ in major corpora like ANC or COCA, AFAIK. Any recommendations or leads?

submitted by /u/dennu9909