Perceptual effects of noise reduction by time-frequency masking of noisy speech

Inge Brons, Rolph Houben, Wouter A. Dreschler

Research output: Contribution to journal › Article › Academic › peer-review

21 Citations (Scopus)

Abstract

Time-frequency masking is a noise-reduction method based on the time-frequency representation of a speech-in-noise signal. Depending on the estimated signal-to-noise ratio (SNR), each time-frequency unit is either attenuated or left unchanged. A special type of time-frequency mask is the ideal binary mask (IBM). It is "ideal" because it has access to the true SNR, and "binary" because it either retains or removes each time-frequency unit. The IBM provides large improvements in speech intelligibility and is a valuable tool for investigating how different factors influence intelligibility. This study extends the standard outcome measure (speech intelligibility) with additional perceptual measures relevant to noise reduction: listening effort, noise annoyance, speech naturalness, and overall preference. Four types of time-frequency masking were evaluated: the original IBM; a tempered version of the IBM, referred to as the ITM, which applies limited, non-binary attenuation; and non-ideal masking (also tempered) with two different types of noise-estimation algorithms. The results for ideal masking imply a trade-off between intelligibility and sound quality that depends on the attenuation strength. Additionally, the results for non-ideal masking suggest that subjective measures can reveal effects of noise reduction even when noise reduction does not lead to differences in intelligibility. (C) 2012 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4747006]
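The abstract describes the masks only in words, so the following Python sketch may help make the contrast between a binary and a tempered ideal mask concrete. It is not the authors' implementation. The local criterion `lc_db`, the 10 dB attenuation cap `max_atten_db`, the linear-in-dB gain rule in `tempered_mask`, and all function names are hypothetical choices for illustration; the abstract states only that the ITM applies limited, non-binary attenuation. The sketch also assumes the time-frequency decomposition (e.g. an STFT) has already been computed, so it works on small grids of power and magnitude values.

```python
import math

EPS = 1e-12  # guards against log(0) and division by zero in silent units


def local_snr_db(speech_pow, noise_pow):
    """Per-unit SNR in dB from separately known speech and noise power.

    Having the clean speech and the noise separately is what makes a mask
    'ideal'; a non-ideal mask would pass an estimated noise_pow instead.
    """
    return [
        [10.0 * math.log10((s + EPS) / (n + EPS)) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_pow, noise_pow)
    ]


def ideal_binary_mask(snr_db, lc_db=0.0):
    """IBM: gain 1 where the local SNR exceeds the local criterion lc_db, else 0."""
    return [[1.0 if v > lc_db else 0.0 for v in row] for row in snr_db]


def tempered_mask(snr_db, lc_db=0.0, max_atten_db=10.0):
    """Hypothetical tempered mask: attenuate by how far a unit falls below
    lc_db (in dB), but never by more than max_atten_db, so every gain lies
    in [10**(-max_atten_db/20), 1] instead of being exactly 0 or 1."""
    mask = []
    for row in snr_db:
        gains = []
        for v in row:
            atten_db = min(max(lc_db - v, 0.0), max_atten_db)
            gains.append(10.0 ** (-atten_db / 20.0))  # dB -> amplitude gain
        mask.append(gains)
    return mask


def apply_mask(mixture_mag, mask):
    """Scale each time-frequency unit of the mixture magnitude by its gain."""
    return [[m * g for m, g in zip(m_row, g_row)]
            for m_row, g_row in zip(mixture_mag, mask)]


if __name__ == "__main__":
    # Toy 2x2 grid of power values (rows = frequency bands, columns = frames).
    speech = [[1.0, 0.01], [0.25, 0.0]]
    noise = [[0.1, 0.1], [0.5, 1.0]]
    snr = local_snr_db(speech, noise)
    print(ideal_binary_mask(snr))  # [[1.0, 0.0], [0.0, 0.0]]
    print(tempered_mask(snr))
```

In this sketch the attenuation strength the abstract refers to is reduced to the single hypothetical parameter `max_atten_db`: as it grows large, the tempered mask approaches the removal behaviour of a binary mask, and as it approaches 0 dB the mask leaves the mixture essentially untouched.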
Original language: English
Pages (from-to): 2690-2699
Journal: Journal of the Acoustical Society of America
Volume: 132
Issue number: 4 1
DOIs: 10.1121/1.4747006
Publication status: Published - 2012
