Perceptual effects of noise reduction by time-frequency masking of noisy speech

Inge Brons; Rolph Houben; Wouter A. Dreschler

doi:10.1121/1.4747006

Research output: Contribution to journal › Article › Academic › peer-review

21 Citations (Scopus)

Abstract

Time-frequency masking is a method for noise reduction that is based on the time-frequency representation of a speech-in-noise signal. Depending on the estimated signal-to-noise ratio (SNR), each time-frequency unit is either attenuated or not. A special type of time-frequency mask is the ideal binary mask (IBM), which has access to the real SNR (ideal). The IBM either retains or removes each time-frequency unit (binary mask). The IBM provides large improvements in speech intelligibility and is a valuable tool for investigating how different factors influence intelligibility. This study extends the standard outcome measure (speech intelligibility) with additional perceptual measures relevant for noise reduction: listening effort, noise annoyance, speech naturalness, and overall preference. Four types of time-frequency masking were evaluated: the original IBM, a tempered version of the IBM (called ITM) which applies limited and non-binary attenuation, and non-ideal masking (also tempered) with two different types of noise-estimation algorithms. The results from ideal masking imply that there is a trade-off between intelligibility and sound quality, which depends on the attenuation strength. Additionally, the results for non-ideal masking suggest that subjective measures can show effects of noise reduction even if noise reduction does not lead to differences in intelligibility. © 2012 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4747006]
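
The difference between binary and tempered masking described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the function names, the 0 dB threshold, the Wiener-like gain rule, and the 10 dB attenuation limit. The ITM evaluated in the study may be defined differently.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def local_snr_db(speech_power, noise_power):
    """SNR in dB for each time-frequency unit, given separate speech and noise power."""
    return 10.0 * np.log10((speech_power + EPS) / (noise_power + EPS))


def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Gain of 1 where the true local SNR exceeds the threshold, 0 elsewhere.

    The 0 dB default threshold is an assumed value.
    """
    return (local_snr_db(speech_power, noise_power) > threshold_db).astype(float)


def tempered_mask(speech_power, noise_power, max_attenuation_db=10.0):
    """Non-binary gain with limited attenuation (illustrative only).

    Uses a Wiener-like gain snr / (1 + snr), floored so that no unit is
    attenuated by more than max_attenuation_db. Both choices are assumptions.
    """
    snr = speech_power / (noise_power + EPS)
    gain = snr / (1.0 + snr)                      # smooth gain between 0 and 1
    floor = 10.0 ** (-max_attenuation_db / 20.0)  # amplitude gain floor
    return np.maximum(gain, floor)


# Toy 2x2 "spectrogram": rows are frequency bands, columns are time frames.
speech = np.array([[4.0, 0.1], [1.0, 9.0]])
noise = np.array([[1.0, 1.0], [2.0, 1.0]])

ibm = ideal_binary_mask(speech, noise)
itm = tempered_mask(speech, noise)
```

In this toy example the binary mask removes the two units dominated by noise entirely, while the tempered mask only turns them down to the gain floor. That contrast is the kind of attenuation-strength difference the abstract links to the trade-off between intelligibility and sound quality.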

Original language	English
Pages (from-to)	2690-2699
Journal	Journal of the Acoustical Society of America
Volume	132
Issue number	4 1
DOIs	https://doi.org/10.1121/1.4747006
Publication status	Published - 2012

Cite this

@article{c12d29681ff84de0b7a15c4870c5895d,
title = "Perceptual effects of noise reduction by time-frequency masking of noisy speech",
abstract = "Time-frequency masking is a method for noise reduction that is based on the time-frequency representation of a speech in noise signal. Depending on the estimated signal-to-noise ratio (SNR), each time-frequency unit is either attenuated or not. A special type of a time-frequency mask is the ideal binary mask (IBM), which has access to the real SNR (ideal). The IBM either retains or removes each time-frequency unit (binary mask). The IBM provides large improvements in speech intelligibility and is a valuable tool for investigating how different factors influence intelligibility. This study extends the standard outcome measure (speech intelligibility) with additional perceptual measures relevant for noise reduction: listening effort, noise annoyance, speech naturalness, and overall preference. Four types of time-frequency masking were evaluated: the original IBM, a tempered version of the IBM (called ITM) which applies limited and non-binary attenuation, and non-ideal masking (also tempered) with two different types of noise-estimation algorithms. The results from ideal masking imply that there is a trade-off between intelligibility and sound quality, which depends on the attenuation strength. Additionally, the results for non-ideal masking suggest that subjective measures can show effects of noise reduction even if noise reduction does not lead to differences in intelligibility. (C) 2012 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4747006]",
author = "Brons, Inge and Houben, Rolph and Dreschler, Wouter A.",
year = "2012",
doi = "10.1121/1.4747006",
language = "English",
volume = "132",
pages = "2690--2699",
journal = "Journal of the Acoustical Society of America",
issn = "0001-4966",
publisher = "Acoustical Society of America",
number = "4 1",
}

TY  - JOUR
T1  - Perceptual effects of noise reduction by time-frequency masking of noisy speech
AU  - Brons, Inge
AU  - Houben, Rolph
AU  - Dreschler, Wouter A.
PY  - 2012
Y1  - 2012
AB  - Time-frequency masking is a method for noise reduction that is based on the time-frequency representation of a speech in noise signal. Depending on the estimated signal-to-noise ratio (SNR), each time-frequency unit is either attenuated or not. A special type of a time-frequency mask is the ideal binary mask (IBM), which has access to the real SNR (ideal). The IBM either retains or removes each time-frequency unit (binary mask). The IBM provides large improvements in speech intelligibility and is a valuable tool for investigating how different factors influence intelligibility. This study extends the standard outcome measure (speech intelligibility) with additional perceptual measures relevant for noise reduction: listening effort, noise annoyance, speech naturalness, and overall preference. Four types of time-frequency masking were evaluated: the original IBM, a tempered version of the IBM (called ITM) which applies limited and non-binary attenuation, and non-ideal masking (also tempered) with two different types of noise-estimation algorithms. The results from ideal masking imply that there is a trade-off between intelligibility and sound quality, which depends on the attenuation strength. Additionally, the results for non-ideal masking suggest that subjective measures can show effects of noise reduction even if noise reduction does not lead to differences in intelligibility. (C) 2012 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4747006]
U2  - https://doi.org/10.1121/1.4747006
DO  - https://doi.org/10.1121/1.4747006
M3  - Article
C2  - 23039461
SN  - 0001-4966
VL  - 132
SP  - 2690
EP  - 2699
JO  - Journal of the Acoustical Society of America
JF  - Journal of the Acoustical Society of America
IS  - 4 1
ER  -