TY - JOUR
T1 - Automatic evaluation of spontaneous oral cancer speech using ratings from naive listeners
AU - Halpern, Bence Mark
AU - Feng, Siyuan
AU - van Son, Rob
AU - van den Brekel, Michiel
AU - Scharenborg, Odette
N1 - Funding Information: We would like to thank Noa Hannah for helping out with the SLP ratings. This project has received funding from the European Union's Horizon 2020 research and innovation programme under Marie Sklodowska-Curie grant agreement No 766287. The Department of Head and Neck Oncology and Surgery of the Netherlands Cancer Institute receives a research grant from Atos Medical (Hörby, Sweden), which contributes to the existing infrastructure for quality of life research. Publisher Copyright: © 2023 The Author(s)
PY - 2023/4/1
Y1 - 2023/4/1
N2 - In this paper, we build and compare multiple speech systems for the automatic evaluation of the severity of a speech impairment due to oral cancer, based on spontaneous speech. To be able to build and evaluate such systems, we collected a new spontaneous oral cancer speech corpus from YouTube consisting of 124 utterances rated by 100 non-expert listeners and one trained speech-language pathologist, which we made publicly available. We evaluated the systems in two scenarios: a scenario where transcriptions were available (reference-based) and a scenario where transcriptions might not be available (reference-free). The results of extensive experiments showed that (1) when transcriptions were available, the highest correlation with the human severity ratings was obtained using an automatic speech recognition (ASR) system retrained with oral cancer speech. (2) When transcriptions were not available, the best results were achieved by a LASSO model using modulation spectrum features. (3) We found that naive listeners' ratings are highly similar to the speech pathologist's ratings for speech severity evaluation. (4) The use of binary labels led to lower correlations of the automatic methods with the human ratings than using severity scores.
AB - In this paper, we build and compare multiple speech systems for the automatic evaluation of the severity of a speech impairment due to oral cancer, based on spontaneous speech. To be able to build and evaluate such systems, we collected a new spontaneous oral cancer speech corpus from YouTube consisting of 124 utterances rated by 100 non-expert listeners and one trained speech-language pathologist, which we made publicly available. We evaluated the systems in two scenarios: a scenario where transcriptions were available (reference-based) and a scenario where transcriptions might not be available (reference-free). The results of extensive experiments showed that (1) when transcriptions were available, the highest correlation with the human severity ratings was obtained using an automatic speech recognition (ASR) system retrained with oral cancer speech. (2) When transcriptions were not available, the best results were achieved by a LASSO model using modulation spectrum features. (3) We found that naive listeners' ratings are highly similar to the speech pathologist's ratings for speech severity evaluation. (4) The use of binary labels led to lower correlations of the automatic methods with the human ratings than using severity scores.
KW - Automatic speech evaluation
KW - Oral cancer
KW - Pathological speech
UR - http://www.scopus.com/inward/record.url?scp=85151357382&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2023.03.008
DO - 10.1016/j.specom.2023.03.008
M3 - Article
SN - 0167-6393
VL - 149
SP - 84
EP - 97
JO - Speech Communication
JF - Speech Communication
ER -