Shape-based modeling of the fundamental frequency contour for emotion detection in speech
Author
dc.contributor.author
Arias Aparicio, Juan Pablo
Author
dc.contributor.author
Busso, Carlos
es_CL
Author
dc.contributor.author
Becerra Yoma, Néstor
es_CL
Accession date
dc.date.accessioned
2014-12-24T13:11:15Z
Available date
dc.date.available
2014-12-24T13:11:15Z
Publication date
dc.date.issued
2014
Item citation
dc.identifier.citation
Computer Speech and Language 28 (2014) 278–294
en_US
Identifier
dc.identifier.other
dx.doi.org/10.1016/j.csl.2013.07.002
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/126799
General note
dc.description
Article published in an ISI-indexed journal
en_US
Abstract
dc.description.abstract
This paper proposes the use of neutral reference models to detect local emotional prominence in the fundamental frequency. A
novel approach based on functional data analysis (FDA) is presented, which aims to capture the intrinsic variability of F0 contours.
The neutral models are represented by a basis of functions, and the test F0 contour is characterized by its projections onto
that basis. For a given F0 contour, we estimate the functional principal component analysis (PCA) projections, which are used as
features for emotion detection. The approach is evaluated with lexicon-dependent (i.e., one functional PCA basis per sentence) and
lexicon-independent (i.e., a single functional PCA basis across sentences) models. The experimental results show that the proposed
system can lead to accuracies as high as 75.8% in binary emotion classification, which is 6.2% higher than the accuracy achieved by
a benchmark system trained with global F0 statistics. The approach can be implemented at sub-sentence level (e.g., 0.5 s segments),
facilitating the detection of localized emotional information conveyed within the sentence. The approach is validated with the
SEMAINE database, which is a spontaneous corpus. The results indicate that the proposed scheme can be effectively employed in
real applications to detect emotional speech.
en_US
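The abstract describes estimating a functional PCA basis from neutral F0 contours and using the projections of a test contour onto that basis as features for emotion detection. The following Python sketch illustrates that idea on discretized contours; the helper names, the sampled-contour representation, and the toy data are assumptions for illustration only, not the authors' implementation.

```python
# A minimal sketch (assumed, not the authors' code) of projecting an F0 contour
# onto a PCA basis built from neutral contours, as outlined in the abstract.
import numpy as np

def build_neutral_basis(neutral_contours, n_components=4):
    """Estimate a PCA basis from neutral F0 contours.

    neutral_contours: array of shape (n_utterances, n_samples); each row is an
    F0 contour resampled to a common length (a stand-in for the FDA basis
    expansion used in the paper).
    """
    mean_contour = neutral_contours.mean(axis=0)
    centered = neutral_contours - mean_contour
    # SVD of the centered data yields the principal component functions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_contour, vt[:n_components]   # (n_samples,), (n_components, n_samples)

def fpca_projections(contour, mean_contour, basis):
    """Project a test F0 contour onto the neutral basis; the projection
    coefficients would serve as features for an emotion classifier."""
    return basis @ (contour - mean_contour)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy neutral contours: smooth declination plus small random variation.
    t = np.linspace(0, 1, 100)
    neutral = 200 - 30 * t + 5 * rng.standard_normal((50, t.size))
    mean_c, basis = build_neutral_basis(neutral, n_components=4)

    # A contour with an exaggerated F0 excursion yields larger projection
    # coefficients than a neutral-looking one.
    test_neutral = 200 - 30 * t
    test_emotional = 200 - 30 * t + 40 * np.sin(2 * np.pi * t)
    print(fpca_projections(test_neutral, mean_c, basis))
    print(fpca_projections(test_emotional, mean_c, basis))
```

In this toy setting the projection coefficients of the exaggerated contour are markedly larger in magnitude, which is the kind of local emotional prominence the abstract says the features are meant to capture.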
Sponsor
dc.description.sponsorship
This work was funded by the Government of Chile under grants Fondecyt 1100195 and Mecesup FSM0601, and
by the US National Science Foundation under grants IIS-1217104 and IIS-1329659.