Shape-based modeling of the fundamental frequency contour for emotion detection in speech
Article
Publication date
2014
How to cite
Arias Aparicio, Juan Pablo
Abstract
This paper proposes the use of neutral reference models to detect local emotional prominence in the fundamental frequency. A
novel approach based on functional data analysis (FDA) is presented, which aims to capture the intrinsic variability of F0 contours.
The neutral models are represented by a basis of functions and the testing F0 contour is characterized by the projections onto
that basis. For a given F0 contour, we estimate the functional principal component analysis (PCA) projections, which are used as
features for emotion detection. The approach is evaluated with lexicon-dependent (i.e., one functional PCA basis per sentence) and
lexicon-independent (i.e., a single functional PCA basis across sentences) models. The experimental results show that the proposed
system can lead to accuracies as high as 75.8% in binary emotion classification, which is 6.2% higher than the accuracy achieved by
a benchmark system trained with global F0 statistics. The approach can be implemented at sub-sentence level (e.g., 0.5 s segments),
facilitating the detection of localized emotional information conveyed within the sentence. The approach is validated with the
SEMAINE database, which is a spontaneous corpus. The results indicate that the proposed scheme can be effectively employed in
real applications to detect emotional speech.
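
The projection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps true functional PCA (which typically smooths each contour with a spline basis first) for ordinary PCA on contours resampled to a fixed length, and it runs on synthetic data. The function names, `n_points=50`, and `n_components=3` are all hypothetical choices made for illustration.

```python
import numpy as np

def resample_contour(f0, n_points=50):
    """Linearly resample an F0 contour (Hz) onto a fixed-length time grid."""
    f0 = np.asarray(f0, dtype=float)
    old_t = np.linspace(0.0, 1.0, len(f0))
    new_t = np.linspace(0.0, 1.0, n_points)
    return np.interp(new_t, old_t, f0)

def fit_neutral_basis(neutral_contours, n_components=3, n_points=50):
    """Estimate a mean contour and a PCA basis from neutral speech only."""
    X = np.vstack([resample_contour(c, n_points) for c in neutral_contours])
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(f0, mean, basis):
    """Project a test contour onto the neutral basis; the scores are features."""
    x = resample_contour(f0, basis.shape[1])
    return basis @ (x - mean)

# Synthetic stand-in data (assumption): 20 "neutral" contours of varying length.
rng = np.random.default_rng(0)
neutral = [
    120.0 + 10.0 * np.sin(np.linspace(0.0, np.pi, n)) + rng.normal(0.0, 2.0, n)
    for n in rng.integers(40, 80, size=20)
]
mean, basis = fit_neutral_basis(neutral)

# A test contour with a higher and wider pitch excursion than the neutral set.
test_contour = 150.0 + 30.0 * np.sin(np.linspace(0.0, np.pi, 60))
features = project(test_contour, mean, basis)
```

In a setup like this, the `features` vector would be passed to a binary classifier. A lexicon-dependent variant would fit one basis per sentence, while a lexicon-independent variant would fit a single basis across all sentences.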
General note
ISI-indexed publication
Sponsor
This work was funded by the Government of Chile under grants Fondecyt 1100195 and Mecesup FSM0601, and
by the US National Science Foundation under grants IIS-1217104 and IIS-1329659.
Identifier
URI: https://repositorio.uchile.cl/handle/2250/126799
DOI: dx.doi.org/10.1016/j.csl.2013.07.002
Quote Item
Computer Speech and Language 28 (2014) 278–294