Shape-based modeling of the fundamental frequency contour for emotion detection in speech
Author
dc.contributor.author
Arias Aparicio, Juan Pablo
Author
dc.contributor.author
Busso, Carlos
es_CL
Author
dc.contributor.author
Becerra Yoma, Néstor
es_CL
Accession date
dc.date.accessioned
2014-12-24T13:11:15Z
Available date
dc.date.available
2014-12-24T13:11:15Z
Publication date
dc.date.issued
2014
Item citation
dc.identifier.citation
Computer Speech and Language 28 (2014) 278–294
en_US
Identifier
dc.identifier.other
dx.doi.org/10.1016/j.csl.2013.07.002
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/126799
General note
dc.description
Article published in an ISI-indexed journal
en_US
Abstract
dc.description.abstract
This paper proposes the use of neutral reference models to detect local emotional prominence in the fundamental frequency. A
novel approach based on functional data analysis (FDA) is presented, which aims to capture the intrinsic variability of F0 contours.
The neutral models are represented by a basis of functions, and the test F0 contour is characterized by its projections onto
that basis. For a given F0 contour, we estimate the functional principal component analysis (PCA) projections, which are used as
features for emotion detection. The approach is evaluated with lexicon-dependent (i.e., one functional PCA basis per sentence) and
lexicon-independent (i.e., a single functional PCA basis across sentences) models. The experimental results show that the proposed
system can lead to accuracies as high as 75.8% in binary emotion classification, which is 6.2% higher than the accuracy achieved by
a benchmark system trained with global F0 statistics. The approach can be implemented at sub-sentence level (e.g., 0.5 s segments),
facilitating the detection of localized emotional information conveyed within the sentence. The approach is validated with the
SEMAINE database, which is a spontaneous corpus. The results indicate that the proposed scheme can be effectively employed in
real applications to detect emotional speech.
en_US
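The abstract describes estimating a functional PCA basis from neutral F0 contours and using the projections of a test contour onto that basis as features for emotion detection. The following Python sketch illustrates that idea on discretized contours; the helper names, the sampled-contour representation, and the toy data are assumptions for illustration only, not the authors' implementation.

```python
# A minimal sketch (assumed, not the authors' code) of projecting an F0 contour
# onto a PCA basis built from neutral contours, as outlined in the abstract.
import numpy as np

def build_neutral_basis(neutral_contours, n_components=4):
    """Estimate a PCA basis from neutral F0 contours.

    neutral_contours: array of shape (n_utterances, n_samples); each row is an
    F0 contour resampled to a common length (a stand-in for the FDA basis
    expansion used in the paper).
    """
    mean_contour = neutral_contours.mean(axis=0)
    centered = neutral_contours - mean_contour
    # SVD of the centered data yields the principal component functions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_contour, vt[:n_components]   # (n_samples,), (n_components, n_samples)

def fpca_projections(contour, mean_contour, basis):
    """Project a test F0 contour onto the neutral basis; the projection
    coefficients would serve as features for an emotion classifier."""
    return basis @ (contour - mean_contour)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy neutral contours: smooth declination plus small random variation.
    t = np.linspace(0, 1, 100)
    neutral = 200 - 30 * t + 5 * rng.standard_normal((50, t.size))
    mean_c, basis = build_neutral_basis(neutral, n_components=4)

    # A contour with an exaggerated F0 excursion yields larger projection
    # coefficients than a neutral-looking one.
    test_neutral = 200 - 30 * t
    test_emotional = 200 - 30 * t + 40 * np.sin(2 * np.pi * t)
    print(fpca_projections(test_neutral, mean_c, basis))
    print(fpca_projections(test_emotional, mean_c, basis))
```

In this toy setting the projection coefficients of the exaggerated contour are markedly larger in magnitude, which is the kind of local emotional prominence the abstract says the features are meant to capture.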
Sponsor
dc.description.sponsorship
This work was funded by the Government of Chile under grants Fondecyt 1100195 and Mecesup FSM0601, and
by the US National Science Foundation under grants IIS-1217104 and IIS-1329659.