A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification

Poblete Ramírez, Víctor; Espic, Felipe; King, Simon; Stern, Richard M.; Huenupán, Fernando; Fredes Sandoval, Josué Abraham; Becerra Yoma, Néstor

Author	dc.contributor.author	Poblete Ramírez, Víctor
Author	dc.contributor.author	Espic, Felipe
Author	dc.contributor.author	King, Simon
Author	dc.contributor.author	Stern, Richard M.
Author	dc.contributor.author	Huenupán, Fernando
Author	dc.contributor.author	Fredes Sandoval, Josué Abraham
Author	dc.contributor.author	Becerra Yoma, Néstor
Accession date	dc.date.accessioned	2015-08-17T20:21:41Z
Available date	dc.date.available	2015-08-17T20:21:41Z
Publication date	dc.date.issued	2015
Item citation	dc.identifier.citation	Computer Speech and Language 31 (2015) 1–27	en_US
Identifier	dc.identifier.other	DOI: 10.1016/j.csl.2014.10.006
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/132800
General note	dc.description	ISI publication article	en_US
Abstract	dc.description.abstract	This paper proposes a new set of speech features called Locally-Normalized Cepstral Coefficients (LNCC) that are based on Seneff’s Generalized Synchrony Detector (GSD). First, an analysis of the GSD frequency response is provided to show that it generates spurious peaks at harmonics of the detected frequency. Then, the GSD frequency response is modeled as a quotient of two filters centered at the detected frequency. The numerator is a triangular band pass filter centered around a particular frequency similar to the ordinary Mel filters. The denominator term is a filter that responds maximally to frequency components on either side of the numerator filter. As a result, a local normalization is performed without the spurious peaks of the original GSD. Speaker verification results demonstrate that the proposed LNCC features are of low computational complexity and far more effectively compensate for spectral tilt than ordinary MFCC coefficients. LNCC features do not require the computation and storage of a moving average of the feature values, and they provide relative reductions in Equal Error Rate (EER) as high as 47.7%, 34.0% or 25.8% when compared with MFCC, MFCC + CMN, or MFCC + RASTA in one case of variable spectral tilt, respectively.	en_US
Sponsor	dc.description.sponsorship	CONICYT-ANILLO ACT 1120; CONICYT-FONDECYT 1100195; EPSRC EP/I031022/1; Defense Advanced Research Projects Agency (DARPA) D10PC20024	en_US
Language	dc.language.iso	en	en_US
Publisher	dc.publisher	Elsevier	en_US
Type of license	dc.rights	Atribución-NoComercial-SinDerivadas 3.0 Chile	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/	*
Keywords	dc.subject	Channel robust feature extraction	en_US
Keywords	dc.subject	Auditory models	en_US
Keywords	dc.subject	Spectral local normalization	en_US
Keywords	dc.subject	Synchrony detection	en_US
Title	dc.title	A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification	en_US
Document type	dc.type	Journal article
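
Illustrative note on the abstract: the local normalization it describes (a triangular numerator filter divided by a denominator filter that responds on either side of it) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the linear "flank" shape of the denominator, the band half-width, and the toy spectrum are all hypothetical choices made for illustration. The only property it is meant to show is that a quotient of two band energies is unchanged by a gain that is constant across the band, whereas a plain log band energy shifts by the log of that gain.

```python
import math

def triangular_weights(freqs, center, half_width):
    """Numerator weights: triangular band-pass peaking at `center`."""
    return [max(0.0, 1.0 - abs(f - center) / half_width) for f in freqs]

def flank_weights(freqs, center, half_width):
    """Denominator weights (assumed shape): zero at `center`, rising
    linearly toward the band edges, zero outside the band."""
    weights = []
    for f in freqs:
        d = abs(f - center) / half_width
        weights.append(d if d <= 1.0 else 0.0)
    return weights

def band_energy(power, weights):
    """Weighted sum of a power spectrum."""
    return sum(w * p for w, p in zip(weights, power))

def locally_normalized_log_energy(power, freqs, center, half_width, eps=1e-12):
    """Log of the numerator band energy divided by the flank band energy."""
    num = band_energy(power, triangular_weights(freqs, center, half_width))
    den = band_energy(power, flank_weights(freqs, center, half_width))
    return math.log((num + eps) / (den + eps))

def plain_log_energy(power, freqs, center, half_width, eps=1e-12):
    """Ordinary log band energy, with no local normalization."""
    num = band_energy(power, triangular_weights(freqs, center, half_width))
    return math.log(num + eps)

if __name__ == "__main__":
    # Toy power spectrum: one resonance near 1000 Hz on a small floor.
    freqs = [50.0 * k for k in range(81)]  # 0 to 4000 Hz
    power = [1.0 / (1.0 + ((f - 1000.0) / 100.0) ** 2) + 0.01 for f in freqs]
    attenuated = [0.25 * p for p in power]  # flat channel gain of 0.25

    for name, feature in (("plain", plain_log_energy),
                          ("locally normalized", locally_normalized_log_energy)):
        clean = feature(power, freqs, 1000.0, 400.0)
        chan = feature(attenuated, freqs, 1000.0, 400.0)
        print(f"{name}: clean={clean:.4f} channel={chan:.4f}")
```

A real channel is not flat, but if its response varies slowly across one band, the same cancellation holds approximately within that band, which is the intuition behind normalizing each frame locally rather than with a moving average of feature values.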

Files in this item

Name: A-perceptually-motivated-low-c ...
Size: 2.117Mb
Format: PDF

This item appears in the following Collection(s)

Journal articles

Except where otherwise noted, this item's license is described as Atribución-NoComercial-SinDerivadas 3.0 Chile