A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification
Author
dc.contributor.author
Poblete Ramírez, Víctor
Author
dc.contributor.author
Espic, Felipe
Author
dc.contributor.author
King, Simon
Author
dc.contributor.author
Stern, Richard M.
Author
dc.contributor.author
Huenupán, Fernando
Author
dc.contributor.author
Fredes Sandoval, Josué Abraham
Author
dc.contributor.author
Becerra Yoma, Néstor
Admission date
dc.date.accessioned
2015-08-17T20:21:41Z
Available date
dc.date.available
2015-08-17T20:21:41Z
Publication date
dc.date.issued
2015
Item citation
dc.identifier.citation
Computer Speech and Language 31 (2015) 1–27
en_US
Identifier
dc.identifier.other
DOI: 10.1016/j.csl.2014.10.006
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/132800
General note
dc.description
ISI publication article
en_US
Abstract
dc.description.abstract
This paper proposes a new set of speech features called Locally-Normalized Cepstral Coefficients (LNCC) that are based on Seneff’s Generalized Synchrony Detector (GSD). First, an analysis of the GSD frequency response is provided to show that it generates spurious peaks at harmonics of the detected frequency. Then, the GSD frequency response is modeled as a quotient of two filters centered at the detected frequency. The numerator is a triangular band pass filter centered around a particular frequency similar to the ordinary Mel filters. The denominator term is a filter that responds maximally to frequency components on either side of the numerator filter. As a result, a local normalization is performed without the spurious peaks of the original GSD. Speaker verification results demonstrate that the proposed LNCC features are of low computational complexity and far more effectively compensate for spectral tilt than ordinary MFCC coefficients. LNCC features do not require the computation and storage of a moving average of the feature values, and they provide relative reductions in Equal Error Rate (EER) as high as 47.7%, 34.0% or 25.8% when compared with MFCC, MFCC + CMN, or MFCC + RASTA in one case of variable spectral tilt, respectively.
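The quotient-of-two-filters idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy spectrum, the linear shape of the denominator filter, and the `floor` parameter are all assumptions made for illustration. The abstract only states that the numerator is a triangular band pass filter and that the denominator responds maximally on either side of it. The sketch computes one locally normalized log energy for a single filter and checks that a frequency-independent channel gain cancels in the ratio; the remaining steps needed to obtain cepstral coefficients are not shown.

```python
import math

def triangular_weights(center, half_width, n_bins):
    """Numerator filter: triangle peaking at `center`, zero beyond +/- half_width."""
    return [max(0.0, 1.0 - abs(k - center) / half_width) for k in range(n_bins)]

def side_weights(center, half_width, n_bins):
    """Denominator filter (assumed shape): zero at `center`, rising linearly
    to 1 at the band edges, zero outside the band."""
    return [abs(k - center) / half_width if abs(k - center) <= half_width else 0.0
            for k in range(n_bins)]

def local_norm_log_energy(power_spectrum, center, half_width, floor=1e-12):
    """Log of (numerator filter energy / denominator filter energy).
    `floor` is a hypothetical guard against division by zero."""
    n_bins = len(power_spectrum)
    num = sum(w * p for w, p in
              zip(triangular_weights(center, half_width, n_bins), power_spectrum))
    den = sum(w * p for w, p in
              zip(side_weights(center, half_width, n_bins), power_spectrum))
    return math.log(max(num, floor) / max(den, floor))

# Toy power spectrum: a flat floor plus a peak at bin 10 (illustrative data only).
spectrum = [1.0 + 5.0 * math.exp(-((k - 10) ** 2) / 4.0) for k in range(21)]

# The same spectrum after a channel that attenuates every bin by the same factor.
attenuated = [0.01 * p for p in spectrum]

original_value = local_norm_log_energy(spectrum, center=10, half_width=5)
attenuated_value = local_norm_log_energy(attenuated, center=10, half_width=5)
print(original_value, attenuated_value)
```

Because the gain multiplies numerator and denominator alike, the two printed values agree up to floating-point rounding. A channel whose gain varies slowly across the band would cancel only approximately in this sketch, which is the intuition behind normalizing locally instead of subtracting a moving average of the feature values.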