On Reducing Harmonic and Sampling Distortion in Vocal Tract Length Normalization
Author
dc.contributor.author
Becerra Yoma, Néstor
Author
dc.contributor.author
Garretón, Claudio
es_CL
Author
dc.contributor.author
Huenupán, Fernando
es_CL
Author
dc.contributor.author
Catalán, Ignacio
es_CL
Author
dc.contributor.author
Wuth Sepúlveda, Jorge
es_CL
Admission date
dc.date.accessioned
2014-03-06T19:43:41Z
Available date
dc.date.available
2014-03-06T19:43:41Z
Publication date
dc.date.issued
2013
Item citation
dc.identifier.citation
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2013
en_US
Identifier
dc.identifier.other
doi 10.1109/TASL.2012.2215590
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/126416
General note
dc.description
ISI publication article
en_US
Abstract
dc.description.abstract
This paper proposes a novel feature-space VTLN
(vocal tract length normalization) method that models frequency
warping as a linear interpolation of contiguous Mel filter-bank
energies. The presented technique aims to reduce the distortion
in the Mel filter-bank energy estimation due to the harmonic
composition of voiced speech intervals and DFT (discrete Fourier
transform) sampling when the central frequency of band-pass
filters is shifted. This paper also proposes an analytical maximum
likelihood (ML) method to estimate the optimal warping factor
in the cepstral space. The presented interpolated filter-bank
energy-based VTLN leads to relative reductions in WER (word error
rate) as high as 11.2% and 7.6% when compared with the baseline
system and standard VTLN, respectively, in a medium-vocabulary
continuous speech recognition task. Also, the proposed VTLN
scheme can provide significant reductions in WER when compared
with state-of-the-art VTLN methods based on linear transforms in
the cepstral feature-space. The warping factor estimated with the
proposed VTLN approach shows more dependence on the speaker
and more independence of the acoustic-phonetic content than the
warping factor resulting from standard and state-of-the-art VTLN
methods. Finally, the analytical ML-based optimization scheme
presented here achieves almost the same reductions in WER as
the ML grid search version of the technique with a computational
load 20 times lower.
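
The idea of modeling frequency warping as a linear interpolation of contiguous Mel filter-bank energies can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not spell out. It assumes the warping factor has already been mapped to a fractional shift expressed in filter-index units, and it uses a hypothetical `score` callback as a stand-in for the acoustic-model likelihood in a simple grid search. The function names `interpolate_filterbank` and `grid_search_shift` are illustrative only.

```python
import math


def interpolate_filterbank(energies, shift):
    """Approximate the energies of filters whose centres moved by `shift`.

    `shift` is an ASSUMED parameterisation: a fractional displacement in
    filter-index units (for example 0.5 means halfway to the next filter).
    Each output is a linear interpolation of two contiguous input energies,
    so no filter is re-evaluated on the DFT spectrum.
    """
    n = len(energies)
    warped = []
    for m in range(n):
        # Clamp the shifted position to the valid filter range.
        pos = min(max(m + shift, 0.0), n - 1.0)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1.0 - frac) * energies[lo] + frac * energies[hi])
    return warped


def grid_search_shift(energies, score, candidates):
    """Pick the candidate shift whose warped energies maximise `score`.

    `score` is a hypothetical stand-in for a likelihood computed by an
    acoustic model; it is not part of the source.
    """
    return max(candidates,
               key=lambda s: score(interpolate_filterbank(energies, s)))


if __name__ == "__main__":
    frame = [1.0, 2.0, 4.0]
    reference = [1.5, 3.0, 4.0]

    def toy_score(warped):
        # Negative squared distance to a fixed reference frame.
        return -sum((w - r) ** 2 for w, r in zip(warped, reference))

    print(interpolate_filterbank(frame, 0.5))
    print(grid_search_shift(frame, toy_score, [-0.5, 0.0, 0.5]))
```

A grid search like this evaluates the score once per candidate, which is the cost that the analytical ML estimate described in the abstract is reported to reduce.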