On Reducing Harmonic and Sampling Distortion in Vocal Tract Length Normalization
Article
Publication date
2013
Author
Becerra Yoma, Néstor
Abstract
This paper proposes a novel feature-space VTLN
(vocal tract length normalization) method that models frequency
warping as a linear interpolation of contiguous Mel filter-bank
energies. The presented technique aims to reduce the distortion
in the Mel filter-bank energy estimation due to the harmonic
composition of voiced speech intervals and DFT (discrete Fourier
transform) sampling when the central frequency of band-pass
filters is shifted. This paper also proposes an analytical maximum
likelihood (ML) method to estimate the optimal warping factor
in the cepstral space. The presented interpolated filter-bank energy-based
VTLN leads to relative reductions in WER (word error
rate) as high as 11.2% and 7.6% when compared with the baseline
system and standard VTLN, respectively, in a medium-vocabulary
continuous speech recognition task. Also, the proposed VTLN
scheme can provide significant reductions in WER when compared
with state-of-the-art VTLN methods based on linear transforms in
the cepstral feature-space. The warping factor estimated with the
proposed VTLN approach shows more dependence on the speaker
and more independence of the acoustic-phonetic content than the
warping factor resulting from standard and state-of-the-art VTLN
methods. Finally, the analytical ML-based optimization scheme
presented here achieves almost the same reductions in WER as
the ML grid search version of the technique with a computational
load 20 times lower.
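
The idea of modelling frequency warping as a linear interpolation of contiguous filter-bank energies, with the warping factor chosen by grid search, might be sketched roughly as follows. This is only an illustrative sketch and not the paper's formulation: the function names, the `shift` parameter (a fractional displacement measured in filter positions), the edge handling, and the generic `score` callback that stands in for the acoustic-model likelihood are all assumptions made for this example.

```python
def interpolate_filterbank(energies, shift):
    """Approximate filter-bank energies after moving every filter centre by
    `shift` filter positions (assumed |shift| <= 1), using linear
    interpolation between contiguous filters.

    Hypothetical helper: edge filters simply reuse their own value.
    """
    if not -1.0 <= shift <= 1.0:
        raise ValueError("shift must lie in [-1, 1]")
    n = len(energies)
    step = 1 if shift >= 0 else -1   # which neighbour to blend with
    weight = abs(shift)              # how far towards that neighbour
    warped = []
    for m in range(n):
        neighbour = min(max(m + step, 0), n - 1)
        warped.append((1.0 - weight) * energies[m] + weight * energies[neighbour])
    return warped


def grid_search_shift(energies, score, candidates):
    """Return the candidate shift whose interpolated energies maximise `score`.

    `score` is a placeholder for a likelihood computed by an acoustic model.
    """
    return max(candidates,
               key=lambda s: score(interpolate_filterbank(energies, s)))


if __name__ == "__main__":
    frame = [1.0, 2.0, 4.0, 8.0]
    target = [1.5, 3.0, 6.0, 8.0]

    def toy_score(warped):
        # Toy stand-in for a likelihood: negative squared error to a target.
        return -sum((w - t) ** 2 for w, t in zip(warped, target))

    print(interpolate_filterbank(frame, 0.5))
    print(grid_search_shift(frame, toy_score, [-0.5, 0.0, 0.5]))
```

In this sketch the grid search evaluates the score once per candidate, which is the cost that an analytical estimate of the warping factor would avoid.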
General note
ISI publication article
Quote Item
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2013