On Reducing Harmonic and Sampling Distortion in Vocal Tract Length Normalization
Article
Date
2013
Author
Becerra Yoma, Néstor
Abstract
This paper proposes a novel feature-space VTLN
(vocal tract length normalization) method that models frequency
warping as a linear interpolation of contiguous Mel filter-bank
energies. The presented technique aims to reduce the distortion
in the Mel filter-bank energy estimation due to the harmonic
composition of voiced speech intervals and DFT (discrete Fourier
transform) sampling when the central frequency of band-pass
filters is shifted. This paper also proposes an analytical maximum
likelihood (ML) method to estimate the optimal warping factor
in the cepstral space. The presented interpolated filter-bank energy-based
VTLN leads to relative reductions in WER (word error
rate) as high as 11.2% and 7.6% when compared with the baseline
system and standard VTLN, respectively, in a medium-vocabulary
continuous speech recognition task. Also, the proposed VTLN
scheme can provide significant reductions in WER when compared
with state-of-the-art VTLN methods based on linear transforms in
the cepstral feature-space. The warping factor estimated with the
proposed VTLN approach shows more dependence on the speaker
and more independence of the acoustic-phonetic content than the
warping factor resulting from standard and state-of-the-art VTLN
methods. Finally, the analytical ML-based optimization scheme
presented here achieves almost the same reductions in WER as
the ML grid search version of the technique with a computational
load 20 times lower.
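
The central idea of the abstract, frequency warping expressed as a linear interpolation of contiguous Mel filter-bank energies, can be illustrated with a minimal sketch. This is not the paper's formulation, which the abstract does not give. It assumes that a warping factor `alpha` maps filter index `m` to the fractional position `alpha * m`, and that the warped energy is interpolated between the two neighbouring filters. The function name `warp_filterbank_energies` and the clamping at the band edges are hypothetical choices made for illustration.

```python
import math


def warp_filterbank_energies(energies, alpha):
    """Warp one frame of filter-bank energies by linear interpolation.

    Assumption (not taken from the paper): filter m is re-read at the
    fractional position alpha * m, clamped to the valid index range.
    """
    n = len(energies)
    warped = []
    for m in range(n):
        # Fractional filter position after warping, kept inside [0, n - 1].
        pos = min(max(alpha * m, 0.0), float(n - 1))
        lower = int(math.floor(pos))
        upper = min(lower + 1, n - 1)
        frac = pos - lower
        # Linear interpolation between the two contiguous filters.
        warped.append((1.0 - frac) * energies[lower] + frac * energies[upper])
    return warped


frame = [0.0, 1.0, 2.0, 3.0]
print(warp_filterbank_energies(frame, 1.0))  # alpha = 1 leaves the frame unchanged
print(warp_filterbank_energies(frame, 0.5))
```

Because the warped features are computed from energies that were already estimated at the original filter positions, no filter is re-centred on the DFT grid. The abstract identifies this shifting of central frequencies as the source of harmonic and sampling distortion.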
General note
ISI publication article
Quote Item
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2013