On Reducing Harmonic and Sampling Distortion in Vocal Tract Length Normalization

Becerra Yoma, Néstor; Garretón, Claudio; Huenupán, Fernando; Catalán, Ignacio; Wuth Sepúlveda, Jorge

Author	dc.contributor.author	Becerra Yoma, Néstor
Author	dc.contributor.author	Garretón, Claudio	es_CL
Author	dc.contributor.author	Huenupán, Fernando	es_CL
Author	dc.contributor.author	Catalán, Ignacio	es_CL
Author	dc.contributor.author	Wuth Sepúlveda, Jorge	es_CL
Admission date	dc.date.accessioned	2014-03-06T19:43:41Z
Available date	dc.date.available	2014-03-06T19:43:41Z
Publication date	dc.date.issued	2013
Item citation	dc.identifier.citation	IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2013	en_US
Identifier	dc.identifier.other	doi 10.1109/TASL.2012.2215590
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/126416
General note	dc.description	ISI publication article	en_US
Abstract	dc.description.abstract	This paper proposes a novel feature-space VTLN (vocal tract length normalization) method that models frequency warping as a linear interpolation of contiguous Mel filter-bank energies. The presented technique aims to reduce the distortion in the Mel filter-bank energy estimation that arises from the harmonic composition of voiced speech intervals and from DFT (discrete Fourier transform) sampling when the central frequency of the band-pass filters is shifted. This paper also proposes an analytical maximum likelihood (ML) method to estimate the optimal warping factor in the cepstral space. The presented interpolated filter-bank energy-based VTLN leads to relative reductions in WER (word error rate) as high as 11.2% and 7.6% when compared with the baseline system and standard VTLN, respectively, in a medium-vocabulary continuous speech recognition task. Also, the proposed VTLN scheme can provide significant reductions in WER when compared with state-of-the-art VTLN methods based on linear transforms in the cepstral feature-space. The warping factor estimated with the proposed VTLN approach shows more dependence on the speaker and more independence from the acoustic-phonetic content than the warping factor resulting from standard and state-of-the-art VTLN methods. Finally, the analytical ML-based optimization scheme presented here achieves almost the same reductions in WER as the ML grid search version of the technique with a computational load 20 times lower.	en_US
Language	dc.language.iso	en	en_US
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Chile	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/	*
Keywords	dc.subject	Speech analysis	en_US
Title	dc.title	On Reducing Harmonic and Sampling Distortion in Vocal Tract Length Normalization	en_US
Document type	dc.type	Journal article
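
The abstract describes frequency warping as a linear interpolation of contiguous Mel filter-bank energies, with the warping factor chosen by maximum likelihood. The following is only an illustrative sketch of that general idea, not the paper's formulation: the function names, the fractional channel shift `delta`, the diagonal Gaussian model, and the use of log energies instead of cepstral features are all assumptions made for this example.

```python
import numpy as np

def interpolate_filterbank(energies, delta):
    """Hypothetical warping: read each channel at position m + delta.

    For |delta| <= 1 every output value is a linear interpolation of two
    contiguous filter-bank energies. Positions outside the bank are
    clamped to the edge channels (an assumption of this sketch).
    """
    energies = np.asarray(energies, dtype=float)
    channels = np.arange(len(energies))
    positions = np.clip(channels + delta, 0, len(energies) - 1)
    return np.interp(positions, channels, energies)

def grid_search_delta(energies, mean, var, grid):
    """Pick the delta in `grid` that maximizes a diagonal Gaussian
    log-likelihood of the warped log energies (toy ML grid search)."""
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)

    def loglik(delta):
        x = np.log(interpolate_filterbank(energies, delta))
        return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var))

    return max(grid, key=loglik)

# Toy usage: the model mean is built from the same frame warped by 0.25,
# so the grid search should recover that shift.
frame = [1.0, 2.0, 3.0, 4.0, 5.0]
target_mean = np.log(interpolate_filterbank(frame, 0.25))
best = grid_search_delta(frame, target_mean, np.ones(5), np.linspace(-0.5, 0.5, 5))
```

A grid search like this evaluates the likelihood once per candidate value, which is the cost the abstract says the analytical ML estimate reduces by a factor of about 20.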

Files in this item

Name: On Reducing Harmonic.pdf
Size: 1.886Mb
Format: PDF

This item appears in the following Collection(s)

Journal articles

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile