Author (dc.contributor.author): Becerra Yoma, Néstor
Author (dc.contributor.author): Garretón, Claudio [es_CL]
Author (dc.contributor.author): Huenupán, Fernando [es_CL]
Author (dc.contributor.author): Catalán, Ignacio [es_CL]
Author (dc.contributor.author): Wuth Sepúlveda, Jorge [es_CL]
Accession date (dc.date.accessioned): 2014-03-06T19:43:41Z
Available date (dc.date.available): 2014-03-06T19:43:41Z
Publication date (dc.date.issued): 2013
Item citation (dc.identifier.citation): IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2013 [en_US]
Identifier (dc.identifier.other): doi: 10.1109/TASL.2012.2215590
Identifier (dc.identifier.uri): https://repositorio.uchile.cl/handle/2250/126416
General note (dc.description): ISI publication article [en_US]
Abstract (dc.description.abstract): This paper proposes a novel feature-space VTLN (vocal tract length normalization) method that models frequency warping as a linear interpolation of contiguous Mel filter-bank energies. The presented technique aims to reduce the distortion in the Mel filter-bank energy estimation due to the harmonic composition of voiced speech intervals and DFT (discrete Fourier transform) sampling when the central frequency of band-pass filters is shifted. This paper also proposes an analytical maximum likelihood (ML) method to estimate the optimal warping factor in the cepstral space. The presented interpolated filter-bank energy-based VTLN leads to relative reductions in WER (word error rate) as high as 11.2% and 7.6% when compared with the baseline system and standard VTLN, respectively, in a medium-vocabulary continuous speech recognition task. Also, the proposed VTLN scheme can provide significant reductions in WER when compared with state-of-the-art VTLN methods based on linear transforms in the cepstral feature-space. The warping factor estimated with the proposed VTLN approach shows more dependence on the speaker and more independence of the acoustic-phonetic content than the warping factor resulting from standard and state-of-the-art VTLN methods. Finally, the analytical ML-based optimization scheme presented here achieves almost the same reductions in WER as the ML grid search version of the technique with a computational load 20 times lower. [en_US]
Language (dc.language.iso): en [en_US]
Type of license (dc.rights): Attribution-NonCommercial-NoDerivs 3.0 Chile [*]
Link to License (dc.rights.uri): http://creativecommons.org/licenses/by-nc-nd/3.0/cl/ [*]
Keywords (dc.subject): Speech analysis [en_US]
Title (dc.title): On Reducing Harmonic and Sampling Distortion in Vocal Tract Length Normalization [en_US]
Document type (dc.type): Journal article


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile