
Author: Chelba, Ciprian
Author: Silva, Jorge
Author: Acero, Alex
Accession date: 2008-05-14T14:12:53Z
Available date: 2008-05-14T14:12:53Z
Publication date: 2007
Item citation: Computer Speech and Language, Vol. 21, No. 3, July 2007, pp. 458-478
Identifier (URI): https://repositorio.uchile.cl/handle/2250/124726
General note: ISI publication
Abstract: The paper presents the Position Specific Posterior Lattice (PSPL), a novel lossy representation of automatic speech recognition lattices that naturally lends itself to efficient indexing and subsequent relevance ranking of spoken documents. This technique explicitly takes content uncertainty into account by means of soft hits. Indexing position information allows one to approximate N-gram expected counts and at the same time use more general proximity features in the relevance score calculation. In fact, one can easily port any state-of-the-art text-retrieval algorithm to the scenario of indexing ASR lattices for spoken documents, rather than using the 1-best recognition result. Experiments performed on a collection of lecture recordings (the MIT iCampus database) show that spoken document ranking performance improved by 17-26% relative over the commonly used baseline of indexing the 1-best output from an automatic speech recognizer (ASR). The paper also addresses the problem of integrating speech and text content sources for the document search problem, as well as its usefulness from an ad hoc retrieval (keyword search) point of view. In this context, the PSPL formulation is naturally extended to deal with both speech and text content for a given document, and a new relevance ranking framework is proposed for integrating the different sources of information available. Experimental results on the MIT iCampus corpus show a relative improvement of 302% in Mean Average Precision (MAP) when using speech content and text-only metadata as opposed to just text-only metadata (which constitutes about 1% of the amount of data in the transcription of the speech content, measured in number of words). Further experiments show that even in scenarios where the metadata size is artificially augmented so that it contains more than 10% of the spoken document transcription, the speech content still provides significant gains in MAP with respect to using only the text metadata for relevance ranking. (c) 2006 Elsevier Ltd. All rights reserved.
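To make the expected-count formulation in the abstract concrete, here is a minimal sketch in Python. It assumes a simplified dict-based PSPL that maps each word position to the posterior probability of each word at that position; the function names, the per-order weights, and the log(1 + expected count) scoring form are illustrative readings of the description above, not the paper's exact implementation.

    from math import log

    def pspl_ngram_score(pspl, query_terms, n):
        # pspl: dict mapping position -> {word: P(word | position, document)},
        # i.e. the "soft hits" recovered from the ASR lattice.
        score = 0.0
        for i in range(len(query_terms) - n + 1):  # each query n-gram
            expected = 0.0
            for start in pspl:  # expected n-gram count, summed over start positions
                prod = 1.0
                for k in range(n):
                    prod *= pspl.get(start + k, {}).get(query_terms[i + k], 0.0)
                expected += prod
            score += log(1.0 + expected)
        return score

    def pspl_relevance(pspl, query_terms, weights=(1.0, 0.5, 0.25)):
        # Weighted mix of n-gram scores for n = 1..len(weights).
        return sum(w * pspl_ngram_score(pspl, query_terms, n)
                   for n, w in enumerate(weights, start=1))

    # Toy document: two positions whose soft hits came from a lattice.
    doc = {0: {"speech": 0.7, "peach": 0.3}, 1: {"search": 0.9, "surge": 0.1}}
    print(pspl_relevance(doc, ["speech", "search"]))

In the toy example, the alternative lattice words ("peach", "surge") contribute nothing to the query score, while the query terms are credited in proportion to their posteriors, which is the sense in which indexing the full lattice rather than the 1-best output preserves content uncertainty.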
Language: en
Keywords: RECOGNITION
Subject area: Computer Science, Artificial Intelligence
Title: Soft indexing of speech content for search in spoken documents
Document type: Journal article

