
Author: Chelba, Ciprian
Author: Silva, Jorge
Author: Acero, Alex
Accession date: 2008-05-14T14:12:53Z
Available date: 2008-05-14T14:12:53Z
Publication date: 2007
Item citation: Computer Speech and Language, Vol. 21, No. 3, July 2007, pp. 458-478
Identifier (URI): https://repositorio.uchile.cl/handle/2250/124726
General note: ISI publication
Abstract: The paper presents the Position Specific Posterior Lattice (PSPL), a novel lossy representation of automatic speech recognition lattices that naturally lends itself to efficient indexing and subsequent relevance ranking of spoken documents. This technique explicitly takes content uncertainty into account by means of soft hits. Indexing position information allows one to approximate N-gram expected counts and at the same time use more general proximity features in the relevance score calculation. In fact, one can easily port any state-of-the-art text-retrieval algorithm to the scenario of indexing ASR lattices for spoken documents, rather than using the 1-best recognition result. Experiments performed on a collection of lecture recordings (the MIT iCampus database) show that spoken document ranking performance improved by 17-26% relative over the commonly used baseline of indexing the 1-best output from an automatic speech recognizer (ASR). The paper also addresses the problem of integrating speech and text content sources for the document search problem, as well as its usefulness from an ad hoc retrieval (keyword search) point of view. In this context, the PSPL formulation is naturally extended to deal with both speech and text content for a given document, and a new relevance ranking framework is proposed for integrating the different sources of information available. Experimental results on the MIT iCampus corpus show a relative improvement of 302% in Mean Average Precision (MAP) when using speech content and text-only metadata as opposed to just text-only metadata (which constitutes about 1% of the amount of data in the transcription of the speech content, measured in number of words). Further experiments show that even in scenarios where the metadata size is artificially augmented so that it contains more than 10% of the spoken document transcription, the speech content still provides significant gains in MAP with respect to using only the text metadata for relevance ranking. (c) 2006 Elsevier Ltd. All rights reserved.
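To make the expected-count formulation in the abstract concrete, here is a minimal sketch in Python. It assumes a simplified dict-based PSPL that maps each word position to the posterior probability of each word at that position; the function names, the per-order weights, and the log(1 + expected count) scoring form are illustrative readings of the description above, not the paper's exact implementation.

    from math import log

    def pspl_ngram_score(pspl, query_terms, n):
        # pspl: dict mapping position -> {word: P(word | position, document)},
        # i.e. the "soft hits" recovered from the ASR lattice.
        score = 0.0
        for i in range(len(query_terms) - n + 1):  # each query n-gram
            expected = 0.0
            for start in pspl:  # expected n-gram count, summed over start positions
                prod = 1.0
                for k in range(n):
                    prod *= pspl.get(start + k, {}).get(query_terms[i + k], 0.0)
                expected += prod
            score += log(1.0 + expected)
        return score

    def pspl_relevance(pspl, query_terms, weights=(1.0, 0.5, 0.25)):
        # Weighted mix of n-gram scores for n = 1..len(weights).
        return sum(w * pspl_ngram_score(pspl, query_terms, n)
                   for n, w in enumerate(weights, start=1))

    # Toy document: two positions whose soft hits came from a lattice.
    doc = {0: {"speech": 0.7, "peach": 0.3}, 1: {"search": 0.9, "surge": 0.1}}
    print(pspl_relevance(doc, ["speech", "search"]))

In the toy example, the alternative lattice words ("peach", "surge") contribute nothing to the query score, while the query terms are credited in proportion to their posteriors, which is the sense in which indexing the full lattice rather than the 1-best output preserves content uncertainty.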
Language: en
Keywords: RECOGNITION
Subject area: Computer Science, Artificial Intelligence
Title: Soft indexing of speech content for search in spoken documents
Document type: Journal article

