On the image content of a web segment: Chile as a case study

Jaimes, A.; Ruiz del Solar, Javier; Verschae, Rodrigo; Baeza Yates, Ricardo; Castillo, C.; Yaksic, D.; Davis, E.

Author	dc.contributor.author	Jaimes, A.
Author	dc.contributor.author	Ruiz del Solar, Javier	es_CL
Author	dc.contributor.author	Verschae, Rodrigo	es_CL
Author	dc.contributor.author	Baeza Yates, Ricardo	es_CL
Author	dc.contributor.author	Castillo, C.	es_CL
Author	dc.contributor.author	Yaksic, D.	es_CL
Author	dc.contributor.author	Davis, E.	es_CL
Admission date	dc.date.accessioned	2012-12-18T14:07:28Z
Available date	dc.date.available	2012-12-18T14:07:28Z
Publication date	dc.date.issued	2004
Item citation	dc.identifier.citation	Journal of Web Engineering, Vol. 3, No. 2 (2004) 153-168	es_CL
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/125696
Abstract	dc.description.abstract	We propose a methodology to characterize the image contents of a web segment, and we present an analysis of the contents of a segment of the Chilean web (.cl domain). Our framework uses an efficient web-crawling architecture, standard content-based analysis tools (to extract low-level features such as color, shape and texture), and novel skin and face detection algorithms. In an automated process, we start by examining all websites within a domain (e.g., .cl websites), obtaining links to images, and downloading a large number of the images (approximately 383,000 images across all of our experiments, corresponding to about 35 billion pixels). Once the images are downloaded to a local server, our process automatically extracts several low-level visual features (color, texture, shape, etc.) and performs skin and face detection using the novel algorithms. The results of visual feature extraction, skin detection, and face detection are then used to characterize the contents of the web segment. We tested our methodology on a segment of the Chilean web (.cl) by automatically downloading and processing 183,000 images in 2003 and 200,000 images in 2004. We present some statistics derived from both sets of images, which should be of use to anyone concerned with the image content of the web in Chile. Our study is the first to use content-based tools to determine the image contents of a given web segment.	es_CL
Sponsor	dc.description.sponsorship	This research was funded by the Millennium Nucleus Center for Web Research, Grant P01-029-F, Mideplan, Chile.	es_CL
Language	dc.language.iso	en	es_CL
Publisher	dc.publisher	Rinton Press	es_CL
Keywords	dc.subject	Web characterization	es_CL
Title	dc.title	On the image content of a web segment: Chile as a case study	es_CL
Document type	dc.type	Journal article

Files in this item

Name: Jaimes_A.pdf
Size: 331.3 KB
Format: PDF

This item appears in the following Collection(s)

Journal articles
