Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering
Author
dc.contributor.author
Baeza Yates, Ricardo
Author
dc.contributor.author
Castillo Ocaranza, Carlos
es_CL
Author
dc.contributor.author
Marín, Mauricio
es_CL
Author
dc.contributor.author
Rodríguez, Andrea
es_CL
Admission date
dc.date.accessioned
2013-12-20T18:39:49Z
Available date
dc.date.available
2013-12-20T18:39:49Z
Publication date
dc.date.issued
2005
Item citation
dc.identifier.citation
WWW 2005, May 10–14, 2005, Chiba, Japan
en_US
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/125821
General note
dc.description
Article in an ISI-indexed publication
en_US
Abstract
dc.description.abstract
This article compares several page ordering strategies for
Web crawling under several metrics. The objective of these
strategies is to download the most "important" pages "early"
during the crawl. As the coverage of modern search engines
is small compared to the size of the Web, and it is impossible
to index all of the Web for both theoretical and practical
reasons, it is relevant to index at least the most important
pages.
We use data from actual Web pages to build Web graphs
and execute a crawler simulator on those graphs. As the
Web is very dynamic, crawling simulation is the only way to
ensure that all the strategies considered are compared under
the same conditions. We propose several page ordering
strategies that are more efficient than breadth-first search
and strategies based on partial PageRank calculations.
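The abstract does not spell out the simulator or the strategies, so the following is only a minimal sketch, under stated assumptions, of how a crawl simulator can compare a breadth-first ordering with an ordering driven by partial PageRank. It assumes a tiny hypothetical graph (`EXAMPLE_WEB`), a damping factor of 0.85, 50 power iterations, a batch of two pages downloaded between PageRank recomputations, and that pages not yet downloaded have no known outlinks. The function names (`pagerank`, `crawl_bfs`, `crawl_partial_pagerank`, `cumulative_importance`) are hypothetical, and the importance measure (share of full-graph PageRank held by the first k downloaded pages) is one plausible reading of downloading important pages "early", not necessarily the metric used in the study.

```python
from collections import deque

# Hypothetical seven-page site used only for illustration (not data from the study).
EXAMPLE_WEB = {
    "home": ["news", "about", "blog"],
    "news": ["story1", "story2"],
    "about": ["home"],
    "blog": ["story1", "post"],
    "story1": ["home", "news"],
    "story2": ["story1"],
    "post": ["story1"],
}


def pagerank(graph, nodes, damping=0.85, iterations=50):
    """Power-iteration PageRank on the subgraph induced by `nodes`.

    Links to pages outside `nodes` are ignored; pages with no known
    outlinks (dangling pages) spread their score uniformly.
    """
    nodes = list(nodes)
    if not nodes:
        return {}
    n = len(nodes)
    node_set = set(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        dangling = 0.0
        for u in nodes:
            out = [v for v in graph.get(u, ()) if v in node_set]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                dangling += damping * rank[u]
        for u in nodes:
            new[u] += dangling / n
        rank = new
    return rank


def crawl_bfs(graph, seeds):
    """Breadth-first ordering: download pages in the order they are discovered."""
    order, seen = [], set(seeds)
    queue = deque(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


def crawl_partial_pagerank(graph, seeds, batch_size=2):
    """Rank the frontier by PageRank on the graph known so far, in batches.

    Only outlinks of already-downloaded pages are known; frontier pages
    are treated as dangling until they are downloaded (an assumption).
    """
    order, downloaded = [], set()
    frontier = list(dict.fromkeys(seeds))
    while frontier:
        known = {u: graph.get(u, ()) for u in downloaded}
        scores = pagerank(known, order + frontier)
        # Stable sort: frontier pages with equal scores keep discovery order.
        frontier.sort(key=lambda u: scores[u], reverse=True)
        batch, frontier = frontier[:batch_size], frontier[batch_size:]
        for u in batch:
            downloaded.add(u)
            order.append(u)
        for u in batch:
            for v in graph.get(u, ()):
                if v not in downloaded and v not in frontier:
                    frontier.append(v)
    return order


def cumulative_importance(order, true_rank, k):
    """Share of total (full-graph) PageRank held by the first k downloaded pages."""
    return sum(true_rank[u] for u in order[:k])


if __name__ == "__main__":
    true_rank = pagerank(EXAMPLE_WEB, EXAMPLE_WEB)
    bfs = crawl_bfs(EXAMPLE_WEB, ["home"])
    ppr = crawl_partial_pagerank(EXAMPLE_WEB, ["home"])
    for k in range(1, len(EXAMPLE_WEB) + 1):
        print(k, round(cumulative_importance(bfs, true_rank, k), 3),
              round(cumulative_importance(ppr, true_rank, k), 3))
```

Running the script prints, for each k, the share of importance collected by each ordering after k downloads. Recomputing PageRank after every single download would be expensive on a real Web graph, which is why the sketch recomputes it only once per batch.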