Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering
Article
Publication date
2005
How to cite
Baeza Yates, Ricardo
Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering
Abstract
This article compares several page ordering strategies for
Web crawling under several metrics. The objective of these
strategies is to download the most "important" pages "early"
during the crawl. As the coverage of modern search engines
is small compared to the size of the Web, and it is impossible
to index all of the Web for both theoretical and practical
reasons, it is relevant to index at least the most important
pages.
We use data from actual Web pages to build Web graphs
and execute a crawler simulator on those graphs. As the
Web is very dynamic, crawling simulation is the only way to
ensure that all the strategies considered are compared under
the same conditions. We propose several page ordering
strategies that are more efficient than breadth-first search
and strategies based on partial PageRank calculations.
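
The idea of replaying different page orderings on one fixed graph can be illustrated with a minimal sketch. It is not the authors' simulator and does not implement the strategies proposed in the paper. The toy graph, the function names, the backlink-count ordering used for comparison, and the use of in-degree as a stand-in for page "importance" are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy Web graph: page -> pages it links to.
GRAPH = {
    "home": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["d"],
    "d": ["hub"],
    "hub": ["home"],
}

def crawl_breadth_first(graph, seed):
    """Return pages in the order a breadth-first crawler downloads them."""
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

def crawl_by_backlinks(graph, seed):
    """Download next the discovered page with the most links seen so far."""
    order, downloaded = [], set()
    backlinks = {seed: 0}  # discovered page -> links seen from downloaded pages
    while backlinks:
        # Ties are broken alphabetically to keep the simulation deterministic.
        page = min(backlinks, key=lambda p: (-backlinks[p], p))
        del backlinks[page]
        downloaded.add(page)
        order.append(page)
        for target in graph.get(page, []):
            if target not in downloaded:
                backlinks[target] = backlinks.get(target, 0) + 1
    return order

def importance_captured(graph, order, budget):
    """Fraction of total in-degree held by the first `budget` pages."""
    indegree = {}
    for targets in graph.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1
    total = sum(indegree.values())
    return sum(indegree.get(p, 0) for p in order[:budget]) / total

if __name__ == "__main__":
    for crawl in (crawl_breadth_first, crawl_by_backlinks):
        order = crawl(GRAPH, "home")
        print(crawl.__name__, order, importance_captured(GRAPH, order, 4))
```

Because both orderings run on the same static graph from the same seed, any difference in the score comes from the ordering alone, which is the property the abstract attributes to simulation. On this toy graph, with a budget of four pages, the backlink ordering reaches the heavily linked page sooner than breadth-first does.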
General note
ISI publication article
Identifier
URI: https://repositorio.uchile.cl/handle/2250/125821
Quote Item
WWW 2005 May 10–14, 2005, Chiba, Japan