Multiple-goal Search Algorithms and their Application to Web Crawling

Dmitry Davidov and Shaul Markovitch. Multiple-goal Search Algorithms and their Application to Web Crawling. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, 713-718 Edmonton, Alberta, Canada, 2002.

Abstract

The work described in this paper presents a new framework for heuristic search where the task is to collect as many goals as possible within the allocated resources. We show the inadequacy of traditional distance heuristics for this type of tasks and present alternative types of heuristics that are more appropriate for multiple-goal search. In particular we introduce the yield heuristic that estimates the cost and the benefit of exploring a subtree below a search node. We present a learning algorithm for inferring the yield based on search experience. We apply our adaptive and non-adaptive multiple-goal search algorithms to the web crawling problem and show their efficiency.

Keywords: Resource-Bounded Reasoning, Speedup Learning

Secondary Keywords:

Online version:

Bibtex entry:

 @inproceedings{Davidov:2002:MGS,
  Author = {Dmitry Davidov and Shaul Markovitch},
  Title = {Multiple-goal Search Algorithms and their Application to Web Crawling},
  Year = {2002},
  Booktitle = {Proceedings of the Eighteenth National Conference on Artificial Intelligence},
  Pages = {713--718},
  Address = {Edmonton, Alberta, Canada},
  Url = {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Davidov-Markovitch-aaai2002.pdf},
  Keywords = {Resource-Bounded Reasoning, Speedup Learning},
  Secondary-keywords = {Heuristic Search, Focused Crawling},
  Abstract = {
    The work described in this paper presents a new framework for
    heuristic search where the task is to collect as many goals as
    possible within the allocated resources. We show the inadequacy of
    traditional distance heuristics for this type of tasks and present
    alternative types of heuristics that are more appropriate for
    multiple-goal search. In particular we introduce the yield
    heuristic that estimates the cost and the benefit of exploring a
    subtree below a search node. We present a learning algorithm for
    inferring the yield based on search experience. We apply our
    adaptive and non-adaptive multiple-goal search algorithms to the
    web crawling problem and show their efficiency.
  }

  }