Time+Place: Sunday 24/06/2007 14:30 Room 337-8 Taub Bld.
Title: Peer-to-Peer Web Search with Minerva
Speaker: Gerhard Weikum http://www.mpi-inf.mpg.de/~weikum
Affiliation: Max-Planck Institute for Informatics
Host: Oded Shmueli

Abstract:

The peer-to-peer (P2P) computing paradigm is an intriguing alternative 
to Google-style search engines for querying and ranking Web content. In 
a network with many thousands or millions of peers, the storage and
access load requirements per peer are much lighter than for a 
centralized server farm; thus more powerful techniques from information 
retrieval, statistical learning, computational linguistics, and 
ontological reasoning can be employed on each peer's local search engine 
for boosting the quality of search results. In addition, peers can 
dynamically collaborate on advanced and particularly difficult queries. 
Moreover, a peer-to-peer setting is ideally suited to capturing local
user behavior, such as query logs and click streams, and to
disseminating and aggregating this information in the network, at the
discretion of the corresponding user, in order to incorporate richer
cognitive models.

On the other hand, P2P Web search also poses major challenges, one of 
them being the computation, dissemination, and efficient management of 
statistical measures that are crucial for good search strategies and 
ranking algorithms. Statistics (e.g., local and global document 
frequencies, overlap among peers' contents, PageRank-style authority) 
need to be acquired and maintained in a decentralized manner for
scalability; they need to be compact for efficient communication; and
they need to provide sufficiently accurate estimators of various
measures of interest. This talk will give an overview of our ongoing
research on P2P Web search, with emphasis on statistics-driven query
routing, decentralized PageRank computation, and exploitation of user 
behavior. The developed methods have been implemented in the Minerva 
prototype system, an experimental testbed for P2P research.