Abstract:
The peer-to-peer (P2P) computing paradigm is an intriguing alternative
to Google-style search engines for querying and ranking Web content. In
a network with many thousands or millions of peers, the storage and
access load requirements per peer are much lighter than for a
centralized server farm; thus more powerful techniques from information
retrieval, statistical learning, computational linguistics, and
ontological reasoning can be employed on each peer's local search engine
for boosting the quality of search results. In addition, peers can
dynamically collaborate on advanced and particularly difficult queries.
Moreover, a peer-to-peer setting is ideally suited to capture local user
behavior, such as query logs and click streams, and disseminate and
aggregate this information in the network, at the discretion of the
corresponding user, in order to incorporate richer cognitive models.
On the other hand, P2P Web search also poses major challenges, one of
them being the computation, dissemination, and efficient management of
statistical measures that are crucial for good search strategies and
ranking algorithms. Statistics (e.g., local and global document
frequencies, overlap among peers' contents, PageRank-style authority)
need to be acquired and maintained in a decentralized manner for
scalability, they need to be compact for efficient communication, and
they need to provide sufficiently accurate estimators of various
measures of interest. This talk will give an overview of our ongoing
research on P2P Web search, with emphasis on statistics-driven query
routing, decentralized PageRank computation, and exploitation of user
behavior. The developed methods have been implemented in the Minerva
prototype system, an experimental testbed for P2P research.