Abstract:
In the past decade, Web search engines have evolved from a first generation
based on classic IR algorithms scaled to Web size and thus supporting only
informational queries, to a second generation supporting navigational
queries using Web specific information (primarily link analysis), to a third
generation enabling transactional and other "semantic" queries based on a
variety of technologies and external information sources aimed at directly
satisfying the unexpressed "user intent". At the same time, the Web is still
expanding, the number and cultural diversity of Web users are still growing,
and the average Web query is still infamously just 2.4 words long, so
inferring this user intent is as challenging as ever.
"Rare" queries, which in aggregate represent a significant portion of the
query volume, are often completely incomprehensible to an outside
observer; nevertheless, the goal of many of these queries becomes quite
clear once we study their results. Starting from this simple observation,
in a series of papers [SIGIR2007, SIGIR2008, CIKM2008] we have developed a
robust methodology of "query understanding" based on viewing each search
result of a query as an independent source of information about its intent.
Applications include query classification, improved search advertising,
query substitution for optimizing relevance and revenue in ad search, and
cross-lingual taxonomy re-use for query classification.
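
As a rough illustration of treating each search result as an independent
source of evidence about intent, the sketch below sums per-result class
scores and averages them into a query-level ranking. It is a minimal sketch
under stated assumptions, not the method of the cited papers: the function
name classify_query, the class labels, and the scores are hypothetical, and
each result is assumed to have already been scored against some taxonomy by
an unspecified document classifier.

```python
from collections import defaultdict


def classify_query(result_class_scores, top_k=1):
    """Rank taxonomy classes for a query from its search results.

    result_class_scores: one dict per search result, mapping a class
    label to a score (hypothetical output of a document classifier).
    Returns up to top_k (label, mean_score) pairs, best first.
    """
    n = len(result_class_scores)
    if n == 0:
        return []

    # Each result votes independently; votes are summed per class.
    totals = defaultdict(float)
    for scores in result_class_scores:
        for label, score in scores.items():
            totals[label] += score

    # Sort by descending total, breaking ties alphabetically by label.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

    # Average over all results, so a class absent from a result counts as 0.
    return [(label, total / n) for label, total in ranked[:top_k]]


# Hypothetical per-result scores for an opaque, "rare" query.
example_results = [
    {"autos": 0.75, "travel": 0.25},
    {"autos": 0.5, "finance": 0.5},
    {"autos": 1.0},
]
top_classes = classify_query(example_results, top_k=2)
```

Simple averaging is only one possible aggregation; weighting each result by
its rank or by classifier confidence would be an equally plausible choice.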