Irena Grabovitch-Zuyev, M.Sc. Thesis Seminar
Wednesday, 14.10.2015, 16:00
Facebook is the most popular online social network, with more than a billion active users. As such, Facebook has become a significant source of information, and searching within its massive body of content has become a necessity. Nevertheless, very little research has been done on Facebook Search, because researchers who are not Facebook employees find it difficult to collect Facebook data.
In this research we study entity search in Facebook. Queries are public entities, such as celebrities, companies, places, organizations, movies, and books. The entities are represented as Facebook pages. Search results (documents) are content items (posts, check-ins, status updates, shares, photos, groups) that are accessible to a particular Facebook user and are relevant to her query. Beyond directly addressing the user’s need to search her content in Facebook, entity search can be useful for recommending interesting content to the user and for generating a user model that can lead to better personalized services and ad targeting.
Searching within Facebook content is difficult because documents are short and rife with slang and other social network jargon. Our search algorithm tackles this challenge by using a rich representation of each query entity, including aliases in various languages and related entities and terms. This, together with aggressive stemming, allows our algorithm to retrieve even short and informal documents that refer to the entity by various nicknames in different languages. The rich entity representation is also used to score documents based on their similarity to the entity’s related terms and to discard those that either refer to other entities with the same name or refer to the entity only marginally.
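The retrieval idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the crude suffix-stripping `stem` function, the overlap-based `score`, the `threshold` value, and the toy entity and documents are all assumptions made for illustration only.

```python
import re

def stem(token):
    # Crude suffix stripping; a stand-in for a real (aggressive) stemmer.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def tokenize(text):
    # Lowercase, split on non-word characters, then stem each token.
    return [stem(t) for t in re.findall(r"\w+", text.lower())]

def mentions_alias(doc_tokens, aliases):
    # True if any alias occurs as a contiguous token sequence in the document.
    for alias in aliases:
        alias_tokens = tokenize(alias)
        n = len(alias_tokens)
        if n and any(doc_tokens[i:i + n] == alias_tokens
                     for i in range(len(doc_tokens) - n + 1)):
            return True
    return False

def score(doc_tokens, related_terms):
    # Fraction of the entity's related terms that appear in the document
    # (a hypothetical similarity measure, chosen for simplicity).
    related = {stem(t.lower()) for t in related_terms}
    if not related:
        return 0.0
    return len(related & set(doc_tokens)) / len(related)

def search(documents, entity, threshold=0.2):
    # Keep documents that mention an alias and are similar enough to the
    # entity's related terms; return them ordered by descending score.
    scored = []
    for doc in documents:
        tokens = tokenize(doc)
        if mentions_alias(tokens, entity["aliases"]):
            s = score(tokens, entity["related_terms"])
            if s >= threshold:
                scored.append((s, doc))
    return [doc for s, doc in sorted(scored, key=lambda p: p[0], reverse=True)]

# Toy data (made up for this sketch).
entity = {
    "aliases": ["Apple", "Apple Inc"],
    "related_terms": ["iphone", "mac", "ios", "store"],
}
documents = [
    "Just got the new iPhone from the Apple store!",
    "I ate an apple for lunch",
    "apple pie recipes",
]
results = search(documents, entity)
```

In this toy run only the first document survives: all three mention the alias "apple", but the other two share no related terms with the entity, which mirrors how related terms can separate the company from the fruit.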
To generate a rich entity representation, we reconcile Facebook pages with Freebase entities and use the content of both the Facebook pages and the corresponding Freebase entries to derive aliases, related terms, and related entities. This reconciliation module could be of independent interest.
We evaluated our search algorithm on content collected from six Facebook accounts, covering items posted by 1,000 Facebook users. For almost all categories of entities, our algorithm achieves 88% precision and more than 70% recall.
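For readers unfamiliar with the two metrics, the following short sketch shows how precision and recall are computed from a set of retrieved items and a set of relevant items. The function name and the item IDs are hypothetical, and the numbers are toy values, not results from the thesis.

```python
def precision_recall(retrieved, relevant):
    # Precision: share of retrieved items that are relevant.
    # Recall: share of relevant items that were retrieved.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: 4 items retrieved, 5 items truly relevant, 3 in common.
precision, recall = precision_recall([1, 2, 3, 4], [1, 2, 3, 5, 6])
```

Here three of the four retrieved items are relevant, and three of the five relevant items were found.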
*The presentation will be given in Hebrew.