Abstract:
Adequate representation of natural language semantics requires
access to vast amounts of common sense and domain-specific world
knowledge. In this talk I will present a novel method,
called Explicit Semantic Analysis (ESA), for fine-grained semantic
interpretation of unrestricted natural language texts. Our method
represents meaning in a high-dimensional space of concepts derived
from Wikipedia or other large-scale human-built repositories.
We show how to build and use such semantic representations automatically.
We evaluate the effectiveness of our method on text analysis tasks
such as text categorization, semantic relatedness, and information
retrieval.
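
As a rough illustration of the concept-space idea described above, the
following is a minimal sketch, not the authors' implementation. It assumes
a toy set of three hypothetical "articles" standing in for Wikipedia, plain
TF-IDF weighting, and cosine similarity as the relatedness measure; the
names CONCEPTS, build_index, interpret and relatedness are invented for
this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-ins for Wikipedia articles (concept title -> article text).
CONCEPTS = {
    "Bank (finance)": "bank money loan deposit interest account money",
    "River": "river water bank flow stream water",
    "Computer": "computer software hardware program data",
}


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would do much more."""
    return text.lower().split()


def build_index(concepts):
    """Map each word to a {concept: TF-IDF weight} vector (an inverted index)."""
    n = len(concepts)
    tf = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    # Document frequency: in how many concept articles each word appears.
    df = Counter(w for counts in tf.values() for w in counts)
    index = defaultdict(dict)
    for c, counts in tf.items():
        for w, k in counts.items():
            index[w][c] = k * math.log(n / df[w])
    return index


def interpret(text, index):
    """Represent a text as a weighted vector over concepts."""
    vec = defaultdict(float)
    for w, k in Counter(tokenize(text)).items():
        for c, weight in index.get(w, {}).items():
            vec[c] += k * weight
    return dict(vec)


def relatedness(a, b):
    """Cosine similarity between two concept vectors."""
    dot = sum(v * b.get(c, 0.0) for c, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


index = build_index(CONCEPTS)
loan_text = interpret("money loan", index)
deposit_text = interpret("deposit interest", index)
software_text = interpret("software program", index)
```

In this toy setting, "money loan" and "deposit interest" share no words yet
map to the same concept, so their relatedness is high, while "money loan"
and "software program" map to disjoint concepts.
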
I will then present our recent development of CHESA, a compact
hierarchical representation that uses the Wikipedia category hierarchy
to represent the semantics of texts as directed acyclic graphs.
This is joint work with Ofer Egozi, Evgeniy Gabrilovich and Sonya
Liberman.