Ariel Raviv, M.Sc. Thesis Seminar
Wednesday, 14.12.2011, 12:30
The task of automatically determining the correct sense of a polysemous word has remained a challenge to this day. It is crucial in many Natural Language Processing (NLP) applications, such as speech recognition, information retrieval, machine translation, and computational advertising. In our research, we introduce Concept-Based Disambiguation (CBD), a novel framework that utilizes recent semantic analysis techniques to represent both the context of the word and its senses in a high-dimensional space of natural concepts. The concepts are retrieved from a vast encyclopedic resource, thus enriching the disambiguation process with a large amount of domain-specific knowledge. In such concept-based spaces, more comprehensive similarity measures can be applied to select the correct sense. Additionally, we introduce a novel representation scheme, denoted Anchored Representation, which builds a more specific representation for a text associated with an anchoring word. We evaluate our framework using two recent text representation schemes, Explicit Semantic Analysis (ESA) and Compact Hierarchical Explicit Semantic Analysis (CHESA), together with their anchored counterparts, and show that the anchored representation is better suited to the task of Word Sense Disambiguation (WSD). Finally, we evaluate our system in a coarse-grained setting and show that it outperforms state-of-the-art learning-based methods that exploit large annotated corpora, and is comparable to recent state-of-the-art unsupervised methods that make use of vast knowledge bases.
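The core disambiguation step described above can be sketched as follows: the context of the ambiguous word and each of its candidate senses are represented as vectors over encyclopedic concepts (as in ESA), and the sense whose concept vector is most similar to the context's is selected. This is a minimal illustrative sketch, not the thesis implementation; the concept vectors, weights, and sense inventory below are toy data, whereas real ESA vectors would be derived from a large encyclopedic resource such as Wikipedia.

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse concept vectors,
    # each given as a dict mapping concept name -> weight.
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def disambiguate(context_vec, sense_vecs):
    # Pick the sense whose concept-space representation is
    # closest to the concept-space representation of the context.
    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))

# Toy example: disambiguating "bank" in a financial context.
context = {"Finance": 0.9, "Loan": 0.7, "Interest rate": 0.5}
senses = {
    "bank (financial institution)": {"Finance": 0.8, "Loan": 0.6, "Money": 0.4},
    "bank (river)": {"River": 0.9, "Erosion": 0.5, "Geography": 0.4},
}
print(disambiguate(context, senses))  # -> bank (financial institution)
```

In an anchored variant, the context vector would additionally be restricted to concepts related to the anchoring word, yielding a more specific representation before the same similarity comparison is applied.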