|Title:||Concept-Based Approach to Word-Sense Disambiguation
|Abstract:||Automatically determining the correct sense of a polysemous word remains a challenge. It is crucial in many natural language processing (NLP) applications such as speech recognition, information retrieval, machine translation and computational advertising. In our research, we introduce Concept-Based Disambiguation (CBD), a novel framework that utilizes recent semantic analysis techniques to represent both the context of the word and its senses in a high-dimensional space of natural concepts. The concepts are retrieved from a vast encyclopedic resource, thus enriching the disambiguation process with large amounts of domain-specific knowledge. In such concept-based spaces, more comprehensive measures can be applied to pick the right sense. Additionally, we introduce a novel representation scheme, denoted anchored representation, that builds a more specific text representation associated with an anchoring word. We evaluated our framework using two recent text representation schemes, Explicit Semantic Analysis (ESA) and Compact Hierarchical Explicit Semantic Analysis (CHESA), and their two anchored counterparts, and showed that the anchored representation is better suited to the task of word sense disambiguation (WSD). Finally, we show that our system is superior to state-of-the-art methods when evaluated on domain-specific corpora, and is competitive with recent methods when evaluated on a general corpus.|
|Copyright||The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.|
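The general idea described in the abstract, mapping both the context and each candidate sense into a shared concept space and then choosing the closest sense, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `WORD_CONCEPTS` index, its weights, the sense glosses, and the use of cosine similarity are toy stand-ins, not the report's actual resources or measures, and the anchored representation is not modeled because the abstract does not specify it.

```python
import math
from collections import defaultdict

# Hypothetical word -> {concept: weight} index with toy values.
# A real ESA-style index would be derived from an encyclopedic resource.
WORD_CONCEPTS = {
    "river": {"Geography": 0.9, "Water": 0.8},
    "water": {"Water": 1.0, "Geography": 0.3},
    "shore": {"Geography": 0.7, "Water": 0.6},
    "land": {"Geography": 0.8},
    "money": {"Finance": 1.0},
    "loan": {"Finance": 0.9},
    "deposit": {"Finance": 0.8},
    "institution": {"Finance": 0.5},
}


def concept_vector(text):
    """Sum the concept weights of every known word in the text."""
    vec = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in WORD_CONCEPTS.get(word, {}).items():
            vec[concept] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context, sense_glosses):
    """Return the sense whose gloss is closest to the context, plus all scores."""
    ctx = concept_vector(context)
    scores = {
        sense: cosine(ctx, concept_vector(gloss))
        for sense, gloss in sense_glosses.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical sense inventory for the ambiguous word "bank".
SENSES = {
    "bank/finance": "institution money loan deposit",
    "bank/river": "land shore river water",
}

best, scores = disambiguate("he sat on the bank of the river near the water", SENSES)
print(best, scores)
```

In this toy setting the context shares no concepts with the finance gloss, so the river sense is selected. A real system would need a far larger concept index and, per the abstract, could use more comprehensive measures than plain cosine similarity.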
Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2012/MSC/MSC-2012-07), rather than to the URL of the PDF files directly. The latter URLs may change without notice.
Computer science department, Technion