Technical Report MSC-2012-27

TR#: MSC-2012-27
Class: MSC
Title: Coverage-Driven Refinement of Conceptual Representations
Authors: Haggai Toledano
Supervisors: Shaul Markovitch
PDF: MSC-2012-27.pdf
Abstract: Many text processing tasks rely on estimating the semantic relatedness between texts. In information retrieval, for example, the relevance of a document can be determined by its semantic distance from the query. Many recent algorithms evaluate semantic relatedness using a conceptual representation of the input texts. In most cases, the concept spaces for these algorithms are derived from large knowledge repositories such as Wikipedia and WordNet. Drawing on these repositories, such representations can use more natural concepts and semantic relations than earlier methods based on statistical corpus analysis. However, large concept spaces often yield representations that consist of very large collections of concepts. In many cases this harms the performance of semantic tasks: the resulting redundancy gives disproportionately large weight to less relevant concepts, thereby obscuring important semantic aspects of the texts. In this work we present a new algorithm that produces semantic interpretations of texts in the form of conceptual representations based on hierarchical concept spaces. The algorithm incrementally adds strongly associated concepts to the representation, using the hierarchical structure of the semantic database to maximize coverage. Inherent to this algorithm is the problem of finding an acceptable trade-off between concept coverage, which enables a more detailed semantic interpretation of the texts, and concept redundancy, which degrades the performance of semantic tasks. We propose a solution to this problem that uses the hierarchical structure of the semantic database to compute a stopping condition for the algorithm. We evaluate the new algorithm on text relatedness tasks and demonstrate its advantage over existing approaches.
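The abstract describes the method only at a high level, so the following is a minimal illustrative sketch, not the report's algorithm. It assumes a toy concept hierarchy (the hypothetical PARENT map), treats a concept as "covering" itself and all of its ancestors, and greedily adds concepts in order of association score, skipping any whose marginal coverage gain falls below a hypothetical threshold min_gain. Relatedness is then computed as the cosine similarity of the refined concept-weight vectors, which is an assumed (though common) choice for comparing concept vectors. The names PARENT, path_to_root, refine, min_gain and cosine are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy: concept -> parent concept (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Science": None,
    "Physics": "Science",
    "Quantum mechanics": "Physics",
    "Thermodynamics": "Physics",
    "Biology": "Science",
    "Genetics": "Biology",
}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by all of its ancestors in the hierarchy."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def refine(scores: Dict[str, float], min_gain: float = 0.3) -> Dict[str, float]:
    """Greedily select concepts in descending order of association score.

    A candidate's marginal gain is the fraction of its root path not yet
    covered by previously selected concepts. Candidates whose gain is below
    min_gain are treated as redundant and skipped (a hypothetical rule).
    """
    selected: Dict[str, float] = {}
    covered: Set[str] = set()
    for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        path = path_to_root(concept)
        gain = len(set(path) - covered) / len(path)
        if gain < min_gain:
            continue  # mostly redundant with concepts already selected
        selected[concept] = score
        covered.update(path)
    return selected


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse concept-weight vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    text_a = {"Quantum mechanics": 0.9, "Physics": 0.6,
              "Thermodynamics": 0.5, "Genetics": 0.1}
    text_b = {"Thermodynamics": 0.8, "Physics": 0.7, "Science": 0.2}
    refined_a = refine(text_a)
    refined_b = refine(text_b)
    print(refined_a)  # {'Quantum mechanics': 0.9, 'Thermodynamics': 0.5, 'Genetics': 0.1}
    print(refined_b)  # {'Thermodynamics': 0.8}
    print(cosine(refined_a, refined_b))
```

In this toy run, "Physics" is dropped from both texts because it is already covered by a more specific, more strongly associated concept. The threshold only filters redundancy; the report's actual stopping condition, which the abstract says is derived from the hierarchy, is not specified here.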
Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2012/MSC/MSC-2012-27), rather than to the URL of the PDF files directly. The latter URLs may change without notice.


Computer science department, Technion