Haggai Toledano, M.Sc. Thesis Seminar
Wednesday, 30.5.2012, 13:30
Many text processing tasks rely on estimating the semantic relatedness between texts. For example, in information retrieval, the relevance of a document can be determined by its semantic distance from the query. Recently, many algorithms have been developed for evaluating semantic relatedness based on a conceptual representation of the input texts. In most cases, the concept spaces for these algorithms are built on large knowledge repositories such as Wikipedia and WordNet. These large concept spaces often yield representations consisting of very large collections of concepts. In many cases this hurts performance on semantic tasks, because redundancy gives an artificially large weight to less relevant concepts and thus hides important semantic aspects of the texts. In this work we present a new algorithm for constructing conceptual representations based on hierarchical concept spaces. The algorithm incrementally adds strongly associated concepts to the representation, while using the hierarchical structure of the semantic database to maximize coverage. We test the new algorithm on text relatedness tasks and show its advantage over existing approaches.
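
The general idea of incremental, hierarchy-aware concept selection can be illustrated with a small sketch. This is not the algorithm presented in the talk, whose details the abstract does not give. It is a simple greedy stand-in under stated assumptions: each candidate concept has an association score with the input text, the hierarchy is a parent map, and a candidate is treated as redundant when it is an ancestor or descendant of a concept already chosen. The names `select_concepts`, `ancestors`, `scores`, `parent`, and `k` are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def ancestors(concept: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Collect every ancestor of `concept` by walking the parent map upward."""
    found: Set[str] = set()
    node = parent.get(concept)
    while node is not None and node not in found:  # second check guards against cycles
        found.add(node)
        node = parent.get(node)
    return found


def select_concepts(
    scores: Dict[str, float],
    parent: Dict[str, Optional[str]],
    k: int,
) -> List[str]:
    """Greedily pick up to k concepts, strongest association first.

    Assumption (illustrative only): a candidate is redundant if it lies on the
    same root-to-leaf path as an already selected concept, so each pick covers
    a different branch of the hierarchy.
    """
    selected: List[str] = []
    # Sort by descending score; ties are broken alphabetically for determinism.
    for concept in sorted(scores, key=lambda c: (-scores[c], c)):
        if len(selected) == k:
            break
        redundant = any(
            concept in ancestors(chosen, parent) or chosen in ancestors(concept, parent)
            for chosen in selected
        )
        if not redundant:
            selected.append(concept)
    return selected


# Toy hierarchy and association scores, invented for illustration.
parent = {"animal": None, "dog": "animal", "puppy": "dog", "vehicle": None, "car": "vehicle"}
scores = {"dog": 0.9, "puppy": 0.8, "animal": 0.7, "car": 0.6, "vehicle": 0.2}
print(select_concepts(scores, parent, 2))
```

In this toy example, "puppy" and "animal" score higher than "car" but are skipped because they share a branch with "dog", so the second slot goes to a concept from a different part of the hierarchy.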