Technical Report MSC-2011-17

Title: Supervised Learning of Semantic Relatedness
Authors: David Yanay
Supervisors: Ran El-Yaniv
Abstract: We propose and study a novel supervised approach to learning semantic relatedness from examples. Using an empirical risk minimization approach, our algorithm computes a weighted measure of term co-occurrence with respect to a corpus of text documents and uses the labeled examples to fit the model to the training sample. Our method is corpus independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts. We present the results of a range of experiments, from large to small scale. Evaluation over the WordSim353 benchmark shows significant improvements in correlation over the state of the art, using either a reduced (older) version of Wikipedia or the books in the Project Gutenberg collection. These results indicate that the proposed method is effective and competitive with the state of the art.
Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URLs of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion