Technical Report MSC-2019-11

Title: An Attention-RNN based approach for Named Entity Disambiguation with Noisy Texts
Authors: Yotam Eshel
Supervisors: Shaul Markovich
PDF: Currently accessible only within the Technion network
Abstract: Named entity disambiguation (NED) is the task of linking mentions of entities in text to a knowledge base, such as Freebase or Wikipedia. Currently, research on the task of NED is driven by a number of standard datasets. These datasets are based on news and encyclopedic texts that are naturally coherent, well-structured and rich. However, texts in other scenarios such as web fragments, social media or search queries are shorter, less coherent and more challenging in general.

To address the task of NED for noisy text we design a novel neural model based on RNNs and attention. Our algorithm can utilize large amounts of training samples and learn to capture the limited and noisy local context surrounding entity-mentions in noisy text. We train our model with a novel method for sampling informative negative examples. In addition, we describe a new way of initializing word and entity embeddings that significantly improves performance.

To facilitate research on NED with noisy text, we present WikilinksNED: A large-scale NED dataset of text fragments from the web that is based on the Wikilinks dataset. Our dataset is orders of magnitude larger, significantly noisier and more challenging than existing news-based datasets.

We evaluate our model on both WikilinksNED and a smaller newswire dataset and find that it significantly outperforms existing state-of-the-art methods on WikilinksNED while achieving comparable performance on the smaller dataset.

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion