Technical Report MSC-2019-10

Title: Latent Entities Extraction: How to Extract Entities that Do Not Appear in the Text?
Authors: Eylon Shoshan
Supervisors: Kira Radinsky
PDF: Currently accessible only within the Technion network
Abstract: Named-entity Recognition (NER) is an important task in the NLP field, and is widely used to solve many challenges. However, in many scenarios, not all of the entities are explicitly mentioned in the text. Sometimes they can be inferred from the context or from other indicative words. Consider the following sentence: "CMA can easily hydrolyze into free acetic acid". Although water is not mentioned explicitly, one can infer that H2O is an entity involved in the process.
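
To make the distinction concrete, the following minimal sketch separates the labeled entities of a text into those that appear explicitly and those that are latent. It illustrates only the task setting, not the neural models of this report. The function name `split_explicit_latent`, the alias lists, and the choice to label "CMA" and "acetic acid" as entities are assumptions made for the example.

```python
import re


def split_explicit_latent(text, entity_aliases):
    """Partition labeled entities into explicit and latent ones.

    entity_aliases (hypothetical format) maps an entity name to the
    surface forms that count as an explicit mention of that entity.
    """
    lowered = text.lower()
    explicit, latent = [], []
    for entity, aliases in entity_aliases.items():
        # An entity is explicit if any alias occurs as a whole word or phrase.
        mentioned = any(
            re.search(r"\b" + re.escape(alias.lower()) + r"\b", lowered)
            for alias in aliases
        )
        (explicit if mentioned else latent).append(entity)
    return explicit, latent


# Example sentence from the abstract; the label set is assumed for illustration.
sentence = "CMA can easily hydrolyze into free acetic acid"
labels = {
    "CMA": ["CMA"],
    "acetic acid": ["acetic acid"],
    "water": ["water", "H2O"],
}
explicit, latent = split_explicit_latent(sentence, labels)
print("explicit:", explicit)
print("latent:", latent)
```

Under these assumed labels, water ends up in the latent list because neither "water" nor "H2O" occurs in the sentence. A standard NER system, which tags spans of the text, could not return it.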

In this work, we present the problem of Latent Entities Extraction (LEE). We present several methods for determining whether entities are discussed in a text, even though, potentially, they are not explicitly written. Specifically, we design a neural model that handles extraction of multiple entities jointly. We show that our model, along with a multi-task learning approach and a novel task grouping algorithm, reaches high performance in identifying latent entities.

Moreover, we propose an additional novel neural architecture for LEE which leverages a context-conditioned autoencoder for classification. Once the model is trained, we exploit its generative nature to produce the classification with a multiple sampling technique. We show that our model scales well as the number of entities grows.

Our experiments are conducted on two datasets: (1) A large biological dataset from the biochemical field. The dataset contains text descriptions of biological processes, and for each process, all of the entities involved in the process are labeled, including implicitly mentioned ones. (2) A new dataset that we construct on top of Twitter data, which is designed to conform to the settings of the latent entities extraction task. We believe LEE is a task that will significantly improve many NER and subsequent applications and improve text understanding and inference.

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion