Yoav Nahshon, M.Sc. Thesis Seminar
Wednesday, 7.2.2018, 13:30
Text written in natural language carries valuable information that is concealed
within it. Information Extraction (IE) is the task of automatically
extracting this information into a structured representation. Standard relational
database systems, which are highly suitable for representing structured
information, are in fact incapable of performing deep text analysis, and
therefore out-of-database solutions are often applied. However, this approach
is prone to laborious development processes, complex and tangled programs, and
inefficient control flows. These deficiencies have given rise to declarative
solutions that automate significant parts of the manual work. Nevertheless,
such frameworks typically stitch together various programming components and
technologies, and may lack a unifying theory.
In this work we present a novel framework that extends the relational model to
text and uniformly represents the key players of a typical IE
task: the unstructured data (text), the structured data (extracted
information), and the functions that carry out the transformations from the
former to the latter. In addition, we report initial results on
expressive power, introduce an optimization technique enabled by
the understanding of the data flow that our formalism provides, and present an
implementation of our framework.