Liat Peterfreund (CS, Technion)
Wednesday, 9.5.2018, 12:30
A recent principled approach to information extraction from text views the query as an ordinary relational query, but not on ordinary relations; instead, the relations are tuples of intervals, or "spans," extracted from text documents. A prominent example is Fagin et al.'s "regular spanners," which are the closure of regular expressions with capture variables (used for tokenization, n-grams, segmentation, etc.) under the relational algebra.
In this talk I will present several research directions motivated by this framework. In the first, we study the complexity of evaluating regular spanners. In the second, we inspect the connection between the algebraic structure of the query and the complexity of its evaluation. In the third, we explore the impact of using recursive Datalog instead of relational algebra.