Rivka Malca, M.Sc. Thesis Seminar
Thursday, 19.7.2018, 15:00
Text processing on the web is challenging due to the use of informal and ungrammatical language. Yet it is a prominent domain for Natural Language Processing, given the growing volume of textual information communicated through Internet pages and social media platforms. In this thesis we describe a particular form of text processing on the web: syntactic parsing of web queries. We present two novel contributions: (1) we extend the transition system of a state-of-the-art transition-based parser so that it can account for the unique grammar of web queries; and (2) we present a novel deep architecture, based on BiLSTMs, that allows us to jointly train a parser with a Named Entity (NE) tagger, which we believe should improve parsing performance because NEs are frequent in web queries. In experiments, our joint model substantially outperforms previous work on the task. In particular, we show that both components of our model contribute substantially to its performance, and that these contributions are complementary.