Technical Report MSC-2018-27

Title: Neural Transition Based Parsing of Web Queries: An Entity Based Approach
Authors: Rivka Malca
Supervisors: Roi Reichart
PDF: Currently accessible only within the Technion network
Abstract: Web queries with question intent manifest a complex syntactic structure that consists of one or more independent sub-structures. Processing this structure is important for interpreting such queries. [PRS16] formalized the grammar of these queries and proposed semi-supervised algorithms for adapting parsers originally designed for the standard dependency grammar, so that they can account for the unique forest grammar of queries. However, their algorithms rely on resources that are typically not available outside large web corporations.

We propose a new bidirectional LSTM (BiLSTM) query parser that: (1) explicitly accounts for the unique grammar of web queries using a new transition system that consists of a new set of transitions and configurations; and (2) utilizes named entity (NE) information from a BiLSTM NE tagger, which can be jointly trained with the parser. The proposed parser is a transition-based parser with a dedicated transition system that contains a new transition, denoted PushToSeg, which reflects the need to segment the query as part of the parsing process. To train our model, we annotate the query treebank of [PRS16] with NEs. When trained on 2500 annotated queries, our segmentation- and NE-aware parser achieves a UAS of 83.5% and a segmentation F1-score of 84.5, substantially outperforming existing state-of-the-art parsers.
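
To make the idea of segmenting during parsing concrete, here is a minimal toy sketch of a transition system that produces a forest rather than a single tree. It is not the transition system of the report: the abstract does not define the semantics of PushToSeg, so the `push_to_seg` function below (close the open segment and mark the tokens left on the stack as segment roots) is an assumption made for illustration only. The names `Config`, `shift`, `left_arc`, `right_arc`, `parse`, and `uas`, as well as the four-token example, are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Parser configuration: buffer, stack, arcs, and query segments."""
    buffer: list                                   # token indices still to read
    stack: list = field(default_factory=list)      # partially processed tokens
    heads: dict = field(default_factory=dict)      # dependent -> head (0 = root)
    segments: list = field(default_factory=list)   # closed segments
    current: list = field(default_factory=list)    # tokens of the open segment


def shift(c):
    """Move the next buffer token onto the stack and into the open segment."""
    tok = c.buffer.pop(0)
    c.stack.append(tok)
    c.current.append(tok)


def left_arc(c):
    """Attach the second-from-top stack item as a dependent of the top item."""
    dep = c.stack.pop(-2)
    c.heads[dep] = c.stack[-1]


def right_arc(c):
    """Attach the top stack item as a dependent of the item below it."""
    dep = c.stack.pop()
    c.heads[dep] = c.stack[-1]


def push_to_seg(c):
    """ASSUMED semantics: close the open segment; leftover stack items become roots."""
    for tok in c.stack:
        c.heads[tok] = 0
    c.segments.append(c.current)
    c.stack = []
    c.current = []


def parse(n_tokens, transitions):
    """Apply a given transition sequence to tokens 1..n_tokens."""
    c = Config(buffer=list(range(1, n_tokens + 1)))
    for transition in transitions:
        transition(c)
    return c.heads, c.segments


def uas(pred_heads, gold_heads):
    """Unlabeled attachment score: fraction of tokens with the correct head."""
    correct = sum(1 for dep, head in gold_heads.items() if pred_heads.get(dep) == head)
    return correct / len(gold_heads)


# Hypothetical four-token query split into two independent two-token segments.
heads, segments = parse(
    4,
    [shift, shift, left_arc, push_to_seg, shift, shift, left_arc, push_to_seg],
)
```

In this sketch the transition sequence is supplied by hand. In a neural transition-based parser, a classifier (here, one built on BiLSTM features) would choose the next transition from the current configuration.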

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.


Computer science department, Technion