Time+Place: Monday 09/01/2012 14:30 Room 337-8 Taub Bld. (NOTE UNUSUAL DAY)
Title: Efficient and Exact Inter-Sentence Decoding for Natural Language Processing
Speaker: Roi Reichart (http://people.csail.mit.edu/roiri/)
Affiliation: CS and AI Lab, MIT
Host: Johann Makowsky

Abstract:


A fundamental task in Natural Language Processing (NLP) is learning the
syntax of human languages from text.
The task is defined both at the sentence level ("syntactic parsing"), where a
syntactic tree describing the head-argument structure is to be created,
and at the word level ("part-of-speech tagging"), where every word is
assigned a syntactic category such as noun, verb, or adjective.
This syntactic analysis is an important building block in NLP applications
such as machine translation and information extraction.

While supervised learning algorithms perform very well on these tasks when
large collections of manually annotated text (corpora) exist,
creating manually annotated corpora is costly and error-prone due to the
complex nature of annotation. Since most languages and text genres do not
have large syntactically annotated corpora, developing algorithms that learn
syntax with little human supervision is of crucial importance.

The work I will describe is focused on learning better parsing and tagging
models from limited amounts of manually annotated training data.
Our key observation is that existing models for these tasks are defined at
the sentence level, keeping inference tractable at the cost of discarding
inter-sentence information. 

In this work we use Markov random fields to augment sentence-level models
for parsing and part-of-speech tagging with inter-sentence constraints. 
To handle the resulting inference problem, we present a dual decomposition 
algorithm for efficient, exact decoding of such global objectives. We apply 
our model to the lightly supervised setting and show significant improvements
over strong sentence-level models across six languages.

Our technique is general and can be applied to other structured prediction
problems in natural language processing and in other fields, to enable inference 
over large collections of data.

Joint work with Alexander Rush, Amir Globerson and Michael Collins.


Short bio:

Roi Reichart is a post-doctoral associate at the Computer Science and
Artificial Intelligence Laboratory at the Massachusetts Institute of
Technology (MIT).
He is a member of the natural language processing group of Professor Regina
Barzilay. Before that he completed his PhD (June 2010) at the Hebrew
University under the supervision of Prof. Ari Rappoport.

His main research interests are unsupervised and semi-supervised learning
in NLP, especially for syntactic acquisition tasks. His paper on active
learning for syntactic parsing (together with Ari Rappoport) won the
best paper award at CoNLL 2009. He is a recipient of the ISF Bikura
fellowship for outstanding Israeli post-docs.