Anna Feldman (Ohio State University): Portable Language Technology: A resource-light approach to morphological tagging
(joint work with Jiri Hana and Chris Brew)

Abstract
Part-of-speech tagging is essential for many NLP tasks, and taggers are needed both for resource-rich languages (such as English or Czech) and for resource-poor languages (such as Russian). Because languages and tagsets vary widely, it cannot be assumed that the same tagging methodology will be appropriate in all cases (Elworthy 1995). But linguists do have useful knowledge of the probable relationships between languages, so it is natural to ask whether these relationships can be pressed into service for the rapid development of effective taggers. In this talk, I will describe a resource-light system for the automatic morphological analysis and tagging of Russian. We eschew the use of extensive resources (particularly large annotated corpora and lexicons), exploiting instead (1) pre-existing annotated corpora of Czech and (2) an unannotated corpus of Russian. We use a resource-light morphological analyzer (Hana 2004) and an automatically derived lexicon of Russian (Hana 2004), combine their output with information derived from Czech, and use the TnT tagger (Brants 2000) in a number of different ways, including committee-based modes. We show that this approach is beneficial, and present what we believe to be one of the first full evaluations of a Russian tagger in the openly available literature.