TAGGING

1.

DeRose
Grammatical Category Disambiguation by Statistical Optimization. Computational Linguistics, 14 (1988).

2.

Church
Current Practice in Part of Speech Tagging and Suggestions for the Future.

3.

Kupiec
Robust part-of-speech tagging using a hidden Markov model

HMM-based approach. Models word equivalence classes (each determined by the set of possible tags) rather than individual words; the 100 most common words, however, are treated as singleton classes. Discusses problematic words and constructs.
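The equivalence-class idea can be sketched as follows. This is a minimal illustration, not Kupiec's implementation: the function name `ambiguity_classes`, the toy lexicon, and the counts are all assumptions made up for the example. Words that share the same set of possible tags fall into one class, while the most frequent words each keep a class of their own.

```python
from collections import Counter

def ambiguity_classes(lexicon, word_counts, n_singletons=100):
    """Map each word to an equivalence class (hypothetical helper).

    lexicon:      dict mapping word -> set of possible tags
    word_counts:  dict mapping word -> corpus frequency
    n_singletons: how many of the most frequent words get their own class
    """
    frequent = {w for w, _ in Counter(word_counts).most_common(n_singletons)}
    classes = {}
    for word, tags in lexicon.items():
        if word in frequent:
            # A frequent word is modeled individually.
            classes[word] = ("WORD", word)
        else:
            # All other words are identified only by their possible tags.
            classes[word] = ("TAGS", tuple(sorted(tags)))
    return classes

# Toy data, invented for illustration only.
lexicon = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "will": {"MD", "NN", "VB"},
    "run": {"NN", "VB"},
    "walk": {"NN", "VB"},
}
counts = {"the": 1000, "can": 50, "will": 40, "run": 10, "walk": 5}
classes = ambiguity_classes(lexicon, counts, n_singletons=1)
```

With this grouping the HMM emits classes instead of words, so the number of emission parameters depends on the number of distinct tag sets rather than on the vocabulary size.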

4.

(a): Cutting Kupiec Pedersen Sibun
A practical part-of-speech tagger. Xerox PARC.
3rd Conference on Applied Natural Language Processing (ANLP), ACL, Trento, Italy, 1992.
Uses an HMM, with attention to practical considerations. Includes some HMM theory; the practical discussion is of less importance.
(b): Doug Cutting and Jan Pedersen
The Xerox Part-of-Speech Tagger, version 1.0. April, 1993.

5.

Charniak Hendrickson Jacobson Perkowitz
Equations for Part-of-Speech Tagging

Develops the equations for bigram and trigram tagging models. Discusses zero-probability words and morphological variants.
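For orientation, the textbook form of the bigram and trigram tagging equations is given below. The notation is an assumption and may differ from the paper's: w_1 ... w_n are the words, t_1 ... t_n the tags, and t_0, t_{-1} are sentence-start pseudo-tags. The treatment of zero-probability words (smoothing) is not shown.

```latex
% Bigram model: each tag depends on the previous tag only.
T^{*} = \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)

% Trigram model: each tag depends on the two previous tags.
T^{*} = \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} P(t_i \mid t_{i-2}, t_{i-1}) \, P(w_i \mid t_i)
```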

6.

Kempe
Probabilistic tagging with feature structures

In morphologically rich languages, tags may carry features (gender, number, etc.). Considers grouping and other methods to overcome the resulting sparse data.

7.

Kempe
A probabilistic tagger and an analysis of tagging errors
Yet another HMM-based tagger with 96% accuracy. Sources of error include incomplete tag information and data sparseness, among others.

8.

Schmid
Probabilistic Part-of-Speech Tagging Using Decision Trees
Uses decision trees and the ID3 algorithm to estimate probabilities. Performs better than trigrams on small training corpora; on large corpora it achieves 96% accuracy.

9.

Brill
Learns tagging rules automatically. Initially, every word is tagged with its most likely tag; the tags are then transformed by rules that depend on the assigned tags and on the words themselves.

(a): A Simple Rule-Based Part of Speech Tagger
(b): Some Advances in Transformation-Based Part of Speech Tagging
(c): Transformation-based error-driven learning and NLP: A case study in part-of-speech tagging.
Computational Linguistics 21, 543-565.
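The tagging (not the learning) side of the transformation-based approach can be sketched as below. This is only an illustration under stated assumptions: the helper names `initial_tags` and `apply_rules`, the toy lexicon, and the two rules are invented for the example and are not taken from Brill's papers. Each rule is written as (from_tag, to_tag, condition), and a change takes effect immediately as the sentence is scanned left to right, which is one of several possible application orders.

```python
def initial_tags(words, most_likely, default="NN"):
    """Step 1: tag every word with its most likely tag (hypothetical helper)."""
    return [most_likely.get(w, default) for w in words]

def apply_rules(words, tags, rules):
    """Step 2: apply each transformation in order over the whole sentence.

    Each rule is (from_tag, to_tag, condition), where condition(words, tags, i)
    may inspect both the current tags and the words themselves.
    """
    tags = list(tags)  # do not modify the caller's list
    for from_tag, to_tag, condition in rules:
        for i in range(len(tags)):
            if tags[i] == from_tag and condition(words, tags, i):
                tags[i] = to_tag
    return tags

# Toy lexicon and rules, invented for illustration only.
most_likely = {"the": "DT", "can": "MD", "rusted": "VBD",
               "I": "PRP", "want": "VBP", "to": "TO", "race": "NN"}
rules = [
    # Change MD to NN when the previous tag is DT ("the can").
    ("MD", "NN", lambda w, t, i: i > 0 and t[i - 1] == "DT"),
    # Change NN to VB when the previous tag is TO ("to race").
    ("NN", "VB", lambda w, t, i: i > 0 and t[i - 1] == "TO"),
]

words1 = ["the", "can", "rusted"]
tags1 = apply_rules(words1, initial_tags(words1, most_likely), rules)

words2 = ["I", "want", "to", "race"]
tags2 = apply_rules(words2, initial_tags(words2, most_likely), rules)
```

In the learning phase, which is not shown here, rules of this shape would be selected one at a time according to how many tagging errors each removes on a hand-tagged corpus.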

10.

Merialdo
Tagging Text with a Probabilistic Model

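Compares ML (maximum likelihood) tagging, which picks the most probable tag for each word individually, with Viterbi tagging, which picks the most probable tag sequence for the whole sentence. Also considers some variants.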
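The two decoding criteria can be contrasted on a toy HMM. The sketch below is an assumption-laden illustration, not Merialdo's setup: the two-tag model, its probabilities, and the function names `viterbi` and `posterior_tags` are invented. `viterbi` returns the single most probable tag sequence, while `posterior_tags` uses the forward-backward quantities to pick the most probable tag at each position separately.

```python
def viterbi(obs, states, start, trans, emit):
    """Most probable tag sequence for the whole sentence."""
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []  # back[k][s] = best predecessor of state s at position k + 1
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] * trans[r][s])
            delta[s] = prev[best] * trans[best][s] * emit[s][o]
            ptr[s] = best
        back.append(ptr)
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def posterior_tags(obs, states, start, trans, emit):
    """Most probable tag at each position, chosen independently."""
    # Forward pass: alpha[t][s] = P(o_1..o_t, tag_t = s)
    alpha = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        alpha.append({s: emit[s][o] * sum(alpha[-1][r] * trans[r][s]
                                          for r in states) for s in states})
    # Backward pass: beta[t][r] = P(o_{t+1}..o_n | tag_t = r)
    beta = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        beta.insert(0, {r: sum(trans[r][s] * emit[s][o] * beta[0][s]
                               for s in states) for r in states})
    # alpha * beta is proportional to the per-position tag posterior.
    return [max(states, key=lambda s: a[s] * b[s]) for a, b in zip(alpha, beta)]

# Toy model, invented for illustration only.
states = ["N", "V"]
start = {"N": 0.6, "V": 0.4}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"fish": 0.8, "sleep": 0.2}, "V": {"fish": 0.5, "sleep": 0.5}}
obs = ["fish", "sleep"]

viterbi_path = viterbi(obs, states, start, trans, emit)
ml_path = posterior_tags(obs, states, start, trans, emit)
```

On this small example the two criteria agree; in general they can disagree, since the per-word choice maximizes expected tag accuracy while Viterbi maximizes the probability of the sequence as a whole.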

11.

Matsukawa Miller Weischedel
Example Based Correction of Word Segmentation and Part of Speech Labellings

Addresses the problem of word segmentation in Japanese. Uses three passes: a morphological analyzer, a trained rule-based correction module, and an HMM tagger. The system is learned from a training set of very limited size. The role of the correction module is to reduce the size of the training set.

12.

Macklovitch
Where the Tagger Falters
Analyzes errors made by statistical taggers. Suggests a linguistically based FSA to detect such errors (the experiment concerns past-tense vs. past-participle verbs). Comments on the success of this approach.
Nice paper. Linguistic background is needed.

13.

Shaked
How do we count? The Problem of Tagging Phrasal Verbs in PARTS

Discusses the deficiency of the trigram model for distinguishing between particles and prepositions.

Alon &
2002-04-11