Grammatical Category Disambiguation by statistical optimization
Computational Linguistics, 14, (1988).
Current Practice in Part of Speech Tagging and Suggestions for the
Robust part-of-speech tagging using a hidden Markov model
HMM-based approach. Treats word equivalence classes (determined by
the possible tags), rather than individual words (however, the 100
most common words are treated as singleton classes).
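The grouping described above can be sketched as follows. This is only an illustration of the idea, not the paper's implementation: the function name, the lexicon format, and the toy data are assumptions made for the example.

```python
from collections import Counter


def build_equivalence_classes(lexicon, word_counts, n_singletons=100):
    """Map each word to an equivalence class (hypothetical helper).

    lexicon:     dict mapping word -> set of possible tags (assumed format)
    word_counts: dict mapping word -> corpus frequency (assumed format)
    """
    # The n most frequent words each get a class of their own.
    frequent = {w for w, _ in Counter(word_counts).most_common(n_singletons)}
    classes = {}
    for word, tags in lexicon.items():
        if word in frequent:
            classes[word] = ("WORD", word)
        else:
            # All other words sharing the same set of possible tags
            # fall into the same class.
            classes[word] = ("TAGS", tuple(sorted(tags)))
    return classes


# Toy data, invented for illustration only.
lexicon = {"the": {"DET"}, "run": {"NN", "VB"}, "walk": {"NN", "VB"}, "blue": {"JJ"}}
counts = {"the": 1000, "run": 20, "walk": 15, "blue": 5}
classes = build_equivalence_classes(lexicon, counts, n_singletons=1)
```

With this grouping, the model's emission parameters are estimated per class rather than per word, which reduces the number of parameters to train.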
Discusses problematic words and constructs.
- Cutting Kupiec Pedersen Sibun
A practical part-of-speech tagger
3rd Applied NLP, ACL, Trento Italy, 1992.
Uses an HMM and some practical considerations.
Some theory on HMM, the practical discussion is of less
- Doug Cutting and Jan Pedersen
The Xerox Part-of-Speech Tagger, version 1.0.
- Charniak Hendrickson Jacobson Perkowitz
Equations for Part-of-Speech Tagging
Develops equations for bigrams/trigrams.
Discusses zero probability words and morphological variants.
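For orientation, the usual textbook form of the bigram and trigram tagging equations is given below. The notation (w_i for the i-th word, t_i for its tag) is an assumption chosen for this note and need not match the paper's own.

```latex
% Bigram model: each tag depends on the previous tag.
\hat{t}_{1:n} = \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

% Trigram model: each tag depends on the two previous tags.
\hat{t}_{1:n} = \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-2}, t_{i-1})
```

A word never seen with a given tag in training gets P(w_i | t_i) = 0 under plain relative-frequency estimates, which is why zero-probability words need separate treatment.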
Probabilistic tagging with feature structures
Morphologically rich languages may have features (gender, number
etc.) attached to tags.
Considers grouping and other methods to overcome sparse data.
A probabilistic tagger and an analysis of tagging errors
Yet another 96% accurate HMM-based tagger.
Sources of error include incomplete tag information, data sparseness, ...
Probabilistic Part-of-Speech Tagging Using Decision Trees
Uses decision trees and the ID3 algorithm to estimate probabilities.
Outperforms trigram taggers on small training corpora;
on large corpora it achieves 96% accuracy.
Automatically learns rules for tagging.
Initially, every word is tagged with its most likely tag,
then the tag is transformed using rules that depend on the
assigned tags and on the words themselves.
- A Simple Rule-Based Part of Speech Tagger
- Some Advances in Transformation-Based Part of Speech Tagging
- Transformation-based error-driven learning and NLP:
A case study in part-of-speech tagging.
Computational Linguistics 21, 543-565.
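The tagging step described in these papers (most likely tag first, then rule-based corrections) can be sketched as below. This covers only the application of already-learned rules, not the error-driven learning itself. The rule representation, the function name, the left-to-right application order, and the toy lexicon are all assumptions made for illustration.

```python
def apply_transformations(words, most_likely_tag, rules, default_tag="NN"):
    """Tag words, then apply transformation rules in order (illustrative sketch).

    most_likely_tag: dict mapping word -> its most frequent tag (assumed format)
    rules: list of (from_tag, to_tag, condition) triples, where condition is a
           function (words, tags, i) -> bool (hypothetical representation)
    """
    # Initial state: every word receives its most likely tag.
    tags = [most_likely_tag.get(w, default_tag) for w in words]
    # Each rule is applied across the sentence, left to right.
    for from_tag, to_tag, condition in rules:
        for i in range(len(words)):
            if tags[i] == from_tag and condition(words, tags, i):
                tags[i] = to_tag
    return tags


# Toy lexicon and a single invented rule: change NN to VB after the tag TO.
most_likely_tag = {"to": "TO", "the": "DT", "race": "NN"}
rules = [("NN", "VB", lambda words, tags, i: i > 0 and tags[i - 1] == "TO")]

after_to = apply_transformations(["to", "race"], most_likely_tag, rules)
after_det = apply_transformations(["the", "race"], most_likely_tag, rules)
```

Here "race" is retagged as a verb only when it follows "to"; after "the" it keeps its initial noun tag.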
Tagging Text with a Probabilistic Model
Compares maximum-likelihood (ML) and Viterbi tagging, and some variants.
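The two decoding strategies can be contrasted with a small sketch. It assumes that "ML" tagging means picking, for each word separately, the tag with the highest probability given the whole sentence, whereas Viterbi tagging picks the single most probable tag sequence. The function names, the dictionary-based model format, and the toy probabilities are invented for illustration.

```python
def viterbi_tags(obs, states, start, trans, emit):
    """Return the single most probable tag sequence for obs."""
    # paths[s] = (probability, best sequence ending in state s)
    paths = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_paths = {}
        for s in states:
            p, seq = max(
                ((paths[r][0] * trans[r][s], paths[r][1]) for r in states),
                key=lambda x: x[0],
            )
            new_paths[s] = (p * emit[s][o], seq + [s])
        paths = new_paths
    return max(paths.values(), key=lambda x: x[0])[1]


def per_word_tags(obs, states, start, trans, emit):
    """Return, for each position, the tag with the highest posterior probability."""
    # Forward pass: probability of the prefix ending in each state.
    fwd = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = fwd[-1]
        fwd.append({s: emit[s][o] * sum(prev[r] * trans[r][s] for r in states)
                    for s in states})
    # Backward pass: probability of the suffix given each state.
    bwd = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = bwd[0]
        bwd.insert(0, {r: sum(trans[r][s] * emit[s][o] * nxt[s] for s in states)
                       for r in states})
    return [max(states, key=lambda s: f[s] * b[s]) for f, b in zip(fwd, bwd)]


# Toy two-tag model with invented probabilities.
states = ["N", "V"]
start = {"N": 0.6, "V": 0.4}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"fish": 0.6, "swim": 0.4}, "V": {"fish": 0.3, "swim": 0.7}}
obs = ["fish", "swim"]

best_sequence = viterbi_tags(obs, states, start, trans, emit)
best_per_word = per_word_tags(obs, states, start, trans, emit)
```

On this toy input both methods agree, but in general they can differ: the per-word choice optimizes accuracy word by word, while Viterbi optimizes the sequence as a whole.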
- Matsukawa Miller Weischedel
Example Based Correction of Word Segmentation and Part of Speech
Addresses the problem of word segmentation in Japanese.
Uses three passes: a morphological analyzer, a trained rule-based
correction module, and an HMM tagger. The system is trained on a
very small training set. The role of the correction module is to
reduce the amount of training data required.
Where the Tagger Falters
Analyzes errors made by statistical taggers.
Suggests a linguistically based FSA to detect such errors
(the experiment discussed concerns past tense vs. past participle verbs).
Comments on the success of this approach.
Nice paper. Linguistic background is needed.
How do we count? The Problem of Tagging Phrasal Verbs in PARTS
Discusses the deficiency of the trigram model for distinguishing
between particles and prepositions.