Ido Dagan, Shaul Marcus and Shaul Markovitch. Contextual Word Similarity and Estimation from Sparse Data. Computer Speech and Language, 9:123-152, 1995.
In recent years there has been much interest in word cooccurrence relations, such as n-grams, verb-object combinations, or cooccurrence within a limited context. This paper discusses how to estimate the probability of cooccurrences that do not occur in the training data. We present a method which makes local analogies between each specific unobserved cooccurrence and other cooccurrences which contain similar words, as determined by an appropriate word similarity metric. Our evaluation suggests that this method performs better than existing smoothing methods, and may provide an alternative to class-based models.
@article{Dagan:1995:CWS,
Author = {Ido Dagan and Shaul Marcus and Shaul Markovitch},
Title = {Contextual Word Similarity and Estimation from Sparse Data},
Year = {1995},
Journal = {Computer Speech and Language},
Volume = {9},
Pages = {123--152},
Url = {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Dagan-Marcus-Markovitch-speech1995.pdf},
Keywords = {Information Retrieval, Semantic Relatedness},
Secondary-keywords = {Word Cooccurrence, Word Similarity},
Abstract = {
In recent years there has been much interest in word cooccurrence
relations, such as n-grams, verb-object combinations, or
cooccurrence within a limited context. This paper discusses how to
estimate the probability of cooccurrences that do not occur in the
training data. We present a method which makes local analogies
between each specific unobserved cooccurrence and other
cooccurrences which contain similar words, as determined by an
appropriate word similarity metric. Our evaluation suggests that
this method performs better than existing smoothing methods, and
may provide an alternative to class-based models.
}
}