
Selective Sampling for Nearest Neighbor Classifiers


Michael Lindenbaum, Shaul Markovitch and Dmitry Rusakov. Selective Sampling for Nearest Neighbor Classifiers. Machine Learning, 54(2):125-152, 2004.


Abstract

Most existing inductive learning algorithms work under the assumption that their training examples are already tagged. There are domains, however, where the tagging procedure requires significant computational resources or manual labor. In such cases, it may be beneficial for the learner to be active, intelligently selecting the examples for labeling with the goal of reducing the labeling cost. In this paper we present LSS, a lookahead algorithm for selective sampling of examples for nearest neighbor classifiers. The algorithm searches for the example with the highest expected utility, taking into account its effect on the resulting classifier. Computing the expected utility of an example requires estimating the probability of its possible labels. We propose to use the random field model for this estimation. The LSS algorithm was evaluated empirically on seven real and artificial data sets, and its performance was compared to that of other selective sampling algorithms. The experiments show that the proposed algorithm outperforms other methods in terms of average error rate and stability.
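The one-step lookahead selection described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the distance-weighted vote in `p_positive` is a simplified stand-in for the paper's random-field label model, and the function names, the `beta` parameter, and the evaluation-set accuracy estimate are all assumptions made for the example.

```python
import numpy as np

def nn_predict(X_train, y_train, X):
    """Classify each row of X with a 1-nearest-neighbor rule."""
    dists = np.linalg.norm(X_train[None, :, :] - X[:, None, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]

def p_positive(x, X_lab, y_lab, beta=2.0):
    """Toy estimate of P(label(x) = 1): a distance-weighted vote of the
    labeled points. A simplified stand-in for the paper's random field
    model; beta is an illustrative smoothing parameter."""
    w = np.exp(-beta * np.linalg.norm(X_lab - x, axis=1))
    return float(np.sum(w * (y_lab == 1)) / np.sum(w))

def expected_utility(x, X_lab, y_lab, X_eval):
    """One-step lookahead utility of querying x: for each possible label,
    retrain the 1-NN classifier with (x, label) added and score its
    expected accuracy on X_eval under the same probability model, then
    average the two scores weighted by the label probabilities."""
    p1 = p_positive(x, X_lab, y_lab)
    utility = 0.0
    for label, p_label in ((1, p1), (0, 1.0 - p1)):
        X_new = np.vstack([X_lab, x])
        y_new = np.append(y_lab, label)
        preds = nn_predict(X_new, y_new, X_eval)
        probs = np.array([p_positive(e, X_new, y_new) for e in X_eval])
        # Model's probability that each prediction is correct.
        utility += p_label * np.mean(np.where(preds == 1, probs, 1.0 - probs))
    return utility

def select_example(X_lab, y_lab, X_pool, X_eval):
    """Return the index of the unlabeled pool example whose labeling
    has the highest expected utility."""
    utils = [expected_utility(x, X_lab, y_lab, X_eval) for x in X_pool]
    return int(np.argmax(utils))
```

The selected example would then be sent to the labeler, added to the labeled set, and the selection repeated until the labeling budget is exhausted.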


Keywords: Active Learning
Secondary Keywords: Selective Sampling, Lookahead, Value of Information
Online version: http://www.cs.technion.ac.il/~shaulm/papers/pdf/Lindenbaum-Markovitch-Rusakov-mlj2004.pdf
Bibtex entry:
 @article{Lindenbaum:2004:SSN,
  Author = {Michael Lindenbaum and Shaul Markovitch and Dmitry Rusakov},
  Title = {Selective Sampling for Nearest Neighbor Classifiers},
  Year = {2004},
  Journal = {Machine Learning},
  Volume = {54},
  Number = {2},
  Pages = {125--152},
  Url = {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Lindenbaum-Markovitch-Rusakov-mlj2004.pdf},
  Keywords = {Active Learning},
  Secondary-keywords = {Selective Sampling, Lookahead, Value of Information},
  Abstract = {
    Most existing inductive learning algorithms work under the
    assumption that their training examples are already tagged. There
    are domains, however, where the tagging procedure requires
    significant computation resources or manual labor. In such cases,
    it may be beneficial for the learner to be active, intelligently
    selecting the examples for labeling with the goal of reducing the
    labeling cost. In this paper we present LSS - a lookahead
    algorithm for selective sampling of examples for nearest neighbor
    classifiers. The algorithm is looking for the example with the
    highest utility, taking its effect on the resulting classifier
    into account. Computing the expected utility of an example
    requires estimating the probability of its possible labels. We
    propose to use the random field model for this estimation. The LSS
    algorithm was evaluated empirically on seven real and artificial
    data sets, and its performance was compared to other selective
    sampling algorithms. The experiments show that the proposed
    algorithm outperforms other methods in terms of average error rate
    and stability.
  }
 }