
Self-consistent Batch-Classification


Shaul Markovitch and Oren Shnitzer. Self-consistent Batch-Classification. Technical report CIS-2005-04, Technion, 2005.


Abstract

Most existing learning algorithms generate classifiers that take as input a single untagged instance and return its classification. When given a set of instances to classify, the classifier treats each member of the set independently. In this work we introduce a new setup we call batch classification, in which the induced classifier receives the testing instances as a set. Knowing the test set in advance, in principle, allows the classifier to classify it more accurately. We study the batch-classification framework and develop learning algorithms that take advantage of this setup. We present several KNN-based solutions (Fix and Hodges, 1951; Duda and Hart) that combine the nearest-neighbor rule with mechanisms for exploiting the information available in the test set. Extensive empirical evaluation shows that these algorithms indeed outperform traditional independent classifiers.
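
To make the setup concrete, here is a minimal Python sketch of one plausible batch-classification scheme in the KNN family. It is not the paper's algorithm (which is not reproduced on this page): it first labels the test batch with ordinary KNN, then iteratively re-labels each test point using the training data plus the other test points under their current predicted labels, stopping when the batch's labels reach a self-consistent fixed point. All names here (knn_vote, batch_classify, k, max_iters) are illustrative assumptions.

    # A sketch of self-consistent batch classification with KNN; an
    # illustration of the setup, not the paper's exact algorithm.
    from collections import Counter
    import numpy as np

    def knn_vote(point, X, y, k):
        """Majority vote among the k nearest rows of X (Euclidean)."""
        d = np.linalg.norm(X - point, axis=1)
        nearest = np.argsort(d)[:k]
        return Counter(y[nearest]).most_common(1)[0][0]

    def batch_classify(X_train, y_train, X_test, k=3, max_iters=20):
        # Initial pass: ordinary independent KNN over the training set.
        labels = np.array([knn_vote(x, X_train, y_train, k) for x in X_test])
        for _ in range(max_iters):
            # Re-classify each test point against the training data PLUS
            # the other test points, which carry their current predictions.
            new_labels = labels.copy()
            for i, x in enumerate(X_test):
                others = np.delete(np.arange(len(X_test)), i)
                X_pool = np.vstack([X_train, X_test[others]])
                y_pool = np.concatenate([y_train, labels[others]])
                new_labels[i] = knn_vote(x, X_pool, y_pool, k)
            if np.array_equal(new_labels, labels):  # self-consistent: done
                break
            labels = new_labels
        return labels

    # Example use (toy 1-D data):
    #   X_tr = np.array([[0.0], [1.0], [10.0], [11.0]])
    #   y_tr = np.array([0, 0, 1, 1])
    #   X_te = np.array([[0.5], [10.5], [5.2]])
    #   print(batch_classify(X_tr, y_tr, X_te, k=3))

The stopping criterion mirrors the "self-consistent" notion in the title: iteration ends once re-classifying each instance against the rest of the batch no longer changes any label.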


Keywords: Transduction, KNN, Semi-Supervised Learning
Secondary Keywords: Inductive Learning
Online version: http://www.cs.technion.ac.il/~shaulm/papers/pdf/Markovitch-CIS-2005-04.pdf
Bibtex entry:
 @techreport{Markovitch:2005:SCB,
  Author = {Shaul Markovitch and Oren Shnitzer},
  Title = {Self-consistent Batch-Classification},
  Year = {2005},
  Number = {CIS-2005-04},
  Type = {Technical report},
  Institution = {Technion},
  Url = {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Markovitch-CIS-2005-04.pdf},
  Keywords = {Transduction, KNN, Semi-Supervised Learning},
  Secondary-keywords = {Inductive Learning},
  Abstract = {
    Most existing learning algorithms generate classifiers that take
    as input a single untagged instance and return its
    classification. When given a set of instances to classify, the
    classifier treats each member of the set independently. In this
    work we introduce a new setup we call \emph{batch classification},
    in which the induced classifier receives the \emph{testing}
    instances as a set. Knowing the test set in advance, in principle,
    allows the classifier to classify it more accurately. We study
    the batch-classification framework and develop learning
    algorithms that take advantage of this setup. We present several
    KNN-based solutions \citep{FixHodges51, dudahart} that combine
    the nearest-neighbor rule with mechanisms for exploiting the
    information available in the test set. Extensive empirical
    evaluation shows that these algorithms indeed outperform
    traditional independent classifiers.
  }
}