Anytime Learning of Decision Trees

Saher Esmeir and Shaul Markovitch. Anytime Learning of Decision Trees. Journal of Machine Learning Research, 8:891-933 2007.

Abstract

The majority of the existing algorithms for learning decision trees are greedy: a tree is induced top-down, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Furthermore, the greedy algorithms require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a pre-given allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available.

Keywords: Anytime Algorithms, Anytime Learning, Decision Tree Induction

Secondary Keywords:

Online version:

Bibtex entry:

 @article{Esmeir:2007:ALD,
  Author = {Saher Esmeir and Shaul Markovitch},
  Title = {Anytime Learning of Decision Trees},
  Year = {2007},
  Journal = {Journal of Machine Learning Research},
  Volume = {8},
  Month = {May},
  Pages = {891--933},
  Url = {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Esmeir-Markovitch-JMLR2007.pdf},
  Keywords = {Anytime Algorithms, Anytime Learning, Decision Tree Induction},
  Secondary-keywords = {Hard Concepts, Lookahead},
  Abstract = {
    The majority of the existing algorithms for learning decision
    trees are greedy: a tree is induced top-down, making locally
    optimal decisions at each node. In most cases, however, the
    constructed tree is not globally optimal. Furthermore, the greedy
    algorithms require a fixed amount of time and are not able to
    generate a better tree if additional time is available. We
    introduce a framework for anytime induction of decision trees that
    overcomes these problems by trading computation speed for better
    tree quality. Our proposed family of algorithms employs a novel
    strategy for evaluating candidate splits. A biased sampling of the
    space of consistent trees rooted at an attribute is used to
    estimate the size of the minimal tree under that attribute, and an
    attribute with the smallest expected tree is selected. We present
    two types of anytime induction algorithms: a contract algorithm
    that determines the sample size on the basis of a pre-given
    allocation of time, and an interruptible algorithm that starts
    with a greedy tree and continuously improves subtrees by
    additional sampling. Experimental results indicate that, for
    several hard concepts, our proposed approach exhibits good anytime
    behavior and yields significantly better decision trees when more
    time is available.
  }

  }