Saher Esmeir and Shaul Markovitch. Anytime Learning of Decision Trees. Journal of Machine Learning Research, 8:891-933 2007.
The majority of the existing algorithms for learning decision trees are greedy: a tree is induced top-down, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Furthermore, the greedy algorithms require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a pre-given allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available.
@article{Esmeir:2007:ALD, Author = {Saher Esmeir and Shaul Markovitch}, Title = {Anytime Learning of Decision Trees}, Year = {2007}, Journal = {Journal of Machine Learning Research}, Volume = {8}, Month = {May}, Pages = {891--933}, Url = {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Esmeir-Markovitch-JMLR2007.pdf}, Keywords = {Anytime Algorithms, Anytime Learning, Decision Tree Induction}, Secondary-keywords = {Hard Concepts, Lookahead}, Abstract = { The majority of the existing algorithms for learning decision trees are greedy: a tree is induced top-down, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Furthermore, the greedy algorithms require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a pre-given allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available. } }