
Feature Generation Using General Constructor Functions


Shaul Markovitch and Danny Rosenstein. Feature Generation Using General Constructor Functions. Machine Learning, 49:59-98, 2002.


Abstract

Most classification algorithms receive as input a set of attributes of the classified objects. In many cases, however, the supplied set of attributes is not sufficient for creating an accurate, succinct and comprehensible representation of the target concept. To overcome this problem, researchers have proposed algorithms for automatic construction of features. The majority of these algorithms use a limited predefined set of operators for building new features. In this paper we propose a generalized and flexible framework that is capable of generating features from any given set of constructor functions. These can be domain-independent functions such as arithmetic and logic operators, or domain-dependent operators that rely on partial knowledge on the part of the user. The paper describes an algorithm which receives as input a set of classified objects, a set of attributes, and a specification for a set of constructor functions that contains their domains, ranges and properties. The algorithm produces as output a set of generated features that can be used by standard concept learners to create improved classifiers. The algorithm maintains a set of its best generated features and improves this set iteratively. During each iteration, the algorithm performs a beam search over its defined feature space and constructs new features by applying constructor functions to the members of its current feature set. The search is guided by general heuristic measures that are not confined to a specific feature representation. The algorithm was applied to a variety of classification problems and was able to generate features that were strongly related to the underlying target concepts. These features also significantly improved the accuracy achieved by standard concept learners, for a variety of classification problems.
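Illustrative sketch (not code from the paper): the Python fragment below shows, under simplifying assumptions, the kind of iterative beam search the abstract describes. Candidate features are built by applying constructor functions (here, arithmetic operators over numeric attributes) to the members of the current feature set, each candidate is scored with a generic heuristic (information gain of its best binary split), and only the top-scoring features are kept for the next iteration. All names are hypothetical; the framework in the paper supports typed constructor specifications and more general heuristics.

  import math
  from itertools import product

  def entropy(labels):
      n = len(labels)
      counts = {}
      for y in labels:
          counts[y] = counts.get(y, 0) + 1
      return -sum((c / n) * math.log2(c / n) for c in counts.values())

  def info_gain(values, labels, threshold):
      # Information gain of splitting a numeric feature at `threshold`.
      left = [y for v, y in zip(values, labels) if v <= threshold]
      right = [y for v, y in zip(values, labels) if v > threshold]
      if not left or not right:
          return 0.0
      n = len(labels)
      return (entropy(labels)
              - (len(left) / n) * entropy(left)
              - (len(right) / n) * entropy(right))

  def best_gain(values, labels):
      # Best split gain over candidate thresholds (midpoints of sorted values).
      vs = sorted(set(values))
      thresholds = [(a + b) / 2 for a, b in zip(vs, vs[1:])]
      return max((info_gain(values, labels, t) for t in thresholds), default=0.0)

  # Constructor functions: each maps a pair of numeric features to a new feature.
  CONSTRUCTORS = {
      "add": lambda a, b: a + b,
      "sub": lambda a, b: a - b,
      "mul": lambda a, b: a * b,
  }

  def generate_features(examples, labels, attributes, beam_width=5, iterations=3):
      # examples: list of dicts mapping attribute name -> numeric value.
      # Returns (name, value-list) pairs for the best generated features.
      beam = [(a, [x[a] for x in examples]) for a in attributes]
      for _ in range(iterations):
          candidates = dict(beam)
          # Apply every constructor to every pair of features in the current beam.
          for (n1, v1), (n2, v2) in product(beam, repeat=2):
              for op_name, op in CONSTRUCTORS.items():
                  name = f"{op_name}({n1},{n2})"
                  if name not in candidates:
                      candidates[name] = [op(a, b) for a, b in zip(v1, v2)]
          # Keep the beam_width highest-scoring features for the next iteration.
          scored = sorted(candidates.items(),
                          key=lambda kv: best_gain(kv[1], labels), reverse=True)
          beam = scored[:beam_width]
      return beam

  if __name__ == "__main__":
      # Toy target concept: class is 1 iff x * y > 6, a feature the search can construct.
      data = [{"x": i, "y": j} for i in range(1, 5) for j in range(1, 5)]
      labels = [1 if d["x"] * d["y"] > 6 else 0 for d in data]
      for name, _ in generate_features(data, labels, ["x", "y"]):
          print(name)

Run on the toy data above, the constructed feature mul(x,y) separates the classes with a single threshold and rises to the top of the beam, while the original attributes x and y do not.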


Keywords: Feature Generation
Secondary Keywords: Constructive Induction, Feature Construction, Feature Selection, Decision Trees
Online version: http://www.cs.technion.ac.il/~shaulm/papers/pdf/Markovitch-Rosenstein-mlj2002.pdf
Bibtex entry:
 @article{Markovitch:2002:FGU,
  Author = {Shaul Markovitch and Danny Rosenstein},
  Title = {Feature Generation Using General Constructor Functions},
  Year = {2002},
  Journal = {Machine Learning},
  Volume = {49},
  Pages = {59--98},
  Url = {http://www.cs.technion.ac.il/~shaulm/papers/pdf/Markovitch-Rosenstein-mlj2002.pdf},
  Keywords = {Feature Generation},
  Secondary-keywords = {Constructive Induction, Feature Construction, Feature Selection, Decision Trees},
  Abstract = {
    Most classification algorithms receive as input a set of
    attributes of the classified objects. In many cases, however, the
    supplied set of attributes is not sufficient for creating an
    accurate, succinct and comprehensible representation of the target
    concept. To overcome this problem, researchers have proposed
    algorithms for automatic construction of features. The majority of
    these algorithms use a limited predefined set of operators for
    building new features. In this paper we propose a generalized and
    flexible framework that is capable of generating features from any
    given set of constructor functions. These can be
    domain-independent functions such as arithmetic and logic operators, or
    domain-dependent operators that rely on partial knowledge on the
    part of the user. The paper describes an algorithm which receives
    as input a set of classified objects, a set of attributes, and a
    specification for a set of constructor functions that contains
    their domains, ranges and properties. The algorithm produces as
    output a set of generated features that can be used by standard
    concept learners to create improved classifiers. The algorithm
    maintains a set of its best generated features and improves this
    set iteratively. During each iteration, the algorithm performs a
    beam search over its defined feature space and constructs new
    features by applying constructor functions to the members of its
    current feature set. The search is guided by general heuristic
    measures that are not confined to a specific feature
    representation. The algorithm was applied to a variety of
    classification problems and was able to generate features that
    were strongly related to the underlying target concepts. These
    features also significantly improved the accuracy achieved by
    standard concept learners, for a variety of classification
    problems.
  }
 }