Sequence Prediction

In the paper "On Prediction Using Variable Order Markov Models" we studied and compared the performance of various prediction algorithms, among them Context Tree Weighting (CTW), Prediction by Partial Match (PPM), Probabilistic Suffix Trees (PST), and Lempel-Ziv (LZ78). Ron Begleiter implemented all these algorithms in Java, and the code can be downloaded here.
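To give a flavor of the kind of context-based sequence predictors compared in the paper, here is a toy fixed-order Markov predictor in Java. This is a hypothetical sketch for illustration only, not Ron Begleiter's implementation; the algorithms above use variable-order contexts and proper smoothing, which this toy omits.

```java
import java.util.HashMap;
import java.util.Map;

// Toy order-k Markov predictor: counts which symbols follow each
// length-k context and predicts the most frequent continuation.
public class MarkovPredictor {
    private final int k;
    private final Map<String, Map<Character, Integer>> counts = new HashMap<>();

    public MarkovPredictor(int k) { this.k = k; }

    public void train(String seq) {
        for (int i = k; i < seq.length(); i++) {
            String ctx = seq.substring(i - k, i);
            counts.computeIfAbsent(ctx, c -> new HashMap<>())
                  .merge(seq.charAt(i), 1, Integer::sum);
        }
    }

    // Most likely next symbol for the given context, or '?' if the
    // context was never observed during training.
    public char predict(String ctx) {
        Map<Character, Integer> m = counts.get(ctx);
        if (m == null) return '?';
        return m.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }

    public static void main(String[] args) {
        MarkovPredictor p = new MarkovPredictor(2);
        p.train("abcabcabcabd");
        // 'c' follows "ab" three times versus 'd' once.
        System.out.println(p.predict("ab"));
    }
}
```

The variable-order methods above generalize this by maintaining contexts of several lengths at once and backing off between them.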

Active Learning

In the paper "Online Choice of Active Learning Algorithms" we propose a new meta-algorithm for active learning: operate a small ensemble of active learners and switch between them online. Kobi Luz coded our algorithm as well as other SVM-based active learners, including the "Simple" algorithm of Tong and Koller and an algorithm by Roy and McCallum. The Java code can be downloaded here.
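The high-level control loop of such a meta-algorithm can be sketched as follows. This is a hypothetical illustration that uses a simple multiplicative-weights update; the actual reward criterion and update scheme used in the paper differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Toy online switching between active learners: at each round, pick a
// learner with probability proportional to its weight, let it choose
// the next query, then update its weight multiplicatively according to
// a reward signal. (Hypothetical sketch, not the paper's algorithm.)
public class LearnerSwitcher {
    public interface Learner { int nextQuery(); }

    private final List<Learner> learners;
    private final double[] weights;
    private final Random rng = new Random(0);

    public LearnerSwitcher(List<Learner> learners) {
        this.learners = learners;
        this.weights = new double[learners.size()];
        Arrays.fill(weights, 1.0);
    }

    // Sample a learner index with probability proportional to its weight.
    public int pick() {
        double total = 0;
        for (double w : weights) total += w;
        double r = rng.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r <= 0) return i;
        }
        return weights.length - 1;
    }

    // Multiplicative-weights update with learning rate eta.
    public void reward(int i, double r, double eta) {
        weights[i] *= Math.exp(eta * r);
    }

    public double weight(int i) { return weights[i]; }
}
```

A round would then consist of `pick()`, forwarding `nextQuery()` of the chosen learner to the oracle, and calling `reward(...)` once the new label arrives.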

Ron Begleiter later coded an improved version of this Java code, along with a Matlab wrapper. This newer implementation has two main components: an Experimenter and a Learner. The Experimenter outputs a learning-curve graph for the given algorithm based on k-fold cross-validation. The Learner implements a standard active-learner interface ("learn", "query" and "classify"). The base code is written for Java 1.4.*; we also provide Matlab code (a wrapper) for the Learner component. All relevant parameters are fully configurable via a textual configuration file. Press here to get this code along with its documentation.
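The "learn", "query" and "classify" interface can be pictured as below. The names and signatures here are illustrative guesses, not the actual API of the distributed code; the `RandomSampler` baseline is a hypothetical example implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a "standard active learner interface" with
// the three operations named above; signatures are illustrative.
interface ActiveLearner {
    void learn(double[] instance, int label);  // add a labeled example
    int query(double[][] pool);                // pick next instance to label
    int classify(double[] instance);           // predict a label
}

// Trivial baseline: queries at random, classifies by nearest labeled neighbor.
class RandomSampler implements ActiveLearner {
    private final List<double[]> xs = new ArrayList<>();
    private final List<Integer> ys = new ArrayList<>();
    private final Random rng = new Random(0);

    public void learn(double[] x, int y) { xs.add(x); ys.add(y); }

    public int query(double[][] pool) { return rng.nextInt(pool.length); }

    public int classify(double[] x) {
        int best = -1;
        double bestD = Double.MAX_VALUE;
        for (int i = 0; i < xs.size(); i++) {
            double d = 0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - xs.get(i)[j];
                d += diff * diff;
            }
            if (d < bestD) { bestD = d; best = i; }
        }
        return best < 0 ? 0 : ys.get(best);
    }
}
```

An Experimenter-style driver would repeatedly call `query` on the unlabeled pool, feed the returned label to `learn`, and evaluate `classify` on held-out folds to plot the learning curve.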

Multi-Way Distributional Clustering

In the paper "Multi-Way Distributional Clustering via Pairwise Interactions" we propose a new clustering algorithm that utilizes multiple feature dimensions or modalities at once. The idea is implemented efficiently using a factored representation, as used in graphical models, and by applying both top-down and bottom-up clustering. We report results on email clustering, as well as new best clustering results on 20 Newsgroups. Ron Bekkerman's C++ implementation of the algorithm can be accessed from here.
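To convey the distributional flavor of the bottom-up phase, here is a greatly simplified toy step in Java: given a word-by-document count matrix, it finds the pair of words whose conditional distributions over documents are closest in Jensen-Shannon divergence, i.e., the natural candidates to merge. This is a hypothetical sketch, not the paper's multi-way algorithm or Ron Bekkerman's C++ code.

```java
// Toy agglomerative step for distributional clustering: merge
// candidates are the two rows (words) whose normalized distributions
// over columns (documents) have the smallest JS divergence.
public class DistributionalMerge {
    static double[] normalize(int[] row) {
        double s = 0;
        for (int c : row) s += c;
        double[] p = new double[row.length];
        for (int j = 0; j < row.length; j++) p[j] = row[j] / s;
        return p;
    }

    static double kl(double[] p, double[] q) {
        double d = 0;
        for (int j = 0; j < p.length; j++)
            if (p[j] > 0) d += p[j] * Math.log(p[j] / q[j]);
        return d;
    }

    // Jensen-Shannon divergence: symmetrized KL against the midpoint.
    static double js(double[] p, double[] q) {
        double[] m = new double[p.length];
        for (int j = 0; j < p.length; j++) m[j] = 0.5 * (p[j] + q[j]);
        return 0.5 * kl(p, m) + 0.5 * kl(q, m);
    }

    // Returns indices {a, b} of the closest pair of rows.
    static int[] closestPair(int[][] counts) {
        double best = Double.MAX_VALUE;
        int[] pair = {0, 1};
        for (int a = 0; a < counts.length; a++)
            for (int b = a + 1; b < counts.length; b++) {
                double d = js(normalize(counts[a]), normalize(counts[b]));
                if (d < best) { best = d; pair = new int[]{a, b}; }
            }
        return pair;
    }
}
```

The multi-way algorithm extends this single-modality picture by clustering several interacting dimensions simultaneously, with top-down splits alternating with such bottom-up merges.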

Localized Boosting and Other Classifiers (2D demo)

In the paper "Localized Boosting" we propose a new type of classifier-boosting strategy in which each weak learner (or "expert") is explicitly restricted to being a specialist only in a certain vicinity of the data space. Gilad Mishne programmed a nice applet demonstrating this algorithm as well as many other classifiers. The applet is based on the Weka Data Mining Package. The code of our applet can be downloaded here.
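The "localized specialist" idea can be illustrated with a small Java sketch: each expert is paired with a Gaussian gate centered at its region, so its vote only carries weight near that region. This is a hypothetical mixture-of-experts-style illustration of the locality principle, not the boosting procedure from the paper.

```java
// Toy localized combination: each expert votes with a Gaussian gate
// centered at its own region, so it is effectively a specialist only
// in a vicinity of the data space. (Hypothetical sketch.)
public class LocalizedEnsemble {
    public interface Expert { double predict(double[] x); }

    private final Expert[] experts;
    private final double[][] centers;
    private final double sigma;

    public LocalizedEnsemble(Expert[] experts, double[][] centers, double sigma) {
        this.experts = experts;
        this.centers = centers;
        this.sigma = sigma;
    }

    // Gaussian gate: weight decays with squared distance from the center.
    private double gate(double[] x, double[] c) {
        double d2 = 0;
        for (int j = 0; j < x.length; j++) {
            double diff = x[j] - c[j];
            d2 += diff * diff;
        }
        return Math.exp(-d2 / (2 * sigma * sigma));
    }

    // Gated, normalized vote over all experts.
    public double predict(double[] x) {
        double num = 0, den = 0;
        for (int i = 0; i < experts.length; i++) {
            double g = gate(x, centers[i]);
            num += g * experts[i].predict(x);
            den += g;
        }
        return den == 0 ? 0 : num / den;
    }
}
```

Near a given center, the prediction is dominated by that center's expert; far from all centers, the gates shrink and the votes blend.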

 

Note: The code provided on this page is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License or (at your option) any later version. This code is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License (GPL) for more details.