Yoav Haimovitch, M.Sc. Thesis Seminar
Wednesday, 14.11.2012, 12:30
We describe a bootstrapping algorithm able to learn from partially labeled data.
We report an empirical study of using this algorithm to improve sentiment classification with up to 15 million unlabeled Amazon product reviews. Our experiments cover semi-supervised learning, domain adaptation, and weakly supervised learning. In some cases our methods cut test error by more than half when given such large amounts of data, and in all cases they showed a significant improvement over the baseline.
The study includes a comparison to T-SVM (in which our method proves superior), an examination of various parameters and settings, and experiments such as large-scale, one-to-many domain adaptation that show the potential usefulness of the described method. We show that the algorithm, an extension of AROW (Adaptive Regularization of Weight Vectors) to the semi-supervised setting, retains the relatively superior effectiveness of AROW when applied to sentiment classification.
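To make the idea of bootstrapping on top of AROW concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the algorithm from the thesis: it uses a diagonal approximation of the AROW covariance matrix, and it accepts an unlabeled example only when the absolute margin reaches a fixed threshold. The names DiagonalAROW, bootstrap, threshold, and rounds are hypothetical.

```python
class DiagonalAROW:
    """AROW-style online learner with a diagonal covariance (an assumption).

    Examples are sparse feature dicts {feature: value}; labels are +1 or -1.
    """

    def __init__(self, r=1.0):
        self.r = r        # regularization parameter of the AROW update
        self.mu = {}      # mean weight per feature (implicit default 0.0)
        self.sigma = {}   # variance per feature (implicit default 1.0)

    def margin(self, x):
        # Signed score mu . x; its sign is the predicted label.
        return sum(self.mu.get(f, 0.0) * v for f, v in x.items())

    def update(self, x, y):
        loss = max(0.0, 1.0 - y * self.margin(x))
        if loss == 0.0:
            return  # margin already at least 1: no update
        # Confidence term x^T Sigma x under the diagonal approximation.
        conf = sum(self.sigma.get(f, 1.0) * v * v for f, v in x.items())
        beta = 1.0 / (conf + self.r)
        alpha = loss * beta
        for f, v in x.items():
            s = self.sigma.get(f, 1.0)
            self.mu[f] = self.mu.get(f, 0.0) + alpha * y * s * v
            self.sigma[f] = s - beta * s * s * v * v


def bootstrap(model, labeled, unlabeled, threshold=0.5, rounds=3):
    """Generic self-training loop (assumed, not the thesis procedure)."""
    for x, y in labeled:
        model.update(x, y)
    for _ in range(rounds):
        for x in unlabeled:
            m = model.margin(x)
            # Trust the model's own prediction only when it is confident.
            if abs(m) >= threshold:
                model.update(x, 1 if m > 0 else -1)
    return model
```

In this sketch, a feature that appears only in unlabeled reviews (for example a word that co-occurs with a known positive word) can still receive a nonzero weight, which is the effect bootstrapping is meant to achieve.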
In the weakly supervised setting, we show an extension of our method that allows it to begin with no labeled data: rules designed with prior knowledge automatically label an initial training set, to which the bootstrapping method is then applied.
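The rule-based seeding step can be sketched as follows. This is only an illustration: the word lists and the rule itself (label a review when it contains words from exactly one polarity list) are hypothetical and are not the rules used in the thesis.

```python
import re

# Hypothetical seed lexicons encoding prior knowledge about sentiment.
POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"awful", "terrible", "refund"}


def rule_label(text):
    """Return +1, -1, or None when the rules do not fire unambiguously."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    has_pos = bool(tokens & POSITIVE)
    has_neg = bool(tokens & NEGATIVE)
    if has_pos and not has_neg:
        return 1
    if has_neg and not has_pos:
        return -1
    return None  # no match or conflicting evidence: leave unlabeled


def build_seed_set(reviews):
    """Keep only the reviews that the rules label; the rest stay unlabeled."""
    seed = []
    for review in reviews:
        label = rule_label(review)
        if label is not None:
            seed.append((review, label))
    return seed
```

Reviews that the rules leave unlabeled are not discarded; they remain available as unlabeled data for the subsequent bootstrapping phase.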
In addition, we discuss theoretical and practical scalability issues, and suggest potentially interesting directions for future research.