Technical Report MSC-2013-03

Title: Large Scale Semi-Supervised Sentiment Analysis
Authors: Yoav Haimovitch
Supervisors: Koby Crammer, Shie Mannor
Abstract: We describe a bootstrapping algorithm that learns from partially labeled data. We report the results of an empirical study that uses this algorithm to improve sentiment classification with up to 15 million unlabeled Amazon product reviews. Our experiments cover semi-supervised learning, domain adaptation, and weakly supervised learning. In some cases our methods reduced test error by more than half using such large amounts of data, and in all cases they showed a significant improvement over the baseline.

The extensive empirical study includes a comparison to T-SVM (which shows our method to be superior), an examination of various parameters and settings, and experiments such as a large-scale, one-to-many domain adaptation that show the potential usefulness of the described method. We show that the algorithm, which extends AROW (Adaptive Regularization of Weight Vectors) to the semi-supervised setting, retains AROW's relatively superior effectiveness when applied to sentiment classification. In the weakly supervised setting, we present an extension of our method that can begin with no labeled data: rules designed with prior knowledge automatically label an initial training set, to which the bootstrapping method is then applied.

In addition, we discuss theoretical and practical scalability issues, and suggest potentially interesting directions for future research.

Copyright: The above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion