Uri Alon, M.Sc. Thesis Seminar
Thursday, 17.8.2017, 10:00
We present a novel approach for automatic feature generation for predicting program properties.
Our approach automatically produces features that can capture long-distance syntactic relationships
between program elements. The features are purely syntactic, and the method is useful for any programming language.
Inspired by Parse Tree Paths in Natural Language Processing (NLP), we generate features that
capture relationships in an Abstract Syntax Tree (AST). We show that these features are general: they
(i) cover a number of different prediction tasks, (ii) drive two different learning algorithms
(for both generative and discriminative models), and (iii) work across different programming languages.
We evaluate our approach on the tasks of predicting variable names, method names, and types of
expressions. We use the generated features to drive both CRF-based and word2vec-based learning. The
generated features capture semantic similarities and produce better results than existing methods.
By representing program elements using path features, we believe that our approach can be used in a
variety of other machine learning tasks for programming languages, including different applications
and different learning models.
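
To make the idea of a path feature concrete, here is a minimal sketch, not the implementation evaluated in the thesis. It assumes Python's standard `ast` module as the parser, treats only identifier (`Name`) nodes as leaves, and applies no limit on path length. The function name `leaf_paths` and the `^` (up) and `v` (down) separators are illustrative choices, not notation from the work itself.

```python
import ast
from itertools import combinations


def leaf_paths(source):
    """Return (start_name, path, end_name) triples for each pair of identifier leaves.

    Hypothetical helper for illustration: a path climbs from the first leaf to
    the lowest common ancestor, then descends to the second leaf.
    """
    tree = ast.parse(source)
    leaves = []  # (identifier text, chain of nodes from the root down to the leaf)

    def walk(node, ancestors):
        chain = ancestors + [node]
        if isinstance(node, ast.Name):
            leaves.append((node.id, chain))
        for child in ast.iter_child_nodes(node):
            walk(child, chain)

    walk(tree, [])

    features = []
    for (a, chain_a), (b, chain_b) in combinations(leaves, 2):
        # k = number of shared ancestors; chain_a[k - 1] is the lowest common ancestor.
        k = 0
        while k < min(len(chain_a), len(chain_b)) and chain_a[k] is chain_b[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(chain_a[k:])]
        top = type(chain_a[k - 1]).__name__
        down = [type(n).__name__ for n in chain_b[k:]]
        path = " ^ ".join(up + [top]) + "".join(" v " + d for d in down)
        features.append((a, path, b))
    return features


if __name__ == "__main__":
    for feature in leaf_paths("x = y + 1"):
        print(feature)
```

For the statement `x = y + 1`, this sketch yields a single triple relating `x` and `y` through the path `Name ^ Assign v BinOp v Name`. Triples of this shape are purely syntactic, so in principle they could be fed to either kind of learner mentioned above.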