
The Taub Faculty of Computer Science Events and Talks

Structural Language Models for Code
event speaker icon
Shaked Brody (Ph.D. Thesis Seminar)
event date icon
Thursday, 16.05.2024, 14:00
event location icon
Zoom Lecture: 92691399579
event speaker icon
Advisor: Prof. E. Yahav
In the past few years, software has been at the heart of applications ranging from home appliances to virtual services, and helping software developers write better code has become a crucial task. In parallel, recent developments in machine learning, and deep learning in particular, have shown great promise in many fields, including code-related tasks. The main challenge is how to represent code in a way that deep learning models can use effectively. While code can be treated as a sequence of tokens, it can also be represented by its underlying Abstract Syntax Tree (AST), which contains rich structural information. In this thesis, we investigate how to exploit the structural nature of code for code-related tasks.

We start by introducing edit completion, a new task that requires predicting the next edit operation in a code snippet, given a sequence of contextual edits. This task may be helpful to software developers, since a substantial part of the time spent writing code is dedicated to editing existing code. We investigate different approaches and show that harnessing the structural nature of code is beneficial for this task. We then generalize several structural approaches for addressing code-related tasks and show that the combined approach benefits multiple tasks, such as edit completion and code summarization.
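To make the setting concrete, here is a deliberately simplified sketch of edit completion. It is hypothetical illustration code: edits are modeled as plain textual replacements, whereas the thesis models structural edits over ASTs.

```python
def apply_edits(code: str, edits):
    """Apply a sequence of (op, old, new) edit operations.
    Only 'replace' is modeled in this toy sketch."""
    for op, old, new in edits:
        if op == "replace":
            code = code.replace(old, new)
    return code

# The developer has already performed these contextual edits...
context_edits = [("replace", "get_x", "get_value")]
before = "def get_x(self): return self.x"
partial = apply_edits(before, context_edits)

# ...and an edit-completion model would predict the next edit
# from that context, e.g. finishing the rename:
predicted = ("replace", "self.x", "self.value")
after = apply_edits(partial, [predicted])
# after == "def get_value(self): return self.value"
```

The point of the task is that the context edits carry intent (here, a rename pattern) that a model can continue.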

The use of Graph Neural Networks (GNNs) is common for structural code representation. We reveal that the widely used Graph Attention Network (GAT) model is limited in its ability to capture complex relationships in graphs, due to its attention mechanism. We analyze the expressive power of GAT and introduce a simple fix, GATv2, which is provably more expressive. Our experiments show that GATv2 achieves better performance on different tasks and is more robust to noisy graphs.
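The limitation can be seen directly in the two scoring functions. The following NumPy sketch (illustrative weights and shapes, not the thesis's code) contrasts them and demonstrates GAT's "static attention": because the nonlinearity is applied after the attention vector, every query node ranks a fixed set of neighbors identically, while GATv2's reordered computation lets the ranking depend on the query.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_score(a, W, h_i, h_j):
    # GAT:   e(h_i, h_j) = LeakyReLU(a^T [W h_i || W h_j])
    return leaky_relu(a @ np.concatenate([W @ h_i, W @ h_j]))

def gatv2_score(a, W, h_i, h_j):
    # GATv2: e(h_i, h_j) = a^T LeakyReLU(W [h_i || h_j])
    return a @ leaky_relu(W @ np.concatenate([h_i, h_j]))

rng = np.random.default_rng(0)
d = 4
a1, W1 = rng.standard_normal(2 * d), rng.standard_normal((d, d))
a2, W2 = rng.standard_normal(d), rng.standard_normal((d, 2 * d))

keys = rng.standard_normal((5, d))          # a fixed set of neighbors
q1, q2 = rng.standard_normal(d), rng.standard_normal(d)

# GAT: the score splits into a query part plus a key part, and
# LeakyReLU is monotone, so the ranking over neighbors is the same
# for every query.
rank_q1 = np.argsort([gat_score(a1, W1, q1, k) for k in keys])
rank_q2 = np.argsort([gat_score(a1, W1, q2, k) for k in keys])
assert np.array_equal(rank_q1, rank_q2)

# GATv2: the nonlinearity precedes a^T, so the ranking over the
# same neighbors can vary with the query ("dynamic attention").
v2_rank_q1 = np.argsort([gatv2_score(a2, W2, q1, k) for k in keys])
v2_rank_q2 = np.argsort([gatv2_score(a2, W2, q2, k) for k in keys])
```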

The Transformer architecture is widely used in natural language processing (NLP) as a model for sequence processing; however, it can be viewed as a special case of a GNN. Layer Normalization is a key component of the Transformer architecture. In the last part of this thesis, we investigate the role Layer Normalization plays in the expressivity of the Transformer's attention. We provide a novel geometric interpretation of Layer Normalization and show its importance to the attention mechanism that follows it.
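One way to state the geometric view: Layer Normalization (without the learned gain/bias, and ignoring the epsilon term) is equivalent to projecting the input onto the hyperplane orthogonal to the all-ones vector and then scaling the projection to norm sqrt(d). A minimal NumPy check of this equivalence (an illustrative sketch, not the thesis's code):

```python
import numpy as np

def layer_norm(x):
    # standard LayerNorm, without the learned gain/bias and eps
    return (x - x.mean()) / x.std()

def project_then_scale(x):
    d = x.shape[0]
    ones = np.ones(d)
    # step 1: project onto the hyperplane orthogonal to the
    # all-ones vector (this is exactly mean subtraction)
    p = x - (x @ ones) / d * ones
    # step 2: scale the projection to norm sqrt(d)
    return np.sqrt(d) * p / np.linalg.norm(p)

x = np.array([2.0, -1.0, 0.5, 3.0])
assert np.allclose(layer_norm(x), project_then_scale(x))

# every normalized vector lies on that hyperplane, at a fixed norm:
y = project_then_scale(x)
assert np.isclose(y.sum(), 0.0)
assert np.isclose(np.linalg.norm(y), np.sqrt(4))
```

Under this view, attention always receives inputs confined to a fixed hyperplane with a fixed norm, which is what makes the interpretation useful for analyzing the attention mechanism that follows.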

Our work provides both practical and theoretical contributions to the field of using deep neural models on code.