Technical Report MSC-2017-28

Title: Modular Demand-Driven Analysis of Semantic Difference for Program Versions
Authors: Anna Trostanetski
Supervisors: Prof. Orna Grumberg
PDFCurrently accessibly only within the Technion network
Abstract: Programs are often built in stages, a new (patched)program version is built on top of an old one. If we could understand the semantic difference between two consecutive program versions, it would be very beneficial for the fast development process of correct programs. We can use the correctness of the old (and checked) version to infer the correctness of the new version. Code reviews, security vulnerability checks, and new feature verification would become easier if the reviewer were to understand the semantic differences between both versions. In general this problem is undecidable, yet we devise an algorithm for computing over- and under-approximations of the semantic (input--output) differences between program versions. We aim at providing precise enough abstractions for real code, and allowing guidance by the user to reach good results that match their needs. Since this information is used during the development process, it may be sufficient (and possibly preferable) to give results for intermediate procedures, instead of the entire program. We provide a mechanism for guiding the analysis towards interesting procedures, and the precision of the approximation is constantly improved by our anytime algorithm.

While our algorithm can work for very different versions of code, it will work better on syntactically similar versions. Syntactic changes in program versions are often small and local, and may apply to procedures that are deep in the call graph. Our approach analyses only those parts of the programs that are affected by the changes. Moreover, the analysis is modular, processing a single pair of procedures at a time. Called procedures are not inlined. Rather, their previously computed summaries and difference summaries are used.

For efficiency, procedure summaries and difference summaries are abstracted using uninterpreted functions, and may be refined on demand. We show how we can use common uninterpreted functions to use our knowledge of equivalence when no precise summery is available. Our algorithm works bottom-up from the locations of the syntactic changes, towards the main procedure. When the precision of the abstractions used is not sufficient, we run (top-down) refinement to create new summaries that are sufficiently precise. The refinement is guided by the context of the call we analyse.

We define modular symbolic execution and prove its connection to standard symbolic execution. We use modular symbolic execution to analyse each path in each procedure at most once, without re-analysing paths in called procedures.

We have compared our method to well established tools and observed speedups of at least one order of magnitude. Furthermore, in many cases our tool proves equivalence or finds differences while others fail to do so.

CopyrightThe above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information

Remark: Any link to this technical report should be to this page (, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

To the list of the MSC technical reports of 2017
To the main CS technical reports page

Computer science department, Technion