Omer Katz, Ph.D. Thesis Seminar
Thursday, 14.3.2019, 14:30
Today we are literally surrounded by software. Almost all products and services we use on a daily basis involve some piece of software. The abundance of software has made our lives easier. Unfortunately, it has also made us more susceptible than ever to software vulnerabilities and exploitation.
A big part of the effort to secure software is often carried out by researchers outside the team that developed it. A crucial tool in their toolbox for finding vulnerabilities and securing systems is reverse engineering.
Given a program in stripped binary form (the form in which software reaches most end users), reverse engineering is the process of understanding what that binary does and how it does it. Beyond merely understanding a program, reverse engineering can be used to fix bugs, extend programs, and generally effect some desired change to the program.
Traditionally, reverse engineering is known as a long and tedious process requiring highly trained specialists with years of experience and expertise.
In many cases, reverse engineers pose questions for which an automatic and perfectly accurate answer is simply not computationally feasible. In such cases, reverse engineers are forced to manually examine and analyze the program to determine the answer.
We found that in many circumstances, providing the reverse engineer with a statistical answer (i.e., one that is highly likely, but not guaranteed, to be correct) can be extremely beneficial. Such answers help focus and guide the reverse engineer, and thus increase the effectiveness and utilization of reverse engineering efforts.
In this talk we present techniques we developed for providing statistical answers to reverse engineering questions. This approach combines software analysis techniques with machine learning tools to create powerful solutions to otherwise unsolved problems.
Using several real-world binaries, we demonstrate the benefits of our approach on problems such as detecting the types of objects and reconstructing type hierarchies.
Ultimately, using insights and lessons learned from statistical reverse engineering, we take our research a step further and suggest a new statistical approach to, and a new point of view on, the problem of decompilation: lifting binary code to equivalent, human-readable, high-level source code that can be modified and recompiled.