Alon Mishne, M.Sc. Thesis Seminar
Wednesday, 22.6.2011, 15:30
We present PRIME, a tool that uses static specification mining techniques to extract useful specifications of library APIs from a large number of code fragments that use them, and then applies data mining techniques to aggregate the samples into use-cases and rank them by popularity and complexity.
Programming increasingly revolves around frameworks and libraries, most of which are designed to support a wide range of usage scenarios. Typically, a programmer needs only part of a library's functionality, and must navigate the library's extensive interface (API) to find out how to implement it.
Instead of navigating the complicated library code and documentation, programmers often rely on code examples of client programs that use the library. Such code examples can be obtained from library documentation, from other programmers, or via a myriad of search engines and other online tools.
The availability of services such as Google Code Search exposes the programmer to a vast number of code examples. Making sense of these examples, however, can be extremely challenging. Code fragments using the API of interest may appear in slightly different contexts and are often interleaved with irrelevant code, making it hard for a programmer to tease out the relevant details. Furthermore, any given code sample may use the API in an erroneous or sub-optimal way. These factors make it hard for a human to benefit from the vast amount of available information.
Using a combination of program analysis and machine learning techniques, PRIME mines library specifications from a large collection of client code that uses the library, helping programmers write new code against a library even when they are not familiar with it.
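
To make the aggregation step concrete, here is a minimal Python sketch of grouping mined samples into use-cases and ranking them. It is not PRIME's algorithm: it assumes each sample has already been reduced to a flat sequence of API method names, treats identical sequences as one use-case, and uses sequence length as a stand-in for complexity. The function name `rank_use_cases` and the sample data are hypothetical.

```python
from collections import Counter


def rank_use_cases(samples):
    """Group identical API call sequences and rank the groups.

    samples: iterable of call sequences, each a list of method names.
    Returns (sequence, count) pairs, most popular first; ties are
    broken by shorter sequence first (length is an assumed proxy
    for complexity), then alphabetically for a stable order.
    """
    counts = Counter(tuple(seq) for seq in samples)
    return sorted(
        counts.items(),
        key=lambda item: (-item[1], len(item[0]), item[0]),
    )


# Hypothetical call sequences extracted from four client code fragments.
samples = [
    ["open", "read", "close"],
    ["open", "read", "close"],
    ["open", "write", "flush", "close"],
    ["open", "close"],
]

ranked = rank_use_cases(samples)
for sequence, count in ranked:
    print(count, " -> ".join(sequence))
```

A real miner would need to handle partial and noisy samples rather than exact sequence matches; the sketch only illustrates sorting by popularity and then by a simple complexity measure.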