Empirical Study of Object-Layout Strategies and Optimization
Table of contents:
One of the most exciting challenges in the
implementation of object oriented programming languages is to efficiently
realize together multiple inheritance and dynamic dispatch. Indeed, it
was believed that it was impossible to efficiently introduce multiple inheritance
into C++, until proved otherwise by Stroustrup. Although there is a large
body of research on the time overhead of object oriented programs, there
is little work on memory overhead. To support efficient implementation
of dynamic dispatch and multiple inheritance, there are compiler-generated
fields, stored in object's memory. This work takes an empirical approach
to the study of the memory overhead due to compiler generated fields, which
turns out to be significant in the presence of multiple inheritance.
We study the performance, in terms of overhead to
object size of three compilation strategies: separate compilation, whole
program analysis, and user annotations as done in C++. A variant to each
such strategy is the inclusion of pointers to indirect virtual bases in
objects. Using a database of several large multiple inheritance hierarchies,
spanning 7000 classes, several application domains and different programming
languages we find that in all strategies there are certain classes which
give rise a large number of compiler generated fields in their object layout.
Several recently introduced optimization techniques
are studied in this work. Our study shows that the existence of virtual
inheritance greatly increases the number of compiler generated fields stored
in object memory. Some of the techniques, such as devirtualization and
inlining, strive to reduce the number of virtual inheritance in class hierarchies.
Others, such as bidirectional layout and packing, minimize the number of
compiler generated fields by suggesting major architecture changes. The
optimization techniques also broken down by their applicability in different
compilation strategies: some require whole program information while others
don't. We study the efficacy of the optimization techniques and show that
an average saving of close to 50% in object memory overhead can be achieved.
Seminar lecture slides
For online presentation (html) click here
The slides can also be dowloaded
Final thesis paper
This work was presented at the 14th European Conference on Object-Oriented
Programming (ECOOP) in June 2000.
The final paper, submitted to the Senate of the Technion, is availabled
The paper used a great amount of benchmarking data, compiled in the form
of tables and figures. Those figures were used in the Encapsulated Postscript
(eps) format. Click here to dowload the figures
used in this paper (zipped eps).
Software and resources used and developed in this
Database of class hierarchies
In this research a large database of classs hierarchies was used for the
purposes of benchmarking of object-layout optimization algorithms. Our
database selection started from the collection of large hierarchies which
served Krall, Vitek and Horspool in becnhamrking their algorithms:
We enriched this collection with the class hierarchies of JDK1.1 and ISE
version 4 distribution of Eiffel.
A. Krall, J. Vitek, and R. N. Horspool. Efficient type inclusion tests.
In Proceedings of the 12th Annual Conference on Object-Oriented Programming
Systems, Languages, and Applications, pages 142–157, Atlanta, Georgia,
Oct. 5-9 1997. OOPSLA’97, Acm SIGPLAN Notices 32(10) Oct. 1997.
A. Krall, J. Vitek, and R. N. Horspool. Near optimal hierarchical encoding
of types. In Aksit and Matsuoka , pages 128–145.
The database contains 25 text files with extension "INH". Each file describes
a specific class hierarchy. A hierarchy file contain a line
Click here to dowload the final version of our
database (zip) - 121 kb.
super <base class> name <derived class>
for each inheritance link in the hierarchy. There is no information
about the nature of inheritance (virtual or non-virtual) available.
A program was written in order to analyze the structure of class hierarchies
and produce the benchmmarking results for our optimization algorithms.
Click here to dowload the source code of PROCESS
project (zip) - 35 kb.
Click here to dowload only the executable, built
in release mode (exe) - 73 kb.
How to build and run the project:
Platform: Microsoft Windows NT version 4.0 (service pack 3), or Microsoft
Environment: Microsoft Developer Studio 97
Language: Visual C++ 5.0 with MFC support
Dowload the sources, unpack zip file into a separate directory.
Open PROCESS.DSW file in the Developer Studio (by double clicking this
file in the Explorer or by selecting "Open Workspace" in the "File" menu
of the Developer Studio).
Select a mode of compilation (Release or Debug) and build the project.
Developer Studio normally creates all intermediate files, like .obj, and
also the final executable called PROCESS.EXE in the sub-directory "Release"
or "Debug" according to the compilation mode.
Run the executable from the command line with arguments - the list of execution
options will be printed.
Choose the desired option and the run the executable with this option and
redirect the standard input to one of the database INH files. For example:
> process -s < eiffel4.inh
The simple inlining algorithm will be executed on the Eiffel4 hierarchy
and the benchmarking results will be printed to the standard output.
The statistics generated by the program were processed using Microsoft
(R) Excel 97 Hebrew Edition. The following zipped XLS files contains benchmarking
data and charts used in this work
It HIGHTLY RECOMMENDED to dowload and unpack all those files simultaneously,
since there are links between those files - the data is stored in one and
the charts are in the other.