Technical Report PHD-2007-11

Title: An architecture and programming model for Extremely Fine Grain Parallelization
Authors: Alex Gontmakher
Supervisors: Assaf Schuster
Abstract: This work addresses one of the more important problems in today's processor architecture: the amount of parallelism that can be extracted from code. Parallelism exists at different granularity levels, and at each level it is limited by different factors. At the finest granularity, instruction-level parallelism (ILP) depends on the processor's ability to find independent instructions further ahead in the instruction stream, and is constrained by instruction dependencies and implementation complexity. At a coarser granularity, thread-level parallelism (TLP), which is specified by dividing the computation into explicit subtasks, is limited by the overhead of thread manipulation.

We present Inthreads, an architecture that aims to fill the gap between instruction-level and thread-level parallelism. Inthreads provides an extremely lightweight threading mechanism that allows TLP methods to be used at a granularity comparable to that of ILP. To reduce the overhead, the architecture introduces a shared-register threading mechanism, which can be implemented with minimal changes to the processor. The most important feature of Inthreads is its programming model, which is based on static assignment of tasks to threads and makes the software responsible for preserving the correctness of concurrent execution. We show that the programming model benefits all the components of the system, including the programmer, the compiler and the processor implementation. For the programmer, Inthreads presents a simple and straightforward memory consistency model. For the compiler, the model allows automatic detection of shared variables and analysis of the interactions between threads, which results in efficient and automatic code optimization. Finally, the programming model eliminates complex interactions between speculative instructions of different threads and thus simplifies the microarchitecture.

The simple implementation allows Inthreads to be used for energy-efficient computing in addition to its natural application as a mechanism for sequential code acceleration. To this end, we switch the processor from an out-of-order to an in-order pipeline and use Inthreads as an alternative source of parallelism. With this technique, we achieve considerable energy savings while retaining about the same performance as that of the out-of-order pipeline.


Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.


Computer science department, Technion