Technical Report PHD-2015-16

Title: Processing Real-time Data Streams on GPU-based Systems
Authors: Uri Verner
Supervisors: Assaf Schuster and Avi Mendelson
Abstract: Processing massive streams of data is an important problem in modern computer systems, and in particular for applications that process big data. Many such applications, including data analytics, production inspection, and fraud detection, require that the response times be below given thresholds. Meeting these constraints makes scheduling a very challenging task. Each data stream generates a sequence of data packets, each of which must be processed within a given time from its arrival. GPUs are very promising candidates for providing the required processing power. However, their use for processing real-time streams is complicated by various factors, such as obscure operation mechanisms, complex performance scaling, and contention for bandwidth on a shared interconnect.

The goal of this dissertation is to develop an approach to efficient processing of real-time data streams on heterogeneous computing platforms that consist of CPUs and GPUs. To achieve this goal, we develop and investigate a processing model, scheduler, and framework. We present several work distribution techniques that statically assign the streams to the CPU or GPU in a way that simultaneously satisfies their aggregate throughput requirements and the deadline constraint of each individual stream. These methods serve as the basis for four new methods for partitioning work in multi-GPU systems, each presenting a different compromise between the complexity of the algorithm and the achievable throughput. We design a generic real-time data stream processing framework that implements our methods, and use it to evaluate them in extensive empirical experiments using an AES-CBC encryption operator on thousands of streams. The experiments show that our scheduler yields up to 50% higher system throughput than alternative methods.
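To make the idea of a static CPU/GPU assignment concrete, the following is a minimal sketch and not the dissertation's algorithm. It assumes, purely for illustration, that each device has a fixed throughput capacity and that the GPU cannot meet deadlines shorter than some minimum latency. The names `Stream`, `assign_streams`, `gpu_min_latency_ms`, `cpu_capacity_mbps`, and `gpu_capacity_mbps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str
    rate_mbps: float    # sustained throughput the stream requires
    deadline_ms: float  # maximum allowed time from packet arrival to result


def assign_streams(streams, gpu_min_latency_ms, cpu_capacity_mbps, gpu_capacity_mbps):
    """Greedy static assignment (illustrative assumption, not the report's method).

    A stream may run on the GPU only if its deadline is at least the GPU's
    assumed minimum latency and the GPU still has spare capacity; otherwise
    it is placed on the CPU if the CPU has spare capacity.
    Returns (cpu_streams, gpu_streams), or None if this greedy pass fails.
    """
    cpu, gpu = [], []
    cpu_load = gpu_load = 0.0
    # Handle the tightest deadlines first so CPU capacity is reserved for them.
    for s in sorted(streams, key=lambda s: s.deadline_ms):
        fits_gpu = (s.deadline_ms >= gpu_min_latency_ms
                    and gpu_load + s.rate_mbps <= gpu_capacity_mbps)
        if fits_gpu:
            gpu.append(s)
            gpu_load += s.rate_mbps
        elif cpu_load + s.rate_mbps <= cpu_capacity_mbps:
            cpu.append(s)
            cpu_load += s.rate_mbps
        else:
            return None  # greedy pass found no feasible placement
    return cpu, gpu


if __name__ == "__main__":
    demo = [Stream("a", 10, 1), Stream("b", 50, 20), Stream("c", 60, 30)]
    result = assign_streams(demo, gpu_min_latency_ms=5,
                            cpu_capacity_mbps=70, gpu_capacity_mbps=100)
    print(result)
```

The greedy pass is deliberately simple: it can report failure even when a feasible assignment exists, and it ignores effects such as batching and interconnect contention that the abstract identifies as central difficulties.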

Another major challenge in using multiple GPUs for real-time processing is scheduling recurrent real-time data transfers among CPUs and GPUs on a shared interconnect. We develop a new scheduler and execution engine for periodic real-time data transfers in a multi-GPU system. The scheduler is based on a new approach where the basic unit of work sent for execution is a batch of data-block transfer operations. In our experiments with two realistic applications, our execution method yielded up to 7.9x shorter execution times than alternative methods. The scheduler analyzes the data transfer requirements and produces a verifiable schedule that transfers the data in parallel, and achieves up to 74% higher system throughput than existing scheduling methods.

The CPU contributes to the processing capability of the system by providing easily managed and predictable compute power. However, a recent trend in CPU design is to include a frequency scaling mechanism such as Turbo Boost that changes the performance-scaling curve. We characterize the processing speeds with Turbo Boost, and present an offline task scheduler that minimizes the total execution time. We also extend Amdahl’s law for speedup and energy consumption to take into account the effects of Turbo Boost. Finally, we generalize the new resource model and define a new class of scheduling problems that enables more efficient use of parallel resources by accurately characterizing their performance, thereby laying the foundation for further research.
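For reference, the classical form of Amdahl's law that the dissertation extends can be written as below. The abstract does not state the extended form, so only the standard law is shown. The symbols are the usual textbook ones and are assumptions here: p is the parallelizable fraction of the work and n is the number of cores, all assumed to run at the same fixed speed.

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
```

The fixed-speed assumption is exactly what frequency scaling breaks: with a mechanism such as Turbo Boost, the speed of each core depends on how many cores are active, which is why the performance-scaling curve changes.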

Copyright: The above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.


Computer science department, Technion