Uri Verner (CS, Technion)
Wednesday, 21.1.2015, 11:30
Two problems with parallel summation of floating-point numbers on GPUs are loss of precision and non-reproducible results. The precision loss is due to round-off error propagation, and the lack of bit-accurate consistency across platforms and setups arises because floating-point addition is non-associative, so different execution setups, which sum the values in different orders, can produce different results.
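The non-associativity mentioned above is easy to demonstrate in any IEEE-754 double-precision environment; the values below are illustrative, not from the talk:

```python
# Floating-point addition is not associative: regrouping the same
# three doubles changes the result, because 1.0 is smaller than the
# spacing between representable values near 1e16 and gets absorbed.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # (1e16 - 1e16) + 1.0 = 1.0
right = a + (b + c)  # 1e16 + (-1e16) = 0.0 (the 1.0 was lost)

print(left)   # 1.0
print(right)  # 0.0
```

A parallel reduction effectively chooses one such grouping per launch configuration, which is why results can vary across GPUs and thread-block sizes.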
To address these problems, we implemented a new method for efficient bit-accurate parallel summation of double-precision numbers on GPUs. This method computes the summation result with full precision, i.e., without round-off error, and therefore produces the same result on all architectures and execution setups. We see two main uses for this method: (1) algorithms that benefit from extended precision, such as iterative linear solvers (QUDA, AMGX), and (2) applications that require reproducible results, such as cross-platform libraries and the tuning of execution parameters with result validation.
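The talk's GPU implementation is not reproduced here, but the reproducibility property it guarantees can be illustrated with Python's `math.fsum`, a CPU-side exact summation routine (Shewchuk's algorithm) that is likewise insensitive to summation order; this is a stand-in for the method, not the method itself:

```python
# Exact summation is order-independent; naive summation generally is not.
# math.fsum tracks the error terms of each partial sum, so its result is
# the correctly rounded sum of the inputs regardless of their order.
import math
import random

random.seed(0)
xs = [random.uniform(-1e16, 1e16) for _ in range(10_000)]
ys = list(xs)
random.shuffle(ys)  # same values, different summation order

# Naive left-to-right sums of the two orderings typically disagree.
print(sum(xs), sum(ys))

# Exact sums of the two orderings are bit-identical.
print(math.fsum(xs) == math.fsum(ys))  # True
```

A GPU version faces the extra challenge of doing this efficiently across thousands of threads, which is the subject of the talk.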
Uri Verner is a PhD student at the Department of Computer Science at the Technion, where he is advised by Professors Assaf Schuster and Avi Mendelson. His research interests include theoretical and practical aspects of real-time data-stream processing in GPU-based systems, and his publications on the subject address computation and communication scheduling. After completing his BSc, Uri continued to a PhD on the direct track at the same institution. Uri interned at NVIDIA during Summer 2014, where he initiated the work presented in this talk.