Matan Hamilis, M.Sc. Thesis Seminar
Wednesday, 30.12.2015, 13:30
Additive Fast Fourier Transforms and the finite fields in which they work have a main role in many algebraic applications.
One of these applications is to implement a whole PCP system to obtain a succinct proof of computational integrity.
In this work we modifty a known algorithm to calculate additive FFTs over affine subspaces in GF(2^k); where previous works were applied only to linear subspaces. We present a parallel implementation of this algorithm for CPU and GPU architectures and discuss their performance.
The FFT algorithm relies on an implementation of finite field arithmetics. In this work we present an algorithm for the multiplication of elements represented in the standard basis using an irreducible polynomial with a special property that we call k-gapped. We also present an implementation of this multiplication in binary finite fields (2^k) in CPU and GPU and discuss its performance. The CPU implementation uses the PCLMULQDQ processor instruction to multiply its polynomials. The CPU implementation is 1.85 times faster than NTL's finite field multiplication. The GPU implementation gives a 14x higher throughput than our CPU implementation. We also present a method to exploit locality of memory access among threads of the same warp using the shuffle instruction instead of using shared memory. Applying this method on our finite field multiplication application use has given our implementation additional throughput.