In this paper we propose a new hash function, which is called Tiger, as it is strong and fast: as fast as SHA-1 on 32-bit processors, and about three times faster on 64-bit (DEC Alpha) processors. It is also expected to be faster than SHA-1 on 16-bit processors, since SHA-1 is optimized for 32-bit machines, while our proposal is designed to work adequately on many word sizes.
Its main operation is table lookup into four S-boxes, each from eight bits to 64 bits. On 32-bit machines this can be implemented as a pair of table lookups, with the offset computation done only once. The other operations are 64-bit additions and subtractions, 64-bit multiplication by small constants (5, 7 and 9), 64-bit shifts and logical operations such as XOR and NOT. All these operations are at most twice as slow on 32-bit machines, with the exception of the shifts and the multiplications by small constants which are four or five times slower (Alpha processors have special instructions which multiply by constants of the form and ).
For drop-in compatibility, we adopt the outer structure of the MD4 family: the message is padded by a single `1' bit followed by a string of `0's and finally the message length as a 64-bit word. The result is divided into n 512-bit blocks.
The size of the hash value, and of the intermediate state, is three words, or 192 bits. This value was chosen for the following reasons:
Tiger with the full 192 bits of output in use may be called Tiger/192, and we recommend its use in all new applications. When replacing other functions in existing applications, we suggest two shorter variants:
We conjecture that all the three variants of Tiger are collision-free, in that collisions for Tiger/N cannot be found with substantially less effort than . We also believe that they are one-way and multiplication-free [And93].
The efficiency of this function is partially based on the potential parallelism in its design. In the MD and Snefru families, each operation depends directly on the result of the previous operation, and thus RISC processors cannot be used efficiently due to pipeline stalls. In each round of Tiger, the eight table lookup operations can be done in parallel, so compilers can make best use of pipelining. The design also allows efficient hardware implementation.
The memory size required by Tiger is only slightly more than the size of the four S boxes. If this can be accommodated within the cache of the processor, the computation runs about twice as fast (measured on DEC Alpha). The size of the four S boxes is Kbytes, which is about the size of the cache on most machines. If eight S boxes were used, 16 Kbytes would be required, which is twice as the size of the cache on Alpha.