The gentle art of making ... programs: Bit twiddling

Recently I had an interesting problem at hand: we needed to port a kind of hashing (or scaling) algorithm from mainframe assembler to another platform which doesn't have bit shifting operations. I started my journey by verifying what the assembler routine actually does. Althouhg I've done some x86 assember in the past and even some assembler for z-machines, it turned out to be an interesting session. In the beginning I was totally lost with assembler source code for the following reasons:

Registers in mainframe have really nice names: they are just numbers from 0 to 15. Which operand represents value and which represents register?
Instructions in mainframe assembler do not use too many letters: N stands for AND, XR stands for XOR, ...
Some instructions are manipulating multiple registers at once. For example SRDL 2,16 shifts contents of register 2 to register 3 by 16 bits.

Soon I realized I have to consult my colleague to figure what is going here. During the half an hour session with him, I ported the logic to Java just to understand how the algorithm works. In the end the algorithm was verified to be quite simple: shifting base value to the left by 4 bits, reversing the bits and shifting the result again to the left by 3 bits. By shifting the bit sequence one position less to the left after reversing we can guarantee that hash value never gets negative (32th bit is always zero). The lowest 4 bits were used for certain purpose which not relevant in this story. The resulting C++ routing is quite short, but performance-wise far from optimal. The initial code looks like this:

int32_t hash( uint_t base,uint32_t* hash ) {
if( base > 0x07FFFFFF )
return -1;
*hash = reverseBits( base << 4 ) << 3;
return 0;
}

uint32_t reverseBits( uint32_t num ) {
uint32_t reverse_num = 0;
for ( int i = 0; i < 32; i++ ) {
if( ( num & ( 1 << i ) ) )
reverse_num |= 1 << ( ( 32 - 1 ) - i );
}
return reverse_num;
}

And here's a sample, showing bit twiddling with base value 25:

Base (dec 25): 0000 0000 0000 0000 0000 0000 0001 1001
After 1st shift: 0000 0000 0000 0000 0000 0001 1001 0000
After reversing: 0000 1001 1000 0000 0000 0000 0000 0000
After 2nd shift: 0100 1100 0000 0000 0000 0000 0000 0000

Since almost all platforms can use DLLs/shared objects, I decided to port my naive implementation to C++ and compile it to DLL. The results were identical on Windows, but being a paranoid, I also compiled the module on z/OS in 64-bit AMODE. Again, the results were identical, even though the program was now 64-bit and byte order on z/OS machine is big-endian.

I didn't invent the bit-reversing algorithm, but just found it from the Internet. For some reason, I decided to arrange a "race" between Java and C++ implementations. In the beginning the results were shocking; even after full optimizations the C++ implementation was more than eight times slower. But isn't C++ ought to be faster than Java? After looking deeper under the hood, it was clear what made the difference: looping. Modifying the way bits are reversed gave an incredible boost. A better bit-reversing routine looks like this:

uint32_t reverseBits( uint32_t num ) {
num = ( num & 0x55555555 ) << 1 | ( num >> 1 ) & 0x55555555;
num = ( num & 0x33333333) << 2 | ( num >> 2 ) & 0x33333333;
num = ( num & 0x0F0F0F0F) << 4 | ( num >> 4 ) & 0x0F0F0F0F;
num = ( num << 24 ) | ( ( num & 0xFF00 ) << 8 ) | ( ( num >> 8 ) & 0xFF00 ) | ( num >> 24 );
return num;
}

However, since the target platform actually runs on multiple operating systems (at least on Windows and AIX), using C/C++ requires compilations on multiple platforms. For this reason a colleague of mine implemented another solution which doesn't need bit shifting operations, and thus can be implemented directly within the target platform. Of course, the performance of the routine dropped again. To keep languages in minimum, I'm showing this approach again in C++.

int32_t hash( uint_t base,uint32_t* hash ) {
if( base > 0x07FFFFFF )
return -1;
*hash = reverseBits( base * 16 ) * 8;
return 0;
}

uint32_t reverseBits( uint32_t base ) {
uint32_t reversed_num = 0;
for( int bitpos = 32; base != 0; ) {
bitpos--;
if( base % 2 == 1 )
reversed_num += ( uint32_t ) pow( 2.0,( double ) bitpos );
base /= 2;
}
return reversed_num ;
}

Finally, being a Java fella, I conclude this episode with a Java implementation of the hash routine. This implementation is the easiest and performance-wise comparable to C++. (Which is not a big surprise since the algorithm is the same). Here you are:

Integer.reverse( base << 4 ) << 3;

Now the big question is: what is all this bit twiddling good for? The reason why bits are reversed, is the way how DB2 for z/OS locks work, especially in datasharing mode. Since the number under manipulation is used as a clustering key, DB2 for z/OS tries to place consecutive numbers close to each other, possibly in the same page. This is turn means that inserts with consecutive numbers hit the same page at the same time and they are disturbing each other due to exclusive locks on the page. The page becomes "hot". But after the bit reversal, consecutive numbers are not not consecutive anymore and thus are not placed into same page (if there are more than one page, of course). Let's take a simple example with numbers 100, 101 and 102. The hashed values are 318 767 104, 1 392 508 928 and 855 638 016, respectively. The differences are huge and thus data lands evenly all around the pages. The new platform will not need this functionality per se, but it needs to be compliant with the numbers in legacy system. That's it.

The gentle art of making ... programs

Monday, November 5, 2012

Bit twiddling

No comments:

Post a Comment