Thursday, November 29, 2012

How's it hanging?


Once again I found myself going deeper in the world of low-level programming. This all began due to hang process caused systematically by the "collaboration" of two products. Let the offender be anonymous, while the other party was was BMC's Control-M, running as a Windows service.

Finding a correct location


How did I find what's wrong? It all started by using Process Explorer from Sysinternals. From the threads view I could see that certain DLL was always stuck pretty much in the same place. Actually it wasn't completely frozen but anyhow for some reason it couldn't proceed. This was a good start, but still it didn't lead me deeper into the problem at hand. I only knew that a thread was stuck and in the middle of the call stack I found FindWindow call, which was from now on my primary suspect. I googled FindWindow within a Windows service, but I couldn't find any clues of hang process behavior. I decided to take a look into DLL I found in the call stack. I searched for FindWindow and found this:

 L30003277:
  push 00000000h
  call [KERNEL32.dll!Sleep]
  push edi
  push 00000000h
  call [USER32.dll!FindWindowW]
  mov esi,eax
  test esi,esi
  jz L30003277

Wow! Although I'd say I found it by accident, I instantly knew this is it (or shit); a code snippet looping until certain window handle is found. And since this code is run under noninteractive window station, it will never find the window. However, I wanted more evidence and I downloaded API Monitor from Rohitab.com. Some people say it has it's deficiencies, but at least in this case this tool was extremely valuable. I ran the code with it and vĂ³ila: I caught the looping FindWindow code red-handed. From the call stack I saw the very same address as I found with the disassembler. The case seemed to be closed. We actually got a patch during the same day, but it wasn't because of my discoveries. The case was already reported by someone else.

Anyway, few days later I was discussing with my colleague about the case, who asked if I know how the API monitor does it's magic. I had to say I have no idea, but immediately I knew this won't be a long lasting answer. I had to find out how they do it! Enter hotpatches.

Hotpatches


Hotpatches allows you to patch a running process without requiring that the process be stopped and thus the patch can be applied without even stopping the process. This mechanism can be used to implement API spying, software cracking and malware, just to name few uses. What makes it possible is the sequence of five bytes just before the function and two bytes (doing basically nothing) in the beginning of the function. The instructions before function are NOPs while the very first instruction in the function is mov edi,edi. Here are some examples from USER32.DLL:

Disassembly of MessageBoxW looks like this:


7DCBFECA  90 db 90h;   '?'
7DCBFECB  90 db 90h;   '?'
7DCBFECC  90 db 90h;   '?'
7DCBFECD  90 db 90h;   '?'
7DCBFECE  90 db 90h;   '?'
7DCBFECF                           MessageBoxW:
7DCBFECF  8BFF mov edi,edi
7DCBFED1  55 push ebp
7DCBFED2  8BEC mov ebp,esp


while MessageBoxA looks like this:


7DCBFEA9  9090909090 Align 2
7DCBFEAE                           MessageBoxA:
7DCBFEAE  8BFF mov edi,edi
7DCBFEB0  55         push ebp
7DCBFEB1  8BEC mov ebp,esp


However, the thing is that either way there are five NOPs (0x90) before actual function entry. And these bytes can be rewritten to do far jump. But how to install our hot patch into given process?

DLL injection


There's a really neat way to inject code for a given process by using CreateRemoteThread. With this function we can start a thread in the process and execute our code there. The mechanism, called DLL injection, is actually quite simple and I was able to do it even after many years of C++ hibernation. Although it's definitely possible to hotpatch a running process, I will describe how to start a process so that hotpatching is active right from the beginning. This means I am creating the process. The basic principle goes like this:

1. Start a process so that dwCreationFlags are OR'ed with CREATE_SUSPENDED (CreateProcess).
2. Grab a module handle to KERNEL32.DLL (GetModuleHandle).
3. Retrieve the address of LoadLibraryA from kernel32 (GetProcAddress).
4. Allocate memory for DLL path in the remote process (VirtualAllocEx) and write DLL path to it (WriteProcessMemory)
5. Create a remote thread into remote process (CreateRemoteThread) and set it's starting address as previously taken address of LoadLibraryA.
- Injected DLL is now loaded and it's DllMain is run. This is where hotpatching is done.
6. Wait for the thread to terminate
7. Resume the primary thread of the remote process.
8. Clean up things in the calling process.

If the process is already running, one might try to pause the process by calling SuspendThread to calm down activities, but I think it's not mandatory.

The source code of injected DLL can be found here and the initiator of injection is here. Beware that the level of error handling is next to nothing.

Monday, November 5, 2012

Bit twiddling


Recently I had an interesting problem at hand: we needed to port a kind of hashing (or scaling) algorithm from mainframe assembler to another platform which doesn't have bit shifting operations. I started my journey by verifying what the assembler routine actually does. Althouhg I've done some x86 assember in the past and even some assembler for z-machines, it turned out to be an interesting session. In the beginning I was totally lost with assembler source code for the following reasons:


  • Registers in mainframe have really nice names: they are just numbers from 0 to 15. Which operand represents value and which represents register?
  • Instructions in mainframe assembler do not use too many letters: N stands for AND, XR stands for XOR, ...
  • Some instructions are manipulating multiple registers at once. For example SRDL 2,16 shifts contents of register 2 to register 3 by 16 bits.


Soon I realized I have to consult my colleague to figure what is going here. During the half an hour session with him, I ported the logic to Java just to understand how the algorithm works. In the end the algorithm was verified to be quite simple: shifting base value to the left by 4 bits, reversing the bits and shifting the result again to the left by 3 bits. By shifting the bit sequence one position less to the left after reversing we can guarantee that hash value never gets negative (32th bit is always zero). The lowest 4 bits were used for certain purpose which not relevant in this story. The resulting C++ routing is quite short, but performance-wise far from optimal. The initial code looks like this:

int32_t hash( uint_t base,uint32_t* hash ) {
if( base > 0x07FFFFFF )
return -1;
*hash = reverseBits( base << 4 ) << 3;
return 0;
}

uint32_t reverseBits( uint32_t num ) {
    uint32_t reverse_num = 0;
    for ( int i = 0; i < 32; i++ ) {
        if( ( num & ( 1 << i ) ) )
reverse_num |= 1 << ( ( 32 - 1 ) - i );
    }
    return reverse_num;
}

And here's a sample, showing bit twiddling with base value 25:

Base (dec 25): 0000 0000 0000 0000 0000 0000 0001 1001
After 1st shift: 0000 0000 0000 0000 0000 0001 1001 0000
After reversing: 0000 1001 1000 0000 0000 0000 0000 0000
After 2nd shift: 0100 1100 0000 0000 0000 0000 0000 0000

Since almost all platforms can use DLLs/shared objects, I decided to port my naive implementation to C++ and compile it to DLL. The results were identical on Windows, but being a paranoid, I also compiled the module on z/OS in 64-bit AMODE. Again, the results were identical, even though the program was now 64-bit and byte order on z/OS machine is big-endian.

I didn't invent the bit-reversing algorithm, but just found it from the Internet. For some reason, I decided to arrange a "race" between Java and C++ implementations. In the beginning the results were shocking; even after full optimizations the C++ implementation was more than eight times slower. But isn't C++ ought to be faster than Java? After looking deeper under the hood, it was clear what made the difference: looping. Modifying the way bits are reversed gave an incredible boost. A better bit-reversing routine looks like this:

uint32_t reverseBits( uint32_t num ) {
num = ( num & 0x55555555 ) << 1 | ( num >> 1 ) & 0x55555555;
num = ( num & 0x33333333) << 2 | ( num >> 2 ) & 0x33333333;
num = ( num & 0x0F0F0F0F) << 4 | ( num >> 4 ) & 0x0F0F0F0F;
num = ( num << 24 ) | ( ( num & 0xFF00 ) << 8 ) | ( ( num >> 8 ) & 0xFF00 ) | ( num >> 24 );
return num;
}

However, since the target platform actually runs on multiple operating systems (at least on Windows and AIX), using C/C++ requires compilations on multiple platforms. For this reason a colleague of mine implemented another solution which doesn't need bit shifting operations, and thus can be implemented directly within the target platform. Of course, the performance of the routine dropped again. To keep languages in minimum, I'm showing this approach again in C++.

int32_t hash( uint_t base,uint32_t* hash ) {
if( base > 0x07FFFFFF )
return -1;
*hash = reverseBits( base * 16 ) * 8;
return 0;
}

uint32_t reverseBits( uint32_t base ) {
uint32_t reversed_num = 0;
for( int bitpos = 32; base != 0; ) {
   bitpos--;
   if( base % 2 == 1 )
       reversed_num  += ( uint32_t ) pow( 2.0,( double ) bitpos );
   base /= 2;
}
return reversed_num ;
}

Finally, being a Java fella, I conclude this episode with a Java implementation of the hash routine. This implementation is the easiest and performance-wise comparable to C++. (Which is not a big surprise since the algorithm is the same). Here you are:

Integer.reverse( base << 4 ) << 3;

Now the big question is: what is all this bit twiddling good for? The reason why bits are reversed, is the way how DB2 for z/OS locks work, especially in datasharing mode. Since the number under manipulation is used as a clustering key, DB2 for z/OS tries to place consecutive numbers close to each other, possibly in the same page. This is turn means that inserts with consecutive numbers hit the same page at the same time and they are disturbing each other due to exclusive locks on the page. The page becomes "hot". But after the bit reversal, consecutive numbers are not not consecutive anymore and thus are not placed into same page (if there are more than one page, of course). Let's take a simple example with numbers 100, 101 and 102. The hashed values are 318 767 104, 1 392 508 928 and 855 638 016, respectively. The differences are huge and thus data lands evenly all around the pages. The new platform will not need this functionality per se, but it needs to be compliant with the numbers in legacy system. That's it.

Sunday, November 4, 2012

REST


Long time no see. My brains have been in REST for a while, and I decided to share my thoughts about it. Beware RESTafarians.

SOAP vs REST WebServices


There are certain differences between these two approaches. Here's a short list to name just a few:


  • SOAP is about services while REST is about resources. 
  • REST interface is standardized by HTTP (GET, POST, PUT, DELETE, ...) and that's what (according to RESTafarians) makes it easy. SOAP places no restrictions on the interface, but service producer must design and describe each and every SOAP interface.
  • SOAP services have a description language (WSDL) while REST services do not. Some RESTafarians may disagree with me and praise WADL, but here's what wikipedia says about WADL: "WADL was submitted to the World Wide Web Consortium by Sun Microsystems on 31 August 2009, but the consortium has no current plans to standardize it and it is not yet widely supported. WADL is the REST equivalent of SOAP's Web Services Description Language (WSDL), which can also be used to describe REST web services.". And as we know, wikipedia is not wrong. Ever. What this means, is that you should not expect any proxy code generation based on description of the service in hand.
  • Browsers can invoke REST services natively while SOAP services require major hassle. For REST based services this is a big plus.
  • SOAP webservices are based on XML while REST services use content negotiation and should be able to produce multiple representations (like JSON).
  • REST responses are meant to be cached and the caching infrastructure is well understood and widely available. With SOAP you are on your own.
  • REST security is quite simple and is purely based on HTTP and things like SPNEGO. Don't expect services other than plain HTTP/HTTPS have. 
  • Since SOAP is not based on HTTP, you can build and consume SOAP WebServices without full understanding of HTTP. With REST services you'd better know crucial parts of HTTP by heart (e.g. response codes).


REST used directly from a browser 


I have absolutely nothing against using REST directly from a browser. Browsers implement all necessary functionality to invoke REST services correctly. They are built to deal with HTTP messages and understanding various HTTP response codes is business as usual for them. Similarly, passing user identity to REST services is mostly taken care of by the browser. And if you want to have SSO, the chances are good that you have a working solution already. For example, if your clients have SPNEGO enabled browser and your application server also knows SPNEGO, implementing SSO is piece of cake; the identity of the user is passed all the way down to your REST service. Browsers also know how to handle different representations, be it JSON, XML, HTML or almost whatever.

REST used in application-to-application communication


Communication between applications with REST is something I just don't buy. While it's relatively easy in a to build REST services, consuming services may not be so simple. For example, consuming REST service from WebSphere Application Server (and from most Java EE application servers) means you have to manage user identity by yourself; nobody's propagating user identity for you. And although appending a couple of parameters to an URL is simple, handling response may not be that simple. First of all, you must understand the HTTP response code. You also must understand the resource representation returned and soon you will end up having all kinds of content parsers within your application (e.g. some kind of shitty open-source JSON parser). While these kind of parsers probably exist, I wouldn't expect quality even close to JAXB. Are you now wondering why I'm foaming about things like JSON parsers since REST services can also produce XML? Why don't just use XML and JAXB? Of course you can use XML and JAXB, but what does it actually mean? It means you end up building the very same things by yourself what SOAP toolkits/proxies/... provide for you.

Conclusion


I think the simplicity of REST is vastly overvalued. You need to know this and that to make it work. For example, localization/internationalization is a good example: how do you request representation in multiple languages? Are the URIs same or not? Is "Accept-Language" -header the way to go or not? Search the web and you'll find pretty nasty debates going on with this.

So, at the end of the day I'm still in favour of using SOAP between applications and REST if the service is consumed directly by the browser. RESTafarians of course ask the big question: "how would you know who is consuming your service and why would you even care"?