When reversing malware it is common to find an injected payload loading references to external resources (DLL functions). This happens for two main reasons:
- The hosting process does not have all resources necessary to the execution of the injected payload;
- Making reversing engineering the malware trickier since the dumped segment will have all calls pointing to a meaningless address table.
This article is divided in three main parts:
- Explaining the observed technique;
- How it works; and
- How to circumventing it in order to facilitate reversing.
First of all, shout out to Sergei Frankoff from Open Analysis for this amazing video tutorial on this same topic which inspired me to write about my analyses. Regards also to Mark Lim who also wrote a very interesting article about labelling indirect calls in 2018. His article uses structures instead of patching the code (which is also a good approach) but I think it lacks important details and I will try to cover these points in here.
Examples presented in this article were extracted from the following Smokeloader sample:
Size: 215.5 KB (220672 bytes)
Type: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
Explaining what is going on in the first stage (packer/crypter) is out of scope; this article focuses on characteristics found in the final payload. This sample injects the main payload in "explorer.exe" as it is possible to observe in this AnyRun sandbox analysis.
Figure 01 shows how the code looks immediately after the execution control passes to the injected code.
|Figure 01 - Smokeloader's final payload.|
Three points were marked in this code snip (1, 2 and 3). The first point (1) is the call to the main function (located at 0x002F1853). This function expects to receive an address through ECX register. This address points to a data segment where all temporary structures will be stored.
The third point (3) is an indirect call to an address stored in register ESI plus offset 0xEAE. The debugger was not able to resolve this address since the "memory segment" pointed by ESI is not set at this point of the execution (Instruction Pointer pointing to 0x002F1844). This pattern usually is an indicator that this code will dynamically resolve and import external resources to a specific address table (in this case stored in what we called "data segment"). This is an interesting technique because this table can be moved around by changing the address stored in ESI as long as offsets are preserved. In this code ESI is set to "0x002E0000" which is the address of a read-and-write memory segment created during the first stage. Figure 02 shows the region pointed by the offset 0xEAE which is empty at this point of the execution.
|Figure 02 - Address pointed by the indirect call.|
The second point (2) marks a function call immediately before the indirect call (3). This is a strong indicator that the code for creating the address table must be somewhere inside this function. The address located in "002E0EAE" will be filled with pointers to the expected API function. Figure 3 shows this same memory region after the "__load_libraries" function is executed.
|Figure 03 - Address pointed by the indirect call is filled after the "__load_libraries" function is called|
x32dbg has a memory dump visualisation mode called "Address" which will list every function pointed to each address loaded in the call table we just described.
|Figure 04 - Resolved address in call table|
Figure 04 shows that the position pointed by the indirected call listed in point (3) points to function "sleep" inside "kernel32.dll". Basically this call table is an Array of unsigned integers (4 bytes) containing an address pointing to an API call in each position.
The "__load_library" function is responsible for creating this "call table" so the focus of this article will move to understand how it works.
--- End of part I ---
|Figure 05 - "__load_libraries" zoomed out CFG representation.|
Figure 05 shows an overview of the "__load_library" function created by IDA. This function is quite large and performs few connected steps which we need to go through in order to fully understanding its behaviour. This function can be divided in three main sections:
- Code responsible for finding the base addresses for core libraries;
- Code responsible for loading addresses for calls within code libraries;
- The last section is responsible for loading other libraries necessary for executing the malware.
Figure 06 presents the first part of the "__load_libraries" function. In its preamble the code navigates through the TEB (Thread Environment Block) and loads 4 bytes from offset 0x30 into register EAX. This address contains the address of the PEB (Process Environment Block). Next step is to get the location for the "PEB_LDR_DATA" structure which is located in offset 0xC. This structure contains a linked list containing information about all modules (DLLs) loaded by a specific process.
|Figure 06 - first section of the "__load_libraries" function.|
The code accesses the offset 0xC in the "PEB_LDR_DATA" structure which contains the head element for the loaded modules in the order they were loaded by the process. Each element in this linked list is a combination of "_LDR_DATA_TABLE_ENTRY" and "_LIST_ENTRY" structures. This structure has an entry to the base name of the module in the offset 0x30. Figure 07 summarises all this "structure maze" used in order to fetch loaded module names (excusez-moi for my paint brush skills :D).
|Figure 07 - Path through the process internal structures to get loaded DLL names and base addresses|
The main loop, beginning at "loc_2F189F" (Figure 06), goes through all modules loaded by the "explorer.exe" process. This algorithm fetches the module name and calculates a hash out of it. The second smaller looping located at "loc_2F18AB" (Figure 06) is the part of the code responsible for calculating this hash. Figure 08 shows the reversed code for this hashing algorithm.
|Figure 08 - Reversed hashing algorithm used in the first part of the analysed code|
Moving forward, after calculating a hash the algorithm does a XOR operation with a hardcoded value 0x25A56A90 and this value is compared with two hardcoded hashes: 0x4C5DACBC (kernel32.dll) and 0x7FA40424 (ntdll.dll). The base addresses of each DLL are stored in two global variables located in the following addresses [ESI+0x1036] and [ESI+0x103A].
Bonus: these hardcoded hashes can be used for detecting this specific version of Smokeloader.
Summarising, this first part of the code is responsible by finding the base address of two core libraries in MS Windows ("ntdll.dll" and "kernel32.dll"). These addresses will be used for fetching resources necessary for loading all other libraries required by the malware.
Figure 09 shows the second section of "__load_libraries". This figure shows the code with some functions names already figured out in order to make it more didactic.
|Figure 09 - second section of the "__load_libraries" function.|
The first two basic blocks checks if the function was able to find "ntdll.dll" and "kernel32.dll" base addresses. If these modules are available then the "__load_procs_from_module" function is invoked for filling the call table. This function receives 4 parameters and does not follow the standard C calling convention. Two parameters are passed through the stack and the other two through registers (ECX and EDX). This function expects a DLL base address in EDX, the data segment in ECX, an address to a list of unsigned ints (api calls hashes) and a destination address (where the calls addresses will be stored). The last two parameters are pushed in the stack.
Figure 10 shows the hardcoded hashes passed as parameter to "__load_procs_from_module" function. This list will be used to determine which procedures will be loaded in the call table.
|Figure 10 - Array of hashes of "ntdll.dll" function names|
Next step is to take a look inside "__load_procs_from_module" function. Figure 11 shows the code for this function. Parameters and functions were named to facilitate the understanding of this code.
|Figure 11 - Code for "__load_procs_from_modules" function|
This function iterates over a list of 4 bytes hashes received as parameter. Each element is XORed with a hardcoded value (0x25A56A90) and passed to the function "__get_proc_address" together with a base address of a library. This function iterates over all procedures names exported by a DLL, calculates a hash and compares it with the hash received as parameter. If it finds a match, "__get_proc_address" returns an address for the specific function.
Lets take a closer look inside "__get_proc_address" to figure out how it navigates through the loaded DLL. Figure 12 shows a snip of the code for this function.
|Figure 12 - Code for "__get_proc_address" function.|
The preamble of the function fetches the address for the PE header by accessing offset 0x3C in the DLL base address. Next step it fetches the relative virtual address (RVA) for the export directory at offset 0x78 of the PE header. From the Export Directory structure this function fetches the following fields: NumberOfNames (offset 0x18), AddressOfNames (offset 0x20) and AddressOfNameOrdinals (offset 0x24). References for all these structures can be found in the Corkami Windows Executable format overview.
After loading information about the exports the code will iterates through the list of function names and calculates a 4 bytes hash by calling the "__hashing" function (same algorithm described in Figure 08). If the output of the "__hashing" function matches the hardcoded hash then the ordinal for that function is saved and the address related to that ordinal is returned.
Figure 13 shows a code in Python that reproduces the above mentioned comparison algorithm using hardcoded hashes extracted from memory (Figure 10) and all function names exported by ntdll.dll.
|Figure 13 - Reversing outcome for code responsible by resolving "ntdll.dll" hardcoded hashes|
This code produces the following output:
Finally, these addresses are used for filling the call table which will be referenced by indirect calls in the main payload. It is possible to confirm that what was described so far is true by observing the function addresses written in the data segment after executing the second section of "__load_libraries". Figure 14 shows the part of the call table filled so far with the expected "ntdll.dll" calls.
|Figure 14 - Segment of Smokeloader's dynamically generated call table|
The last segment of the "__load_libraries" function de-obfuscates the remain libraries names and load them by using the same resources used for loading "ntdll" and "kernel32". The libraries loaded by Smokeloader are: "user32", "advapi32", "urlmon", "ole32", "winhttp", "ws2_32", "dnsapi" and "shell32".
Now that the whole process of creating the call table used by the indirect calls is described, next step will get into fixing the memory containing the main payload by using IDA Python.
--- End of part II ---
When the main payload of Smokeloader is imported into IDApro it is possible to see code containing indirect calls which uses a base address stored in a register plus an offset. Figure 15 presents a snip of the main payload containing such indirect calls.
|Figure 15 - Indirect calls calling functions pointed at the dynamic generated calls table.|
This characteristic makes the processing of reversing this code harder since the interaction with other resources in the Operating System is not clear as all external calls is not explicit. The goal in this part of the article is to patch these calls for pointing to addresses we going to map and label (using IDA Python). The code below implements the change we want.
This code performs the following actions into our IDB:
- Reads a memory dump of the data segment of an executing Smokeloader binary (line 106);
- Creates a DATA segment mapped into 0x00000000 (line 107).
- Loads the dumped data segment from the running sample into this new segment (line 35);
- Imports API names extracted from x32dbg to specific positions in the new data segment (line 112);
- Patches all indirect call instructions (opcode 55 9X) to direct call instructions (line 51).
Figure 16 shows the code listed after executing the script above. As we can see, all indirect calls were translated to direct calls to a labeled table located in the freshly created data segment starting at address 0x00000000.
|Figure 16 - Patched code with calls containing meaningful labels.|
Just heads up for preventing people against messing up research IDBs: for obvious reasons (different instruction sets) the script above can not be used for patching 64 bits Smokeloader IDBs but it could be easily adapted to do the same task.
--- End of part III ---
That's all folks!
The ideas described in this article can be extended and used to analyse any other malware families dynamically importing libraries and using indirect calls. Another thing cool for experimenting in future would be write a script which loads DLLs and extracts labels statically by using the reversed "__hashing" function and native functionalities in IDA for mapping DLLs in the process address space.