Wednesday 18 December 2019

Inline Loop Detection for Compressing API Call Traces


I have been working on a solution for compressing files containing traces of API calls coming out of a sandbox (Cuckoo Sandbox [1]). Each file holds events generated by a monitor module (DLL) which is injected into processes and hooks a pre-configured set of API calls. The main goal of this log is to describe the interaction between processes and the Operating System (behavioural data).

This file can grow in size (memory demanding), complexity (CPU demanding) and noise (difficult to read) depending on two main factors: 
  • how long an analysed process is executed; and 
  • presence of specific control flow structures (e.g. loops).
This article proposes two methods for detecting looping patterns. The outcome of these methods is then used for compressing the sandbox log files described above without changing their structure.

Detecting loops by observing their execution is a complex task, since a well-defined pattern does not always emerge. This article uses the terms "repeated element" and "loop body" to identify the content inside a loop statement. This element can be composed of a single call (repeated multiple times), multiple sequential calls or nested calls; its execution output can also be affected by control flow elements such as conditionals.

Repeated elements can also be affected by Pseudo-Random Number Generators. In this case, the final log presents more complex patterns. Algorithms described in this article do not consider this less common scenario and target detection of more general looping patterns.

.::[ Context and Example

Python-like pseudo-code snippets are presented throughout this article to illustrate the proposed techniques. The following snippet describes a loop executing calls to functions labelled from 1 ("call_001") to 5 ("call_005"):
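A hedged reconstruction of that snippet (the `call` logging helper and function names are illustrative, consistent with the description that follows):

```python
log = []  # the behavioural log: one entry per executed call

def call(n):
    # stand-in for a hooked API call being recorded by the monitor
    log.append("c_%03d" % n)

call(1)                     # executed once, before the loop
for i in range(5):          # loop body repeated 5 times
    call(2)
    call(3)
    if i % 2 == 0:          # "004" runs only when "i" is even
        call(4)
call(5)                     # executed once, after the loop

for pos, entry in enumerate(log, 1):
    print("line[%d] %s" % (pos, entry))
```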

The body of the loop statement (line 2) executes functions "002" and "003" 5 times, and function "004" every time "i" is even. The behavioural log produced by executing the script above would be:

Each entry in this log contains a position (line number "line[x]") and an identifier ("c_00[x]") for a call. A new entry is added to this file every time a function is executed. Taking a closer look, it is possible to observe a repetition pattern located between lines 2 and 14. In this specific example, these lines describe the execution of a loop and could be compressed to the following form:


The compressed output above has the following properties:
  • log size was reduced by a factor of 3 (66% compression rate);
  • preserved the original order that functions are executed; and 
  • it is easier to read;
The next sections describe ideas for reaching the compression exemplified above. The first uses Suffix Trees [2] and the Non-Overlapping Longest Repeated Substring [3]; the second uses N-Grams [4].

.::[ Method 001: Non-Overlapping Longest Repeated Substring

The first idea consists of using the Non-Overlapping Longest Repeated Substring algorithm [5][6], a well-established algorithm with O(n^2) complexity. This algorithm can be applied to our specific problem to find repeated subsets of entries, which represent repeated calls (and a possible loop). Each entry in the log file must be mapped to a checksum, and this value should represent the setup of an API call (position, parameters, function name etc).
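As an illustration of this mapping, a raw entry could be reduced to a primitive token as below (which fields feed the checksum, and the use of a truncated SHA-1, are assumptions, not something the article specifies):

```python
import hashlib

def entry_token(entry):
    # derive a stable "primitive" token from the fields that define a call
    # setup; volatile fields (e.g. timestamps) are deliberately left out
    key = "%s|%s|%s" % (entry["api"], entry["caller"],
                        ",".join(str(a) for a in entry["args"]))
    return hashlib.sha1(key.encode()).hexdigest()[:8]

# two identical call setups map to the same token
t1 = entry_token({"api": "Sleep", "caller": "0x2f1844", "args": [1000]})
t2 = entry_token({"api": "Sleep", "caller": "0x2f1844", "args": [1000]})
```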

The following pseudo-code shows a simplified representation of this algorithm.

The first part of this code (lines 1 to 4) creates a vector named "data" containing checksums calculated from each log entry (we call these checksums "primitive" tokens). After that, the algorithm runs in a loop until "calculate_lrs()" returns an empty value. The "replace_lrs()" function replaces all instances of the LRS (Longest Repeated Substring) vector within "data" with a token. This token and its respective LRS vector are saved in a translation map (a dictionary named "map"). The next step is reducing repeated tokens and counting how many times each repeats (line 14). At the end, tokens are translated back to vectors by using the translation map (line 15).

The second "while", listed in line 10, detects repetition patterns inside LRS vectors (this creates recursion). The next code snippet shows a use case for this scenario.

In this case the vector "[2, 2]" would be replaced with a unique token in the original data vector and the main loop moves to its next iteration using this transformed vector. 

The final output for the input vector above would be "[(0, 2), (1, 2), (2, 8)]". The first element of each pair is a primitive token and the second is the number of repetitions.

The code below is a Python implementation of the logic described above. 
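A minimal self-contained sketch of the logic described above might look like this (function names and the token scheme are assumptions, not the author's original listing; primitives are assumed to be integer checksums so the "T%d" token strings cannot collide with them):

```python
from itertools import groupby

def find_lrs(data):
    """Longest repeated non-overlapping substring via dynamic programming, O(n^2)."""
    n, best = len(data), []
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            # dp[i-1][j-1] < (j - i) forbids the two occurrences from overlapping
            if data[i - 1] == data[j - 1] and dp[i - 1][j - 1] < (j - i):
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > len(best):
                    best = data[i - dp[i][j]:i]
    return best

def replace_lrs(data, lrs, token):
    """Replace every non-overlapping occurrence of `lrs` in `data` with `token`."""
    out, i, k = [], 0, len(lrs)
    while i < len(data):
        if data[i:i + k] == lrs:
            out.append(token)
            i += k
        else:
            out.append(data[i])
            i += 1
    return out

def expand(token, mapping):
    """Translate a token back into the primitive tokens it stands for."""
    out = []
    for t in mapping.get(token, [token]):
        out.extend(expand(t, mapping) if t in mapping else [t])
    return out

def compress(entries):
    data, mapping = list(entries), {}
    while True:
        lrs = find_lrs(data)
        if len(lrs) < 2:          # single-token runs are handled by the RLE below
            break
        token = "T%d" % len(mapping)
        mapping[token] = lrs
        data = replace_lrs(data, lrs, token)
    result = []
    for tok, grp in groupby(data):   # run-length encode adjacent repeats
        count = len(list(grp))
        prim = expand(tok, mapping)
        if len(set(prim)) == 1:      # a uniform body collapses to (primitive, reps)
            result.append((prim[0], count * len(prim)))
        else:
            result.append((prim, count))
    return result

print(compress([0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]))  # → [(0, 2), (1, 2), (2, 8)]
```

One design difference worth flagging: the nested repetition inside the LRS ([2, 2] within [2, 2, 2, 2]) is resolved here at translation time rather than by an explicit inner "while", but the end result for the example vector is the same.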

The output to the code above is:

Log entries (line 1) were compressed (line 2) and the information of repetition was appended. This algorithm could be applied in a real scenario for compressing behavioural logs in a very efficient way in terms of compression but not really in terms of computational resources usage. During our experiments with real world behavioural logs, we could detect two critical drawbacks of this approach:
  • memory consumption - as all log entries have to be loaded in memory in order to run the "find_repeated_substring()" function; and
  • processing time - the O(n^2) complexity is not prohibitive by itself, but the algorithm is called many times, which impacts the overall efficiency of the script.
This method can be used in scenarios with many small/medium size log files where maximum compression is a hard requirement.

These limitations pushed this research to experiment with other approaches. Since the log files are relatively big and loops usually execute small, concise blocks of operations, full coverage (of all loop-body sizes) is not really necessary.

.::[ Method 002: N-Gram

An N-gram is a contiguous sequence of "n" items from a given sample [4]. This concept can be used for representing repeated patterns and can be part of a solution to the problem investigated in this article.

Since human-written loop bodies are limited in size, the solution proposed in this section can be bounded by a parameter. This parameter is named "Compressor Level" in this article and holds a positive integer representing the maximum size of a loop body (repeated pattern). It determines how many pattern detectors the algorithm will create, with sizes varying from 1 to "Compressor Level".

The following pseudo-code shows the idea described in this section:

This algorithm first pre-initialises a list of compressors (this component is explained in more detail later) using the above-mentioned "Compressor Level" parameter. The "for" loop (line 2) creates multiple compressors containing N-grams with "N" ranging from 1 to a positive integer passed as a parameter ("i" on line 3).

The loop listed from lines 5 to 10 connects compressors. Elements are added to the smallest compressor, and its output is fed to the next one in line. This cycle repeats until the algorithm reaches the last compressor.

A buffer in a compressor acts like a queue (first in, first out) of size equal to "level" times 2. The buffer is filled incrementally, and when it is full an analysis is executed. This analysis simply checks whether the first half of the buffer is the same as the second. If it is, the buffer is compressed, eliminating the repetition.

The following code is a Python implementation of the method described in this section.
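A minimal sketch of such a compressor chain (class and function names, and the (call, count) output format, are assumptions rather than the author's original code):

```python
class Compressor:
    """One pattern detector: spots immediate repetition of a body of
    `level` calls using a FIFO buffer of 2 * level entries."""

    def __init__(self, level, sink):
        self.level = level
        self.sink = sink        # next (bigger) compressor, or the output list
        self.buffer = []        # FIFO of (call, count) pairs

    def emit(self, item):
        if isinstance(self.sink, list):
            self.sink.append(item)
        else:
            self.sink.push(item)

    def push(self, item):
        self.buffer.append(item)
        if len(self.buffer) < 2 * self.level:
            return
        first, second = self.buffer[:self.level], self.buffer[self.level:]
        if [c for c, _ in first] == [c for c, _ in second]:
            # both halves hold the same call sequence: fold the counts
            self.buffer = [(c, n + m) for (c, n), (_, m) in zip(first, second)]
        else:
            self.emit(self.buffer.pop(0))   # oldest entry moves downstream

    def flush(self):
        for item in self.buffer:
            self.emit(item)
        self.buffer = []

def compress(entries, level):
    output, stages = [], []
    sink = output
    for n in range(level, 0, -1):   # build the chain: level N -> ... -> 1
        sink = Compressor(n, sink)
        stages.append(sink)
    head = stages[-1]               # the level-1 compressor receives raw entries
    for entry in entries:
        head.push((entry, 1))
    for stage in reversed(stages):  # flush smallest level first
        stage.flush()
    return output

# a body of two calls repeated 3 times folds into per-call repetition counters
print(compress(["a", "b", "a", "b", "a", "b"], 2))   # → [('a', 3), ('b', 3)]
```

Note how entries stream through the chain one at a time: memory usage is bounded by the buffer sizes, never by the size of the whole log.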

The next snippet shows the output of the code above.

This approach is able to detect loop patterns and compress them dynamically (inline) without using big chunks of memory. This algorithm could be implemented in a solution for compressing behavioural logs and optimising resource usage in a sandbox.

.::[ Conclusions

The algorithm described in this article can be used for identifying repetition patterns in any kind of data series. We successfully used it for compressing behavioural log files coming out from a Sandbox used for analysing malware. The main benefits of using the algorithm described in this paper were:
  • Logs got compressed;
  • Defeated malware strategies to DoS this kind of analysis;
  • Behavioural report became easier to read;
  • Since the amount of data reduced, searches got faster;
  • Saved memory and disk space; and
  • It became easier to visually identify loops;
The next step is to move this code inside Cuckoo Sandbox [1] and check how effectively it compresses "tasks" structures (this is how Cuckoo calls each sandboxing session) before importing them into Mongo, just by compressing the behavioural logs (of all processes).

Here are some screenshots from our preliminary tests using Cuckoo Sandbox CAPE [7]. We developed a small PE sample [8][9] which runs a loop going through a few calls, just for testing our approach:

Figure 01: Number of pages of API call entries BEFORE compression
Figure 02: Number of pages of API call entries AFTER compression

Figure 03: Behavioural analysis report section BEFORE compression

Figure 04: Behavioural analysis report section AFTER compression
Figure 05: Compressed BSON sizes in disk
The final BSONs (compressed JSON) got 3 times smaller after compression. This impacts the load on Mongo, the size of the final JSON report and the speed of searches using the UI.

Thursday 31 October 2019

Dynamic Imports and Working Around Indirect Calls - Smokeloader Study Case

When reversing malware it is common to find an injected payload loading references to external resources (DLL functions). This happens for two main reasons:
  1. The hosting process does not have all the resources necessary for the execution of the injected payload; 
  2. It makes reverse engineering the malware trickier, since the dumped segment will have all calls pointing to a meaningless address table.
This article explains how to reverse this trick and restore API call name annotations in an IDA Pro database. A sample of Smokeloader is used to illustrate the ideas described in this post.

This article is divided in three main parts:
  1. Explaining the observed technique;
  2. How it works; and
  3. How to circumvent it in order to facilitate reversing. 
First of all, a shout-out to Sergei Frankoff from Open Analysis for his amazing video tutorial on this same topic, which inspired me to write about my analyses. Regards also to Mark Lim, who wrote a very interesting article about labelling indirect calls in 2018. His article uses structures instead of patching the code (which is also a good approach), but I think it lacks important details, which I will try to cover here.

Examples presented in this article were extracted from the following Smokeloader sample:

Filename:   p0n36i2d.exe
MD5:         a8cc396b6f5568e94f28ca3381c7f9df
SHA1:       12948e36584e1677e80f78b8cc5c20576024c13f
SHA256:   17b548f9c8077f8ba66b70d55c383f87f92676520e2749850e555abb4d5f80a5
Size:           215.5 KB (220672 bytes)
Type:          PE32 executable for MS Windows (GUI) Intel 80386 32-bit

Explaining what is going on in the first stage (packer/crypter) is out of scope; this article focuses on characteristics found in the final payload. This sample injects the main payload in "explorer.exe" as it is possible to observe in this AnyRun sandbox analysis.

Figure 01 shows how the code looks immediately after the execution control passes to the injected code.

Figure 01 - Smokeloader's final payload.
Three points were marked in this code snippet (1, 2 and 3). The first point (1) is the call to the main function (located at 0x002F1853). This function expects to receive an address through the ECX register. This address points to a data segment where all temporary structures will be stored.


The third point (3) is an indirect call to an address stored in register ESI plus offset 0xEAE. The debugger was not able to resolve this address, since the memory segment pointed to by ESI is not set at this point of the execution (Instruction Pointer at 0x002F1844). This pattern is usually an indicator that the code will dynamically resolve and import external resources into a specific address table (in this case stored in what we called the "data segment"). This is an interesting technique because the table can be moved around by changing the address stored in ESI, as long as offsets are preserved. In this code ESI is set to 0x002E0000, which is the address of a read-and-write memory segment created during the first stage. Figure 02 shows the region pointed to by offset 0xEAE, which is empty at this point of the execution.

Figure 02 - Address pointed by the indirect call.
The second point (2) marks a function call immediately before the indirect call (3). This is a strong indicator that the code for creating the address table must be somewhere inside this function. The address located at 0x002E0EAE will be filled with a pointer to the expected API function. Figure 03 shows this same memory region after the "__load_libraries" function is executed.

Figure 03 - Address pointed by the indirect call is filled after the "__load_libraries" function is called
x32dbg has a memory dump visualisation mode called "Address" which lists the function pointed to by each address loaded in the call table we just described.

Figure 04 - Resolved address in call table
Figure 04 shows that the position referenced by the indirect call listed in point (3) points to the function "Sleep" inside "kernel32.dll". Basically, this call table is an array of unsigned integers (4 bytes) where each position holds the address of an API call.

The "__load_libraries" function is responsible for creating this call table, so the focus of this article now moves to understanding how it works.

--- End of part I ---

Figure 05 - "__load_libraries" zoomed out CFG representation.
Figure 05 shows an overview of the "__load_libraries" function created by IDA. This function is quite large and performs a few connected steps which we need to go through in order to fully understand its behaviour. It can be divided into three main sections:
  1. Code responsible for finding the base addresses for core libraries;
  2. Code responsible for loading addresses for calls within code libraries;
  3. Code responsible for loading the other libraries necessary for executing the malware.
Figure 06 presents the first part of the "__load_libraries" function. In its preamble the code navigates through the TEB (Thread Environment Block) and loads 4 bytes from offset 0x30 into register EAX. These bytes hold the address of the PEB (Process Environment Block). The next step is to get the location of the "PEB_LDR_DATA" structure, found at offset 0xC of the PEB. This structure contains a linked list with information about all modules (DLLs) loaded by the process.

Figure 06 - first section of the "__load_libraries" function.
The code accesses offset 0xC in the "PEB_LDR_DATA" structure, which contains the head element of the loaded-modules list in the order they were loaded by the process. Each element in this linked list is a combination of the "_LDR_DATA_TABLE_ENTRY" and "_LIST_ENTRY" structures. This structure has an entry pointing to the base name of the module at offset 0x30. Figure 07 summarises this whole "structure maze" used to fetch loaded module names (excuse my paint brush skills :D).

Figure 07 - Path through the process internal structures to get loaded DLL names and base addresses
The main loop, beginning at "loc_2F189F" (Figure 06), goes through all modules loaded by the "explorer.exe" process. This algorithm fetches each module name and calculates a hash out of it. The second, smaller loop located at "loc_2F18AB" (Figure 06) is the part of the code responsible for calculating this hash. Figure 08 shows the reversed code for this hashing algorithm.

Figure 08 - Reversed hashing algorithm used in the first part of the analysed code
Moving forward: after calculating a hash, the algorithm XORs it with a hardcoded value (0x25A56A90) and compares the result with two hardcoded hashes: 0x4C5DACBC (kernel32.dll) and 0x7FA40424 (ntdll.dll). The base address of each DLL is stored in a global variable, at [ESI+0x1036] and [ESI+0x103A] respectively.

Bonus: these hardcoded hashes can be used for detecting this specific version of Smokeloader

Summarising, this first part of the code is responsible for finding the base addresses of two core MS Windows libraries ("ntdll.dll" and "kernel32.dll"). These addresses will be used for fetching the resources necessary for loading all other libraries required by the malware.

Figure 09 shows the second section of "__load_libraries". The figure shows the code with some function names already figured out, in order to make it more didactic.

Figure 09 - second section of the "__load_libraries" function.
The first two basic blocks check whether the function was able to find the "ntdll.dll" and "kernel32.dll" base addresses. If these modules are available, the "__load_procs_from_module" function is invoked to fill the call table. This function receives 4 parameters and does not follow the standard C calling convention: two parameters are passed through the stack and the other two through registers (ECX and EDX). The function expects a DLL base address in EDX, the data segment in ECX, the address of a list of unsigned ints (API call hashes) and a destination address (where the call addresses will be stored); the last two are pushed onto the stack.

Figure 10 shows the hardcoded hashes passed as a parameter to the "__load_procs_from_module" function. This list determines which procedures will be loaded into the call table.

Figure 10 - Array of hashes of "ntdll.dll" function names
Next step is to take a look inside "__load_procs_from_module" function. Figure 11 shows the code for this function. Parameters and functions were named to facilitate the understanding of this code.

Figure 11 - Code for "__load_procs_from_modules" function
This function iterates over a list of 4-byte hashes received as a parameter. Each element is XORed with a hardcoded value (0x25A56A90) and passed to the "__get_proc_address" function together with the base address of a library. That function iterates over all procedure names exported by the DLL, calculates a hash of each and compares it with the hash received as a parameter. If it finds a match, "__get_proc_address" returns the address of the matching function.

Let's take a closer look inside "__get_proc_address" to figure out how it navigates through the loaded DLL. Figure 12 shows a snippet of the code for this function.

Figure 12 - Code for "__get_proc_address" function.
The preamble of the function fetches the address of the PE header by reading offset 0x3C from the DLL base address. Next, it fetches the relative virtual address (RVA) of the Export Directory at offset 0x78 of the PE header. From the Export Directory structure the function fetches the following fields: NumberOfNames (offset 0x18), AddressOfNames (offset 0x20) and AddressOfNameOrdinals (offset 0x24). References for all these structures can be found in the Corkami Windows Executable format overview.
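The same walk can be reproduced over a dumped, mapped image with nothing but struct. This is a sketch, not the malware's code: it assumes RVAs can be used directly as offsets from the base (true for a mapped image), and it also reads AddressOfFunctions (offset 0x1C, not quoted above) in order to return the resolved address, as "__get_proc_address" ultimately does:

```python
import struct

def u32(buf, off):
    return struct.unpack_from("<I", buf, off)[0]

def u16(buf, off):
    return struct.unpack_from("<H", buf, off)[0]

def exported_names(image):
    """Map export names to function RVAs, following the offsets described above
    (image = bytes of a mapped DLL, base address at offset 0)."""
    pe = u32(image, 0x3C)              # e_lfanew: start of the PE header
    exp = u32(image, pe + 0x78)        # Export Directory RVA
    count = u32(image, exp + 0x18)     # NumberOfNames
    funcs = u32(image, exp + 0x1C)     # AddressOfFunctions
    names = u32(image, exp + 0x20)     # AddressOfNames
    ordinals = u32(image, exp + 0x24)  # AddressOfNameOrdinals
    out = {}
    for i in range(count):
        name_rva = u32(image, names + 4 * i)
        name = image[name_rva:image.index(b"\x00", name_rva)].decode()
        out[name] = u32(image, funcs + 4 * u16(image, ordinals + 2 * i))
    return out
```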

After loading information about the exports, the code iterates through the list of function names and calculates a 4-byte hash of each by calling the "__hashing" function (the same algorithm described in Figure 08). If the output of "__hashing" matches the hardcoded hash, the ordinal for that function is looked up and the address related to that ordinal is returned.

Figure 13 shows Python code that reproduces the above-mentioned comparison algorithm, using hardcoded hashes extracted from memory (Figure 10) and all function names exported by ntdll.dll.

Figure 13 - Reversing outcome for the code responsible for resolving "ntdll.dll" hardcoded hashes
This code produces the following output:


Finally, these addresses are used to fill the call table which will be referenced by indirect calls in the main payload. It is possible to confirm what was described so far by observing the function addresses written to the data segment after executing the second section of "__load_libraries". Figure 14 shows the part of the call table filled so far with the expected "ntdll.dll" calls.
Figure 14 - Segment of Smokeloader's dynamically generated call table
The last segment of the "__load_libraries" function de-obfuscates the remaining library names and loads them by using the same resources used for loading "ntdll" and "kernel32". The libraries loaded by Smokeloader are: "user32", "advapi32", "urlmon", "ole32", "winhttp", "ws2_32", "dnsapi" and "shell32".

Now that the whole process of creating the call table used by the indirect calls has been described, the next step is fixing the memory containing the main payload by using IDA Python.

--- End of part II ---

When the main payload of Smokeloader is imported into IDA Pro, it is possible to see code containing indirect calls which use a base address stored in a register plus an offset. Figure 15 presents a snippet of the main payload containing such indirect calls.

Figure 15 - Indirect calls calling functions pointed at the dynamic generated calls table.
This characteristic makes reversing this code harder, since the interaction with other resources in the Operating System is not clear: external calls are not explicit. The goal in this part of the article is to patch these calls to point to addresses we are going to map and label (using IDA Python). The code below implements the change we want.




This code performs the following actions into our IDB:
  1. Reads a memory dump of the data segment of an executing Smokeloader binary (line 106);
  2. Creates a DATA segment mapped into 0x00000000 (line 107).
  3. Loads the dumped data segment from the running sample into this new segment (line 35);
  4. Imports API names extracted from x32dbg to specific positions in the new data segment (line 112); 
  5. Patches all indirect call instructions (opcode FF 9X) into direct call instructions (line 51).
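The core transform (step 5) can be sketched outside IDA as a plain byte rewrite. This is an illustrative reconstruction based on the description above, not the original script (which additionally creates the segment and applies labels through the IDA API); it handles only the ESI-based form seen in this sample:

```python
import struct

def patch_indirect_calls(code, seg_base=0):
    """Rewrite every `call dword ptr [esi+disp32]` (FF 96 <disp32>, 6 bytes)
    into a direct near call (E8 <rel32>, 5 bytes) plus a NOP pad. Since the
    dumped call table is remapped at VA 0x00000000, the displacement itself
    becomes the target address."""
    out = bytearray(code)
    i = 0
    while i < len(out) - 5:
        if out[i] == 0xFF and out[i + 1] == 0x96:
            target = struct.unpack_from("<I", out, i + 2)[0]
            # E8 encodes the target relative to the next instruction
            rel = (target - (seg_base + i + 5)) & 0xFFFFFFFF
            out[i] = 0xE8
            struct.pack_into("<I", out, i + 1, rel)
            out[i + 5] = 0x90              # NOP pad for the 6th byte
            i += 6
        else:
            i += 1
    return bytes(out)
```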
Figure 16 shows the code listing after executing the script above. As we can see, all indirect calls were translated into direct calls to a labelled table located in the freshly created data segment starting at address 0x00000000.

Figure 16 - Patched code with calls containing meaningful labels.
Just a heads-up to prevent people from messing up their research IDBs: for obvious reasons (different instruction sets) the script above cannot be used for patching 64-bit Smokeloader IDBs, but it could easily be adapted to do the same task.

--- End of part III ---

That's all folks! 

The ideas described in this article can be extended and used to analyse any other malware family that dynamically imports libraries and uses indirect calls. Another cool experiment for the future would be writing a script which loads DLLs and extracts labels statically, by using the reversed "__hashing" function and IDA's native functionality for mapping DLLs into the process address space.

Monday 5 August 2019

Smokeloader's Hardcoded Domains - Sneaky Third Party Vendor or Cheap Buyer?

Smokeloader is a small modular bot first seen in 2011 [1], mainly used as a dropper for other malware families. Although mainly used for delivering a second stage, Smokeloader implements several malicious capabilities through its modules, such as keylogging, process monitoring, DDoS, DNS redirection and form grabbing. These modules are often used for profiling and accessing infected machines before deploying the final malware, increasing the effectiveness of campaigns.

- So here comes the main story -

Last week I saw a tweet [2] with an image of a server hosting a few quite large executables (~1.2 MB) claimed to be Smokeloader samples. These binaries were accessible through an open directory.

Figure 01: Open Directory exposing modified Smokeloader samples.

During the 30th and 31st of July these files were changed a few times. Here are the hashes found and analysed during the time of this research:




What caught my attention was that the controllers contacted by these samples were different from the ones extracted by using public knowledge of this malware family [3].

Figure 02: Network capture showing HTTP connection to hardcoded controller.

Although the extractor returns "hxxp://185.35.137[.]147/mlp/" as the controller URL, the sample tries to connect to "hxxp://jnanny2[.]pw/br/". This means that something modified the way controller URLs are stored, so we decided to reverse one of these samples and check what was going on.

Although all these samples present the same anomalous behaviour, we picked "joibr.exe" for experimenting (from now on called the "target sample").

The target sample is based on the 2018 version of Smokeloader, since it sends 63 bytes of data to the controller and open-source configuration extraction code for this specific version is able to retrieve the configuration correctly.

In order to compare what was changed, a 2018 Smokeloader sample [4] behaving as expected was also reversed (called the "normal" sample from now on in this article). This sample tries to connect to "hxxp://ymad[.]ug/", which is exactly the address observed in the config.

The following code shows the first part of the URL decryption routine for our normal sample:

Figure 03: First part of C2 URL decryption routine of a normal 2018 Smokeloader sample

Summarising, this code iterates through an array of pointers whose index is stored in a variable named "C2_index". Each pointer in this array points to a blob of data containing an encrypted URL which will be used as the controller. This same code could be found in the target sample.

Figure 04: First part of C2 URL decryption routine of a modified 2018 Smokeloader sample

As we can see, both routines are identical with one single difference: the label given to the address in the jump instruction at the bottom of the last basic block. If we keep reversing, the normal sample jumps to the second part of the decryption code, which decrypts the data passed as a parameter through ECX and returns a plain-text URL.

Figure 05: Second part of C2 URL decryption routine of a normal 2018 Smokeloader sample

This code basically allocates a buffer, decrypts the URL and returns a pointer to it.

Now comes the interesting part - for the target sample we got the following code as second part of the decryption routine.

Figure 06: Hardcoded C2 address of a modified 2018 Smokeloader version.

As we can see, this routine has the "real" controller URL hardcoded together with the code. We can also note some evidence that this code is somewhat handcrafted (e.g. the "$+5" trick to get the address of the string).

Another really interesting characteristic is that this code is EXACTLY the same size as the original decryption function (82 bytes). This is strong evidence that the original code has been patched.

This patch makes sense as a way to bypass the business model of this family, as the Smokeloader developer also monetises by selling re-builds. This means that every time a buyer wants to change C2 URLs they have to contact the developer and pay for a new build (30 USD). From the developer's own price list (translated from Russian):
Pricing:
- BOT - 400$ 
- STEALER - 100$
- FORM GRABBER - 300$
- PASS SNIFFER - 100$
- FAKE DNS - 100$
- DDOS - 200$
- HIDDEN TV - 150$
- KEYLOGGER- 100$
- PROCMON - 50$
- FILE SEARCH - 50$
- EMAIL GRABBER - 100$
- BOT REBUILD - 30$ (rebuilds are made if the main address gets blocked, or "as a spare" if I am away for more than a week)
- updates: minor fixes - free, everything else is negotiated separately

So this leaves us with three hypotheses:
  • Someone got tired of paying the 30 USD;
  • Someone decided to cut off the delay of contacting the author (in order to update C2 addresses within samples); or
  • Someone is planning to create a new builder and re-sell the malware for a cheaper price. :D
Now that we have more or less a picture of what is going on, I decided to retro-hunt for samples presenting similar characteristics. I found a few Smokeloader 2017 samples doing the same trick, dating back to February 2018. This modified version has been around for some time and has co-existed with the official one. Figure 07 shows the same technique being used in a 2017 sample.

Figure 07: C2 address of a modified 2017 Smokeloader version.

An encrypted payload containing "hxxp://dogewareservice[.]ru/" is still present in the sample, but the decryption function was patched to return a hardcoded controller address ("hxxp://haxmall[.]in/s/").

- Conclusions -

This move does not disrupt the business, as this side version of Smokeloader is frozen at the 2018 release while Smokeloader's official developer continues improving the malware. According to our retro hunting, both modified and official versions of Smokeloader have co-existed for some time without any issue. Finally, there is a possibility that new vendors will show up selling modified versions of this malware for a fraction of its original price and take some market share.


IOCs:
[01] hxxp://grandsinarsari[.]com/av/ (hosting Smokeloader samples)
[02] hxxp://www.confezionamento-viti[.]it/img/1/ (hosting Smokeloader samples)
[03] 6632e26a6970d8269a9d36594c07bc87d266d898bc7f99198ed081d9ff183b3f  (Smokeloader)
[04] 1cea3a87500fdc933aa64cc45373034b1da6921644640106cd56483aa758b3bf  (Smokeloader)
[05] 501675053b0d4ba02477900a5b28829e2f009f68dffc044d51ba3d2c61c042b9  (Smokeloader)
[06] 8d40fb9983050026c86277d9443d384e1a1aee92582cc2e61415fa6a3a0b4c99  (Smokeloader)
[07] 065871459fa254daa362564b70ea4357bb197ef04cfee8de7426cfdf480e4a78 (Smokeloader)
[08] hxxp://185.35.137[.]147/mlp/ (Smokeloader C2)
[09] hxxp://jnanny2[.]pw/br/ (Smokeloader C2)
[10] hxxp://dogewareservice[.]ru/ (Smokeloader C2)
[11] hxxp://haxmall[.]in/s/ (Smokeloader C2)