Monday 22 June 2020

Comparative analysis between Bindiff and Diaphora - Patched Smokeloader Study Case

This article presents a comparative study case of diffing binaries using two technologies: Bindiff [1] and Diaphora [2]. We approached this topic in a Malware Analysis perspective by analyzing a (guess which malware family?) Smokeloader (! :D) campaign.

In August 2019, I spotted this campaign using patched samples of Smokeloader 2018 samples. This specific actor patched binaries to add new controllers URLs without needing to pay extra money (Smokeloader's seller charges extra-fee for C2 URL updates). This campaign was described in more detail in this previous article [3].

More details about the original samples [4][5] analyzed in this article can be found in the following tables:

 Filename:                        smokeloader_2018_unpatched.bin 
 33792 Bytes
 File type: PE32 executable (GUI) Intel 80386, for Microsoft Windows
 md5: 76d9c9d7a779005f6caeaa72dbdde445
 sha1: 34efc6312c7bff374563b1e429e2e29b5da119c2
 sha256: b61991e6b19229de40323d7e15e1b710a9e7f5fafe5d0ebdfc08918e373967d3  

 Filename:                        smokeloader_2018_patched.bin
 Size: 1202732 Bytes
 File type: PE32 executable (GUI) Intel 80386, for Microsoft Windows 
 md5: 7ba7a0d8d3e09be16291d5e7f37dcadb
 sha1: 933d532332c9d3c2e41f8871768e0b1c08aaed0c
 sha256: 6632e26a6970d8269a9d36594c07bc87d266d898bc7f99198ed081d9ff183b3f  

The following tables hold details about the unpacked code dumped from "explorer.exe" used in this article [6][7]. 

 Filename:                          explorer.exe.7e8e32c0.0x02ee0000-0x02ef3fff.dmp 
 Size: 81920 Bytes
 File type: data
 md5: 711c02bec678b9dace09bed151d4cedd
 sha1: 84d6b468fed7dd7a40a1eeba8bdc025e05538f3c
 sha256: 865c18d1dd13eaa77fabf2e03610e8eb405e2baa39bf68906d856af946e5ffe1  

 Filename:                       explorer.exe.7e8df030.0x00be0000-0x00bf3fff_patched.dmp
 Size: 81920 Bytes (yes, same size)
 File type: data
 md5: d8f23c399f8de9490e808d71d00763ef
 sha1: e1daad6cb696966c5ced8b7d6a2425ff249bf227
 sha256: 421482d292700639c27025db06a858aafee24d89737410571faf40d8dcb53288  

Summarizing the main changes implemented by this patch are: 
  • wipes out the code for decrypting C2 URLs;
  • replaces it with NOPs and hardcoded C2 URL string; and
  • preserves the original size of decryption function to not disrupt offsets;
Figure 01 and 02 presents the graph of the original code. Figure 01 is the code used for indexing a table of encrypted C2 URLs payloads. Figure 02 lists the code used for decrypting the C2 URLs. This function is called in other parts of the code (not only by the function shown in Figure 01) - this is why they are not merged in one function. We labeled functions (e.g. "__decrypt_C2_url" and "__decrypt_c2_algorithm") in this assembly code to make it easier to read. 

Figure 01 - Original code used for decrypting C2 URLs.

Figure 02 - Original encryption scheme used for decrypting C2 URLs.

Figure 03 and 04 shows the same functions on the patched version of Smokeloader. We can notice that the first function is the same as the unpatched version but the second function was replaced. This second function returns the address for the hardcoded URL in ECX. More details about how it works can be found in this article [3].

Figure 03 - Patched code used for decrypting C2 URLs.

Figure 04 - Patched code returning decrypted C2 URL string.

The next sections describe the output of Diaphora and Bindiff when diffing the samples above. 

.::[ Diffing using Diaphora ]

Diaphora is an Open Source binary diffing tool that uses SQLite as an intermediate representation for storing code and characteristics of reversed binaries [8]. It implements many diffing heuristics (strategies) directly on top of this database. The main advantage of this approach is that Diaphora is technology agnostic - this means that it does not depend on any reversing framework such as Ghidra, IDApro, or Binary Ninja

It can even compare reversing databases built up using one specific tool with projects using another tool. This characteristic facilitates collaboration among researchers. Another big advantage is that by using SQLite for describing its heuristics it makes the processing of adding a new heuristic as "simple" as writing a new SQL so more people can contribute to the growth of the project and more experimental heuristics can be quickly prototyped and verified. 

In the experiment described in this section, we used Diaphora version 2.0.2 (released in October 2019) and IDApro 7.5. Diaphora works as an IDApro Python script and can be easily executed by "File -> Script File" or "alt + F7". Figure 05 shows its main interface. 

Figure 05 - Diaphora main Dialog Interface

This interface is a little bit confusing at a first sight especially for users that did not go through the documentation before trying to use it. It expects the user to input both SQLite databases to be compared, boundaries (for the working database), and set up some checkboxes with options. 

First, we used Diaphora over the reference database (unpatched Smokeloader) for extracting characteristics and generating our reference Diaphora SQLite database. For doing this we just need to open the reference database on IDA, open Diaphora, and fill the "Export IDA database to SQLite" input field. Diaphora will export its SQLite database to the same base directory of the IDA working files. 

After executing Diaphora on our patched Smokeloader using out saved labeled database of Smokeloader 2018 as a reference in Diaphora we get four new tabs in IDA
  • Best Matches - common functions to both databases. The ones with 100% match ratio;
  • Partial Matches - all functions that are not Best matches and not unmatched; 
  • Unmatched in Primary - all functions in the first database that are not present in the second;
  • Unmatched in Secondary - all functions in the second database that are not present in the first;
Figure 06 shows the content of the "Best Matches" tab in our example. 

Figure 06 - Diaphora Best Matches tab

Each row shows function labels in primary and second databases, matching ratio (which goes from 0 to 1), amount of basic blocks in each database and information about which heuristic was used to compare both functions. Diaphora provides features to importing features (such as commends and function labels) from the reference database to the target database. Usually, you will want to copy all annotations from one database to another in this "Best Matches" tab. By checking this tab we can see that few core functions are kept intact in the patched version and we are facing two very similar applications. 

Figure 07 presents the "Partial Matches" functions.

Figure 07 - Partial Matches functions and our "__decrypt_C2_url" function right there with 0.978 matching ratio.

Diaphora provides very fine level diffing and most of these functions are basically unmatching constants. These matches are marked in yellow in the graph diff view. Figure 08 and 09 shows the "__set_file_attributes" function and its match in the primary database. We can clearly see that they actually are the same function. 

Figure 08 - Diffing "__set_file_attributes" function assembly view.

Figure 09 - Diffing "__set_file_attributes" function using graph view.

Figure 10 shows the relevant patch we are interested in the "__decrypt_C2_url" function. As we can see both functions are basically identical until the jump instruction. The function jumps to what I labeled as "__decrypt_C2_algorithm" and is used as a function in other places around the code. 

Figure 10 - "__decrypt_C2_url" graph diff pointed in the "Partial Matches" tab

What disappointed me a little bit was that Diaphora did not include the rest of the code of "__decrypt_C2_algorithm" in this function.  This move bumped up the matching ratio and this was misleading when prioritizing what to manually analyze. The "__decrypt_C2_algorithm" function shows up in the "Unmatched in Secondary" tab. This is a good thing as functions in this tab should be the priority in this kind of analysis. In our example, we got 18 functions (out of 52) to analyze marked in the "Unmatched in Secondary" tab. Figure 11 shows this tab and Figure 12 shows the graph view of this function.

Figure 11 - "Unmatched in Secondary" tab and patched "__decrypted_C2_algorithm" function

Figure 12 - Patched version of "__decrypt_C2_algorithm" function

The nicest thing about using Diaphora is that all this analysis could be easily shared by sharing compressed SQLite databases around. So that is it for the Diaphora side. 

.:: [ Diffing using BinDiff ]

BinDiff is an executable-comparison tool created by Zynamics [9] (called SABRE in earlier days) in 2007. Zynamics was acquired by Google in 2011 and BinDiff became freeware in 2016 [10][11]. BinDiff is a plugin of IDAPro and works directly over IDBs (IDA pro Database). Because of this design, BinDiff requires users to have an IDA license (#notcool). BinDiff 6.0 supports also Ghidra using this extra plugin called BinExport [12] which implements something similar to Diaphora's design.

OALabs released a very didactic video tutorial on BinDiff [13]. This video also teaches how to install BinDiff. We used BinDiff 6.0 and IDAPro 7.5 in this experiment. BinDiff adds a new option to the "File" menu in IDA and to compare two executable is as easy as opening a new IDB on top of the current one. Figure 13 shows the BinDiff option in IDA

Figure 13 - BinDiff option in IDAPro

After loading an annotated IDA database using BinDiff, it will add 4 new tabs:
  • Primary Unmatched - functions in the primary database that did not match any function in the secondary database;
  • Secondary Unmatched - functions in the secondary database that did not match any function in the primary database;
  • Statistics - general similarity information about both executables;
  • Matched Functions - all functions with matches and their respective similarity index. 
The two first tabs hold the same information as their correlated tabs in Diaphora. Statistics tab provides high-level information about the matching process. Information in this tab can be used for quick knowing if the binary is a variation of the reference database. Figure 13 shows the data presented in the Statistics tab after loading our reference Smokeloader database against the patched one using BinDiff

Figure 14 - Statistics tab and confidence and similarity indexes

BinDiff calculated a similarity index of 95% (with 99% confidence). This means that we are likely dealing with two versions of the same software. The other metrics are more about counters and general information about both databases. 

Figure 14 shows the Matched Functions tab and the similarity index for each match. It is also possible to see the heuristic used for each match. 

Figure 15 - Matched Functions tab

BinDiff managed to match 51 out of 52 functions. This is a very good result. Besides that Bindiff managed to find out the two most affected functions by the patch: (i) "__decrypt_C2_algorithm" and (ii) "__decrypt_C2_url". It is possible to zoom in and visualize changes in graphs of the function matched. 

Figure 16 shows changes in function "__decrypt_C2_url" (73% of similarity and 97% of confidence).

Figure 16 - Changes in function "__decrypt_C2_url"

As we can see, BinDiff hits the bullseye and detects exactly the changes without splitting the function into two parts. The way BinDiff organizes its graphs makes changes really easy to visualize. 

Bindiff also matches the "__decrypt_C2_algorithm" function with some other function located at "0x00BE19A5" using the "loop count" heuristic but the similarity index is only 19% and confidence is 27%. This means that Bindiff matched the "__decrypted_C2_algorithm" function, which was wiped out with the patch, with some other random function. I don't even consider this a false-positive as results clearly state that this match has low confidence. It is useful to get a spotlight pointed to this function - this for sure is a good place to start an analysis in this specific scenario. 

.:: [ Conclusions ]

For the specific malware analysis problem discussed in this article, we definitely got better results using BinDiff than Diaphora. I also feel that all fine program analysis used in heuristics applied by BinDiff makes a big difference in the final result. 

In terms of general design (not taking in consideration heuristics), I think Diaphora has more advantages than Bindiff, because of its intermediate representation using an open specification format as SQLite and SQL for modeling heuristics. Diaphora is also Open Source so there is a task force sustained by a community in order to improve heuristics and this is the way to go IMHO

Diaphora also takes boundaries as parameters and this can be very useful in case of analyzing big databases especially when analyzing patches for vulnerability development purposes. In line with that, this article focuses on malware analysis but these same technologies could also be used for analyzing vulnerabilities and its patches - ammunition to another blog post. 

.:: [ References ] 

[8] (BSides Joxean)
[13] (BinDiff OALabs)

Wednesday 10 June 2020

Unpacking Smokeloader and Reconstructing PE Programatically using LIEF

This article holds notes on my experience unpacking a Smokeloader 2020 sample. The unpacked payload is further used for composing a valid PE file. The outcome is a PE32 executable containing clean code ready for reversing. 

First things first, here is the sample used in this research:

 Size308.17 KB (315568 bytes) 
 TypePE32 executable for MS Windows (GUI) Intel 80386 32-bit 
 First seen                     2020-03-06 21:45:11 
 md5 c067e0a2d7fc6092bb77abc7f7156b60
 sha256   25959cfe4619126ab554d3111b875218f1dbfadd79eed1ed0f6a8c1900fa36e0 

You can find it in VirusTotal [1]. 

This sample does regular already documented Smokeloader checks before unpacking the main payload, such as: 
  • checks if the process is running in the context of a debugger using "kernel32.isDebuggerPresent" function [2];
  • makes a copy of ntdll.dll, loads it and uses it instead. This technique helps to evade some sandboxes and has been described already in this article here [3];
  • looks for specific patterns in registry keys to check if the sample is running under a virtualised environment.
It also performs a small profiling of the hosting machine in order to decide which payload to inject. Smokeloader has specific code for both main architectures x86 and x64. In this article, we gonna unpack the x86 payload of the above mentioned sample. 

Smokeloader has been using various techniques to inject its final payload into the user file management process "explorer.exe". The sample analysed uses RtlCreateUserThread approach in order to copy the final payload to the targeted process. This injection method is better described in this Endgame/Elastic article [4].

So our game plan is: 
  1. pause execution before the unpacked payload is executed by "explorer.exe";
  2. transplant this code to a dummy PE shell;
  3. fix PE header values and section boundaries;
  4. patching Smokeloader code preamble;
  5. test unpacked Smokeloader PE;
  6. how to do all this programatically using LIEF [5].

Smokeloader 1st stage decompresses its payload using ntdll.RtlDecompressBuffer [6] after few anti-analysis checks described above. It does not call this function from the initially loaded ntdll.dll but from a copy of it loaded afterwards. So breakpoints should be set after the binary loads the copy of ntdll.dll. Figure 01 presents a screenshot of this specific code IDA.

Figure 01: Smokeloader first stage decompression code

This code allocates a buffer with 0x2D000 bytes using ZwAllocateVirtualMemory which stores the main decompressed payload [7]. This code is still transformed before being injected into "explorer.exe". The following steps are performed during injection:
  • fetches explorer.exe PID by calling GetShellWindow and GetWindowThreadProcessId;
  • sections and maps are created in the current and remote processes using ZwCreateSection and ZwMapViewOfSection;
  • main payload is copied to local section and reflected in the remote section;
  • data section is created in the remote process for holding parameters and dynamically created Import Table;
  • A new thread is created in the remote process by invoking RtlCreateUserThread.
So, at this point, you could ask me: what is the relevance of describing all these call names to the final goal of this article? the answer is: so you can reproduce exactly what I'm describing in here. :D

Next step is setting up a break point in RtlCreateUserThread (from the copy of ntdll.dll) and dump the final payload. It is also necessary to take note of few important addresses: (i) entry point of the thread created in the remote processes and (ii) base addresses for injected code in virtual process.

Figure 02 shows a screenshot of IDApro showing the call to RtlCreateUserThread (where we should pause the execution).

Figure 02: Call to RtlCreateUserThread after injecting code into remote process

By stoping the execution on this call we can collect all data we need to move on to the next step:

 Base address code                    0x02060000
 Base address data  0x00B60000 
 Data payload  a01751fb6eb3f19d9b010818bbecc23c  [8]
 Code payload  2547231b4ae82ea9e395fb0c8a308982 [9] 
 Code entry point  0x02061734 

Code payload is the final unpacked Smokeloader code adjusted to run on Virtual Address with base equal to 0x02060000. The created thread receives the base address of the data segment (0x00B60000) as parameter ("StartParameter" parameter of "RtlCreateUserThread" call). 

Smokeloader loads all resources necessary to its execution dynamically. This article here [10] describes how Smokeloader builds up its import table and how to prepare patch an IDB to overcome this technique before starting reversing. So this main payload does not need any specific setup of imports

In this section we will use 010 Hex Editor [11] to transplant a PE header from a random executable. 010 Hex Editor has a PE format template [12]. Although any other valid PE32 binary could be used in this experiment, we used a PE header extracted from an executable listed in this Sotirov's blog post [13][14]. 

Smokeloader code payload has 0x1000 null bytes at offset zero, so we copied the first 0x1000 bytes containing the PE header from tinype.exe to this region. 

Coincidently, .text section will be already pointing to the beginning of the our payload at offset 0x1000. Probably the malware author just wiped out the PE header before creating the payload and left the null bytes there. Next step is to paste all 20480 bytes (0x5000) of our data payload in the offset 0x4400.

Figure 03 shows the new layout of our binary containing the PE header in the beginning followed by 0x3400 bytes of code payload (at offset 0x1000) and finally 0x5000 bytes of data from our data payload (at offset 0x4400).

Figure 03: Initial layout of new handcrafted PE binary

It is time to adjust our implanted PE header manually using 010 Hex Editor. At this point, all fields in this header are still set up according "tinype.exe". From now on, we gonna use this schematic as reference to PE header internal structures [15].

The first adjustment is to change the number of sections to 2 for holding code and data. This field is located in the "COFFHeader.NumberOfSections". Now our binary will list only 2 sections named ".text" and ".rdata" we can rename this second one to ".data" by changing "SectionHeaders[1].Name". 

Next step is make sure that both sections have correct permissions. "SectionHeaders[0].Characteristics" (".text") should have CODE, EXECUTE and READ flags active and "SectionHeaders[1].Characteristics" (".data") should have the INITIALIZED_DATA, READ and WRITE flags active. Still on SectionHeaders, we can setup the bounds and virtual addresses. "SectionHeaders[0].SizeOfRawData" should be set to 0x3400 (13312 Bytes), "SectionHeaders[0].PointerToRawData" should be set to "0x1000" and finally  "SectionHeaders[0].VirtualAddress" should be set to 0x1000. For "SectionHeaders[1]" (".data") we gonna set "SizeOfRawData" to 0x5000, "PointerToRawData" to 0x4400 and "VirtualAddress" to 0x5000. These changes means that these sections will be mapped in memory in base_address (defined in the OptionalHeader) shifted by each section Virtual Addresses offset. There is an Union inside these section headers called "PhysicalAddress" and "VirtualSize", these fields should hold the same value as "SizeofRawData".

Figure 04 shows a diagram of a Section header. Each section in the binary has an instance of this header associated to it. 

Figure 04: PE Section Header

Now we need to adjust few fields in the Optional Header. In this header we will need to change the following fields: 

 ImageBase Virtual Address where binary will be mapped 0x02060000
 SizeOfCode  size of .text section  0x3400 bytes
 SizeOfInitializedData  size of .text and .data sections together  0x8400 bytes 
 AddressOfEntryPoint offset of the entry point code 0x1734
 BaseOfCode .text section Virtual Address 0x1000
 BaseOfData .data section Virtual Address 0x5000
 SectionAlignment Virtual Addresses have to be multiple of this value  0x1000
 FileAlignment file offsets have to be multiple of this value 0x200
 SizeOfImage total size of binary headers + sections 0x9400
 Checksum  PE file checksum - use PE Explorer [16] or Hiew [17] to calculate this value   --------

"ImageBase" has to match the base of the code section we dumped from "explorer.exe" (0x02060000). As we will not export or import anything all "Data Directories" inside the Optional Header can be zeroed as well.

Summarising the whole process: 
  1. Transplanting PE header from a dummy PE;
  2. Fix sections sizes, boundaries, permissions and Virtual Addresses in SectionHeaders;
  3. Setup section contents;
  4. Setup Optional Header fields;
  5. Setup PE checksum;
Here is the version of our binary after following up all steps described above [18]. This binary is a valid executable and we can load it in any debugger or disassembly but we still need to change one last thing before call it a valid unpacked Smokeloader sample. 

Figure 05 shows our reconstructed PE loaded in IDApro paused on the correct Entry Point. 

Figure 05 - Reconstructed PE paused on Entry Point

We can notice that the entry point function receives an argument (0x02061737) and loads it into ECX and then calls another function located in 0x02061743 which is just below the current function. This argument is the address of the data segment. This data segment will be used for various tasks during Smokeloader execution including holding the dynamically created import table. 

If we execute this file without a valid value in ECX it will break when the main payload tries to write into the data segment (invalid address in ECX). Figure 06 shows what happens when we try to execute our binary the way it is right now. 

Figure 06 - Access Violation exception when executing unpatched reconstructed binary

The plan now is to patch this binary to load the correct address of the data segment into ECX before calling "sub_2061743". Since both functions are consecutive and the function on top does not do much - we gonna replace all 15 bytes of this function (0x02061743 - 0x02061734). Figure 07 shows the new patched code. 

Figure 07 - Code after patching

In this new code the entry point remains the same. We can see that we loaded ECX with the address of the data segment by using the push and pop instructions and then we filled the rest of the remaining bytes with NOP (0x90). We can see the beginning of the second function at the same address as before (0x02061743). Of course there are many ways to achieve this same result but this was the simplest approach we could think of. 

The final step is to update the PE checksum field inside the Optional Header again and we will have a fully unpacked Smokeloader sample. Here are the last version of our reconstructed binary [19]:

 File name                         new_pe_patched.bin
 File type  PE32 executable for MS Windows (console) Intel 80386 32-bit
 Size  37888 bytes
 md5  f401109ae24aaf47dce75266ffc049f8
 sha1  49e7ed68b9569e0e987da71b3c678974d8ed7c81
 sha256  cd42f017913034d527d90a84feebcde015e714baa03714c83f80608555e52386 

For testing our branding new reconstructed PE we ran it into Cuckoo sandbox [20] to analyse its behaviour [21]. As we can see in figure 08 and 09, the binary was executed properly and we got it checking in and contacting its controllers. 

Figure 08 - Reconstructed sample connecting back to Controllers

Figure 09 - List of API calls intercepted by Cuckoo

As we can see we got the sample connecting back to three controller URLs and many pages of intercepted API calls in the behavioural analysis. This is an indication that our unpacked and reconstructed Smokeloader sample is functional. 

So far we described how to unpack Smokeloader main payload and how to manually reconstruct a valid PE file out of this. Now we will automate what we did manually by transplanting a PE header from a dummy binary ("tinype.exe"). The Python library used in this experiment is LIEF [5]. We extended this example called "Create a PE from scratch" they have in their official documentation [22].

The following code does exactly the same as the manual approach but using LIEF.

The final binary generated by LIEF is:

 File name                         unpacked_smokeloader.exe
 File type  PE32 executable for MS Windows (console) Intel 80386 32-bit
 Size  34816 bytes
 md5  a0aebc61bc89208be0585eca4d1ed00c
 sha1  ea2f3c914dec6bb36832abc313b3fce826cdecb0
 sha256  0247de510507792fcbf425fab9dbbc2f067c25dc7e4e80a958d1ebfb0505f6e6 

We uploaded it for testing to Virustotal [23] and CAPE sandbox [24] and is a valid unpacked Smokeloader PE32 executable.


Wednesday 18 December 2019

Inline Loop Detection for Compressing API Call Traces

I have been working on a solution for compressing files containing trace of API calls coming out of a sandbox (Cuckoo sandbox [1]). This file holds events generated by a monitor module (DLL) which is injected into processes and hooks a pre-configured set of API calls. The main goal of this log is to describe interaction between processes and Operating System (Behavioural data).

This file can grow in size (memory demanding), complexity (CPU demanding) and noise (difficult to read) depending on two main factors: 
  • how long an analysed process is executed; and 
  • presence of specific control flow structures (e.g. loops).
This article proposes two methods for detecting looping patterns. The outcome of these methods is further used for compressing the above described sandbox log files without changing its structure.

Detecting loops by observing its execution is a complex task since not always a well defined pattern emerges. This article uses the terms "repeated element" and "loop body" to identify a content inside a loop statement. This element can be composed by a single call (repeated multiple times), multiple sequential calls and nested calls; its execution output can also be affected by control flow elements such as conditionals.

Repeated elements can also be affected by Pseudo-Random Number Generators. In this case, the final log presents more complex patterns. Algorithms described in this article do not consider this less common scenario and target detection of more general looping patterns.

.::[ Context and Example

Python-like pseudo-code snips are presented along this article for illustrating the proposed techniques. The following code snip describes a loop executing calls to functions labeled from 1 ("call_001") to 5 ("call_005"):

The body of the loop statement (line 2) executes functions "002" and "0035 times and function "004" every time "i" is even. The behavioural log produced by executing the script above would be:

Each entry in this log contains a position (line number "line[x]") and an identifier ("c_00[x]") to a call. A new entry is added to this file every time a function is executed. Taking a closer look, it is possible to observe a repetition pattern located between lines 2 and 14. In this specific example, these lines describes the execution of a loop and could be compressed to the following form:

The compressed output above has the following properties:
  • log size was reduced in 3 times (66% compression rate);
  • preserved the original order that functions are executed; and 
  • it is easier to read;
Next sessions describe ideas to reach the compression exemplified above. The first idea uses Suffix Trees [2] and Non-Overlapping Longest Repeated Subset [3] and the second uses N-Gram [4].

.::[ Method 001: Non-Overlapping Longest Repeated Substring

The first idea consists in using Non-Overlapping Longest Repeated Substring [5][6] which is a well stablished algorithm with complexity equal to O(n^2). This algorithm can be applied to our specific problem for finding repeated subsets of entries which represent repeated calls (and possible loop). Each entry in the log file must be mapped to a checksum and this value should represent a setup to a API call (position, parameters, function name etc).

The following pseudo-code shows a simplified representation of this algorithm.

The first part of this code (lines 1 to 4) creates a vector named "data" containing checksums calculated from each log entry (we call these checksums "primitive" tokens). After that, the algorithm runs in a loop until "calculate_lrs()" returns an empty value. "replace_lrs()" function replaces the all instances of LRS (Longest Repeated Substring) vector within "data" with a token. This token and its respective LRS vector are saved in a translation map (dictionary named "map"). Next step is reducing repeated tokens and count how many times it repeats (line 14). At the end, tokens are translated back to vectors by using the translation map (line 15). 

The second "while" listed in line 10 detects repetition patterns inside LRS vectors (this creates recursion). Next code snip shows a use case for this scenario.

In this case the vector "[2, 2]" would be replaced with a unique token in the original data vector and the main loop moves to its next iteration using this transformed vector. 

The final output for the input vector above would be "[(0, 2), (1, 2), (2, 8)]". The first element of each set is a primitive token and the second element is the amount of repetitions.

The code below is a Python implementation of the logic described above. 

The output to the code above is:

Log entries (line 1) were compressed (line 2) and the information of repetition was appended. This algorithm could be applied in a real scenario for compressing behavioural logs in a very efficient way in terms of compression but not really in terms of computational resources usage. During our experiments with real world behavioural logs, we could detect two critical drawbacks of this approach:
  • memory consumption - as all log entries have to be loaded in memory in order to run the "find_repeated_substring()" function; and
  • processing time - the complexity is not that bad O(n^2) but the algorithm is called many times and this impacts the overall efficiency of the script.
This method can be used in a scenario with many small/medium size log files when maximum efficiency compression is a hard-requirement. 

These limitations pushed this research to experiment other approaches. Since the log files are relatively big and loops usually perform small and concise block of operations so the full coverage (to all sizes of looping body) is not really necessary. 

.::[ Method 002: N-Gram

N-Gram is a contiguous sequence of "n" items from a given sample [4]. This concept can be used for representing repeated patterns and could be used in a solution to the problem investigated in this article.

Since human-generated loop contents are limited in size, the solution proposed in this section can be bounded according to a parameter. This parameter is named "Compressor Level" in this article and holds a value (positive Integer) representing the maximum size to a loop body (repeated pattern). This determines how many pattern detectors the algorithm will create with size varying from 1 to "Compressor Level".

The following pseudo-code shows the idea described in this section:

This algorithm first pre-initialise a list of compressors (this component will be better explained later) using the above mentioned "Compressor Level" parameter. This "for" loop (line 2) creates multiple compressors containing N-Grams with "N" ranging from 1 to a positive integer passed as parameter ("i" on line 3). 

The loop listed from lines 5 to 10 connects Compressors. Elements are added to the smallest Compressor and its output is added to the next one in line with it. This cycle repeats until the algorithm reaches the last Compressor.

A buffer in a compressor acts like a queue (first in first out) of size equal to "level" times 2. This buffer is incremented and if buffer is full an analysis is executed. This analysis simply checks if the first half of buffer is the same as the second. If the answer to this check is positive then the buffer is compressed eliminating the repetition. 

The following code is a Python implementation of the method described in this section.

The next snip shows the output of the code above.

This approach is able to detect loop patterns and compresses them dynamically (inline) without using big chunks of memory. This algorithm could be implemented in a solution for compressing behavioural logs and optimising resources usage in a Sandbox.

.::[ Conclusions

The algorithm described in this article can be used for identifying repetition patterns in any kind of data series. We successfully used it for compressing behavioural log files coming out from a Sandbox used for analysing malware. The main benefits of using the algorithm described in this paper were:
  • Logs got compressed;
  • Defeated malware strategies to DOS this kind of analysis;
  • Behavioural report became easier to read;
  • Since the amount of data reduced, searches got faster;
  • Saved memory and disk space; and
  • It became easier to visually identify loops;
Next step is to move this code inside Cuckoo Sandbox [1] and check how effective it compresses "tasks" structures (this is how Cuckoo calls each sandboxing session) before importing them into Mongo just by compressing behavioural logs (of all processes).

Here goes some screenshots from our preliminary tests using Cuckoo Sandbox CAPE [7]. We developed this small PE sample [8][9] which runs a loop going through few calls just for testing our approach:

Figure 01: Number of pages of API call entries BEFORE compression
Figure 02: Number of pages ofAPI call entries AFTER compression

Figure 03: Behavioural analysis report section BEFORE compression

Figure 04: Behavioural analysis report section AFTER compression
Figure 05: Compressed BSON sizes in disk
The final BSONs (compressed JSON) got 3 times smaller after compressed this impacts the load of Mongo, size of final JSON report and speed of searches using the UI.

Thursday 31 October 2019

Dynamic Imports and Working Around Indirect Calls - Smokeloader Study Case

When reversing malware it is common to find an injected payload loading references to external resources (DLL functions). This happens for two main reasons:
  1. The hosting process does not have all resources necessary to the execution of the injected payload; 
  2. Making reversing engineering the malware trickier since the dumped segment will have all calls pointing to a meaningless address table. 
This article explains how to revert this trick and get back API call names annotations in an IDApro database. A sample of Smokeloader was used for illustrating the ideas described in this post.

This article is divided in three main parts:
  1. Explaining the observed technique;
  2. How it works; and
  3. How to circumventing it in order to facilitate reversing. 
First of all, shout out to Sergei Frankoff from Open Analysis for this amazing video tutorial on this same topic which inspired me to write about my analyses. Regards also to Mark Lim who also wrote a very interesting article about labelling indirect calls in 2018. His article uses structures instead of patching the code (which is also a good approach) but I think it lacks important details and I will try to cover these points in here.

Examples presented in this article were extracted from the following Smokeloader sample:

Filename:   p0n36i2d.exe
MD5:         a8cc396b6f5568e94f28ca3381c7f9df
SHA1:       12948e36584e1677e80f78b8cc5c20576024c13f
SHA256:   17b548f9c8077f8ba66b70d55c383f87f92676520e2749850e555abb4d5f80a5
Size:           215.5 KB (220672 bytes)
Type:          PE32 executable for MS Windows (GUI) Intel 80386 32-bit

Explaining what is going on in the first stage (packer/crypter) is out of scope; this article focuses on characteristics found in the final payload. This sample injects the main payload in "explorer.exe" as it is possible to observe in this AnyRun sandbox analysis.

Figure 01 shows how the code looks immediately after the execution control passes to the injected code.

Figure 01 - Smokeloader's final payload.
Three points were marked in this code snip (1, 2 and 3).  The first point (1) is the call to the main function (located at 0x002F1853). This function expects to receive an address through ECX register. This address points to a data segment where all temporary structures will be stored. 

The third point (3) is an indirect call to an address stored in register ESI plus offset 0xEAE. The debugger was not able to resolve this address since the "memory segment" pointed by ESI is not set at this point of the execution (Instruction Pointer pointing to 0x002F1844). This pattern usually is an indicator that this code will dynamically resolve and import external resources to a specific address table (in this case stored in what we called "data segment"). This is an interesting technique because this table can be moved around by changing the address stored in ESI as long as offsets are preserved. In this code ESI is set to "0x002E0000" which is the address of a read-and-write memory segment created during the first stage. Figure 02 shows the region pointed by the offset 0xEAE which is empty at this point of the execution.

Figure 02 - Address pointed by the indirect call.
The second point (2) marks a function call immediately before the indirect call (3).  This is a strong indicator that the code for creating the address table must be somewhere inside this function. The address located in "002E0EAE" will be filled with pointers to the expected API function. Figure 3 shows this same memory region after the "__load_libraries" function is executed.

Figure 03 - Address pointed by the indirect call is filled after the "__load_libraries" function is called
x32dbg has a memory dump visualisation mode called "Address" which will list every function pointed to each address loaded in the call table we just described. 

Figure 04 - Resolved address in call table
Figure 04 shows that the position pointed by the indirected call listed in point (3) points to function "sleep" inside "kernel32.dll". Basically this call table is an Array of unsigned integers (4 bytes) containing an address pointing to an API call in each position. 

The "__load_library" function is responsible for creating this "call table" so the focus of this article will move to understand how it works.

--- End of part I ---

Figure 05 - "__load_libraries" zoomed out CFG representation.
Figure 05 shows an overview of the "__load_library" function created by IDA. This function is quite large and performs few connected steps which we need to go through in order to fully understanding its behaviour. This function can be divided in three main sections:
  1. Code responsible for finding the base addresses for core libraries;
  2. Code responsible for loading addresses for calls within code libraries;
  3. The last section is responsible for loading other libraries necessary for executing the malware.
Figure 06 presents the first part of the "__load_libraries" function. In its preamble the code navigates through the TEB (Thread Environment Block) and loads 4 bytes from offset 0x30 into register EAX. This address contains the address of the PEB (Process Environment Block). Next step is to get the location for the "PEB_LDR_DATA" structure which is located in offset 0xC. This structure contains a linked list containing information about all modules (DLLs) loaded by a specific process. 

Figure 06 - first section of the "__load_libraries" function.
The code accesses the offset 0xC in the "PEB_LDR_DATA" structure which contains the head element for the loaded modules in the order they were loaded by the process. Each element in this linked list is a combination of "_LDR_DATA_TABLE_ENTRY" and "_LIST_ENTRY" structures. This structure has an entry to the base name of the module in the offset 0x30. Figure 07 summarises all this "structure maze" used in order to fetch loaded module names (excusez-moi for my paint brush skills :D).

Figure 07 - Path through the process internal structures to get loaded DLL names and base addresses
The main loop, beginning at "loc_2F189F" (Figure 06), goes through all modules loaded by the "explorer.exe" process. This algorithm fetches the module name and calculates a hash out of it. The second smaller looping located at "loc_2F18AB" (Figure 06) is the part of the code responsible for calculating this hash. Figure 08 shows the reversed code for this hashing algorithm. 

Figure 08 - Reversed hashing algorithm used in the first part of the analysed code
Moving forward, after calculating a hash the algorithm does a XOR operation with a hardcoded value 0x25A56A90 and this value is compared with two hardcoded hashes: 0x4C5DACBC (kernel32.dll) and 0x7FA40424 (ntdll.dll). The base addresses of each DLL are stored in two global variables located in the following addresses [ESI+0x1036] and [ESI+0x103A]

Bonus: these hardcoded hashes can be used for detecting this specific version of Smokeloader

Summarising, this first part of the code is responsible by finding the base address of two core libraries in MS Windows ("ntdll.dll" and "kernel32.dll"). These addresses will be used for fetching resources necessary for loading all other libraries required by the malware. 

Figure 09 shows the second section of "__load_libraries". This figure shows the code with some functions names already figured out in order to make it more didactic. 

Figure 09 - second section of the "__load_libraries" function.
The first two basic blocks checks if the function was able to find "ntdll.dll" and "kernel32.dll" base addresses. If these modules are available then the "__load_procs_from_module" function is invoked for filling the call table. This function receives 4 parameters and does not follow the standard C calling convention. Two parameters are passed through the stack and the other two through registers (ECX and EDX).  This function expects a DLL base address in EDX, the data segment in ECX, an address to a list of unsigned ints (api calls hashes) and a destination address (where the calls addresses will be stored). The last two parameters are pushed in the stack. 

Figure 10 shows the hardcoded hashes passed as parameter to "__load_procs_from_module" function. This list will be used to determine which procedures will be loaded in the call table. 

Figure 10 - Array of hashes of "ntdll.dll" function names
Next step is to take a look inside "__load_procs_from_module" function. Figure 11 shows the code for this function. Parameters and functions were named to facilitate the understanding of this code.

Figure 11 - Code for "__load_procs_from_modules" function
This function iterates over a list of 4 bytes hashes received as parameter. Each element is XORed with a hardcoded value (0x25A56A90) and passed to the function "__get_proc_address" together with a base address of a library.  This function iterates over all procedures names exported by a DLL, calculates a hash and compares it with the hash received as parameter. If it finds a match, "__get_proc_address" returns an address for the specific function. 

Lets take a closer look inside "__get_proc_address" to figure out how it navigates through the loaded DLL. Figure 12 shows a snip of the code for this function.

Figure 12 - Code for "__get_proc_address" function.
The preamble of the function fetches the address for the PE header by accessing offset 0x3C in the DLL base address. Next step it fetches the relative virtual address (RVA) for the export directory at offset 0x78 of the PE header. From the Export Directory structure this function fetches the following fields: NumberOfNames (offset 0x18), AddressOfNames (offset 0x20) and AddressOfNameOrdinals (offset 0x24). References for all these structures can be found in the Corkami Windows Executable format overview.

After loading information about the exports the code will iterates through the list of function names and calculates a 4 bytes hash by calling the "__hashing" function (same algorithm described in Figure 08). If the output of the "__hashing" function matches the hardcoded hash then the ordinal for that function is saved and the address related to that ordinal is returned.

Figure 13 shows a code in Python that reproduces the above mentioned comparison algorithm using hardcoded hashes extracted from memory (Figure 10) and all function names exported by ntdll.dll

Figure 13 - Reversing outcome for code responsible by resolving "ntdll.dll" hardcoded hashes
This code produces the following output:

Finally, these addresses are used for filling the call table which will be referenced by indirect calls in the main payload. It is possible to confirm that what was described so far is true by observing the function addresses written in the data segment after executing the second section of "__load_libraries". Figure 14 shows the part of the call table filled so far with the expected "ntdll.dll" calls.
Figure 14 - Segment of Smokeloader's dynamically generated call table
The last segment of the "__load_libraries" function de-obfuscates the remain libraries names and load them by using the same resources used for loading "ntdll" and "kernel32". The libraries loaded by Smokeloader are: "user32", "advapi32", "urlmon", "ole32", "winhttp", "ws2_32", "dnsapi" and "shell32".

Now that the whole process of creating the call table used by the indirect calls is described, next step will get into fixing the memory containing the main payload by using IDA Python.

--- End of part II ---

When the main payload of Smokeloader is imported into IDApro it is possible to see code containing indirect calls which uses a base address stored in a register plus an offset. Figure 15 presents a snip of the main payload containing such indirect calls.

Figure 15 - Indirect calls calling functions pointed at the dynamic generated calls table.
This characteristic makes the processing of reversing this code harder since the interaction with other resources in the Operating System is not clear as all external calls is not explicit. The goal in this part of the article is to patch these calls for pointing to addresses we going to map and label (using IDA Python).  The code below implements the change we want. 

This code performs the following actions into our IDB:
  1. Reads a memory dump of the data segment of an executing Smokeloader binary (line 106);
  2. Creates a DATA segment mapped into 0x00000000 (line 107).
  3. Loads the dumped data segment from the running sample into this new segment (line 35);
  4. Imports API names extracted from x32dbg to specific positions in the new data segment (line 112); 
  5. Patches all indirect call instructions (opcode 55 9X) to direct call instructions (line 51).
Figure 16 shows the code listed after executing the script above. As we can see, all indirect calls were translated to direct calls to a labeled table located in the freshly created data segment starting at address 0x00000000.

Figure 16 - Patched code with calls containing meaningful labels.
Just heads up for preventing people against messing up research IDBs: for obvious reasons (different instruction sets) the script above can not be used for patching 64 bits Smokeloader IDBs but it could be easily adapted to do the same task. 

--- End of part III ---

That's all folks! 

The ideas described in this article can be extended and used to analyse any other malware families dynamically importing libraries and using indirect calls. Another thing cool for experimenting in future would be write a script which loads DLLs and extracts labels statically by using the reversed "__hashing" function and native functionalities in IDA for mapping DLLs in the process address space.