[Tool] Messing Around with Gepetto

AI-Assisted Reverse Engineering in 2026

Recently, I discovered an IDA Pro plugin called Gepetto [1]. It connects IDA to LLMs and helps annotate disassembled code interactively, directly from the UI.

You can simply right-click on a decompiled function’s name and ask Gepetto to explain its behavior.

This article lists a few examples (using Anthropic’s Claude Sonnet 4.5 model) of annotations that Gepetto provided while co-reversing a sample that I had fully analyzed some time ago. I found it quite accurate and would like to share some thoughts.

Cryptography

Gepetto is quite straightforward and accurate at identifying crypto algorithms and variants of popular ones. This capability can save trained analysts time and level up beginner reversers (by training them in common patterns).

Custom Hashing

Gepetto accurately identified a custom hashing function and described its sub-operations, such as converting lowercase ASCII letters to uppercase and performing an accumulative XOR.

// ----- Comment generated by Gepetto -----
//
// This function computes a custom hash value from a null-terminated string. It
// iterates through each byte, performing a bitwise AND with 0xDF (which converts
// lowercase ASCII letters to uppercase, treating the input case-insensitively),
// then combines this with the accumulated hash using XOR, a left rotation by 8
// bits, and addition. The function returns a 32-bit hash value derived from the
// string's case-normalized characters.
//
// Proposed name:
// compute_case_insensitive_hash

__int64 __fastcall _hashing(__int64 a1, __int64 a2, __int64 a3, _BYTE *a4)
{
  char v4; // al
  _BYTE *v5; // r8
  unsigned int v6; // edx

  v4 = *a4;
  v5 = a4;
  v6 = 0;
  while ( v4 )
  {
    v6 = (v4 & 0xDF) + __ROL4__(v6 ^ v4 & 0xDF, 8);
    v4 = *++v5;
  }
  return v6;
}

For this case, I also asked Gepetto to generate Python code. This feature can be quite useful when writing configuration extractors and bot emulators. Gepetto was able to complete the task successfully.

def _hashing(input_string):
    """
    Computes a custom hash value from a string.

    This function iterates through each byte, performing a bitwise AND with 0xDF
    (which converts lowercase ASCII letters to uppercase, treating the input 
    case-insensitively), then combines this with the accumulated hash using XOR,
    a left rotation by 8 bits, and addition.

    Args:
        input_string: A string or bytes object to hash

    Returns:
        A 32-bit hash value (unsigned integer)
    """
    def rol32(value, shift):
        """Rotate left operation for 32-bit values"""
        value &= 0xFFFFFFFF  # Ensure 32-bit
        shift %= 32  # Normalize shift amount
        return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF

    # Convert string to bytes if necessary
    if isinstance(input_string, str):
        data = input_string.encode('latin-1')  # or 'utf-8' depending on context
    else:
        data = input_string

    hash_value = 0

    for byte in data:
        normalized = byte & 0xDF  # Convert lowercase to uppercase (for ASCII letters)
        hash_value = (normalized + rol32(hash_value ^ normalized, 8)) & 0xFFFFFFFF

    return hash_value
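
As a rough illustration of how such a port could feed a configuration extractor, the sketch below precomputes hashes for a list of candidate strings and resolves a hash back to a name. The candidate names, `custom_hash`, and `build_lookup` are hypothetical and do not come from the sample; the hash logic is a compact restatement of the routine above.

```python
def rol32(value, shift):
    """Rotate a 32-bit value left by `shift` bits."""
    value &= 0xFFFFFFFF
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF


def custom_hash(data: bytes) -> int:
    """Compact restatement of the routine above: mask, XOR, rotate, add."""
    h = 0
    for byte in data:
        c = byte & 0xDF  # clears bit 5, which uppercases ASCII letters
        h = (c + rol32(h ^ c, 8)) & 0xFFFFFFFF
    return h


def build_lookup(candidates):
    """Map hash -> name for a list of candidate strings (hypothetical wordlist)."""
    return {custom_hash(name.encode("latin-1")): name for name in candidates}


# Hypothetical candidates; a real wordlist might come from DLL export tables.
CANDIDATES = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc"]
LOOKUP = build_lookup(CANDIDATES)

# Case differences do not change the hash.
assert custom_hash(b"kernel32.dll") == custom_hash(b"KERNEL32.DLL")
# The mask also clears bit 5 of non-letters, so '{' and '[' collide as well.
assert custom_hash(b"{") == custom_hash(b"[")

# Resolve a hash back to a name, as one would for a constant found in a sample.
print(LOOKUP.get(custom_hash(b"virtualalloc")))
```

Note that the 0xDF mask is not a strict case fold: it changes any byte with bit 5 set, including digits and punctuation, so the hash treats some distinct non-letter strings as equal.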

RC4

RC4 is a popular algorithm among malware developers, and experienced reversers find this code straightforward to identify. Gepetto was able to describe its behavior accurately.

// ----- Comment generated by Gepetto -----
//
// This function implements the RC4 stream cipher algorithm. It first initializes a
// 256-byte state array with values 0-255, then performs the key-scheduling
// algorithm (KSA) using the key provided in parameter a3 of length a6 to permute
// the state array. Finally, it executes the pseudo-random generation algorithm
// (PRGA) to generate a keystream that is XORed with the input data pointed to by
// a4 for a5 bytes, performing in-place encryption or decryption. The function
// modifies the buffer at a4 as a side effect and returns the last generated
// keystream byte.
//
// Proposed name: rc4_crypt

char __fastcall sub_2DA5168(__int64 a1, __int64 a2, __int64 a3, _BYTE *a4, unsigned int a5, unsigned __int8 a6)
{
  __int64 v6; // r10
  __int64 v8; // rbx
  _BYTE *v11; // r11
  unsigned int i; // eax
  __int64 v13; // r14
  unsigned int v14; // r9d
  char *v15; // r11
  char v16; // r8
  unsigned int v17; // edx
  __int64 v18; // rax
  __int64 v19; // r9
  __int64 v20; // r11
  char v21; // dl
  _BYTE v23[256]; // [rsp+0h] [rbp-108h] BYREF

  LOBYTE(v6) = 0;
  v8 = a5;
  v11 = v23;
  for ( i = 0; i < 0x100; ++i )
    *v11++ = i;
  LOBYTE(v13) = 0;
  v14 = 0;
  v15 = v23;
  do
  {
    v16 = *v15;
    v17 = v14 % a6;
    ++v14;
    v13 = (unsigned __int8)(*v15 + v13 + *(_BYTE *)(v17 + a3));
    LOBYTE(v18) = v23[v13];
    *v15++ = v18;
    v23[v13] = v16;
  }
  while ( v14 < 0x100 );
  LOBYTE(v19) = 0;
  if ( (_DWORD)v8 )
  {
    v20 = v8;
    do
    {
      v19 = (unsigned __int8)(v19 + 1);
      v21 = v23[v19];
      v6 = (unsigned __int8)(v6 + v21);
      v23[v19] = v23[v6];
      v23[v6] = v21;
      v18 = (unsigned __int8)(v21 + v23[v19]);
      *a4++ ^= v23[v18];
      --v20;
    }
    while ( v20 );
  }
  return v18;
}
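
For comparison, here is a minimal Python sketch of the same KSA and PRGA steps, the kind of helper one might drop into a configuration extractor. The name `rc4_crypt` follows Gepetto's proposal; the signature is my own assumption, and unlike the decompiled routine it returns a new buffer instead of modifying the input in place.

```python
def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data` with RC4 (the operation is symmetric)."""
    # The decompiled routine takes the key length as an unsigned 8-bit value,
    # so this sketch restricts keys to 1..255 bytes.
    if not 1 <= len(key) <= 255:
        raise ValueError("key must be between 1 and 255 bytes long")

    # State initialization: S[i] = i for i in 0..255
    state = list(range(256))

    # Key-scheduling algorithm (KSA)
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation algorithm (PRGA), XORed with the data
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)


# Applying the function twice with the same key restores the input.
ciphertext = rc4_crypt(b"Key", b"Plaintext")
assert rc4_crypt(b"Key", ciphertext) == b"Plaintext"
print(ciphertext.hex().upper())
```

Because encryption and decryption are the same operation, a single helper like this covers both directions when emulating a bot's traffic or decoding an embedded configuration.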

MD5 hashing

This case was interesting because, besides pinpointing the general behavior of the function, Gepetto recognized that the sample imports APIs dynamically. It then inferred the purpose of each call, apparently from the layout of the arguments on the stack (this is my guess at the LLM's reasoning). Finally, it recognized the post-processing step that converts the resulting hash into a hexadecimal string.

// ----- Comment generated by Gepetto -----
//
// This function computes an MD5 hash of data and formats it as a hexadecimal
// string. It uses a function pointer table (accessed via offsets from a4) to call
// various cryptographic operations: initializing a hash context, computing the
// hash value, retrieving the 16-byte digest, and then formatting each byte as a
// two-character uppercase hex string (e.g., "A3F2..."). The formatted result is
// written to a buffer, and the function cleans up by closing the hash context
// before returning.
//
// Proposed name: compute_and_format_md5_hash

__int64 __fastcall sub_2DA5314(
        __int64 a1,
        __int64 a2,
        __int64 a3,
        __int64 a4,
        __int64 a5,
        __int64 a6,
        int a7,
        __int64 a8,
        __int64 a9,
        __int64 a10)
{
  __int64 v10; // rbx
  __int64 v12; // rsi
  __int64 result; // rax
  int v15; // ebx
  __int64 v16; // rdx
  unsigned int v17; // eax
  int v18; // eax
  __int64 v19; // rdx
  __int64 v20; // [rsp+30h] [rbp-20h]
  __int64 v21; // [rsp+38h] [rbp-18h] BYREF
  _BYTE v22[16]; // [rsp+40h] [rbp-10h] BYREF

  a8 = v10;
  a9 = a2;
  v12 = a5;
  result = (*(__int64 (__fastcall **)(__int64, __int64, _QWORD, __int64 *, _QWORD, __int64))(a4 + 4334))(
             a4,
             a5,
             0,
             &v21,
             0,
             1);
  v15 = 0;
  if ( (_DWORD)result )
  {
    (*(void (__fastcall **)(__int64, __int64, __int64, __int64, _QWORD, _QWORD))(a4 + 4350))(a4, v12, 32771, v21, 0, 0);
    v17 = (*(__int64 (__fastcall **)(__int64, __int64, __int64, __int64))(a4 + 3806))(a4, v12, v16, a3);
    (*(void (__fastcall **)(__int64, __int64, __int64, __int64, _QWORD, _QWORD))(a4 + 4358))(a4, v12, a3, v20, v17, 0);
    a7 = 16;
    (*(void (__fastcall **)(__int64, __int64, __int64, __int64, _BYTE *, int *))(a4 + 4366))(a4, v12, 2, v20, v22, &a7);
    strcpy((char *)&a10, "%02X");
    a7 = 0;
    do
    {
      v18 = (*(__int64 (__fastcall **)(__int64, __int64, __int64 *, __int64, _QWORD))(a4 + 4414))(
              a4,
              v12,
              &a10,
              v12,
              (unsigned __int8)v22[v15]);
      v15 = a7 + 1;
      v12 += v18;
      a7 = v15;
    }
    while ( v15 < 16 );
    (*(void (__fastcall **)(__int64, __int64, __int64))(a4 + 4374))(a4, v12, v19);
    return (*(__int64 (__fastcall **)(__int64, __int64, _QWORD, __int64))(a4 + 4342))(a4, v12, 0, v21);
  }
  return result;
}
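
One hint that supports the MD5 reading is the constant 32771 in the second call: it equals 0x8003, the value of CALG_MD5 in the Windows CryptoAPI headers. This assumes the table entries wrap CryptoAPI functions, which the listing itself does not confirm. Under that assumption, the end result could be reproduced with the short Python sketch below; `md5_upper_hex` is a hypothetical helper name.

```python
import hashlib

# Algorithm identifier for MD5 in the Windows CryptoAPI (wincrypt.h).
CALG_MD5 = 0x8003
assert CALG_MD5 == 32771  # the constant seen in the decompiled call


def md5_upper_hex(data: bytes) -> str:
    """Hash `data` with MD5 and format each digest byte with "%02X"."""
    digest = hashlib.md5(data).digest()  # 16-byte digest
    # Mirrors the loop in the listing: one "%02X" conversion per byte.
    return "".join("%02X" % b for b in digest)


print(md5_upper_hex(b"abc"))
```

A helper like this makes it easy to check the interpretation against values captured during dynamic analysis, if any are available.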

Anti-Analysis

This case shows a more elaborate snippet of code that implements anti-debugging and anti-reversing techniques. The context is limited, and it is difficult to pinpoint what is going on without dynamic data. Gepetto's insights were vague, but it was still able to suggest hypotheses about the code's purpose.

// ----- Comment generated by Gepetto -----
//
// This function iterates through a loop that repeatedly calls a function pointer
// (loc_402A57) with arguments from the stack frame, storing results into an array
// pointed to by the frame pointer offset +12. The loop terminates when the
// function pointer returns zero or the input a1 is zero, setting a flag at offset
// -4 accordingly. After the loop, it performs several obfuscated pointer
// arithmetic operations and bit manipulations before calling another function
// (loc_4011DC) and returning the flag value. The presence of debugbreak
// instructions, unusual arithmetic patterns, and deliberate obfuscation (magic
// constants, pointer manipulation) suggests this may be anti-debugging or
// anti-analysis code, possibly part of a software protection scheme.
//
// Proposed name:
// iterate_and_collect_with_obfuscation

int __userpurge sub_402BA6@<eax>(int a1@<eax>, char a2@<cl>, int a3@<ebx>, int a4@<ebp>, int a5, int a6)
{
  int *v6; // esi
  int *i; // edi
  _DWORD *v8; // eax
  char *v9; // eax

  v6 = *(int **)(a4 + 12);
  for ( i = v6; ; ++i )
  {
    __debugbreak();
    if ( !a1 )
      break;
    a1 = ((int (__cdecl *)(_DWORD, int))loc_402A57)(*(_DWORD *)(a4 + 8), a1);
    if ( !a1 )
    {
      *(_DWORD *)(a4 - 4) = 0;
      break;
    }
    *(_DWORD *)(a4 - 4) = 1;
    *i = a1;
  }
  __debugbreak();
  v8 = (_DWORD *)(i[2 * (_DWORD)v6 - 0x1665C7CD] + a1);
  v9 = (char *)v8 - *v8;
  *(_BYTE *)(a3 - 0x3B7CDBFC) += a2;
  LOBYTE(v9) = (_BYTE)v9 - 21;
  do
  {
    v9 += 0xEFEB34EB;
    LOBYTE(v9) = (unsigned __int8)v9 | 0x1A;
  }
  while ( (char)v9 <= 0 );
  ((void (__fastcall *)(int, int))loc_4011DC)(112, 179);
  return *(_DWORD *)(a4 - 4);
}

Persistence

In this case, I manually labeled a few functions and asked Gepetto to describe the main function that calls them. Gepetto was able to infer the purpose of the main function (sub_2DA35BC).

// ----- Comment generated by Gepetto -----
//
// This function establishes persistence by creating a Windows scheduled task. It
// retrieves a task name using the fourth parameter, creates the scheduled task via
// the Windows Task Scheduler (with the third/sixth parameters set to 0, likely
// indicating default or minimal options), and then frees the memory allocated for
// the task name. The function returns the result of the memory deallocation
// operation.
//
// Proposed name: create_scheduled_task_persistence

__int64 __fastcall sub_2DA35BC(int a1, __int64 a2, __int64 a3, __int64 a4)
{
  int v5; // r8d
  __int64 schedule_task_name; // [rsp+20h] [rbp-18h]

  schedule_task_name = _get_schedule_task_name(a4, a2, a3, a4);
  _create_windows_scheduler_task(a4, a2, 0, a4, v5, 0);
  return _wrapper_RtlFreeHeap(a4, a2, schedule_task_name, a4);
}

Takeaways

  • More experienced reversers can use this tool to speed up analysis. It helps in creating a high-level picture of the subject and provides hints on unknown patterns.

  • Beginners can use it as a training tool to become familiar with known patterns.

  • It can be utilized in capture-the-flag competitions.

  • For best results, use it interactively. The tool provides more accurate descriptions as more context is provided, such as function and variable names.