[Tool] Messing Around with Gepetto

AI-Assisted Reverse Engineering in 2026

Recently, I discovered an IDA Pro plugin called Gepetto [1]. It connects IDA to LLMs and helps annotate disassembled code interactively, directly from the UI.

It is as simple as right-clicking a decompiled function’s name and asking Gepetto to explain its behavior.

This article lists a few examples (using Anthropic’s model Claude 4.5 Sonnet) of annotations provided by Gepetto while co-reversing a sample that I had fully analyzed some time ago. I found the annotations quite accurate and would like to share some thoughts.

Cryptography

Gepetto is straightforward and accurate when identifying crypto algorithms and variants of popular ones. This capability can save trained analysts time and level up beginner reversers (by training them in common patterns).

Custom Hashing

Gepetto correctly identified a custom hashing function. It was also able to describe sub-operations such as converting ASCII letters to uppercase (the AND with 0xDF) and the cumulative XOR.

__int64 __fastcall _hashing(__int64 a1, __int64 a2, __int64 a3, _BYTE *a4)
{
  char v4; // al
  _BYTE *v5; // r8
  unsigned int v6; // edx

  v4 = *a4;
  v5 = a4;
  v6 = 0;
  while ( v4 )
  {
    v6 = (v4 & 0xDF) + __ROL4__(v6 ^ v4 & 0xDF, 8);
    v4 = *++v5;
  }
  return v6;
}

// ----- Comment generated by Gepetto -----
//
// This function computes a custom hash value from a null-terminated string. It
// iterates through each byte, performing a bitwise AND with 0xDF (which converts
// lowercase ASCII letters to uppercase, treating the input case-insensitively),
// then combines this with the accumulated hash using XOR, a left rotation by 8
// bits, and addition. The function returns a 32-bit hash value derived from the
// string's case-normalized characters.
//
// Proposed name:
// compute_case_insensitive_hash

For this case, I also asked Gepetto to generate Python code. This feature can be quite useful when writing configuration extractors and bot emulators. Gepetto completed the task successfully.

def _hashing(input_string):
    """
    Computes a custom hash value from a string.

    This function iterates through each byte, performing a bitwise AND with 0xDF
    (which converts lowercase ASCII letters to uppercase, treating the input 
    case-insensitively), then combines this with the accumulated hash using XOR,
    a left rotation by 8 bits, and addition.

    Args:
        input_string: A string or bytes object to hash

    Returns:
        A 32-bit hash value (unsigned integer)
    """
    def rol32(value, shift):
        """Rotate left operation for 32-bit values"""
        value &= 0xFFFFFFFF  # Ensure 32-bit
        shift %= 32  # Normalize shift amount
        return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF

    # Convert string to bytes if necessary
    if isinstance(input_string, str):
        data = input_string.encode('latin-1')  # or 'utf-8' depending on context
    else:
        data = input_string

    hash_value = 0

    for byte in data:
        normalized = byte & 0xDF  # Convert lowercase to uppercase (for ASCII letters)
        hash_value = (normalized + rol32(hash_value ^ normalized, 8)) & 0xFFFFFFFF

    return hash_value
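
As a quick sanity check on the translation above, the following compact sketch re-implements the same loop and exercises two properties that follow from the AND with 0xDF. The name custom_hash and the sample inputs are my own, chosen for illustration only; they do not come from the sample.

```python
def custom_hash(data: bytes) -> int:
    """Compact re-implementation of the decompiled loop (hypothetical name)."""
    h = 0
    for byte in data:
        c = byte & 0xDF                                # clear bit 5: 'a'-'z' become 'A'-'Z'
        x = (h ^ c) & 0xFFFFFFFF
        rotated = ((x << 8) | (x >> 24)) & 0xFFFFFFFF  # 32-bit rotate left by 8
        h = (c + rotated) & 0xFFFFFFFF
    return h


# Letter case does not change the result.
same_case = custom_hash(b"kernel32.dll") == custom_hash(b"KERNEL32.DLL")

# The mask is applied to every byte, not only letters, so bytes that
# differ only in bit 5 also collide (for example '1' = 0x31 and 0x11).
non_letter_collision = custom_hash(b"1") == custom_hash(b"\x11")

print(hex(custom_hash(b"a")), same_case, non_letter_collision)  # 0x4141 True True
```

The second property is worth keeping in mind when using such a hash in an extractor: "case-insensitive" is accurate for letters, but the mask also folds some digits and punctuation onto control characters.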

RC4

RC4 is quite a popular algorithm among malware developers. Identifying this code would be straightforward for more experienced reversers. Gepetto described its behavior accurately.

char __fastcall sub_2DA5168(__int64 a1, __int64 a2, __int64 a3, _BYTE *a4, unsigned int a5, unsigned __int8 a6)
{
  __int64 v6; // r10
  __int64 v8; // rbx
  _BYTE *v11; // r11
  unsigned int i; // eax
  __int64 v13; // r14
  unsigned int v14; // r9d
  char *v15; // r11
  char v16; // r8
  unsigned int v17; // edx
  __int64 v18; // rax
  __int64 v19; // r9
  __int64 v20; // r11
  char v21; // dl
  _BYTE v23[256]; // [rsp+0h] [rbp-108h] BYREF

  LOBYTE(v6) = 0;
  v8 = a5;
  v11 = v23;
  for ( i = 0; i < 0x100; ++i )
    *v11++ = i;
  LOBYTE(v13) = 0;
  v14 = 0;
  v15 = v23;
  do
  {
    v16 = *v15;
    v17 = v14 % a6;
    ++v14;
    v13 = (unsigned __int8)(*v15 + v13 + *(_BYTE *)(v17 + a3));
    LOBYTE(v18) = v23[v13];
    *v15++ = v18;
    v23[v13] = v16;
  }
  while ( v14 < 0x100 );
  LOBYTE(v19) = 0;
  if ( (_DWORD)v8 )
  {
    v20 = v8;
    do
    {
      v19 = (unsigned __int8)(v19 + 1);
      v21 = v23[v19];
      v6 = (unsigned __int8)(v6 + v21);
      v23[v19] = v23[v6];
      v23[v6] = v21;
      v18 = (unsigned __int8)(v21 + v23[v19]);
      *a4++ ^= v23[v18];
      --v20;
    }
    while ( v20 );
  }
  return v18;
}

// ----- Comment generated by Gepetto -----
//
// This function implements the RC4 stream cipher algorithm. It first initializes a
// 256-byte state array with values 0-255, then performs the key-scheduling
// algorithm (KSA) using the key provided in parameter a3 of length a6 to permute
// the state array. Finally, it executes the pseudo-random generation algorithm
// (PRGA) to generate a keystream that is XORed with the input data pointed to by
// a4 for a5 bytes, performing in-place encryption or decryption. The function
// modifies the buffer at a4 as a side effect and returns the last generated
// keystream byte.
//
// Proposed name: rc4_crypt
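
For comparison with the decompiled routine, here is a minimal Python sketch of standard RC4 following the same three phases Gepetto describes (state initialization, KSA, PRGA). The function name mirrors Gepetto's proposal; the argument order and the choice to return a new buffer instead of modifying the input in place are my own assumptions for readability.

```python
def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Standard RC4; encryption and decryption are the same operation."""
    # The decompiled key length (a6) is an unsigned 8-bit value, so 1..255 here.
    if not 0 < len(key) <= 255:
        raise ValueError("key length must be between 1 and 255 bytes")

    # State initialization: S[i] = i
    state = list(range(256))

    # Key-scheduling algorithm (KSA)
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation algorithm (PRGA), XORed with the data
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)


ciphertext = rc4_crypt(b"Key", b"Plaintext")
print(ciphertext.hex())                 # bbf316e8d940af0ad3
print(rc4_crypt(b"Key", ciphertext))    # b'Plaintext'
```

Checking a candidate function against a well-known test vector like this one is a cheap way to confirm that a sample uses unmodified RC4 before relying on the annotation.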

Modified MD5 Hashing

This case was quite curious because, besides pinpointing the general behavior of the function, Gepetto recognized that the sample resolves some APIs dynamically. It then inferred the purpose of each call by analyzing the layout of the stack (this is my intuition about the LLM's reasoning). Finally, it interpreted the post-processing that converts the resulting hash into a hexadecimal string.

__int64 __fastcall sub_2DA5314(
        __int64 a1,
        __int64 a2,
        __int64 a3,
        __int64 a4,
        __int64 a5,
        __int64 a6,
        int a7,
        __int64 a8,
        __int64 a9,
        __int64 a10)
{
  __int64 v10; // rbx
  __int64 v12; // rsi
  __int64 result; // rax
  int v15; // ebx
  __int64 v16; // rdx
  unsigned int v17; // eax
  int v18; // eax
  __int64 v19; // rdx
  __int64 v20; // [rsp+30h] [rbp-20h]
  __int64 v21; // [rsp+38h] [rbp-18h] BYREF
  _BYTE v22[16]; // [rsp+40h] [rbp-10h] BYREF

  a8 = v10;
  a9 = a2;
  v12 = a5;
  result = (*(__int64 (__fastcall **)(__int64, __int64, _QWORD, __int64 *, _QWORD, __int64))(a4 + 4334))(
             a4,
             a5,
             0,
             &v21,
             0,
             1);
  v15 = 0;
  if ( (_DWORD)result )
  {
    (*(void (__fastcall **)(__int64, __int64, __int64, __int64, _QWORD, _QWORD))(a4 + 4350))(a4, v12, 32771, v21, 0, 0);
    v17 = (*(__int64 (__fastcall **)(__int64, __int64, __int64, __int64))(a4 + 3806))(a4, v12, v16, a3);
    (*(void (__fastcall **)(__int64, __int64, __int64, __int64, _QWORD, _QWORD))(a4 + 4358))(a4, v12, a3, v20, v17, 0);
    a7 = 16;
    (*(void (__fastcall **)(__int64, __int64, __int64, __int64, _BYTE *, int *))(a4 + 4366))(a4, v12, 2, v20, v22, &a7);
    strcpy((char *)&a10, "%02X");
    a7 = 0;
    do
    {
      v18 = (*(__int64 (__fastcall **)(__int64, __int64, __int64 *, __int64, _QWORD))(a4 + 4414))(
              a4,
              v12,
              &a10,
              v12,
              (unsigned __int8)v22[v15]);
      v15 = a7 + 1;
      v12 += v18;
      a7 = v15;
    }
    while ( v15 < 16 );
    (*(void (__fastcall **)(__int64, __int64, __int64))(a4 + 4374))(a4, v12, v19);
    return (*(__int64 (__fastcall **)(__int64, __int64, _QWORD, __int64))(a4 + 4342))(a4, v12, 0, v21);
  }
  return result;
}

// ----- Comment generated by Gepetto -----
//
// This function computes an MD5 hash of data and formats it as a hexadecimal
// string. It uses a function pointer table (accessed via offsets from a4) to call
// various cryptographic operations: initializing a hash context, computing the
// hash value, retrieving the 16-byte digest, and then formatting each byte as a
// two-character uppercase hex string (e.g., "A3F2..."). The formatted result is
// written to a buffer, and the function cleans up by closing the hash context
// before returning.
//
// Proposed name: compute_and_format_md5_hash
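
One detail that supports the MD5 identification is the constant 32771 passed to the second indirect call: it is 0x8003, the value of CALG_MD5 in the Windows CryptoAPI, and the 2 passed later matches HP_HASHVAL. Assuming the resolved pointers really are the CryptoAPI hashing routines and the digest is not altered elsewhere, the net effect of the function can be sketched in Python as follows. The function name is hypothetical and the sample inputs are standard MD5 test strings, not data from the sample.

```python
import hashlib


def md5_upper_hex(data: bytes) -> str:
    """MD5 digest rendered byte by byte with "%02X", as in the decompiled loop."""
    digest = hashlib.md5(data).digest()          # 16-byte digest
    return "".join("%02X" % b for b in digest)   # same format string the sample builds


print(md5_upper_hex(b"abc"))  # 900150983CD24FB0D6963F7D28E17F72
```

If the output of such a sketch does not match what the sample produces for a known input, that would point to a modification in the hashing that the decompiled code above does not reveal.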

Anti-Analysis

Persistence

Fingerprinting