[Tool] Messing Around with Gepetto
AI-Assisted Reverse Engineering in 2026
Recently, I discovered an IDA Pro plugin called Gepetto [1]. It connects IDA to LLMs and helps annotate disassembled code interactively, directly from the UI.
You can simply right-click on a decompiled function’s name and ask Gepetto to explain its behavior.

This article lists a few examples (using Anthropic’s Claude Sonnet 4.5 model) of annotations provided by Gepetto while co-reversing a sample that I had fully analyzed some time ago. I found them quite accurate and would like to share some thoughts.
Cryptography
Gepetto is straightforward and accurate at identifying popular crypto algorithms and their variants. This capability can save trained analysts time and help beginner reversers level up by exposing them to common patterns.
Custom Hashing
Gepetto accurately identified a custom hashing function and described its sub-operations, such as converting ASCII letters to uppercase (the AND with 0xDF) and accumulating the result with XOR, a rotation, and an addition.
// ----- Comment generated by Gepetto -----
//
// This function computes a custom hash value from a null-terminated string. It
// iterates through each byte, performing a bitwise AND with 0xDF (which converts
// lowercase ASCII letters to uppercase, treating the input case-insensitively),
// then combines this with the accumulated hash using XOR, a left rotation by 8
// bits, and addition. The function returns a 32-bit hash value derived from the
// string's case-normalized characters.
//
// Proposed name:
// compute_case_insensitive_hash
__int64 __fastcall _hashing(__int64 a1, __int64 a2, __int64 a3, _BYTE *a4)
{
  char v4; // al
  _BYTE *v5; // r8
  unsigned int v6; // edx
  v4 = *a4;
  v5 = a4;
  v6 = 0;
  while ( v4 )
  {
    v6 = (v4 & 0xDF) + __ROL4__(v6 ^ v4 & 0xDF, 8);
    v4 = *++v5;
  }
  return v6;
}
For this case, I also asked Gepetto to generate Python code. This feature can be quite useful when writing configuration extractors and bot emulators. Gepetto completed the task successfully.
def _hashing(input_string):
    """
    Computes a custom hash value from a string.
    This function iterates through each byte, performing a bitwise AND with 0xDF
    (which converts lowercase ASCII letters to uppercase, treating the input
    case-insensitively), then combines this with the accumulated hash using XOR,
    a left rotation by 8 bits, and addition.
    Args:
        input_string: A string or bytes object to hash
    Returns:
        A 32-bit hash value (unsigned integer)
    """
    def rol32(value, shift):
        """Rotate left operation for 32-bit values"""
        value &= 0xFFFFFFFF  # Ensure 32-bit
        shift %= 32  # Normalize shift amount
        return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF
    # Convert string to bytes if necessary
    if isinstance(input_string, str):
        data = input_string.encode('latin-1')  # or 'utf-8' depending on context
    else:
        data = input_string
    hash_value = 0
    for byte in data:
        normalized = byte & 0xDF  # Convert lowercase to uppercase (for ASCII letters)
        hash_value = (normalized + rol32(hash_value ^ normalized, 8)) & 0xFFFFFFFF
    return hash_value
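As a hedged illustration of how such a port might be used in a configuration extractor, the sketch below precomputes the hash for a list of candidate strings and looks up a value as it might be recovered from a sample. The candidate names and the `build_lookup` helper are hypothetical and not taken from the analyzed sample; `custom_hash` is a compact restatement of the function above. One detail worth keeping in mind: the 0xDF mask clears bit 5 of every byte, not only of letters, so characters such as `{` and `[` also hash identically.

```python
def rol32(value, shift):
    """Rotate a 32-bit value left by `shift` bits."""
    value &= 0xFFFFFFFF
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF


def custom_hash(data):
    """Compact restatement of the hash recovered from the decompiled code."""
    if isinstance(data, str):
        data = data.encode("latin-1")
    h = 0
    for byte in data:
        c = byte & 0xDF  # clears bit 5: 'a'-'z' become 'A'-'Z'
        h = (c + rol32(h ^ c, 8)) & 0xFFFFFFFF
    return h


def build_lookup(candidates):
    """Map hash -> candidate name (hypothetical helper for an extractor)."""
    return {custom_hash(name): name for name in candidates}


# Hypothetical candidate list; a real extractor would use names relevant to the sample.
CANDIDATES = ["kernel32.dll", "ntdll.dll", "advapi32.dll"]
LOOKUP = build_lookup(CANDIDATES)

# A hash value as it might appear hardcoded in a sample (computed here for the demo).
observed = custom_hash("NTDLL.DLL")
print(hex(observed), LOOKUP.get(observed))
```

Because the mask folds case, the lookup succeeds even though the observed value was computed from an uppercase spelling.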
RC4
RC4 is quite popular among malware developers, and experienced reversers would find this code straightforward to identify. Gepetto described its behavior accurately.
// ----- Comment generated by Gepetto -----
//
// This function implements the RC4 stream cipher algorithm. It first initializes a
// 256-byte state array with values 0-255, then performs the key-scheduling
// algorithm (KSA) using the key provided in parameter a3 of length a6 to permute
// the state array. Finally, it executes the pseudo-random generation algorithm
// (PRGA) to generate a keystream that is XORed with the input data pointed to by
// a4 for a5 bytes, performing in-place encryption or decryption. The function
// modifies the buffer at a4 as a side effect and returns the last generated
// keystream byte.
//
// Proposed name: rc4_crypt
char __fastcall sub_2DA5168(__int64 a1, __int64 a2, __int64 a3, _BYTE *a4, unsigned int a5, unsigned __int8 a6)
{
  __int64 v6; // r10
  __int64 v8; // rbx
  _BYTE *v11; // r11
  unsigned int i; // eax
  __int64 v13; // r14
  unsigned int v14; // r9d
  char *v15; // r11
  char v16; // r8
  unsigned int v17; // edx
  __int64 v18; // rax
  __int64 v19; // r9
  __int64 v20; // r11
  char v21; // dl
  _BYTE v23[256]; // [rsp+0h] [rbp-108h] BYREF
  LOBYTE(v6) = 0;
  v8 = a5;
  v11 = v23;
  for ( i = 0; i < 0x100; ++i )
    *v11++ = i;
  LOBYTE(v13) = 0;
  v14 = 0;
  v15 = v23;
  do
  {
    v16 = *v15;
    v17 = v14 % a6;
    ++v14;
    v13 = (unsigned __int8)(*v15 + v13 + *(_BYTE *)(v17 + a3));
    LOBYTE(v18) = v23[v13];
    *v15++ = v18;
    v23[v13] = v16;
  }
  while ( v14 < 0x100 );
  LOBYTE(v19) = 0;
  if ( (_DWORD)v8 )
  {
    v20 = v8;
    do
    {
      v19 = (unsigned __int8)(v19 + 1);
      v21 = v23[v19];
      v6 = (unsigned __int8)(v6 + v21);
      v23[v19] = v23[v6];
      v23[v6] = v21;
      v18 = (unsigned __int8)(v21 + v23[v19]);
      *a4++ ^= v23[v18];
      --v20;
    }
    while ( v20 );
  }
  return v18;
}
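For configuration extraction, the decompiled routine can be mirrored in a few lines of Python. The following is a minimal sketch of textbook RC4 (KSA followed by PRGA), which is what the decompiled loops appear to implement. The function name `rc4_crypt` follows Gepetto's proposal, and the key and data in the usage lines are illustrative, not values from the sample. Note that in the decompilation the key length (a6) is an unsigned 8-bit value, so that routine only supports keys of up to 255 bytes; the sketch does not enforce that limit.

```python
def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: encrypts or decrypts `data` with `key` (same operation)."""
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling algorithm (KSA): identity permutation, then key-driven swaps.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation algorithm (PRGA): XOR the keystream into the data.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)


# Illustrative key and plaintext (not taken from the sample).
ciphertext = rc4_crypt(b"Key", b"Plaintext")
print(ciphertext.hex())
assert rc4_crypt(b"Key", ciphertext) == b"Plaintext"  # RC4 is symmetric
```

Unlike the decompiled function, which modifies the buffer in place, this version returns a new bytes object, which tends to be more convenient in an extractor.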
MD5 hashing
This case was quite curious because, besides pinpointing the general behavior of the function, Gepetto recognized that the sample imports APIs dynamically. It then inferred the purpose of each call by analyzing the layout of the arguments on the stack (my guess at the LLM's reasoning). Finally, it recognized the post-processing that converts the resulting hash into a hexadecimal string.
// ----- Comment generated by Gepetto -----
//
// This function computes an MD5 hash of data and formats it as a hexadecimal
// string. It uses a function pointer table (accessed via offsets from a4) to call
// various cryptographic operations: initializing a hash context, computing the
// hash value, retrieving the 16-byte digest, and then formatting each byte as a
// two-character uppercase hex string (e.g., "A3F2..."). The formatted result is
// written to a buffer, and the function cleans up by closing the hash context
// before returning.
//
// Proposed name: compute_and_format_md5_hash
__int64 __fastcall sub_2DA5314(
        __int64 a1,
        __int64 a2,
        __int64 a3,
        __int64 a4,
        __int64 a5,
        __int64 a6,
        int a7,
        __int64 a8,
        __int64 a9,
        __int64 a10)
{
  __int64 v10; // rbx
  __int64 v12; // rsi
  __int64 result; // rax
  int v15; // ebx
  __int64 v16; // rdx
  unsigned int v17; // eax
  int v18; // eax
  __int64 v19; // rdx
  __int64 v20; // [rsp+30h] [rbp-20h]
  __int64 v21; // [rsp+38h] [rbp-18h] BYREF
  _BYTE v22[16]; // [rsp+40h] [rbp-10h] BYREF
  a8 = v10;
  a9 = a2;
  v12 = a5;
  result = (*(__int64 (__fastcall **)(__int64, __int64, _QWORD, __int64 *, _QWORD, __int64))(a4 + 4334))(
             a4,
             a5,
             0,
             &v21,
             0,
             1);
  v15 = 0;
  if ( (_DWORD)result )
  {
    (*(void (__fastcall **)(__int64, __int64, __int64, __int64, _QWORD, _QWORD))(a4 + 4350))(a4, v12, 32771, v21, 0, 0);
    v17 = (*(__int64 (__fastcall **)(__int64, __int64, __int64, __int64))(a4 + 3806))(a4, v12, v16, a3);
    (*(void (__fastcall **)(__int64, __int64, __int64, __int64, _QWORD, _QWORD))(a4 + 4358))(a4, v12, a3, v20, v17, 0);
    a7 = 16;
    (*(void (__fastcall **)(__int64, __int64, __int64, __int64, _BYTE *, int *))(a4 + 4366))(a4, v12, 2, v20, v22, &a7);
    strcpy((char *)&a10, "%02X");
    a7 = 0;
    do
    {
      v18 = (*(__int64 (__fastcall **)(__int64, __int64, __int64 *, __int64, _QWORD))(a4 + 4414))(
              a4,
              v12,
              &a10,
              v12,
              (unsigned __int8)v22[v15]);
      v15 = a7 + 1;
      v12 += v18;
      a7 = v15;
    }
    while ( v15 < 16 );
    (*(void (__fastcall **)(__int64, __int64, __int64))(a4 + 4374))(a4, v12, v19);
    return (*(__int64 (__fastcall **)(__int64, __int64, _QWORD, __int64))(a4 + 4342))(a4, v12, 0, v21);
  }
  return result;
}
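The constants in the decompiled calls are consistent with Gepetto's reading: 32771 is 0x8003, which matches `CALG_MD5` in the Windows CryptoAPI headers, and the value 2 passed alongside the 16-byte buffer matches `HP_HASHVAL`. Assuming the function pointers resolve to the usual CryptoAPI hashing calls (an inference, since the imports are resolved dynamically), the behavior can be reproduced with Python's standard `hashlib`. The sketch below mimics the `%02X` formatting loop; the helper name `md5_upper_hex` is hypothetical.

```python
import hashlib

# CryptoAPI constants referenced above (values as defined in wincrypt.h).
CALG_MD5 = 0x8003
HP_HASHVAL = 0x0002


def md5_upper_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 uppercase hex characters."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes, like the v22 buffer
    # Same formatting as the decompiled loop: one "%02X" per digest byte.
    return "".join("%02X" % byte for byte in digest)


print(md5_upper_hex(b"abc"))
```

Malware often derives identifiers such as mutex names or bot IDs this way, so a helper like this can be handy when emulating a sample; whether that applies here depends on what the caller does with the string.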
Anti-Analysis
This case shows a more elaborate snippet of code implementing anti-debugging and anti-reversing techniques. The context is limited, and it is difficult to pinpoint what is going on without dynamic data. Gepetto's insights were vague, but it was able to suggest hypotheses about the code's purpose.
// ----- Comment generated by Gepetto -----
//
// This function iterates through a loop that repeatedly calls a function pointer
// (loc_402A57) with arguments from the stack frame, storing results into an array
// pointed to by the frame pointer offset +12. The loop terminates when the
// function pointer returns zero or the input a1 is zero, setting a flag at offset
// -4 accordingly. After the loop, it performs several obfuscated pointer
// arithmetic operations and bit manipulations before calling another function
// (loc_4011DC) and returning the flag value. The presence of debugbreak
// instructions, unusual arithmetic patterns, and deliberate obfuscation (magic
// constants, pointer manipulation) suggests this may be anti-debugging or anti-
// analysis code, possibly part of a software protection scheme.
//
// Proposed name:
// iterate_and_collect_with_obfuscation
int __userpurge sub_402BA6@<eax>(int a1@<eax>, char a2@<cl>, int a3@<ebx>, int a4@<ebp>, int a5, int a6)
{
  int *v6; // esi
  int *i; // edi
  _DWORD *v8; // eax
  char *v9; // eax
  v6 = *(int **)(a4 + 12);
  for ( i = v6; ; ++i )
  {
    __debugbreak();
    if ( !a1 )
      break;
    a1 = ((int (__cdecl *)(_DWORD, int))loc_402A57)(*(_DWORD *)(a4 + 8), a1);
    if ( !a1 )
    {
      *(_DWORD *)(a4 - 4) = 0;
      break;
    }
    *(_DWORD *)(a4 - 4) = 1;
    *i = a1;
  }
  __debugbreak();
  v8 = (_DWORD *)(i[2 * (_DWORD)v6 - 0x1665C7CD] + a1);
  v9 = (char *)v8 - *v8;
  *(_BYTE *)(a3 - 0x3B7CDBFC) += a2;
  LOBYTE(v9) = (_BYTE)v9 - 21;
  do
  {
    v9 += 0xEFEB34EB;
    LOBYTE(v9) = (unsigned __int8)v9 | 0x1A;
  }
  while ( (char)v9 <= 0 );
  ((void (__fastcall *)(int, int))loc_4011DC)(112, 179);
  return *(_DWORD *)(a4 - 4);
}
Persistence
In this case, I manually labeled a few functions and asked Gepetto to describe the main function that calls them. Gepetto was able to infer the purpose of the main function (sub_2DA35BC).
// ----- Comment generated by Gepetto -----
//
// This function establishes persistence by creating a Windows scheduled task. It
// retrieves a task name using the fourth parameter, creates the scheduled task via
// the Windows Task Scheduler (with the third/sixth parameters set to 0, likely
// indicating default or minimal options), and then frees the memory allocated for
// the task name. The function returns the result of the memory deallocation
// operation.
//
// Proposed name: create_scheduled_task_persistence
__int64 __fastcall sub_2DA35BC(int a1, __int64 a2, __int64 a3, __int64 a4)
{
  int v5; // r8d
  __int64 schedule_task_name; // [rsp+20h] [rbp-18h]
  schedule_task_name = _get_schedule_task_name(a4, a2, a3, a4);
  _create_windows_scheduler_task(a4, a2, 0, a4, v5, 0);
  return _wrapper_RtlFreeHeap(a4, a2, schedule_task_name, a4);
}
Takeaways
More experienced reversers can use this tool to speed up analysis. It helps in creating a high-level picture of the subject and provides hints on unknown patterns.
Beginners can use it as a training tool to become familiar with known patterns.
It can be utilized in capture-the-flag competitions.
For best results, use it interactively. The tool provides more accurate descriptions as more context is provided, such as function and variable names.
![[Tool] Messing Around with Gepetto](https://cdn.hashnode.com/res/hashnode/image/upload/v1767142238902/4dd6591e-4631-4ad3-9204-d567d5a06435.jpeg?w=1600&h=840&fit=crop&crop=entropy&auto=compress,format&format=webp)