Programmatically Identifying and Isolating Functions Inside Executables Like IDA Does.



Even though it’s one of the tools I use on a daily basis, Hex-Rays IDA always fascinates me for its completeness and the huge amount of information it is able to extract using just a “simple” static analysis approach. Being a “make yourself the tools you need” guy, a couple of weeks ago I started to study it, trying to understand its internal mechanisms, algorithms and tricks.

I’ve focused on the identification and isolation of subroutines inside an executable because it seemed the simplest thing to start with, and because I came across this blog post that shows how great the IDA Python libraries are.

# for all function offsets
for fn_ea in Functions():
    if fn_ea is None:
        continue

    # get function from offset
    f = idaapi.get_func(fn_ea)

    # get function bytes
    start = f.startEA
    size = f.endEA - start
    bytes = GetManyBytes(start, size)
    ...

Wouldn’t it be cool to have such features without the whole Python and IDA SDK distribution? :)

Actually this turned out to be a much simpler task than I had initially imagined: you only need a good portable executable parsing library and a fast disassembler library. For disassembly I’ve used diStorm, which is fast and easy to integrate.

NOTE

A few smart people pointed out that the following system is prone to false positives. This is totally true; I never claimed it to be a perfect approach, just a proof of concept of what can be achieved using a couple of well-written libraries and a few lines of C code.

The main algorithm is very simple.

  • Search for every code/executable section in the PE.
PE_FOREACH_SECTION( &pe, pSection )
{
    // skip empty sections
    if( pSection->SizeOfRawData == 0 )
        continue;
    // skip sections that are neither marked as code nor executable
    else if( !( pSection->Characteristics & IMAGE_SCN_CNT_CODE ) && !( pSection->Characteristics & IMAGE_SCN_MEM_EXECUTE ) )
        continue;

    ...
}
  • Analyze each section and search for CALL instructions with a relative target address; save each target address as a function start.
// Is this instruction a suitable call ?
if( ( inst->opcode == I_CALL || inst->opcode == I_CALL_FAR ) && inst->ops[0].type == O_PC )
{
    uint32_t dwFunctionAddress = inst->addr + inst->imm.sdword + inst->size;
    ...
}
  • Starting from each function start, search for the first RET instruction; that will be the function end.
if( inst->opcode == I_RET || inst->opcode == I_RETF )
{
    pFunction->Address.Size = inst->addr - pFunction->Address.VA;

    return false;
}

Easy, isn’t it? ^_^

I’ve implemented this algorithm as the new pefunctions sample project inside the libpe repository, enjoy :)