Dynamically Inject a Shared Library Into a Running Process on Android/ARM



If you’re familiar with Windows runtime code injection, you probably know the great CreateRemoteThread API, which lets us force an arbitrary running process to call LoadLibrary and load a DLL into its address space. This technique, called DLL injection, is often used to perform user space API hooking; you can find a good post about it on Gianluca Braga’s blog.

Unfortunately there’s no CreateRemoteThread equivalent on Linux systems, so we can only rely on ptrace and our brain :D
In this post I’ll explain how to perform the equivalent of DLL injection, shared library injection, on Linux systems and more specifically on Android/ARM.

Part 2 of this post: “Android Native API Hooking with Library Injection and ELF Introspection.”

TL;DR

Fuck you, really! <3
I’m awesome, you’re a lazy scumbag … and the full source code can be found on the arminject repository on my github page.

Defeating ASLR

Once we’re attached to the process with ptrace, the first task we have is to obtain the addresses of the functions we’re gonna need for our purpose, namely:

  • dlopen for obvious reasons.
  • dlsym if we want to remotely call a function of the injected library.
  • calloc/malloc to allocate strings in the target process memory.
  • free to release that memory.

The problem here is to somehow defeat/bypass address space layout randomization ( ASLR ): we know the addresses of these symbols in our own process, but we surely don’t in the target process, since ASLR loads the libraries at a different base address there.

impossibru!

What we do know is that a given symbol always has the same offset from the base address of its library ( as long as both processes load the same library file ), and we can definitely determine the library base address in the target process by analyzing its /proc/<pid>/maps file:

    /*
     * This method will open /proc/<pid>/maps and search for the specified
     * library base address.
     */
    uintptr_t findLibrary( const char *library, pid_t pid = -1 ) {
        char filename[0xFF] = {0},
             buffer[1024] = {0};
        FILE *fp = NULL;
        uintptr_t address = 0;

        // pid == -1 means "the process we are attached to" ( _pid )
        snprintf( filename, sizeof(filename), "/proc/%d/maps", pid == -1 ? _pid : pid );

        fp = fopen( filename, "rt" );
        if( fp == NULL ){
            perror("fopen");
            goto done;
        }

        // every line starts with the start address of the mapping in hex,
        // the first line mentioning the library gives us its base address
        while( fgets( buffer, sizeof(buffer), fp ) ) {
            if( strstr( buffer, library ) ){
                address = (uintptr_t)strtoul( buffer, NULL, 16 );
                goto done;
            }
        }

    done:

        if(fp){
            fclose(fp);
        }

        return address;
    }
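To make the parsing step concrete, here is a minimal sketch that applies the same idea to a single line of a maps file. The helper name parseMapsBase and the sample line in the usage note are made up for illustration, they are not part of arminject; the only thing taken for granted is the usual "start-end perms offset dev inode path" layout of each line.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper ( not part of arminject ): returns the start address
// of the mapping described by `line` if the line mentions `library`, else 0.
uintptr_t parseMapsBase( const char *line, const char *library ) {
    assert( line != NULL && library != NULL );

    if( strstr( line, library ) == NULL ){
        return 0;
    }
    // The line begins with "start-end" in hex, so the conversion stops
    // at the '-' and yields the start address only.
    return (uintptr_t)strtoull( line, NULL, 16 );
}
```

For a line such as "b6e8a000-b6ecf000 r-xp 00000000 b3:17 1234 /system/lib/libc.so", calling parseMapsBase( line, "libc.so" ) gives 0xb6e8a000. Since the maps file is sorted by address, the first matching line is normally the lowest mapping of the library, which is what we treat as its base.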

Once we know the base address of a given library both in our process and in the target process, what we can do to resolve the remote function address is:

REMOTE_ADDRESS = LOCAL_ADDRESS + ( REMOTE_BASE - LOCAL_BASE )

Basically we take the local address of the function and apply to it the difference between the local library base address and the remote one, which is exactly what the following code does:

    /*
     * Compute the delta of the local and the remote modules and apply it to
     * the local address of the symbol ... BOOM, remote symbol address!
     */
    void *findFunction( const char* library, void* local_addr ){
        uintptr_t local_handle, remote_handle;

        local_handle = findLibrary( library, getpid() );
        remote_handle = findLibrary( library );

        return (void *)( (uintptr_t)local_addr + (uintptr_t)remote_handle - (uintptr_t)local_handle );
    }
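A worked example with invented numbers may help to see the arithmetic. The sketch below is the same computation with the two base addresses passed in explicitly instead of being read from /proc; the name rebase and all the addresses are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: REMOTE_ADDRESS = LOCAL_ADDRESS + ( REMOTE_BASE - LOCAL_BASE ),
// written as "remote base plus the offset of the symbol inside the library".
uintptr_t rebase( uintptr_t local_addr, uintptr_t local_base, uintptr_t remote_base ) {
    // The symbol has to live inside our own copy of the library.
    assert( local_addr >= local_base );
    return remote_base + ( local_addr - local_base );
}
```

If our libc is mapped at 0x40000000 and a function sits at 0x40012340, its offset is 0x12340. With the target's libc mapped at 0x4A500000, rebase( 0x40012340, 0x40000000, 0x4A500000 ) returns 0x4A512340.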

Finally we’ve bypassed the ASLR problem :)

fuck yeah

ARM Calling Convention

Next, we need to figure out how to force the process to execute a call to an address controlled by us ( one of the previously mentioned functions ). In order to do that we need to understand the ARM calling convention which, fortunately, is quite easy.

The first four arguments of a function are put inside the registers from R0 to R3, while any other arguments ( if any of course ) are pushed onto the stack.
Finally, the return address is put into the LR ( R14 ) register and the function address into the PC ( R15 ) register, which performs the actual call. The return value will be found inside the R0 register.
You can find a pretty good document about this, the “Practical ARM exploitation manual”, here.
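To visualize where the arguments end up, here is a small model of the rule above. It is only a sketch under a few assumptions: every argument is a 32-bit integer or pointer, stack alignment is ignored, and the names ArgLayout and layoutArgs are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of where word-sized arguments go on 32-bit ARM.
struct ArgLayout {
    uint32_t r[4];                // values for R0-R3 ( unused ones stay 0 )
    uint32_t sp;                  // stack pointer after reserving space
    std::vector<uint32_t> stack;  // stack[0] is the word stored at [sp]
};

ArgLayout layoutArgs( const std::vector<uint32_t> &args, uint32_t sp ) {
    ArgLayout out = ArgLayout();  // zero the registers, empty stack

    for( size_t i = 0; i < args.size(); ++i ){
        if( i < 4 ){
            out.r[i] = args[i];              // first four go in registers
        }
        else {
            out.stack.push_back( args[i] );  // the rest go to memory, in order
        }
    }
    // The stack grows down: reserve one word per stacked argument, so the
    // fifth argument sits at the lowest address, [sp], the sixth at [sp + 4].
    out.sp = sp - 4u * (uint32_t)out.stack.size();
    return out;
}
```

With six arguments 1..6 and a stack pointer of 0x1000, R0-R3 hold 1..4, the stack pointer becomes 0xFF8, and the words at 0xFF8 and 0xFFC are 5 and 6.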

What I did is the following:

  • Use PTRACE_GETREGS to save the current process registers.
  • Put the arguments of the function into R0-R3 and on the stack if needed.
  • Set LR to 0, so we can catch the SIGSEGV after the call.
  • Set PC to the function address.
  • Mask PC and CPSR according to the mode ( Thumb or ARM ).
  • Update the registers with PTRACE_SETREGS.
  • Trigger the call with PTRACE_CONT and wait for the process to SIGSEGV while returning to the address 0 stored in LR.
  • Get the function return value from R0.
  • Restore the original registers.

The code, which uses a variadic function for convenience, is the following:

    unsigned long call( void *function, int nargs, ... ) {
        int i = 0;
        struct pt_regs regs = {{0}}, rbackup = {{0}};

        // get the registers and back them up
        trace( PTRACE_GETREGS, 0, &regs );
        memcpy( &rbackup, &regs, sizeof(struct pt_regs) );

        // reserve stack space for the arguments beyond the fourth: the fifth
        // one must end up at the lowest address ( SP ), the sixth at SP + 4 ...
        if( nargs > 4 ){
            regs.ARM_sp -= ( nargs - 4 ) * sizeof(long);
        }

        va_list vl;
        va_start(vl,nargs);

        for( i = 0; i < nargs; ++i ){
            unsigned long arg = va_arg( vl, long );

            // fill R0-R3 with the first 4 arguments
            if( i < 4 ){
                regs.uregs[i] = arg;
            }
            // store the remaining params on the stack, in order
            else {
                write( (size_t)regs.ARM_sp + ( i - 4 ) * sizeof(long), (uint8_t *)&arg, sizeof(long) );
            }
        }

        va_end(vl);

        // returning to address 0 will make the process SIGSEGV after the call
        regs.ARM_lr = 0;
        regs.ARM_pc = (long int)function;
        // setup the current processor status register
        if ( regs.ARM_pc & 1 ){
            /* thumb */
            regs.ARM_pc &= (~1u);
            regs.ARM_cpsr |= CPSR_T_MASK;
        }
        else{
            /* arm */
            regs.ARM_cpsr &= ~CPSR_T_MASK;
        }

        // do the call and wait for the process to stop again
        trace( PTRACE_SETREGS, 0, &regs );
        trace( PTRACE_CONT );
        waitpid( _pid, NULL, WUNTRACED );

        // get registers again, R0 holds the return value
        trace( PTRACE_GETREGS, 0, &regs );

        // restore original registers state
        trace( PTRACE_SETREGS, 0, &rbackup );

        return regs.ARM_r0;
    }
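The PC/CPSR fix-up is the least obvious part of that function, so here it is isolated as a tiny pure function that can be played with on any machine. This is only a sketch: selectMode, Target and kThumbBit are names invented for this example, and the one hardware fact it relies on is that bit 5 of the CPSR is the T ( Thumb state ) bit.

```cpp
#include <cassert>
#include <cstdint>

// Bit 5 of the CPSR is the T ( Thumb state ) bit.
static const uint32_t kThumbBit = 1u << 5;

struct Target {
    uint32_t pc;    // value to load into PC
    uint32_t cpsr;  // value to load into CPSR
};

// Hypothetical helper: bit 0 of a function address tells us whether the
// function is Thumb ( 1 ) or ARM ( 0 ) code, but it must not reach the PC.
Target selectMode( uint32_t function, uint32_t cpsr ) {
    Target t;

    if( function & 1u ){
        t.pc   = function & ~1u;    // strip the mode bit from the address
        t.cpsr = cpsr | kThumbBit;  // run the callee in Thumb state
    }
    else {
        t.pc   = function;
        t.cpsr = cpsr & ~kThumbBit; // run the callee in ARM state
    }

    assert( ( t.pc & 1u ) == 0 );   // the PC never carries the mode bit
    return t;
}
```

For instance, selectMode( 0x40001235, 0x10 ) gives a PC of 0x40001234 with the T bit set ( CPSR 0x30 ), while selectMode( 0x40001000, 0x30 ) keeps the PC and clears the T bit ( CPSR 0x10 ).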

Putting it all together

The next steps are basically putting all of this together:

  • Get the addresses of the needed functions.
  • Use the remote malloc/calloc to allocate a buffer in the remote process and copy the library name string into it.
  • Use the remote dlopen with the previously allocated buffer to load the library.
  • Use the remote dlsym if needed.
  • Profit.
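The steps above can be sketched as a single function. To keep the example self-contained it does not touch ptrace at all: the Remote interface below is a hypothetical stand-in for the remote call and memory write helpers, RemoteSymbols is an invented holder for the already resolved remote addresses, and the dlopen flags are left as a parameter because their numeric values depend on the C library of the target.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical interface standing in for the ptrace helpers of this post:
// callRemote() runs a function inside the target and returns its R0,
// writeRemote() copies bytes into the memory of the target.
struct Remote {
    virtual unsigned long callRemote( unsigned long fn, unsigned long a0, unsigned long a1 ) = 0;
    virtual void writeRemote( unsigned long addr, const void *data, size_t len ) = 0;
    virtual ~Remote() {}
};

// Remote addresses of the functions we need, resolved beforehand.
struct RemoteSymbols {
    unsigned long calloc_addr, free_addr, dlopen_addr;
};

// Returns the handle given back by the remote dlopen, or 0 on failure.
unsigned long injectLibrary( Remote &r, const RemoteSymbols &s,
                             const std::string &path, unsigned long dlopen_flags ) {
    assert( !path.empty() );
    size_t len = path.size() + 1;  // include the terminating NUL

    // allocate a buffer for the library path inside the target
    unsigned long buf = r.callRemote( s.calloc_addr, len, 1 );
    if( buf == 0 ){
        return 0;
    }

    // copy the path into it and let the target load the library
    r.writeRemote( buf, path.c_str(), len );
    unsigned long handle = r.callRemote( s.dlopen_addr, buf, dlopen_flags );

    // release the buffer ( free takes one argument, the second is ignored )
    r.callRemote( s.free_addr, buf, 0 );

    return handle;
}
```

A real implementation of Remote would wrap the call and write routines shown earlier, and a further remote dlsym call on the returned handle would follow the same pattern.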

Once you have your library injected, you can do quite a few things, like dynamic API hooking/tracing/patching ( libandroid_runtime.so anyone ? :D ), process introspection, runtime memory patching and generally speaking …

insanity wolf