Software reverse engineering

Infecting a binary with 'malicious' code

The program that you will be analyzing is from: https://github.com/BR903/ELFkickers/tree/master/infect

Some references to understand the ELF format: http://www.linuxjournal.com/article/1059 https://corkami.googlecode.com/files/elf101.pdf

In this exercise we will manually insert code into a binary file to slightly change its behavior. Binary files have a strict structure, so we have to choose carefully where we insert our code. Otherwise we can render the binary file useless.

We will practice inserting code into the ls binary. Almost every command that we execute in the Linux (or Windows or Mac) shell is simply a program (shell built-ins such as cd are the exception). For example, the Linux ls command is an executable file usually found in the /bin folder. To find a command's path you can use the which command; for example, which ls returns /bin/ls.

Step 1: Make a copy of the ls program in your working directory. We will modify the copy throughout this exercise.
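The lookup and copy from Step 1 can also be scripted. The following is a minimal sketch; copy_command and its workdir parameter are hypothetical names introduced here for illustration, not part of the exercise.

```python
import shutil
from pathlib import Path


def copy_command(name: str, workdir: str) -> Path:
    """Locate a command on PATH (like `which`) and copy it into workdir."""
    src = shutil.which(name)
    if src is None:
        raise FileNotFoundError(f"command not found on PATH: {name}")
    dst = Path(workdir) / name
    # shutil.copy copies the file contents and the permission bits,
    # so the copy stays executable.
    shutil.copy(src, dst)
    return dst
```

Calling copy_command("ls", ".") would leave a copy named ls in the current directory, which is the file the remaining steps modify.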

Step 2: Let's gather some information about the binary. Use the file command to investigate what type of file ls is:

What type of file is ls?

ELF refers to Executable and Linkable Format, a common file format for executables, object code, shared libraries, and core dumps.
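To make the idea of a file type concrete, here is a small sketch of the kind of check that identifies an ELF file from its first bytes. It only looks at the magic number, the class byte, and the byte-order byte; the function name describe_elf_ident is an assumption made for this example, and the real file command reports much more.

```python
ELF_MAGIC = b"\x7fELF"


def describe_elf_ident(ident: bytes) -> str:
    """Describe a file from its first 16 bytes (the ELF identification)."""
    if len(ident) < 6 or ident[:4] != ELF_MAGIC:
        return "not ELF"
    # Byte 4 is the class (1 = 32-bit, 2 = 64-bit).
    bits = {1: "32-bit", 2: "64-bit"}.get(ident[4], "unknown class")
    # Byte 5 is the data encoding (1 = little-endian, 2 = big-endian).
    order = {1: "LSB", 2: "MSB"}.get(ident[5], "unknown byte order")
    return f"ELF {bits} {order}"
```

For the copy made in Step 1, the input would be the first 16 bytes read from ls opened in binary mode.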

Step 3: Let's look at information from inside the ls binary. The readelf utility is used to inspect the internal components of ELF binary files. To list the header of the ls executable, run readelf -h ls. The header is the part of the ELF file that describes general information about the rest of the contents of the file. The file command reads this part of the ELF file to report its information.
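The header that readelf -h prints is a fixed 64-byte structure at the start of a 64-bit ELF file. The sketch below unpacks it by hand, assuming a 64-bit little-endian file; ElfHeader and parse_elf64_header are names chosen for this example.

```python
import struct
from typing import NamedTuple


class ElfHeader(NamedTuple):
    entry: int      # e_entry: entry point virtual address
    phoff: int      # e_phoff: file offset of the program header table
    phentsize: int  # e_phentsize: size of one program header
    phnum: int      # e_phnum: number of program headers


def parse_elf64_header(data: bytes) -> ElfHeader:
    """Unpack the fields of interest from a 64-bit little-endian ELF header."""
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    # The 13 fields that follow the 16-byte e_ident array.
    fields = struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    return ElfHeader(entry=fields[3], phoff=fields[4],
                     phentsize=fields[8], phnum=fields[9])
```

Printing hex(header.entry) and header.phnum for the ls copy should agree with the "Entry point address" and "Number of program headers" lines of readelf -h.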

What is the entry point address?


How many program headers does this file have?


Step 4: The program headers refer to the segments inside the ELF binary file. Segments are one of the components into which binary files are internally organized. This organization helps the operating system decide which parts of the executable file should be loaded into memory when the program is run.

Let's find out about the segments in ls. Run readelf --segments ls.

How many headers are listed?

The number of headers that are listed is supposed to be equal to the number that you read in the header.
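The segment listing can be reproduced by walking the program header table directly. This is a sketch for 64-bit little-endian files only; load_segments and flags_to_str are hypothetical helper names, and the dictionary keys are chosen for this example.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def flags_to_str(p_flags: int) -> str:
    """Render permission bits the way readelf does, e.g. 'R E' or 'RW '."""
    return (("R" if p_flags & 4 else " ")
            + ("W" if p_flags & 2 else " ")
            + ("E" if p_flags & 1 else " "))


def load_segments(data: bytes) -> list:
    """Return one dict per LOAD segment of a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append({"offset": p_offset, "vaddr": p_vaddr,
                             "filesz": p_filesz, "memsz": p_memsz,
                             "flags": flags_to_str(p_flags)})
    return segments
```

Each returned dictionary corresponds to one LOAD row of the readelf output, so its values can be checked against what readelf prints.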

Step 5: For now we shall pay attention only to the headers called LOAD. Notice that there are two of them.

What are the flags of the first LOAD segment (the read, write, execute access)?


What are the flags of the second LOAD segment (the read, write, execute access)?


Based on the read, write, execute permissions, which of the LOAD segments do you think will contain the instructions of the program? Which will contain the data of the program?


Fill the following table with the information about the two LOAD segments:

Segment | Offset in file | Virt addr | FileSize | MemSize | Flags |
--------+----------------+-----------+----------+---------+-------|
1st LOAD|                |           |          |         |       |
--------+----------------+-----------+----------+---------+-------|
2nd LOAD|                |           |          |         |       |
--------+----------------+-----------+----------+---------+-------|

Step 6: If we are going to insert code into the binary file without damaging the normal operation of the program, we need to find a proper location. Our best bet is to insert our code in the code cave: the unused gap in the file between the end of the first LOAD segment and the start of the second.
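The size of that gap follows from three of the values in the LOAD table. A short sketch of the arithmetic, with cave_bounds as a hypothetical helper name; the numbers in the usage note are made up and are not taken from any real ls binary.

```python
def cave_bounds(first_offset: int, first_filesz: int, second_offset: int):
    """Return (file offset where the cave starts, cave size in bytes)."""
    # The first segment occupies [first_offset, first_offset + first_filesz).
    cave_start = first_offset + first_filesz
    # Everything from there up to the second segment's offset is the gap.
    cave_size = second_offset - cave_start
    return cave_start, cave_size
```

With the invented values first_offset = 0x0, first_filesz = 0x1a6f4 and second_offset = 0x1adf0, the cave would start at file offset 0x1a6f4 and hold 0x6fc (1788) bytes.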

In the file, how many bytes are there between where the first LOAD segment ends and where the second one begins?


What will be the address of your new code? Please write this down since you will change the entry point of the program to this address.
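One way to reason about this address: a byte at a given file offset inside (or just past) the first LOAD segment is mapped at the segment's virtual address plus its distance from the segment's file offset. The sketch below shows that conversion and how the 8-byte entry point field at offset 0x18 of a 64-bit header could be overwritten. Both function names are assumptions made for this example, and it assumes a 64-bit little-endian file whose cave lands in memory that the loader maps together with the first segment.

```python
import struct

E_ENTRY_OFFSET = 0x18  # position of e_entry in a 64-bit ELF header


def file_offset_to_vaddr(offset: int, seg_offset: int, seg_vaddr: int) -> int:
    """Translate a file offset to a virtual address using one segment's mapping."""
    return seg_vaddr + (offset - seg_offset)


def patch_entry(image: bytearray, new_entry: int) -> int:
    """Overwrite e_entry in place and return the original entry point."""
    (old_entry,) = struct.unpack_from("<Q", image, E_ENTRY_OFFSET)
    struct.pack_into("<Q", image, E_ENTRY_OFFSET, new_entry)
    return old_entry
```

The value returned by patch_entry is worth keeping, because the inserted code has to jump back to the original entry point when it finishes.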


Step 7: Let's construct the code to insert. To keep things simple, we will just have the modified program print out a harmless message before doing its usual job.

We will craft a snippet of code that prints a string using Linux system calls. A complete list of these calls is at: http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64.

In essence, our code needs to execute the equivalent of this call:

syscall(1, 1, str, sizeof(str)-1);

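One way to invoke system calls in Linux is through the assembly instruction syscall. We just need to pass the correct parameters in the correct registers: on x86-64, the system call number goes in rax and the first three arguments go in rdi, rsi, and rdx.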
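Before dropping to assembly, it may help to see the same system call made from a high-level language. In this sketch, os.write is Python's thin wrapper around write, which is system call number 1 on x86-64 Linux; say_hello is a name invented for this example.

```python
import os


def say_hello(fd: int = 1) -> int:
    """Write the 5 bytes of 'Hello' to a file descriptor (1 is stdout)."""
    message = b"Hello"
    # Corresponds to syscall(1, fd, message, 5): the second 1 in the
    # text's call is the file descriptor, and 5 is the string length.
    return os.write(fd, message)
```

The return value is the number of bytes written, which is 5 when the whole message goes out in a single write.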

Here is an assembly language program that prints Hello by invoking the syscall:

[SECTION .text]

global _start

_start: 
  push rax ; save the status of the registers 
  push rsi ; that we will modify 
  push rdi 
  push rdx

  jmp short ender

starter: 
  xor eax, eax ; clear the eax and edx registers 
  xor edx, edx

  pop rsi ; get the address of the string

  mov al, 1 ; eax = 1 for invoking sys_write 
  mov edi, eax ; edi = 1 for file descriptor stdout 
  mov dl, 5 ; edx = 5 the length of Hello 
  syscall

  pop rdx ; return the registers to their original values 
  pop rdi  
  pop rsi 
  pop rax

  jmp (displacement from the end of this instruction to the original entry point)

ender: 
  call starter ;put the address of the string on the stack 
  db 'Hello'

You may use the command nasm -f elf64 (filename) to assemble the file into an ELF object file. You can then use objdump -d (filename.o) to look at the machine code and disassembly. You may copy the machine code almost unchanged into the location you identified in the ls binary.
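The one part of the machine code that depends on where it is placed is the final jmp. The sketch below assumes the 5-byte near jump form (opcode E9 followed by a signed 32-bit displacement), whose displacement is measured from the address of the instruction after the jmp. It also shows one way the finished bytes could be written over the cave. jmp_rel32 and splice are hypothetical helper names.

```python
import struct


def jmp_rel32(jmp_addr: int, target: int) -> bytes:
    """Encode `jmp target` as E9 rel32 for a jmp located at jmp_addr."""
    # The CPU adds rel32 to the address of the NEXT instruction,
    # which is jmp_addr + 5 for this 5-byte encoding.
    rel = target - (jmp_addr + 5)
    return b"\xe9" + struct.pack("<i", rel)


def splice(image: bytearray, offset: int, payload: bytes) -> None:
    """Overwrite len(payload) bytes at offset; the file size must not change."""
    if offset + len(payload) > len(image):
        raise ValueError("payload does not fit inside the file")
    image[offset:offset + len(payload)] = payload
```

Here jmp_addr is the virtual address at which the jmp itself will sit inside the cave, and target is the original entry point recorded in Step 3. Overwriting bytes in place, rather than inserting them, keeps every other offset in the file valid.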