The program that you will be analyzing is from: https://github.com/BR903/ELFkickers/tree/master/infect
Some references for understanding the ELF format: http://www.linuxjournal.com/article/1059 and https://corkami.googlecode.com/files/elf101.pdf
In this exercise we will manually insert code into a binary file to slightly change its behavior. Binary files have a strict structure, so we have to choose carefully where to insert our code; otherwise we can render the binary file unusable.
We will practice inserting code into the ls binary. Most commands that we execute in a Linux (or macOS or Windows) shell are simply programs; shell built-ins such as cd are the exception. For example, the Linux ls command is an executable file usually found in the /bin folder. To find a command's path you can use the which command; for example, which ls returns /bin/ls (or /usr/bin/ls on some distributions).
Step 1: Make a copy of the ls program in your working directory (for example, cp /bin/ls .). We will modify the copy throughout this exercise.
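If you prefer to script Step 1, the sketch below does what which and cp do, using Python's shutil.which and shutil.copy2. The function name copy_command and its search_path parameter are our own hypothetical choices, not part of the exercise.

```python
import os
import shutil
import sys


def copy_command(name, dest_dir=".", search_path=None):
    """Locate `name` like `which` does, then copy it into dest_dir like `cp`.

    shutil.copy2 follows symlinks and keeps the permission bits, so the copy
    is a regular, executable file. Returns the path of the copy.
    """
    src = shutil.which(name, path=search_path)  # None if not found on the path
    if src is None:
        raise FileNotFoundError(f"{name!r} not found")
    dst = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy2(src, dst)
    return dst


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 copy_command.py ls
    print("copied to", copy_command(sys.argv[1]))
```

Copying (rather than editing /bin/ls in place) keeps the system's ls intact if the patched version turns out to be broken.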
Step 2: Let's gather some information about the binary. Use the file command (file ls) to find out what type of file ls is.
What type of file is ls?
ELF refers to Executable and Linkable Format, a common file format for executables, object code, shared libraries, and core dumps.
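To get a feel for what file looks at, here is a minimal Python sketch that inspects only the first 18 bytes: the magic number 0x7f 'E' 'L' 'F', the class byte (32- or 64-bit), the data-encoding byte (byte order) and the e_type field. It covers a tiny subset of what file reports; the helper name identify is hypothetical. A position-independent ls reports e_type DYN, which is why file may call it a "pie executable" or "shared object".

```python
import struct
import sys

ELF_MAGIC = b"\x7fELF"
EI_CLASS = {1: "ELF32", 2: "ELF64"}                  # e_ident[4]
EI_DATA = {1: "little-endian", 2: "big-endian"}      # e_ident[5]
E_TYPE = {1: "REL (relocatable)", 2: "EXEC (executable)",
          3: "DYN (shared object / PIE)", 4: "CORE (core dump)"}


def identify(data):
    """Describe an ELF file from its first 18 bytes, or return None if not ELF."""
    if data[:4] != ELF_MAGIC:
        return None
    cls = EI_CLASS.get(data[4], "unknown class")
    order = EI_DATA.get(data[5], "unknown byte order")
    fmt = "<H" if data[5] == 1 else ">H"
    (e_type,) = struct.unpack_from(fmt, data, 16)    # e_type follows the 16-byte e_ident
    return f"{cls}, {order}, {E_TYPE.get(e_type, hex(e_type))}"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(identify(f.read(18)) or "not an ELF file")
```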
Step 3: Let's look at information from inside the ls binary. The readelf utility inspects the internal components of ELF files. To list the header of the executable ls, run readelf -h ls. The header is the part of the ELF file that describes general information about the rest of the file's contents. The file command reads this part of the ELF file to produce its report.
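The fields that readelf -h prints come from a fixed 64-byte structure (Elf64_Ehdr in <elf.h>). This sketch assumes a little-endian ELF64 file, as ls is on x86-64, and unpacks it with Python's struct module; read_ehdr and the Elf64Ehdr tuple are illustrative names.

```python
import struct
import sys
from collections import namedtuple

Elf64Ehdr = namedtuple("Elf64Ehdr", "ident type machine version entry phoff shoff "
                       "flags ehsize phentsize phnum shentsize shnum shstrndx")
EHDR_FMT = "<16sHHIQQQIHHHHHH"   # little-endian Elf64_Ehdr, exactly 64 bytes


def read_ehdr(data):
    """Unpack the ELF header of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    return Elf64Ehdr(*struct.unpack_from(EHDR_FMT, data, 0))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        h = read_ehdr(f.read(64))
    print("Entry point address:", hex(h.entry))
    print("Number of program headers:", h.phnum)
```

The two printed values correspond to the "Entry point address" and "Number of program headers" lines of readelf -h.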
What is the entry point address?
How many program headers does this file have?
Step 4: The program headers describe the segments inside the ELF binary file. Segments are one of the units into which binary files are internally organized; this organization tells the operating system which parts of the executable file should be loaded into memory when the program runs.
Let's find out about the segments in ls. Run readelf --segments ls (equivalently, readelf -l ls).
How many headers are listed?
The number of headers listed should equal the number of program headers reported in the ELF header in Step 3.
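Each program header is a 56-byte Elf64_Phdr record. The ELF header gives the table's file offset (e_phoff, at byte 32), the record size (e_phentsize, byte 54) and the count (e_phnum, byte 56). The sketch below, again assuming little-endian ELF64, decodes the table roughly the way readelf --segments lists it; program_headers is a hypothetical name and PT_NAMES covers only the common segment types.

```python
import struct
import sys
from collections import namedtuple

Phdr = namedtuple("Phdr", "type flags offset vaddr paddr filesz memsz align")
PHDR_FMT = "<IIQQQQQQ"   # Elf64_Phdr, 56 bytes, little-endian
PT_NAMES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE",
            6: "PHDR", 7: "TLS", 0x6474E550: "GNU_EH_FRAME",
            0x6474E551: "GNU_STACK", 0x6474E552: "GNU_RELRO",
            0x6474E553: "GNU_PROPERTY"}


def program_headers(data):
    """Decode the program header table of a little-endian ELF64 image."""
    if data[:6] != b"\x7fELF\x02\x01":
        raise ValueError("expected a little-endian ELF64 file")
    (phoff,) = struct.unpack_from("<Q", data, 32)           # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", data, 54)  # e_phentsize, e_phnum
    return [Phdr(*struct.unpack_from(PHDR_FMT, data, phoff + i * phentsize))
            for i in range(phnum)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        phdrs = program_headers(f.read())
    print(len(phdrs), "program headers")
    for p in phdrs:
        print(f"{PT_NAMES.get(p.type, hex(p.type)):<13} off=0x{p.offset:x} "
              f"vaddr=0x{p.vaddr:x} filesz=0x{p.filesz:x} memsz=0x{p.memsz:x}")
```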
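Step 5: For now we shall pay attention only to the headers of type LOAD. Notice that there are two of them. (Binaries linked with newer toolchains that keep code and read-only data in separate segments may show four LOAD segments, typically R, R E, R and RW; in that case, work with the executable one and the LOAD segment that follows it.)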
What are the flags of the first LOAD segment (the read, write, execute access)?
What are the flags of the second LOAD segment (the read, write, execute access)?
Based on the read, write, execute permissions, which of the LOAD segments do you think will contain the instructions of the program? Which will contain the data of the program?
Fill the following table with the information about the two LOAD segments:
Segment  | File offset | Virtual address | FileSiz | MemSiz | Flags |
---------+-------------+-----------------+---------+--------+-------|
1st LOAD |             |                 |         |        |       |
---------+-------------+-----------------+---------+--------+-------|
2nd LOAD |             |                 |         |        |       |
---------+-------------+-----------------+---------+--------+-------|
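In the p_flags field, bit 0x1 means execute (PF_X), 0x2 write (PF_W) and 0x4 read (PF_R). The sketch below renders the flags the way readelf does, guesses each LOAD segment's role from its permissions, and formats one row of the table. The helper names are hypothetical, and the segment values in the example calls are made up for illustration, not taken from any real ls.

```python
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def flag_string(p_flags):
    """Render p_flags like readelf does: 'R E' for code, 'RW ' for data."""
    return (("R" if p_flags & PF_R else " ")
            + ("W" if p_flags & PF_W else " ")
            + ("E" if p_flags & PF_X else " "))


def role(p_flags):
    """Guess what a LOAD segment holds from its permissions."""
    if p_flags & PF_X:
        return "instructions (code)"
    if p_flags & PF_W:
        return "data"
    return "read-only data"


def format_row(label, offset, vaddr, filesz, memsz, p_flags):
    """One row of the Step 5 table."""
    return (f"{label} | 0x{offset:x} | 0x{vaddr:x} | 0x{filesz:x} | "
            f"0x{memsz:x} | {flag_string(p_flags)} |")


# Made-up values, for illustration only:
print(format_row("1st LOAD", 0x0, 0x400000, 0x1a0f4, 0x1a0f4, PF_R | PF_X))
print(format_row("2nd LOAD", 0x1ae00, 0x61ae00, 0x7f0, 0x1560, PF_R | PF_W))
```

A MemSiz larger than FileSiz, as in the made-up second row, is typical of a data segment: the extra bytes are zero-filled memory (such as .bss) that takes no space in the file.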
Step 6: If we are going to insert code into the binary file without damaging the normal operation of the program, we need to find a proper location. Our best bet is the code cave: the padding bytes between the end of the first LOAD segment and the start of the second one in the file.
In the file, how many bytes are there between where the first LOAD segment ends and where the second one begins?
What will be the virtual address of your new code? Write it down, since you will change the entry point of the program to this address.
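The arithmetic behind both questions uses only the table from Step 5. The first segment's file image ends at Offset + FileSiz; the cave runs from there to the second segment's Offset. Within a segment, VirtAddr minus Offset is constant, so the byte right after the first segment's file image would sit at VirtAddr + FileSiz in memory. A minimal sketch with a hypothetical helper name and made-up numbers:

```python
def code_cave(first, second):
    """first, second: (p_offset, p_vaddr, p_filesz) of two consecutive LOAD segments.

    Returns (cave_file_offset, cave_size, code_vaddr).
    """
    off1, vaddr1, filesz1 = first
    off2 = second[0]
    cave_offset = off1 + filesz1       # first byte after the first segment's file image
    cave_size = off2 - cave_offset     # bytes until the second segment begins
    code_vaddr = vaddr1 + filesz1      # vaddr - offset is constant inside a segment
    return cave_offset, cave_size, code_vaddr


# Made-up segment values, for illustration only:
offset, size, addr = code_cave((0x0, 0x400000, 0x1a0f4), (0x1ae00, 0x61ae00, 0x7f0))
print(f"cave at file offset 0x{offset:x}, {size} bytes, new code at 0x{addr:x}")
```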
Step 7: Let's construct the code to insert. To keep things simple, we will just have the modified program print out a harmless message before doing its usual job.
We will craft a snippet of code that prints a string using Linux system calls. A complete list of these calls is available at http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64
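In essence, our code needs to perform the equivalent of the C call syscall(1, 1, str, sizeof(str)-1); that is, sys_write (system call number 1) to file descriptor 1 (stdout), passing the string and its length.
One way to invoke system calls on x86-64 Linux is the syscall instruction. We just need to pass the parameters in the correct registers: rax holds the system call number, and rdi, rsi and rdx hold the first three arguments (file descriptor, buffer address and length).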
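You can try the same system call from Python before writing any assembly. This sketch calls the C library's syscall() wrapper through ctypes with the x86-64 number for write (1), so it mirrors the C call above. It assumes x86-64 Linux, since other architectures number their system calls differently; raw_write is a hypothetical helper name.

```python
import ctypes
import os
import platform

SYS_write = 1   # sys_write on x86-64 Linux (other architectures differ)


def raw_write(fd, data):
    """Equivalent of syscall(1, fd, data, len(data)) via libc's syscall()."""
    if platform.system() != "Linux" or platform.machine() != "x86_64":
        raise OSError("this sketch assumes x86-64 Linux system call numbers")
    libc = ctypes.CDLL(None, use_errno=True)   # symbols already loaded, incl. libc
    libc.syscall.restype = ctypes.c_long
    n = libc.syscall(ctypes.c_long(SYS_write), ctypes.c_long(fd),
                     ctypes.c_char_p(data), ctypes.c_size_t(len(data)))
    if n < 0:                                   # libc returns -1 and sets errno
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return n                                    # number of bytes written
```

On x86-64 Linux, raw_write(1, b"Hello") prints Hello and returns 5, the same effect the assembly below aims for.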
Here is an assembly language program that prints Hello by invoking the syscall instruction:
[SECTION .text]
global _start
_start:
push rax ; save the status of the registers
push rsi ; that we will modify
push rdi
push rdx
jmp short ender
starter:
xor eax, eax ; clear the eax and edx registers
xor edx, edx
pop rsi ; get the address of the string
mov al, 1 ; eax = 1 for invoking sys_write
mov edi, eax ; edi = 1 for file descriptor stdout
mov dl, 5 ; edx = 5 the length of Hello
syscall
pop rdx ; return the registers to their original values
pop rdi
pop rsi
pop rax
db 0xE9 ; jmp rel32 back to the original entry point (opcode byte)
dd 0 ; rel32 placeholder = original entry point - address of ender; patch it in
ender:
call starter ;put the address of the string on the stack
db 'Hello'
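The listing is 38 bytes long. The sketch below reproduces those bytes by hand-assembly, assuming NASM's usual encodings (compare against your own objdump -d output), and computes the one value that depends on where the code lands: the rel32 of the final jmp, which is measured from the address of the next instruction (the call at label ender, offset 28). The short jmp and the call are relative within the snippet, so they never change. build_payload is a hypothetical name.

```python
import struct


def build_payload(code_addr, orig_entry, message=b"Hello"):
    """Hand-assembled bytes of the listing, placed at virtual address code_addr.

    Offsets: 0 pushes | 4 jmp short ender | 6 starter | 23 jmp rel32 |
             28 ender: call starter | 33 message bytes
    """
    if not 0 < len(message) < 256:
        raise ValueError("mov dl, imm8 limits the message to 1..255 bytes")
    code = bytearray(b"\x50\x56\x57\x52")      # push rax, rsi, rdi, rdx
    code += b"\xEB\x16"                        # jmp short ender (28 - 6 = 0x16)
    code += b"\x31\xC0\x31\xD2"                # xor eax,eax ; xor edx,edx
    code += b"\x5E"                            # pop rsi (address of the message)
    code += b"\xB0\x01\x89\xC7"                # mov al,1 ; mov edi,eax
    code += bytes([0xB2, len(message)])        # mov dl, len(message)
    code += b"\x0F\x05"                        # syscall
    code += b"\x5A\x5F\x5E\x58"                # pop rdx, rdi, rsi, rax
    # jmp rel32: displacement from the next instruction (offset 28) to orig_entry.
    code += b"\xE9" + struct.pack("<i", orig_entry - (code_addr + 28))
    # call rel32: from the next instruction (offset 33) back to starter (offset 6).
    code += b"\xE8" + struct.pack("<i", 6 - 33)
    code += message
    return bytes(code)
```

Because the displacement is relative, the same computation works for a position-independent ls: e_entry and the segment addresses are then both offsets from the load base, and their difference is unchanged.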
You can use the command nasm -f elf64 (filename) to assemble the file into an ELF64 object file. You can then use objdump -d (filename.o) to look at the machine code and its disassembly. You can copy the machine code almost unchanged into the location you identified in the ls binary; only the displacement of the final jmp has to be computed for your binary.
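The final patch can also be scripted. This sketch assumes a little-endian ELF64 file. It finds the first executable LOAD segment and the LOAD segment after it, writes the payload into the cave between them, and sets e_entry (byte 24 of the ELF header) to the payload's address. As an extra precaution that the steps above do not ask for, it also grows that segment's p_filesz and p_memsz so the new bytes formally belong to it. inject, make_payload and jump_only are hypothetical names. jump_only is only a do-nothing placeholder that jumps straight back; substitute the bytes from your objdump output.

```python
import shutil
import struct
import sys

PT_LOAD, PF_X = 1, 0x1


def jump_only(code_addr, orig_entry):
    """Placeholder payload: just `jmp rel32` back to the original entry point."""
    return b"\xE9" + struct.pack("<i", orig_entry - (code_addr + 5))


def inject(data, make_payload):
    """Return a patched copy of a little-endian ELF64 image.

    make_payload(code_addr, orig_entry) must return the bytes to insert.
    """
    out = bytearray(data)
    if out[:6] != b"\x7fELF\x02\x01":
        raise ValueError("expected a little-endian ELF64 file")
    orig_entry, phoff = struct.unpack_from("<QQ", out, 24)   # e_entry, e_phoff
    phentsize, phnum = struct.unpack_from("<HH", out, 54)
    loads = []  # (header offset, p_flags, p_offset, p_vaddr, p_filesz, p_memsz)
    for i in range(phnum):
        ph = phoff + i * phentsize
        p_type, p_flags, p_offset, p_vaddr = struct.unpack_from("<IIQQ", out, ph)
        p_filesz, p_memsz = struct.unpack_from("<QQ", out, ph + 32)
        if p_type == PT_LOAD:
            loads.append((ph, p_flags, p_offset, p_vaddr, p_filesz, p_memsz))
    loads.sort(key=lambda seg: seg[2])                       # order by file offset
    idx = next((i for i, seg in enumerate(loads) if seg[1] & PF_X), None)
    if idx is None or idx + 1 >= len(loads):
        raise ValueError("need an executable LOAD followed by another LOAD")
    ph, _, p_offset, p_vaddr, p_filesz, p_memsz = loads[idx]
    cave_offset = p_offset + p_filesz
    cave_size = loads[idx + 1][2] - cave_offset
    code_addr = p_vaddr + p_filesz
    payload = make_payload(code_addr, orig_entry)
    if len(payload) > cave_size:
        raise ValueError(f"{len(payload)}-byte payload does not fit in a {cave_size}-byte cave")
    out[cave_offset:cave_offset + len(payload)] = payload
    struct.pack_into("<Q", out, 24, code_addr)                # new e_entry
    new_filesz = p_filesz + len(payload)
    struct.pack_into("<QQ", out, ph + 32, new_filesz, max(p_memsz, new_filesz))
    return bytes(out)


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python3 inject.py ls ls.patched
    with open(sys.argv[1], "rb") as f:
        patched = inject(f.read(), jump_only)
    with open(sys.argv[2], "wb") as f:
        f.write(patched)
    shutil.copymode(sys.argv[1], sys.argv[2])                # keep it executable
```

Running readelf -h and readelf --segments on the patched copy is a quick way to confirm that the entry point and the segment sizes changed as intended.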