Lab 02 - Program Analysis
Read the explanations at https://ocw.cs.pub.ro/courses/cns/labs/lab-02, then come back here for the exercises.
Supporting files
Exercises
1. Where do they live?
In this exercise you will verify some facts about executables and processes in Linux.
- Copy the following code to a file named `live.c`:
- Now we will create two versions of the executable: one built as a Position Independent Executable (PIE) and one that is not. Create the non-PIE version first, using the `-no-pie` option: `gcc -o livenopie live.c -fPIC -no-pie`
- ... and now the PIE version, using the `-pie` option: `gcc -o live live.c -fPIC -pie`
- Validate that they are different using `md5sum`: `md5sum live livenopie`
- Is the `file` command able to tell the difference between the PIE and non-PIE versions? `file live livenopie`
- Validate that both executables work by running each of them. The hex addresses they print might differ slightly; we will learn why in later steps. Since they contain a `while(1);`, each will keep running until you press CTRL-C.
- The main difference between the PIE and non-PIE versions is that:
- In the non-PIE version, the addresses of the functions are fixed at link time, and the loader places the instructions at exactly those addresses.
- In the PIE version, the addresses of the functions are offsets, which the loader adds to the base address it chooses (possibly at random) when it loads the text.
- We will validate that difference between PIE and non-PIE in the next section. But first, use `objdump` to determine the address of the function `foo` in both the `live` and `livenopie` binaries.
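The PIE/non-PIE distinction is also visible in the ELF header itself: non-PIE executables have type `ET_EXEC`, while PIE executables (like shared libraries) have type `ET_DYN`, which is what allows the loader to choose their base address. A minimal sketch of reading that field, assuming a 64-bit ELF file and the `<elf.h>` header shipped with glibc; `elf_type` is a hypothetical helper, not part of the lab files:

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the e_type field of a 64-bit ELF file
 * (ET_EXEC for non-PIE executables, ET_DYN for PIE executables and
 * shared libraries), or -1 if the file cannot be read or is not ELF64. */
int elf_type(const char *path)
{
    Elf64_Ehdr eh;
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(&eh, sizeof eh, 1, f);
    fclose(f);

    /* Check the magic bytes "\x7f" "ELF" and the 64-bit class first. */
    if (n != 1 || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return eh.e_type;
}
```

If the two builds above behaved as intended, `elf_type("live")` should be `ET_DYN` and `elf_type("livenopie")` should be `ET_EXEC`. Note that `ET_DYN` alone does not separate a PIE from a shared library; tools such as `file` look at further details to tell them apart.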
ASLR and PIE
- [https://blog.morphisec.com/aslr-what-it-is-and-what-it-isnt/] Address Space Layout Randomization (ASLR) is a computer security technique that randomly positions the base address of an executable and the positions of the libraries, heap, and stack in a process's address space. Linux has this feature turned on by default.
- Turn off ASLR in Linux by issuing the following command: `echo 0 | sudo tee /proc/sys/kernel/randomize_va_space`
- With ASLR off, every time you run an executable it will be assigned the same base addresses in virtual memory for the stack, heap, and text (program instructions).
- Run `live` several times and record the output.
- Run `livenopie` several times and record the output.
Make an observation in your report about what just happened.
- Turn ASLR back on in Linux by issuing the following command: `echo 2 | sudo tee /proc/sys/kernel/randomize_va_space`
- Run `live` several times. What do you notice about the various addresses printed by the program?
- Run `livenopie` several times. What do you notice about the various addresses printed by the program? Comment on which addresses differ between runs and which stay the same.
- Complete the following table for multiple runs of `live` and `livenopie` (with ASLR activated). Write either "change" or "no change".
| Type | Stack | Heap | Text |
|---|---|---|---|
| PIE | | | |
| noPIE | | | |
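Before filling in the table, it can help to confirm which ASLR mode is actually active. A minimal sketch that reads the same procfs file the commands above write to; `aslr_level` is a hypothetical helper name, and the meaning of each value is summarized from the Linux kernel's sysctl documentation:

```c
#include <stdio.h>

/* Hypothetical helper: read the system-wide ASLR mode from procfs.
 * 0 = no randomization
 * 1 = randomize stack, mmap base (shared libraries), vDSO and PIE code
 * 2 = additionally randomize the brk-based heap
 * Returns -1 if the file cannot be read or parsed. */
int aslr_level(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    if (!f)
        return -1;
    int level = -1;
    if (fscanf(f, "%d", &level) != 1)
        level = -1;
    fclose(f);
    return level;
}
```

After `echo 0 | sudo tee /proc/sys/kernel/randomize_va_space`, this should return 0; after `echo 2 | ...`, it should return 2.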
Another way of monitoring the address layout of programs
- Run `./live &` (this leaves the program executing in the background and tells you its process ID). Linux lets you see the memory map of a process by reading the file `/proc/<PID>/maps`. Here is a sample run on my computer:
$ ./live &
[2] 1479
Hello world
stack 0x7ffe1b272744
stack in foo 0x7ffe1b272724
heap 0: 0x557e4e49a420
heap 1 : 0x557e4e49cb40
foo's address: 0x557e4c688850
$ cat /proc/1479/maps
557e4c688000-557e4c689000 r-xp 00000000 08:02 9863938
etc.
- In your report, present the contents of the maps file and the output of the program, and explain how each address printed by the program falls within the corresponding area of the map. For example, `0x7ffe1b272744` from the program lies in the range `7ffe1b253000-7ffe1b275000` shown in the `maps` file.
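The matching of addresses to map areas can also be done programmatically. A minimal sketch, assuming Linux and the column layout `start-end perms offset dev inode [path]` of `/proc/<PID>/maps`; `struct mapping` and `find_mapping` are hypothetical names, and the sketch inspects the calling process through `/proc/self/maps` rather than another PID:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One line of /proc/<pid>/maps. */
struct mapping {
    unsigned long start, end;  /* [start, end) virtual address range */
    char perms[5];             /* e.g. "r-xp" */
    unsigned long offset;      /* offset into the mapped file */
    char path[256];            /* file name, "[stack]", "[heap]", or "" */
};

/* Hypothetical helper: find the mapping of the current process that
 * contains addr. Returns 0 and fills *m on success, -1 otherwise. */
int find_mapping(const void *addr, struct mapping *m)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    unsigned long a = (unsigned long)(uintptr_t)addr;
    char line[512];
    int rc = -1;
    while (rc != 0 && fgets(line, sizeof line, f)) {
        unsigned long start, end, off;
        char perms[5];
        int pos = 0;
        /* Columns: start-end perms offset dev inode [path] */
        if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %n",
                   &start, &end, perms, &off, &pos) < 4)
            continue;
        if (a < start || a >= end)
            continue;
        m->start = start;
        m->end = end;
        m->offset = off;
        memcpy(m->perms, perms, sizeof m->perms);
        /* Everything after the inode column is the path (may be empty). */
        snprintf(m->path, sizeof m->path, "%s", pos > 0 ? line + pos : "");
        m->path[strcspn(m->path, "\n")] = '\0';
        rc = 0;
    }
    fclose(f);
    return rc;
}
```

Besides naming the area, the result gives the file offset of an address as `addr - m.start + m.offset`. In the sample run above, `foo`'s address `0x557e4c688850` falls in the `r-xp` mapping `557e4c688000-557e4c689000` at file offset 0, giving `0x557e4c688850 - 0x557e4c688000 + 0x0 = 0x850`; for a PIE whose text segment addresses equal its file offsets (a typical but not guaranteed layout), that should match the address `objdump` reports for `foo`.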
2. Warm-up: Shellcode
The purpose of this task is to get you acquainted with some tools that can be used to manipulate ELF files.
Go to the `tutorial/` directory.
Inspect the source code of shellcode.c
`shellcode.c` contains a buffer, SC, that holds raw instructions. Open `shellcode.c` to see the buffer and its assigned value. The program tries to execute the contents of the buffer!
- What is going on here?
- What is the meaning of the declaration `int (*ret)();`?
Variable `ret` is a function pointer: it points to a function that returns an integer. Here is a very good and simple example of a function pointer: https://cs.nyu.edu/courses/spring12/CSCI-GA.3033-014/Assignment1/function_pointers.html
- What are those funny looking `\x??` things in the buffer?
They are bytes written in hexadecimal. For instance, `"\x61\x62\x63"` is another way to express the string `"abc"`.
- What is `write(1, SC, sizeof SC);`?
It is a way of printing the contents of SC to the standard output (file descriptor 1), using the Linux `write` system call.
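The three constructs above can be tried in isolation. This sketch calls an ordinary C function through the pointer instead of raw machine code, so it runs without any executable-memory tricks; `answer`, `call_through_pointer`, `SC_DEMO` and `write_demo` are made-up names for illustration, not taken from shellcode.c. It also assumes SC is a char array initialized from a string literal, in which case `sizeof` counts the terminating NUL byte:

```c
#include <string.h>
#include <unistd.h>

/* An ordinary function whose address we store in a function pointer. */
static int answer(void)
{
    return 42;
}

/* Same kind of declaration as in shellcode.c: ret points to a function
 * returning int, and ret() jumps to whatever address it holds. */
int call_through_pointer(void)
{
    int (*ret)(void) = answer;
    return ret();
}

/* "\x61\x62\x63" spells the bytes 0x61 0x62 0x63, i.e. "abc".
 * The array also gets a terminating NUL, so sizeof SC_DEMO is 4, not 3. */
const char SC_DEMO[] = "\x61\x62\x63";

/* Like write(1, SC, sizeof SC) in shellcode.c, but to any descriptor fd
 * (1 is standard output). Returns the number of bytes written. */
ssize_t write_demo(int fd)
{
    return write(fd, SC_DEMO, sizeof SC_DEMO);
}
```

One consequence: `write(1, SC, sizeof SC)` emits one extra zero byte after the instructions.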
- Compile the program. Run the program. What does it print? What happens after that?
It prints the address of SC, then it crashes with a segmentation fault.
- Corroborate the address where SC lives using `readelf`. You should get the same address that was printed in the previous step (for a non-PIE build; in a PIE, `readelf` shows an offset relative to the load base):
readelf -s ./shellcode | grep SC
- Why is it happening? In which section is the `SC` variable? With what flags is this segment loaded?
$ readelf -S ./shellcode
[24] .data PROGBITS 0000000000601020 0000000000000058 0000000000000000 WA 0 0 32
- Try to change the flags of the .data section to make it executable:
objcopy --set-section-flags .data=alloc,code,load ./shellcode
Verify using `readelf` that the `.data` section now has `X` among its flags.
- Is it working now? If not, why?
No. Remember the two views of a file! The segment is still loaded RW: the loader only knows about segments, not sections, and segment 03 is:
LOAD 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28 0x0000000000000250 0x0000000000000260 RW 200000
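A trick that might work is making the stack executable (`execstack -s ./shellcode`). On older kernels, an executable stack makes the kernel treat every readable mapping as executable (the READ_IMPLIES_EXEC personality), `.data` included; recent kernels no longer do this for 64-bit programs, so the trick may fail there. If you can successfully run the executable, the shellcode invokes a shell using `/bin/sh`. You will notice that the prompt changes to a `$`. To exit that shell, press CTRL-D.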
Compile, run, and save the generated shellcode
- Recompile `shellcode.c`. Then run it with an argument and redirect its output to `mycode.bin`:
gcc -O0 -o shellcode shellcode.c
./shellcode generate > mycode.bin
- What is the type of the file `mycode.bin`?
$ file ./mycode.bin
./mycode.bin: data
- Why is `file` saying this is a data file, when our intention was to write some machine code in it?
The `file` command mostly looks for magic bytes at the beginning of the file, so this can be misleading: raw machine code has no header for it to recognize. To be classified as an executable, `mycode.bin` would need to be structured like an ELF file.
- Try to execute `./mycode.bin`!
chmod +x ./mycode.bin   # give the file execute permission
./mycode.bin
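- Who is throwing the error?
The loader in the operating system: the kernel's `execve` fails with `ENOEXEC` ("Exec format error") because `mycode.bin` is not in any executable format it recognizes, such as ELF, and the shell reports that failure.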
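The refusal can be observed directly by calling `execv` on a file of raw bytes and inspecting `errno`, since `execv` only returns when the kernel rejects the file. A minimal sketch, assuming Linux and a current directory that is writable and not mounted `noexec`; `exec_raw_bytes` is a hypothetical helper, and the bytes used in the check are an arbitrary fragment, not the lab's shellcode:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write n raw bytes to path, mark the file executable
 * and try to execute it. execv() only returns on failure, so the result is
 * the errno it left behind (or -1 if the file could not be prepared). */
int exec_raw_bytes(const char *path, const unsigned char *bytes, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t written = fwrite(bytes, 1, n, f);
    /* Close before exec: a file still open for writing fails with ETXTBSY. */
    if (fclose(f) != 0 || written != n)
        return -1;
    if (chmod(path, 0700) != 0)   /* like chmod +x, for the owner */
        return -1;

    char *const argv[] = { (char *)path, NULL };
    execv(path, argv);            /* replaces this process only on success */
    return errno;                 /* e.g. ENOEXEC: "Exec format error" */
}
```

Pointed at the bytes of `mycode.bin`, this should likewise return `ENOEXEC`, since the file has neither an ELF header nor a `#!` line.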
How to actually run the generated shellcode.
The problem so far is that the shellcode (SC) ends up in a segment that does not have the executable bit set. One solution is to remap the segment (page) with the exec flag at runtime, but that requires writing some code. We will focus on another solution, using tools and the capabilities of the ELF format:
- Generate an ELF object file from the raw binary:
objcopy -I binary -O elf64-x86-64 ./mycode.bin ./mycode.bin.o
- Check the flags of the .data section! Where are the segments? (`readelf -S mycode.bin.o`)
It should be WA! Segments are created at link time, and we have not linked yet.
- Mark the .data section of this ELF file as code:
objcopy -I elf64-x86-64 --set-section-flags .data=alloc,code,load ./mycode.bin.o
Verify that the flags of the `.data` section are now `WAX`.
- We have not yet set the architecture our future executable targets. Try `readelf -h mycode.bin.o` and read the `Machine:` field.
- Set the machine type (this is an artifact of objcopy):
elfedit --output-mach x86-64 ./mycode.bin.o
Verify that the `Machine:` field has now been set.
- How do we actually use the data from this .o file? What symbols are exported?
$ nm ./mycode.bin.o
0000000000000035 D _binary___mycode_bin_end
0000000000000035 A _binary___mycode_bin_size
0000000000000000 D _binary___mycode_bin_start
Link against ./mycode.bin.o
- Inspect `use-my-code.c`! What does it do?
It uses the symbols listed previously to call the code.
- Quick recap:
- starting from a binary blob, we generated an object file (ELF)
- the contents of the .data section are the bytes from the binary blob
- the data section is marked WAX (executable)
- Compile and link!
gcc -O0 use-my-code.c ./mycode.bin.o -o my
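- The stack is still executable (`mycode.bin.o` has no `.note.GNU-stack` section, so the linker assumes the code may need an executable stack); remove this flag!
execstack -c ./my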
- Even though the stack is no longer executable, you should still be able to run the shellcode, because the data section is executable. Please check it: run `readelf -e my | grep .data` and look at segment 03 (which maps the `.data` section).
If calling `func` causes a segmentation fault, it's likely that your system produces PIE executables by default. Modify the “Compile and link!” command to:
gcc -no-pie -O0 use-my-code.c ./mycode.bin.o -o my
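For comparison, the runtime alternative mentioned earlier (remapping a page with the exec flag) takes only a few lines of C. A minimal sketch, assuming Linux on x86-64: it copies six bytes of machine code (`mov eax, 42` followed by `ret`, not the lab's shellcode) into an anonymous page, switches the page from writable to executable with `mprotect`, and calls it through a function pointer, much like `ret` in shellcode.c. `CODE` and `run_from_page` are hypothetical names:

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS under strict -std modes */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* x86-64 machine code: mov eax, 42 ; ret */
static const unsigned char CODE[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

/* Copy CODE into a fresh page, make the page executable, and call it.
 * Returns the value the code leaves in eax, or -1 if a syscall fails. */
int run_from_page(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* Start with a writable, non-executable anonymous mapping. */
    void *mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, CODE, sizeof CODE);

    /* The "remap with exec flag" step: drop write, add exec. */
    if (mprotect(mem, page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, page);
        return -1;
    }

    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();
    munmap(mem, page);
    return result;
}
```

Applying the same idea to the SC buffer itself would mean calling `mprotect` on the page(s) containing SC; `mprotect` requires a page-aligned start address, so the address of SC must first be rounded down to a page boundary.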
3. Warm-up: stripped
Someone has given us a stripped binary called `stripped`. Let's run it and take a brief look:
$ ./stripped
Hello, there!
I am looping, looping, looping, looping, looping,
$ file ./stripped
./stripped: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
The executable file is stripped, so we can't rely on any symbol information to examine it. However, it's small enough that we can try to reverse engineer it by hand. To do that, answer the following questions:
- What is the file's entry point? (You may find this using `readelf`.)
- What instructions get executed starting from that entry point?
- What does the first function called during the program do? (The one that has the following instructions.)
mov eax,0x1
mov edi,0x1
syscall
retq
- How would you make the program print "looping" 12 times instead of 5?
- What other control-flow altering instructions are executed besides `call` and `ret`?
4. stripped, re-loaded
Looking more carefully at our `stripped` binary, we notice that there is one string it never prints out:
$ strings -t x stripped
154 Hello, there!
162 I am looping,
171 All done!
17c .shstrtab
186 .text
18c .data
The string `All done!` is at offset `0x171` in the binary, which corresponds to address `0x600171` in the loaded program.
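The correspondence between file offsets and loaded addresses comes from the PT_LOAD program headers (what `readelf -l` prints): a segment that starts at file offset `p_offset` is loaded at `p_vaddr`, so an offset inside it lands at `p_vaddr + (offset - p_offset)`. A minimal sketch of that rule, assuming a 64-bit ELF file of the host's byte order and glibc's `<elf.h>`; `offset_to_vaddr` is a hypothetical helper:

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: translate a file offset into the virtual address it
 * is loaded at, by scanning the PT_LOAD program headers of a 64-bit ELF.
 * Returns 0 on success, -1 on error or if no segment covers the offset. */
int offset_to_vaddr(const char *path, uint64_t off, uint64_t *vaddr)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    int rc = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long pos = (long)(eh.e_phoff + (uint64_t)i * eh.e_phentsize);
        if (fseek(f, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto out;
        /* p_filesz bytes starting at p_offset are mapped at p_vaddr. */
        if (ph.p_type == PT_LOAD &&
            off >= ph.p_offset && off - ph.p_offset < ph.p_filesz) {
            *vaddr = ph.p_vaddr + (off - ph.p_offset);
            rc = 0;
            break;
        }
    }
out:
    fclose(f);
    return rc;
}
```

According to the lab text, `offset_to_vaddr("./stripped", 0x171, &v)` should then set `v` to `0x600171`.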
$ objdump -d stripped -M intel
00000000004000b1 <h>:
4000b1: ba 0a 00 00 00 mov edx,0xa
4000b6: 48 be 71 01 60 00 00 movabs rsi,0x600171
4000bd: 00 00 00
4000c0: e8 75 00 00 00 call 0x40013a
4000c5: c3 ret
Since `All done!` never appears in the output, the function that prints it (at `0x4000b1`) is never reached! Why? Because the program exits before getting there.
Find the call to the exit function that occurs at run time just before this print, and manually replace it with `NOP` instructions using the hex editor of your choice. In the end, the program should display the following:
./stripped
Hello, there!
I am looping, looping, looping, looping, looping,
All done!
Note that the program should still exit cleanly!
Hint: the `NOP` instruction has opcode `0x90`, so just replace every byte of the offending `call` instruction with it.
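Any hex editor works, but the patch can also be scripted. A minimal sketch, assuming you have already worked out the file offset of the offending `call` (for a virtual address inside a PT_LOAD segment, offset = vaddr - p_vaddr + p_offset); `nop_out` is a hypothetical helper, and a direct `call rel32` (opcode `0xe8`, as in `e8 75 00 00 00` above) is 5 bytes long:

```c
#include <stdio.h>

/* Hypothetical helper: overwrite len bytes of the file at path, starting at
 * file offset off, with the x86 NOP opcode 0x90. Returns 0 on success. */
int nop_out(const char *path, long off, size_t len)
{
    FILE *f = fopen(path, "r+b");   /* read/write without truncating */
    if (!f)
        return -1;
    int rc = 0;
    if (fseek(f, off, SEEK_SET) != 0)
        rc = -1;
    for (size_t i = 0; rc == 0 && i < len; i++)
        if (fputc(0x90, f) == EOF)
            rc = -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Working on a copy of `stripped` keeps the original around in case the wrong bytes get patched.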
5. Memory Dump Analysis
Using your newfound voodoo skills, you are now able to tackle the following task. In the middle of two programs, I added the following lines:
{
    int i;
    int *a[1];
    /* Deliberately reads past the end of a[] (undefined behavior). */
    for (i = 0; i < 20; i++)
        printf("%p\n", (void *)a[i]);
}
The results were the following, respectively:
0x400610
0xf5ad3276b90ff00
0x7ffd285c1f30
0x400601
0x400610
0x7f3b7a144830
0x1
0x7ffd285c2018
0x17a713ca0
0x4005f3
(nil)
0x49eb39c23ee94624
0x4004a0
0x7ffd285c2010
(nil)
(nil)
0xb61169fa0c494624
0xb79dcd6abd194624
(nil)
(nil)
Answer these questions:
- Explain why the displayed values are from the stack.
- Do the results reflect a PIE or no-PIE binary?
- Which values from the stack trace correspond to addresses in the .text region?
- Which of the values do not point to valid memory addresses?
- Which of the values point to the stack?
- Which of the values point to the library/mmap zone?
6. Smash the Stack
- Download level01 from Smash the Stack and solve it using gdb. Break on `main`, step through the execution, and figure out what it does and how to crack it.
$ scp level1@io.netgarage.org:/levels/level01 . # Password is level1
7. GDB
- Use GDB to run the code provided at [s5_pp_bash.tar.gz](https://ocw.cs.pub.ro/courses/_media/cns/labs/s5_pp_bash.tar.gz). The executable reads input from the user and evaluates it against a static condition. If the check succeeds, it calls a `password_accepted` function that prints a success message and spawns a shell.
Your task is to use GDB to force the executable to call the `password_accepted` function.
Gather as much information about the executable as possible using the techniques you have learned in previous sessions. Think about modifying registers to force the executable to call the function (there is more than one way of doing this).
Deliverables
Turn in a report through Moodle (online.upr.edu).