Lab 02 - Program Analysis

Read the explanations from https://ocw.cs.pub.ro/courses/cns/labs/lab-02. Then come back here for the exercises.

Supporting files

Lab archive

Exercises

1. Where do they live?

In this exercise you will validate some facts about executables and processes in Linux.

  1. Copy the following code to a file: live.c
  2. Now we will create two versions of the executable: one that uses the Position Independent Executable (PIE) option and one that does not. Create the non-PIE version first, using the -no-pie option: gcc -o livenopie live.c -fPIC -no-pie
  3. ... and now the PIE version, using the -pie option: gcc -o live live.c -fPIC -pie
  4. Validate that they are different using md5sum: md5sum live livenopie
  5. Is the file command able to tell the difference between the PIE and non-PIE versions? file live livenopie
  6. Validate that both executables work by running each of them. The hex addresses that they print might differ a little; we will learn why in further steps. Since each program contains a while(1); loop, it will keep running until you press CTRL-C.
  7. The main difference between the PIE and non-PIE versions is that:
    1. In the non-PIE version, the addresses of the functions are fixed at link time, and the loader places the instructions at exactly those addresses.
    2. In the PIE version, the addresses of the functions are offsets, which the loader applies once it chooses (possibly at random) the base address where it loads the text.
  8. We will validate this PIE/non-PIE difference in the next section. But first, use objdump to determine the address of the function foo in both the live and livenopie binaries.
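The file and objdump commands are what the exercise asks for. As a cross-check, here is a minimal sketch in C, assuming a 64-bit ELF in the host's byte order: the e_type field of the ELF header is ET_EXEC for a non-PIE executable and ET_DYN for a PIE (the same value readelf -h prints as Type). The helpers elf_type and elf_type_name are hypothetical names, not part of the lab files.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns the ELF e_type of the file at path: ET_EXEC for a non-PIE
 * executable, ET_DYN for a PIE (or a shared library).
 * Returns -1 if the file cannot be read or is not a 64-bit ELF file. */
int elf_type(const char *path)
{
    Elf64_Ehdr hdr;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(&hdr, 1, sizeof hdr, f);
    fclose(f);
    /* Check the "\x7f" "ELF" magic and the 64-bit class before trusting e_type */
    if (n != sizeof hdr || memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return hdr.e_type;
}

/* Human-readable label for the value returned by elf_type() */
const char *elf_type_name(int type)
{
    switch (type) {
    case ET_EXEC: return "EXEC (non-PIE)";
    case ET_DYN:  return "DYN (PIE or shared object)";
    default:      return "unknown";
    }
}
```

For example, printf("%s\n", elf_type_name(elf_type("./live"))); should report DYN for live and EXEC for livenopie.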

ASLR and PIE

  1. [https://blog.morphisec.com/aslr-what-it-is-and-what-it-isnt/] Address Space Layout Randomization (ASLR) is a computer security technique which involves randomly positioning the base address of an executable and the position of libraries, heap, and stack, in a process's address space. Linux has this feature turned on by default.
  2. Turn off ASLR in Linux by issuing the following command: echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
  3. With ASLR off, every time you run an executable, it will be assigned the same base addresses in virtual memory for the stack, heap, and text (program instructions).
    1. Run live several times and record the output.
    2. Run livenopie several times and record the output. In your report, note what you observe.
  4. Turn ASLR back on by issuing the following command: echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
  5. Run live several times. What do you notice about the various addresses printed in the program?
  6. Run livenopie several times. What do you notice about the various addresses printed in the program? Comment about which addresses differ between runs and which stay the same.
  7. Complete the following table for multiple runs of live and livenopie (with ASLR enabled). Write either change or no change.
    Type   | Stack | Heap | Text
    -------+-------+------+------
    PIE    |       |      |
    noPIE  |       |      |
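Before each batch of runs you can confirm which ASLR mode is active by reading the same file that the echo commands write. A minimal sketch; aslr_level and aslr_description are hypothetical helpers, and the level meanings follow the Linux kernel's sysctl documentation (0 = off, 1 = randomize stack, mmap base and VDSO, 2 = additionally randomize the heap).

```c
#include <stdio.h>

/* Reads the ASLR level from a randomize_va_space-style file.
 * Returns 0, 1 or 2 on success, or -1 if the file cannot be read or parsed. */
int aslr_level(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int level = -1;
    if (fscanf(f, "%d", &level) != 1)
        level = -1;
    fclose(f);
    return level;
}

/* Short description of each randomize_va_space value */
const char *aslr_description(int level)
{
    switch (level) {
    case 0:  return "off: no randomization";
    case 1:  return "conservative: stack, mmap base and VDSO randomized";
    case 2:  return "full: level 1 plus heap (brk) randomization";
    default: return "unknown";
    }
}
```

Typical use: int lvl = aslr_level("/proc/sys/kernel/randomize_va_space"); printf("ASLR %d: %s\n", lvl, aslr_description(lvl));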

Another way of monitoring the addresses layout of programs

  1. Run ./live & (this leaves the program executing in the background and shows its process ID). Linux lets you see the memory map of a process by reading the file /proc/<PID>/maps. Here is a sample run on my computer:

    $ ./live &
    [2] 1479
    Hello world
    stack 0x7ffe1b272744
    stack in foo 0x7ffe1b272724
    heap 0: 0x557e4e49a420
    heap 1 : 0x557e4e49cb40
    foo's address: 0x557e4c688850
    
    $ cat /proc/1479/maps
    557e4c688000-557e4c689000 r-xp 00000000 08:02 9863938 etc....
    
  2. In your report, present the contents of the maps file and the output of the program, and explain how each address printed by the program falls into the corresponding region of the map. For example, 0x7ffe1b272744 from the program is in the range 7ffe1b253000-7ffe1b275000 shown in the maps file.
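The lookup in step 2 can also be done from inside a running program. A minimal sketch, assuming the usual Linux /proc/self/maps format in which every line begins with start-end in hex; find_mapping is a hypothetical helper that returns the maps line whose range contains a given address.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Searches /proc/self/maps for the region that contains addr.
 * On success, copies that maps line (without its newline) into out,
 * truncated to outlen bytes, and returns 1. Returns 0 if no region
 * matches or the file cannot be read. */
int find_mapping(const void *addr, char *out, size_t outlen)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return 0;

    char line[4096];
    uintptr_t a = (uintptr_t)addr;
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        /* Each line starts with "start-end", e.g. 557e4c688000-557e4c689000 */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 && a >= start && a < end) {
            line[strcspn(line, "\n")] = '\0';
            snprintf(out, outlen, "%s", line);
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Called with the address of a local variable, it should return the line tagged [stack]; called with the address of a function, it should return a line with r-xp permissions.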

2. Warm-up: Shellcode

The purpose of this task is to get you acquainted with some tools that can be used to manipulate ELF files.

Go to the tutorial/ directory.

Inspect the source code of shellcode.c.

shellcode.c contains a buffer, SC, that holds raw instructions. Notice the buffer and the value assigned to it: the program tries to execute the contents of the buffer!

  1. What is going on here?

    • What does the declaration int (*ret)(); mean? Variable ret is a function pointer: it points to a function that returns an int. Here is a very good and simple example of a function pointer: https://cs.nyu.edu/courses/spring12/CSCI-GA.3033-014/Assignment1/function_pointers.html
    • What are those funny-looking \x?? things in the buffer? They are bytes written in hex. For instance, "\x61\x62\x63" is another way to write the string "abc".
    • What is write(1, SC, sizeof SC);? It is a way of printing the contents of SC to standard output (file descriptor 1), using the Linux write system call.
  2. Compile the program. Run the program. What does it print? What happens after that?

    It prints the address of SC, then it exits because of a segmentation fault.

  3. Corroborate the address where SC lives, using readelf. You should get the same address that was printed in the previous step.

    readelf -s ./shellcode | grep SC
    
  4. Why is this happening? In which section is the SC variable? With which flags is that section loaded?

    $ readelf -S ./shellcode
    [24] .data             PROGBITS         0000000000601020  
         0000000000000058  0000000000000000  WA       0     0     32
    
  5. Try to change the flags of the .data section to make it executable:

    objcopy --set-section-flags .data=alloc,code,load ./shellcode
    

    Verify using readelf that the data section now has X among its flags.

  6. Is it working now? If not why?

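    No. Remember the two views of an ELF file: sections (the linker's view) and segments (the loader's view)! The loader only knows about segments, and the segment containing .data is still loaded RW.

    On this system that is segment 03: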

    LOAD           0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
                   0x0000000000000250 0x0000000000000260  RW     200000
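The three ideas from step 1 (function pointers, \x hex escapes, and write(1, ...)) can be tried in isolation. A small sketch; answer, HEX_ABC, call_through_pointer and print_abc are hypothetical names, and it declares the pointer as int (*)(void) instead of shellcode.c's int (*ret)() so that the parameter list is explicit.

```c
#include <string.h>
#include <unistd.h>

/* A hex escape \xNN inside a string literal is a single byte with value 0xNN */
static const char HEX_ABC[] = "\x61\x62\x63";   /* the same bytes as "abc" */

static int answer(void) { return 42; }

/* ret is a pointer to a function returning int, like ret in shellcode.c */
int call_through_pointer(void)
{
    int (*ret)(void) = answer;   /* point ret at an ordinary function */
    return ret();                /* calling through the pointer runs answer() */
}

/* Writes the bytes of HEX_ABC, without the terminating '\0', to standard
 * output (file descriptor 1). Returns the number of bytes written, or -1. */
ssize_t print_abc(void)
{
    return write(1, HEX_ABC, sizeof HEX_ABC - 1);
}
```

Note that if SC is declared as an array initialized from a string literal, sizeof SC also counts the terminating '\0', so write(1, SC, sizeof SC) emits one extra zero byte at the end.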
    

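A trick that might work is making the stack executable (execstack -s ./shellcode). On older kernels, an executable stack marking also made the kernel treat every readable mapping as executable (the READ_IMPLIES_EXEC personality), which covers .data; recent x86-64 kernels no longer do this for 64-bit programs, so the trick may fail on your system. If you can successfully run the executable, the shellcode spawns a shell using /bin/sh. You will notice that the prompt changes to $. To exit that shell, press CTRL-D.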
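To see the loader's view from step 6 directly, here is a minimal sketch that walks the program headers of a 64-bit ELF file (host byte order assumed) and prints the PT_LOAD segments together with PT_GNU_STACK, the header whose flags execstack changes. print_segments is a hypothetical helper; readelf -l shows the same information.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Prints every PT_LOAD and PT_GNU_STACK program header of the ELF file at
 * path with its R/W/E flags, in the spirit of readelf -l.
 * Returns the number of PT_LOAD segments with PF_X set, or -1 on error. */
int print_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;
    }

    int exec_loads = 0;
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        /* Program header i starts at e_phoff + i * e_phentsize */
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            fclose(f);
            return -1;
        }
        if (ph.p_type != PT_LOAD && ph.p_type != PT_GNU_STACK)
            continue;
        printf("%02d %-9s vaddr 0x%016lx memsz 0x%lx %c%c%c\n", i,
               ph.p_type == PT_LOAD ? "LOAD" : "GNU_STACK",
               (unsigned long)ph.p_vaddr, (unsigned long)ph.p_memsz,
               (ph.p_flags & PF_R) ? 'R' : ' ',
               (ph.p_flags & PF_W) ? 'W' : ' ',
               (ph.p_flags & PF_X) ? 'E' : ' ');
        if (ph.p_type == PT_LOAD && (ph.p_flags & PF_X))
            exec_loads++;
    }
    fclose(f);
    return exec_loads;
}
```

Running it on ./shellcode before and after the objcopy step should show the same flags for the segment holding .data, which is why changing the section flags alone did not help.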

Compile, run, and save the generated shellcode

  1. Recompile shellcode.c. Then run it with an argument and redirect its output to mycode.bin:

    gcc -O0 -o shellcode shellcode.c
    ./shellcode generate > mycode.bin
    
  2. What is the type of the file mycode.bin?

    $ file ./mycode.bin
    ./mycode.bin: data
    
  3. Why is file saying this is a data file, when our intention was to write some machine code in it?

    The file command only checks some magic bytes at the beginning of the file, so its answer can be misleading. To be classified as an executable, mycode.bin would need to be structured like an ELF file.

  4. Try to execute ./mycode.bin!

    chmod +x ./mycode.bin    # to change the execute permission
    ./mycode.bin
    
  5. Who is throwing the error?

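    The loader in the operating system kernel: execve() finds no binary format it recognizes (there is no ELF header) and fails with ENOEXEC, which the shell reports as "Exec format error".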
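The same error can be reproduced without the shell. A minimal sketch: try_exec_raw, a hypothetical helper, writes some bytes to a file, makes it executable, and calls execve(). The bytes in the usage example are arbitrary stand-ins for mycode.bin. With no ELF header and no #! line, the kernel finds no handler and execve() fails with ENOEXEC; on a filesystem mounted noexec you would get EACCES instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Writes len raw bytes to path, sets mode rwxr-xr-x (like chmod +x),
 * and tries to execve() the file. Returns the errno of the failure.
 * If execve() unexpectedly succeeded, this process would be replaced. */
int try_exec_raw(const char *path, const unsigned char *bytes, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0700);
    if (fd < 0)
        return errno;
    ssize_t n = write(fd, bytes, len);
    close(fd);                       /* must be closed, or execve() gives ETXTBSY */
    if (n != (ssize_t)len)
        return EIO;
    if (chmod(path, 0755) != 0)
        return errno;

    char *argv[] = { (char *)path, NULL };
    char *envp[] = { NULL };
    execve(path, argv, envp);
    return errno;                    /* reached only if execve() failed */
}
```

For example, try_exec_raw("./raw_blob.bin", blob, sizeof blob) with a few arbitrary bytes should return ENOEXEC, and perror-style output would read "Exec format error".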

How to actually run the generated shellcode.

The problem so far is that the shellcode (SC) ends up in a segment that does not have the executable bit set. One solution is to remap the segment (page) at runtime with the exec flag set, but that requires writing some code. Here we focus on another solution, which uses tools and the capabilities of the ELF format:
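For comparison, the runtime remapping approach that this lab skips looks roughly like the sketch below. It is not part of the lab files: make_executable_copy is a hypothetical helper that copies bytes into an anonymous page obtained with mmap() and then calls mprotect() to make that page read+execute. On x86-64, the bytes b8 2a 00 00 00 c3 encode mov eax, 42; ret, which makes a harmless test payload. Some hardened systems (for example SELinux policies that deny execmem) refuse this mprotect() call.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS with glibc in strict modes */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copies len bytes of machine code into a fresh page, then remaps that page
 * as read+execute (dropping write, so it is never writable and executable
 * at the same time). Returns the executable copy, or NULL on failure. */
void *make_executable_copy(const unsigned char *code, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len > (size_t)page)
        return NULL;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memcpy(p, code, len);
    if (mprotect(p, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }
    return p;
}
```

On x86-64 the result can be called through a function pointer, e.g. int (*fn)(void) = (int (*)(void))p; fn(); which returns 42 for the payload above. Other architectures would also need an instruction-cache flush before calling.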

  1. Generate an ELF object file from the raw binary

    objcopy -I binary -O elf64-x86-64 ./mycode.bin ./mycode.bin.o
    
  2. Check the flags of the .data section! Where are the segments? (readelf -S mycode.bin.o)

    It should be WA. There are no segments yet: segments are created at link time, and we have not linked anything.

  3. Mark the .data section of this ELF file as code

    objcopy -I elf64-x86-64 --set-section-flags .data=alloc,code,load ./mycode.bin.o
    

    Verify that the flags of the .data section are now WAX.

  4. We have not yet set the target architecture of our future executable. Try:

    readelf -h mycode.bin.o
    

    and read the field Machine:.

  5. Set the machine type (this step is needed because of how objcopy generated the file):

    elfedit --output-mach x86-64 ./mycode.bin.o
    

    Verify that the Machine: field has been set.

  6. How do we actually use the data from this .o file? What symbols are exported?

    $ readelf -s ./mycode.bin.o
    0000000000000035 D _binary___mycode_bin_end
    0000000000000035 A _binary___mycode_bin_size
    0000000000000000 D _binary___mycode_bin_start
    
  7. Inspect use-my-code.c. What does it do?

    • It uses the symbols listed above to call the code.
    • Quick recap:
      1. starting from a binary blob, we generated an ELF object file
      2. the contents of its .data section are the bytes of the binary blob
      3. the .data section is marked WAX (executable)
  8. Compile and link!

    gcc -O0 use-my-code.c ./mycode.bin.o -o my
    
  9. The stack is executable (mycode.bin.o has no .note.GNU-stack section, so the linker marks the stack executable). Remove this flag!

    execstack -c ./my

  10. Even though the stack is no longer executable, you should still be able to run the shellcode, because the data section is executable. Check it:

    readelf -e my | grep .data    # then check segment 03, which maps the .data section


If calling func causes a Segmentation Fault, it's likely that your system produces PIE executables by default. Modify the “Compile and link!” command to:

gcc -no-pie -O0 use-my-code.c ./mycode.bin.o -o my
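use-my-code.c itself is not reproduced here. The sketch below only illustrates the usual way C code consumes the symbols objcopy creates: declare _binary___mycode_bin_start and _binary___mycode_bin_end as extern arrays and use end - start as the length. blob_size and print_blob are hypothetical helpers; so that the sketch builds without mycode.bin.o, the extern declarations are left in a comment and the helpers work on any byte range.

```c
#include <stddef.h>
#include <stdio.h>

/* With the real object file linked in, the symbols would be declared as:
 *
 *   extern unsigned char _binary___mycode_bin_start[];
 *   extern unsigned char _binary___mycode_bin_end[];
 *
 * and a program could jump to the code through a function pointer, e.g.
 *   ((void (*)(void))_binary___mycode_bin_start)();
 * (an assumption about how use-my-code.c does it, not its actual code). */

/* Number of bytes between a start symbol and an end symbol */
size_t blob_size(const unsigned char *start, const unsigned char *end)
{
    return (size_t)(end - start);
}

/* Prints the range as \xNN escapes, the notation used for SC in shellcode.c.
 * Returns the number of bytes printed. */
size_t print_blob(FILE *out, const unsigned char *start, const unsigned char *end)
{
    size_t n = blob_size(start, end);
    for (size_t i = 0; i < n; i++)
        fprintf(out, "\\x%02x", start[i]);
    fputc('\n', out);
    return n;
}
```

With mycode.bin.o linked as in the Compile and link! step, print_blob(stdout, _binary___mycode_bin_start, _binary___mycode_bin_end) should print the same bytes as SC, and blob_size should match the 0x35 reported for _binary___mycode_bin_size.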

3. Warm-up: stripped

Someone has given us a stripped binary called stripped. Let's run it and give it a brief view:

$ ./stripped 
Hello, there!
I am looping, looping, looping, looping, looping,
$ file ./stripped
./stripped:  ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped

The executable file is stripped, so we can't rely on any symbol information when looking at it. However, it is small enough that we can try to reverse engineer it by hand. To do that, answer the following questions:

4. stripped, re-loaded

Looking more carefully at our stripped binary, we notice that there is one string that it never prints out:

$ strings -t x stripped
    154 Hello, there!
    162 I am looping, 
    171 All done!
    17c .shstrtab
    186 .text
    18c .data

The string All done! is at offset 0x171 in the binary, which corresponds to address 0x600171 in the loaded program.
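The offset-to-address conversion follows from the program headers: inside a PT_LOAD segment, vaddr = p_vaddr + (offset - p_offset). A minimal sketch; offset_to_vaddr is a hypothetical helper, and the segment values used in the example (data segment at p_offset 0x154, p_vaddr 0x600154, p_filesz 0x28; text segment at p_offset 0, p_vaddr 0x400000) are assumptions chosen to be consistent with the numbers above. Check the real values with readelf -l stripped.

```c
#include <stdint.h>

/* Converts a file offset into the virtual address it is loaded at, given the
 * PT_LOAD segment containing it (p_offset, p_vaddr, p_filesz as shown by
 * readelf -l). Returns 0 if the offset is outside the segment's file image. */
uint64_t offset_to_vaddr(uint64_t off, uint64_t p_offset,
                         uint64_t p_vaddr, uint64_t p_filesz)
{
    if (off < p_offset || off >= p_offset + p_filesz)
        return 0;
    return p_vaddr + (off - p_offset);
}
```

With the assumed data segment, offset_to_vaddr(0x171, 0x154, 0x600154, 0x28) gives 0x600154 + 0x1d = 0x600171; with the assumed text segment, file offset 0xb1 maps to 0x4000b1, the address of h.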

$ objdump -d stripped -M intel
00000000004000b1 <h>:
  4000b1:   ba 0a 00 00 00          mov    edx,0xa
  4000b6:   48 be 71 01 60 00 00    movabs rsi,0x600171
  4000bd:   00 00 00 
  4000c0:   e8 75 00 00 00          call   0x40013a
  4000c5:   c3                      ret    

This means that the function that prints this string (at 0x4000b1) is never reached! Why? Because the program exits before calling it.

Find the call to the exit function that occurs at run-time exactly before this print and manually replace it with NOP instructions using the hex editor of your choice. At the end the program should display the following:

./stripped
Hello, there!
I am looping, looping, looping, looping, looping,
All done!

Note that the program should still exit cleanly!


Hint: the NOP instruction has opcode 0x90, so just replace every byte of the offending call instruction with 0x90.
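If you prefer a scripted patch to a hex editor, here is a minimal sketch. patch_nops is a hypothetical helper that overwrites len bytes at a file offset with 0x90. A direct call with a 32-bit displacement (opcode e8, as in e8 75 00 00 00 above) is 5 bytes long. To turn the call's virtual address into a file offset, subtract the text segment's p_vaddr and add its p_offset (both from readelf -l); for this binary that is probably vaddr - 0x400000, but verify it first. Always patch a copy of stripped.

```c
#include <stdio.h>

/* Overwrites len bytes starting at file offset off with 0x90 (NOP), in place.
 * Returns 0 on success, -1 on any error. */
int patch_nops(const char *path, long off, size_t len)
{
    FILE *f = fopen(path, "r+b");   /* read/write without truncating */
    if (f == NULL)
        return -1;
    int rc = 0;
    if (fseek(f, off, SEEK_SET) != 0)
        rc = -1;
    for (size_t i = 0; rc == 0 && i < len; i++)
        if (fputc(0x90, f) == EOF)
            rc = -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

For a call at virtual address V, the invocation would be patch_nops("stripped.patched", V - 0x400000, 5), under the segment assumption stated above.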


5. Memory Dump Analysis

Using your newfound voodoo skills, you are now able to tackle the following task. In the middle of two programs I added the following lines:

    {
        int i;
        int *a[1];
        /* Deliberately reads past the end of a[] to dump the stack contents that follow it */
        for( i = 0 ; i < 20; i++)
            printf("%p\n", a[i]);
    }

The results were the following, respectively:

0x400610
0xf5ad3276b90ff00
0x7ffd285c1f30
0x400601
0x400610
0x7f3b7a144830
0x1
0x7ffd285c2018
0x17a713ca0
0x4005f3
(nil)
0x49eb39c23ee94624
0x4004a0
0x7ffd285c2010
(nil)
(nil)
0xb61169fa0c494624
0xb79dcd6abd194624
(nil)
(nil)
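When interpreting a dump like this one, it helps to sort each value by the address range it falls in. A rough heuristic sketch for x86-64 Linux; the ranges are typical defaults, not guarantees, and classify and region_name are hypothetical names. Non-PIE executables load at 0x400000, PIE executables usually land around 0x55... or 0x56... (like foo's address 0x557e4c688850 earlier), shared libraries and other mmap regions sit below the stack at 0x7f..., and the stack is near 0x7ffe.... Values outside every range are plain data, for example a random value such as a stack canary.

```c
#include <stdint.h>

enum region { R_NULL, R_SMALL, R_EXE_NOPIE, R_EXE_PIE, R_MMAP, R_STACK, R_OTHER };

/* Heuristic classification of a 64-bit value seen in a stack dump */
enum region classify(uint64_t v)
{
    if (v == 0)
        return R_NULL;
    if (v < 0x10000)
        return R_SMALL;                       /* counters, flags, argc... */
    if (v >= 0x400000 && v < 0x10000000)
        return R_EXE_NOPIE;                   /* non-PIE image or its brk heap */
    if (v >= 0x550000000000 && v < 0x570000000000)
        return R_EXE_PIE;                     /* typical PIE load area */
    if (v >= 0x7f0000000000 && v < 0x7ff000000000)
        return R_MMAP;                        /* shared libraries, mmap regions */
    if (v >= 0x7ff000000000 && v < 0x800000000000)
        return R_STACK;                       /* typical main-thread stack area */
    return R_OTHER;                           /* not a typical user pointer */
}

const char *region_name(enum region r)
{
    switch (r) {
    case R_NULL:      return "null";
    case R_SMALL:     return "small integer";
    case R_EXE_NOPIE: return "non-PIE executable/heap";
    case R_EXE_PIE:   return "PIE executable";
    case R_MMAP:      return "library/mmap";
    case R_STACK:     return "stack";
    default:          return "data/other";
    }
}
```

For example, classify(0x400610) falls in the non-PIE image, classify(0x7ffd285c1f30) in the stack, and classify(0x7f3b7a144830) in the library/mmap area; these labels are only hints for your own answers.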

Answer these questions:

6. Smash the Stack

7. GDB

Your task is to use GDB to force the executable to call the password_accepted function.

Gather as much info about the executable as possible through the techniques you have learned in previous sessions. Think of modifying registers for forcing the executable to call the function (there is more than one way of doing this).

Deliverables

Turn in a report through Moodle (online.upr.edu).