II. Memory Corruption Defense
 A. Problem and approaches
------------------------------------------
  HOW TO PREVENT BUFFER OVERFLOW ATTACKS?


------------------------------------------
        How could buffer overflows be prevented in C/C++?
 B. Stack Canaries
------------------------------------------
       STACK CANARIES

Goal: catch an overwrite of the return address
      before the function returns through it

     |------------------|
     |   strarg         |
     |------------------|
     |   return address |
     |------------------|
     |   saved %ebp     |
     |------------------|
     |  canary value    |
     |------------------|
     |  buf[99]         |
     |                  |
     |   ... buf ...    |
     |                  |
     |  buf[1]          |
     |  buf[0]          |
     |vvvvvvvvvvvvvvvvvv| <- %esp
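The check that the compiler inserts can be simulated in ordinary C. This is a minimal sketch, not what gcc or Visual Studio actually emit. The "frame" is a plain byte array laid out like the diagram read bottom-up (buf, then the canary, then a saved return address; the saved %ebp is left out). The names `frame_enter`, `buf_write`, and `frame_canary_ok` are hypothetical, and the caller supplies the canary value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated frame, lowest address first:
 * buf[0..99], then an 8-byte canary, then an 8-byte return address. */
enum {
    BUF_OFF    = 0,
    BUF_SIZE   = 100,
    CANARY_OFF = BUF_OFF + BUF_SIZE,
    RET_OFF    = CANARY_OFF + 8,
    FRAME_SIZE = RET_OFF + 8
};

/* "Prologue": clear the frame, then store the canary and return address. */
void frame_enter(unsigned char *frame, uint64_t canary, uint64_t ret) {
    memset(frame, 0, FRAME_SIZE);
    memcpy(frame + CANARY_OFF, &canary, sizeof canary);
    memcpy(frame + RET_OFF, &ret, sizeof ret);
}

/* Copy n bytes into buf with no check against BUF_SIZE, like strcpy.
 * The assert only keeps the simulation inside the frame array. */
void buf_write(unsigned char *frame, const void *src, size_t n) {
    assert(n <= (size_t)(FRAME_SIZE - BUF_OFF));
    memcpy(frame + BUF_OFF, src, n);
}

/* "Epilogue": before using the return address, verify the canary.
 * A real program aborts here instead of returning 0. */
int frame_canary_ok(const unsigned char *frame, uint64_t canary) {
    uint64_t seen;
    memcpy(&seen, frame + CANARY_OFF, sizeof seen);
    return seen == canary;
}
```

A 50-byte write leaves the canary intact. A contiguous write long enough to reach the return address must pass through the canary first, so the epilogue check fails.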

------------------------------------------
     What would be a good value for the canary?
------------------------------------------
    WHEN DO STACK CANARIES FAIL?

When attacker:

    
------------------------------------------
 C. Electric Fences
------------------------------------------
        ELECTRIC FENCES

Place every heap allocation between guard pages

             [ Guard Page ]
             [ Heap Alloc.]
             [ Guard Page ]

Guard pages are mapped with no access
   permissions, so the hardware (MMU)
   enforces the protection

A read or write to a guard page
   causes a fault (trap into the OS)
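A guard-page allocator can be sketched with the POSIX calls `mmap` and `mprotect`. The names `fence_alloc` and `fence_free` are hypothetical, and this is a simplified illustration of the idea, not the Electric Fence library itself. The object is placed at the end of its data pages, so the first byte past it is the first byte of the upper guard page. One consequence is that the returned pointer may be misaligned for types larger than char unless n is a multiple of their alignment.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size(void) { return (size_t)sysconf(_SC_PAGESIZE); }

/* Pages needed for the object itself (at least one). */
static size_t data_pages(size_t n) {
    size_t pg = page_size();
    return n == 0 ? 1 : (n + pg - 1) / pg;
}

/* Layout: [guard][data pages][guard], with the object right-aligned
 * against the upper guard page so that touching p[n] faults.
 * Returns NULL on failure. */
void *fence_alloc(size_t n) {
    size_t pg = page_size(), dp = data_pages(n);
    size_t total = (dp + 2) * pg;
    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    /* PROT_NONE: any read or write to these pages traps. */
    if (mprotect(base, pg, PROT_NONE) != 0 ||
        mprotect(base + (dp + 1) * pg, pg, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + (dp + 1) * pg - n;
}

/* Undo fence_alloc; n must be the size that was passed to fence_alloc. */
void fence_free(void *p, size_t n) {
    size_t pg = page_size(), dp = data_pages(n);
    unsigned char *base = (unsigned char *)p + n - (dp + 1) * pg;
    munmap(base, (dp + 2) * pg);
}
```

Writing `p[n]` (one byte past a 100-byte object) would raise SIGSEGV, because that address is the first byte of the PROT_NONE page.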
------------------------------------------
        What's the cost of this?
 D. Bounds Checking
  1. goals and costs
------------------------------------------
         BOUNDS CHECKING

Goal:


What does it mean in C?


------------------------------------------
    What does it mean for a pointer in C to be "in bounds"?
    Can you enforce bounds in C without help from the compiler?
------------------------------------------
       WHAT HAS TO BE DONE

Insert checking code whenever the program does:

   - pointer arithmetic
       p + 1
       
   - pointer dereference
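Here is a minimal sketch of the calls a bounds-checking compiler might insert at these two points. Everything is hypothetical: `register_obj`, `check_arith`, and `check_deref` are illustrative names. Bounds come from a small linear table of registered objects, which is far slower than a real checker's data structure. Addresses are compared as `uintptr_t`, so out-of-bounds values can be checked without the comparison itself being undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct obj { uintptr_t base; size_t size; };
static struct obj objs[64];   /* hypothetical registry of live objects */
static size_t nobjs;

/* Called (conceptually) on every allocation. */
void register_obj(const void *p, size_t size) {
    assert(nobjs < sizeof objs / sizeof objs[0]);
    objs[nobjs].base = (uintptr_t)p;
    objs[nobjs].size = size;
    nobjs++;
}

/* Find the object containing address a, or NULL. Note: a pointer
 * exactly one past the end is not found, so this sketch does not
 * support arithmetic that starts from such a pointer. */
static const struct obj *find_obj(uintptr_t a) {
    for (size_t i = 0; i < nobjs; i++)
        if (objs[i].base <= a && a < objs[i].base + objs[i].size)
            return &objs[i];
    return NULL;
}

/* Inserted after "p2 = p + n": p2 must stay within p's object,
 * where one past the end is allowed, as in C. Returns 1 if OK. */
int check_arith(const void *p, const void *p2) {
    const struct obj *o = find_obj((uintptr_t)p);
    uintptr_t a = (uintptr_t)p2;
    return o != NULL && o->base <= a && a <= o->base + o->size;
}

/* Inserted before "*p": all nbytes must lie inside one object. */
int check_deref(const void *p, size_t nbytes) {
    const struct obj *o = find_obj((uintptr_t)p);
    uintptr_t a = (uintptr_t)p;
    return o != NULL && a + nbytes <= o->base + o->size;
}
```

For a registered `int a[10]`, the pointer `a + 10` passes `check_arith` but fails `check_deref`.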
------------------------------------------
        In C, why can't we give an error when forming a pointer
           past the end of a bound?
        In C, why can't we just give an error when dereferencing a pointer?
------------------------------------------
     PRACTICAL CONSIDERATIONS

How expensive are defenses?

Would people pay for an OS that is:


------------------------------------------
  2. fat pointers
------------------------------------------
         FAT POINTERS

Instead of just an address, each pointer
stores:
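A fat pointer can be sketched as a struct that carries the current address plus the start and end of the object it was derived from. This is a minimal illustration with hypothetical names (`fatptr`, `fat_make`, `fat_add`, `fat_load`), limited to pointers to char. A compiler using fat pointers would generate the equivalent code itself. Addresses are kept as `uintptr_t`, so arithmetic may leave the object without undefined behavior; only the checked load touches memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three words instead of one: address, object start, object end. */
typedef struct {
    uintptr_t addr;  /* where the pointer currently points */
    uintptr_t base;  /* first byte of the object */
    uintptr_t end;   /* one past the last byte of the object */
} fatptr;

fatptr fat_make(void *obj, size_t size) {
    fatptr f;
    f.addr = f.base = (uintptr_t)obj;
    f.end = f.base + size;
    return f;
}

/* Pointer arithmetic moves addr and carries the bounds along.
 * Unsigned wraparound makes a negative n act as subtraction. */
fatptr fat_add(fatptr p, ptrdiff_t n) {
    p.addr += (uintptr_t)n;
    return p;
}

/* Checked dereference of one char: returns 1 and stores the byte
 * in *out if addr is inside [base, end), otherwise returns 0. */
int fat_load(fatptr p, char *out) {
    if (p.addr < p.base || p.addr >= p.end)
        return 0;
    *out = *(const char *)p.addr;
    return 1;
}
```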


------------------------------------------
       What's the overhead of this?
       Will this interoperate with old code?
       Does using fat pointers need compiler support?
       Can fat pointers be updated atomically?
  3. overview
------------------------------------------
     BOUNDS CHECKING TECHNIQUES

Splay trees:
   - record sizes for each object
   - grow larger as objects are allocated

Shadow table:
   - memory divided into
     constant-sized slots
   - pointers converted to indexes
     into shadow table
   - can locate metadata
     in constant time (one table lookup)
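A shadow-table lookup can be sketched over a small simulated address range. All of these details are hypothetical: 16-byte slots, a 4 KiB range of integer addresses standing in for real memory, and a per-slot metadata entry holding the object's base and size. Every slot an object covers holds the same entry, so finding the metadata for any address inside the object takes a single array index.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_LOG   4                  /* 16-byte slots */
#define SLOT_SIZE  (1u << SLOT_LOG)
#define REGION     (1u << 12)         /* simulated 4 KiB address range */

struct meta { uint32_t base, size; }; /* size 0 means "no object" */

/* One entry per slot; index = address >> SLOT_LOG. */
static struct meta shadow[REGION >> SLOT_LOG];

/* Record an object that starts on a slot boundary. */
void shadow_record(uint32_t base, uint32_t size) {
    assert(base % SLOT_SIZE == 0 && size > 0 && base + size <= REGION);
    struct meta m = { base, size };
    for (uint32_t a = base; a < base + size; a += SLOT_SIZE)
        shadow[a >> SLOT_LOG] = m;
}

/* Constant-time lookup for any address in the range. */
struct meta shadow_lookup(uint32_t addr) {
    assert(addr < REGION);
    return shadow[addr >> SLOT_LOG];
}
```

In this sketch each 16-byte slot carries 8 bytes of metadata. The baggy bounds design below stores a single small number per slot instead.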


------------------------------------------
        What resources will limit shadow table precision?
  4. Baggy Bounds checking
   a. for 32 bit systems
------------------------------------------
          BAGGY BOUNDS CHECKING

Goals:
  - detect and stop out of bounds accesses
  - ensure derived pointers
    point to same object
  - don't allow reading/writing other
    allocations
    
Constraints
  - use same size pointers
  - permit pointers one past either end

Ideas:
  - round up each allocation size to the
    next power of 2 (in bytes, say)
       e.g., 9 -> 16, 28 -> 32, ...
  - enforce bounds of the allocation
  - store binary log of allocation
    in bounds table, bt
  - allocate memory with granularity
    of a slot (fixed size, say 16 bytes)

Let size be the rounded-up allocation
  size (in bytes)
    e = log_2(size)
  so 1 << e = 2^e = size

So store e in the bounds table, bt

  recover the size by looking up e in bt
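The rounding rule can be written as a pair of small C functions. The names are hypothetical, and the minimum of one 16-byte slot reflects the slot granularity assumed above.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_LOG 4   /* log_2 of the 16-byte slot size */

/* e = log_2 of the allocation size after rounding up to a power
 * of 2, never less than one slot. This is the value stored in bt. */
unsigned baggy_log2(uint32_t size) {
    unsigned e = SLOT_LOG;
    while (((uint64_t)1 << e) < size)
        e++;
    return e;
}

/* The number of bytes actually reserved: 1 << e. */
uint64_t baggy_size(uint32_t size) {
    return (uint64_t)1 << baggy_log2(size);
}
```

For example, `baggy_size(9)` is 16 and `baggy_size(28)` is 32, matching the examples above.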
    
------------------------------------------
        How do we know the index into bt to look up the size?
------------------------------------------
  RECOVERING LENGTH AND BASE FROM POINTER

Find the allocated size from pointer p:
  len(p) = 1 << bt[p >> log_2(slot_size)]

Example: allocate 32 bytes
          char *p = malloc(32);
          
      for slot_size = 16
          log_2(slot_size) = 4
          
    32 bytes span two 16-byte slots,
    so store 5 (i.e., log_2(32)) in
      both of their bt entries:
        bt[p>>4] = 5
        bt[(p>>4)+1] = 5

    then len(p) = 1 << bt[p>>4]
                = 1 << 5 = 32

Find starting address from pointer p:
   base(p) = p & ~(len(p)-1)

   Example:
        int a[100];
        int *pa = &a[0];

    allocate sizeof(int)*100 = 400 bytes
      (4-byte ints), rounded up to a
      power of 2: 512 bytes
      so need 512/16 == 32 slots
       each bt entry stores log_2(512) = 9

    suppose address of pa == 0x400
    so use 32 bt entries:
       pa >> 4 = 0x40 (== 64)

       bt[pa>>4] = 9
       bt[(pa>>4)+1] = 9
       ...
       bt[(pa>>4)+31] = 9

    suppose:
        int *pb = &a[75];
                == 0x400 + 4*0x4B 
                == 0x400 + 0x12C
                == 0x52C (== 1324)

        len(pb) = 1 << bt[pb>>4]
                = 1 << bt[0x52C>>4]
                = 1 << bt[0x52]
                = 1 << 9
                = 0x200 (== 512)
        base(pb) = pb & ~(len(pb)-1)
                 = 0x52C & ~(0x1FF)
                 = 0x400
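The len and base formulas, along with the worked examples, can be checked with a small simulation. These details are hypothetical: addresses are 32-bit integers in a 64 KiB simulated range, `bt` has one byte per 16-byte slot, and `bt_set` stands in for the allocator. As in the examples, each allocation must be aligned to its rounded size for the base formula to work.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_LOG 4                        /* log_2(slot_size), slot_size = 16 */
#define SIM_MEM  (1u << 16)               /* simulated 64 KiB address range */
static uint8_t bt[SIM_MEM >> SLOT_LOG];   /* one byte per slot: log_2(size) */

/* Record an allocation of 2^e bytes at p (p must be 2^e-aligned). */
void bt_set(uint32_t p, unsigned e) {
    uint32_t size = UINT32_C(1) << e;
    assert(e >= SLOT_LOG && (p & (size - 1)) == 0 && p + size <= SIM_MEM);
    for (uint32_t i = 0; i < (size >> SLOT_LOG); i++)
        bt[(p >> SLOT_LOG) + i] = (uint8_t)e;
}

/* len(p) = 1 << bt[p >> log_2(slot_size)] */
uint32_t len(uint32_t p) {
    assert(p < SIM_MEM);
    return UINT32_C(1) << bt[p >> SLOT_LOG];
}

/* base(p) = p & ~(len(p) - 1) */
uint32_t base(uint32_t p) { return p & ~(len(p) - 1); }
```

With `bt_set(0x400, 9)` for the array a, `len(0x52C)` is 512 and `base(0x52C)` is 0x400, as computed above.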
------------------------------------------
    What is 1 << s in normal mathematical terms?
    Do we need to align the allocations?
    Why do we allocate 512 bytes for the array a (of 100 ints)?
    Why do we need 32 entries in bt for pa (or a)?
------------------------------------------
    BAGGY BOUNDS CHECKING (32 bit Arch.)      


inBounds(p, p2) = base(p) <= p2
                  and p2 < base(p)+len(p)

Result of pointer arithmetic:
 for computation of
    p2 = p + n;

 add the code:
    p2 = result(p, p2);

 where (bounds come from the original pointer p)

 result(p, p2) =
   if inBounds(p, p2)
   then p2
   else if base(p)-7 <= p2
           and p2 < base(p)+len(p)+8
        then (1 << 31) | p2
        else error("illegal pointer")
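The arithmetic check can be simulated with integer addresses. Several details here are hypothetical: 32-bit addresses in a 64 KiB simulated range, a hand-filled `bt`, and 0 returned in place of error("illegal pointer"). The sketch checks p2 against the bounds of the pointer p it was derived from and uses the same -7/+8 slack as the pseudocode.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_LOG 4
#define SIM_MEM  (1u << 16)           /* simulated 64 KiB address range */
#define OOB_BIT  (UINT32_C(1) << 31)  /* marks an out-of-bounds pointer */

static uint8_t bt[SIM_MEM >> SLOT_LOG];

/* Stand-in allocator: record 2^e bytes at p (p must be 2^e-aligned). */
static void bt_set(uint32_t p, unsigned e) {
    uint32_t size = UINT32_C(1) << e;
    assert(e >= SLOT_LOG && (p & (size - 1)) == 0 && p + size <= SIM_MEM);
    for (uint32_t i = 0; i < (size >> SLOT_LOG); i++)
        bt[(p >> SLOT_LOG) + i] = (uint8_t)e;
}

static uint32_t len(uint32_t p)  { return UINT32_C(1) << bt[p >> SLOT_LOG]; }
static uint32_t base(uint32_t p) { return p & ~(len(p) - 1); }

/* Inserted after p2 = p + n, where p is an in-bounds pointer.
 * Returns p2, p2 with OOB_BIT set, or 0 for "illegal pointer". */
uint32_t result(uint32_t p, uint32_t p2) {
    assert(p < SIM_MEM);
    int64_t b = base(p), l = len(p), q = p2;  /* signed: b - 7 may be < 0 */
    if (b <= q && q < b + l)
        return p2;                            /* in bounds */
    if (b - 7 <= q && q < b + l + 8)
        return OOB_BIT | p2;                  /* slightly outside: mark it */
    return 0;                                 /* too far: error */
}
```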

------------------------------------------
   What does (1 << 31) | p2 do?
   Why set high-order bit of out-of-bounds pointers to 1?
   Why allow 1/2 a slot under or over?
   How do we find the correct base and length for illegal pointers?
   Why not allow larger than 1/2 a slot under or over bounds?
   Does code need to check bounds of dereferences?
------------------------------------------
      WORKING WITH UNINSTRUMENTED CODE

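Code in libraries that was not compiled
  with baggy bounds checks:

  bt entries for memory it allocates are
  set to the maximum possible value (31),
  so len(p) = 2^31 and the bounds checks
  effectively never fail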
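The effect of the value 31 can be seen in a small simulation. The setup is hypothetical: `bt_init` fills the whole simulated table with 31, as if no instrumented allocation had described that memory, and addresses are 32-bit integers below 2^31. Then `len(p)` is 2^31 and `base(p)` is 0, so any derived pointer below 2^31 passes the check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOT_LOG 4
#define SIM_MEM  (1u << 16)               /* simulated 64 KiB address range */
static uint8_t bt[SIM_MEM >> SLOT_LOG];

/* Every slot gets 31: memory not described by instrumented code. */
void bt_init(void) { memset(bt, 31, sizeof bt); }

static uint32_t len(uint32_t p) {
    assert(p < SIM_MEM);
    return UINT32_C(1) << bt[p >> SLOT_LOG];
}
static uint32_t base(uint32_t p) { return p & ~(len(p) - 1); }

/* The same check as for instrumented memory: is p2 within p's bounds? */
int in_bounds(uint32_t p, uint32_t p2) {
    return base(p) <= p2 && p2 < base(p) + len(p);
}
```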

   
------------------------------------------
       Will there ever be dereference errors for pointers to space
          allocated by library code?
       Can an out-of-bounds pointer be passed to uninstrumented code?
   b. for 64 bit systems
------------------------------------------
    BAGGY BOUNDS WITH 64 BIT POINTERS

Idea:
   Store more information in the pointers

 Valid pointer:
 [ zeros | log2(size) | address     ]
  21 bits    5 bits      38 bits

 Invalid pointer:
 [ offset | log2(size) | zeros | address ]
   13 bits   5 bits      8 bits  38 bits
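The two layouts can be packed and unpacked with shifts and masks. This is a sketch with hypothetical helper names. Field positions follow the diagrams, with the address in bits 0-37. The notes do not say how the 13-bit offset field is interpreted, so it is stored here as an uninterpreted value.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS  38
#define ADDR_MASK  ((UINT64_C(1) << ADDR_BITS) - 1)
#define VLOG_SHIFT ADDR_BITS          /* valid: log2(size) in bits 38-42 */
#define OLOG_SHIFT (ADDR_BITS + 8)    /* invalid: log2(size) in bits 46-50 */
#define OFF_SHIFT  (OLOG_SHIFT + 5)   /* invalid: offset in bits 51-63 */

/* Valid:   [ zeros:21 | log2(size):5 | address:38 ] */
uint64_t bb_valid(uint64_t addr, unsigned lg) {
    assert((addr & ~ADDR_MASK) == 0 && lg < 32);
    return ((uint64_t)lg << VLOG_SHIFT) | addr;
}

/* Invalid: [ offset:13 | log2(size):5 | zeros:8 | address:38 ] */
uint64_t bb_invalid(uint64_t addr, unsigned lg, uint64_t offset) {
    assert((addr & ~ADDR_MASK) == 0 && lg < 32);
    assert(offset < (UINT64_C(1) << 13));
    return (offset << OFF_SHIFT) | ((uint64_t)lg << OLOG_SHIFT) | addr;
}

uint64_t bb_addr(uint64_t p)         { return p & ADDR_MASK; }
unsigned bb_valid_log2(uint64_t p)   { return (unsigned)(p >> VLOG_SHIFT) & 31u; }
unsigned bb_invalid_log2(uint64_t p) { return (unsigned)(p >> OLOG_SHIFT) & 31u; }
uint64_t bb_offset(uint64_t p)       { return p >> OFF_SHIFT; }
```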
------------------------------------------
        What are the advantages of putting this information in the pointers?
   c. evaluation
------------------------------------------
      EVALUATING BAGGY BOUNDS SYSTEM

Disadvantages:


Advantages:


------------------------------------------
        Can there still be memory attacks with this system?
 E. Non-executable Memory
------------------------------------------
         NON-EXECUTABLE MEMORY

3 bits for permission on a page:
   read (R), write (W), execute (X)

Execute means that the CPU may fetch
  and run instructions from the page


Policy:  W xor X
   so can't both write and execute
    in a page
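The policy amounts to a check on the permission bits. This sketch uses the POSIX `PROT_*` flags; `wx_allowed` and `install_code` are hypothetical names. `install_code` shows how a program that creates code can still respect W xor X: it writes the bytes while the page is read/write, then switches the page to read/execute with `mprotect`. It does not run the bytes, which would have to be valid machine code for the CPU.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* W xor X: a mapping may be writable or executable, never both. */
int wx_allowed(int prot) {
    return !((prot & PROT_WRITE) && (prot & PROT_EXEC));
}

/* Copy n bytes into a fresh page while it is read/write, then make
 * it read/execute. The page is never writable and executable at the
 * same time. Returns NULL on failure. */
void *install_code(const void *code, size_t n) {
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    if (n == 0 || n > pg)
        return NULL;
    void *page = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, code, n);
    if (mprotect(page, pg, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, pg);
        return NULL;
    }
    return page;
}
```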


------------------------------------------
        What kind of program would need to both write and execute code?
 F. Randomized Address Space (ASLR)
------------------------------------------
        RANDOMIZED ADDRESS SPACE

Idea:
   - make it harder for the attacker
     to guess where the stack, heap,
     libraries, and code are, by placing
     them at random addresses


Often called ASLR
  = Address Space Layout Randomization
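One way to observe ASLR is to record a few addresses and compare them across runs. This is a minimal sketch with hypothetical names. On a system with ASLR enabled, the stack and heap addresses differ from run to run. The address of a static variable moves only if the program is built as a position-independent executable (PIE).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct layout {
    uintptr_t stack;  /* address of a local variable */
    uintptr_t heap;   /* address of a malloc'd block */
    uintptr_t data;   /* address of a static variable */
};

static int static_anchor;

/* Fill *out with one address from each region. Returns 0 on success. */
int sample_layout(struct layout *out) {
    assert(out != NULL);
    int local = 0;
    void *block = malloc(64);
    if (block == NULL)
        return -1;
    out->stack = (uintptr_t)&local;
    out->heap  = (uintptr_t)block;
    out->data  = (uintptr_t)&static_anchor;
    free(block);
    return 0;
}

void print_layout(const struct layout *l) {
    printf("stack %#" PRIxPTR "\nheap  %#" PRIxPTR "\ndata  %#" PRIxPTR "\n",
           l->stack, l->heap, l->data);
}
```

To observe the randomization, call `sample_layout` and `print_layout` from `main` and run the program twice.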
------------------------------------------
       DEFEATING ASLR

Approaches:

   - extract the randomness


   - heap attack:
     . spread shell code all over the heap
     . jump to a random address

   - nop slide
     . put lots of nops in heap
     . put shell code at end
     . jump to random address
------------------------------------------
 G. Practice
------------------------------------------
    WHAT IS USED IN PRACTICE?

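Both gcc and Visual Studio:
  - support stack canaries; Visual
    Studio enables them (/GS) by default,
    and many Linux distributions build
    gcc to enable them by default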

Linux and Windows both have
   W xor X memory
 and
   ASLR

    WHAT IS NOT USED IN PRACTICE?

Baggy bounds


------------------------------------------
    Why would systems not use baggy bounds?