II. Memory Corruption Defense
 A. Problem and approaches
------------------------------------------
  HOW TO PREVENT BUFFER OVERFLOW ATTACKS?

------------------------------------------
        How could buffer overflows be prevented in C/C++?
 B. Stack Canaries
------------------------------------------
            STACK CANARIES

Goal: catch overwrite before

        |------------------|
        |      strarg      |
        |------------------|
        |  return address  |
        |------------------|
        |     old ebp      |
        |------------------|
        |   canary value   |
        |------------------|
        |     buf[99]      |
        |                  |
        |   ... buf ...    |
        |                  |
        |     buf[1]       |
        |     buf[0]       |
        |vvvvvvvvvvvvvvvvvv| <- %esp
------------------------------------------
        What would be a good value for the canary?
------------------------------------------
      WHEN DO STACK CANARIES FAIL?

When attacker:

------------------------------------------
 C. Electric Fences
------------------------------------------
            ELECTRIC FENCES

Every heap allocation is placed between guard pages

        [ Guard Page ]
        [ Heap Alloc.]
        [ Guard Page ]

Guard pages are protected by hardware

Write (or read) from guard pages
   causes an OS trap
------------------------------------------
        What's the cost of this?
 D. Bounds Checking
  1. goals and costs
------------------------------------------
            BOUNDS CHECKING

Goal:

What does it mean in C?

------------------------------------------
        What does it mean for a pointer in C to be "in bounds"?
        Can you enforce bounds in C without help from the compiler?
------------------------------------------
          WHAT HAS TO BE DONE

Insert checking code whenever the program does:

  - pointer arithmetic
        p + 1

  - pointer dereference

------------------------------------------
        In C, why can't we give an error when forming a pointer
        past the end of a bound?
        In C, why can't we just give an error when dereferencing
        a pointer?
------------------------------------------
        PRACTICAL CONSIDERATIONS

How expensive are defenses?

Would people pay for an OS that is:

------------------------------------------
  2. fat pointers
------------------------------------------
             FAT POINTERS

Instead of just an address,
each pointer stores:

------------------------------------------
        What's the overhead of this?
        Will this interoperate with old code?
        Does using fat pointers need compiler support?
        Can fat pointers be updated atomically?
  3. overview
------------------------------------------
      BOUNDS CHECKING TECHNIQUES

Splay trees:
  - record sizes for each object
  - grow larger as objects allocated

Shadow table:
  - memory divided into constant-sized slots
  - pointers converted to indexes
    into shadow table
  - can locate metadata in

------------------------------------------
        What resources will limit shadow table precision?
  4. Baggy Bounds checking
   a. for 32 bit systems
------------------------------------------
        BAGGY BOUNDS CHECKING

Goals:
  - detect and stop out of bounds accesses
  - ensure derived pointers point to same object
  - don't allow reading/writing other allocations

Constraints:
  - use same size pointers
  - permit pointers one past either end

Ideas:
  - round up each allocation to the next power of 2
    (bytes, say)
      e.g., 9 -> 16, 28 -> 32, ...
  - enforce bounds of the allocation
  - store binary log of allocation size
    in bounds table, bt
  - allocate memory with granularity of a slot
    (fixed size, say 16 bytes)

Let size be the allocation size (in bytes)
      e = log_2(size)
   so 1 << e = 2^e = size

So, in bounds table, bt, store e value
   recover the size by look up in bt
------------------------------------------
        How do we know the index into bt to look up the size?
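The "Ideas" on the slide above can be tried out in a few lines of C. This is only a sketch under stated assumptions, not the scheme's actual allocator: the names log2_round_up and bt_record, the 256-entry table size, and the use of plain integers as "toy" addresses are all made up here for illustration. The slot size of 16 bytes (log_2 = 4) is the one the slide suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_LOG2  4u     /* slot_size = 16 bytes, as on the slide */
#define BT_ENTRIES 256u   /* hypothetical: covers a 4 KiB toy address range */

static uint8_t bt[BT_ENTRIES];   /* bounds table: one log_2(size) per slot */

/* Smallest e with (1 << e) >= size, but never less than one slot. */
static unsigned log2_round_up(size_t size)
{
    unsigned e = SLOT_LOG2;
    while (((size_t)1 << e) < size)
        e++;
    return e;
}

/* Record an allocation of `size` bytes at toy address `base`.
   Assumes `base` is aligned to the rounded-up size. Returns e. */
static unsigned bt_record(uintptr_t base, size_t size)
{
    unsigned e = log2_round_up(size);
    size_t slots = ((size_t)1 << e) >> SLOT_LOG2;

    assert((base & (((uintptr_t)1 << e) - 1)) == 0);      /* aligned */
    assert((base >> SLOT_LOG2) + slots <= BT_ENTRIES);     /* fits in toy table */

    /* Every slot covered by the allocation gets the same e value. */
    for (size_t i = 0; i < slots; i++)
        bt[(base >> SLOT_LOG2) + i] = (uint8_t)e;
    return e;
}
```

With these definitions, log2_round_up(9) is 4 (16 bytes) and log2_round_up(28) is 5 (32 bytes), matching the slide's examples; bt_record(0x400, 400) returns 9 and fills bt[0x40] through bt[0x5F].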
------------------------------------------
  RECOVERING LENGTH AND BASE FROM POINTER

Find the allocated size from pointer p:

   len(p) = 1 << bt[p >> log_2(slot_size)]

Example: allocate 32 bytes
     char *p = malloc(32);
   for slot_size = 16
     log_2(slot_size) = 4
     len(p) = 1 << bt[p>>4]
   would put 5 (i.e., log_2(32)) in bt entries
   for 16 byte slots:
     bt[p>>4] = 5
     bt[(p>>4)+1] = 5

Find starting address from pointer p:

   base(p) = p & ~(len(p)-1)

Example:
     int a[100];
     int *pa = &a[0];
   allocate sizeof(int)*100
     rounded up to power of 2 bytes = 512 bytes
   so need 512/16 == 32 slots
   each bt entry stores log_2(512) = 9
   suppose pa == 0x400 (the address of a)
   so use 32 bt entries:
     pa >> 4 = 0x40 (== 64)
     bt[pa>>4] = 9
     bt[(pa>>4)+1] = 9
     ...
     bt[(pa>>4)+31] = 9
   suppose:
     int *pb = &a[75];
        == 0x400 + 4*0x4B
        == 0x400 + 0x12C
        == 0x52C (== 1324)
     len(pb) = 1 << bt[pb>>4]
             = 1 << bt[0x52C>>4]
             = 1 << bt[0x52]
             = 1 << 9
             = 0x200 (== 512)
     base(pb) = pb & ~(len(pb)-1)
              = 0x52C & ~(0x1FF)
              = 0x400
------------------------------------------
        What is 1 << e in normal mathematical terms?
        Do we need to align the allocations?
        Why do we allocate 512 bytes for the array a (of 100 ints)?
        Why do we need 32 entries in bt for pa (or a)?
------------------------------------------
   BAGGY BOUNDS CHECKING (32 bit Arch.)

inBounds(p) = base(p) <= p
              and p < base(p)+len(p)

Result of pointer arithmetic:
   for computation of
       p2 = p + n;
   add the code:
       p2 = result(p2);
   where
     result(p2) =
        if inBounds(p2)
        then p2
        else if base(p2)-7 <= p2
                and p2 < base(p2)+len(p2)+8
             then (1 << 31) | p2
             else error("illegal pointer")
------------------------------------------
        What does (1 << 31) | p2 do?
        Why set high-order bit of out-of-bounds pointers to 1?
        Why allow 1/2 a slot under or over?
        How do we find the correct base and length for illegal pointers?
        Why not allow larger than 1/2 a slot under or over bounds?
        Does code need to check bounds of dereferences?
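The len, base, and result computations from the two slides above can be sketched in C as follows. This is a hedged illustration, not the real instrumentation: addresses are plain 32-bit integers, the table size and the helper names (len_of, base_of, bt_fill) are invented here, and error("illegal pointer") is modeled by returning false. One further assumption: this sketch takes the base and length from the source pointer p rather than from the result p2, since bounds looked up from p2 itself would always report p2 as in bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOT_LOG2  4u                      /* slot_size = 16 bytes */
#define BT_ENTRIES 256u                    /* hypothetical toy table size */
#define OOB_BIT    ((uint32_t)1 << 31)     /* marks an out-of-bounds pointer */

static uint8_t bt[BT_ENTRIES];             /* bounds table of log_2(size) values */

/* len(p) = 1 << bt[p >> log_2(slot_size)] */
static uint32_t len_of(uint32_t p)  { return (uint32_t)1 << bt[p >> SLOT_LOG2]; }

/* base(p) = p & ~(len(p) - 1) */
static uint32_t base_of(uint32_t p) { return p & ~(len_of(p) - 1); }

/* Fill bt for an allocation of 2^e bytes at `base` (assumed aligned to 2^e). */
static void bt_fill(uint32_t base, unsigned e)
{
    uint32_t slots = ((uint32_t)1 << e) >> SLOT_LOG2;

    assert(e >= SLOT_LOG2 && e < 31);
    assert((base & (((uint32_t)1 << e) - 1)) == 0);
    assert((base >> SLOT_LOG2) + slots <= BT_ENTRIES);

    for (uint32_t i = 0; i < slots; i++)
        bt[(base >> SLOT_LOG2) + i] = (uint8_t)e;
}

/* Check p2 = p + n against the allocation that p points into.
   On success stores p2 (possibly with the high bit set) in *out and
   returns true; returns false where the slide calls error().
   Toy assumption: base >= 8, so base - 7 does not wrap around. */
static bool result(uint32_t p, uint32_t p2, uint32_t *out)
{
    uint32_t base = base_of(p);
    uint32_t len  = len_of(p);

    if (base <= p2 && p2 < base + len) {
        *out = p2;                         /* in bounds: unchanged */
        return true;
    }
    if (base - 7 <= p2 && p2 < base + len + 8) {
        *out = OOB_BIT | p2;               /* within half a slot: mark it */
        return true;
    }
    return false;                          /* illegal pointer */
}
```

Using the slide's example (bt_fill(0x400, 9) for the array a), len_of(0x52C) is 0x200 and base_of(0x52C) is 0x400. A pointer one past the end, 0x600, comes back from result as 0x80000600, while 0x700 is rejected.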
------------------------------------------
    WORKING WITH UNINSTRUMENTED CODE

Code in libraries that was not compiled
with baggy bounds checks

bt entries set to maximum possible (31)

------------------------------------------
        Will there ever be dereference errors for pointers to
        space allocated by library code?
        Can an out-of-bounds pointer be passed to uninstrumented code?
   b. for 64 bit systems
------------------------------------------
    BAGGY BOUNDS WITH 64 BIT POINTERS

Idea: Store more information in the pointers

Valid pointer:
  [ zeros   | log2(size) | address ]
    21 bits   5 bits       38 bits

Invalid pointer:
  [ offset  | log2(size) | zeros  | address ]
    13 bits   5 bits       8 bits   38 bits
------------------------------------------
        What are the advantages of putting this information
        in the pointers?
   c. evaluation
------------------------------------------
    EVALUATING BAGGY BOUNDS SYSTEM

Disadvantages:

Advantages:

------------------------------------------
        Can we still have memory attacks with this system?
 E. Non-executable Memory
------------------------------------------
        NON-EXECUTABLE MEMORY

3 bits for permission on a page:
   read (R), write (W), execute (X)

Execute means that

Policy: W xor X
   so can't both write and execute in a page
------------------------------------------
        What kind of program would need to both write and
        execute code?
 F. Randomized Address Space (ASLR)
------------------------------------------
       RANDOMIZED ADDRESS SPACE

Idea:
  - make it harder for attacker to

Often called ASLR =
   Address Space Layout Randomization
------------------------------------------
------------------------------------------
            DEFEATING ASLR

Approaches:
  - extract the randomness

  - heap attack:
      . spread shell code all over the heap
      . jump to a random address

  - nop slide
      . put lots of nops in heap
      . put shell code at end
      . jump to random address
------------------------------------------
 G. practice
------------------------------------------
       WHAT IS USED IN PRACTICE?

Both gcc and Visual Studio:
  - support stack canaries
    (on by default in Visual Studio, via /GS, and in
     gcc as configured by many Linux distributions)

Linux and Windows both have
   W xor X memory and ASLR

     WHAT IS NOT USED IN PRACTICE?

Baggy bounds
------------------------------------------
        Why would systems not use baggy bounds?