CIS 6614 meeting -*- Outline -*-

* Memory Corruption Defense

  Based on MIT's OCW 6.858 course, lecture 2:
  https://ocw.mit.edu/courses/6-858-computer-systems-security-fall-2014/resources/lecture-2-control-hijacking-attacks/

** Problem and approaches

------------------------------------------
  HOW TO PREVENT BUFFER OVERFLOW ATTACKS?

------------------------------------------

   Q: How could buffer overflows be prevented in C/C++?

      ... - insert checks on indexing at compile time
            but:
              - requires recompilation
              - not compatible with programs that use libc
            so, would like OS to prevent overwriting without recompilation

          - stop using all dangerous (libc) functions
            but:
              - programmers may make mistakes
              - attackers will find ways around this
                (e.g., return-oriented programming)

   Another way to look at the problem:

      Attacker must:
        1. Divert program's control flow (away from normal)
           - gain control over program counter
        2. Redirect program's execution to attacker's code
           - make program counter point to malicious code

      Return-oriented programming shows that
      stopping the attacker from executing on data is not enough

      ... - stop changes to program's control flow

** Stack Canaries

   Like the "canary in the coal mine"
   (if it stops singing, you are in trouble...)

------------------------------------------
           STACK CANARIES

Goal: catch overwrite before

      |------------------|
      |      strarg      |
      |------------------|
      |  return address  |
      |------------------|
      |    saved %ebp    |
      |------------------|
      |   canary value   |
      |------------------|
      |     buf[99]      |
      |                  |
      |   ... buf ...    |
      |                  |
      |     buf[1]       |
      |     buf[0]       |
      |vvvvvvvvvvvvvvvvvv| <- %esp
------------------------------------------
      ... using the return address

   The idea is that a buffer overflow attack writes past the end
   of the buffer, and thus overwrites the canary before it reaches
   the return address
   (since the overflow writes toward increasing addresses)

   So the compiler must insert code to check that the canary
   is unchanged before returning

   Q: What would be a good value for the canary?
      - "\0\n\r\xff" (NUL, LF, CR, 0xff),
        so that standard library functions will stop
      - should be hard to guess, which depends on how random
        the value really is (how much actual randomness?)

      Why might a deterministic value be bad?
        Could be used in attacks!

------------------------------------------
     WHEN DO STACK CANARIES FAIL?

When attacker:

------------------------------------------
      ... - overwrites function pointers (e.g., on the heap)
          - can guess the canary: if there is not enough randomness
            (entropy), the attack becomes easier for the attacker
          - uses malloc/free attacks
            (overflowing a buffer corrupts a data structure in the
             malloc/free system, e.g., a doubly-linked list;
             then the free operation uses the overwritten metadata
             and writes into memory based on attacker-controlled data)

** Electric Fences

------------------------------------------
          ELECTRIC FENCES

Every heap allocation between pages

      [ Guard Page  ]
      [ Heap Alloc. ]
      [ Guard Page  ]

Guard pages are protected by hardware

Write (or read) from guard pages
causes an OS trap
------------------------------------------

   Q: What's the cost of this?
      2 guard pages for each allocated object, so space intensive,
      thus not used in production
      Each OS page is typically about 4K

** Bounds Checking

*** goals and costs

------------------------------------------
          BOUNDS CHECKING

Goal:

What does it mean in C?

------------------------------------------
      ... stop buffer overflows by checking bounds
          on array/pointer accesses

   Q: What does it mean for a pointer in C to be "in bounds"?
      maybe something like:
        if the pointer was to an array,
        then it points to part of that array?
      but do all pointers in C point into arrays?
        no, they could point to stack or heap locations,
        if so, then they should always point to those locations...

   Q: Can you enforce bounds in C without help from the compiler?
      No, arrays don't carry their bounds at runtime

------------------------------------------
         WHAT HAS TO BE DONE

Insert code whenever program does:

  - pointer arithmetic
       p + 1

  - pointer dereference

------------------------------------------

   Q: In C, why can't we give an error when forming a pointer
      past the end of a bound?
      It's legal in C to form a pointer 1 past the end of an array
      (e.g., as the stopping condition in a loop),
      so forming such a pointer is not necessarily an error

   Q: In C, why can't we just give an error when dereferencing a pointer?
      Because we need to know where the bounds are, and that has to be
      recorded when doing pointer arithmetic...

------------------------------------------
       PRACTICAL CONSIDERATIONS

How expensive are defenses?

Would people pay for an OS that is:

------------------------------------------
      ... - slower (due to bounds checking)
          - takes more space (for bounds info)

   Why is bounds checking slower?
      Can't put pointers in a register and just increment them...
      Every array access (or pointer dereference) involves at least
      2 possible memory fetches
      (the bound and the array, although the bound might be cached)

   Are there complications due to multi-threading?
      Yes, have to check and access atomically,
      (but now there's more code...)

   Why does bounds checking take more space?
      Need to store bounds information for all arrays and pointers

*** fat pointers

------------------------------------------
            FAT POINTERS

Instead of just an address,
each pointer stores:

------------------------------------------
      - base address
      - ending address
      - pointed to (current) address

   Q: What's the overhead of this?
      3 times the space, plus 2 comparisons in time per access

   Q: Will this interoperate with old code?
      No, all pointers will now have different sizes

   Q: Does using fat pointers need compiler support?
      Yes, because need to pass larger pointers,
      and need to emit code to check bounds

   Q: Can fat pointers be updated atomically?
      No, now there are 3 words per pointer, so can cause concurrency bugs

*** overview

    Based on:
    Zhengyang Liu and John Criswell.
    Flexible and efficient memory object metadata.
    In Proceedings of the 2017 ACM SIGPLAN International Symposium
    on Memory Management (ISMM 2017).
    Association for Computing Machinery, New York, NY, USA, 36--46.
    https://doi.org/10.1145/3092255.

------------------------------------------
      BOUNDS CHECKING TECHNIQUES

Splay trees:
   - record sizes for each object
   - grow larger as objects allocated

Shadow table:
   - memory divided into constant-sized slots
   - pointers converted to indexes
     into shadow table
   - can locate metadata in

------------------------------------------

   Liu and Criswell say that Splay trees
     "grow larger as the program allocates more objects" (p. 36)
   so are
     "impractical for programs manipulating large amounts of data" (p. 36)
   and
     "optimizations require sophisticated inter-procedural
      points-to analysis"

      ... constant time

   Q: What resources will limit shadow table precision?
      space!
      Address Sanitizer uses 1 shadow byte per 8-byte slot, for example,
      and larger slots imply less precision

*** Baggy Bounds checking

    See P. Akritidis, M. Costa, M. Castro, and S. Hand.
    Baggy bounds checking: An efficient and backwards-compatible
    defense against out-of-bounds errors.
    In Proceedings of the Eighteenth Usenix Security Symposium, August 2009.
    https://www.usenix.org/legacy/event/sec09/tech/full_papers/sec09_memory.pdf

    and B. Ding, Y. He, Y. Wu, A. Miller and J. Criswell.
    Baggy Bounds with Accurate Checking.
    IEEE 23rd International Symposium on Software Reliability
    Engineering Workshops, 2012, pp. 195-200.
    doi: 10.1109/ISSREW.2012.24.

    I also used:
    Zhengyang Liu and John Criswell.
    Flexible and efficient memory object metadata.
    In Proceedings of the 2017 ACM SIGPLAN International Symposium
    on Memory Management (ISMM 2017).
    Association for Computing Machinery, New York, NY, USA, 36--46.
    https://doi.org/10.1145/3092255.

    and MIT's OCW 6.858 course, lecture 2:
    https://ocw.mit.edu/courses/6-858-computer-systems-security-fall-2014/resources/lecture-2-control-hijacking-attacks/

**** for 32 bit systems

------------------------------------------
        BAGGY BOUNDS CHECKING

Goals:
  - detect and stop out of bounds accesses
  - ensure derived pointers point to same object
  - don't allow reading/writing other allocations

Constraints
  - use same size pointers
  - permit pointers one past either end

Ideas:
  - round up each allocation to next power of 2
    (bytes, say)
    e.g., 9 -> 16, 28 -> 32, ...
  - enforce bounds of the allocation
  - store binary log of allocation
    in bounds table, bt
  - allocate memory with granularity of a slot
    (fixed size, say 16 bytes)

  Let size be the allocation size (in bytes)
      e = log_2(size)
  so  1 << e = 2^e = size

  So, in bounds table, bt, store e value
  recover the size by look up in bt
------------------------------------------

   allocation size is the requested size rounded up
   to the next power of 2,
   so if request 400 bytes, round up to 512

   Q: How do we know the index into bt to look up the size?
      shift the pointer right by a constant (log of the slot size)

------------------------------------------
 RECOVERING LENGTH AND BASE FROM POINTER

Find the allocated size from pointer p:

    len(p) = 1 << bt[p >> log_2(slot_size)]

  Example: allocate 32 bytes
     char *p = malloc(32);
  for slot_size = 16
     log_2(slot_size) = 4
     len(p) = 1 << bt[p>>4]
  would put 5 (i.e., log_2(32)) in bt entries
  for 16 byte slots:
       bt[p>>4] = 5
       bt[(p>>4)+1] = 5

Find starting address from pointer p:

    base(p) = p & ~(len(p)-1)

  Example:
    int a[100];
    int *pa = &a[0];
  allocate sizeof(int)*100 rounded up
      to power of 2 bytes = 512 bytes
  so need 512/16 == 32 slots
  each bt entry stores log_2(512) = 9
  suppose pa == 0x400 (the address of a)
  so use 32 bt entries:
       pa >> 4 = 0x40 (== 64)
       bt[pa>>4] = 9
       bt[(pa>>4)+1] = 9
       ...
       bt[(pa>>4)+31] = 9

  suppose:
    int *pb = &a[75];
            == 0x400 + 4*0x4B
            == 0x400 + 0x12C
            == 0x52C (== 1324)
    len(pb) = 1 << bt[pb>>4]
            = 1 << bt[0x52C>>4]
            = 1 << bt[0x52]
            = 1 << 9
            = 0x200 (== 512)
    base(pb) = pb & ~(len(pb)-1)
             = 0x52C & ~(0x1FF)
             = 0x400
------------------------------------------

   Q: What is 1 << s in normal mathematical terms?
      2 to the power of s

   Q: Do we need to align the allocations?
      Yes, each allocation must be aligned to its (rounded up) size,
      so that the base calculation gives its exact starting byte

   Q: Why do we allocate 512 bytes for the array a (of 100 ints)?
      because it needs 400 bytes and 512 is the next power of 2

   Q: Why do we need 32 entries in bt for pa (or a)?
      because 512 / 16 = 32

   These calculations can be optimized...

------------------------------------------
   BAGGY BOUNDS CHECKING (32 bit Arch.)

inBounds(p) = check(p,base(p),len(p))

check(p,begin,size) = begin <= p and p < begin+size

Result of pointer arithmetic:
  for computation of
      p2 = p + n;
  add the code:
      p2 = result(p2,p);
  where
      result(p2,p) =
         let begin = base(p)
             size = len(p)
         in if check(p2,begin,size)
            then p2
            else if begin-p2 < 8
                    and p2 < begin+size+8
            then set_high_bit(p2)
            else error("illegal pointer")
------------------------------------------

   Q: Why set high-order bit of out-of-bounds pointers to 1?
      So that the virtual memory system will give an error
      if the pointer is dereferenced.

   Q: Why allow 1/2 a slot under or over?
      To allow pointers that are just beyond the allocation to be formed,
      as required by C
      (the standard says that a program can form a pointer 1 past the end)

   Q: How do we find the correct base and length for illegal pointers?
      If we know they are illegal, then can subtract 1/2 slot length
      to find the base for pointers over the high end
      (these are in the lower half of a slot)
      and similarly add 1/2 slot length
      to find the base for pointers under the low end
      (these are in the upper half of a slot)
      (see p. 54)

      is_illegal(p) = is_high_bit_set(p)
      illegal_len(p) =
         if (unset_high_bit(p) % 16) >= 8  /* p is below base */
         then 1 << bt[unset_high_bit(p)/16 + 1]
         else 1 << bt[unset_high_bit(p)/16 - 1]
      illegal_base(p) =
         if (unset_high_bit(p) % 16) >= 8  /* p is below base */
         then base(unset_high_bit(p)+8)
         else base(unset_high_bit(p)-8)

   Q: Why not allow larger than 1/2 a slot under or over bounds?
      Because then can't find right base and length!
      Need some convention, and it would be ambiguous otherwise.

   Q: Do we need any other special treatment of illegal pointers?
      Yes (end of section 2.4):
        - for inequality comparisons (<, >, <=, >=)
          need to unset the high bit first,
          so compiler injects code for that.
        - similarly for pointer differences
        - don't allow passing these to uninstrumented code (libraries)
      (but, no need to change code that compares for equality (==, !=),
       as that will work already)

   Q: Does code need to check bounds of dereferences?
      No, the hardware does that, so fast.

------------------------------------------
    WORKING WITH UNINSTRUMENTED CODE

Code in libraries that was not compiled
with baggy bounds checks

bt entries set to maximum possible (31)

------------------------------------------

   Q: Will there ever be dereference errors for pointers
      to space allocated by library code?
      No, but then all those errors will be missed (false negatives)
      So no memory safety for operations in uninstrumented code

   Q: Can an out-of-bounds pointer be passed to uninstrumented code?
      No, because the address in the pointer is illegal (will trap)
      (But paper says, p. 55, that doesn't need to happen)

**** for 64 bit systems

------------------------------------------
   BAGGY BOUNDS WITH 64 BIT POINTERS

Idea: Store more information in the pointers

Valid pointer:

   [ zeros   | log2(size) | address ]
     21 bits    5 bits      38 bits

Invalid pointer:

   [ offset  | log2(size) | zeros  | address ]
     13 bits    5 bits      8 bits   38 bits

------------------------------------------

   Q: What are the advantages of putting this information in the pointers?
      - atomicity, still all in one word
      - size of pointers stays the same (one word)
      - no separate bounds table space

**** evaluation

------------------------------------------
     EVALUATING BAGGY BOUNDS SYSTEM

Disadvantages:


Advantages:

------------------------------------------

   Q: Can still have memory attacks with this system?
      Yes, from uninstrumented code
      and from overflows (within baggy bounds) affecting:
         malloc/free system (overflows destroying invariants)
         structures with function pointers
           (no bounds checking on pointers in code)

   so disadvantages:
      ... - doesn't prevent all overflow attacks
            (due to imprecision from uninstrumented code and bagginess)
          - use of more memory (for bounds table in 32 bit systems)
          - time overhead for bounds checking
          - false alarms from pointers beyond 1/2 slot size
            (on 32 bit systems)
          - need nontrivial compiler support

   advantages:
      ... mitigates some buffer overflows

** Non-executable Memory

------------------------------------------
        NON-EXECUTABLE MEMORY

3 bits for permission on a page:
   read (R), write (W), execute (X)

Execute means that

Policy: W xor X
   so can't both write and execute in a page
------------------------------------------
      ... the instruction pointer (PC) can point into that page

   Hardware watches every memory reference

   But this makes it hard to generate code and then execute it

   Q: What kind of program would need to both write and execute code?
      a JIT compiler!

** Randomized Address Space (ASLR)

------------------------------------------
      RANDOMIZED ADDRESS SPACE

Idea:
  - make it harder for attacker to

Often called ASLR =
   Address Space Layout Randomization
------------------------------------------
      ... know addresses to use as inputs for stack smashing
          i.e., put stack, heap, code at random addresses
          makes it harder to get right return address

------------------------------------------
           DEFEATING ASLR

Approaches:

  - extract the randomness

  - heap attack:
       . spread shell code all over the heap
       . jump to a random address

  - nop slide
       . put lots of nops in heap
       . put shell code at end
       . jump to random address
------------------------------------------
      ... attack can find out what "random" offsets were applied
          (get a leak of the locations used)

** practice

------------------------------------------
      WHAT IS USED IN PRACTICE?

Both gcc and Visual Studio:
   - use stack canaries by default

Linux and Windows both have
   W xor X memory and ASLR

     WHAT IS NOT USED IN PRACTICE?

Baggy bounds
------------------------------------------

   Q: Why would systems not use baggy bounds?
      overhead costs, false alarms
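
   To make the overhead cost concrete, here is a minimal sketch in C of
   the work that the 32 bit baggy bounds scheme (from the slides above)
   adds to every pointer arithmetic operation: a bounds table lookup,
   a mask, and up to two range checks. It is only an illustration, not
   the paper's implementation. Assumptions: addresses are simulated as
   uint32_t values in a toy 64 KiB address space, and all of the names
   (bb_register, bb_len, bb_base, bb_result, BB_HIGH_BIT, BB_ERROR)
   are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; simulated addresses, not real pointers. */
enum { SLOT_LOG2 = 4, SLOT_SIZE = 1 << SLOT_LOG2, NSLOTS = 4096 };

#define BB_HIGH_BIT 0x80000000u  /* marks an out-of-bounds pointer */
#define BB_ERROR    0xFFFFFFFFu  /* stands in for error("illegal pointer") */

/* Bounds table: one byte per 16-byte slot, holding log_2 of the
   (power of 2) size of the allocation that covers the slot. */
static uint8_t bt[NSLOTS];

/* Round a request up to a power of 2 (at least one slot); return its log_2. */
static unsigned bb_log2_size(uint32_t request) {
    assert(request > 0 && request <= (uint32_t)NSLOTS * SLOT_SIZE);
    unsigned e = SLOT_LOG2;
    while (((uint32_t)1 << e) < request)
        e++;
    return e;
}

/* Record an allocation; base must be aligned to the rounded-up size. */
static void bb_register(uint32_t base, uint32_t request) {
    unsigned e = bb_log2_size(request);
    uint32_t size = (uint32_t)1 << e;
    assert((base & (size - 1)) == 0);
    assert(base + size <= (uint32_t)NSLOTS * SLOT_SIZE);
    for (uint32_t s = base >> SLOT_LOG2; s < (base + size) >> SLOT_LOG2; s++)
        bt[s] = (uint8_t)e;
}

/* len(p) = 1 << bt[p >> log_2(slot_size)] */
static uint32_t bb_len(uint32_t p) {
    assert((p >> SLOT_LOG2) < NSLOTS);
    return (uint32_t)1 << bt[p >> SLOT_LOG2];
}

/* base(p) = p & ~(len(p)-1) */
static uint32_t bb_base(uint32_t p) {
    return p & ~(bb_len(p) - 1);
}

/* result(p2,p) from the slide: p is in bounds, p2 was computed from p. */
static uint32_t bb_result(uint32_t p2, uint32_t p) {
    int64_t begin = bb_base(p);
    int64_t end = begin + bb_len(p);
    int64_t q = p2;
    if (begin <= q && q < end)
        return p2;                   /* in bounds: unchanged */
    if (begin - q < SLOT_SIZE / 2 && q < end + SLOT_SIZE / 2)
        return p2 | BB_HIGH_BIT;     /* within 1/2 slot: mark as illegal */
    return BB_ERROR;                 /* too far out of bounds */
}
```

   With the slide's example (int a[100] at 0x400), calling
   bb_register(0x400, 400) fills 32 bt entries with 9, and
   bb_len(0x52C) and bb_base(0x52C) then reproduce the slide's values
   of 512 and 0x400. A real implementation would inline these steps
   at each instrumented pointer operation instead of calling functions.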