Memory Corruption Defense
Based on MIT's OCW 6.858 course, lecture 2:
https://ocw.mit.edu/courses/6-858-computer-systems-security-fall-2014/
resources/lecture-2-control-hijacking-attacks/
Problem and approaches
------------------------------------------
HOW TO PREVENT BUFFER OVERFLOW ATTACKS?
------------------------------------------
How could buffer overflows be prevented in C/C++?
... - insert checks on indexing at compile time
but: - requires recompilation
- not compatible with programs that use libc
so, would like OS to prevent overwriting
without recompilation
- stop using all dangerous (libc) functions
but: - programmers may make mistakes
- attackers will find ways around this
(e.g., return-oriented programming)
Another way to look at the problem:
Attacker must:
1. Divert program's control flow (away from normal)
- gain control over program counter
2. Redirect program's execution to attacker's code
- make program counter point to malicious code
Return-oriented programming shows that
stopping the attacker from executing code placed in data is not enough
... - stop changes to program's control flow
Stack Canaries
Like the "canary in the coal mine"
(if it stops singing, you are in trouble...)
------------------------------------------
STACK CANARIES
Goal: catch overwrite
before
|------------------|
| strarg |
|------------------|
| return address |
|------------------|
| old ebp |
|------------------|
| canary value |
|------------------|
| buf[99] |
| |
| ... buf ... |
| |
| buf[1] |
| buf[0] |
|vvvvvvvvvvvvvvvvvv| <- %esp
------------------------------------------
... using the return address
The idea is that the buffer overflow attack will write
past the end of the buffer, and thus overwrite the canary
(since it will need to write in increasing addresses)
So the compiler must insert code that checks that the canary is
unchanged before the function returns
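What would be a good value for the canary?
- "\0\n\r\xff" (a terminator canary), so that standard library
string functions will stop copying when they reach it
- or a random value, which should be hard to guess;
how hard depends on how random the value really is
(how much actual randomness?)
Why might a deterministic value be bad?
An attacker who knows the value can write it back during the overflow
(e.g., through memcpy or read, which do not stop at those bytes),
so the check still passes!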
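The check that the compiler inserts can be sketched as a simulation. This is not how gcc or any other compiler actually lays out a frame: the frame is modeled as a plain byte array so the overflow stays well-defined C, and the names sim_frame, frame_enter, frame_unsafe_copy and frame_canary_intact are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 16, CANARY_SIZE = 4, RET_SIZE = 4,
       FRAME_SIZE = BUF_SIZE + CANARY_SIZE + RET_SIZE };

/* Terminator-style canary bytes: NUL, newline, carriage return, 0xff. */
static const uint8_t CANARY[CANARY_SIZE] = { 0x00, '\n', '\r', 0xff };

/* Simulated frame, low to high addresses:
   [ buf (16 bytes) | canary (4 bytes) | return address (4 bytes) ] */
typedef struct { uint8_t bytes[FRAME_SIZE]; } sim_frame;

/* "Prologue": clear the frame and store the canary just above buf. */
void frame_enter(sim_frame *f) {
    memset(f->bytes, 0, FRAME_SIZE);
    memcpy(f->bytes + BUF_SIZE, CANARY, CANARY_SIZE);
}

/* A buggy copy into buf that never compares n with BUF_SIZE.
   It is limited to the simulated frame only to keep the demo safe. */
void frame_unsafe_copy(sim_frame *f, const uint8_t *src, size_t n) {
    assert(n <= FRAME_SIZE);
    memcpy(f->bytes, src, n);
}

/* "Epilogue": returns 1 if the canary is unchanged, 0 otherwise. */
int frame_canary_intact(const sim_frame *f) {
    return memcmp(f->bytes + BUF_SIZE, CANARY, CANARY_SIZE) == 0;
}
```

Any write that reaches the simulated return address has to pass over the canary bytes first, which is what the epilogue check relies on.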
------------------------------------------
WHEN DO STACK CANARIES FAIL?
When attacker:
------------------------------------------
... - overwrites function pointers (e.g., on the heap)
- guesses the canary: there might not be enough randomness (entropy)
in it, which makes the attack easier for the attacker
- malloc/free attacks (overflowing a buffer causes
data structure in malloc/free system to be corrupted,
e.g., a doubly-linked list,
then the free operation uses metadata that was overwritten
and writes into memory based on attacker-controlled data)
Electric Fences
------------------------------------------
ELECTRIC FENCES
Every heap allocation sits between guard pages
[ Guard Page ]
[ Heap Alloc.]
[ Guard Page ]
Guard pages are protected by hardware
Write (or read) from guard pages
causes an OS trap
------------------------------------------
What's the cost of this?
2 pages for each allocated object, so space intensive
thus not used in production
Each OS page is typically about 4K
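A minimal sketch of the guard-page idea, assuming a POSIX system that provides mmap, mprotect and MAP_ANONYMOUS (e.g., Linux). The helpers fence_alloc and fence_free are hypothetical names, not part of any real library.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a whole number of pages. */
static size_t fence_data_bytes(size_t size, size_t page) {
    return ((size + page - 1) / page) * page;
}

/* Returns size usable bytes with an inaccessible guard page on each side,
   or NULL on failure. */
void *fence_alloc(size_t size) {
    assert(size > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data = fence_data_bytes(size, page);
    /* Reserve guard + data + guard with no access rights at all. */
    uint8_t *region = mmap(NULL, data + 2 * page, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return NULL;
    /* Open up only the data pages; both neighbors stay PROT_NONE. */
    if (mprotect(region + page, data, PROT_READ | PROT_WRITE) != 0) {
        munmap(region, data + 2 * page);
        return NULL;
    }
    return region + page;
}

/* Releases the data pages together with both guard pages. */
int fence_free(void *p, size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data = fence_data_bytes(size, page);
    return munmap((uint8_t *)p - page, data + 2 * page);
}
```

With 4 KB pages, a 100-byte object costs three pages of address space here, which illustrates the space cost. A write slightly past the requested size but still inside the last data page is not caught by this sketch.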
Bounds Checking
goals and costs
------------------------------------------
BOUNDS CHECKING
Goal:
What does it mean in C?
------------------------------------------
... stop buffer overflows
by checking bounds on array/pointer accesses
What does it mean for a pointer in C to be "in bounds"?
maybe something like:
if the pointer was to an array,
then it points to part of that array?
but do all pointers in C point into arrays?
no, they could point to stack or heap locations,
if so, then they should always point to those locations...
Can you enforce bounds in C without help from the compiler?
No, C arrays do not carry their bounds with them at run time
------------------------------------------
WHAT HAS TO BE DONE
Interject code whenever program does:
- pointer arithmetic
p + 1
- pointer dereference
------------------------------------------
Q: In C, why can't we give an error when forming a pointer
past the end of a bound?
It's actually legal in C to form (but not dereference) a pointer
1 past the end of an array,
which might be a stopping condition in a loop,
so forming such a pointer might not be an error
In C, why can't we just give an error when dereferencing a pointer?
Because need to know where the bounds are, and that has to
be recorded when doing pointer arithmetic...
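The one-past-the-end case is common in ordinary code. A small example (the function name sum_ints is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints, using a one-past-the-end pointer as the loop bound. */
long sum_ints(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    const int *end = a + n;   /* legal to form, not legal to dereference */
    long total = 0;
    for (const int *p = a; p != end; ++p)
        total += *p;          /* p is strictly inside the array here */
    return total;
}
```

A checker that rejected the formation of end would break this loop, even though end is only compared and never dereferenced.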
------------------------------------------
PRACTICAL CONSIDERATIONS
How expensive are defenses?
Would people pay for an OS that is:
------------------------------------------
... - slower (due to bounds checking)
- takes more space (for bounds info)
Why is bounds checking slower?
Can't put pointers in a register and just increment them...
Every array access (or pointer dereference)
involves at least 2 possible memory fetches
(the bound and the array, although the bound might be cached)
Are there complications due to multi-threading?
Yes, have to check and access atomically,
(but now there's more code...)
Why does bounds checking take more space?
Need to store bounds information for all arrays and pointers
fat pointers
------------------------------------------
FAT POINTERS
Instead of just an address, each pointer
stores:
------------------------------------------
- base address
- ending address
- pointed to (current) address
What's the overhead of this?
3 times the space, plus 2 comparisons in time per access
Will this interoperate with old code?
No, all pointers will now have different sizes
Does using fat pointers need compiler support?
Yes, because need to pass larger pointers,
and need to emit code to check bounds
Can fat pointers be updated atomically?
No, now there are 3 words per pointer,
so can cause concurrency bugs
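A fat pointer can be sketched as a three-word struct. This is only an illustration under assumptions: the type fat_ptr and the helpers fat_make, fat_add, fat_in_bounds and fat_load are hypothetical, and addresses are kept as integers (uintptr_t) so that out-of-bounds arithmetic stays well-defined in the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t base;  /* first byte of the object */
    uintptr_t end;   /* one past the last byte of the object */
    uintptr_t cur;   /* address currently pointed to */
} fat_ptr;

fat_ptr fat_make(void *obj, size_t size) {
    assert(obj != NULL);
    fat_ptr p = { (uintptr_t)obj, (uintptr_t)obj + size, (uintptr_t)obj };
    return p;
}

/* Pointer arithmetic changes cur only; the bounds travel along unchanged. */
fat_ptr fat_add(fat_ptr p, ptrdiff_t n) {
    p.cur += (uintptr_t)n;
    return p;
}

/* The 2 comparisons needed per access. */
bool fat_in_bounds(fat_ptr p) {
    return p.base <= p.cur && p.cur < p.end;
}

/* Checked load of one byte; returns false instead of reading out of bounds. */
bool fat_load(fat_ptr p, char *out) {
    if (!fat_in_bounds(p))
        return false;
    *out = *(const char *)p.cur;
    return true;
}
```

The struct is 3 words wide, so it no longer fits where an ordinary pointer is expected, and a plain struct assignment is not guaranteed to be atomic.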
overview
Based on:
Zhengyang Liu and John Criswell.
Flexible and efficient memory object metadata.
In Proceedings of the 2017 ACM SIGPLAN International Symposium
on Memory Management (ISMM 2017).
Association for Computing Machinery, New York, NY, USA, 36--46.
https://doi.org/10.1145/3092255.
------------------------------------------
BOUNDS CHECKING TECHNIQUES
Splay trees:
- record sizes for each object
- grow larger as objects allocated
Shadow table:
- memory divided into
constant-sized slots
- pointers converted to indexes
into shadow table
- can locate metadata
in
------------------------------------------
Liu and Criswell say that Splay trees
"grow larger as the program allocates more objects" (p. 36)
so are "impractical for programs manipulating large amounts of data"
(p. 36)
and "optimizations require
sophisticated inter-procedural points-to analysis"
... constant time
What resources will limit shadow table precision?
space! AddressSanitizer, for example, uses 1 shadow byte per 8-byte slot
and larger slots imply less precision
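A toy shadow table shows both the constant-time lookup and the loss of precision. It is a sketch under assumptions, not the design of any particular tool: memory is a 1024-byte arena addressed by offset, slots are 8 bytes, each slot has one "allocated" flag (stored in a byte for simplicity), and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SHADOW_SLOT_LOG2 = 3,
       SHADOW_SLOT_SIZE = 1 << SHADOW_SLOT_LOG2,
       SHADOW_ARENA = 1024,
       SHADOW_SLOTS = SHADOW_ARENA / SHADOW_SLOT_SIZE };

/* shadow[i] is 1 if any byte of slot i belongs to an allocated object. */
static uint8_t shadow[SHADOW_SLOTS];

/* Record an object of size bytes at slot-aligned arena offset off. */
void shadow_mark(size_t off, size_t size) {
    assert(off % SHADOW_SLOT_SIZE == 0 && off + size <= SHADOW_ARENA);
    for (size_t a = off; a < off + size; a += SHADOW_SLOT_SIZE)
        shadow[a >> SHADOW_SLOT_LOG2] = 1;
}

/* Constant-time lookup: shift the offset to get the table index. */
bool shadow_addressable(size_t off) {
    if (off >= SHADOW_ARENA)
        return false;
    return shadow[off >> SHADOW_SLOT_LOG2] != 0;
}
```

After shadow_mark(16, 12), offsets 28 through 31 are reported as addressable although the object ends at offset 27, because the table cannot distinguish bytes within one slot. Larger slots would widen that gap.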
Baggy Bounds checking
See P. Akritidis, M. Costa, M. Castro, and S. Hand.
Baggy bounds checking: An efficient and backwards-compatible
defense against out-of-bounds errors.
In Proceedings of the Eighteenth Usenix Security Symposium,
August 2009.
https://www.usenix.org/legacy/event/sec09/tech/full_papers/sec09_memory.pdf
and
B. Ding, Y. He, Y. Wu, A. Miller and J. Criswell.
Baggy Bounds with Accurate Checking.
IEEE 23rd International Symposium on Software Reliability Engineering Workshops,
2012, pp. 195-200.
doi: 10.1109/ISSREW.2012.24.
I also used Liu and Criswell (ISMM 2017) and
MIT's OCW 6.858 course, lecture 2, both cited in full above.
**** for 32 bit systems
------------------------------------------
BAGGY BOUNDS CHECKING
Goals:
- detect and stop out of bounds accesses
- ensure derived pointers
point to same object
- don't allow reading/writing other
allocations
Constraints
- use same size pointers
- permit pointers one past either end
Ideas:
- round up each allocation to next
power of 2 (bytes, say)
e.g., 9 -> 16, 28 -> 32, ...
- enforce bounds of the allocation
- store binary log of allocation
in bounds table, bt
- allocate memory with granularity
of a slot (fixed size, say 16 bytes)
Let size be the allocation size (in bytes)
e = log_2(size)
so 1 << e = 2^e = size
So, in bounds table, bt, store e value
recover the size by look up in bt
------------------------------------------
allocation size is the requested size rounded up to the next
power of 2, so if request 400 bytes, round up to 512
How do we know the index into bt to look up the size?
shift the pointer right by a constant (log of the slot size)
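The rounding and the index computation can be sketched as follows. The function names are hypothetical, and the sketch assumes 16-byte slots and that no allocation is smaller than one slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BAGGY_SLOT_LOG2 = 4 };  /* 16-byte slots, as in the notes */

/* Smallest e with (1 << e) >= size, but never less than one slot. */
unsigned baggy_log2_size(size_t size) {
    assert(size <= ((size_t)1 << 31));  /* 31 is the largest bt entry */
    unsigned e = BAGGY_SLOT_LOG2;
    while (((size_t)1 << e) < size)
        ++e;
    return e;
}

/* Allocation size after rounding up to a power of 2. */
size_t baggy_padded_size(size_t size) {
    return (size_t)1 << baggy_log2_size(size);
}

/* Index into the bounds table bt for address p. */
size_t baggy_bt_index(uintptr_t p) {
    return (size_t)(p >> BAGGY_SLOT_LOG2);
}
```

For a request of 400 bytes this gives e = 9 and a padded size of 512, matching the example in the text.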
------------------------------------------
RECOVERING LENGTH AND BASE FROM POINTER
Find the allocated size from pointer p:
len(p) = 1 << bt[p >> log_2(slot_size)]
Example: allocate 32 bytes
char *p = malloc(32);
for slot_size = 16
log_2(slot_size) = 4
len(p) = 1 << bt[p>>4]
would put 5 (i.e., log_2(32))
in bt entries for 16 byte slots:
bt[p>>4] = 5
bt[(p>>4)+1] = 5
Find starting address from pointer p:
base(p) = p & ~(len(p)-1)
Example:
int a[100];
int *pa = &a[0];
allocate sizeof(int)*100 rounded
up to power of 2 bytes = 512 bytes
so need 512/16 == 32 slots
each bt entry stores log_2(512) = 9
suppose pa == 0x400 (the address of a[0])
so use 32 bt entries:
pa >> 4 = 0x40 (== 64)
bt[pa>>4] = 9
bt[(pa>>4)+1] = 9
...
bt[(pa>>4)+31] = 9
suppose:
int *pb = &a[75];
== 0x400 + 4*0x4B
== 0x400 + 0x12C
== 0x52C (== 1324)
len(pb) = 1 << bt[pb>>4]
= 1 << bt[0x52C>>4]
= 1 << bt[0x52]
= 1 << 9
= 0x200 (== 512)
base(pb) = pb & ~(len(pb)-1)
= 0x52C & ~0x1FF
= 0x400
------------------------------------------
What is 1 << s in normal mathematical terms?
2 to the power of s
Do we need to align the allocations?
Yes, each allocation must be aligned to its (power of 2) size,
so that masking off the low bits of a pointer gives the base
Why do we allocate 512 bytes for the array a (of 100 ints)?
because it needs 400 bytes and 512 is the next largest power of 2
Why do we need 32 entries in bt for pa (or a)?
because 512 / 16 = 32
These calculations can be optimized...
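The worked example above can be replayed on a simulated 32-bit address space. This is only a sketch: addresses are plain integers rather than real pointers, the table covers addresses below 0x1000, and the names sim_bt, sim_bt_set, sim_len and sim_base are made up.

```c
#include <assert.h>
#include <stdint.h>

enum { SIM_SLOT_LOG2 = 4,                      /* 16-byte slots */
       SIM_MEM = 0x1000,                       /* simulated address range */
       SIM_BT_ENTRIES = SIM_MEM >> SIM_SLOT_LOG2 };

static uint8_t sim_bt[SIM_BT_ENTRIES];         /* the bounds table */

/* Record an allocation of 1 << e bytes at base (aligned to its size):
   every slot of the allocation stores e. */
void sim_bt_set(uint32_t base, unsigned e) {
    uint32_t size = UINT32_C(1) << e;
    assert(e >= SIM_SLOT_LOG2 && base % size == 0 && base + size <= SIM_MEM);
    for (uint32_t a = base; a < base + size; a += UINT32_C(1) << SIM_SLOT_LOG2)
        sim_bt[a >> SIM_SLOT_LOG2] = (uint8_t)e;
}

/* len(p) = 1 << bt[p >> log_2(slot_size)] */
uint32_t sim_len(uint32_t p) {
    assert(p < SIM_MEM);
    return UINT32_C(1) << sim_bt[p >> SIM_SLOT_LOG2];
}

/* base(p) = p & ~(len(p)-1) */
uint32_t sim_base(uint32_t p) {
    return p & ~(sim_len(p) - 1);
}
```

After sim_bt_set(0x400, 9), sim_len(0x52C) is 0x200 and sim_base(0x52C) is 0x400, as computed by hand above.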
------------------------------------------
BAGGY BOUNDS CHECKING (32 bit Arch.)
inBounds(p) = check(p,base(p),len(p))
check(p,begin,size) = begin <= p
and p < begin+size
Result of pointer arithmetic:
for computation of
p2 = p + n;
add the code:
p2 = result(p2,p);
where
result(p2,p) =
let begin = base(p)
size = len(p)
in
if check(p2,begin,size)
then p2
else if begin-p2 < 8
and p2 < begin+size+8
then set_high_bit(p2)
else error("illegal pointer")
------------------------------------------
Why set high-order bit of out-of-bounds pointers to 1?
So that the virtual memory hardware will raise a fault if the pointer is used
(this assumes the top half of the 32-bit address space
is not accessible to user code).
Why allow 1/2 a slot under or over?
To allow pointers that are just beyond allocation to be formed,
as required by C (the standard says that a program can form a pointer 1 past end)
How do we find the correct base and length for illegal pointers?
If we know they are illegal, then can subtract 1/2 slot length
to find the base for over the high end
(assuming they are in the left half of the slot)
and similarly by adding 1 slot length
to find the base for under the low end (see p. 54)
is_illegal(p) = is_high_bit_set(p)
illegal_len(p) = if unset_high_bit(p) mod 16 >= 8 /* upper half of slot: p is below base */
then 1 << bt[unset_high_bit(p)/16 + 1]
else 1 << bt[unset_high_bit(p)/16 - 1]
illegal_base(p) = if unset_high_bit(p) mod 16 >= 8 /* upper half of slot: p is below base */
then base(unset_high_bit(p)+8)
else base(unset_high_bit(p)-8)
Why not allow larger than 1/2 a slot under or over bounds?
Because then can't find right base and length!
Need some convention, and it would be ambiguous otherwise.
Do we need any other special treatment of illegal pointers?
Yes (end of section 2.4):
- for inequality comparisons (<, >, <=, >=)
need to unset the high bit first, so compiler injects code for that.
- similarly for pointer differences
- don't allow passing these to uninstrumented code (libraries)
(but, no need to change code that compares for equality (==, !=),
as that will work already)
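Does code need to check bounds of dereferences?
No, bounds are checked when pointers are computed, and dereferencing
a marked (high bit set) pointer traps in the hardware, so it is fast.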
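The pointer arithmetic check from the slide can be sketched in C on simulated 32-bit addresses. The base and length of the original pointer are passed in directly instead of being looked up in bt, addresses are assumed to lie in the lower half of the address space, and the names chk_in_bounds, chk_result and ptr_status are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { CHK_HALF_SLOT = 8 };                 /* half of a 16-byte slot */
#define CHK_HIGH_BIT UINT32_C(0x80000000)

typedef enum { PTR_OK, PTR_MARKED, PTR_ERROR } ptr_status;

/* begin <= p and p < begin+len */
bool chk_in_bounds(uint32_t p, uint32_t begin, uint32_t len) {
    return begin <= p && p < begin + len;
}

/* Classifies the result p2 of arithmetic on a pointer whose object starts
   at begin and is len bytes long. On PTR_OK or PTR_MARKED, *out holds the
   pointer value to use; on PTR_ERROR, *out is left untouched. */
ptr_status chk_result(uint32_t p2, uint32_t begin, uint32_t len, uint32_t *out) {
    assert((uint64_t)begin + len <= CHK_HIGH_BIT);
    if (chk_in_bounds(p2, begin, len)) {
        *out = p2;
        return PTR_OK;
    }
    /* Less than half a slot below or above: keep it, but set the high bit.
       The wider integer types avoid wraparound in the comparisons. */
    if ((int64_t)begin - (int64_t)p2 < CHK_HALF_SLOT &&
        (uint64_t)p2 < (uint64_t)begin + len + CHK_HALF_SLOT) {
        *out = p2 | CHK_HIGH_BIT;
        return PTR_MARKED;
    }
    return PTR_ERROR;                       /* error("illegal pointer") */
}
```

For the 512-byte allocation at 0x400, the one-past-the-end address 0x600 comes back as 0x80000600, while 0x608 is rejected.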
------------------------------------------
WORKING WITH UNINSTRUMENTED CODE
Code in libraries that was not compiled
with baggy bounds checks
bt entries set to maximum possible (31)
------------------------------------------
Q: Will there ever be dereference errors for pointers to space
allocated by library code?
No, but then all those errors will be missed
(false negatives)
So no memory safety for operations in uninstrumented code
Can an out-of-bounds pointer be passed to uninstrumented code?
No, because the address in the pointer is illegal (will trap)
(But the paper says, p. 55, that this doesn't need to happen)
**** for 64 bit systems
------------------------------------------
BAGGY BOUNDS WITH 64 BIT POINTERS
Idea:
Store more information in the pointers
Valid pointer:
[ zeros | log2(size) | address ]
21 bits 5 bits 38 bits
Invalid pointer:
[ offset | log2(size) | zeros | address ]
13 bits 5 bits 8 bits 38 bits
------------------------------------------
What are the advantages of putting this information in the pointers?
- atomicity, still all in one word
- size of pointers stays the same (one word)
- no separate bounds table space
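Packing and unpacking the valid-pointer layout from the slide can be sketched like this. Only the valid layout (21 zero bits, 5 size bits, 38 address bits) is covered, pointers are modeled as uint64_t values, and the tag_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { TAG_ADDR_BITS = 38, TAG_SIZE_BITS = 5 };
#define TAG_ADDR_MASK ((UINT64_C(1) << TAG_ADDR_BITS) - 1)
#define TAG_SIZE_MASK ((UINT64_C(1) << TAG_SIZE_BITS) - 1)

/* Build [ zeros | log2(size) | address ] from an address and e = log2(size). */
uint64_t tag_make(uint64_t addr, unsigned e) {
    assert(addr <= TAG_ADDR_MASK && e <= TAG_SIZE_MASK);
    return ((uint64_t)e << TAG_ADDR_BITS) | addr;
}

/* Low 38 bits: the address itself. */
uint64_t tag_addr(uint64_t p) {
    return p & TAG_ADDR_MASK;
}

/* Next 5 bits: log2 of the allocation size. */
unsigned tag_log2_size(uint64_t p) {
    return (unsigned)((p >> TAG_ADDR_BITS) & TAG_SIZE_MASK);
}

/* Length and base come from the pointer alone, with no table lookup. */
uint64_t tag_len(uint64_t p) {
    return UINT64_C(1) << tag_log2_size(p);
}

uint64_t tag_base(uint64_t p) {
    return tag_addr(p) & ~(tag_len(p) - 1);
}
```

Since the whole tagged pointer is a single 64-bit word, copying it carries the bounds along in one load or store.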
**** evaluation
------------------------------------------
EVALUATING BAGGY BOUNDS SYSTEM
Disadvantages:
Advantages:
------------------------------------------
Can there still be memory attacks with this system?
Yes, from uninstrumented code
and from overflows (within baggy bounds) affecting:
malloc/free system
(overflows destroying invariants)
structures with function pointers
(no bounds checking on pointers in code)
so disadvantages:
... - doesn't prevent all overflow attacks
(due to imprecision from uninstrumented code and bagginess)
- use of more memory (for bounds table in 32 bit systems)
- time overhead for bounds checking
- false alarms from pointers beyond 1/2 slot size
(on 32 bit systems)
- need nontrivial compiler support
... mitigates some buffer overflows
Non-executable Memory
------------------------------------------
NON-EXECUTABLE MEMORY
3 bits for permission on a page:
read (R), write (W), execute (X)
Execute means that
Policy: W xor X
so can't both write and execute
in a page
------------------------------------------
... the instruction pointer (PC)
can point into that page
Hardware checks the permissions on every memory reference
But this makes it hard for a program to generate code and then execute it
What kind of program would need to both write and execute code?
a JIT compiler!
Randomized Address Space (ASLR)
------------------------------------------
RANDOMIZED ADDRESS SPACE
Idea:
- make it harder for attacker
to
Often called ASLR
= Address Space Layout Randomization
------------------------------------------
... know addresses to use as inputs
for stack smashing
i.e., put stack, heap, code at random addresses
makes it harder to get right return address
------------------------------------------
DEFEATING ASLR
Approaches:
- extract the randomness
- heap attack:
. spread shell code all over the heap
. jump to a random address
- nop slide
. put lots of nops in heap
. put shell code at end
. jump to random address
------------------------------------------
... attacker can find out what "random" offsets were applied
(get a leak of the locations used)
practice
------------------------------------------
WHAT IS USED IN PRACTICE?
Both gcc and Visual Studio:
- use stack canaries by default
Linux and Windows both have
W xor X memory
and
ASLR
WHAT IS NOT USED IN PRACTICE?
Baggy bounds
------------------------------------------
Why would systems not use baggy bounds?
overhead costs, false alarms