CIS 6614 meeting -*- Outline -*-

* Reference monitors

Based on notes by Suman Jana,
which are based on slides by Vitaly Shmatikov

Martin Abadi, Mihai Budiu, Ulfar Erlingsson, and Jay Ligatti.
Control-flow integrity. In Proceedings of the 12th ACM conference on
Computer and communications security (CCS '05). ACM, New York, NY, USA,
pages 340--353, 2005. https://doi.org/10.1145/1102120.1102165

and

Ulfar Erlingsson, Martin Abadi, Michael Vrable, Mihai Budiu, and
George C. Necula. XFI: Software guards for system address spaces.
In Proceedings of the 7th symposium on Operating systems design and
implementation (OSDI '06), pages 75--88, Usenix, Nov., 2006.
https://www.usenix.org/legacy/event/osdi06/tech/full_papers/erlingsson/erlingsson.pdf

** problem to solve

------------------------------------------
PROBLEM TO SOLVE

Prevent hijacking of a process
by memory attacks

Solution Approaches (so far):

(a) eliminate code causing overflows:
    - rewrite program (not in C) - expensive
    - static analysis - imprecise for C

(b) make running attacker's code harder:
    - dynamic taint checking - too expensive
    - ASLR - not effective (fork)
    - stack canaries - defeated if use fork
    - W xor X - defeated by ROP

(c) eliminate overflows (dynamically):
    - bounds checking - too expensive
------------------------------------------

Q: Why are the approaches in (a) not practical?

   There is too much legacy code to rewrite it all.
   The C language is hard to analyze, so it is hard to avoid
   false positives.

Q: Why are the approaches in (b) not practical?

   Taint checking requires recompilation and has high runtime overhead.
   ASLR and stack canaries can be defeated by the techniques in the
   Blind Return-Oriented Programming paper
   (in general, techniques that rely on secrets are vulnerable).
   W xor X is defeated by return-oriented programming,
   since only return addresses need to be overwritten...

Q: Why is bounds checking for (c) not practical?

   There are worries about runtime efficiency.

Q: So, what else could be done?
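To make the cost issue in (c) concrete, here is a minimal sketch in C of what a dynamic bounds check looks like: every write pays for a length comparison, which is where the runtime overhead comes from. The names checked_buf and checked_strcpy are invented for this illustration; they are not from any of the papers above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A buffer paired with its capacity, so writes can be checked. */
typedef struct {
    char *data;
    size_t cap;
} checked_buf;

/* Copy src into buf only if it fits (including the NUL terminator).
   Instead of overflowing, returns false and leaves buf untouched. */
bool checked_strcpy(checked_buf *buf, const char *src) {
    size_t need = strlen(src) + 1;
    if (need > buf->cap)
        return false;            /* would overflow: refuse the write */
    memcpy(buf->data, src, need);
    return true;
}
```

The check itself is cheap, but doing it (or a check like it) on every buffer write across a large program is the source of the efficiency worry.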
** another approach

------------------------------------------
ANOTHER APPROACH

Goal: eliminate overflows at runtime.

Constraints:
 - no recompilation
 - ensure attacks must fail
 - be simple to explain
 - be simple to enforce

Approach:
 - Monitor program's execution:
   - prevent attacker from

------------------------------------------

Q: Why should a technique be simple to explain and enforce?

   So that it can be trusted and has a small attack surface
   (and hopefully will protect against new attacks)

... diverting flow of control
    (which is necessary in stack smashing and ROP)
   - exfiltrating confidential data
   - making unauthorized system calls

   in general, prevent the attacker from violating security policies

*** enforceable security policies

------------------------------------------
WHAT POLICIES CAN BE ENFORCED?

Def: a *safety policy* is

Once a safety policy is violated,

What is not a safety policy?
 - policies that concern future events
   e.g.,
 - policies that concern all possible executions
   e.g.,
------------------------------------------

... a predicate on a prefix of a program's execution history
    (if the history is a sequence of states,
     then a safety policy is a set of prefixes of such sequences)
... then it stays violated (forever)
... if a login is attempted, eventually either it is accepted or rejected
... the program should never reveal PII

------------------------------------------
EXAMPLE SAFETY POLICIES

Access control:
 - A process only accesses files in ways permitted by ACLs

Type safety:
 - A program only calls functions with the declared number
   of arguments, of the declared types

Memory safety:
 - a process never writes outside the bounds of an array
 - a process never writes into another process's memory

Control flow safety:
 - A program only makes jumps that were in its original code (CFG)
 - A program can only call library functions that are in its
   original code
------------------------------------------

Q: What is an ACL?
   An access control list; it says, for each user,
   what permissions that user has

Q: Does an Operating System enforce safety policies?

   Yes, for example access control and the second memory safety policy

*** implementation architectures

------------------------------------------
IMPLEMENTATION ARCHITECTURES

Kernelized Reference Monitor (RM):

     [ Application ]
        |      ^
        v      |
   ====================
        [ RM ]
        Kernel

Inline Reference Monitor (IRM):

   Compiler produces:

   [ Application + checks (RM) ]
        |      ^
        v      |
   ====================
        Kernel
------------------------------------------

The Kernelized RM is part of the OS kernel;
it mediates all system calls.

Q: What are the pros and cons of the kernelized approach?

   pros:
     It can be sure to capture all communication with the kernel.
     It can make use of data in the kernel without system calls.
   cons:
     It cannot take advantage of the app's semantics
       (a web browser is a good example of security policies
        that are application-level).
     It is only active when a system call is made.
     It has considerable overhead, due to the need for a context switch
       (so it would only be practical for enforcing properties that
        concern the system calls, not internal application properties).

Q: Could the kernelized approach be generalized to apply to more APIs?

   Yes, the same approach could be used for a library API,
   e.g., for libc

Q: Have we seen anything like an IRM before?

   Yes, bounds checking systems and dynamic taint checking,
   which require code recompilation

Q: What are the pros and cons of the IRM approach?
   pros:
     Can be active at each point of the app.
     Can use data from the app.
     No context switching to the kernel, so more efficient.
   cons:
     Requires recompilation.
     Requires monitoring to prevent jumping around the monitor
       (or jumping into the middle of an instruction, on x86).
     Can only access data in the kernel via a system call
       (and thus requires a context switch, which is costly).

** related work: isolation of processes
*** policy

------------------------------------------
ISOLATION OF PROCESSES

Example of safety policy and enforcement

Policy:
 - Processes cannot write into memory of other processes

Enforces access control and sharing of:
 - CPU
 - storage (disks)
------------------------------------------

*** costs

------------------------------------------
COSTS OF ISOLATION OF PROCESSES

Communication between processes:
 - mediated by kernel
 - Context switch to kernel is expensive

Costs of kernel call:

------------------------------------------

Q: Why is communication between processes mediated by the kernel?

   To enforce safety policies!

... saving context (registers, etc.)
    flushing the TLB (can cause more page faults, which are expensive)
    checking the user ID against the ACL
    copying data to the other process's memory
      (no direct sharing, why?)
    restoring context

Q: Is there a tradeoff shown here?

   Yes, between security and cheap communication

*** software fault isolation

------------------------------------------
SOFTWARE FAULT ISOLATION (SFI)

See:
Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham.
Efficient software-based fault isolation.
In SOSP '93, ACM, NY, Dec. 1993, pages 203--216.
https://doi.org/10.1145/168619.168635

Fault domain:
 - memory controlled by a process
 - code and data in one memory segment

Policy enforced:
 - a process can only write to its own fault domain
 - a process can only jump to its own fault domain

Approach:
 - processes live in the same address space
 - software reference monitor (injected code)
   - masks statically known addresses to be inside the fault domain
   - jump targets must be in a dedicated register
   - that register is only changed by inserted code, to a stored value
------------------------------------------

Q: Does SFI protect against buffer overflows?

   No, it only enforces process isolation

** inline reference monitor

------------------------------------------
INLINE REFERENCE MONITOR

Generalizes SFI to other safety policy enforcement

Policy checks integrated into binary

Relies on SFI to prevent circumvention
of the policy checks
------------------------------------------

... either when compiling or by binary rewriting

*** implementation

------------------------------------------
SASI: ENFORCEMENT OF SAFETY POLICIES

Project from Cornell; see
Ulfar Erlingsson and Fred B. Schneider.
SASI Enforcement of Security Policies: A Retrospective.
In Proceedings New Security Paradigms Workshop 1999.
https://ecommons.cornell.edu/bitstream/handle/1813/7412/99-1758.pdf

Efficiency by eliminating unneeded checks
using partial evaluation

Partial evaluation:
   execute as much of the program as possible at compile time
   example:
     define bool p(int i) {return i > 4;}
     so p(2+3)
        becomes
        becomes
        becomes

Problem: semantic gap between

Approach:
------------------------------------------

... p(5)
    5 > 4
    true
... machine instructions and
    app-level concepts used in policies
... synthesize function calls etc. from machine code

** control-flow integrity

------------------------------------------
CONTROL-FLOW INTEGRITY

See:
Martin Abadi, Mihai Budiu, Ulfar Erlingsson, and Jay Ligatti.
Control-flow integrity. In CCS '05, ACM, NY, pp. 340--353, 2005.
https://doi.org/10.1145/1102120.1102165

Goal: prevent attacker from hijacking control of the ip

Attack model: the attacker can make arbitrary changes
              to the program's memory

Approach:
 - statically determine the program's CFG
 - only allow control to follow the CFG
 - binary instrumentation of code
------------------------------------------

Q: Is that attack model realistic?

   Yes, it allows the attacker to change all of the program's memory.
   Note that the approach has to solve the problem under the attack model.

*** Control-flow Graph Example

------------------------------------------
CONTROL-FLOW GRAPH EXAMPLE
(Figure 1 in Abadi et al., 2005)

#include <stdbool.h>
#include <stdlib.h>

bool lt(int x, int y) { return x <= y; }
bool gt(int x, int y) { return x >= y; }

void sort2(int a[], int b[], int len) {
  qsort(a, len, sizeof(int), lt);
  qsort(b, len, sizeof(int), gt);
}

  sort2():          qsort():          lt():
    ...               ...               label17
    call qsort        call 17,R         ...
    label55           label23           ret 23
    ...               ...
    call qsort        ret 55          gt():
    label55                             label17
    ...                                 ...
    ret                                 ret 23
------------------------------------------

Imagine qsort is implemented using a quick-sort algorithm

draw arrows from calls to the start of the routines called
and (dashed arrows) from each ret to the label it returns to

Q: Can the CFG be computed statically in a precise way?

   No, so some checks are performed dynamically.

*** enforcement

------------------------------------------
CFI ENFORCEMENT

(a) For each control transfer, statically
    determine its possible destination(s)

(b) insert a unique ID at each destination
    - use the same ID for destinations that are
      targets of the same source
      (e.g., labels 55, 23, and 17)

(c) before each transfer of control,
    check that the bit pattern at the target
    matches that found in the CFG
------------------------------------------

A *unique ID* is checked to not occur
in the code or data of the program

Q: Why is using the same ID for destinations that can receive
   control from the same source imprecise?
   Because it won't check that these destinations are
   jumped to in the intended order

------------------------------------------
BINARY REWRITING

rewrite   jmp ecx            ; computed jump
to
          mov eax, 12345677h ; load ID-1
          inc eax            ; ID now in eax
          cmp [ecx+4], eax   ; cmp ID with dest's
          jne error_label    ; fail if not same
          jmp ecx

rewrite   mov eax, [esp+4]   ; destination
to
          prefetchnta        ; label
            [12345678h]      ; ID
          mov eax, [esp+4]   ; destination
------------------------------------------

*** assumptions (section 3.3)

------------------------------------------
ASSUMPTIONS NEEDED FOR SECURE CFI

Unique IDs:
  After CFI instrumentation, the bit patterns
  used as IDs are only present in
  label IDs and ID checks

Non-writeable code:
  the program cannot modify code at runtime

Non-executable data:
  the program cannot execute instructions
  from a data area
------------------------------------------

Q: Is the unique IDs assumption reasonable?

   Yes, there are 2^32 possible values to use

Q: How would the non-writeable code and non-executable data
   assumptions be checked?

   The OS or hardware can enforce W xor X permissions,
   as is already done in Linux and Windows.
   Also need to prohibit system calls that could change
   these protections.

*** fixing imprecision
**** imprecision of equivalent targets

------------------------------------------
IMPRECISION FROM REUSE OF IDs

Suppose in the code:
  A calls C
and
  B calls C and D

Would CFI allow A to call D?

------------------------------------------

... yes, because C and D would have the same ID

Q: How could the CFG be made more precise?

   either
    - duplicate the code for C
      (A calls one copy, B calls another, so use different IDs)
   or
    - use multiple IDs for C
      (one for calls from A, another for calls from B)
   but these both complicate the enforcement mechanism

**** imprecision of not matching calls and returns in time

------------------------------------------
IMPRECISION FROM CALLS AND RETURNS

Suppose in the code:
  A calls F
then
  B calls F

Would CFI allow F to always return to A?

------------------------------------------
...
yes, because the return labels would use the same ID

Q: How could this be made more precise?

   use a call stack that can be trusted,
   a "shadow call stack" in protected memory

------------------------------------------
SHADOW CALL STACK

Track calls on a shadow call stack
to check that each return is to the right place

Does the shadow call stack need to be
protected from the attacker?

------------------------------------------

... yes, otherwise the attacker can control it!
    It is also necessary to protect the shadow call stack from
    malicious changes to the program's execution (e.g., ROP);
    this can be done using memory protection
    (see p. 349 of the CFI paper).

*** evaluation

------------------------------------------
EVALUATION

CFI prevents:
 + smashing the stack (buffer overflow attacks)
 + return-to-libc exploits and ROP
 + changes to function pointers

CFI does not protect against:
   attacks that do not violate the CFG:
   - changing arguments to system calls
     (e.g., changing file names)
   - other attacks that only change data
------------------------------------------
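As a rough analogy in C (not the paper's actual machine-level instrumentation, which works on x86 binaries as in the rewriting above), the ID check that CFI inserts before an indirect transfer can be sketched as: tag each valid destination with an ID word, and have the call site verify the tag before transferring control. The names cfi_target, checked_call, and the constant CFI_ID are invented for this sketch.

```c
#include <assert.h>
#include <stdio.h>

#define CFI_ID 0x12345678u  /* an ID assumed not to occur elsewhere */

/* A destination tagged with the ID that CFI would place at its label. */
typedef struct {
    unsigned id;           /* checked before each indirect transfer */
    int (*fn)(int, int);
} cfi_target;

int lt(int x, int y) { return x < y; }

cfi_target lt_target = { CFI_ID, lt };

/* Call through a function pointer only after checking its ID tag,
   mimicking the cmp/jne pair inserted by CFI's binary rewriting.
   Returns -1 on a CFI violation instead of transferring control. */
int checked_call(cfi_target *t, int x, int y) {
    if (t->id != CFI_ID) {
        /* corresponds to "jne error_label" in the rewritten code */
        fprintf(stderr, "CFI violation\n");
        return -1;
    }
    return t->fn(x, y);
}
```

An attacker who overwrites the function pointer must also supply a matching ID, which (by the unique-IDs assumption) does not occur in attacker-reachable code or data.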