CIS 6614 meeting -*- Outline -*-

* Taint Analysis (or Taint Checking)

** tool support for avoiding injection attacks and other attacks

------------------------------------------
          HOW COULD TOOLS HELP

What could a tool do to prevent code
injection attacks?

------------------------------------------
        ... track data that is controlled by users
            can be used in code reviews (human or automated)

*** taint analysis

------------------------------------------
            TAINT ANALYSIS

Def: *taint* is

Static taint analysis is

Dynamic taint analysis is

History:
  - a feature of Perl since 1989
------------------------------------------
        ... a bit of some bad thing, like a poison or contaminant
            Think of a water supply or bottle of water that is "tainted"
            This is the idea/intuition

        ... a static analysis technique that identifies what data
            may be tainted by user inputs.
            Typically there is a built-in notion of how inputs are sanitized
            (e.g., by calling some function)
            or an annotation for saying when some data is sanitized

        ... a dynamic analysis technique for tracking:
              - user input
              - sensitive data (output)
            that identifies when that data may be used or output
            also tries to identify when taint is sent over the network
            or used in the rest of a system (taint sinks)

        Q: Why would either static or dynamic analysis be favored?
           - Dynamic analysis would work without recompilation or source code,
             and could be more precise (fewer false alarms)
           - Static analysis could work for all possible executions,
             may be more efficient at runtime,
             but could generate more false alarms (false positives)

**** DIFT: architectural support

------------------------------------------
  DYNAMIC INFORMATION FLOW TRACKING (DIFT)

See: G. Edward Suh, Jae W. Lee, David Zhang,
     and Srinivas Devadas.
     "Secure program execution via dynamic
     information flow tracking."
     In ASPLOS XI, pp. 85--96, ACM 2004.
     https://doi.org/10.1145/1024393.1024404

Idea:
  - Hard to stop memory overwrites
    (any bug can cause them)
  - Instead

Approach:
  - architectural support to

  - trap to handler if detects

------------------------------------------
        ... block attacker from changing program's control flow
            so that they can't execute malicious code
        ... track inputs and monitor their use
        ... tainted data being used to change control flow

        Measured overhead only about 1% (both time and space)
        Transparent to users

------------------------------------------
             ATTACK MODEL

Of the Suh et al. 2004 paper:

 - Attacker can send "a malicious input
   that exploits a vulnerability in the program."

 - Programs may be buggy, but not malicious

 - Bugs may cause changes to memory
------------------------------------------
        Explain why attack models are important (for papers especially)

        Q: How could a bug change memory?
           Buffer overflow or format string vulnerability

**** information flow security

        Should not be confused with DIFT
        A more restrictive notion than taint analysis or DIFT (and static!)

------------------------------------------
       INFORMATION FLOW SECURITY

See: Dorothy E. Denning and Peter J. Denning.
     "Certification of programs for secure
     information flow."
     CACM 20(7):504-513, July 1977.
     https://doi.org/10.1145/359636.359712

Confinement problem:
  - "confidential results should not
     depend on non-confidential data"

Policy specified by:
  - set of security classes (e.g., secret)
  - bindings of objects to security classes
  - flow relation specifying permitted flows

Example:

    int Y = 0;
    if (X == 0) {
       Y = 0;
    } else {
       Y = 1;
    }

Approach: static analysis
------------------------------------------
        Q: When does information flow from an object, X,
           to another object, Y?
           When Y is computed using X
           or when the flow of control is influenced by X
           and that flow of control determines computation of Y

        Q: In the example, does information flow from X to Y?
           Yes, from Y's value one can tell if X was 0 (or not)
           although this would be missed by taint checking
           Indeed if X must be either 0 or 1,
           then this is the same as the assignment Y = X;

------------------------------------------
   RULES FOR INFORMATION FLOW ANALYSIS

Taint checking plus:

 - implicit flows:
     from objects used in a condition
     to objects changed by code
     run conditionally
------------------------------------------
        Q: Does taint checking check implicit flows?
           No, only *explicit flows*

**** Does information flow matter?

------------------------------------------
      DOES INFORMATION FLOW MATTER?

Format String attacks:

     printf(str);

Stack could be:

     str:   [  *--]--> "%*x%n\0"

------------------------------------------
        Q: What does %n do in C as a format string?
           it prints nothing; it takes the next argument as a pointer
           (int *) and stores at that address the number of
           characters written so far
        Q: What does %* do in C as a format string?
           it takes the width of the field from the next (int) argument

        ...
     width: [     ]   // used and written by printf
     dest:  [     ]   // written as destination address

        So this writes some values into the stack

        Q: Would the changes to the stack be caught by taint analysis?
           No, since the values come from the stack...
        Q: Could this make an untainted value depend on a tainted value?
           Yes! The problem is jumping via a tainted index

------------------------------------------
      CODE FROM VSPRINTF (EXCERPT)

   if (ch == '+')
      do_plus();
   else if (ch == '%')
      do_percent();
   /* ... */
   else if (ch == '*') {
      width = read_from_stack();
      do_width_asterics ();
   }
------------------------------------------
        Q: If ch is tainted, does taint checking give an alarm here?
           No, it's an implicit flow

**** algorithm/rules for taint analysis

------------------------------------------
          RULES FOR TAINTING

From: Asia Slowinska and Herbert Bos.
      "Pointer tainting still pointless:
      (but we all see the point of tainting)".
      SIGOPS Oper. Syst. Rev. 44(3):88-92, July 2010.
      https://doi.org/10.1145/1842733.1842748

Basic tainting:
  - taint data (bytes) originally

  - taint data that is

Alternative/complementary approach:
  - mark sensitive data
  - warn if marked data is sent
    to an untrusted sink

Can detect attacks on flow of control
------------------------------------------
        ... from untrusted source (e.g., network)
        ... computed using tainted data as input

        Note that Slowinska and Bos say that basic taint checking
        cannot detect attacks that do not change control flow

**** Implementation in valgrind

------------------------------------------
       TAINT CHECKING IN VALGRIND

Works on x86 binaries (no source)

Skins used:
  - TaintSeed: identifies sources
  - TaintTracker: policy for propagation
  - TaintAssert: traps dangerous uses

Implementation:

 TaintSeed:
   - each byte of memory has 4-byte pointer
     to taint structure if that byte is tainted
     (else NULL)

 TaintTracker:
   - sets 4-byte pointer if result of op
     should be tainted

 TaintAssert:
   - checks uses of potentially dangerous
     ops against policy

 Exploit Analyzer:
   - backtrace of taint structures
   - helps understand attack
   - transfer to sandbox for analysis
------------------------------------------
        Q: Why would a user want to understand an attack?
           To check that nothing bad actually happened,
           To recognize attacks more quickly in the future
           To have firewall reject attacks

**** example of taint analysis

------------------------------------------
       EXAMPLE OF TAINT ANALYSIS

example0:
   strcpy(buff, argv[1]);
   /* ... */
   return;

example1:
   x = get_input(user);
   /* ... */
   fptr = (void (*)())(x+42);
   /* ... */
   (*fptr)();

example2:
   x = get_input(user);
   /* ... */
   fptr = load(x);
   /* ... */
   (*fptr)();
------------------------------------------
        Q: In example0, could there be an exploit?
           Yes, the strcpy could write past the end of buff!
        Q: What could happen in example0?
           The return might jump to user-supplied data,
           since the overflow can overwrite the return address!
        Q: Will a static analysis be able to detect the potential exploit?
           Possibly, but hard to avoid false positives and false negatives
        Q: How is example1 different from example0?
           Not much, both could cause a jump using user-supplied data
        Q: In example1, should fptr be tainted?
           Yes, depends on user input!
        Q: How should TaintTracker deal with x+42?
           It should be tainted, depends on x.
        Q: Should calling a function through a tainted pointer raise an alarm?
           Yes, tainted data would then change the flow of control!
        Q: In example2: What does the analysis need to know about load?
           That its result may depend on x (and doesn't sanitize it)
           This might require function summaries or whole program analysis

**** efficiency considerations

------------------------------------------
       EFFICIENCY CONSIDERATIONS

How much memory does taint tracking use?

How much time does taint tracking use?

------------------------------------------
        ... 4 bytes for each byte of data
            (so shadow memory is 4 times the size of the data!)
        ... 1 check per memory access and use
            (so slower by factor of 2!)

**** policy considerations

------------------------------------------
         POLICY CONSIDERATIONS

   x = get_input(user);
   /* ... */
   fptr = load(x);
   /* ... */
   (*fptr)();

Should the call be allowed?

------------------------------------------
        ... Basic policy: taint memory used but not pointers
              so call allowed!
              but then analysis might miss exploits (false negative)

            PI policy: taint value if either pointer or memory is tainted
              so call not allowed!
              but then analysis might warn/disallow useful code
              (false positives)

*** pointer injection, tainting with pointers

------------------------------------------
         POINTER TAINTING RULES

All the rules of basic tainting plus:

 - propagate taint to pointers,

------------------------------------------
        ...
            in C: *p = E; makes both p and *p be tainted if E is

**** rules for pointer injection detection

------------------------------------------
  POINTER INJECTION (PI) DETECTION RULES

From Slowinska and Bos, 2010:

Track pointers:
  - at program start:
     - track all pointers to
       statically allocated memory
  - during runtime:
     - track all pointers returned by
       system/library calls that
       dynamically allocate memory

Taint check:
  - propagation of untrusted data (basic)
  - propagation of pointers

alert when:
  - use untrusted data for control flow
  - dereference untrusted pointer
    not tracked as a pointer
------------------------------------------

**** problems with pointer injection detection

------------------------------------------
     PROBLEMS WITH POINTER INJECTION

What could cause a false negative?

What could cause a false positive?

------------------------------------------
        Note: a false negative means not raising an alarm
              when one should be raised

        ... - tracking a user-supplied pointer as a (legit) pointer

        Q: Does the C programming language make it easy to find pointers?
           No, can convert ints to pointers (and vice versa)
           So, could scan a binary for possible pointer values
           but being too permissive could lead to false negatives
           (This is especially a problem on Windows,
            as in Linux one can find pointers using constants
            in the header of an object file)

        ...
            - tracking a legit pointer as tainted

------------------------------------------
      CONFLICTING USE CASES FOR PI

Slowinska and Bos (2010) observe that:

 - table lookups frequent and confuse analysis
   but
   - should not propagate taint for
     memory corruption analysis
     (to prevent false positives)
   - should propagate taint for
     malware analysis
     (to prevent false negatives)
------------------------------------------
        Example of table lookup, translating ASCII to UNICODE or vice versa

        Slowinska and Bos argue that it is undecidable
        when to propagate taint (and that there is no safe answer)
        and that in malware analysis there will be too many false positives

** evaluation of taint checking

------------------------------------------
      EVALUATION OF TAINT CHECKING

Advantages:
  + dynamic checking prevents many
    false positives

Disadvantages:
  - doesn't check information flow

      int Y = 0;
      if (X == 0) {
         Y = 0;
      } else {
         Y = 1;
      }

  - misses changes to array indexes

  - PI has trouble balancing
    false positives vs. false negatives
------------------------------------------
        It's a general problem in security that it's hard to tolerate
        either false positives (too many alarms ==> people stop using it)
        or false negatives (too few alarms ==> exploits!)
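
A minimal sketch in C of the "doesn't check information flow" disadvantage,
assuming a hypothetical value type tval that pairs an int with one shadow
taint bit. The names tval, from_user, constant, add, branch_on, sink_allows,
and demo are made up for illustration; they are not from Valgrind or from the
cited papers. add propagates taint along an explicit flow, while branch_on
repeats the X/Y example: under the basic tainting rules Y stays untainted,
although its value reveals whether X was 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tracked value: an int plus one shadow taint bit. */
typedef struct {
    int  value;
    bool tainted;
} tval;

/* Taint source: data that arrives from the user is marked tainted. */
tval from_user(int v) { tval t = { v, true };  return t; }

/* Constant from the program text: never tainted. */
tval constant(int v)  { tval t = { v, false }; return t; }

/* Explicit flow: the result is tainted if either operand is tainted. */
tval add(tval a, tval b) {
    tval r = { a.value + b.value, a.tainted || b.tainted };
    return r;
}

/* Implicit flow: y is only ever assigned constants, so the basic
   tainting rules leave it untainted even though the branch taken
   depends on the (possibly tainted) x. */
tval branch_on(tval x) {
    tval y = constant(0);
    if (x.value == 0) {
        y = constant(0);
    } else {
        y = constant(1);
    }
    return y;
}

/* Taint sink: a simple policy that refuses tainted values,
   e.g., as jump targets. */
bool sink_allows(tval v) { return !v.tainted; }

/* Usage: runs both flows on the same tainted input. */
void demo(void) {
    tval x = from_user(7);

    tval e = add(x, constant(42));   /* explicit flow from x */
    assert(e.tainted);
    assert(!sink_allows(e));         /* alarm would be raised */

    tval y = branch_on(x);           /* implicit flow from x */
    assert(y.value == 1);            /* y reveals that x != 0 ... */
    assert(!y.tainted);              /* ... but carries no taint */
    assert(sink_allows(y));          /* so no alarm (false negative) */
}
```

An information flow analysis in the style of Denning and Denning would
instead have to treat y as depending on x, because the branch condition
uses x and both branches assign to y.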