CIS 6614 meeting -*- Outline -*-

* Detecting and Preventing SQL injection and XSS attacks

** goal/problem

------------------------------------------
  DETECTING SQL INJECTION AND XSS ATTACKS

Based on paper

Adam Kiezun, Philip J. Guo,
Karthick Jayaraman, and Michael D. Ernst.
Automatic creation of SQL Injection and
cross-site scripting attacks. In ICSE '09.
IEEE Computer Society, USA, pp. 199--209, 2009.
https://doi.org/10.1109/ICSE.2009.5070521

Goal: Automatically expose SQL injection
      and XSS vulnerabilities in code

Claims:
  - create real attack vectors
  - few false positives
  - no runtime overhead
  - handles dynamic code, i.e., PHP code
  - handles type 2 XSS attacks
------------------------------------------

Q: What is a type 2 XSS attack?

        an attack that stores malicious input in the system
        (in this case, in a database), from which it is later
        sent to other users' browsers

** Approach

------------------------------------------
Approach:

  - Automatically generate inputs
  - Dynamically track taint
  - Mutate inputs to produce exploits
------------------------------------------

Q: Is this a static or dynamic approach?

        dynamic, which is important for having few false positives

** Background

------------------------------------------
         PHP WEB APPLICATIONS

 $_GET[]  ---> [ PHP ]
               [ WEB ] --> HTML page
 $_POST[] ---> [ APP ]

 Browser  ---> [ Server  ] --SQL-> |
               [ running ]         | DBMS
 HTML     <--- [ PHP     ] <-Data- |
------------------------------------------

        The $_GET[] and $_POST[] superglobal arrays in PHP
        map keys to values taken from the request's query string
        and from submitted forms

        A PHP web application transforms inputs (GET and POST)
        into web pages, which are displayed by a browser

        The server can store data in a database

*** example

------------------------------------------
    EXAMPLE: BULLETIN BOARD (p. 201)

function addMessageForTopic() {
  if (!isset($_GET['msg'])
      || !isset($_GET['topicid'])
      || !isset($_GET['poster'])) {
    exit;
  }
  $my_msg = $_GET['msg'];
  $my_topicid = $_GET['topicid'];
  $my_poster = $_GET['poster'];

  //construct SQL statement
  $sqlstmt = "INSERT INTO messages VALUES('$my_msg','$my_topicid')";

  //store message in database
  $result = mysql_query($sqlstmt);
  echo "Thank you, $my_poster";
}

What if addMessageForTopic is called with:

   mode == add
   topicid == 1
   msg == '<script>alert("XSS")</script>'
   poster == Villain
------------------------------------------

Q: What happens in that case?

        The script is stored in the database,
        then every user who displays that topic gets a popup/alert.

Q: Is there a type 1 or type 2 XSS attack?

        This one is type 2: the message (msg) with the script is stored...

Q: What's the output in this case?

        "Thank you, Villain"

Q: Could this program have a type 1 XSS attack?

        Yes, if the poster name contains a script,
        since $my_poster is echoed back unsanitized.

------------------------------------------
      TYPE 2 (STORED) XSS ATTACK

$_GET[] is:
   mode == add
   topicid == 1
   msg == '<script>alert("XSS")</script>'
   poster == Villain

    ~~> addMessageForTopic()

    ~~> Database System
------------------------------------------

Q: What does the DBMS store for the message?

        A script!

Q: Does the attacker need to do anything to get users
   to execute the script?

        No, some users will display the messages for that topic,
        and then their browsers will run the script!
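
        The stored (type 2) flow above can be imitated outside PHP.
        The following is a minimal Python sketch, not the paper's code:
        it assumes an in-memory SQLite table named messages, and the
        functions add_message and display_messages are hypothetical
        stand-ins for the PHP functions. The INSERT is parameterized
        here only to isolate the XSS problem; the payload is an assumed
        example script.

```python
import sqlite3

def add_message(db, msg, topicid, poster):
    """Store the message unchanged, as the PHP example does (no sanitizing)."""
    # Parameterized on purpose: this sketch is about stored XSS, not SQL injection.
    db.execute("INSERT INTO messages VALUES (?, ?)", (msg, topicid))
    return f"Thank you, {poster}"

def display_messages(db, topicid):
    """Build the HTML page for a topic by echoing stored messages verbatim."""
    rows = db.execute(
        "SELECT msg FROM messages WHERE topicid = ?", (topicid,)
    ).fetchall()
    return "".join(f"<p>{msg}</p>" for (msg,) in rows)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (msg TEXT, topicid TEXT)")

# Request 1 (the attacker): stores a script as the message.
reply = add_message(db, '<script>alert("XSS")</script>', "1", "Villain")

# Request 2 (any later visitor): the stored script is sent to the browser.
page = display_messages(db, "1")
print(reply)
print(page)
```

        The attacker's own response looks harmless; the script only
        shows up in the page built for the second request.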
** Ardilla system architecture

------------------------------------------
        ARDILLA SYSTEM OVERVIEW

          -> [ Input Generator ]
         /
 PHP                                DB
 Source->[ Taint Propagator ] <-->  +
 Code                               Taint
         \                          Tracking
          \->[ Attack Generator ]
             [ / Checker        ]

 overall output: malicious inputs
                 causing XSS attacks
------------------------------------------

*** input generation via concolic execution

------------------------------------------
 INPUT GENERATION VIA CONCOLIC EXECUTION

For this code:

  if($_GET['mode'] == "add")
    addMessageForTopic();
  else if($_GET['mode'] == "display")
    displayAllMessagesForTopic();
  else
    exit;

Example inputs:

   mode == "1"
   topicid == "1"
   msg == "1"
   poster == "1"

Where does that lead?

What would be the PC?

What could be the next example?

And then?
------------------------------------------

        ... with mode == "1" neither test succeeds, so the program
        exits, and the PC is:

        $_GET['mode'] != "add" && $_GET['mode'] != "display"

Q: How does concolic execution proceed?

        it negates the last conjunct and solves, so we get
        ...
           mode == "display"
           topicid == "1"
           msg == "1"
           poster == "1"

        that doesn't lead to a DB update (sensitive sink),
        so take the remaining prefix of the PC:

        $_GET['mode'] != "add"

        and try negating that conjunct, to get something like:
        ...
           mode == "add"
           topicid == "1"
           msg == "1"
           poster == "1"

        this does lead to a sensitive sink

*** taint tracking

------------------------------------------
           SENSITIVE SINKS

For SQL injection attack detection:

    mysql_query()

For XSS:

    echo(), print()
------------------------------------------

------------------------------------------
             TAINT SETS

Track taint set for each value:
   set of input variables that may influence it

e.g., after

   $my_msg = $_GET['msg'];

the taint set of $my_msg is:
------------------------------------------

Q: How would the program represent an untainted value?

        it has an empty taint set

Q: What would be the taint set of the value of $_GET['msg']?

        ... {msg}

Q: What would be the taint after concatenating two values?

        the union of the two values' taint sets

Q: What would be the taint set of the result of a sanitizing function?

        empty!
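
        The taint-set rules in these questions can be written down as a
        small Python sketch. This is not Ardilla's implementation (which
        lives inside the PHP interpreter); the class Tainted and the
        helper functions are hypothetical names, and html.escape is
        assumed to stand in for a sanitizing function.

```python
import html
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """A string value paired with the set of input names that may influence it."""
    value: str
    taint: frozenset = frozenset()   # empty set means untainted

def from_input(params, key):
    """Reading an input parameter taints the value with that parameter's name."""
    return Tainted(params[key], frozenset({key}))

def literal(text):
    """A constant in the program text is untainted."""
    return Tainted(text)

def concat(a, b):
    """Concatenation: the result carries the union of both taint sets."""
    return Tainted(a.value + b.value, a.taint | b.taint)

def chop_trailing_whitespace(a):
    """Not a sanitizer: the taint set is passed through unchanged."""
    return Tainted(a.value.rstrip(), a.taint)

def sanitize(a):
    """Assumed sanitizer: escapes HTML and clears the taint set."""
    return Tainted(html.escape(a.value), frozenset())

params = {"msg": "hello  ", "poster": "Villain"}
my_msg = from_input(params, "msg")
greeting = concat(literal("Thank you, "), from_input(params, "poster"))
print(sorted(my_msg.taint), sorted(greeting.taint))
```

        Keeping the set of names, rather than a single tainted/untainted
        bit, is what lets the attack generator know which inputs to
        replace later.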

Q: What would be the taint set after chopping off whitespace
   from the end of a string value?

        the same as it was before!
        (this doesn't sanitize anything!)

------------------------------------------
   BUILDING THE TAINT PROPAGATION CODE

"implemented by modifying
 the Zend PHP interpreter" -- p. 204
------------------------------------------

Q: Why modify a PHP interpreter?

        To get most of the details right

*** Attack Generation and Checking

------------------------------------------
          ATTACK GENERATION

1. Find PC leading to a sensitive sink

2. For each such PC:
      Replace each tainted input string
      with possible attack string
------------------------------------------

        PC is a path condition, as in symbolic execution

        The attack strings are taken from the work of
        security professionals

Q: What else could be done in step 2?

        Use an SMT solver to generate a concrete input

Q: Why do this (only) when reaching a sensitive sink?

        It could be expensive, so only do it when we know
        it may matter (could lead to an attack)

------------------------------------------
           ATTACK CHECKING

Report XSS attack only if:
------------------------------------------

        ... the output produced with the tainted value replaced
        by an attack string contains a script
        (i.e., a script is output to the user's browser)

Q: Why is that the right thing to check?

        The essence of an XSS attack is getting the user's browser
        to run a script, so the attack needs to get one into the output

Q: What should be checked for SQL injection attacks?

        that the parse of the query passed to mysql_query is different
        from the parse for the original input,
        so that the attack string changes the shape of the command

Q: What would happen in the bulletin board example?

        (show that)

*** Concrete and symbolic database

------------------------------------------
    CONCRETE AND SYMBOLIC DATABASE

Goal:
   - find stored type 2 XSS attacks

Approach:
   - store values + taint information as

Example:

  msg      topicid  msg_s   topicid_s
  ======================================
  "Test"   1        {}      {}
  "Attack" 2        {msg}   {topicid}

Tracking symbolic state:
   - dynamically rewrite SQL
     to read or write taint sets
------------------------------------------

        ... taint sets in added columns
        (compare fig. 5 in the paper)

        "Each concrete column ([the] left-most two columns)
         has a symbolic counterpart ([the] right-most two columns)
         that contains taint sets.
         The {} values represent empty taint sets." p. 205

Q: What are some SQL statements that would need rewriting?

        Implementation handles CREATE TABLE, INSERT, UPDATE, SELECT

        Note that: DELETE and WHERE need no rewriting

Q: What would be involved in the rewrite?

        Tracking the taint.
        E.g., for UPDATE, storing the taint sets of the values
        in the corresponding symbolic columns of the table

------------------------------------------
      FINDING TYPE 2 XSS ATTACKS

let P be a program
    db be database state

attacks := {}
inputs := {}
dbsym := makeSymbolicCopy(db)
while not timeExpired() do {
   inputs := inputs + generateNewInput(P)
   input1 := pickInput(inputs)
   input2 := pickInput(inputs)
   (taints1,dbsym1) := exec(P,input1,dbsym)
   (taints2,dbsym2) := exec(P,input2,dbsym1)
   attacks := attacks
              + checkAttacks(taints2,
                    P,(input1,input2))
}
return attacks
------------------------------------------

Q: What should generateNewInput(P) do?

        use concolic execution techniques to get to
        a sensitive sink (if possible)

   What if the input doesn't do that?

        then checkAttacks will return an empty set

Q: Why are two inputs needed?

        One to store an (attack) value in the database,
        and another to retrieve the (attack) value

Q: Should there be any link between input1 and input2?

        Yes, input1 should write a database table, say X,
        and input2 should read from that table, X

Q: What should checkAttacks do?

        it should replace the first input with an attack pattern
        (like <script>alert("XSS")</script>),
        rerun both inputs, and check the output

------------------------------------------
 EXAMPLE OF A TYPE 2 ATTACK GENERATION

input1 is
   $_GET['mode'] == 'add'
   $_GET['topicid'] == 1
   $_GET['msg'] == 1
   $_GET['poster'] == 1

input2 is
   $_GET['mode'] == 'display'
   $_GET['topicid'] == 1

Running input1 puts in database:
   msg == 1, topicid == 1,
   msg_s == {msg}, topicid_s == {topicid}

Running input2
   retrieves msg from database
      with msg_s == {msg}
   sends value of msg to echo()

Use an attack pattern to alter input1's msg to:
   msg == '<script>alert("XSS")</script>'

Run program on altered input1 and input2

See that a script gets into output
------------------------------------------

Q: What is special about echo?

        It's a sensitive sink!

Q: What does this sequence tell us?

        That the altered input1 and the unchanged input2
        together form an attack

*** evaluation

------------------------------------------
             EVALUATION

On 5 open source programs:

  Program      Source Size
  =========================
  schoolmate    8181 LOC
  webchess      4722 LOC
  faqforge      1712 LOC
  EVE            915 LOC
  geccbblite     326 LOC

  Program      Type   Vuln.   False Pos
  =========================================
  schoolmate   XSS1    10        0
               XSS2     2        0
  webchess     XSS1    13        0
               XSS2     0        0
  faqforge     XSS1     4        0
               XSS2     0        0
  EVE          XSS1     2        0
               XSS2     2        0
  geccbblite   XSS1     0        0
               XSS2     4        0
  =========================================
  Total        XSS1    29        0
               XSS2     8        0
------------------------------------------

        Section 5 gives details; the numbers reported are for
        "strict mode" (as described above for checking attacks)

        These seem very good, with no false positives!

*** Vs. other approaches

------------------------------------------
     COMPARISON WITH OTHER APPROACHES

Defensive coding:
  + can completely prevent attacks,
    if done properly
  - must re-write existing code

Static analysis:
  + could prove absence of errors
  - false positives,
  - doesn't produce concrete attacks

Dynamic monitoring:
  + can prevent all attacks
  - runtime overhead
  - false positives affect app. behavior

Random fuzzing:
  + easy to use,
  + produces concrete attacks
  - creates mostly invalid inputs
------------------------------------------

Q: Is there a way to know if defensive coding is "done properly"?

        Yes, using static analysis
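
        To make the "defensive coding" row concrete, here is a minimal
        Python sketch of what "done properly" might look like for the
        bulletin board. It is an illustration under assumptions, not
        code from the paper: the SQLite table and the function names
        add_message_safely and display_messages_safely are hypothetical.
        SQL parameters keep input out of the query's syntax, and
        html.escape keeps input from being read as markup.

```python
import html
import sqlite3

def add_message_safely(db, params):
    """Insert with bound parameters, so input cannot change the query's shape."""
    db.execute(
        "INSERT INTO messages VALUES (?, ?)",
        (params["msg"], params["topicid"]),
    )
    # Escape at the output sink, so a script in poster is shown as text.
    return "Thank you, " + html.escape(params["poster"])

def display_messages_safely(db, topicid):
    """Escape every stored message before it reaches the page."""
    rows = db.execute(
        "SELECT msg FROM messages WHERE topicid = ?", (topicid,)
    ).fetchall()
    return "".join(f"<p>{html.escape(msg)}</p>" for (msg,) in rows)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (msg TEXT, topicid TEXT)")
attack = {
    "msg": '<script>alert("XSS")</script>',
    "topicid": "1",
    "poster": "<b>Villain</b>",
}
print(add_message_safely(db, attack))
print(display_messages_safely(db, "1"))
```

        Escaping at the sinks (the response and the displayed page)
        matches where the notes place the danger: echo() and print()
        for XSS, and the query call for SQL injection.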