CIS 6614 meeting -*- Outline -*-
* Detecting and Preventing SQL injection and XSS attacks
** goal/problem
------------------------------------------
DETECTING SQL INJECTION AND XSS ATTACKS
Based on paper
Adam Kiezun, Philip J. Guo,
Karthick Jayaraman, and Michael D. Ernst.
Automatic creation of SQL Injection and
cross-site scripting attacks.
In ICSE '09. IEEE Computer Society, USA,
pp. 199--209, 2009.
https://doi.org/10.1109/ICSE.2009.5070521
Goal:
Automatically expose SQL injection
and XSS vulnerabilities in code
Claims:
- create real attack vectors
- few false positives
- no runtime overhead
- handles dynamic code,
i.e., PHP code
- handles type 2 XSS attacks
------------------------------------------
Q: What is a type 2 XSS attack?
an attack that stores malicious input in the system
(in this case, in a database)
** Approach
------------------------------------------
Approach:
- Automatically generate inputs
- Dynamically track taint
- Mutate inputs to produce exploits
------------------------------------------
Q: Is this a static or dynamic approach?
dynamic, which is important for having few false positives
** Background
------------------------------------------
PHP WEB APPLICATIONS
$_GET[]  ---> [   PHP   ]
              [   WEB   ] --> HTML page
$_POST[] ---> [   APP   ]
Browser  ---> [ Server  ] --SQL-> |
              [ running ]         | DBMS
HTML     <--- [   PHP   ] <-Data- |
------------------------------------------
The $_GET[] and $_POST[] superglobal arrays in PHP
map keys to values taken from the request's query string and submitted forms.
A PHP web application transforms inputs (GET and POST) into web
pages, which are displayed by a browser.
The server can store data in a database.
*** example
------------------------------------------
EXAMPLE: BULLETIN BOARD (p. 201)
function addMessageForTopic() {
  if (!isset($_GET['msg']) ||
      !isset($_GET['topicid']) ||
      !isset($_GET['poster'])) {
    exit;
  }
  $my_msg = $_GET['msg'];
  $my_topicid = $_GET['topicid'];
  $my_poster = $_GET['poster'];
  //construct SQL statement
  $sqlstmt = "INSERT INTO messages
    VALUES('$my_msg','$my_topicid')";
  //store message in database
  $result = mysql_query($sqlstmt);
  echo "Thank you, $my_poster";
}
What if addMessageForTopic is called with:
mode == add
topicid == 1
msg == '<script>alert("XSS")</script>'
poster == Villain
------------------------------------------
Q: What happens in that case?
The script is stored in the database,
then every user gets a popup/alert.
Q: Is there a type 1 or type 2 XSS attack?
This one is type 2, the message (msg) with the script is stored...
Q: What's the output in this case?
"Thank you, Villain"
Q: Could this program have a type 1 XSS attack?
Yes, if the poster name contains a script.
------------------------------------------
TYPE 2 (STORED) XSS ATTACK
$_GET[] is:
mode == add
topicid == 1
msg == '<script>alert("XSS")</script>'
poster == Villain
~~>
addMessageForTopic()
~~>
Database System
------------------------------------------
Q: What does the DBMS store for the message?
A script!
Q: Does the attacker need to do anything to get users to
execute the script?
No, some users will display the messages for that topic,
and then they will run the script!
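A minimal sketch of the stored (type 2) flow above, written in Python rather than PHP so that it runs standalone. The in-memory SQLite table and the names make_db, add_message, and display_messages are assumptions made for illustration; they are not from the paper.

```python
import sqlite3

def make_db():
    """Create an in-memory stand-in for the bulletin board database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE messages (msg TEXT, topicid TEXT)")
    return db

def add_message(db, params):
    """Mimics addMessageForTopic(): stores msg unchanged, echoes poster."""
    # Placeholders are used here so the sketch isolates the XSS problem
    # from the SQL injection problem.
    db.execute("INSERT INTO messages VALUES (?, ?)",
               (params["msg"], params["topicid"]))
    return "Thank you, " + params["poster"]

def display_messages(db, params):
    """Mimics a display page: copies stored messages into HTML unescaped."""
    rows = db.execute(
        "SELECT msg FROM messages WHERE topicid = ? ORDER BY rowid",
        (params["topicid"],)).fetchall()
    return "<html>" + "".join("<p>" + msg + "</p>" for (msg,) in rows) + "</html>"

db = make_db()
attack = '<script>alert("XSS")</script>'
# The villain's request stores the script.
reply = add_message(db, {"mode": "add", "topicid": "1",
                         "msg": attack, "poster": "Villain"})
# Any later visitor who displays topic 1 receives the script.
page = display_messages(db, {"mode": "display", "topicid": "1"})
print(reply)
print(page)
```

The attacker's own reply page is harmless; the script only shows up in the page built for a later visitor.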
** Ardilla system architecture
------------------------------------------
ARDILLA SYSTEM OVERVIEW
         -> [ Input Generator ]
        /
PHP                                  DB
Source->[ Taint Propagator ] <-->    +
Code                                 Taint
        \                            Tracking
         \->[ Attack Generator ]
            [ / Checker        ]
overall output:
malicious inputs
causing XSS attacks
------------------------------------------
*** input generation via concolic execution
------------------------------------------
INPUT GENERATION VIA CONCOLIC EXECUTION
For this code:
if($_GET['mode'] == "add")
  addMessageForTopic();
else if($_GET['mode'] == "display")
  displayAllMessagesForTopic();
else
  exit;
Example inputs:
mode == "1"
topicid == "1"
msg == "1"
poster == "1"
Where does that lead?
What would be the PC?
What could be the next example?
And then?
------------------------------------------
... $_GET['mode'] != "add"
&& $_GET['mode'] != "display"
Q: How does concolic execution proceed?
it negates the last conjunct of the PC and solves,
so we get
... mode == "display"
topicid == "1"
msg == "1"
poster == "1"
then that doesn't lead to a DB update (sensitive sink),
so with PC:
$_GET['mode'] != "add"
try negating another conjunct, get something like:
... mode == "add"
topicid == "1"
msg == "1"
poster == "1"
this does lead to a sensitive sink
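The exploration above can be sketched as follows. This is a toy, not Ardilla's input generator: the path condition is a list of (key, constant, taken) triples, and the "solver" only handles equalities and disequalities against string constants. The names run, solve, and explore are hypothetical.

```python
def run(inputs):
    """Run the toy dispatcher; return (outcome, path condition)."""
    pc = []
    mode = inputs["mode"]
    taken = (mode == "add")
    pc.append(("mode", "add", taken))
    if taken:
        return "addMessageForTopic", pc
    taken = (mode == "display")
    pc.append(("mode", "display", taken))
    if taken:
        return "displayAllMessagesForTopic", pc
    return "exit", pc

def solve(pc, defaults):
    """Find inputs satisfying pc, or None (constants-only toy solver)."""
    inputs = dict(defaults)
    for key, const, taken in pc:
        if taken:
            inputs[key] = const          # satisfy key == const
    for key, const, taken in pc:
        if (inputs[key] == const) != taken:
            return None                  # a conjunct is violated
    return inputs

def explore(defaults):
    """Negate conjuncts (last one first) to reach new paths."""
    worklist = [dict(defaults)]
    seen, results = set(), []
    while worklist:
        inputs = worklist.pop(0)
        outcome, pc = run(inputs)
        if tuple(pc) in seen:
            continue                     # path already covered
        seen.add(tuple(pc))
        results.append((inputs["mode"], outcome))
        for i in reversed(range(len(pc))):
            key, const, taken = pc[i]
            # Keep the prefix, flip conjunct i, drop everything after it.
            candidate = solve(pc[:i] + [(key, const, not taken)], defaults)
            if candidate is not None:
                worklist.append(candidate)
    return results

paths = explore({"mode": "1", "topicid": "1", "msg": "1", "poster": "1"})
for mode, outcome in paths:
    print(mode, "->", outcome)
```

Starting from the all-"1" input, the sketch visits exit, then the display path, then the add path, in the same order as the walk-through.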
*** taint tracking
------------------------------------------
SENSITIVE SINKS
For SQL injection attack detection:
mysql_query()
For XSS:
echo(),
print()
------------------------------------------
------------------------------------------
TAINT SETS
Track taint set for each value:
set of variables that may influence it
e.g., after
$my_msg = $_GET['msg'];
the taint set of $my_msg is:
------------------------------------------
Q: How would the program represent an untainted value?
it has an empty taint set
Q: What would be the taint set of the value of $_GET['msg']?
... {msg}
Q: What would be the taint after concatenating two values?
the union of the values' taint sets
Q: What would be the taint set of the result of a sanitizing
function?
empty!
Q: What would be the taint set after chopping off whitespace
from the end of a string value?
the same as it was before! (this doesn't sanitize anything!)
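The answers above can be sketched as a small value wrapper. This is an illustration only (Ardilla tracks taint inside a modified PHP interpreter); the class Tainted and the helpers from_input, literal, and sanitize are hypothetical names, and html.escape stands in for whatever sanitizing function the application uses.

```python
from dataclasses import dataclass
import html

@dataclass(frozen=True)
class Tainted:
    """A string value together with its taint set (names of input parameters)."""
    value: str
    taint: frozenset = frozenset()

    def __add__(self, other):
        # Concatenation: the result is influenced by both operands.
        return Tainted(self.value + other.value, self.taint | other.taint)

    def rstrip(self):
        # Trimming whitespace does not sanitize, so the taint is unchanged.
        return Tainted(self.value.rstrip(), self.taint)

def from_input(params, key):
    """Reading $_GET[key]: the taint set is {key}."""
    return Tainted(params[key], frozenset({key}))

def literal(text):
    """A constant in the program: empty taint set."""
    return Tainted(text)

def sanitize(t):
    """A sanitizing function: the result has an empty taint set."""
    return Tainted(html.escape(t.value), frozenset())

params = {"msg": "hi   ", "poster": "Villain"}
my_msg = from_input(params, "msg")
greeting = literal("Thank you, ") + from_input(params, "poster")
print(my_msg.taint, greeting.taint, (my_msg + greeting).taint)
print(my_msg.rstrip().taint, sanitize(my_msg).taint)
```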
------------------------------------------
BUILDING THE TAINT PROPAGATION CODE
"implemented by modifying the Zend PHP
interpreter"
-- p. 204
------------------------------------------
Q: Why modify a PHP interpreter?
To get most of the details right
*** Attack Generation and Checking
------------------------------------------
ATTACK GENERATION
1. Find PC leading to a sensitive sink
2. For each such PC:
Replace each tainted input string
with possible attack string
------------------------------------------
PC is a path condition, as in symbolic execution
The attack strings are taken from work of security professionals
Q: What else could be done in step 2?
Use an SMT solver to generate a concrete input
Q: Why do this (only) when reaching a sensitive sink?
It could be expensive, so only do it when we know it may
matter (i.e., when it could lead to an attack)
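Step 2 of attack generation might look like the sketch below. The two attack strings are placeholders chosen for illustration, not the library used in the paper, and generate_candidates is a hypothetical name.

```python
# Illustrative attack strings only; a real tool would draw these from a
# curated library compiled by security professionals.
ATTACK_STRINGS = ['<script>alert("XSS")</script>', "' OR '1'='1"]

def generate_candidates(inputs, sink_taint, attack_strings=ATTACK_STRINGS):
    """Yield copies of `inputs` in which one tainted parameter is replaced.

    `inputs` reached a sensitive sink, and `sink_taint` is the taint set of
    the value that arrived there. Untainted parameters are left alone so the
    candidate still follows the same path.
    """
    for key in sorted(sink_taint):
        for attack in attack_strings:
            candidate = dict(inputs)
            candidate[key] = attack
            yield candidate

base = {"mode": "add", "topicid": "1", "msg": "1", "poster": "1"}
candidates = list(generate_candidates(base, {"msg", "topicid"}))
for c in candidates:
    print(c)
```

Each candidate still has to be run and checked; replacing a string only proposes an attack.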
------------------------------------------
ATTACK CHECKING
Report XSS attack only if:
------------------------------------------
... result with replaced tainted value (with attack string)
contains a script (i.e., script is output to user's browser)
Q: Why is that the right thing to check?
The essence of an XSS attack is getting the user's browser
to run a script, so the output must actually contain one
Q: What should be checked for SQL injection attacks?
that the parse of the query passed to mysql_query is different,
so that the attack string changes the shape of the command
Q: What would happen in the bulletin board example?
(show that)
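A rough sketch of both checks on the bulletin board example. The regular-expression "skeleton" is a crude stand-in for a real SQL parse, and all function names here are hypothetical.

```python
import re

def contains_new_script(benign_output, attack_output):
    """XSS check: the attack run emits more <script> tags than the benign run."""
    pattern = re.compile(r"<script\b", re.IGNORECASE)
    return len(pattern.findall(attack_output)) > len(pattern.findall(benign_output))

def sql_skeleton(query):
    """Collapse string and number literals, then split into tokens."""
    no_strings = re.sub(r"'(?:[^']|'')*'", "?", query)
    no_numbers = re.sub(r"\b\d+\b", "?", no_strings)
    return re.findall(r"\?|\w+|[^\w\s]", no_numbers)

def changes_query_structure(benign_query, attack_query):
    """SQL injection check: the attack string changed the shape of the query."""
    return sql_skeleton(benign_query) != sql_skeleton(attack_query)

def build_query(msg, topicid):
    """Same string interpolation as addMessageForTopic()."""
    return "INSERT INTO messages VALUES('%s','%s')" % (msg, topicid)

benign = build_query("1", "1")
script = build_query('<script>alert("XSS")</script>', "1")
quote = build_query("x'); DROP TABLE messages; --", "1")

print(changes_query_structure(benign, script))   # script stays inside a literal
print(changes_query_structure(benign, quote))    # quote escapes the literal
print(contains_new_script("Thank you, 1",
                          'Thank you, <script>alert("XSS")</script>'))
```

The script payload does not change the query's shape, so it would not be reported as SQL injection, but it is reported as XSS once it reaches echo.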
*** Concrete and symbolic database
------------------------------------------
CONCRETE AND SYMBOLIC DATABASE
Goal:
- find stored type 2 XSS attacks
Approach:
- store values + taint information
as
Example:
msg       topicid   msg_s   topicid_s
======================================
"Test"    1         {}      {}
"Attack"  2         {msg}   {topicid}
Tracking symbolic state:
- dynamically rewrite SQL to
read or write taint sets
------------------------------------------
... taint sets in added columns
(compare fig. 5 in the paper)
"Each concrete column ([the] left-most two columns)
has a symbolic counterpart ([the] right-most two columns)
that contains taint sets.
The {} values represent empty taint sets." p. 205
Q: What are some SQL statements that would need rewriting?
Implementation handles
CREATE TABLE, INSERT, UPDATE, SELECT
Note that: DELETE and WHERE need no rewriting
Q: What would be involved in the rewrite?
Tracking the taint.
E.g., for update, storing the taints of the values
in the appropriate column of the table
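A sketch of the effect of the rewriting, using SQLite. It builds the extended statements directly instead of parsing and rewriting arbitrary SQL as the paper's implementation does. The "_s" suffix follows the table above; the comma-separated encoding of taint sets and all function names are assumptions.

```python
import sqlite3

def encode(taint):
    """Store a taint set as comma-separated text (an assumed encoding)."""
    return ",".join(sorted(taint))

def decode(text):
    return frozenset(text.split(",")) if text else frozenset()

def create_with_shadow(db, table, columns):
    """CREATE TABLE, rewritten to add one taint column per concrete column."""
    shadow = [c + "_s" for c in columns]
    cols = ", ".join(c + " TEXT" for c in columns + shadow)
    db.execute("CREATE TABLE %s (%s)" % (table, cols))

def insert_with_taint(db, table, values, taints):
    """INSERT, rewritten to also store the taint set of each value."""
    row = list(values) + [encode(t) for t in taints]
    marks = ", ".join("?" for _ in row)
    db.execute("INSERT INTO %s VALUES (%s)" % (table, marks), row)

def select_with_taint(db, table, column):
    """SELECT, rewritten to also read the column's taint sets."""
    rows = db.execute("SELECT %s, %s_s FROM %s ORDER BY rowid"
                      % (column, column, table)).fetchall()
    return [(value, decode(taint)) for value, taint in rows]

db = sqlite3.connect(":memory:")
create_with_shadow(db, "messages", ["msg", "topicid"])
insert_with_taint(db, "messages", ["Test", "1"], [set(), set()])
insert_with_taint(db, "messages", ["Attack", "2"], [{"msg"}, {"topicid"}])
result = select_with_taint(db, "messages", "msg")
print(result)
```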
------------------------------------------
FINDING TYPE 2 XSS ATTACKS
let P be a program
    db be database state
attacks := {}
inputs := {}
dbsym := makeSymbolicCopy(db)
while not timeExpired() do
{
  inputs := inputs + generateNewInput(P)
  input1 := pickInput(inputs)
  input2 := pickInput(inputs)
  (taints1,dbsym1) := exec(P,input1,dbsym)
  (taints2,dbsym2) := exec(P,input2,dbsym1)
  attacks := attacks
             + checkAttacks(taints2,
                            P,(input1,input2))
}
return attacks
------------------------------------------
Q: What should generateNewInput(P) do?
generate a new input using concolic execution techniques,
trying to reach a sensitive sink (if possible)
What if the input doesn't do that?
then checkAttacks will return an empty set
Q: Why are two inputs needed?
One to store an (attack) value in the database, and
another to retrieve the (attack) value
Q: Should there be any link between input1 and input2?
Yes, input1 should write a database table, say X,
and input2 should read from that table, X
Q: What should checkAttacks do?
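it should replace tainted values in the first input with an attack pattern
(like <script>alert("XSS")</script>), rerun both inputs,
and check whether a script reaches the output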
------------------------------------------
EXAMPLE OF A TYPE 2 ATTACK GENERATION
input1 is
$_GET['mode'] == 'add'
$_GET['topicid'] == 1
$_GET['msg'] == 1
$_GET['poster'] == 1
input2 is
$_GET['mode'] == 'display'
$_GET['topicid'] == 1
Running input1 puts in database:
msg == 1,
topicid == 1,
msg_s == {msg},
topicid_s == {topicid}
Running input2
retrieves msg from database
with msg_s == {msg}
sends value of msg to echo()
Use an attack pattern to alter
input1's msg to:
msg ==
"<script>alert('XSS')</script>"
Run program on altered input1 and input2
See that a script gets into output
------------------------------------------
Q: What is special about echo?
It's a sensitive sink!
Q: What does this sequence tell us?
That the altered input1 followed by the unchanged input2 is an attack
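The two-input check can be sketched end to end on a toy version of the bulletin board. Here the "symbolic database" is just a Python list of (msg, taint set) rows, and app and check_attacks are hypothetical stand-ins for exec and checkAttacks; the attack pattern is illustrative.

```python
ATTACK = "<script>alert('XSS')</script>"   # illustrative attack pattern

def app(inputs, db):
    """Toy bulletin board; db is a list of (msg, taint_set) rows.

    Returns (html_output, taint set of the data that reached echo).
    """
    if inputs.get("mode") == "add":
        db.append((inputs["msg"], frozenset({"msg"})))
        return "Thank you, " + inputs["poster"], frozenset({"poster"})
    if inputs.get("mode") == "display":
        out, taint = "", frozenset()
        for msg, msg_taint in db:
            out += "<p>" + msg + "</p>"
            taint |= msg_taint           # stored taint flows to the output
        return out, taint
    return "", frozenset()

def check_attacks(input1, input2):
    """Return altered (input1, input2) pairs whose second run outputs a script."""
    db = []
    app(input1, db)
    _, taints2 = app(input2, db)
    attacks = []
    # Only parameters of input1 that taint input2's output are worth altering.
    for key in sorted(taints2 & set(input1)):
        altered = dict(input1)
        altered[key] = ATTACK
        db = []                          # rerun from a fresh database state
        app(altered, db)
        output2, _ = app(input2, db)
        if "<script" in output2.lower():
            attacks.append((altered, input2))
    return attacks

input1 = {"mode": "add", "topicid": "1", "msg": "1", "poster": "1"}
input2 = {"mode": "display", "topicid": "1"}
found = check_attacks(input1, input2)
print(found)
```

Note that poster is not altered: it taints the output of the first run only, which is a type 1 concern, not a stored one.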
*** evaluation
------------------------------------------
EVALUATION
On 5 open source programs:
Program      Source Size
=========================
schoolmate   8181 LOC
webchess     4722 LOC
faqforge     1712 LOC
EVE           915 LOC
geccbblite    326 LOC
Program      Type   Vuln.   False Pos
=========================================
schoolmate   XSS1    10        0
             XSS2     2        0
webchess     XSS1    13        0
             XSS2     0        0
faqforge     XSS1     4        0
             XSS2     0        0
EVE          XSS1     2        0
             XSS2     2        0
geccbblite   XSS1     0        0
             XSS2     4        0
=========================================
Total        XSS1    29        0
             XSS2     8        0
------------------------------------------
Section 5 of the paper gives details;
the numbers reported are for "strict mode"
(as described above for checking attacks)
These seem very good, with no false positives!
*** Vs. other approaches
------------------------------------------
COMPARISON WITH OTHER APPROACHES
Defensive coding:
+ can completely prevent attacks,
if done properly
- must re-write existing code
Static analysis:
+ could prove absence of errors
- false positives,
- doesn't produce concrete attacks
Dynamic monitoring:
+ can prevent all attacks
- runtime overhead
- false positives affect app. behavior
Random fuzzing:
+ easy to use,
+ produces concrete attacks
- creates mostly invalid inputs
------------------------------------------
Q: Is there a way to know if defensive coding is
"done properly"?
Yes, using static analysis
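For comparison, a minimal sketch of what defensive coding could look like for the bulletin board, again in Python with SQLite as a stand-in for the PHP application. Query placeholders and html.escape are one common choice, not a technique taken from the paper; the function names are hypothetical.

```python
import html
import sqlite3

def add_message(db, params):
    """Store the message with query placeholders; escape what is echoed."""
    db.execute("INSERT INTO messages VALUES (?, ?)",
               (params["msg"], params["topicid"]))
    return "Thank you, " + html.escape(params["poster"])

def display_messages(db, topicid):
    """Escape stored messages on output, so stored scripts are inert."""
    rows = db.execute(
        "SELECT msg FROM messages WHERE topicid = ? ORDER BY rowid",
        (topicid,)).fetchall()
    return "".join("<p>" + html.escape(msg) + "</p>" for (msg,) in rows)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (msg TEXT, topicid TEXT)")
reply = add_message(db, {"msg": "x'); DROP TABLE messages; --",
                         "topicid": "1",
                         "poster": "<script>alert(1)</script>"})
add_message(db, {"msg": '<script>alert("XSS")</script>',
                 "topicid": "1", "poster": "Villain"})
page = display_messages(db, "1")
count = db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(reply)
print(page)
print(count)
```

Both attack strings are stored as plain data and rendered as text, so neither the query shape nor the page's scripts change.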