CIS 6614 meeting -*- Outline -*-

* Concolic Testing

** What is concolic testing?

   See paper by Koushik Sen, "Concolic testing".
   In ASE '07: Proceedings of the twenty-second IEEE/ACM
   international conference on Automated software engineering,
   November 2007, pp. 571--572.
   https://doi.org/10.1145/1321631.1321746

   Earliest reference is Eric Larson and Todd Austin,
   "High coverage detection of input-related security faults".
   In 12th USENIX Security Symposium, pp. 121--133, 2003,
   USENIX Association.
   https://www.usenix.org/legacy/events/sec03/tech/full_papers/larson/larson.pdf
   but that is of limited generality
   (focuses on array accesses in C)

*** problems with symbolic execution

------------------------------------------
  PRACTICAL PROBLEMS WITH SYMBOLIC EXECUTION

Loops and recursion:

Path explosion:

Heap modeling:

SMT solver limitations:

Environment modeling:

Coverage problem:
------------------------------------------
   ... Loops and recursion: give infinite execution trees
   ... Path explosion: exponentially many paths
   ... Heap modeling: symbolic data structures and pointers
         must be handled
   ... SMT solver limitations: solvers are incomplete,
         and slower for larger path conditions
   ... Environment modeling: native/system/library calls,
         file operations, and network events must be handled
   ... Coverage problem: symbolic execution may not reach deep
         into the execution tree,
         especially when there are loops/recursion

   This last problem is addressed by concolic execution

*** solution approach

------------------------------------------
          CONCOLIC EXECUTION

Concolic = CONCrete + SymbOLIC

goal:
------------------------------------------
   ... reach deep into the program's execution tree
       run more efficiently

**** algorithm

------------------------------------------
       CONCOLIC TESTING ALGORITHM

Start: with a random (concrete) input

Collect:

Explore by:

Then:
------------------------------------------
   ...
       a path constraint (PC) during execution
       it has the form: e.g., A && B && C
       where A, B, and C are symbolic expressions
       (abstractions of the input as in symbolic execution)
   ... negating the last conjunct in the PC
       e.g., A && B && !C
       solve this new PC to get a concrete input
   ... repeat (e.g., until all branches are covered)

**** example

------------------------------------------
               EXAMPLE

Consider program:

void testme (int x, int y) {
    int z;
    l1: z = 2*y;
    if (l2: z != x) {
        goto last;
    } else if (l3: x <= y+10) {
        goto last;
    } else {
        l4: assert false;
    }
    last: ;
}

CFG of this is:
------------------------------------------

              [ l1: z = 2*y ]
                     |
                     v
              [ l2: z != x ]
       2*Y!=X /            \ 2*Y==X
             /              v
            |        [ l3: x <= y+10 ]
            |       (*) /          \ 2*Y==X && X>Y+10
            |          /            \
            v         v              v
          [ last: ; ]        [ l4: assert false; ]

       (*) is 2*Y==X && X<=Y+10

------------------------------------------
            EXAMPLE TRACE

after  Concrete        Symbolic       PC
       x==22, y==7     x==X, y==Y     true
l1:    x==22, y==7,    x==X, y==Y     true
       z==14           z==2*Y
l2:

l3:
------------------------------------------
   ... l2:  x==22, y==7,   x==X, y==Y    2*Y!=X
            z==14          z==2*Y
       l3:  not reached on this run
            (control goes from l2 directly to last)

   Q: Where (what label) does this concrete execution end at?
      last, because z != x holds at l2 (14 != 22)
   Q: What do we do to explore more paths?
      negate the last conjunct in the PC
   Q: Why not start from the other end of the PC?
      because that wouldn't use other parts of the PC that we know
      (so would move the execution to some random spot...)
   Q: How do we find another concrete input to test?
      solve the new PC with the last conjunct negated

------------------------------------------
        EXPLORING ANOTHER PATH

Modified PC from previous run:

Possible concrete input:
------------------------------------------
      !(2*Y != X)
      which is: 2*Y == X

   Q: What's a concrete input (for X and Y) that satisfies the modified PC?
      (we could find this using an SMT solver)
      let x == 2, y == 1
      (note that 2*1==2, and also 2<=1+10)

------------------------------------------
        SECOND EXAMPLE TRACE

after  Concrete        Symbolic       PC
       x==  , y==      x==X, y==Y     true
l1:    x==  , y==  ,   x==X, y==Y     true
       z==             z==2*Y
l2:

l3:
------------------------------------------
   (keep going until we hit assert false)

   Q: Where (what label) does this concrete execution end at?
      last, because x <= y+10 holds at l3 (2 <= 11)
   Q: What do we do to explore more paths?
      negate the last conjunct in the PC
   Q: Why not start from the other end of the PC?
      because that wouldn't use other parts of the PC that we know
      (so would move the execution to some random spot...)
   Q: How do we find another concrete input to test?
      solve the new PC with the last conjunct negated

------------------------------------------
        EXPLORING ANOTHER PATH

Modified PC from previous run:

Possible concrete input:
------------------------------------------
      2*Y == X && !(X<=Y+10)
      which is: 2*Y == X && X>Y+10

   Q: What's a concrete input (for X and Y) that satisfies the modified PC?
      (we could find this using an SMT solver)
      e.g., x == 30, y == 15
      (note that 2*15==30 && 30>15+10)

------------------------------------------
        THIRD EXAMPLE TRACE

after  Concrete        Symbolic       PC
       x==  , y==      x==X, y==Y     true
l1:    x==  , y==  ,   x==X, y==Y     true
       z==             z==2*Y
l2:

l3:
------------------------------------------
   (an input with 2*Y == X && X>Y+10 reaches l4: assert false)

*** limitations

------------------------------------------
             LIMITATIONS

Path space explored:

             /\
            /  \
           /    \
          /      \
         /        \
        /          \
       /            \
      /              \
     /                \
------------------------------------------
   Q: How would you describe the paths explored by concolic testing?
      some random zig-zag deep into the tree,
      and then broadening out from the last branch...
   ...
       draw a path explored by concolic execution
       as a path going from top to bottom,
       then a triangle of exploration at the end

   So although the path space of the program is huge,
   the concolic execution only explores a small part of it

** hybrid concolic testing

------------------------------------------
  ANOTHER VIEW OF CONCOLIC EXPLORATION
          OF THE PATH SPACE

             Program paths
 |-----------------------------------|
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |-----------------------------------|
------------------------------------------
   ... Draw concolic testing as circles expanding
       after some initial probe (the concrete input)

   Q: Is there a way to explore more of the path space?
      Yes, combine with more random inputs
      (hybrid of testing + concolic execution)

------------------------------------------
    HYBRID RANDOM + CONCOLIC TESTING

             Program paths
 |-----------------------------------|
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |                                   |
 |-----------------------------------|
------------------------------------------
   ... draw several random probes
       and expanding circles at the end of each

   Hybrid approach can give more coverage
   than either random testing or concolic testing

   Experimental results in the following suggest that
   "for the same testing budget, almost 4 [times] the branch coverage
   than random testing and almost 2 [times] that of concolic testing."

   Rupak Majumdar and Koushik Sen. "Hybrid Concolic Testing".
   In ICSE '07: Proceedings of the 29th international conference
   on Software Engineering, pp. 416--426, May 2007.
   https://doi.org/10.1109/ICSE.2007.41