CIS 6614 meeting -*- Outline -*-

* Concolic Testing

** What is concolic testing?
   See paper by Koushik Sen, "Concolic testing".
   In ASE '07: Proceedings of the twenty-second
   IEEE/ACM international conference on Automated software engineering
   November 2007, pp. 571--572.
   https://doi.org/10.1145/1321631.1321746

   The earliest reference is
   
   Eric Larson and Todd Austin
   "High coverage detection of input-related security faults".
   In 12th USENIX Security Symposium, pp. 121-133, 2003,
   USENIX Association.
   https://www.usenix.org/legacy/events/sec03/tech/full_papers/larson/larson.pdf

   but that work is of limited generality (it focuses on array accesses in C).
   
*** problems with symbolic execution
------------------------------------------
PRACTICAL PROBLEMS WITH SYMBOLIC EXECUTION

Loops and recursion:


Path explosion:

Heap modeling:


SMT solver limitations:


Environment modeling:


Coverage Problem:


------------------------------------------
   ... Loops and recursion:
          give infinite execution trees

   ... Path explosion:
          exponentially many paths

   ... Heap modeling:
         symbolic data structures and pointers must be handled

   ... SMT solver limitations:
         Solvers are incomplete, and slower for larger path conditions

   ... Environment modeling:
         native/system/library calls/file operations/network events
         must be handled

   ... Coverage Problem:
         Symbolic execution may not reach deep into the execution tree,
            especially when there are loops or recursion.

         This last problem is the one that concolic execution addresses.
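
   A minimal sketch (not taken from the papers cited in these notes) of why
   loops lead to path explosion. The hypothetical function popcount_trace has
   one input-dependent branch per loop iteration, so a loop of n iterations
   has 2^n distinct paths. The names popcount_trace and count_paths are made
   up for this illustration, and the 16-bit cap is an arbitrary choice.

```c
#include <assert.h>

/* Hypothetical program under test: the branch in each iteration depends on
   one bit of the input x. *trace records which iterations took the branch. */
unsigned popcount_trace(unsigned x, unsigned nbits, unsigned *trace) {
    unsigned count = 0;
    *trace = 0;
    for (unsigned i = 0; i < nbits; i++) {
        if (x & (1u << i)) {        /* input-dependent branch */
            count++;
            *trace |= (1u << i);    /* remember "branch taken at iteration i" */
        }
    }
    return count;
}

/* Counts the distinct branch traces (paths) over all nbits-bit inputs. */
unsigned long count_paths(unsigned nbits) {
    static unsigned char seen[1u << 16];
    unsigned long paths = 0;
    assert(nbits <= 16);
    for (unsigned long i = 0; i < (1ul << 16); i++) seen[i] = 0;
    for (unsigned x = 0; x < (1u << nbits); x++) {
        unsigned trace;
        popcount_trace(x, nbits, &trace);
        if (!seen[trace]) { seen[trace] = 1; paths++; }
    }
    return paths;
}
```

   Each extra loop iteration doubles the result of count_paths, which is the
   growth that a purely symbolic exploration of every path would face.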

*** solution approach
------------------------------------------
       CONCOLIC EXECUTION

Concolic = CONCrete + SymbOLIC

goal:


------------------------------------------
   ... reach deep into the execution tree of the program, and
       run more efficiently than purely symbolic execution

**** algorithm
------------------------------------------
      CONCOLIC TESTING ALGORITHM

Start:
   with a random (concrete) input

Collect:


Explore by:


Then:


------------------------------------------
   ... a path constraint (PC) during execution;
       it is a conjunction of branch conditions,
           e.g., A && B && C
       where A, B, and C are symbolic expressions over the inputs
           (as in symbolic execution)

   ... negating the last conjunct in the PC,
           e.g., A && B && !C,
       and solving this new PC to get a new concrete input

   ... repeat (e.g., until all branches are covered)
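
   The loop above can be sketched in C as follows. This is only an
   illustration under stated assumptions: the program under test is the
   two-branch testme used as the example in these notes, the instrumentation
   is written by hand, and the hypothetical solve function is a brute-force
   search over a small box of inputs standing in for an SMT solver. The names
   Conjunct, Path, cond, run_testme, solve, and concolic are all invented
   here. Negating only the last conjunct of the latest run is enough for this
   tiny program; on larger programs a tool also has to keep track of which
   branches it has already flipped.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_BRANCHES = 8, LIMIT = 50 };

/* One conjunct of a path constraint: which branch test, and its outcome. */
typedef struct { int id; bool taken; } Conjunct;
typedef struct { Conjunct c[MAX_BRANCHES]; int len; bool hit_l4; } Path;

/* Branch conditions of testme as predicates over the inputs X and Y. */
bool cond(int id, int x, int y) {
    return id == 2 ? (2 * y != x)     /* l2: z != x, where z == 2*Y */
                   : (x <= y + 10);   /* l3 */
}

/* Hand-instrumented testme: runs concretely and records the PC. */
Path run_testme(int x, int y) {
    Path p = { .len = 0, .hit_l4 = false };
    bool b2 = cond(2, x, y);
    p.c[p.len++] = (Conjunct){ 2, b2 };
    if (b2) return p;                 /* goto last */
    bool b3 = cond(3, x, y);
    p.c[p.len++] = (Conjunct){ 3, b3 };
    if (!b3) p.hit_l4 = true;         /* l4: assert false */
    return p;
}

/* Stand-in for an SMT solver: brute-force search of a small input box. */
bool solve(const Conjunct *pc, int len, int *x, int *y) {
    for (int cx = -LIMIT; cx <= LIMIT; cx++)
        for (int cy = -LIMIT; cy <= LIMIT; cy++) {
            bool ok = true;
            for (int i = 0; i < len && ok; i++)
                ok = (cond(pc[i].id, cx, cy) == pc[i].taken);
            if (ok) { *x = cx; *y = cy; return true; }
        }
    return false;
}

/* Concolic loop: run, negate the last conjunct, solve, repeat.
   Returns true and stores the input if l4 is reached within max_runs. */
bool concolic(int x, int y, int max_runs, int *bad_x, int *bad_y) {
    for (int run = 0; run < max_runs; run++) {
        Path p = run_testme(x, y);
        if (p.hit_l4) { *bad_x = x; *bad_y = y; return true; }
        p.c[p.len - 1].taken = !p.c[p.len - 1].taken;  /* negate last conjunct */
        if (!solve(p.c, p.len, &x, &y)) return false;  /* no input in the box */
    }
    return false;
}
```

   Starting from the concrete input x == 22, y == 7, a call such as
   concolic(22, 7, 10, &bx, &by) reaches l4 after flipping first the test at
   l2 and then the test at l3.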

**** example
------------------------------------------
             EXAMPLE

Consider program:

void testme (int x, int y) {
  int z;
  l1: z = 2*y;
  if (l2: z != x) {
     goto last;
  } else if (l3: x <= y+10) {
     goto last;
  } else { l4: assert false; }
  last: ;
}

CFG of this is:


------------------------------------------

     [ l1: z = 2*y ]
       |
       v
     [ l2: z != x ]
2*Y!=X/    \  2*Y == X
     /      v
    |    [ l3: x <= y+10 ]
    | (*)/    \ 2*Y==X && X>Y+10
    |   /      \
    v  v        v
  [ last:; ] [ l4: assert false; ]

    (*) is 2*Y==X && X<=Y+10
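
    The slide's testme is pseudocode (its labels sit inside the conditions).
    A runnable C rendering, as a sketch: the hypothetical function testme_end
    returns the name of the label reached instead of executing assert false,
    so that callers can observe the outcome without aborting. The comments
    give the path condition of each edge in the CFG.

```c
#include <assert.h>
#include <string.h>

/* Returns the label at which testme(x, y) ends: "last" or "l4". */
const char *testme_end(int x, int y) {
    int z = 2 * y;                    /* l1 */
    if (z != x) return "last";        /* l2 true:  2*Y != X */
    if (x <= y + 10) return "last";   /* l3 true:  2*Y == X && X <= Y+10 */
    return "l4";                      /* l3 false: 2*Y == X && X >  Y+10 */
}
```

    For instance, the inputs (22, 7) and (2, 1) both end at last, while
    (30, 15), an input picked here because 2*15 == 30 and 30 > 15+10, ends
    at l4.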

------------------------------------------
      EXAMPLE TRACE

after Concrete       Symbolic      PC

      x==22, y==7  x==X, y==Y      true

l1:   x==22, y==7, x==X, y==Y      true
      z==14        z==2*Y

l2:


l3:


------------------------------------------
    
   ...
l2:   x==22, y==7, x==X, y==Y   2*Y!=X
      z==14        z==2*Y

l3:   (not reached in this run: the test at l2 is true, so control goes to last)

     Q: Where (what label) does this concrete execution end at?
          last, because z != x (14 != 22) at l2

     Q: What do we do to explore more paths?
          negate the last conjunct in the PC

     Q: Why not start from the other end of the PC?
          because that wouldn't use other parts of the PC that we know
          (so would move the execution to some random spot...)

     Q: How do we find another concrete input to test?
         solve the new PC with the last conjunct negated

------------------------------------------
           EXPLORING ANOTHER PATH

Modified PC from previous run:


Possible concrete input:


------------------------------------------
    !(2*Y != X)
    which is: 2*Y == X

    Q: What's a concrete input (for X and Y) that satisfies the modified PC?
        (we could find this using an SMT solver)
          let x == 2, y == 1 (note that 2*1==2)

------------------------------------------
      SECOND EXAMPLE TRACE

after Concrete       Symbolic      PC

      x==  , y==   x==X, y==Y      true

l1:   x==  , y== , x==X, y==Y      true
      z==          z==2*Y

l2:


l3:


------------------------------------------

   ...
      x==2, y==1   x==X, y==Y      true

l1:   x==2, y==1,  x==X, y==Y      true
      z==2         z==2*Y

l2:   x==2, y==1,  x==X, y==Y   2*Y==X
      z==2         z==2*Y

l3:   x==2, y==1,  x==X, y==Y   2*Y==X
      z==2         z==2*Y       && X<=Y+10

     (keep going until we hit assert false)

     Q: Where (what label) does this concrete execution end at?
          last, because 2 <= 11

     Q: What do we do to explore more paths?
          negate the last conjunct in the PC

     Q: Why not start from the other end of the PC?
          because that wouldn't use other parts of the PC that we know
          (so would move the execution to some random spot...)

     Q: How do we find another concrete input to test?
         solve the new PC with the last conjunct negated

------------------------------------------
           EXPLORING ANOTHER PATH

Modified PC from previous run:


Possible concrete input:


------------------------------------------

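    2*Y == X && !(X<=Y+10)
    which is: 2*Y == X && X>Y+10

    Q: What's a concrete input (for X and Y) that satisfies the modified PC?
        (we could find this using an SMT solver)
          e.g., x == 30, y == 15 (note that 2*15==30 && 30>15+10)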

------------------------------------------
        THIRD EXAMPLE TRACE

after Concrete       Symbolic      PC

      x==  , y==   x==X, y==Y      true

l1:   x==  , y== , x==X, y==Y      true
      z==          z==2*Y

l2:


l3:


------------------------------------------


*** limitations
------------------------------------------
          LIMITATIONS

Path space explored:

                /\
               /  \
              /    \
             /      \
            /        \
           /          \
          /            \
         /              \
        /                \
        
------------------------------------------
        Q: How would you describe the paths explored by concolic testing?
             some random zig-zag deep into the tree,
                  and then broadening out from the last branch...

  ... draw a path explored by concolic execution
       as a path going from top to bottom,
          then a triangle of exploration at the end

      So although the path space of the program is huge,
         the concolic execution only explores a small part of it

** hybrid concolic testing
------------------------------------------
  ANOTHER VIEW OF CONCOLIC EXPLORATION
          OF THE PATH SPACE


     Program paths

  |-----------------------------------|
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |-----------------------------------|


------------------------------------------
    ... Draw concolic testing as circles expanding
          after some initial probe (the concrete input)

    Q: Is there a way to explore more of the path space?
         Yes, combine with more random inputs
           (hybrid of random testing + concolic execution)

------------------------------------------
  HYBRID RANDOM + CONCOLIC TESTING

     Program paths

  |-----------------------------------|
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |                                   |
  |-----------------------------------|


------------------------------------------

    ... draw several random probes and expanding circles at the end of each
    

    The hybrid approach can give more coverage than either random testing or
    concolic testing alone.

    Experimental results in the following paper suggest that the hybrid
    approach gives, "for the same
    testing budget, almost 4 [times] the branch coverage than random testing
    and almost 2 [times] that of concolic testing."

    Rupak Majumdar and Koushik Sen.
    "Hybrid Concolic Testing".
    In ICSE '07: Proceedings of the 29th international conference on
    Software Engineering, pp. 416--426, May 2007.
    https://doi.org/10.1109/ICSE.2007.41
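
    A minimal sketch of the control structure of the hybrid idea, under
    heavy assumptions: each random probe is followed by a few concolic steps
    that negate the last conjunct of the path just run. It is not the
    algorithm or tool from the paper. The program under test is the small
    testme example from these notes, which is too small to show any coverage
    benefit; the point is only how the two phases interleave. The names
    next_random, run_cover, flip_last, and hybrid are invented here, the LCG
    is a toy generator used for reproducibility, and the brute-force search
    in flip_last stands in for an SMT solver.

```c
#include <assert.h>
#include <stdbool.h>

enum { LIMIT = 50 };
enum { L2_TRUE = 1, L2_FALSE = 2, L3_TRUE = 4, L3_FALSE = 8 };

/* Toy linear congruential generator; yields values in [-LIMIT, LIMIT]. */
static unsigned lcg_state = 1u;
int next_random(void) {
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (int)((lcg_state >> 16) % (2 * LIMIT + 1)) - LIMIT;
}

/* Runs testme concretely; returns the set of branch directions taken.
   L3_FALSE means the run reached l4 (assert false). */
unsigned run_cover(int x, int y) {
    if (2 * y != x) return L2_TRUE;
    return L2_FALSE | ((x <= y + 10) ? L3_TRUE : L3_FALSE);
}

/* Concolic step: keep the PC prefix, negate its last conjunct, and search
   a small input box for an input that follows the new PC. */
bool flip_last(int *x, int *y) {
    unsigned path = run_cover(*x, *y);
    for (int cx = -LIMIT; cx <= LIMIT; cx++)
        for (int cy = -LIMIT; cy <= LIMIT; cy++) {
            unsigned cand = run_cover(cx, cy);
            /* After {l2 true} we need l2 false; otherwise we need
               l2 false again with the other outcome at l3. */
            bool ok = (path == L2_TRUE)
                          ? (cand & L2_FALSE) != 0
                          : (cand & L2_FALSE) != 0 && cand != path;
            if (ok) { *x = cx; *y = cy; return true; }
        }
    return false;
}

/* Hybrid loop: random probes, each followed by local concolic expansion.
   Returns the set of branch directions covered. */
unsigned hybrid(int probes, int steps) {
    unsigned covered = 0;
    for (int p = 0; p < probes; p++) {
        int x = next_random(), y = next_random();   /* random probe */
        covered |= run_cover(x, y);
        for (int s = 0; s < steps && flip_last(&x, &y); s++)
            covered |= run_cover(x, y);             /* concolic expansion */
    }
    return covered;
}
```

    In this sketch two concolic steps after any single probe are enough to
    cover both outcomes at l3, including the one that reaches l4, which
    random probes alone hit only when 2*y == x happens to hold.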