CIS 6614 meeting -*- Outline -*-

* Side-channel attacks in Web Applications
  Based on notes from Suman Jana and
  Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang

  The target is web apps,
      esp. software as a service
  (side-channel attacks also apply to other encrypted traffic,
      e.g., ssh, voip, video streaming, Tor)

------------------------------------------
     SIDE CHANNEL ATTACKS ON WEB APPS

Side-channel attacks targeting networked
apps, for example:
      - ssh
      - Voice over IP
      - Video streaming
      - Tor
      - software-as-a-service apps
          e.g., Salesforce

Based on:

  S. Chen, R. Wang, X. Wang and K. Zhang,
  "Side-Channel Leaks in Web Applications:
  A Reality Today, a Challenge Tomorrow,"
  In IEEE Symp. on Security and Privacy,
  2010, pp. 191-206,
  doi: 10.1109/SP.2010.20.
------------------------------------------

** attack model
------------------------------------------
        ATTACK MODEL FOR WEB APPS

Assume encryption used so
    attacker cannot see message contents
    
Attacker can see for each message:

  - Number sent
  - Timing and direction
  - Size


------------------------------------------

        Q: Is there a military analogy here?
        Yes, this is typical on a battlefield using radio communications

        Q: What could an army infer about an enemy from radio traffic?
            (even if it can't understand any details,
             but can tell where it's coming from,
             and what it's about in general, like requests for supplies)
           traffic analysis, see if there is a lot of traffic,
               e.g., in one area of the front
               or less traffic in some area


        Q: Could an attacker identify what app is running? How?
           Yes, e.g., by a reverse DNS lookup (nslookup)
           on the server's IP address

** vulnerabilities
------------------------------------------
  DIFFERENCE FROM APP ON SINGLE COMPUTER

Internal communications are


Effects, leaks of:

    - personal health data
    - family income
    - investment details
    - search queries

despite use of HTTPS and WPA2 encryption

Causes:
   - stateful communication
   - low entropy input
   - significant traffic distinctions
------------------------------------------
        ... split between computers,
              client and server,
            and sent over network

        Q: What's the analogy of traffic analysis for a web app?
            Seeing packets and sizes and timings

------------------------------------------
     WEB-BASED PRIVACY VULNERABILITIES

Attackers can fingerprint web pages:

     - resource objects of diff. sizes


How?


Web flows split between client & server:

   - input points
   - program logic
   - program states
------------------------------------------

            "because each web page has a distinct size,
            and usually loads some resource objects (e.g., images)
            of different sizes, the attacker can fingerprint
            the page so that even when a user visits it
            through HTTPS, the page can be re-identified"

            ... attacker tries out each option
                on page,
                sees sizes of responses

            Features of Web 2.0 (AJAX...) make
            "side-channel vulnerability fundamental to
            Web 2.0 applications"

            A similar problem is being able to learn something
            about a process running on the same machine
            (e.g., in cloud computing)

** mitigations
------------------------------------------
          MITIGATIONS APP-SPECIFIC

Mitigations different for each app:

Revise:
 - feature designs,
 - traffic characteristics,
 - publicly available domain knowledge

to


Need to protect app state transitions

------------------------------------------
        ... avoid traffic analysis

** model of attack, measurement

*** ambiguity reduction
------------------------------------------
   ATTACKER'S GOAL: REDUCE AMBIGUITY

Ambiguity set of data:


Measuring loss of ambiguity

     If ambiguity set is reduced to
        1/R of its original size,
        then
              
              
------------------------------------------
        ... set containing all possible values
            of that data

        ... log_2(R) bits of entropy are lost

        The attacker tries to reduce ambiguity in the signal
            determine which possible states/transitions are happening
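
        A minimal sketch of the log_2(R) measure, assuming all values
        in the ambiguity set are equally likely. The function name and
        the example sizes are made up for illustration.

```python
import math

def entropy_lost_bits(size_before: int, size_after: int) -> float:
    """Bits of entropy lost when an ambiguity set shrinks.

    R = size_before / size_after, and the loss is log2(R).
    """
    if size_before <= 0 or size_after <= 0 or size_after > size_before:
        raise ValueError("need 0 < size_after <= size_before")
    return math.log2(size_before / size_after)

# Hypothetical example: 1024 candidate values narrowed down to 4.
# R = 1024 / 4 = 256, so log2(256) = 8 bits are lost.
bits = entropy_lost_bits(1024, 4)
```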

*** Web app model
------------------------------------------
            WEB APP MODEL

A quintuple (S, Sigma, delta, f, V),
where:
 - S = set of program states
 - Sigma = set of inputs accepted
 - delta = state transition function
     delta : S x Sigma -> S
 - f = output function
     f : S x Sigma -> V
 - V = set of visible outputs
     e.g., packet sizes

Notation:
   - 50 -> Browser sends 50 bytes
   - 1024 <- Server sends 1024 bytes
   
------------------------------------------
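
        One way to make the quintuple concrete is a small table-driven
        state machine. The states, inputs, and byte counts below are
        invented for illustration (they are not from the paper).
        As an assumed encoding, a positive number means the browser
        sends that many bytes (50 ->) and a negative number means the
        server sends them (1024 <-).

```python
from typing import Dict, List, Tuple

State = str
Input = str
Output = Tuple[int, ...]   # visible packet sizes; sign encodes direction

# delta : S x Sigma -> S  (hypothetical toy app with three states)
DELTA: Dict[Tuple[State, Input], State] = {
    ("home", "open_conditions"): "conditions",
    ("home", "open_medications"): "medications",
    ("conditions", "back"): "home",
    ("medications", "back"): "home",
}

# f : S x Sigma -> V  (what an eavesdropper sees for each transition)
F: Dict[Tuple[State, Input], Output] = {
    ("home", "open_conditions"): (50, -1024),
    ("home", "open_medications"): (50, -2300),
    ("conditions", "back"): (48, -700),
    ("medications", "back"): (48, -700),
}

def run(state: State, inputs: List[Input]) -> Tuple[State, List[Output]]:
    """Feed inputs to the app; return the final state and visible outputs."""
    outputs: List[Output] = []
    for sigma in inputs:
        outputs.append(F[(state, sigma)])   # output depends on old state
        state = DELTA[(state, sigma)]       # then take the transition
    return state, outputs

final_state, seen = run("home", ["open_conditions", "back"])
```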

*** What attacker must do
------------------------------------------
       WHAT ATTACKER TRIES TO DO

From unknown state s in S:
   observe N outputs
      (v1, v2, ..., vN)
   determine inputs
      (sigma1, sigma2, ..., sigmaN)
   or

   
------------------------------------------
     ... reduce the ambiguity in those inputs
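
        A sketch of the attacker's task: given the observed outputs,
        enumerate every start state and input sequence that could have
        produced them. The app tables here are a hypothetical toy
        example, and the brute-force search is only one possible
        approach.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]        # (state, input)
Output = Tuple[int, ...]     # packet sizes; negative = sent by server

# Hypothetical toy app: delta and f as lookup tables.
DELTA: Dict[Key, str] = {
    ("home", "open_conditions"): "conditions",
    ("home", "open_medications"): "medications",
    ("conditions", "back"): "home",
    ("medications", "back"): "home",
}
F: Dict[Key, Output] = {
    ("home", "open_conditions"): (50, -1024),
    ("home", "open_medications"): (50, -2300),
    ("conditions", "back"): (48, -700),
    ("medications", "back"): (48, -700),
}

def consistent_inputs(start_states: List[str],
                      observed: List[Output]) -> List[Tuple[str, List[str]]]:
    """Return each (start state, input sequence) that matches `observed`."""
    # Each candidate is (start state, current state, inputs so far).
    candidates = [(s, s, []) for s in start_states]
    for v in observed:
        survivors = []
        for start, cur, seq in candidates:
            for (state, sigma), out in F.items():
                if state == cur and out == v:
                    survivors.append((start, DELTA[(state, sigma)], seq + [sigma]))
        candidates = survivors
    return [(start, seq) for start, _, seq in candidates]

all_states = ["home", "conditions", "medications"]
unique = consistent_inputs(all_states, [(50, -1024)])     # one candidate left
ambiguous = consistent_inputs(all_states, [(48, -700)])   # two candidates left
```

        In this toy app the first observation pins down both the state
        and the input, while the second leaves an ambiguity set of
        size two.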

*** How web app designs help attacker
     Q: What can lead to a large(r) reduction in ambiguity?

------------------------------------------
          FACTORS THAT HELP ATTACKER


------------------------------------------

  ... - Inputs from a small(er) space
          (low entropy inputs)

        Why? because attacker can more easily profile web app
             each input elicits a predictable response

       - AJAX (asynchronous JavaScript and XML)
         used for GUI widgets
         to make it responsive:
           (mouse clicks and char entry)
           can trigger web traffic

         Q: What kinds of inputs typically lead immediately to a response?
                (e.g., auto-suggestion, auto-complete)

      - UI guides user through data entry
           step by step

           (attacker will know what the sequence of steps is)

      - Responses from a small(er) space
           (low entropy responses)

        Why? each response will be strongly correlated to a particular input
             so attacker can more easily determine the inputs

        Q: How are reduction factors combined?
           They are multiplied together,
           so small reductions in each step lead to big overall reductions
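
        A worked example of combining reductions, assuming the steps
        are independent. The factors are made up. Multiplying the
        factors is the same as adding the bits lost at each step.

```python
import math

# Hypothetical per-step reduction factors R for three user actions.
factors = [4, 8, 2]

overall = math.prod(factors)                      # 4 * 8 * 2 = 64
bits_per_step = [math.log2(r) for r in factors]   # 2, 3, and 1 bits
total_bits = sum(bits_per_step)                   # same as log2(64) = 6
```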

*** Density measures difficulty/ease of attack
------------------------------------------
           DENSITY

def: Let P be a set of packet sizes.
     Then
       density(P)= #P / [max(P) - min(P)]

     where
        #P is cardinality of P
        max(P) is maximum size in P
        min(P) is minimum size in P

"A density below 1.0 often indicates
 packets that are easy to distinguish"
      - Chen et al., 2010, p. 195

------------------------------------------
        Q: Why would a density < 1.0 help the attacker?
           There are fewer distinct sizes than byte values in the range,
           so the sizes are spread out and rarely collide
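
        A direct transcription of the density definition, as a sketch.
        The packet sizes in the example are invented, not measurements.

```python
from typing import Iterable

def density(sizes: Iterable[int]) -> float:
    """density(P) = #P / [max(P) - min(P)] for a set P of packet sizes."""
    distinct = set(sizes)                   # P is a set, so drop duplicates
    spread = max(distinct) - min(distinct)
    if spread == 0:
        raise ValueError("density is undefined when all sizes are equal")
    return len(distinct) / spread

# Hypothetical sizes: 4 distinct values across a 919-byte range.
sparse = density([581, 640, 912, 1500])     # 4 / 919, far below 1.0
# Hypothetical sizes packed tightly together.
dense = density([100, 101, 102])            # 3 / 2 = 1.5
```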
           
*** Examples
**** Health App
------------------------------------------
          HEALTH APP EXAMPLE

Tabbed data entry with tabs for:
 - Conditions
 - Medications
 - Procedures
 - Test results
 - Immunizations

Density was 0.000211

Problems:
 - Auto-suggestion for typing
    Each keystroke generates web flows
      (253 ->, 581 <-, x <-)
      where x is size of suggestion list

    Density of first character: 0.11
    Density after initial 'a': 0.064

 - Selection from structured dialog/menu
     2670 conditions, density = 0.0046
     
------------------------------------------

        Q: Does anyone play Wordle?
           Show how information accumulates with each guess,
             given a small input space

        Q: Would clicking on a suggestion also reduce ambiguity?
            Yes, the density is 0.10
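
        A sketch of how auto-suggestion leaks keystrokes. It assumes
        the attacker has already profiled the app by typing prefixes
        and recording the size of each suggestion-list response (the
        x in the web flow). The prefixes and sizes here are invented.

```python
from typing import Dict, List

# Hypothetical profile: typed prefix -> suggestion response size (bytes).
PROFILE: Dict[str, int] = {
    "a": 812, "b": 640, "c": 955,
    "as": 301, "al": 455, "an": 301,
    "ast": 120, "ant": 133,
}

def infer_prefixes(observed_sizes: List[int]) -> List[str]:
    """Narrow down what was typed, one observed response per keystroke."""
    candidates = [""]                       # nothing typed yet
    for size in observed_sizes:
        candidates = [
            prefix for prefix, profiled in PROFILE.items()
            if profiled == size             # response size matches
            and prefix[:-1] in candidates   # extends a surviving prefix
        ]
    return candidates

after_two = infer_prefixes([812, 301])          # "as" and "an" collide
after_three = infer_prefixes([812, 301, 120])   # third keystroke resolves it
```

        As in Wordle, each observation prunes the candidates, and a
        later keystroke can resolve an earlier ambiguity.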

        Q: Would selecting from a hierarchical menu benefit the attacker?
            Yes, very low density

        Q: Would information from "find a doctor" reveal a condition?
           Yes, can use IP address to determine city/zipcode
           Then the result of selection determines kind of doctor
              with web flow vector (1507 ->, 270±10 <-, 582±1 <-, x <-).
              "every speciality is uniquely identifiable" (p. 197)

**** Tax Form App
------------------------------------------
         TAX FORM APP

Clear workflow
   starting with personal information
   
------------------------------------------

        Q: How many filing statuses are there?
            5  (single, married x 2, widowed x 2)
                  whether married filing jointly or separately
                  when spouse died (before or after start of tax year)
               these determine state transitions
                    and the states are all distinguishable

        Q: Does family income (AGI) determine which forms to file?
           Yes, determines the "tax bracket"
                each of which has different execution paths among states

**** Other applications
------------------------------------------
          EXAMPLE: ONLINE INVESTING

Funds displayed as GIF images
    and each has a web page
    
Can infer:


------------------------------------------
        Q: Would the GIF image size correlate to a particular fund?
           yes, given that they have different histories
           could distinguish all of the (9) funds
           
        Q: Can the size of an image be determined separately from HTML size?
           Yes!

        ... - funds invested in
        
            - allocation of funds

        Q: How could attacker determine allocation of funds from
           size of a pie chart?
           They change daily, and compression changes the sizes
              (especially since lossless LZW compression is used)
              (in simulation, it only took 4 days of charts to
               determine allocation among 3 funds)

**** Web search engines (Google, Bing, etc.)
------------------------------------------
         WEB SEARCH ENGINES

Google, Bing, etc.

  - attack can reveal query history
  
Attacker can use:

  - Auto-suggestion sizes
  
------------------------------------------

        Q: Would a company want its employees' search histories revealed?
           No, could reveal new IP (intellectual property)...

        Q: Does capitalization matter in a web query?
           No, so that reduces the input space per character
              (which is about size 27)

**** Wi-Fi
------------------------------------------
            WPA2 STANDARD FOR WI-FI

Uses CCMP = 128 bit AES in counter mode
   counter mode means
     size of ciphertext = size of message

   So, size is


------------------------------------------
        ... NOT hidden by WPA2
              (padding only by app itself)

        Q: Does lack of padding help attackers?
           Yes, so each keystroke and response gives exact size on Wi-Fi
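
        A toy illustration of why counter mode preserves length. This
        is NOT AES or CCMP: as a stand-in, it builds the keystream
        from SHA-256 so the sketch needs only the standard library.
        The point is only that XOR with a keystream yields exactly one
        ciphertext byte per message byte.

```python
import hashlib

BLOCK = 32  # bytes of keystream per counter value (SHA-256 digest size)

def toy_ctr_encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    """XOR the message with a counter-derived keystream (toy cipher)."""
    out = bytearray()
    for offset in range(0, len(message), BLOCK):
        counter = (offset // BLOCK).to_bytes(8, "big")
        keystream = hashlib.sha256(key + nonce + counter).digest()
        chunk = message[offset:offset + BLOCK]
        out.extend(m ^ k for m, k in zip(chunk, keystream))
    return bytes(out)

key, nonce = b"demo key", b"demo nonce"      # hypothetical values
message = b"a" * 253                         # e.g., one keystroke request
ciphertext = toy_ctr_encrypt(key, nonce, message)
same_length = len(ciphertext) == len(message)
# Applying the same keystream again recovers the message.
recovered = toy_ctr_encrypt(key, nonce, ciphertext)
```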

** mitigations
*** application agnostic
------------------------------------------
 SAMPLE APPLICATION AGNOSTIC MITIGATIONS

Example, SSH:
  - sends a packet every 50 msec

VOIP:
  - round up all packet sizes to 128 bytes
------------------------------------------
        The SSH example keeps SSH responsive

        the VOIP example is effective

**** padding
------------------------------------------
          PADDING PACKETS

Rounding:
 - Round up to nearest multiple of D bytes

Random padding:
 - Append padding of 0 to D bytes


Measurements for health app found:

  - D = 128 not enough
         (responses about 200 bytes)
         average overhead was ~ 14%
  - D = 512 hid information
         but average overhead was ~ 33%


For Income Tax App
  - D = 1024 only allows attacker
          to distinguish 7 income ranges
          but about 25% overhead

Apps also need to:
   - merge states on longer paths
   - or add extra states on shorter paths
         
------------------------------------------

        Q: Why do these tactics help?
           Increase the ambiguity of sizes

        Q: What is the average overhead of these tactics?
           About D/2 bytes per packet
              (for both rounding and random padding)

        Q: So, is there one effective mitigation tactic for all apps?
           No, size of padding (and type) depends on app
           State merging/splitting depends on app
              so these aren't general enough (are app specific)
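
        A sketch of the two padding tactics. The response sizes are
        hypothetical, chosen to show why a small D can leave sizes
        distinguishable while a larger D merges them at a higher cost.

```python
import random
from typing import List

def round_up(size: int, d: int) -> int:
    """Rounding: pad size up to the nearest multiple of d bytes."""
    return -(-size // d) * d          # ceiling division, then scale

def random_pad(size: int, d: int, rng: random.Random) -> int:
    """Random padding: append between 0 and d bytes, chosen uniformly."""
    return size + rng.randint(0, d)

def total_overhead(sizes: List[int], padded: List[int]) -> float:
    """Extra bytes sent, as a fraction of the original bytes."""
    return (sum(padded) - sum(sizes)) / sum(sizes)

# Hypothetical response sizes, not measurements from the paper.
sizes = [200, 400, 450]
rounded_128 = [round_up(s, 128) for s in sizes]   # 200 still stands out
rounded_512 = [round_up(s, 512) for s in sizes]   # all three look alike
cost_128 = total_overhead(sizes, rounded_128)
cost_512 = total_overhead(sizes, rounded_512)

rng = random.Random(0)                            # seeded for repeatability
randomized = [random_pad(s, 128, rng) for s in sizes]
```

        Random padding only blurs a single observation. An attacker
        who sees the same response many times can average the noise
        away, which is one more reason the choice depends on the app.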

**** problems with mashups
------------------------------------------
        MASHUP PROBLEMS

Online investment app
   fetches charts from financial data app
      which makes charts public

 Will the financial data app pad data?


------------------------------------------

        ... no, unless online investment app pays for it
              all their data is public!

**** Prospects for generality
------------------------------------------
      IS THERE A GENERAL MITIGATION?

Is there a general mitigation?


------------------------------------------
        ... authors of the paper say it's "unlikely" (p. 204)

        Q: What does that mean for developers?
           They have to find app-specific mitigations

**** App-specific mitigation
------------------------------------------
        CHALLENGES FOR DEVELOPERS

- Finding side-channel vulnerabilities
   look for:
    - stateful communication
    - communication based on user inputs
    - correlations between
        inputs and outputs

- Specifying mitigation policies

- Building policy enforcement mechanisms
   - coordinating browsers and web servers

------------------------------------------
        Q: What problems would there be in determining message sizes?
             server will affect
                 - element tags and macro expansion
                 - encoding
                 - compression
             server and browser may not know policy

------------------------------------------
     RECOMMENDED DEVELOPMENT PRACTICE

1. specify privacy policies
2. track info. flows, including web flows
3. vulnerabilities related to policies?
    if yes:
    4. can they be solved by manipulating
       individual packets?
       if yes: add mitigations to packets
            (rounding or random padding)
       if no:
          5. change design of app features
          6. goto step 2
    if no:
       done!
------------------------------------------

     Q: What kind of tool could help in step 3?
        a traffic analysis tool