CIS 6614 meeting -*- Outline -*-

* Side-channel attacks in Web Applications

Based on notes from Suman Jana
and Shuo Chen, Rui Wang, XiaoFeng Wang and Kehuan Zhang

The target is web apps
   e.g., ssh, voip, video streaming, Tor
   esp. software as a service

------------------------------------------
     SIDE CHANNEL ATTACKS ON WEB APPS

Side-channel attacks targeting web apps,
for example:
   - ssh
   - Voice over IP
   - Video streaming
   - Tor
   - software-as-a-service apps
        e.g., Salesforce

Based on: S. Chen, R. Wang, X. Wang and K. Zhang,
"Side-Channel Leaks in Web Applications:
A Reality Today, a Challenge Tomorrow,"
In IEEE Symp. on Security and Privacy, 2010,
pp. 191-206, doi: 10.1109/SP.2010.20.
------------------------------------------

** attack model

------------------------------------------
        ATTACK MODEL FOR WEB APPS

Assume encryption used
   so attacker cannot see message contents

Attacker can see for each message:
   - Number sent
   - Timing and direction
   - Size
------------------------------------------

        Q: Is there a military analogy here?

           Yes, this is typical on a battlefield using radio communications

        Q: What could an army infer about an enemy from radio traffic?
           (even if it can't understand any details, but can tell where
            it's coming from, and what it's about in general,
            like requests for supplies)

           traffic analysis, see if there is a lot of traffic,
           e.g., in one area of the front or less traffic in some area

        Q: Could an attacker identify what app is running? How?

           Yes, using nslookup on the website's IP address

** vulnerabilities

------------------------------------------
   DIFFERENCE FROM APP ON SINGLE COMPUTER

Internal communications are

Effects, leaks of:
   - personal health data
   - family income
   - investment details
   - search queries
  despite use of HTTPS and WPA2 encryption

Causes:
   - stateful communication
   - low entropy input
   - significant traffic distinctions
------------------------------------------
        ...
        split between computers, client and server,
        and sent over network

        Q: What's the analogy of traffic analysis for a web app?

           Seeing packets and sizes and timings

------------------------------------------
    WEB-BASED PRIVACY VULNERABILITIES

Attackers can fingerprint web pages:
   - resource objects of diff. sizes

How?

Web flows split between client & server:
   - input points
   - program logic
   - program states
------------------------------------------

        "because each web page has a distinct size, and usually loads
         some resource objects (e.g., images) of different sizes,
         the attacker can fingerprint the page so that even when a user
         visits it through HTTPS, the page can be re-identified"

        ... attacker tries out each option on page,
            sees sizes of responses

        Features of Web 2.0 (AJAX...) make
        "side-channel vulnerability fundamental to Web 2.0 applications"

        A similar problem is being able to learn something about
        a process running on the same machine (e.g., in cloud computing)

** mitigations

------------------------------------------
        MITIGATIONS APP-SPECIFIC

Mitigations different for each app:

Revise:
   - feature designs,
   - traffic characteristics,
   - publicly available domain knowledge
  to

Need to protect app state transitions
------------------------------------------
        ... avoid traffic analysis

** model of attack, measurement

*** ambiguity reduction

------------------------------------------
    ATTACKER'S GOAL: REDUCE AMBIGUITY

Ambiguity set of data:

Measuring loss of ambiguity
  If ambiguity set reduced to 1/R of its size,
  then
------------------------------------------
        ... set containing all possible values of that data
        ...
        log_2(R) bits of entropy are lost

        The attacker tries to reduce ambiguity in the signal,
        to determine which possible states/transitions are happening

*** Web app model

------------------------------------------
           WEB APP MODEL

A quintuple (S, Sigma, delta, f, V), where:
   - S = set of program states
   - Sigma = set of inputs accepted
   - delta = state transition function
        delta : S x Sigma -> S
   - f = output function
        f : S x Sigma -> V
   - V = set of visible outputs
        e.g., packet sizes

Notation:
   - 50 ->    Browser sends 50 bytes
   - 1024 <-  Server sends 1024 bytes
------------------------------------------

*** What attacker must do

------------------------------------------
       WHAT ATTACKER TRIES TO DO

From unknown state s in S:
   observe N outputs (v1, v2, ..., vN)
   determine inputs (sigma1, sigma2, ..., sigmaN)
   or
------------------------------------------
        ... reduce the ambiguity in those inputs

*** How web app designs help attacker

        Q: What can lead to a large(r) reduction in ambiguity?

------------------------------------------
       FACTORS THAT HELP ATTACKER

------------------------------------------
        ...
        - Inputs from a small(er) space (low entropy inputs)
          Why? because attacker can more easily profile web app
               each input elicits a predictable response

        - AJAX (asynchronous JavaScript and XML) used for GUI widgets
          to make it responsive:
          (mouse clicks and char entry) can trigger web traffic

          Q: What kinds of inputs typically lead immediately
             to a response?
             (e.g., auto-suggestion, auto-complete)

        - UI guides user through data entry step by step
          (attacker will know what the sequence of steps is)

        - Responses from a small(er) space (low entropy responses)
          Why? each response will be strongly correlated
               to a particular input
               so attacker can more easily determine the inputs

        Q: How are reduction factors combined?
           They are multiplied together, so small reductions in each step
           lead to big overall reductions

*** Density measures difficulty/ease of attack

------------------------------------------
              DENSITY

def: Let P be a set of packet sizes.
Then density(P) = #P / [max(P) - min(P)]
  where #P is cardinality of P
        max(P) is maximum size in P
        min(P) is minimum size in P

"A density below 1.0 often indicates
 packets that are easy to distinguish"
  - Chen et al., 2010, p. 195
------------------------------------------

        Q: Why would a density < 1.0 help the attacker?

           Lots of variation in sizes: on average fewer than one packet
           per byte of the size range, so an observed size tends to
           identify a single input

*** Examples

**** Health App

------------------------------------------
         HEALTH APP EXAMPLE

Tabbed data entry with tabs for:
  - Conditions
  - Medications
  - Procedures
  - Test results
  - Immunizations
 Density was 0.000211

Problems:
  - Auto-suggestion for typing
    Each keystroke generates web flows
      (253 ->, 581 <-, x <-)
    where x is size of suggestion list
    Density of first character: 0.11
    Density after initial 'a': 0.064

  - Selection from structured dialog/menu
    2670 conditions, density = 0.0046
------------------------------------------

        Q: Does anyone play Wordle?

           Show how information accumulates with each guess
           when the input space is small

        Q: Would clicking on a suggestion also reduce ambiguity?

           Yes, the density is 0.10

        Q: Would selecting from a hierarchical menu benefit the attacker?

           Yes, very low density

        Q: Would information from "find a doctor" reveal a condition?

           Yes, can use IP address to determine city/zipcode
           Then the result of selection determines kind of doctor
           with web flow vector (1507 ->, 270±10 <-, 582±1 <-, x <-).
           "every speciality is uniquely identifiable" (p. 197)

**** Tax Form App

------------------------------------------
            TAX FORM APP

Clear workflow
  starting with personal information

------------------------------------------

        Q: How many filing statuses are there?
           5 (single, married x 2, widowed x 2)
           whether married filing jointly or separately
           when spouse died (before or after start of tax year)

           these determine state transitions
           and the states are all distinguishable

        Q: Does family income (AGI) determine which forms to file?

           Yes, determines the "tax bracket"
           each of which has different execution paths among states

**** Other applications

------------------------------------------
       EXAMPLE: ONLINE INVESTING

Funds displayed as GIF images
  and each has a web page

Can infer:

------------------------------------------

        Q: Would the GIF image size correlate to a particular fund?

           Yes, given that they have different histories
           could distinguish all of the (9) funds

        Q: Can the size of an image be determined separately
           from HTML size?

           Yes!

        ...
        - funds invested in
        - allocation of funds

        Q: How could attacker determine allocation of funds
           from size of a pie chart?

           The charts change daily, and compression changes the sizes
           (especially since GIF uses lossless LZW compression)
           (in simulation, it only took 4 days of charts
            to determine allocation among 3 funds)

**** Web search engines (Google, Bing, etc.)

------------------------------------------
          WEB SEARCH ENGINES

Google, Bing, etc.
 - attack can reveal query history

Attacker can use:
 - Auto-suggestion sizes
------------------------------------------

        Q: Would a company want its employees' search histories revealed?

           No, could reveal new IP...

        Q: Does capitalization matter in a web query?

           No, so that reduces the input space per character
           (which is about size 27)

**** Wi-Fi

------------------------------------------
       WPA2 STANDARD FOR WI-FI

Uses CCMP = 128 bit AES in counter mode

counter mode means
   size of ciphertext = size of message

So, size is
------------------------------------------
        ... NOT hidden by WPA2 (padding only by app itself)

        Q: Does lack of padding help attackers?
           Yes, each keystroke and response reveals its exact size
           over Wi-Fi

** mitigations

*** application agnostic

------------------------------------------
 SAMPLE APPLICATION AGNOSTIC MITIGATIONS

Example, SSH:
   - sends a packet every 50 msec

VOIP:
   - round up all packet sizes to 128 bytes
------------------------------------------

        The SSH example keeps SSH responsive
        the VOIP example is effective

**** padding

------------------------------------------
           PADDING PACKETS

Rounding:
  - Round up to nearest multiple of D bytes

Random padding:
  - Append padding of 0 to D bytes

Measurements for health app found:
  - D = 128 not enough
     (responses about 200 bytes)
     average overhead was ~ 14%
  - D = 512 hid information
     but average overhead was ~ 33%

For Income Tax App
  - D = 1024 only allows attacker to distinguish
     7 income ranges
     but about 25% overhead

Apps also need to:
  - merge states on longer paths
  - or add extra states on shorter paths
------------------------------------------

        Q: Why do these tactics help?

           Increase the ambiguity of sizes

        Q: What is the average overhead of these tactics?

           D/2 bytes per packet

        Q: So, is there one effective mitigation tactic for all apps?

           No, size of padding (and type) depends on app
           State merging/splitting depends on app
           so these aren't general enough (are app specific)

**** problems with mashups

------------------------------------------
           MASHUP PROBLEMS

Online investment app
  fetches charts from financial data app
     which makes charts public

Will the financial data app pad data?
------------------------------------------
        ... no, unless online investment app pays for it
            all their data is public!

**** Prospects for generality

------------------------------------------
     IS THERE A GENERAL MITIGATION?

Is there a general mitigation?

------------------------------------------
        ... authors of the paper say it's "unlikely" (p. 204)

        Q: What does that mean for developers?
           They have to find app-specific mitigations

**** App-specific mitigation

------------------------------------------
       CHALLENGES FOR DEVELOPERS

- Finding side-channel vulnerabilities
    look for:
     - stateful communication
     - communication based on user inputs
     - correlations between inputs and outputs

- Specifying mitigation policies

- Building policy enforcement mechanisms
    - coordinating browsers and web servers
------------------------------------------

        Q: What problems would there be in determining message sizes?

           server will affect
            - element tags and macro expansion
            - encoding
            - compression
           server and browser may not know policy

------------------------------------------
    RECOMMENDED DEVELOPMENT PRACTICE

1. specify privacy policies
2. track info. flows, including web flows
3. vulnerabilities related to policies?
   if yes:
      4. can they be solved by
         manipulating individual packets?
         if yes: add mitigations to packets
                 (rounding or random padding)
         if no: 5. change design of app features
      6. goto step 2
   if no: done!
------------------------------------------

        Q: What kind of tool could help in step 3?

           a traffic analysis tool
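
        A minimal sketch (in Python) of the arithmetic such a traffic
        analysis tool might do, using only the definitions in these notes:
        density(P) = #P / [max(P) - min(P)], rounding up to a multiple of
        D bytes, and log_2(R) bits lost when the ambiguity set shrinks to
        1/R of its size. The function names and the sample response sizes
        are hypothetical, chosen for illustration; they are not from the
        paper or from any real tool.

```python
import math
from collections import defaultdict


def density(sizes):
    """density(P) = #P / [max(P) - min(P)], as on the DENSITY slide."""
    p = set(sizes)                      # P is a set, so duplicates collapse
    spread = max(p) - min(p)
    if spread == 0:
        return math.inf                 # all sizes equal: nothing to distinguish
    return len(p) / spread


def round_up(size, d):
    """Rounding mitigation: round size up to the nearest multiple of d."""
    return -(-size // d) * d            # ceiling division, then scale back


def ambiguity_sets(observed):
    """Group inputs by the size the attacker sees.

    Each group is what is left of the ambiguity set after the attacker
    observes that size.
    """
    groups = defaultdict(set)
    for inp, size in observed.items():
        groups[size].add(inp)
    return dict(groups)


def bits_lost(total, remaining):
    """Shrinking the ambiguity set to 1/R of its size loses log_2(R) bits."""
    return math.log2(total / remaining)


# Hypothetical response sizes in bytes for four inputs (assumed data).
responses = {"a": 581, "b": 590, "c": 702, "d": 1310}

print("density:", density(responses.values()))

# Without padding every size is distinct, so one observation
# narrows 4 candidates down to 1.
print("bits lost, no padding:", bits_lost(4, 1))   # 2.0

# With D = 512, three of the four inputs collide at 1024 bytes.
padded = {inp: round_up(size, 512) for inp, size in responses.items()}
print("after rounding:", ambiguity_sets(padded))
```

        With this assumed data the density is well below 1.0, and rounding
        to D = 512 leaves the attacker unable to tell "a", "b" and "c"
        apart, at the cost of the padding overhead discussed under
        PADDING PACKETS.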