CIS 6614 meeting -*- Outline -*-

* Web Application Security

  Based on notes by Suman Jana,
  which are based on slides by John Mitchell

** Goal

------------------------------------------
        GOALS OF WEB SECURITY

Confidentiality:
  - No unwanted information disclosure
    by browsing the web

Isolation:
  - Site A cannot interfere with
    a session browsing site B

Web app security:
  - Apps on the web can achieve the same
    security as on the desktop
------------------------------------------

   Q: What does the first goal mean about user behavior?

      We shouldn't depend on users to avoid visiting malicious websites

   Q: What would be one way to formalize (or check) the second goal?

      - That it is impossible for one frame in the browser to read from
        or write to the state of another frame, unless granted permission
      - That a website can delegate to another one, in a frame,
        and not be interfered with

   Q: What does it mean for one site to interfere with
      a session browsing another site?

      - popping up a window (over that page)
      - reading or writing data or queries

** threat model

------------------------------------------
      WEB ATTACKER THREAT MODEL

Attacker can:
  - control a website (attacker.com)
  - obtain SSL/TLS certificate(s)

However, attacker does NOT:
  - control the network
------------------------------------------

   The website would presumably be visited by the victim

   Q: What does controlling a website mean from the user's viewpoint?

      The attacker can send any HTML code (and JavaScript...)

   Q: What would a web-based attacker want to do?

      - steal information (e.g., PII)
      - control the user's machine
        (privilege escalation, remote code execution)

   In the network attack model, the attacker controls the network
   and can intercept communications.
   So we are not considering that...
*** focus

------------------------------------------
      FOCUS NOT ON WEB MALWARE

Web malware (exploiting browsers):
  - trojans
  - adware
    (called "drive-by-downloads")
  - control as in our previous study
  - but NOT our focus now

Instead:
  - we now focus on attacks
    that are specific to the web
------------------------------------------

   Browser vulnerabilities could result in some threats:
   information disclosure, tampering, spoofing, ...

   For background on browser vulnerabilities, see
   Provos, N., McNamee, D., Mavrommatis, P., Wang, K., & Modadugu, N.
   The Ghost in the Browser: Analysis of Web-based Malware.
   HotBots, 7, 4-4. USENIX.
   https://www.usenix.org/event/hotbots07/tech/full_papers/provos/provos.pdf

   (This makes several points:
      - many applications make up a website
        (e.g., web server, PHP, databases, ...)
        and all need to be updated, which is difficult
      - ads and other content may be external to a website
        (e.g., blog post comments)
      - scripting applications are a big problem (e.g., phpBB2)
      - ad space can be sublet by its renters,
        so a website ends up with content it may not trust
        (but trust is not transitive!)
      - scripts can be used to find vulnerable software,
        and then a download to exploit it can be selected...
      - attacks are scalable, as the user's computer does the work!
      - social engineering, tricking the user into downloading malware,
        can bypass security measures
      - while merely escaping JavaScript code seems simple,
        "it is highly effective against both signature and
         anomaly-based intrusion detection systems."
        However, it is often the case that
        "reputable web-pages obfuscate the Javascript they serve.
         Thus, obfuscated Javascript is not in itself
         a good indicator of malice..."
   )

   This paper defines:
      - Trojan: software that contains or installs a malicious program
        with a harmful impact on a user's computer.
      - Adware: software that automatically displays advertising material

   The paper says that:
      - Trojans were installed on over 300K web pages in 2006
      - These can result in "web-based botnets"

------------------------------------------
              OUR FOCUS

Web-based attacks,
not attacks on browsers themselves

Examples:
  - Cross-site Scripting (XSS)
  - SQL injection
  - Cross-site Request Forgery (CSRF)
------------------------------------------

   Q: Are XSS and CSRF important kinds of attacks?

      Yes, XSS is part of A03 (injection), the 3rd most important
      category in the OWASP Top 10 for 2021,
      and CSRF was formerly in the top 10

** background

*** URLs

------------------------------------------
                 URL

   http://columbia.edu:80/class?name=4995#h
   ^      ^            ^ ^      ^         ^
   |      |            | path   query     fragment
   |      |            port
   |      host name
   protocol

Special characters are encoded
as hexadecimal escapes (e.g.):
  - %0A = newline
  - %20 = space
------------------------------------------

*** HTTP

**** requests

------------------------------------------
            HTTP REQUESTS

  Method File name  version
     |       |         |
     v       v         v
    GET /index.html HTTP/1.1
    Accept: image/gif, image/x-bitmap,
        image/jpeg, */*
    Accept-Language: en
    Connection: Keep-Alive
    User-Agent: Mozilla/1.22 (compatible;
        MSIE 2.0; Windows 95)
    Host: www.example.com
    Referer: http://www.google.com?q=dingbats
                         <- Blank line
                         <- Data (none)
------------------------------------------

   The indented lines are continuations of previous lines

   Q: Is this any different in HTTPS?

      No, same info, but the information is encrypted (using SSL or TLS)

   Q: Does HTTPS guarantee that the browser and server can trust each other?

      No, it just prevents eavesdropping (and tampering) on the connection.
      A certificate only shows who controls the domain name,
      and the attacker can obtain certificates too.

   Q: What is the difference between GET and POST?
      POST sends data (e.g., from a form) to the server (on the data line),
      which can change the session's state

**** response

------------------------------------------
            HTTP RESPONSE

   Protocol Status Reason phrase
      |     Code   /
      |      |    /
      v      v   v
   HTTP/1.1 200 OK
   Date: Thu, 24 Jul 2008 17:36:27 GMT
   Server: Apache-Coyote/1.1
   Content-Type: text/html;charset=UTF-8
                         <-- Blank line
   ... data ...
------------------------------------------

   The Date, Server, and Content-Type lines are headers

   Q: Does the data need to be HTML?

      No, it could be other things

   Q: Could the response redirect the browser to another URL?

      Yes, it could set the Location header to some other URL
      and return a 3xx status, e.g.,
      307 (temporary redirect that keeps the request method,
      so it also works for POST requests)
      or 301 (for when pages have moved permanently)

**** Browser execution model

------------------------------------------
     BASIC BROWSER EXECUTION MODEL

Loop for each window/tab/frame:
  - Load content
  - Render content
     - Process HTML and scripts, possibly:
        - display images
        - recursively process subframes
  - Respond to events, which may be:
     - user actions (OnClick, OnMouseover)
     - rendering (OnLoad, OnBeforeUnload)
     - timing (setTimeout, clearTimeout)
------------------------------------------

------------------------------------------
           EXAMPLE WEBPAGE

Adapted from:
  http://www.w3schools.com/js/js_output.asp

My First Web Page
------------------------------------------

   Q: How would this execute?

      (go through the steps: load it, render it,
       process HTML to show the "Try it" button,
       then if clicked, it writes over the button with 80)

   Can try running this at:
   https://www.cs.ucf.edu/~leavens/CIS6614/lectures/web-app-security/my-first-webpage.html

**** Document-Object Model (DOM)

------------------------------------------
      DOCUMENT-OBJECT MODEL (DOM)

- API for web pages
- Web pages are hierarchically-structured data

Property examples:
   document.alinkColor
   document.URL
   document.forms[]
   document.links[]
   document.anchors[]
   ...

Methods:
   document.write()
   ...

DOM includes Browser-Object Model (BOM):
   window, document, frames[], history,
   location, navigator
------------------------------------------

   Q: Have you used this before in JavaScript?

------------------------------------------
   CHANGING THE HTML USING JAVASCRIPT

Examples of JavaScript methods
that can change HTML:
  - createElement(elementName)
  - createTextNode(text)
  - appendChild(newChild)
  - removeChild(node)
------------------------------------------

   Q: Could JavaScript be used to add a new list item to a displayed list?

      Yes, with something like:

        var list = document.getElementById('t1');
        var newitem = document.createElement('li');
        // text is assumed to already hold the string for the new item
        var newtext = document.createTextNode(text);
        list.appendChild(newitem);
        newitem.appendChild(newtext);

** isolation of web sessions

------------------------------------------
          FRAMES and IFRAMES

Frames are HTML elements

Uses of frames:
  - delegate screen area to another source
  - isolation from browser,
    so parent may work even if frame broken

Kinds of frames:
  - Frame: rigid division of webpage
  - iFrame: floating inline frame
------------------------------------------

   Q: What webpages have you seen that use frames?

      (probably almost everything commercial)
      used in:
      - gmail interface
      - to serve ads (e.g., on politicalwire.com)

**** browser is analogous to OS

------------------------------------------
         BROWSER ACTS LIKE OS

  OS                    WEB BROWSER

  Data:                 Data:
   - Files               - Cookies

  Operations:           Operations:
   - System calls        - DOM

  Actor:                Actor:
   - Process             - Frame

  Principal:            Principal:
   - User                - Origin

  Access control:       Access control:
   - discretionary       - mandatory

  Vulnerabilities:      Vulnerabilities:
   - buffer overflow     - XSS
   - elev. of priv.      - CSRF
   - CPU cache hist.     - Cache history
------------------------------------------

** revisiting the goals

------------------------------------------
         MORE SPECIFIC GOALS

Each frame has an origin
     protocol://host:port

Associate data with an origin

Policy:

------------------------------------------

   Q: Does an origin really correspond to an individual person?

      Yes, somewhat, unless 2 people are using the same browser

   Q: What would be a concrete example of an origin?

      https://example.com:443

   Q: What would be a good way to use origins and frames
      to formalize isolation?

      use something like information flow security ...
      - each frame can only access data from its own origin
        (i.e., frame's computation doesn't depend on data
         from other origins)

** attacks

------------------------------------------
           ATTACK OVERVIEW

OWASP Top 10:

  2013  2021
   4.    1.   Broken access control
   1.    3.   Injection and XSS
   2.    7.   Broken authentication
        10.   SSRF
------------------------------------------

   Q: What is broken access control?

      Several mistakes including (from
      https://owasp.org/Top10/A01_2021-Broken_Access_Control/):
      - Not checking access control
      - Violation of principle of least privilege
      - Bypassing control checks by modifying URL
      - Insecure direct object references
        (providing object IDs to wrong users, or sending in clear)
      - API with missing access controls (for POST, PUT and DELETE)
      - Elevation of privilege
      - Metadata manipulation (including JWT)
      - CORS misconfiguration allowing access
        from unauthorized/untrusted origins
      - forced browsing with elevated access
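
   Several of the mistakes listed above (not checking access control,
   insecure direct object references, missing checks on POST, PUT and
   DELETE) come from the server trusting whatever the request names.
   The following is a minimal sketch, not a definitive implementation,
   of a deny-by-default check done on the server. All names in it
   (documents, rolePermissions, canAccess, the users and document IDs)
   are hypothetical and only for illustration; a real application would
   consult its own database and session mechanism.

```javascript
// Hypothetical in-memory data; a real app would query its database.
const documents = new Map([
  ["doc1", { owner: "alice", body: "Alice's notes" }],
  ["doc2", { owner: "bob", body: "Bob's notes" }],
]);

// Hypothetical role table: which HTTP methods each role may use.
// A Map is used so that odd role names cannot hit object prototypes.
const rolePermissions = new Map([
  ["reader", new Set(["GET"])],
  ["editor", new Set(["GET", "POST", "PUT", "DELETE"])],
]);

// Decide whether `user` may apply `method` to the document `docId`.
// Deny by default: anything not explicitly allowed is refused.
function canAccess(user, method, docId) {
  if (!user) return false; // not authenticated
  const allowed = rolePermissions.get(user.role);
  if (!allowed || !allowed.has(method)) return false; // role lacks this method
  const doc = documents.get(docId);
  if (!doc) return false; // unknown object ID
  // Ownership is checked on the server, so guessing another
  // user's object ID in the URL does not grant access.
  return doc.owner === user.name;
}

const alice = { name: "alice", role: "editor" };
const bob = { name: "bob", role: "reader" };

console.log(canAccess(alice, "PUT", "doc1"));  // true: owner, method allowed
console.log(canAccess(alice, "GET", "doc2"));  // false: not the owner
console.log(canAccess(bob, "DELETE", "doc2")); // false: reader role cannot DELETE
console.log(canAccess(null, "GET", "doc1"));   // false: not authenticated
```

   The point of the sketch is that every request goes through the same
   check, for every method, and that the object ID from the request is
   never taken as proof that the requester may use that object.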