CIS 6614 meeting -*- Outline -*-

* Web Application Security
   Based on notes by Suman Jana, which are based on slides by John Mitchell

** Goal
------------------------------------------
       GOALS OF WEB SECURITY

Confidentiality:
  - No unwanted information disclosure
       by browsing web

Isolation:
  - Site A cannot interfere with
        session browsing site B


Web app security:
  - Apps on web can achieve same security
       as on desktop
------------------------------------------
        Q: What does the first goal imply about user behavior?
            We shouldn't depend on users to avoid malicious websites

        Q: What would be one way to formalize (or check) the second goal?
            - That it is impossible for one frame in the browser to read
              from or write to the state of another frame, unless
              granted permission
            - That a website can delegate to another one, in a frame,
              and not be interfered with
              
        Q: What does it mean for one site to interfere with a session
           browsing another site?
           - popping up a window (over that page)
           - reading or writing data or queries
           
** threat model
------------------------------------------
        WEB ATTACKER THREAT MODEL

Attacker can:

  - control a website (attacker.com)
  - obtain SSL/TLS certificate(s)

However, attacker does NOT:

  - control network

------------------------------------------
        The website would presumably be visited by the victim

        Q: What does controlling a website mean from the user's viewpoint?
           can send any HTML code (and Javascript...)

        Q: What would a web-based attacker want to do?
            - steal information (e.g., PII)
            - control user's machine (privilege escalation, remote
              code execution)

        In network attack model:
           attacker controls the network and can intercept communications
        So we are not considering that...

*** focus
------------------------------------------
        FOCUS NOT ON WEB MALWARE

Web malware (exploiting browsers):

     - trojans
     - adware
     (installed by "drive-by downloads")

  - control as in our previous study
  - but NOT our focus now

Instead:

  - we now focus on attacks
       that are specific to the web
------------------------------------------
        Browser vulnerabilities could result in some threats:
             information disclosure, tampering, spoofing, ...

        For background on browser vulnerabilities, see

            Provos, N., McNamee, D., Mavrommatis, P., Wang, K.,
            & Modadugu, N. (2007).
            The Ghost in the Browser: Analysis of Web-based Malware.
            In HotBots '07. USENIX.
            https://www.usenix.org/event/hotbots07/tech/full_papers/
                    provos/provos.pdf

            (This makes several points:
                  - many applications make up a website
                       (e.g., web server, PHP, databases, ...)
                    and all need to be updated, which is difficult
                  - ads and other content may be external to a website
                       (e.g., blog post comments)
                  - scripting applications are a big problem (e.g., phpBB2)
                  - ad space can be sublet by those who rent it,
                    so a website ends up with content it may not trust
                       (but trust is not transitive!)
                  - scripts can be used to find vulnerable software,
                      and then a download to exploit it can be
                      selected...
                  - attacks are scalable as the user's computer does
                      the work!
                  - social engineering, tricking the user into
                      downloading malware, can bypass security measures
                  - while merely escaping Javascript code seems simple,
                     "it is highly effective against both signature
                     and anomaly-based intrusion detection systems."
                    However, it is often the case that "reputable
                     web-pages obfuscate the Javascript they serve.
                     Thus, obfuscated Javascript is not in itself
                     a good indicator of malice..."
                 )

              This paper defines:

                 - Trojan: software that contains or installs a malicious
                   program with a harmful impact on a user's computer.
                 - Adware: software that automatically displays
                   advertising material

              The paper says that:
                - Trojans were installed on over 300K web pages in 2006
                - These can result in "web-based botnets"
            
------------------------------------------
          OUR FOCUS

Web-based attacks,
  not attacks on browsers themselves

Examples:

 - Cross-site Scripting (XSS)
 - SQL injection
 - Cross-site Request Forgery (CSRF)
------------------------------------------
        Q: Are XSS and CSRF important kinds of attacks?
           Yes, XSS is part of A03 (Injection), ranked 3rd in the OWASP
                 Top 10 for 2021, and CSRF was formerly in the top 10
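
        As a small illustration of why XSS counts as injection: if a
        page pastes untrusted text into its HTML, that text can carry
        a script. A minimal sketch of one common defense, escaping the
        text first, follows. The names escapeHtml and greet are
        hypothetical helpers made up for this example, and escaping
        like this only covers text placed in element content or
        quoted attribute values.

```javascript
// Sketch: escape untrusted text before placing it in HTML.
// escapeHtml and greet are hypothetical helpers for illustration.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")   // must come first, so later entities survive
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Builds a greeting from a user-supplied name.
function greet(name) {
  return "<p>Hello, " + escapeHtml(name) + "</p>";
}

console.log(greet("<script>alert(1)</script>"));
// <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

        Without the call to escapeHtml, the browser would treat the
        name as markup and run the script in the page's origin.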

** background
*** URLs
------------------------------------------
           URL

  http://columbia.edu:80/class?name=4995#h
  ^      ^            ^ ^     ^          ^
  |      |            | \path \query     |
  |      \host name   \port              |
  protocol                        fragment

Special characters are encoded as
 hexadecimal escapes (e.g.):

   - %0A = newline
   - %20 = space

------------------------------------------
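
        The parts labeled on the slide can be pulled out with the
        standard URL class (available in browsers and in Node.js).
        The following is a sketch only: the URL is a made-up variant
        of the slide's example that uses port 8080, and the property
        names in the parts object are chosen to match the slide.

```javascript
// Sketch: split a URL into the parts labeled on the slide.
// The URL is a hypothetical variant of the slide's example (port 8080).
const url = new URL("http://columbia.edu:8080/class?name=4995#h");

const parts = {
  protocol: url.protocol, // "http:"
  host: url.hostname,     // "columbia.edu"
  port: url.port,         // "8080"
  path: url.pathname,     // "/class"
  query: url.search,      // "?name=4995"
  fragment: url.hash,     // "#h"
};
console.log(parts);

// Special characters in a query value are sent as hexadecimal escapes.
const encoded = encodeURIComponent("a b\n");
console.log(encoded);                     // a%20b%0A
console.log(decodeURIComponent(encoded)); // prints "a b" followed by a newline
```

        With the slide's original port 80, url.port would be the empty
        string, because the URL class drops a port that is the default
        for the protocol.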

*** HTTP
**** requests
------------------------------------------
             HTTP REQUESTS

Method File name version
 |        |       |
 v        v       v

GET /index.html HTTP/1.1
Accept: image/gif, image/x-bitmap,
                          image/jpeg, */*
Accept-Language: en
Connection: Keep-Alive
User-Agent: Mozilla/1.22
        (compatible; MSIE 2.0; Windows 95)
Host: www.example.com
Referer: http://www.google.com?q=dingbats
                <- Blank line
                <- Data (none)
------------------------------------------
        The indented lines are continuations of previous lines

        Q: Is this any different in HTTPS?
           No, same info, but the information is encrypted
             (using SSL or TLS)

        Q: Does HTTPS guarantee that the browser and server can trust
            each other?
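           No, it prevents eavesdropping and checks the server's
            certificate, but a site with a certificate can still be
            malicious (the attacker can obtain certificates)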

        Q: What is the difference between GET and POST?
           POST sends data (e.g., from a form) to the server
              (on the data line),
              which can change the session's state
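
        To make the request layout concrete, here is a rough sketch
        that splits a raw request into its request line, headers, and
        data. parseRequest is a hypothetical helper written for these
        notes (not a library function); it ignores continuation lines
        and other details a real server must handle. The /login path
        and form data are made up.

```javascript
// Sketch: parse a raw HTTP/1.1 request into request line, headers, and data.
// parseRequest is a hypothetical helper, not part of any library.
function parseRequest(raw) {
  // A blank line (CRLF CRLF) separates the headers from the data.
  const split = raw.indexOf("\r\n\r\n");
  const head = split === -1 ? raw : raw.slice(0, split);
  const body = split === -1 ? "" : raw.slice(split + 4);

  const lines = head.split("\r\n");
  const [method, path, version] = lines[0].split(" ");

  const headers = {};
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed lines in this sketch
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return { method, path, version, headers, body };
}

// A GET has no data after the blank line.
const getRequest = parseRequest(
  "GET /index.html HTTP/1.1\r\n" +
  "Host: www.example.com\r\n" +
  "Accept-Language: en\r\n" +
  "\r\n"
);

// A POST carries form data on the data line.
const postRequest = parseRequest(
  "POST /login HTTP/1.1\r\n" +
  "Host: www.example.com\r\n" +
  "Content-Type: application/x-www-form-urlencoded\r\n" +
  "Content-Length: 9\r\n" +
  "\r\n" +
  "user=anne"
);

console.log(getRequest.method, JSON.stringify(getRequest.body));
console.log(postRequest.method, JSON.stringify(postRequest.body));
```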

**** response
------------------------------------------
              HTTP RESPONSE

Protocol Status  Reason phrase
  |      Code    /
  |       |     /
  v       v    v 

HTTP/1.1 200 OK
Date: Thu, 24 Jul 2008 17:36:27 GMT
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
                            <-- Blank line
<html> ... data ... </html>
------------------------------------------
      The Date, Server, and Content-Type lines are headers

      Q: Does the data need to be HTML?
          No, it could be other things

      Q: Could the response redirect the browser to another URL?
          Yes, it could set the Location header to some other URL and
          return a 3xx status, such as 307 (temporary redirect)
             (used for POST requests, as 307 keeps the request method,
              and for when pages have moved temporarily)
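
        A rough sketch of how a client could decide whether a response
        is a redirect: check for a 3xx status code and then look for a
        Location header. redirectTarget is a hypothetical helper
        written for these notes, and the example responses are made
        up; real browsers handle many more cases.

```javascript
// Sketch: find where (if anywhere) a response redirects the browser.
// redirectTarget is a hypothetical helper, not a browser API.
function redirectTarget(rawResponse) {
  const head = rawResponse.split("\r\n\r\n")[0];
  const lines = head.split("\r\n");
  const status = Number(lines[0].split(" ")[1]); // e.g., 200 or 307
  if (status < 300 || status > 399) return null;  // not a redirect
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    if (line.slice(0, colon).trim().toLowerCase() === "location") {
      return line.slice(colon + 1).trim();
    }
  }
  return null; // a 3xx status without a Location header
}

const okResponse =
  "HTTP/1.1 200 OK\r\n" +
  "Content-Type: text/html;charset=UTF-8\r\n" +
  "\r\n" +
  "<html></html>";

const movedResponse =
  "HTTP/1.1 307 Temporary Redirect\r\n" +
  "Location: https://www.example.com/new\r\n" +
  "\r\n";

console.log(redirectTarget(okResponse));    // null
console.log(redirectTarget(movedResponse)); // https://www.example.com/new
```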

**** Browser execution model
------------------------------------------
     BASIC BROWSER EXECUTION MODEL
     
Loop for each window/tab/frame:
  - Load content
  - Render content
  - Process HTML and scripts, possibly:
     - display images
     - recursively process subframes
  - Respond to events, which may be:
     - user actions (OnClick, OnMouseover)
     - rendering (OnLoad, OnBeforeUnload)
     - timing: (setTimeout, clearTimeout)

------------------------------------------

------------------------------------------
              EXAMPLE WEBPAGE

Adapted from:
 http://www.w3schools.com/js/js_output.asp

<!DOCTYPE html>
<html lang="en">
<meta http-equiv="Content-Type"
      content="text/html; charset=utf-8">
<title>My First Web Page</title>
<body>
<button type="button"
  onclick="document.write(5 + 6)">
  Try it</button>
</body>
</html>

------------------------------------------

  Q: How would this execute?
      (go through the steps:
          load it, render it, process HTML to show the "Try it" button,
          then if clicked, it writes over the button with 11)
          
  Can try running this at:
   https://www.cs.ucf.edu/~leavens/CIS6614/lectures/web-app-security/
           my-first-webpage.html
        
**** Document-Object Model (DOM)
------------------------------------------
     DOCUMENT-OBJECT MODEL (DOM)

- API for web pages

- Web pages are hierarchically-structured
   data

  Property examples:
     document.alinkColor
     document.URL
     document.forms[]
     document.links[]
     document.anchors[]
     ...

  Methods:
     document.write()
     ...

Related: Browser-Object Model (BOM):
   window, document, frames[], history,
   location, navigator

------------------------------------------
        Q: Have you used this before in Javascript?

------------------------------------------
    CHANGING THE HTML USING JAVASCRIPT

Examples of Javascript methods
  that can change HTML:

 - createElement(elementName)
 - createTextNode(text)
 - appendChild(newChild)
 - removeChild(node)
------------------------------------------

        Q: Could Javascript be used to add a new list item to a
        displayed list?
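          Yes, with something like:
             // assumes the page has a list with id 't1' and
             // that the variable text holds the new item's string
             var list = document.getElementById('t1');
             var newitem = document.createElement('li');
             var newtext = document.createTextNode(text);
             newitem.appendChild(newtext);
             list.appendChild(newitem);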

** isolation of web sessions
------------------------------------------
         FRAMES and IFRAMES

Frames are HTML elements

Uses of frames:
 - delegate screen area to another source
 - isolation provided by browser, so
    parent may work even if frame is broken

Kinds of frames:
  - Frame: rigid division of webpage
  - iFrame: floating inline frame
------------------------------------------

        Q: What webpages have you seen that use frames?
           (probably almost everything commercial)
           used in: - gmail interface
                    - to serve ads (e.g., on politicalwire.com)

**** browser is analogous to OS
------------------------------------------
          BROWSER ACTS LIKE OS

OS                   WEB BROWSER          
                     
Data:                Data:                
 - Files              - Cookies           
                     
Operations:          Operations:          
 - System calls       - DOM                 
                     
Actor:               Actor:               
 - Process            - Frame             
                     
Principal:           Principal:           
 - User               - Origin            
                     
Access control:      Access control:      
 - discretionary      - mandatory         

Vulnerabilities:     Vulnerabilities:
 - buffer overflow    - XSS
 - elev. of priv.     - CSRF
 - CPU cache hist.    - Cache history

------------------------------------------

** revisiting the goals
------------------------------------------
            MORE SPECIFIC GOALS

Each frame has an origin
    protocol://host:port

Associate data with an origin

Policy:


------------------------------------------
        Q: Does an origin really correspond to an individual person?
            Yes, somewhat, unless 2 people are using the same browser
            
        Q: What would be a concrete example of an origin?
            https://example.com:443

        Q: What would be a good way to use origins and frames to
            formalize isolation?
              use something like information flow security

        ... - each frame can only access data from its own origin
                (i.e., frame's computation doesn't depend on data from
                other origins)
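
        The policy above can be sketched as a comparison of origins.
        This is a simplification under stated assumptions: sameOrigin
        and mayAccess are hypothetical names, the URLs are made up,
        and real browsers have additional rules and exceptions.

```javascript
// Sketch: compare the origins (protocol, host, port) of two URLs.
// sameOrigin and mayAccess are hypothetical names for this illustration.
function sameOrigin(urlA, urlB) {
  // URL.origin serializes protocol://host[:port], omitting default ports.
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Policy from the notes: a frame may only access data from its own origin.
function mayAccess(frameUrl, dataUrl) {
  return sameOrigin(frameUrl, dataUrl);
}

// Same protocol, host, and (default) port: allowed.
console.log(mayAccess("https://example.com/inbox", "https://example.com:443/mail")); // true
// Different protocol: denied.
console.log(mayAccess("https://example.com/", "http://example.com/"));               // false
// Different host: denied.
console.log(mayAccess("https://example.com/", "https://attacker.com/"));             // false
```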

** attacks

------------------------------------------
           ATTACK OVERVIEW

OWASP Top 10:

2013 2021
 4.  1.   Broken access control
 1.  3.   Injection and XSS
 2.  7.   Broken authentication

    10.   SSRF


------------------------------------------
        Q: What is broken access control?
           Several mistakes including
           (from https://owasp.org/Top10/A01_2021-Broken_Access_Control/):
             - Not checking access control
             - Violation of principle of least privilege
             - Bypassing control checks by modifying URL
             - Insecure direct object references
                  (providing object IDs to wrong users, or sending in clear)
             - API with missing access controls (for POST, PUT and DELETE)
             - Elevation of privilege
             - Metadata manipulation (including JWT)
             - CORS misconfiguration allowing access from
                unauthorized/untrusted origins
             - forced browsing with elevated access
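
        As one illustration of the first few items (not checking
        access control, bypassing checks by modifying the URL, and
        insecure direct object references), here is a minimal sketch
        of a server-side ownership check. The invoice records, user
        names, and getInvoice function are all made up for this
        example.

```javascript
// Sketch: check ownership before returning an object named by a request ID.
// The records, user names, and getInvoice are made up for illustration.
const invoices = new Map([
  [101, { owner: "alice", total: 40 }],
  [102, { owner: "bob", total: 75 }],
]);

function getInvoice(currentUser, invoiceId) {
  const invoice = invoices.get(invoiceId);
  if (invoice === undefined) return { status: 404 };
  // Without this check, any user could read any invoice by changing
  // the ID in the URL (an insecure direct object reference).
  if (invoice.owner !== currentUser) return { status: 403 };
  return { status: 200, invoice };
}

console.log(getInvoice("alice", 101).status); // 200
console.log(getInvoice("alice", 102).status); // 403
console.log(getInvoice("alice", 999).status); // 404
```

        The check uses the authenticated user from the session, not
        anything the client can edit in the request.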