CIS 6614 meeting -*- Outline -*-

* Injection Attacks

     Based in part on the book

     Michael Howard and David LeBlanc and John Viega.
     24 Deadly Sins of Software Security: Programming Flaws and How to Fix Them.
     McGraw-Hill, Inc., 2009.

** What is the problem?
     Note: this is no longer just a C language problem

     See https://xkcd.com/327
     (or the file exploits_of_a_mom.png in this directory)

------------------------------------------
          THE PROBLEM

Injection is
   3rd on the OWASP Top Ten for 2021
   (owasp.org/www-project-top-ten/,
   formerly in first place)

From OWASP A03 Injection
  (owasp.org/Top10/A03_2021-Injection/):

  "App vulnerable when:

  "User-supplied data is not validated,
   filtered, or sanitized"

  "Dynamic queries or non-parameterized
  calls without context-aware escaping
  are used directly in the interpreter."

  "Hostile data is directly used
  or concatenated."

  "Common injections are:
     SQL, NoSQL, OS command,
     Object Relational Mapping (ORM),
     LDAP, and Expression Language (EL)
     or Object Graph Navigation Library
     (OGNL) injection."
     
------------------------------------------

*** examples
**** SQL Injection Attacks
------------------------------------------
          EXAMPLE SCENARIO IN SQL

Scenario 1 from OWASP A03
 (owasp.org/Top10/A03_2021-Injection/):

 String query =
  "SELECT * FROM accounts WHERE custID='"
     + request.getParameter("id")
     + "'";

------------------------------------------
        Q: What happens if the input is ' or '1'='1 ?
           Then the user's input is parsed as part of the SQL command
           and executed! The WHERE clause becomes
           custID='' or '1'='1', which is always true,
           so the query selects all accounts!

           "More dangerous attacks could
           modify or delete data or even invoke stored procedures."
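
        A minimal sketch of this scenario in Python, using the
        standard sqlite3 module with an in-memory database. The table
        contents and the helper name find_accounts are made up for
        illustration; only the string concatenation mirrors the OWASP
        scenario.

```python
import sqlite3

# Hypothetical in-memory table standing in for the "accounts" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (custID TEXT, owner TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("1", "alice"), ("2", "bob"), ("3", "carol")])

def find_accounts(cust_id):
    # VULNERABLE: the input is concatenated into the SQL text.
    query = "SELECT * FROM accounts WHERE custID='" + cust_id + "'"
    return conn.execute(query).fetchall()

honest = find_accounts("2")             # matches one account
hostile = find_accounts("' or '1'='1")  # WHERE clause is always true
print(len(honest), len(hostile))
```

        With the hostile input the query text becomes
        ... WHERE custID='' or '1'='1', so every row is returned.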

        This problem can also be caused by leaving the database port
        open to the internet (and using a default sysadmin password)!

        This is CWE-89 and also covered in the PCI Data Security
        Standard requirement 6.5.6.

        Q: What programming languages can have this problem?
        Almost any language that interfaces with SQL, including:
        C#, PHP, Python, Ruby, Java, C, C++

**** Example in Python

        The following Python example is from the 24 Deadly Sins book, p. 9
        
------------------------------------------
  SQL INJECTION VULNERABILITY IN PYTHON

import MySQLdb
# id holds untrusted user input (a string)
conn = MySQLdb.connect(host="...",
                       port=3306,
                       user="admin",
                       password="passwd",
                       db="clientsDB")
cursor = conn.cursor()
# VULNERABLE: input concatenated into SQL
cursor.execute("select * from customer "
               "where id=" + id)
results = cursor.fetchall()
conn.close()
------------------------------------------

**** Example in SQL

        The vulnerability also affects SQL itself.
        (This is from the 24 Deadly Sins book, p. 12)

------------------------------------------
        SQL INJECTION IN SQL

CREATE PROCEDURE dbo.doQuery(
                           @id nchar(128))
AS
  DECLARE @query nchar(256)
  SELECT @query =
   'select ccnum from cust where id = '''
   + @id + ''''
  EXEC (@query)
RETURN
------------------------------------------

        Q: What does this do?
           It looks for credit card numbers where the id
           is given by a parameter.
        Q: How could this cause an injection attack?
           If @id comes from user input, the attacker's text becomes
           part of the query string that EXEC runs.

***** What an Attacker Could Do
------------------------------------------
         WHAT AN ATTACKER COULD DO

 Add more clauses to the query

 Comment out clauses not needed for attack

Example input:

 1 or 2>1 ---
 
------------------------------------------

  Q: What is the effect of that input?
     It selects all rows in the table (since 2>1 is true).

  The classic input uses "1=1", but filters may look for that,
  so attackers use other tautologies (such as 2>1).

  Q: What is necessary for the attack to work?
     That the attacker can supply input
     that is passed directly to the SQL interpreter.
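
  A small sketch of both attacker moves (adding a clause and
  commenting one out), again in Python with sqlite3. The cust table,
  its "active" column, and the lookup helper are assumptions made up
  for this illustration; they are not from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (id INTEGER, ccnum TEXT, active INTEGER)")
conn.executemany("INSERT INTO cust VALUES (?, ?, ?)",
                 [(1, "4111-0001", 1),
                  (2, "4111-0002", 0),
                  (3, "4111-0003", 1)])

def lookup(user_id):
    # VULNERABLE: unquoted id concatenated into the query text.
    # The trailing clause is meant to hide inactive customers.
    query = ("SELECT ccnum FROM cust WHERE id = " + user_id +
             " AND active = 1")
    return conn.execute(query).fetchall()

normal = lookup("2")            # customer 2 is inactive: no rows
attack = lookup("1 or 2>1 --")  # "--" comments out "AND active = 1"
print(len(normal), len(attack))
```

  The injected "or 2>1" makes the condition true for every row, and
  the "--" turns the rest of the query into a comment.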

***** Attack can be obfuscated
------------------------------------------
           AN EXPLOIT FROM 2008

orderitem.asp?IT=GM-204;DECLARE%20@S
%20NVARCHAR(4000);SET%20@S=CAST(
0x4400450043004C0041005200450020004000...
...
...F007200%20AS%20NVARCHAR(4000));
EXEC(@S);--

which decodes to:

DECLARE @T varchar(255),@C varchar(255)
DECLARE Table_Cursor CURSOR FOR select...
------------------------------------------

        Q: How does the obfuscation work in this example?
        It uses SQL's CAST to convert a hex-encoded string
        into text, which EXEC then runs.

        Q: What lesson can we learn from such examples?
        Injected code doesn't have to look like code at first...
        It's best to avoid passing untrusted input to an interpreter
        at all.
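
        The decoding step can be reproduced in Python. This sketch
        assumes the NVARCHAR data is little-endian UTF-16, and it
        decodes only the visible prefix of the hex literal from the
        slide (the rest is elided there).

```python
# Visible prefix of the hex literal in the 2008 exploit.
hex_prefix = "4400450043004C0041005200450020004000"

# Each character takes two bytes, low byte first (0x44 0x00 is "D").
decoded = bytes.fromhex(hex_prefix).decode("utf-16-le")
print(repr(decoded))
```

        The hex digits contain no SQL keywords, so a filter that only
        searches the request for words like DECLARE would miss them.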

**** Example in HQL

------------------------------------------
          EXAMPLE SCENARIO in HQL

Scenario 2 from OWASP A03
 (owasp.org/Top10/A03_2021-Injection/):

 Query HQLQuery =
    session.createQuery(
    "FROM accounts WHERE custID='"
    + request.getParameter("id") + "'");

------------------------------------------
        Q: What happens if the input starts with
            ' or '1'='1
           ?
           Again, the query selects all accounts!

        Q: So, is this kind of attack limited to SQL?
           No, but that is a particularly popular kind of attack...
           It can affect any interpreter

*** Format String Attacks
    This is CWE-134: Uncontrolled Format String.

    See also the 24 Deadly Sins book, chapter 6
      and (for details) https://seclists.org/bugtraq/2000/Sep/214

    This kind of attack mostly affects C and C++,
       but could also affect languages that are translated into C/C++.

**** Simple Example
------------------------------------------
  EXAMPLE OF FORMAT STRING VULNERABILITY

/* A Unix command, written in C */
#include <stdio.h>
int main(int argc, char *argv[]) {
    if (argc > 1) {
        printf(argv[1]);
    }
    return 0;
}
------------------------------------------

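        Q: What could go wrong?
        If the input is "%p" then printf prints, as a pointer, whatever
        value sits where its next argument would be (a register or a
        word on the runtime stack).
        This can leak information that helps defeat ASLR!
          (such as stack addresses, the main function's return address, etc.)

        Current versions of gcc can warn about such a usage of printf
        (see the -Wformat-security option)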

**** The %n format specifier
------------------------------------------
           THE %n FORMAT SPECIFIER

%n writes
  the number of characters written so far
  into the corresponding argument

Useful example:

   int bytes;
   printf("%s%n\n", argv[1], &bytes);

then bytes is set to the number of
   characters in argv[1]
------------------------------------------
        Q: How could an attacker abuse such a format string?
          From the 24 Deadly Sins book (p. 111):
          1. Put the address desired on the stack
              (e.g., using a buffer overflow)
          2. Give input of the right length to write the number desired
              into that address

        These format strings allow attackers to probe the stack and
        correct the attack dynamically (p. 112)

**** More Revealing Example
   This example is in the current directory
     files fmtme.c and the Makefile
     
------------------------------------------
    EXAMPLE FROM TIM NEWSHAM'S BUGTRAQ POST
 https://seclists.org/bugtraq/2000/Sep/214

// code from:
// seclists.org/bugtraq/2000/Sep/214
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
  char buf[100];
  int x;
  if(argc != 2) {
      exit(1);
  }
  x = 1;
  snprintf(buf, sizeof(buf), argv[1]);
  buf[sizeof(buf) - 1] = 0;
  printf("buf (length is %lu): \"%s\"\n",
	   strlen(buf), buf);
  for (int i = 0; i < strlen(buf); i++) {
      printf("buf[%d] is '%c' (0x%x)\n",
	     i, buf[i], buf[i]);
  }
  printf(
    "x is %d (0x%x) (@ address %p)\n",
    x, x, &x);
  return 0;
}

------------------------------------------

        Q: Does gcc give any warning when this is compiled?
           No, it apparently doesn't worry about the use of a format
           string as an argument to snprintf...

        The commands to run this are in the Makefile
------------------------------------------
         RUNNING FMTME

$ ./fmtme "hello"
buf (length is 5):
hello
x is 1 (0x1) (@ address 0x7ffffcbcc)
------------------------------------------

        Q: What does the output of the command ./fmtme "hello" show?
        That it's copying the command line argument into buf
        and printing that, and other information.

        Q: Could we use the address printed to help defeat ASLR?
           Yes, but we won't do that now...

        Q: Is that command the same as running the command
           perl -e 'system "./fmtme", "hello"'?
           Yes, that will be convenient for later.

------------------------------------------
         PASSING A FORMAT STRING

Consider the command

$ perl -e 'system "./fmtme",
"abcd\n0x%lx 0x%lx 0x%lx 0x%lx"'
buf (length is 43):
abcd
0x0 0x0 0x1ffffcd30 0x3078300a64636261
x is 1 (0x1) (@ address 0x7ffffcbcc)
------------------------------------------

        You can see in the output that the format string interpreter
         is being run on the input (and that it's printing out
         extra information), so the attacker will realize that
         their input is being interpreted as a format string.

         The extra information output shows the values that snprintf
         finds where it expects its (missing) arguments.
         It seems to have 2 words of zeros (space for registers?)
           then it shows the caller's frame pointer
           (i.e., main's %rbp register)
           then the first 8 chars printed into buf.
           Since the x86 is little-endian,
           the first character is at the far right:
           reading the bytes of 0x3078300a64636261 from the right,
           a's hex code is 0x61, b is 0x62, c is 0x63,
           d is 0x64, \n is 0x0a,
           then '0' is 0x30, 'x' is 0x78, '0' is 0x30,
           so the word holds "abcd\n" followed by "0x0",
           the first 8 characters of the output in buf.
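
         The byte order can be checked with a short Python sketch
         (the variable names are only for illustration): it splits
         the last word printed above into its 8 bytes, lowest
         address first.

```python
# Fourth value printed by the format string in the transcript above.
word = 0x3078300a64636261

# On a little-endian machine the least significant byte is stored first.
first_bytes = word.to_bytes(8, byteorder="little")
print(first_bytes)
```

         The bytes are the first 8 characters that snprintf had
         already written into buf.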

         The x86 64-bit register layout is partly based on
         http://6.s081.scripts.mit.edu/sp18/x86-64-architecture-guide.html
         
         More ASCII codes: ' ' is 0x20, 'l' is 0x6c, '0' is 0x30

------------------------------------------
         ATTEMPTING TO SET X

perl -e 'system "./fmtme",
"\xcc\xcb\xff\xff\x07\x00%d%n%x%x%x%x\n"'
buf (length is 5):
ÌËÿÿ
x is 1 (0x1) (@ address 0x7ffffcbcc)

------------------------------------------
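        This doesn't work here: the address contains a zero byte
        (\x00), which ends the argument string, so only the first
        5 bytes reach fmtme and the format specifiers are lost.
        But on a machine that passed all arguments on the stack
        (with an address that has no zero bytes),
        this would pull the address out of buf and use that
        as the address to write the number of characters written so
        far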

        So the attacker controls:
         - the address written to and
         - the value written there.
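
        A small Python sketch of where the bytes in the perl command
        come from, and of what survives when they are passed as a C
        string argument. It assumes the 64-bit little-endian machine
        of the transcript above; the variable names are only for
        illustration.

```python
import struct

addr = 0x7ffffcbcc                  # address of x printed by fmtme

# 8-byte little-endian encoding of the address.
packed = struct.pack("<Q", addr)

# A C string ends at the first zero byte, so only this much arrives.
as_c_string = packed.split(b"\x00", 1)[0]
print(packed, len(as_c_string))
```

        The 5 surviving bytes match the "length is 5" reported in the
        transcript.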

**** Related Problems

        Q: What would happen if an attacker could control the
           format string used for scanf?
           Then the attacker could write into arbitrary parts of
           memory!

        Q: Could using sprintf cause a problem too?
           Sure, since the attacker can quote format character
           specifications that are later used in printf or fprintf...
           This is what is happening in fmtme.c

        Q: What if strings are stored in an external file that isn't protected?
           Then an attacker could change that file,
             and thus get the application to use strings that contain
             format specifiers

        Q: What if a user's locale (e.g., country) tells where (i.e.,
             in what directory) language-specific files are stored?
           Then an attacker could force the application to use their
             directory of files, which could have format strings.

*** Other Kinds of Injection Attacks
        Q: Are there any other attacks where we should not trust user input?
           Yes (see below)
------------------------------------------
     OTHER KINDS OF INJECTION ATTACKS

What other kinds of attacks
 might use inputs?


------------------------------------------

     ...     - buffer overflows are certainly one
               (where the amount of text read or numbers read controls
                the output)
             - Identification and Authentication failures (A07),
               when passwords or login ids are used without
               checking... 
             - Server-Side Request Forgery (A10),
               when a web app fetches a remote resource
               without validating the user-supplied URL
               (e.g., using file:///etc/passwd as a URL)
             - Commands that execute user inputs
                 (as in a command processor, interpreter, or compiler)

     Q: Could this affect programs that act on user inputs?
         Yes, if the action is to pass the user input to a command interpreter
         Could that be an interpreter you write yourself? Yes!
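
         For OS commands, one way to keep user input away from the
         command interpreter is to pass an argument list instead of a
         command string. This is a minimal Python sketch; the hostile
         file name is made up, and the child process is just the
         Python interpreter echoing its first argument, so that the
         example does not depend on any particular shell.

```python
import shlex
import subprocess
import sys

hostile = "report.txt; echo INJECTED"

# Child program that prints its first argument unchanged.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# An argument list is handed to the program directly: no shell
# parses the input, so ";" has no special meaning.
result = subprocess.run(child + [hostile],
                        capture_output=True, text=True)
echoed = result.stdout.strip()

# If a POSIX shell command string is unavoidable, quote the input.
quoted = shlex.quote(hostile)
print(echoed, quoted)
```

         The input arrives as one ordinary argument, and the quoted
         form is a single shell word.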

** Conventional Tools for Preventing Injection Attacks
*** Do's and Don'ts from the 24 Deadly Sins Book
------------------------------------------
     PREVENTION TECHNIQUES

From "24 Deadly Sins" book (p. 18, 27-28):

 - Don't use string concatenation
    to build SQL

 - Do use parameterized queries
    to build SQL

 - Do check the input for validity
    at the server,

 - Do use regular expressions
    to parse input

 - Don't check input (for validity)
    ONLY at the client

 - Don't simply strip out "bad words"

 - Don't connect to the DB using a
    highly privileged account
------------------------------------------

       Q: Will these guidelines guarantee freedom from injection attacks?
            No, but they might reduce their frequency
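
       A minimal sketch of the "parameterized queries" guideline in
       Python with sqlite3. The table and the find_customer helper
       are made up for illustration. The "?" placeholder is sqlite3's
       style; other database drivers may use a different placeholder
       syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("1", "alice"), ("2", "bob")])

def find_customer(cust_id):
    # The "?" placeholder sends cust_id as data, never as SQL text.
    return conn.execute("SELECT * FROM customer WHERE id = ?",
                        (cust_id,)).fetchall()

good = find_customer("2")
bad = find_customer("' or '1'='1")
print(good, bad)
```

       The hostile input is compared literally against the id column,
       so it matches no rows instead of changing the query.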

*** Prevention Recommendations from OWASP
------------------------------------------
      PREVENTING INJECTION ATTACKS

From OWASP A03 Injection
 (owasp.org/Top10/A03_2021-Injection/)
 
 "The preferred option is to
  use a safe API,
  which avoids using the interpreter"

 "Use positive server-side
 input validation.
 This is not a complete defense ..."

 "For residual dynamic queries,
 escape special characters using
 the specific escape syntax
 for that interpreter."

------------------------------------------
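
        A sketch of positive (allow-list) validation in Python,
        contrasted with stripping "bad words". The rule that an id is
        1 to 10 decimal digits is an assumption made up for this
        example, as are the helper names.

```python
import re

# Hypothetical rule: a customer id is 1 to 10 decimal digits.
ID_PATTERN = re.compile(r"[0-9]{1,10}")

def is_valid_id(text):
    # fullmatch requires the whole input to match, not just a prefix.
    return ID_PATTERN.fullmatch(text) is not None

def strip_bad_words(text):
    # Deny-list approach, shown only to illustrate why it fails.
    return text.replace("select", "")

accepted = is_valid_id("12345")
rejected = is_valid_id("1 or 2>1 --")
leftover = strip_bad_words("selselectect")
print(accepted, rejected, leftover)
```

        Removing "select" from the middle of "selselectect" leaves
        "select" behind, which is why stripping is not a defense.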

        Q: Why should using "any interpreter" be avoided?
           Because it gives the attacker power to do arbitrary things!

        Notes from OWASP A03:
        "Even when parameterized, stored procedures can still introduce
        SQL injection if PL/SQL or T-SQL concatenates queries and data or
        executes hostile data with EXECUTE IMMEDIATE or exec()."

        "SQL structures such as table names, column names, and so on
        cannot be escaped, and thus user-supplied structure names are
        dangerous. This is a common issue in report-writing software."
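
        Since structure names cannot be passed as query parameters,
        one common approach is to accept only names from a fixed
        allow-list. This is a sketch in Python with sqlite3; the
        sales table, the SORTABLE_COLUMNS set, and the report helper
        are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("west", 30), ("east", 10), ("north", 20)])

# Hypothetical allow-list: the only columns a report may sort by.
SORTABLE_COLUMNS = {"region", "amount"}

def report(sort_column):
    if sort_column not in SORTABLE_COLUMNS:
        raise ValueError("unsupported sort column")
    # The name is now one of our own fixed strings, not user text.
    query = "SELECT region, amount FROM sales ORDER BY " + sort_column
    return conn.execute(query).fetchall()

rows = report("amount")
print(rows)
```

        Any other value, such as "amount; DROP TABLE sales", is
        rejected before a query is built.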

        Q: Will these recommendations prevent injection attacks?
         Not necessarily.