COP 3402 meeting -*- Outline -*-

* Processes and Threads

Based on chapter 9 of Systems Software by Montagne

** Process (recall the definition)

------------------------------------------
RECALL: PROCESS

def: a *process* (or task) in an OS
     is a program that is being run

Characteristics:

  - is active (running or waiting to run)

  - has its own memory
        PAS = Process Address Space

  - has various permissions

------------------------------------------

Recall that a process is a program that has been loaded:
"an activity or series of activities" being carried out

The OS tries to give each process the illusion
that it has the whole computer to itself (by default)

The PAS includes the runtime stack, registers, heap, files, etc.

The process's permissions are usually inherited
from the user that started it

** Threads

*** Use case: servers

------------------------------------------
WRITING A SERVER: USE CASE FOR PARALLELISM

Want a server that can:

  - handle multiple asynchronous requests
    from many clients

  - access shared data

  - reserve virtual resources for clients
        e.g., airline seats

How to do that?
------------------------------------------

Think of a web server or an airline reservation system

Expensive way to do it:

  - create a process for each request
    (processes are "expensive" to create and switch between)

  - hard to share data

One good structure:

  - each client request executes in parallel
    on behalf of the client

Then sharing among clients is sharing among these requests...

*** Supporting Parallelism

------------------------------------------
HOW TO SUPPORT PARALLELISM?

Modern computers have many Cores + GPUs...

How to write programs that use them?

  - Several processes, each using a separate core, but:

Goals:

  - parallel execution

  - share memory

  - low overheads for:

------------------------------------------
...
      - separation of address spaces makes cooperation difficult
      - context switching is "expensive"
...
      - creating parallel executions
      - switching between them
      - having them cooperate safely

------------------------------------------
THREADS

def: a *thread* is a unit of parallel execution in a process

Characteristics:

------------------------------------------
...
  - shares the address space with the other threads in the same process
  - but has its own stack and locals

Each thread does not necessarily have its own core:
one can have threads on a one-core machine by timesharing

But with multiple cores, we want the OS
to dispatch threads to multiple cores

** Sharing data safely

See chapter 10 of Montagne's book: Systems Software

------------------------------------------
CONCURRENCY

Apparent concurrency:

  - threads that execute by interleaving instructions
    (sharing a core, with timesharing simulating parallelism)

True concurrency:

  - threads that execute simultaneously,
    so multiple instructions run at the same time
    (multiple cores, real parallelism)

------------------------------------------

*** Race conditions

------------------------------------------
RACE CONDITIONS

Imagine 2 threads, T1 and T2, that share a global int variable k
and both execute:

      k = k + 1;

How is this compiled? What does the hardware do?
------------------------------------------
...
      (load frame for k)
      load value of k
      load 1
      add
      store into k
...
The load fetches k from memory
(and puts it in a register or on a stack),
loading 1 puts 1 in a register (or on a stack),
the add adds the registers (or top stack elements)
and puts the result in a register (or on a stack),
and the store writes that value into memory

Q: Does each thread have its own stack (or registers)? Yes!
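To see the race in practice, here is a minimal sketch (not from the book),
assuming POSIX threads and the shared global k from the slide; the loop
bound of 1000000 is just chosen to make lost updates likely:

    /* race.c -- two threads racing on k = k + 1
       compile with: gcc -pthread race.c */
    #include <stdio.h>
    #include <pthread.h>

    int k = 0;                  /* the shared global from the slide */

    void *increment(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            k = k + 1;          /* compiles to: load k, load 1, add, store k */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, increment, NULL);
        pthread_create(&t2, NULL, increment, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("k == %d\n", k); /* usually less than 2000000: updates get lost */
        return 0;
    }

The final value depends on how the two threads' load/store sequences
interleave, which the next slides trace in detail.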
Assume that we can read an int (k) atomically (hardware supported)
and can write k atomically (also due to hardware)

But shared memory access is mediated (by a "bus controller")
that orders requests to read and write memory

------------------------------------------
POSSIBLE EXECUTIONS

global k: 0

      T1                T2
      k_1: 0            k_2: 0
      o_1: 1            o_2: 1
      r_1: 1            r_2: 1
      (writes r_1)      (waiting...)
global k: 1
                        (writes r_2)
global k: 1

      T1                T2
      k_1: 0            (waiting ...)
      o_1: 1
      r_1: 1
      (writes r_1)
global k: 1
                        k_2: 1
                        o_2: 1
                        r_2: 2
                        (writes r_2)
global k: 2
------------------------------------------

(In these traces, k_i is the value of k that Ti read,
 o_i is the operand 1, r_i is the sum Ti computed,
 and each "global k:" line shows memory after a write)

Q: Is there any other way the final value of k could be 2?
   Yes, T1 could wait first

Q: Is there any other way the final value of k could be 1?
   Yes, both could wait a bit but still read the 0 before the write

Q: What could happen if the shared variable was a double?
   If the hardware can't read/write all of that in one step,
   then we could get strange results

Q: What could happen if the shared memory was an array?
   Could get a mix of results

------------------------------------------
POSSIBLE EXECUTIONS

Suppose int a[2] is a global,
initially a[0] == 0 and a[1] == 0,
and T1 and T2 both execute:

      a[0] = a[0]+4;
      a[1] = a[1]+3;

      T1                T2
      a0_1: 0           a0_2: 0
      f_1: 4            f_2: 4
      r0_1: 4           r0_2: 4
      (writes r0_1)     (waiting...)
a[0]: 4   a[1]: 0
                        (writes r0_2)
a[0]: 4   a[1]: 0
      a1_1: 0           a1_2: 0
      t_1: 3            t_2: 3
      r1_1: 3           r1_2: 3
      (writes r1_1)     (waiting...)
a[0]: 4   a[1]: 3
                        (writes r1_2)
a[0]: 4   a[1]: 3

      T1                T2
      a0_1: 0           (waiting ...)
      f_1: 4
      r0_1: 4
      (writes r0_1)
a[0]: 4   a[1]: 0
                        a0_2: 4
                        f_2: 4
                        r0_2: 8
                        (writes r0_2)
a[0]: 8   a[1]: 0
      ...
------------------------------------------

(Here a0_i and a1_i are Ti's reads of a[0] and a[1],
 f_i and t_i are the constants 4 and 3,
 and r0_i and r1_i are the sums Ti computed)

Q: What possible results can appear in a[0]?
   4 and 8

Q: What possible results can appear in a[1]?
   3 and 6

Q: So how many possible end states are there?
   4 of them:
      a[0]: 4, a[1]: 3
      a[0]: 8, a[1]: 3
      a[0]: 4, a[1]: 6
      a[0]: 8, a[1]: 6
   This is sometimes called a state space explosion

------------------------------------------
RACE CONDITIONS

def: a *race condition* occurs when

def: a *critical section* is an area of code in which a

def: *mutual exclusion* is a technique that

------------------------------------------
...
     two or more threads (or processes) can update (the state of)
     a shared resource in ways that may produce different final states
...
     thread (or process) accesses a shared resource (e.g., a variable)
...
     prevents two or more threads (or processes)
     from executing their critical sections at the same time

*** safe synchronization

**** Goal

Q: Do we want race conditions to occur? No...

Q: Why?
   Because they make it hard to debug, understand, and check programs.

   Hard to debug, because the situation that caused a bug
   might be hard to reproduce

   Hard to understand and check, because of the state space explosion

   In an OS, race conditions can cause problems, including security issues

------------------------------------------
GOALS OF SYNCHRONIZATION MECHANISMS

def: a *serial execution* is an execution equivalent to

def: an execution is *atomic* iff

Goals: Allow programmers to:

------------------------------------------
...
     executing one thread at a time
     (equivalent in terms of getting the same results = final states)
...
     it always finishes completely or not at all
...
     - serialize execution of threads
     - while having them run efficiently
       (as fast as possible, measured by wall clock time)

(other goals might be: utilize the computer's resources maximally,
 or be energy efficient)
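As a concrete illustration of an atomic update, here is a sketch
(an assumption of ours, not from the book: it uses C11's <stdatomic.h>)
that fixes the k = k + 1 race from before by making each increment
one indivisible read-modify-write:

    /* atomic_k.c -- making the increment atomic with C11 atomics
       compile with: gcc -pthread atomic_k.c */
    #include <stdio.h>
    #include <pthread.h>
    #include <stdatomic.h>

    atomic_int k = 0;           /* atomic version of the shared k */

    void *increment(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            atomic_fetch_add(&k, 1);  /* one indivisible load-add-store */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, increment, NULL);
        pthread_create(&t2, NULL, increment, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("k == %d\n", atomic_load(&k)); /* now always 2000000 */
        return 0;
    }

Every execution is now equivalent to some serial ordering of the
2000000 increments, so all executions reach the same final state.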
*** techniques for safe synchronization

------------------------------------------
SAFE SYNCHRONIZATION MECHANISMS

Locking:

Monitors:

------------------------------------------
...
     a locked resource can only be accessed by the lock's owner

         lock();
         // critical section code
         unlock();
...
     an abstract server, which only one client can use at a time

     (in essence, each client tries to get a single lock for the monitor;
      inside the monitor's code, there is only one thread executing
      at a time)

     (found in CSP, Erlang, ...)

*** low-level implementation

------------------------------------------
LOW-LEVEL IMPLEMENTATION

How to implement locks?

In hardware:
   atomic test-and-set or compare-and-swap instructions

In an OS:
   disable interrupts to do atomic actions
------------------------------------------
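To show how the hardware support gets used, here is a sketch of a
spinlock built from an atomic test-and-set, using C11's atomic_flag
(whose test-and-set operation maps to such an instruction); the names
lock and unlock follow the slide's pseudocode and are ours, not a
standard API:

    /* spinlock.c -- a lock built from atomic test-and-set */
    #include <stdatomic.h>

    atomic_flag lk = ATOMIC_FLAG_INIT;  /* clear == unlocked */

    void lock(void)
    {
        /* atomically set the flag and get its old value;
           spin until the old value was clear, meaning we got the lock */
        while (atomic_flag_test_and_set(&lk)) {
            /* busy-wait */
        }
    }

    void unlock(void)
    {
        atomic_flag_clear(&lk);  /* lets one waiting test-and-set succeed */
    }

A thread then brackets its critical section with lock(); ... unlock();
as on the Locking slide. (A real OS lock would usually put a
long-waiting thread to sleep instead of busy-waiting.)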