Global optimization (Global)
Compiler Project
In this project, you will implement global (whole control-flow graph) optimizations, building on local analysis and optimization from the previous project. As with local optimization, there are two steps, analysis and rewriting. For global optimization, the analysis will use the fixed-point algorithm to iteratively update the data-flow state for each block until no more changes occur.
There are two methods you'll be implementing, `dataflow` and `globaloptimizer`.
dataflow
First, `dataflow` runs the data-flow worklist algorithm on the entire CFG, returning the same CFG annotated with the data-flow states around blocks (`instate` and `outstate`) and within blocks (`analysis`, as in the prior project). The inputs to this method are the CFG and then the parameters of the analysis: the state transfer function from the prior project, the initial state, the join operator, and the direction of the analysis (forwards or backwards). For instance, the reaching definitions analysis is called like this:
```python
cfg = dataflow(cfg, reaching_definitions, set(), union, Direction.Forwards)
```
where `reaching_definitions` is the per-instruction state-transfer function from the prior project, `set()` is the initial state for reaching definitions, `union` is the join operator (i.e., "some path"), and `Direction.Forwards` is the direction of the analysis (use `Direction.Backwards` for a backwards analysis).
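To make the join operator concrete, here is a tiny example using the same `union` and `intersection` lambdas as the skeleton code below (states are plain Python sets). `union` implements a "some path" (may) analysis like reaching definitions, while `intersection` implements an "all paths" (must) analysis like available expressions:

```python
# the same join lambdas as in the dataflow skeleton
union = lambda a, b: a.union(b)
intersection = lambda a, b: a.intersection(b)

out_pred1 = {'d1', 'd2'}  # facts holding at the end of predecessor 1
out_pred2 = {'d2', 'd3'}  # facts holding at the end of predecessor 2

# "some path" join: a definition reaches the block if it reaches along any predecessor
assert union(out_pred1, out_pred2) == {'d1', 'd2', 'd3'}
# "all paths" join: a fact holds only if it holds along every predecessor
assert intersection(out_pred1, out_pred2) == {'d2'}
```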
Skeleton code for dataflow
Here is skeleton code to get started on the data-flow analysis implementation.
```python
# enum for the analysis direction
Direction = Enum('Direction', ['Forwards', 'Backwards'])

# union and intersection functions for join
union = lambda a, b: a.union(b)
intersection = lambda a, b: a.intersection(b)

def reversecfg(cfg):
    """Reverse the graph edges and block instructions for backwards analysis."""
    cfg = cfg.reverse()
    for node in cfg.nodes:
        cfg.nodes[node]['instrs'] = list(reversed(cfg.nodes[node]['instrs']))
    return cfg

def dataflow(cfg, transfer, initial_state, join, direction):
    """Perform a data-flow analysis on a cfg.

    :param cfg: The control-flow graph of the program
    :param transfer: The statement-level state transfer function.  See analyzeblock for its signature.
    :param initial_state: The initial state of the entry block (forwards analysis) or exit block (backwards analysis).
    :param join: The join operator that combines input state with a predecessor's output state
    :param direction: The direction of the analysis, Direction.Forwards or Direction.Backwards, the latter of which reverses the graph and the instructions in each block and sets the exit node to be the entry block.
    :return: The same CFG with three new attributes on nodes, 'instate', 'outstate', and 'analysis'.  Analysis is a list of (state, instr) pairs, which contains the data-flow state just before the instr.  The instructions are the same as the 'instrs' attribute, except that they are reversed for Direction.Backwards.
    """
    assert isinstance(direction, Direction)

    # invert the cfg for a backwards analysis
    if direction == Direction.Backwards:
        cfg = reversecfg(cfg)

    # select the entry node: always 0 for forwards, and always the last for backwards (see ControlFlow.py for this assumption)
    entry_node = 0 if direction == Direction.Forwards else max(cfg.nodes)

    # initialize state annotations
    for node in cfg.nodes:
        cfg.nodes[node]['instate'] = set()
        cfg.nodes[node]['outstate'] = set()
        cfg.nodes[node]['analysis'] = None

    # set the initial state to the instate of the entry node
    cfg.nodes[entry_node]['instate'] = initial_state

    # PROJECT: implement the worklist algorithm

    # # uncomment for debugging output
    # for node in cfg.nodes:
    #     logging.debug(f'BB{node}')
    #     logging.debug(f'input state: {cfg.nodes[node]["instate"]}')
    #     logging.debug(f'output state: {cfg.nodes[node]["outstate"]}')
    #     logging.debug(f'analysis: {cfg.nodes[node]["analysis"]}')
    #     logging.debug("")

    # restore the cfg after a backwards analysis
    if direction == Direction.Backwards:
        cfg = reversecfg(cfg)

    return cfg
```
Implementing the worklist algorithm
See Program Analysis Section 4.4 for pseudo-code.
Here are tips for implementation:
- Try using a deque for the worklist: `from collections import deque`. Then use:
  - `worklist = deque(cfg.nodes)` to start with all CFG nodes
  - `node = worklist.popleft()` to get the next node
  - `worklist.append(successor)` to add a new node (`successor`, in this case) to the worklist
- Use `analyzeblock`, provided in the previous project, to perform the analysis on the block, using the `transfer` function provided to `dataflow`. `analyzeblock` takes the instate and the list of instructions, i.e., `cfg.nodes[node]['instate']` and `cfg.nodes[node]['instrs']`. It returns the states for each instruction within the block and the final outstate, i.e., `cfg.nodes[node]['analysis']` and `cfg.nodes[node]['outstate']`.
- Access the successors with `cfg.successors(node)`. Recall that even in a backwards analysis you'll use successors, because the given skeleton for `dataflow` reverses the CFG before and after analysis.
- To detect change, check the result of the join operation against the prior state of the basic block, e.g., `if newinput != cfg.nodes[succ]['instate']:`. If any blocks changed, they need to be reanalyzed (by adding them back into the worklist). Once there is no more change, the algorithm stops.
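Putting the tips together, here is a minimal sketch of the worklist loop on a toy forward analysis. This is not the project's `dataflow` implementation: the graph is a plain dict of successor lists, and `transfer_block` stands in for `analyzeblock`, so only the loop structure carries over.

```python
from collections import deque

def worklist_fixpoint(nodes, succs, transfer_block, initial, join):
    """Sketch of the worklist algorithm for a forward analysis.

    nodes: list of block ids; succs: dict mapping block id -> successor ids;
    transfer_block: stands in for analyzeblock, mapping (node, instate) -> outstate.
    """
    instate = {n: set() for n in nodes}
    outstate = {n: set() for n in nodes}
    instate[nodes[0]] = set(initial)           # entry block gets the initial state

    worklist = deque(nodes)                    # start with all CFG nodes
    while worklist:
        node = worklist.popleft()              # get the next node
        outstate[node] = transfer_block(node, instate[node])
        for succ in succs[node]:
            newinput = join(instate[succ], outstate[node])
            if newinput != instate[succ]:      # change detected
                instate[succ] = newinput
                worklist.append(succ)          # reanalyze the successor
    return instate, outstate

# a diamond CFG: 0 -> {1, 2} -> 3, with each block generating one definition
succs = {0: [1, 2], 1: [3], 2: [3], 3: []}
gen = {0: {'a'}, 1: {'b'}, 2: {'c'}, 3: set()}
instate, outstate = worklist_fixpoint(
    [0, 1, 2, 3], succs, lambda n, s: s | gen[n], set(), lambda a, b: a | b)
assert instate[3] == {'a', 'b', 'c'}           # defs from both paths reach block 3
```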
globaloptimizer
Second, `globaloptimizer` replaces the `localoptimizer` from the previous project on local optimization. Call it instead:
```python
def optimizer(cfg):
    return globaloptimizer(cfg)  # call the global optimizer
```
The `globaloptimizer` takes the entire control-flow graph and runs all optimizations on it until no more change is detected. The implementation works by copying the CFG, in order to check for change after optimization, then running each data-flow analysis and optimizer function on the CFG. The skeleton code below runs constant propagation, copy propagation, then dead code elimination. Each optimization runs the complete data-flow analysis once (by calling the `dataflow` method), then runs the optimizer on each node in the CFG, which `dataflow` has annotated with the per-instruction data-flow state as in the previous project. For example, constant propagation first runs reaching definitions, then calls the optimizer on the CFG with the `constant_propagation` rewriter:
```python
cfg = dataflow(cfg, reaching_definitions, set(), union, Direction.Forwards)

# perform constant propagation optimization
for node in cfg.nodes:
    cfg.nodes[node]['instrs'] = optimizeblock(cfg.nodes[node]['analysis'], constant_propagation)
```
The first line calls `dataflow` on the `cfg`, setting up the parameters for a reaching definitions analysis:

- `reaching_definitions` is the per-instruction state transfer method.
- `set()` is the initial state, i.e., no definitions reach the entry block by definition.
- `union` is the join function for reaching definitions, i.e., "is there some path along which the definition reaches this point in the program?"
- `Direction.Forwards` performs dataflow in the forwards direction.
The next lines iterate over each basic block in the CFG (`for node in cfg.nodes:`), then update the set of instructions inside the basic block by calling `optimizeblock`, given in the previous project, which takes the analysis results and the per-instruction rewriter function `constant_propagation` and returns a potentially updated list of instructions to reassign to the basic block's `instrs` annotation.
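For reference, here is a plausible minimal shape of `optimizeblock`, consistent with how it is called here; the real one comes from the previous project and may differ. It maps the rewriter over the `(state, instr)` pairs produced by the analysis:

```python
def optimizeblock(analysis, rewriter):
    """Hypothetical sketch: apply a per-instruction rewriter to each
    (state, instr) pair from the block's 'analysis' annotation."""
    return [rewriter(state, instr) for (state, instr) in analysis]

# with an identity rewriter, the instructions pass through unchanged
analysis = [({'d1'}, 'x = 1'), ({'d1', 'd2'}, 'y = x')]
assert optimizeblock(analysis, lambda state, instr: instr) == ['x = 1', 'y = x']
```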
Skeleton code for globaloptimizer
Here is skeleton code for the global optimizer:
```python
def globaloptimizer(cfg):
    logging.debug(textualize_graph(cfg))

    # perform optimizations until no more change is detected
    changed = True
    while changed:
        changed = False

        # make a copy of the cfg so we can check for change against the old graph
        original_cfg = cfg.copy()

        # EXAMPLE: performing reaching definitions analysis on the control flow graph
        cfg = dataflow(cfg, reaching_definitions, set(), union, Direction.Forwards)

        # perform constant propagation optimization
        for node in cfg.nodes:
            cfg.nodes[node]['instrs'] = optimizeblock(cfg.nodes[node]['analysis'], constant_propagation)

        # logging.debug(textualize_graph(cfg))

        # OPTIONAL: perform available expressions data flow analysis for common subexpression elimination
        # for node in cfg.nodes:
        #     cfg.nodes[node]['instrs'] = optimizeblock(cfg.nodes[node]['analysis'], common_subexpression_elimination)

        logging.debug(textualize_graph(cfg))

        # PROJECT: perform available expressions analysis on the control flow graph

        # perform copy propagation optimization
        for node in cfg.nodes:
            cfg.nodes[node]['instrs'] = optimizeblock(cfg.nodes[node]['analysis'], copy_propagation)

        logging.debug(textualize_graph(cfg))

        # PROJECT: perform live variables analysis on the control flow graph

        # perform dead code elimination
        for node in cfg.nodes:
            cfg.nodes[node]['instrs'] = list(reversed(optimizeblock(cfg.nodes[node]['analysis'], dead_code_elimination)))

        logging.debug(textualize_graph(cfg))

        # check for change
        for node in cfg.nodes:
            if original_cfg.nodes[node]['instrs'] != cfg.nodes[node]['instrs']:
                changed = True

    logging.debug(textualize_graph(cfg))
    return cfg
```
The constant propagation optimization
Here is the instruction rewriter for constant propagation, which uses the results of a reaching definitions analysis. Notice that it must iterate over all reaching definitions of a symbol to guarantee that they are all constant assignments to the same constant.
```python
def constant_propagation(reaching, instr):
    """The constant propagation optimizer for a single instruction.

    :param reaching: The reaching-definitions state before the instruction.
    :param instr: The instruction being optimized.
    :return: The instruction after optimization.
    """
    def is_definition_of(defined_var, instr):
        match instr:
            case Const(var, num):
                return var == defined_var
            case Assign(var, fromvar):
                return var == defined_var
            case Op(var, left, op, right):
                return var == defined_var
            case _:
                return False

    def is_constant(reaching, var):
        """If the variable is the same constant in all reaching definitions, return True, otherwise False."""
        return isinstance(reaching_constant(reaching, var), int)

    def reaching_constant(reaching, var):
        """If the variable is the same constant in all reaching definitions, return the constant value, otherwise return None."""
        # get all definitions of var in the data-flow state
        definitions = { x for x in reaching if is_definition_of(var, x) }
        # get all constant definitions of var
        const_definitions = { x for x in definitions if isinstance(x, Const) }
        # get all constants that var has been defined to
        constants = { x.num for x in const_definitions }
        # var is constant only if all definitions are constants, and all constant definitions are to the same constant
        if len(definitions) == len(const_definitions) and len(constants) == 1:
            return constants.pop()
        else:
            return None

    match instr:
        case Assign(var, fromvar):
            if is_constant(reaching, fromvar):
                constant = reaching_constant(reaching, fromvar)
                logging.debug(f"optimize {instr} by replacing {fromvar} with constant {constant} (constant propagation)")
                return Const(var, constant)
        case Op(var, left, op, right):
            if is_constant(reaching, left) and is_constant(reaching, right):
                constant = aop_map[op](reaching_constant(reaching, left), reaching_constant(reaching, right))
                logging.debug(f"optimize {instr} by replacing {left} {op} {right} with constant {constant} (constant propagation)")
                return Const(var, constant)
            # OPTIONAL: perform more strength reductions
            # if is_constant(reaching, right) and reaching_constant(reaching, left) and is_constant(reaching, right):
            #     constant = aop_map[op](reaching_constant(reaching, left), reaching_constant(reaching, right))
            #     logging.debug(f"optimize {instr} by replacing {left} {op} {right} with constant {constant} (constant propagation)")
            #     return Const(var, constant)
        case _:
            pass
    return instr
```
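The all-definitions check in `reaching_constant` can be exercised with toy stand-ins for the project's IR classes. The `Const` and `Assign` dataclasses here are hypothetical sketches, not the real instruction types from the prior project; only the lookup logic matches the rewriter above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    var: str
    num: int

@dataclass(frozen=True)
class Assign:
    var: str
    fromvar: str

def reaching_constant(reaching, var):
    """Same logic as above: var is constant only if every reaching
    definition of var is a Const and all assign the same value."""
    definitions = {d for d in reaching if d.var == var}
    const_definitions = {d for d in definitions if isinstance(d, Const)}
    constants = {d.num for d in const_definitions}
    if len(definitions) == len(const_definitions) and len(constants) == 1:
        return constants.pop()
    return None

assert reaching_constant({Const('x', 5)}, 'x') == 5                       # one constant def
assert reaching_constant({Const('x', 5), Const('x', 6)}, 'x') is None     # conflicting constants
assert reaching_constant({Const('x', 5), Assign('x', 'z')}, 'x') is None  # non-constant def reaches
```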