
Local optimization (Local)
Compiler Project


In this project, you will implement block-local optimizations that work by first analyzing the instructions in a block to infer data-flow facts, then rewriting eligible instructions according to those facts. The infrastructure that runs the analyses and optimizations is provided below. For this project, you will write the state transfer functions and instruction rewrite functions; the infrastructure takes these and runs them on each instruction in a block for you.

Here are the analyses and optimizations to implement for the project. One analysis, reaching definitions, is given, but the constant propagation optimization that uses it is not. Conversely, the copy propagation optimization is given, but the available expressions analysis it depends on is not.

Analysis                        Optimization
(given) Reaching definitions    (TODO) Constant propagation
(TODO) Available expressions    (given) Copy propagation
                                (optional) Common subexpression elimination
(TODO) Live variables           (TODO) Dead code elimination

Local optimizer driver function

Here is the driver for local optimization. It will not work correctly in general on programs that have branching, but we will replace it with the global optimizer in the next project. The local optimizer works by iterating over the basic block nodes (in an arbitrary order) and running both the analysis analyzeblock and the rewriter optimizeblock on each block.

def optimizer(cfg):
  return localoptimizer(cfg)  # call the local optimizer

def localoptimizer(cfg):
  logging.debug(textualize_graph(cfg))  # example debugging, outputting the graph

  for node in cfg.nodes:
    initial_state = set()
    analysis, final_state = analyzeblock(initial_state, cfg.nodes[node]['instrs'], reaching_definitions)
    cfg.nodes[node]['instrs'] = optimizeblock(analysis, constant_propagation)

  logging.debug(textualize_graph(cfg))

  # (optional)
  # for node in cfg.nodes:
  #   initial_state = set()
  #   analysis, final_state = analyzeblock(initial_state, cfg.nodes[node]['instrs'], available_expressions)
  #   cfg.nodes[node]['instrs'] = optimizeblock(analysis, common_subexpression_elimination)

  # logging.debug(textualize_graph(cfg))

  for node in cfg.nodes:
    analysis, final_state = analyzeblock(set(), cfg.nodes[node]['instrs'], available_expressions)
    cfg.nodes[node]['instrs'] = optimizeblock(analysis, copy_propagation)

  logging.debug(textualize_graph(cfg))

  for node in cfg.nodes:
    initial_state = set([util.output_variable])
    analysis, final_state = analyzeblock(initial_state, reversed(cfg.nodes[node]['instrs']), liveness)
    # example debugging
    logging.debug("block\n" + "\n".join( f'{instr}: {live}' for (live, instr) in analysis))
    logging.debug(f"final state: {final_state}")
    instrs = reversed(optimizeblock(analysis, dead_code_elimination))
    cfg.nodes[node]['instrs'] = list(filter(lambda x: x is not None, instrs))

  logging.debug(textualize_graph(cfg))

  return cfg

The analyzeblock function

def analyzeblock(initial_state, instrs, transfer):
  """Take a list of instructions and a state transfer function and
return a list of (state, instruction) pairs and a final state.  The
state transfer function takes the current state and the next
instruction, then returns the updated state after interpreting the
instruction.

:param initial_state: The initial state before the first instruction
:param instrs: The list of instructions.
:param transfer: The transfer function, which takes an input state and an instruction and returns the output state.

def transfer(state: set, instr: Instruction) -> set:
  # body of the function
  return updated_state
"""
  state = initial_state
  analysis = []
  for instr in instrs:
    analysis.append((state, instr))
    state = transfer(state, instr)
  final_state = state
  return analysis, final_state

Called by localoptimizer, analyzeblock runs the state transfer function transfer on each instruction in the given basic block instrs, starting with an initial_state. It returns the analysis result as a list of (state, instruction) pairs, which the rewriter then uses to modify the instructions, along with the final state after the last instruction.

Example analysis: reaching definitions

def reaching_definitions(reaching, instr):
  """The reaching definitions flow function for a single instruction.

  :param reaching: The state before the instruction.
  :param instr: The instruction being interpreted.
  :return: The state after the instruction.
  """

  def is_definition_of(defined_var, instr):
    match instr:
      case Const(var, num):
        return var == defined_var
      case Assign(var, fromvar):
        return var == defined_var
      case Op(var, left, op, right):
        return var == defined_var
      case _:
        return False

  # logging.debug("reaching definitions")
  # logging.debug(f"{instr}")
  # logging.debug(f"{reaching}")
  match instr:
    case Const(var, num):
      # kill
      reaching = { x for x in reaching if not is_definition_of(var, x) }
      # gen
      reaching = reaching | {instr}
    case Assign(var, fromvar):
      # kill
      reaching = { x for x in reaching if not is_definition_of(var, x) }
      # gen
      reaching = reaching | {instr}
    case Op(var, left, op, right):
      # kill
      reaching = { x for x in reaching if not is_definition_of(var, x) }
      # gen
      reaching = reaching | {instr}
    case _:
      # no definition, nothing to update
      pass

  return reaching
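
As a small worked example of what analyzeblock produces with this transfer function, consider a two-instruction block. This assumes, purely for illustration, that the instruction classes are constructed positionally with the fields matched above, e.g., Const(var, num) and Assign(var, fromvar).

instrs = [Const('x', 1), Assign('y', 'x')]
analysis, final_state = analyzeblock(set(), instrs, reaching_definitions)

# analysis pairs each instruction with the state *before* it:
#   (set(),            Const('x', 1))    # no definitions reach the first instruction
#   ({Const('x', 1)},  Assign('y', 'x'))
# final_state is {Const('x', 1), Assign('y', 'x')}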

The optimizeblock function

def optimizeblock(analysis, optimizeinstr):
  """Takes the analysis result (a list of (state, instr) pairs), the final_state from the analysis, and an optimizeinstr function for each instruction.

The optimizeinstr function takes the state before the instruction and the instruction and return an optimized instruction (or the same instruction if no optimization was made).

def optimizeinstr(state: set, instr: Instruction) -> Instruction:
  ...
  return updated_instruction
"""
  instrs = []
  for state, instr in analysis:
    instrs.append(optimizeinstr(state, instr))
  return instrs

optimizeblock takes the result of an analysis and runs the given rewriting function optimizeinstr on each instruction, passing it the state computed just before that instruction.

Example optimization: copy propagation

def copy_propagation(available, instr):
  """The copy propagation optimizer for a single instruction.

  :param available: The state before the instruction.
  :param instr: The instruction being optimized.
  :return: The instruction after optimization.
  """
  # logging.debug(f"copy_propagation {instr} {available}")
  match instr:
    case Assign(var, fromvar):
      for candidate in available:
        if (isinstance(candidate, Assign) and fromvar == candidate.var):
          logging.debug(f"optimize {instr} by replacing {fromvar} with {candidate.fromvar} (copy propagation)")
          instr = Assign(var, candidate.fromvar)
          break
    case Op(var, left, op, right):
      for candidate in available:
        if (isinstance(candidate, Assign) and left == candidate.var):
          logging.debug(f"optimize {instr} by replacing {left} with {candidate.fromvar} (copy propagation)")
          left = candidate.fromvar
          break
      for candidate in available:
        if (isinstance(candidate, Assign) and right == candidate.var):
          logging.debug(f"optimize {instr} by replacing {right} with {candidate.fromvar} (copy propagation)")
          right = candidate.fromvar
          break
      instr = Op(var, left, op, right)
    case Test(var, opr):
      for candidate in available:
        if (isinstance(candidate, Assign) and var == candidate.var):
          logging.debug(f"optimize {instr} by replacing {var} with {candidate.fromvar}")
          instr = Test(candidate.fromvar, opr)
          break
    case _:
      pass
  return instr
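
For example, with a hand-constructed available set (so the example does not depend on your available_expressions implementation), the rewriter behaves as follows. This is only an illustration and assumes the positional constructors matched above.

available = {Assign('x', 'a')}                        # the copy x := a is available

copy_propagation(available, Assign('y', 'x'))
# returns Assign('y', 'a')                            # y := x becomes y := a

copy_propagation(available, Op('z', 'x', '+', 'b'))
# returns Op('z', 'a', '+', 'b')                      # only the left operand has an available copy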

Implementing constant propagation

The reaching definitions analysis is given to you above. You'll be implementing the constant_propagation rewriter, which takes in the data-flow state (reaching) just before the instruction and the instruction itself. It returns an instruction. If there is no rewriting possible, then return the original instruction that was passed in. Here is when rewriting can occur:

  • You see an Assign statement, and the right-hand side (fromvar) is a constant, i.e., reaching contains a Const whose left-hand side is fromvar. Rewrite the Assign to a Const.
  • You see an Op statement, and both variables on the right-hand side (left and right) are constants. (Our While3Addr language doesn't support operations mixing variables and constants, so both must be constant.) Fold the constants and rewrite the Op as a Const.
  • (Optional) There are also opportunities for strength reduction optimizations, i.e., rewriting x := y * 0 to x := 0.

See the copy_propagation rewriter given above for examples of accessing information about the instruction and creating new instructions.
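
If it helps to see the shape of the rewriter, here is a minimal sketch of the two required cases. It is not a complete solution: it assumes the instruction classes (Const, Assign, Op) are in scope as in the given code, and it assumes the Op operator can be folded with Python's operator module; adapt the folding table to however While3Addr represents operators.

import operator

def constant_propagation(reaching, instr):
  """A sketch of the constant propagation rewriter for a single instruction.

  :param reaching: The reaching definitions before the instruction.
  :param instr: The instruction being optimized.
  :return: The instruction after optimization.
  """
  def constant_value(var):
    # Return the constant if var's reaching definition is a Const, else None.
    for candidate in reaching:
      if isinstance(candidate, Const) and candidate.var == var:
        return candidate.num
    return None

  match instr:
    case Assign(var, fromvar):
      num = constant_value(fromvar)
      if num is not None:
        logging.debug(f"optimize {instr} to a Const (constant propagation)")
        instr = Const(var, num)
    case Op(var, left, op, right):
      leftnum = constant_value(left)
      rightnum = constant_value(right)
      if leftnum is not None and rightnum is not None:
        # Assumption: op maps directly onto a Python arithmetic operator;
        # adjust this table to match your While3Addr operators.
        fold = {'+': operator.add, '-': operator.sub, '*': operator.mul}
        if op in fold:
          instr = Const(var, fold[op](leftnum, rightnum))
    case _:
      pass
  return instr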

Implementing available expressions

Copy propagation depends on the available expressions analysis, which you'll be implementing. The copy propagation optimizer has been given to you above. The rewriter replaces any variable used on the instruction's right-hand side with the right-hand side of a previous definition of that variable, e.g., x:=a; y:=x becomes x:=a; y:=a. The available expressions analysis tracks definitions and removes them when variables on their right-hand side get reassigned, thereby invalidating the use of that expression. Accordingly, the available expressions state transfer function kills any definition or use of a variable when an instruction redefines it, e.g., in x:=a; a:=1; y:=x, the a:=1 definition removes x:=a from the state. Formally, if the instruction is a definition x:=_, where _ means the rest of the instruction is irrelevant, the transfer function updates the set of available expressions \(A\) to \(A'\) as follows:

\begin{equation}
\begin{split}
\text{KILL} \gets \{ & \text{x := \_}, \\
                     & \text{\_ := x}, \\
                     & \text{\_ := x op \_}, \\
                     & \text{\_ := \_ op x} \}
\end{split}
\end{equation}
\begin{equation}
\text{GEN} \gets \{ \text{x := \_} \}
\end{equation}
\begin{equation}
A' \gets (A - \text{KILL}) \cup \text{GEN}
\end{equation}
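
A minimal sketch of that transfer function, following the KILL/GEN rule above, might look like the following. The defines_or_uses helper is introduced here only for illustration, and the instruction classes are assumed to be in scope as in the given code.

def available_expressions(available, instr):
  """A sketch of the available expressions flow function for a single instruction.

  :param available: The state before the instruction.
  :param instr: The instruction being interpreted.
  :return: The state after the instruction.
  """
  def defines_or_uses(var, candidate):
    # True if candidate defines var or uses var on its right-hand side.
    match candidate:
      case Const(defined, _):
        return defined == var
      case Assign(defined, used):
        return defined == var or used == var
      case Op(defined, left, _, right):
        return defined == var or left == var or right == var
      case _:
        return False

  match instr:
    case Const(var, _) | Assign(var, _) | Op(var, _, _, _):
      # kill: drop every expression that defines or uses the redefined variable
      available = { x for x in available if not defines_or_uses(var, x) }
      # gen: this definition itself becomes available
      available = available | {instr}
    case _:
      pass
  return available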

Implementing live variable analysis

def liveness(live, instr):
  """The liveness flow function for a single instruction.

  :param live: The state before the instruction.
  :param instr: The instruction being interpreted.
  :return: The state after the instruction.
  """

  # TODO: implement live variable analysis

  return live

The live variable analysis is a backwards analysis; it walks the instructions from last to first in the basic block. This is handled for you in the provided driver function by reversing the instructions before analysis and optimization and re-reversing them afterwards. Additionally, the initial state, containing the output variable (util.output_variable), is provided for you.

Live variable analysis works by first removing any variables defined by the instruction from the set, then adding any variables it uses. Variables can be defined by Const, Assign, or Op and used by Assign, Op, or Test.
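
A minimal sketch consistent with that description (assuming the instruction classes from the given code, with Test matched as it is in copy_propagation above):

def liveness(live, instr):
  """A sketch of the liveness flow function for a single instruction.

  :param live: The state before the instruction (in the reversed walk, i.e.,
               the variables live after the instruction in program order).
  :param instr: The instruction being interpreted.
  :return: The state after the instruction (the variables live before it).
  """
  match instr:
    case Const(var, _):
      live = live - {var}                    # defined, not used
    case Assign(var, fromvar):
      live = (live - {var}) | {fromvar}      # remove the definition, add the use
    case Op(var, left, _, right):
      live = (live - {var}) | {left, right}
    case Test(var, _):
      live = live | {var}                    # only a use
    case _:
      pass
  return live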

Implementing dead-code elimination

def dead_code_elimination(live, instr):
  """The dead code elimination optimizer for a single instruction.

  :param live: The state before the instruction.
  :param instr: The instruction being optimized.
  :return: The instruction after optimization, which can be Nop() for removed instructions.
  """

  # TODO: implement dead-code elimination, replacing dead code with a Nop() instruction

  return instr

The dead-code eliminator uses the results of the live variable analysis. The instructions are reversed before both analysis and optimization, then re-reversed afterwards, so they are provided to the dead_code_elimination function in reverse order. This ensures that the live set provided with each instr is the right one: the set of variables that are live after executing the statement.

Dead code elimination happens when an assignment (a Const, Assign, or Op) defines a variable that isn't in the live set.
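
A minimal sketch of that rule follows. It assumes a Nop instruction class exists, as the docstring above suggests; if your instruction set represents removed instructions differently (e.g., None, which the driver already filters out), adapt the return value accordingly.

def dead_code_elimination(live, instr):
  """A sketch of the dead code elimination optimizer for a single instruction.

  :param live: The variables live after the instruction (in program order).
  :param instr: The instruction being optimized.
  :return: The instruction after optimization, Nop() if it was removed.
  """
  match instr:
    case Const(var, _) | Assign(var, _) | Op(var, _, _, _):
      if var not in live:
        # The assigned variable is never used afterwards, so the instruction is dead.
        logging.debug(f"remove dead instruction {instr}")
        instr = Nop()
    case _:
      pass
  return instr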

Author: Paul Gazzillo

Created: 2024-03-01 Fri 07:54
