Global optimization (Global)
Compiler Project
In this project, you will implement global (whole control-flow graph) optimizations, building on local analysis and optimization from the previous project. As with local optimization, there are two steps, analysis and rewriting. For global optimization, the analysis will use the fixed-point algorithm to iteratively update the data-flow state for each block until no more changes occur.
There are two methods you'll be implementing, `dataflow` and `globaloptimizer`.
dataflow
First, `dataflow` runs the data-flow worklist algorithm on the entire CFG, returning the same CFG annotated with the data-flow states around blocks (`instate` and `outstate`) and within blocks (`analysis`, as in the prior project). The inputs to this method are the CFG and then the parameters of the analysis: the state transfer function from the prior project, the initial state, the join operator, and the direction of the analysis (forwards or backwards). For instance, the reaching definitions analysis is called like this:
```python
cfg = dataflow(cfg, reaching_definitions, set(), union, Direction.Forwards)
```
where `reaching_definitions` is the per-instruction state-transfer function from the prior project, `set()` is the initial state for reaching definitions, `union` is the join operator (i.e., "some path"), and `Direction.Forwards` is the direction of the analysis (use `Direction.Backwards` for a backwards analysis).
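To make the join operator concrete, here is a tiny example using the same `union` and `intersection` lambdas as the skeleton code below (states are plain Python sets). `union` implements a "some path" (may) analysis like reaching definitions, while `intersection` implements an "all paths" (must) analysis like available expressions:

```python
# the same join lambdas as in the dataflow skeleton
union = lambda a, b: a.union(b)
intersection = lambda a, b: a.intersection(b)

out_pred1 = {'d1', 'd2'}  # facts holding at the end of predecessor 1
out_pred2 = {'d2', 'd3'}  # facts holding at the end of predecessor 2

# "some path" join: a definition reaches the block if it reaches along any predecessor
assert union(out_pred1, out_pred2) == {'d1', 'd2', 'd3'}
# "all paths" join: a fact holds only if it holds along every predecessor
assert intersection(out_pred1, out_pred2) == {'d2'}
```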
Skeleton code for dataflow
Here is skeleton code to get started on the data-flow analysis implementation.
```python
# enum for the analysis direction
Direction = Enum('Direction', ['Forwards', 'Backwards'])

# union and intersection functions for join
union = lambda a, b: a.union(b)
intersection = lambda a, b: a.intersection(b)

def reversecfg(cfg):
    """Reverse the graph edges and block instructions for backwards analysis."""
    cfg = cfg.reverse()
    for node in cfg.nodes:
        cfg.nodes[node]['instrs'] = list(reversed(cfg.nodes[node]['instrs']))
    return cfg

def dataflow(cfg, transfer, initial_state, join, direction):
    """Perform a data-flow analysis on a cfg.

    :param cfg: The control-flow graph of the program
    :param transfer: The statement-level state transfer function.  See analyzeblock for its signature.
    :param initial_state: The initial state of the entry block (forwards analysis) or exit block (backwards analysis).
    :param join: The join operator that combines input state with a predecessor's output state
    :param direction: The direction of the analysis, Direction.Forwards or Direction.Backwards, the latter of which reverses the graph and the instructions in each block and sets the exit node to be the entry block.
    :return: The same CFG with three new attributes on nodes, 'instate', 'outstate', and 'analysis'.  Analysis is a list of (state, instr) pairs, which contains the data-flow state just before the instr.  The instructions are the same as the 'instrs' attribute, except that they are reversed for Direction.Backwards.
    """
    assert isinstance(direction, Direction)

    # invert the cfg for a backwards analysis
    if direction == Direction.Backwards:
        cfg = reversecfg(cfg)

    # select the entry node: always 0 for forwards, and always the last for backwards (see ControlFlow.py for this assumption)
    entry_node = 0 if direction == Direction.Forwards else max(cfg.nodes)

    # initialize state annotations
    for node in cfg.nodes:
        cfg.nodes[node]['instate'] = set()
        cfg.nodes[node]['outstate'] = set()
        cfg.nodes[node]['analysis'] = None

    # set the initial state to the instate of the entry node
    cfg.nodes[entry_node]['instate'] = initial_state

    # PROJECT: implement the worklist algorithm

    # # uncomment for debugging output
    # for node in cfg.nodes:
    #     logging.debug(f'BB{node}')
    #     logging.debug(f'input state: {cfg.nodes[node]["instate"]}')
    #     logging.debug(f'output state: {cfg.nodes[node]["outstate"]}')
    #     logging.debug(f'analysis: {cfg.nodes[node]["analysis"]}')
    #     logging.debug("")

    # restore the cfg after a backwards analysis
    if direction == Direction.Backwards:
        cfg = reversecfg(cfg)

    return cfg
```
Implementing the worklist algorithm
See Program Analysis Section 4.4 for pseudo-code.
Here are tips for implementation:
- Try using a deque for the worklist: `from collections import deque`. Then use:
  - `worklist = deque(cfg.nodes)` to start with all CFG nodes
  - `node = worklist.popleft()` to get the next node
  - `worklist.append(successor)` to add a new node (`successor`, in this case) to the worklist
- Use `analyzeblock`, provided in the previous project, to perform the analysis on the block, using the `transfer` function provided to `dataflow`. `analyzeblock` takes the instate and the list of instructions, i.e., `cfg.nodes[node]['instate']` and `cfg.nodes[node]['instrs']`. It returns the states for each instruction within the block and the final outstate, i.e., `cfg.nodes[node]['analysis']` and `cfg.nodes[node]['outstate']`.
- Access the successors with `cfg.successors(node)`. Recall that even in a backwards analysis you'll use successors, because the given skeleton for `dataflow` reverses the CFG before and after analysis.
- To detect change, check the result of the join operation against the prior state of the basic block, e.g., `if newinput != cfg.nodes[succ]['instate']:`. If any blocks changed, they need to be reanalyzed (by adding them back into the worklist). Once there is no more change, the algorithm stops.
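Putting the tips together, here is a minimal sketch of the worklist loop on a toy forward analysis. This is not the project's `dataflow` implementation: the graph is a plain dict of successor lists, and `transfer_block` stands in for `analyzeblock`, so only the loop structure carries over.

```python
from collections import deque

def worklist_fixpoint(nodes, succs, transfer_block, initial, join):
    """Sketch of the worklist algorithm for a forward analysis.

    nodes: list of block ids; succs: dict mapping block id -> successor ids;
    transfer_block: stands in for analyzeblock, mapping (node, instate) -> outstate.
    """
    instate = {n: set() for n in nodes}
    outstate = {n: set() for n in nodes}
    instate[nodes[0]] = set(initial)           # entry block gets the initial state

    worklist = deque(nodes)                    # start with all CFG nodes
    while worklist:
        node = worklist.popleft()              # get the next node
        outstate[node] = transfer_block(node, instate[node])
        for succ in succs[node]:
            newinput = join(instate[succ], outstate[node])
            if newinput != instate[succ]:      # change detected
                instate[succ] = newinput
                worklist.append(succ)          # reanalyze the successor
    return instate, outstate

# a diamond CFG: 0 -> {1, 2} -> 3, with each block generating one definition
succs = {0: [1, 2], 1: [3], 2: [3], 3: []}
gen = {0: {'a'}, 1: {'b'}, 2: {'c'}, 3: set()}
instate, outstate = worklist_fixpoint(
    [0, 1, 2, 3], succs, lambda n, s: s | gen[n], set(), lambda a, b: a | b)
assert instate[3] == {'a', 'b', 'c'}           # defs from both paths reach block 3
```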
globaloptimizer
Second, `globaloptimizer` replaces the `localoptimizer` from the previous project on local optimization. Call it instead:
```python
def optimizer(cfg):
    return globaloptimizer(cfg)  # call the global optimizer
```
The `globaloptimizer` takes the entire control-flow graph and runs all optimizations on it until no more change is detected. The implementation works by copying the CFG, in order to check for change after optimization, then running each data-flow analysis and optimizer function on the CFG. The skeleton code below runs constant propagation, copy propagation, then dead code elimination. Each optimization runs the complete data-flow analysis once (by calling the `dataflow` method), then runs the optimizer on each node in the CFG, which `dataflow` has annotated with the per-instruction data-flow state as in the previous project. For example, constant propagation first runs reaching definitions, then calls the optimizer on the CFG with the `constant_propagation` rewriter:
```python
cfg = dataflow(cfg, reaching_definitions, set(), union, Direction.Forwards)

# perform constant propagation optimization
for node in cfg.nodes:
    cfg.nodes[node]['instrs'] = optimizeblock(cfg.nodes[node]['analysis'], constant_propagation)
```
The first line calls `dataflow` on the `cfg`, setting up the parameters for a reaching definitions analysis:

- `reaching_definitions` is the per-instruction state transfer method.
- `set()` is the initial state, i.e., no definitions reach the entry block by definition.
- `union` is the join function for reaching definitions, i.e., "is there some path along which the definition reaches this point in the program?"
- `Direction.Forwards` performs dataflow in the forwards direction.
The next lines iterate over each basic block in the CFG (`for node in cfg.nodes:`), then update the set of instructions inside the basic block by calling `optimizeblock`, given in the previous project, which takes the analysis results and the per-instruction rewriter function `constant_propagation` and returns a potentially updated list of instructions to reassign to the basic block's `instrs` annotation.
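For reference, here is a plausible minimal shape of `optimizeblock`, consistent with how it is called here; the real one comes from the previous project and may differ. It maps the rewriter over the `(state, instr)` pairs produced by the analysis:

```python
def optimizeblock(analysis, rewriter):
    """Hypothetical sketch: apply a per-instruction rewriter to each
    (state, instr) pair from the block's 'analysis' annotation."""
    return [rewriter(state, instr) for (state, instr) in analysis]

# with an identity rewriter, the instructions pass through unchanged
analysis = [({'d1'}, 'x = 1'), ({'d1', 'd2'}, 'y = x')]
assert optimizeblock(analysis, lambda state, instr: instr) == ['x = 1', 'y = x']
```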
Skeleton code for globaloptimizer
Here is skeleton code for the global optimizer:
```python
def globaloptimizer(cfg):
    logging.debug(textualize_graph(cfg))

    # perform optimizations until no more change is detected
    changed = True
    while changed:
        changed = False

        # make a copy of the cfg so we can check for change against the old graph
        original_cfg = cfg.copy()

        # EXAMPLE: performing reaching definitions analysis on the control flow graph
        cfg = dataflow(cfg, reaching_definitions, set(), union, Direction.Forwards)

        # perform constant propagation optimization
        for node in cfg.nodes:
            cfg.nodes[node]['instrs'] = optimizeblock(cfg.nodes[node]['analysis'], constant_propagation)

        # logging.debug(textualize_graph(cfg))

        # OPTIONAL: perform available expressions data flow analysis for common subexpression elimination
        # for node in cfg.nodes:
        #     cfg.nodes[node]['instrs'] = optimizeblock(cfg.nodes[node]['analysis'], common_subexpression_elimination)

        logging.debug(textualize_graph(cfg))

        # PROJECT: perform available expressions analysis on the control flow graph

        # perform copy propagation optimization
        for node in cfg.nodes:
            cfg.nodes[node]['instrs'] = optimizeblock(cfg.nodes[node]['analysis'], copy_propagation)

        logging.debug(textualize_graph(cfg))

        # PROJECT: perform live variables analysis on the control flow graph

        # perform dead code elimination
        for node in cfg.nodes:
            cfg.nodes[node]['instrs'] = list(reversed(optimizeblock(cfg.nodes[node]['analysis'], dead_code_elimination)))

        logging.debug(textualize_graph(cfg))

        # check for change
        for node in cfg.nodes:
            if original_cfg.nodes[node]['instrs'] != cfg.nodes[node]['instrs']:
                changed = True

    logging.debug(textualize_graph(cfg))
    return cfg
```
The constant propagation optimization
Here is the instruction rewriter for constant propagation, which uses the results of a reaching definitions analysis. Notice that it must iterate over all reaching definitions of a symbol to guarantee that they are all constant assignments to the same constant.
```python
def constant_propagation(reaching, instr):
    """The constant propagation optimizer for a single instruction.

    :param reaching: The reaching-definitions state before the instruction.
    :param instr: The instruction being optimized.
    :return: The instruction after optimization.
    """
    def is_definition_of(defined_var, instr):
        match instr:
            case Const(var, num):
                return var == defined_var
            case Assign(var, fromvar):
                return var == defined_var
            case Op(var, left, op, right):
                return var == defined_var
            case _:
                return False

    def is_constant(reaching, var):
        """If the variable is the same constant in all reaching definitions, return True, otherwise False."""
        return isinstance(reaching_constant(reaching, var), int)

    def reaching_constant(reaching, var):
        """If the variable is the same constant in all reaching definitions, return the constant value, otherwise return None."""
        # get all definitions of var in the data-flow state
        definitions = { x for x in reaching if is_definition_of(var, x) }
        # get all constant definitions of var
        const_definitions = { x for x in definitions if isinstance(x, Const) }
        # get all constants that var has been defined to
        constants = { x.num for x in const_definitions }
        # var is constant only if all definitions are constants, and all constant definitions are to the same constant
        if len(definitions) == len(const_definitions) and len(constants) == 1:
            return constants.pop()
        else:
            return None

    match instr:
        case Assign(var, fromvar):
            if is_constant(reaching, fromvar):
                constant = reaching_constant(reaching, fromvar)
                logging.debug(f"optimize {instr} by replacing {fromvar} with constant {constant} (constant propagation)")
                return Const(var, constant)
        case Op(var, left, op, right):
            if is_constant(reaching, left) and is_constant(reaching, right):
                constant = aop_map[op](reaching_constant(reaching, left), reaching_constant(reaching, right))
                logging.debug(f"optimize {instr} by replacing {left} {op} {right} with constant {constant} (constant propagation)")
                return Const(var, constant)
            # OPTIONAL: perform more strength reductions
            # if is_constant(reaching, right) and reaching_constant(reaching, left) and is_constant(reaching, right):
            #     constant = aop_map[op](reaching_constant(reaching, left), reaching_constant(reaching, right))
            #     logging.debug(f"optimize {instr} by replacing {left} {op} {right} with constant {constant} (constant propagation)")
            #     return Const(var, constant)
        case _:
            pass
    return instr
```
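The all-definitions check in `reaching_constant` can be exercised with toy stand-ins for the project's IR classes. The `Const` and `Assign` dataclasses here are hypothetical sketches, not the real instruction types from the prior project; only the lookup logic matches the rewriter above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    var: str
    num: int

@dataclass(frozen=True)
class Assign:
    var: str
    fromvar: str

def reaching_constant(reaching, var):
    """Same logic as above: var is constant only if every reaching
    definition of var is a Const and all assign the same value."""
    definitions = {d for d in reaching if d.var == var}
    const_definitions = {d for d in definitions if isinstance(d, Const)}
    constants = {d.num for d in const_definitions}
    if len(definitions) == len(const_definitions) and len(constants) == 1:
        return constants.pop()
    return None

assert reaching_constant({Const('x', 5)}, 'x') == 5                       # one constant def
assert reaching_constant({Const('x', 5), Const('x', 6)}, 'x') is None     # conflicting constants
assert reaching_constant({Const('x', 5), Assign('x', 'z')}, 'x') is None  # non-constant def reaches
```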