Control-flow analysis (CFG)
Compiler Project
Table of Contents
- Our compiler's graph structure
- Algorithm overview
- Step 1: Collect the leaders (required)
- Step 2: Collect the list of instructions in each basic block (optional)
- Step 3: Collect the edges between the blocks (optional)
- Step 4: Construct the CFG from the nodes and edges (optional)
- Exploring the graph (optional)
In this phase, we will be implementing control-flow analysis on our IR. ControlFlow.py:controlflow()
will take a list of While3Addr instructions and produce a control-flow graph, which uses the python networkx library to represent the graph.
The control-flow analysis can be run on the output of while2ir
, e.g., fact.ir:
cat tests/fact.ir | controlflow | tee tests/fact.cfg | graphview fact.png
This writes the control-flow graph fact.cfg and also pipes the CFG to graphview
which will print and visualize the graph in the image fact.png.
Below are instructions on one way to generate the control-flow graph, by completing the provided template code. Feel free to design your own algorithm or attempt to implement the algorithm below without the provided template. As long as the output has the expected CFG structure, it should work with the provided GraphView
and CFG2IR
tools.
Our compiler's graph structure
- Each block is numbered sequentially starting from 0
- The entry block is always block 0
- The exit block is always the last block (
len(nodes) - 1
) - Edges are between block numbers, e.g.,
(3, 5)
- Instructions are stored on blocks in the
instrs
attribute as a list of While3Addr instructions, i.e.,cfg[node]["instrs"]
ifgoto
edges are labeled with thecondition
attribute as a python Boolean (True
orFalse
), i.e.,cfg[node][succ]["condition"]
- Goto instructions are removed from the ends of basic block, since their meaning is captured by the out-edge.
- IfGoto instructions are replaced with a
Test
instruction, which is an IfGoto without the target. Instead the two out-edges and thecondition
reflect the same behavior.
Algorithm overview
- Collect the leaders of the basic blocks (This is the minimum you need to implement for the project)
- Add leader for the starting line 0
- Add an exit block leader after the last line
len(prog)
- Add all target lines of Goto and IfGoto
- Add all lines after Goto and IfGoto
- Collect the list of instructions in each basic block
- Collect the edges between the blocks
- Find the Goto, IfGoto, or next line.
- Remove Goto and replace IfGotos with =Test=s
- Construct the CFG from the nodes and edges
For your project, the minimum needed is to implement #1 inside the provided template for ControlFlow.py.
If you would like more of a challenge, try writing controlflow from scratch, or only take parts of the provided template code or use ir only as a reference.
Step 1: Collect the leaders (required)
The controlflow
function takes prog
as input, which is a python list of While3Addr instructions.
The set leaders = set()
is initialized in the template code with
- the line for the entry block
leaders.add(0)
(since the IR starts at one, 0 can be used as the leader for the entry block) and - The line for the exit block, i.e.,
exitblock = len(prog)
then add the leaderleaders.add(exitblock)
Loop over each line number in the input While3Addr program (prog
) and check for Gotos and IfGotos. The While3Addr data types make matching easy with the match=/=case
construct, e.g., matching the Goto instruction would look like this:
match instr: case Goto(pc): print("i see a goto with target", pc)
match
takes the instruction object and case pattern matches the instruction datatype, taking the name and a tuple containing the members of the datatype, e.g., Goto(pc)
, Op(var, left, op, right)
, etc. (See While3Addr.py
for all instruction datatype specifications.)
Add the leaders for each basic block to the leaders
set.
Step 2: Collect the list of instructions in each basic block (optional)
An easy way to do this is to sort the set of leaders in a list, then get the ranges of instructions that each basic block will contain using this sorted list of leaders.
Finally, loop over each range and construct the list of instructions for each basic block from the original list of instruction prog
.
Be sure to add an empty list the exit block.
To facilitate creating the edges, you can also at this step create a mapping from leader number to basic block number. Since the entry block is always 0, start the basic block counter from 1 when numbering basic blocks for instructions in the IR.
Step 3: Collect the edges between the blocks (optional)
There will be three kinds of edges constructed, one for each of the types of leaders, e.g., (1) edges from Gotos, (2) edges from IfGotos, and (3) edges to the leaders after Goto/IfGoto.
Edges are just pairs of block numbers, e.g., (3, 5)
. This step will collect a list of these pairs.
For each range of instructions, check the last instruction to see what kind of edge to construct. You can use match=/=case
make this checking easier.
Goto edges: create the pair for the current block number (determined by the current block's leader) and the target block number (determined from the Goto's target leader).
Remove the Goto per our compiler's graph structure.
IfGoto edges: Create two edges, one for True and one for False, e.g.,
(bbnums[leader], bbnums[targetpc], {'condition': True})
Remove the IfGoto and add a
Test
with the same variable and operator at the IfGoto.- Edges to the leaders after Goto/IfGoto: just add a new edge between the two block numbers.
Step 4: Construct the CFG from the nodes and edges (optional)
Create the nodes, which contain tuples of the block number and 'instrs'
attribute, e.g.,
(bbnums[leader], {'instrs': bbinstrs[leader]})
Create the entry node with an empty list of instructions and the edge between the entry node and the first basic block of the program.
Then use networkx to construct the graph structure:
cfg = nx.DiGraph() cfg.add_nodes_from(nodes) cfg.add_edges_from(edges)
Exploring the graph (optional)
If you'd like to see in more detail how to inspect and use the networkx API to work with the CFG, see GraphView.py
and CFG2IR.py
for examples.
cfg.nodes
gets a list of basic blocks (nodes)cfg[node]
gets one of the basic blockscfg[node]["instrs"]
gets the list of instructions in the basic blockcfg[node][successor]["condition"]
gets the condition (Boolean True or False) from the edge, if its anifgoto
edge.