Example Code
This page provides some samples of C code shown in class.
Connection Between Unix and C
- The Unix echo command, which shows how Unix C code is connected to the operating system.
Const Language (demo of flex and bison)
The "const language" is a very simple demonstration of the use of flex and bison to create lexical analyzers and parsers.
As can be seen in the const.y grammar file, there a program consists of a single constant definition
(like name = 32
).
The Makefile has rules that can be used to build lexer and run it on various tests, which are the .const
files in the directory.
The flex input file for the lexical analyzer is given in the file const_lexer.l.
Some other files used to build the lexer are also present, such as the ast module (files ast.h and ast.c), the lexer module (files lexer.h and lexer.c) and the utilities module (files utilities.c and utilities.c), and some header files (machine_types.h and parser_types.h).
Const Language with Multiple Definitions
This is a version of the const language from the previous section, that was developed in class on October 20, 2023 as a demonstration of how to work with lists in a grammar and with the ASTs.
The "const multiple language" is a simple demonstration of the use of flex and bison to create lexical analyzers and parsers.
As can be seen in the const.y grammar file, there a program consists of zero or more constant definitions
(which each look like name = 32
).
The Makefile has rules that can be used to build lexer and run it on various tests, which are the .const
files in the directory.
The flex input file for the lexical analyzer is given in the file const_lexer.l.
Some other files used to build the lexer are also present, such as the ast module (files ast.h and ast.c), the lexer module (files lexer.h and lexer.c) and the utilities module (files utilities.c and utilities.c), and some header files (machine_types.h and parser_types.h).
Flex Error Explanations
General Advice
When flex gives an error message, it includes a line number, but often that line number is one past where the actual error is. So, at least consider the line above where flex is saying the error is.
Unrecognized Rule Errors
One of the most baffling errors that flex produces is "Unrecognized rule". We have found that this is often caused by assuming that C-style comments on a definition line are ignored by flex. So, to fix such errors, try making sure that when defining a regular expression (RE), there is nothing else on the line following the definition, even what looks like a comment.
As an exmple, consider the flex input file simple.l, which has the following form:
PERCENTSIGN % /* a percent sign */ %% {PERCENTSIGN} { ; } %%
Note that the formatting on this page may make it look like the text on each line is not starting in column 1, but it is. If we run flex on this file, then we get the following output (with the $ at the beginnings of the lines being the Unix shell's prompt, and what follows the $ being the command that is input to the shell):
$ flex --version flex 2.6.4 $ flex simple.l simple.l:6: unrecognized rule
My mental model of what is going on is that:
- All of what is on the right-hand side of a definition line is substituted by flex, when a named RE is used, including what looks like a comment, and all that is put in a pair of parentheses, however,
- Flex does not recognize the result.
That is, the result is the same as running flex on the file simple2.l file, which looks like what flex sees as input after substituting the definition of PERCENTSIGN.
%% (% /* a percent sign */) { ; } %%
Flex does give the same "unrecognized rule" error for this inpout.
However, if we move the comment from the line with the definition of the RE to another line, then all is fine. This is what happens with the file simple3.l. That is because what flex sees after substituting the right hand side of the definition of PERCENTSIGN is as in the file simple4.l.
Using Rule Names Requires Curly Brackets in Flex
A strange thing about flex files is that when you define a named regular expression (RE), you need to enclose it in curly braces to use that name (in another definition or in a rule). For example, consider the following flex input fie rel_ops.l.
/* from Phillip Lintereur, slightly modified */ REL_OPS ("=="|"!="|"<"|"<="|">"|">=") %% REL_OPS { return 258;} %%
Unfortunately, this will not recognize ==
as a token, because it will only recognize REL_OPS
as input. That is the lexer is looking for the characters R
followed by E
followed by L
, and so on.
The correct way to use such a defined RE is to enclose its name in curly brackets, which tells flex that the name is referring to its definition, and the name itself is not being used as a RE. That is, a corrected version of this example would be as in rel_ops_corrected.l.
Simplified RISC Machine Assembler and Disassembler
The code for the assembler and disassembler for the Simplified RISC Machine (SRM) is accessible from this section and in a subdirectory named srm-asm.
The Makefile for the assembler and disassembler has targets (asm and disasm) for building these programs.
Assembler
The asm_main.c file holds the main program for the assembler.
The lexical analyzer for the assembler is defined in the file asm_lexer.l, which is processed by the Flex lexical analyzer generator to produce the file asm_lexer.c. Some parts of lexical analysis are defined in the lexer module (files lexer.c and lexer.h).
The asm.y describes the assembler's grammar and is used with the Bison parser generator to generate the files asm.tab.c asm.tab.h that do the actual parsing. The parser_types.h file connects the parser to the definitions of the abstract syntax trees (ASTs).
The abstract syntax trees produced by the parser are defined by the ast module (files ast.c and ast.h). The parser_types.h file connects the parser to the definitions of the ASTs.
The asm's unparser, which prints ASTs, is are defined by the asm_unparser module (files asm_unparser.c and asm_unparser.h).
The file locations used by the parser and lexer are defined in the file_location module (files file_location.c and file_location.h).
The pass1 module builds the symbol table for the assembler (files pass1.c and pass1.h).
The symtab module builds the symbol table for the assembler (files symtab.c and symtab.h). The id_attrs.h file defines the attributes used in the symbol table.
The main work of the assembler happens in the assemble module (files assemble.c and assemble.h).
The instruction module (files instruction.c and instruction.h) defines the types for machine instructions and functions that work with them. Some of these functions, such as instruction_assembly_form, that returns a string with the human-readable input of a binary format instruction.
The human-readable names of the SRM's registers are defined in the regname module (files regname.c and regname.h).
The output of the assembler is a binary object file, which has a format defined by in the bof module (files bof.c and bof.h).
The machine_types module (files machine_types.h and machine_types.c) has some basic type definitions used in other modules, and some functions related to the machine definition.
The utilities module (files utilities.c and utilities.h) has some functions used by other modules, such as the bail_with_error function.
Disassembler
The disassembler (disasm) performs the opposite task of the assembler. It uses many of the same files used by the assembler, above.
The main program for the disassembler is in the file disasm_main.c.
The main work of the disassembler happens in the disasm module (files disasm.c and disasm.h).
Important modules for the disassembler are the instruction module (files instruction.c and instruction.h) and the bof module (files bof.c and bof.h).
Tiny Virtual Machine (from Spring 2023)
The tiny machine is a Von Neumann VM that is very simple. The ISA for this machine is also available. The implementation uses the following files:
- machine_main.c, which is the main program.
- machine.h and machine.c, which implement the main functionality of the VM.
- instruction.h and instruction.c, which are an abstract data type for the VM's instructions.
- utilities.h and utilities.c which has some utility functions (including printing error messages.
- The project's Makefile, which describes how to build and test it. Testing uses the .txt files in the tiny-machine directory and expected outputs are in the .out files
- The project's sources.txt file, which tells the Makefile what to compile.
Stack from Fall 2024
C Stack Module
The stack module shown here is a non-empty word-addressed stack that grows downwards. It hides the state of the single stack state by making that state static.
- The machine_types.h file, which defines a few types used in the stack module's interface.
- The stack.h and stack.c files that define the stack's interface and implementation.
- The stackTest.c file has some tests of the stack module. This can be run as ./stackTest after compilation.
- The utilities.h and utilities.c file that defines some utility functions.
- The Makefile can be used to compile the code and create the stackTest program.
Stack Demo in SSM Assembler
The SSM Assembly code shown here is a non-empty word-addressed stack that grows downwards in the VM's memory.
- The stack_demo.asm file, has the code for the demonstration written in SSM Assembly Language. (The assembled BOF file, named stack_demo.bof is also available.)
- The stack_demo.myp is the listing of the instructions and global data from the BOF file of the program in stack_demo.asm. It was produced by running the VM with the -p flag on the BOF file.
- The stack_demo.myo is the trace of executing the BOF file of the program. It was produced by running the VM on the BOF file.
- The Makefile can be used to create the BOF file and the .myp and .myo files, but assumes that you have the assembler (asm) and VM implemented.
Old Stack from Fall 2023
The stack module shown here is a byte-addressed stack that grows downwards. It is an abstract data type in C, which hides some state by making it static.
- The machine_types.h file, which defines types used in the stack
- The stack.h and stack.c files that define the stack's interface and implementation. The file stack-as-word-array.c has the first implementation of stacks, using an array of words. The file stack.c uses a union of an array of words and an array of bytes to avoid byte manipulation. Both pass all the tests in stack.c.
- The utilities.h and utilities.c file that defines some utility functions.
- The stackTest.c, which has some tests. This can be run as ./stackTest after compilation.
- The Makefile can be used to compile the program.
Old Stack from Spring 2023
The stack module shown here has support for activation records (ARs) and static scoping; it is an abstract data type in C, which hides some state by making it static.
- The machine_types.h file, which defines types used in the stack
- The stack.h and stack.c files that define the stack's interface and implementation.
- The utilities.h and utilities.c file that defines some utility functions.
Trees: a motivating example for recursion
The trees example shows how to compute the depth of a binary tree both recursively and non-recursively, so that one can see how much simpler the recursive solution is. In fact, the non-recursive solution is more difficult, and seems to be easiest to express using two queues.
- The Tdef.h file, which defines a type T for the values of the tree nodes (this is not important).
- The btree.h and btree.c files that define binary trees and their dynamic creation.
- The depth_recursive.cfile that is a recursive algorithm for computing the depth of a binary tree.
- The depth_while.c that is a non-recursive algorithm for computing the depth of a binary tree.
- The depth_test.c that tests implementations of the depth function.
- The Makefile that can be used to build and test the project.
- The utilities.h and utilities.c file that defines some utility functions.
Strings in C
The testre.c function shows that the standard C library's strcmp() function does not do regular expression matching.
Modular C
The decldef.h file shows several declarations that are not definitions in C; there are corresponding definitions in the file decldef.c file.
The non-modular program sayHelloProgram.c is broken into 3 modules in:
- A main module, found in sayHelloMain.c,
- A sayHello module, found in sayHello.h and sayHello.c, and
- A who module, found in who.h and who.c
There is a simple Makefile available that can separately compile these modules and link together a complete program.
The ADT for string sets is a module composed of:
- A header file, string_set.h, and
- An implementation file, string_set.c.
Building ASTs Manually
The directory building_ASTs_manually shows two ways to build abstract syntax trees (ASTs) for PL/0, using the ast module from homework 3.
See the file demo.c for how to build ASTs for some simple statements. There is also a header in demo.h and a main program in demo_main.c, which uses the input file write_x.pl0 in one technique.
The directory also contains several other files needed for the demonstration, including a Makefile.
FLOAT Calculator Compiler
The directory FLOAT calculator is an example compiler for a very simple calculator language called FLOAT. It shows how to do lexical analysis, parsing, construction of ASTs, declaration checking, and code generation. A Makefile that builds the compiler and can run tests is included.
The main function (in the file compiler_main.c):
- checks the arguments and options
- If the -l option was used, then it initializes the lexer, which is specified in the flex input file float_lexer.l. (See also the lexer module, files lexer.h, lexer.c.) Then the lexer prints the tokens from the argument file on stdout and exits. This can be helpful in debugging the lexical analysis phase.
- Otherwise it calls the parser, which is specified in the bison input file float.y. There is also a parser module (files parser.h, parser.c). (Note that the parseProgram function that is called also initializes the lexer.)
- the parser returns an AST for the program, which is defined in the ast module (files ast.h and ast.c),
- if the -u option was used, then the compiler prints a textual version of the AST on standard output by calling the unparser from the unparser module (files unparser.h and unparser.c),
- then the compiler initializes the symbol table (in the module symtab, which consists of the files symtab.h and symtab.c, and the module scope, which consists of the files scope.h and scope.c), and then the scope_check module (files scope_check.h and scope_check.c) performs declaration checking; that is the program checks for duplicate declarations of identifiers and undeclared identifiers.
- The modules id_attrs (files id_attrs.h and id_attrs.c) and the module id_use (files id_use.h and id_use.h) are used in declaration checking, and the AST is "decorated" with (pointers to) id_use structures to record information about the lexical address and attibutes of each identifier used.
- If the -u option was used, then the compiler exits. Otherwise the proceeds to type checking and then code generation. Type checking is done in the type_check module (files type_check.h and type_check.c). Code generation is done by first initializing the gen_code module (files gen_code.h and gen_code.c), which uses the literal_table module (files literal_table.h and literal_table.c), opening a BOF file (see module bof in files bof.h and bof.c) and then writing the binary object code into it during a walk of the ASTs.
Error messages are handled in the utilities module (see utilities.h and utilities.c). and the lexer_utilities module (see files lexer_utilities.h and lexer_utilities.c).
The type AST of abstract syntax trees for the FLOAT language is declared in the ast module (files ast.h and ast.c). This uses the file location type from the file_location module (files file_location.h and file_location.c).
Floating-point Simplified RISC Machine
The virtual machine (VM) for FLOAT is simplified from the MIPS processor. It is found in the vm subdirectory of the float calculator directory. The Floating Point Simplified RISC Machine Manual describes this VM in detail.
The implementation can be built using its Makefile, and uses the following files:
- machine_main.c, which is the main program.
- The machine module (files machine.h and machine.c), which implements the main functionality of the VM.
- The machine_types module (files machine_types.h and machine_types.c), which declares the types address and word and implements some utility functions related to those tyeps.
- The instruction module (files instruction.h and instruction.c), which are an abstract data type for the VM's instructions.
- The utilities module (files utilities.h and utilities.c), which has some utility functions (including printing error messages).
- The bof module (files bof.h and bof.c), which desribe the format of binary object files.
- The regname module (files regname.h and regname.c), which provides macros for register names and a function for returning a symbolic register name based on a register number.
- The file_location module (files file_location.h and file_location.c), which provides a way to refer to locations in source files, for example, in error messages.
Last modified .