UP | HOME

Project 2: Type-Checking SimpleC Programs

Table of Contents

Overview

In this, we will perform type checking on the abstract syntax tree (AST) produced by the parser. Part of the type checker is provided to you. typecheck.c is the file to implement the rest of the type checker in. The type checker will maintain a symbol table for each static scope and annotate the tree with type information for each expression while traversing the tree. When finished the program will print the AST with these additional type annotations. If the input program violates SimpleC's type rules, it will exit with an error code.

Getting started

  1. Accept the GitHub classroom assignment in webcourses to create your git repository for the project.
    • Submission will be via GitHub by committed and pushing all of your changes to your repository to the course GitHub assignment, which graders will use for grading.
  2. In your virtual machine, clone your repository as described in lecture, replacing USERNAME with your GitHub username.

    git clone https://github.com/cop3402/simplec-compiler-2-USERNAME
    
  3. Run make in your source folder (in your virtual machine) to build the project, which should create the simplec program.
    • If the build fails, double-check that you are in the repository directory and on a correctly-configured virtual machine.
  4. Run cat tests/example.simplec | ./simplec
    • Running ./simplec without providing input will cause the program to stop and wait for input to stdin. Use Ctrl-D to end the input (or Ctrl-C to terminate the program).
    • If ./simplec is not found, be sure to prefix the program with ./, and be sure the make step above succeeded.

Submitting your project

Submit your project by commiting your changes and pushing them to your GitHub repository.

Debugging your compiler

  1. Narrow down the problem by crafting a smaller version of the test case
  2. Go to the part of the your code that is related to the problem
  3. Trace step-by-step what your code really does (not what you think/hope/feel/guess/divine/intuit/reckon it does)

Error codes

  • exit(1) means there is a failed invariant check, e.g., an unexpected NULL value, which is already handled for you in the template.
  • exit(3) means typechecker error, which is produced by calling type_error(char *msg), given in the template for typecheck.c.

Type specification

Static scoping

  • SimpleC is a statically scoped language, that is, user-defined symbols are only accessible within a syntactically-defined region of the source file.
    • Symbols declared at the top of the program are in the the global scope and accessible by the entire program.
    • Function definitions may only appear in the global scope
    • Each function definition's body creates a nested, static scope, identified syntactically by curly braces. Its parameters and any declarations are only accessible within the function body's scope.
    • Compound statements also create their owned nested, static scope between their curly braces.
  • Declarations bind symbol names to a type in whatever scope they appear. Declarations of function types are only permitted in the global scope. Any nested scopes must only declare non-function types.
  • Function definitions (which are different from declarations) bind the function name to the function type in the global scope before creating a nested scope in which its parameters and locally-declared symbols are bound.
  • Within the same scope, only one binding per unique symbol name is permitted.

Declarations

  • Bind the declared symbol to its type
  • It's a type error if the symbol has already been bound in the current scope

Function definitions

  • Check that the function is declared with a function type
  • Check for duplicate definitions in the current scope
  • Add the function to the current (global) symbol table
  • Create a new scope
  • Add parameters to local scope
  • Check for an incorrect number of parameters (paramlist) compared to the type's list of parameters (paramtypes)

    // type error because paramlist doesn't match paramtypes
    f(a, b) : function (int) -> char { ... } 
    
  • Add the local variable declarations to the local symtab
  • Check the function body for type errors
  • Check the return expression for type errors
  • Check that the return expression's type match the function type
  • Restore the parent scope

Statements

  • Assignment statement
    • Get the type of the left-hand side and of the right-hand side
    • Check that the left-hand side is an L-value, i.e., its type represents a memory address that can take a value
      • In SimpleC, three kinds of expression can be used as L-values
        1. An identexpr
        2. A unaryexpr only where the operator is a pointer dereference
        3. An arrayexpr
    • Check that the type of the left- and right-hand side expressions match.
  • If, if-then-else, and while statements
    • The conditional expression should be an int
    • Recursively type check the if branch and the else branch, if present or the while body
  • Compound statements
    • These create a new, nested static scope. So create a new scope before type checking the body. Don't forget to restore the parent scope after.

Expression

  • identexpr
    • Lookup the symbol's binding and store the type in the expr->type field on the expression.
    • Return a type error if the symbol has not been declared in any of the nested scopes
  • callexpr
    • Get the type from the symtab or call to undeclared function type error
    • Check that the symbol is a function type
    • Check that the number of types match or type error
    • Compare the types of each argument's expression (in the list) match each corresponding parameter type from the function type or type error
    • Set this expression to the function's return type
  • int, char, and str expressions
    • Set the expression to the corresponding type INTTYPE, CHARTYPE, or STRINGTYPE.
  • arrayexpr
    • Check the type of both the expression and the index.
    • Check that the expression being dereferenced is either a pointer or an array type
    • A dereference unboxes the pointer type
    • An array access unboxes the array type
    • It's a type error if the expression is neither an array or pointer
  • unaryexpr
    • Typecheck the operand.
    • Check the type of the operator:
      • & : (type) -> pointer<type>
        • gets the pointer to a variable of a given type
      • * : (pointer<type>) -> type
        • dereferences the pointer to a variable of a given type
      • - : (type) -> type
        • arithmetic negation
      • ! : (type) -> int
        • logical negation
    • Don't forget to the set the unaryexpr's type at the end
  • binaryexpr
    • Typecheck the operands.
    • Check the type of the operator.
      • arithmetic: allows any operands of the same type and returns the same type
        • * : (type, type) -> type
        • / : (type, type) -> type
        • % : (type, type) -> type
        • + : (type, type) -> type
        • - : (type, type) -> type
      • comparison operators: takes either char or int, but not both, and returns an int to represent a boolean value
        • < : (type, type) -> int
        • > : (type, type) -> int
        • <= : (type, type) -> int
        • >= : (type, type) -> int
        • == : (type, type) -> int
        • != : (type, type) -> int
      • logical operators: takes int and returns an int to represent a boolean value
        • && : (int, int) -> int
        • || : (int, int) -> int
    • Don't forget to set the resulting type of the expression.
  • castexpr
    • Permit casting between any types (except functions, which expression can't be any way). Generate a type error if it's a function type.

API

  • storing expression types in the ->type field

    type information about expressions is communicated to the rest of the type checker by assigning the ->type field on expressions. see the template and in-class guides to see examples of this.

  • current_scope holds the current scope

    the scope holds the symbol table current_scope->table. the current scope needs to be updated upon entering a new static scope by creating a new scope data structure. once the scope is over, the current_scope needs to be restored to the parent scope.

  • static bool compare_types(T_type type1, T_type type2)

    this compares two types and returns true if they match.

  • static void type_error(char *msg);

    this should be called with a message when there is a type error. it will exit with the appropriate error code necessary for automatic grading.

  • static T_scope create_scope(T_scope parent);

    this creates a new scope that points to the given parent scope.

  • static void destroy_scope(T_scope scope);

    this frees a scope once it is no longer needed

  • void insert(T_table table, string key, void *binding);

    bind a symbol name (key) to a type (binding) in a given symbol table. access the current scope's symbol table with current_scope->table.

  • void *lookup(T_table table, string key);

    retrieve the binding for a symbol name (key) in a given symbol table. if the symbol is not bound, then this function returns NULL.

  • static T_type lookup_in_all_scopes(T_scope scope, string ident);

    search for a binding of a symbol name (ident) starting from the current scope and recursively checking each parent scope until either a symbol is found until reaching the global scope. if no binding is found, this function returns NULL.

Example implementations

static void check_decl(T_decl decl) {
  // check for duplicate definitions in local scope
  if (NULL != lookup(current_scope->table, decl->ident)) {
    type_error("duplicate declaration of same symbol");
  }
  // add the binding
  insert(current_scope->table, decl->ident, decl->type);
}

static void check_identexpr(T_expr expr) {
  fprintf(stderr, "check_identexpr\n");
  // lookup the symbol
  T_type binding = lookup_in_all_scopes(current_scope, expr->identexpr);
  // check that the symbol has been declared
  if (NULL == binding) {
    type_error("use of undeclared symbol in identexpr");
  }
  // check that the symbol is not a function type
  if (E_functiontype == binding->kind) {
    type_error("cannot use a function as a value");
  }
  // set the type to the binding from the symtab
  expr->type = binding;
}

static void check_intexpr(T_expr expr) {
  fprintf(stderr, "check_intexpr\n");
  // integer constants are int type by definition
  expr->type = INTTYPE;
}

Testing

As you program, be sure to create small (unit) test cases for any changes you make. There are existing tests with the typed AST output in the project repository that you will clone.

A complete example

int x;

main {
  int x;
  int y;
  y = 2;
  x = y;
  return 0;
}
+-prog
|  +-decllist
|  |  +-decl
|  |  |  +-primitivetype
|  |  |  |  +-typename_int
|  |  |  +-x
|  |  +-decllist(empty)
|  +-funclist(empty)
|  +-main
|  |  +-decllist
|  |  |  +-decl
|  |  |  |  +-primitivetype
|  |  |  |  |  +-typename_int
|  |  |  |  +-x
|  |  |  +-decllist
|  |  |  |  +-decl
|  |  |  |  |  +-primitivetype
|  |  |  |  |  |  +-typename_int
|  |  |  |  |  +-y
|  |  |  |  +-decllist(empty)
|  |  +-stmtlist
|  |  |  +-assignstmt
|  |  |  |  +-identexpr
|  |  |  |  |  +-y
|  |  |  |  |  +-primitivetype
|  |  |  |  |  |  +-typename_int
|  |  |  |  +-intexpr
|  |  |  |  |  +-2
|  |  |  |  |  +-primitivetype
|  |  |  |  |  |  +-typename_int
|  |  |  +-stmtlist
|  |  |  |  +-assignstmt
|  |  |  |  |  +-identexpr
|  |  |  |  |  |  +-x
|  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  +-typename_int
|  |  |  |  |  +-identexpr
|  |  |  |  |  |  +-y
|  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  +-typename_int
|  |  |  |  +-stmtlist(empty)
|  |  +-intexpr
|  |  |  +-0
|  |  |  +-primitivetype
|  |  |  |  +-typename_int

Tips

Need to keep open three files to reference simultaneously while writing the type checker

  1. The AST reference, in order to get the relevant language construct elements
  2. The Type AST reference, in order to pull out the relevant type information
  3. The type rules described in this document

    Keep this three references available to you while you implement the type checker

Every check_XXXexpr function should set expr->type

Types are represented using the AST structure for the syntax of types

If you need to create a type, e.g., for constants, use the AST create method corresponding to that type.

Author: Paul Gazzillo

Created: 2024-04-04 Thu 10:32

Validate