SYSTEMS SOFTWARE Helping people run programs Human -----> [ Translator ] Readable / | Input / | (Textual) / | / v v Object Library Code (Binary) (Binary) \ | \ | \ v \->[ Linker/ Loader ] | | v Process | | v [ OS + Computer ] | | v Computation RECALL DEFINITIONS def: An *assembler* translates a low-level language that is close to machine code into machine code. def: A *compiler* translates a (high-level) language into a form that can be (more easily) executed by a computer. def: An *interpreter* executes a programming language directly, often without translating it into object code first COMPILER PICTURE Source -----> [ Lexical Analyzer ] | v Token stream | v [ Parser ] | v Parse Tree (AST) + Symbol Table | v [ Static Analysis ] | v Parse Tree (AST) + Symbol Table | v [ Code Generator ] | v Object Code | v [ Linker/ Loader ] | v Process | v [ OS + Computer ] INTERPRETER PICTURE Source -----> [ Lexical Analyzer ] | v Token stream | v [ Parser ] | v Parse Tree (AST) + Symbol Table | v [ Interpreter ] | v [ OS + Computer ] ADVANTAGES AND DISADVANTAGES Compiler Advantages: Disadvantages: Interpreters Advantages: Disadvantages: HYBRID COMPILER/INTERPRETER PICTURE Source -----> [ Lexical Analyzer ] | v Token stream | v [ Parser ] | v Parse Tree (AST) + Symbol Table | v [ Static Analysis ] | v Parse Tree (AST) + Symbol Table | v [ Code Generator ] | v VM Code | v [ VM + JIT Compiler ] | v [ OS + Computer ] HYBRID ADVANTAGES AND DISADVANTAGES Advantages: Disadvantages: STANDARD COMPILER ARCHITECTURE Source code (text) | v [ Lexer ] | v Stream of tokens | v [ Parser ] | v AST + Symbol Table | / | v v | [ Static Analyzer ] / | / v / Intermediate Rep. / | / v v [ Code Generator ] | v Instruction Rep. | v [ Optimizer ] | v Machine Code TOKENS Represent distinct symbols in the input including punctuation and operators Typically, comments are ignored Reserved words: Keywords: White space delimits identifiers/ aNumber vs. a Number TOKEN TYPES Each type of token has a code: typedef enum { /* ... */ identsym, numbersym, plussym, minussym, multsym, slashsym, oddsym, eqsym, /* ... */ lesssym, leqsym, semicolonsym, becomessym, /* ... */ lparensym, rparensym, /* ... */ ifsym, /* ... */ elsesym } token_type; Examples: Input Token Type ident identsym 34 numbersym + plussym < lesssym <= leqsym := becomessym else elsesym if 34 < 34 then x := 12 TOKEN DATA STRUCTURE If the lexer returns tokens as objects, use something like: typedef struct { token_type ttype; const char *text; const char *filename; unsigned int line; unsigned int start_col; unsigned int end_col; } token; or can use global variables like: token_type ttype; char *text; unsigned int line; unsigned int column; SYMBOLS What information should remembered for identifiers? SYMBOL TABLE A *symbol table* Does each scope have its own symbol table? What operations would a symbol table need? What data structure would be good?