I. Overview ------------------------------------------ SYSTEMS SOFTWARE Helping people run programs Human -----> [ Translator ] Readable / | Input / | (Textual) / | / v v Object Library Code (Binary) (Binary) \ | \ | \ v \->[ Linker/ Loader ] | | v Process | | v [ OS + Computer ] | | v Computation ------------------------------------------ How would a debugger fit into this picture? What job or jobs does a translator have? A. Compiler vs. Interpreter ------------------------------------------ RECALL DEFINITIONS def: An *assembler* translates a low-level language that is close to machine code into machine code. def: A *compiler* translates a (high-level) language into a form that can be (more easily) executed by a computer. def: An *interpreter* executes a programming language directly, often without translating it into object code first ------------------------------------------ What are some examples of languages with interpreters? ------------------------------------------ COMPILER PICTURE Source -----> [ Lexical Analyzer ] | v Token stream | v [ Parser ] | v Parse Tree (AST) + Symbol Table | v [ Static Analysis ] | v Parse Tree (AST) + Symbol Table | v [ Code Generator ] | v Object Code | v [ Linker/ Loader ] | v Process | v [ OS + Computer ] ------------------------------------------ ------------------------------------------ INTERPRETER PICTURE Source -----> [ Lexical Analyzer ] | v Token stream | v [ Parser ] | v Parse Tree (AST) + Symbol Table | v [ Interpreter ] | v [ OS + Computer ] ------------------------------------------ 1. advantages and disadvantages ------------------------------------------ ADVANTAGES AND DISADVANTAGES Compiler Advantages: Disadvantages: Interpreters Advantages: Disadvantages: ------------------------------------------ 2. hybrids of compilers and interpreters ------------------------------------------ HYBRID COMPILER/INTERPRETER PICTURE Source -----> [ Lexical Analyzer ] | v Token stream | v [ Parser ] | v Parse Tree (AST) + Symbol Table | v [ Static Analysis ] | v Parse Tree (AST) + Symbol Table | v [ Code Generator ] | v VM Code | v [ VM + JIT Compiler ] | v [ OS + Computer ] ------------------------------------------ ------------------------------------------ HYBRID ADVANTAGES AND DISADVANTAGES Advantages: Disadvantages: ------------------------------------------ B. compiler structure ------------------------------------------ STANDARD COMPILER ARCHITECTURE Source code (text) | v [ Lexer ] | v Stream of tokens | v [ Parser ] | v AST + Symbol Table | / | v v | [ Static Analyzer ] / | / v / Intermediate Rep. / | / v v [ Code Generator ] | v Instruction Rep. | v [ Optimizer ] | v Machine Code ------------------------------------------ 1. tokens a. definitions ------------------------------------------ TOKENS Represent distinct symbols in the input including punctuation and operators Typically, comments are ignored Reserved words: Keywords: White space delimits identifiers/ aNumber vs. a Number ------------------------------------------ b. data structures i. token types ------------------------------------------ TOKEN TYPES Each type of token has a code: typedef enum { /* ... */ identsym, numbersym, plussym, minussym, multsym, slashsym, oddsym, eqsym, /* ... */ lesssym, leqsym, semicolonsym, becomessym, /* ... */ lparensym, rparensym, /* ... */ ifsym, /* ... */ elsesym } token_type; Examples: Input Token Type ident identsym 34 numbersym + plussym < lesssym <= leqsym := becomessym else elsesym if 34 < 34 then x := 12 ------------------------------------------ What would be the token types for the input if 34 < 34 then x := 12 ? How are reserved words represented? ii. tokens ------------------------------------------ TOKEN DATA STRUCTURE If the lexer returns tokens as objects, use something like: typedef struct { token_type ttype; const char *text; const char *filename; unsigned int line; unsigned int start_col; unsigned int end_col; } token; or can use global variables like: token_type ttype; char *text; unsigned int line; unsigned int column; ------------------------------------------ What would be the advantages and disadvantages of each architecture for communicating tokens? 2. symbols and symbol table What part of a compiler should populate the symbol table? Why should the that be the tool? a. symbols ------------------------------------------ SYMBOLS What information should remembered for identifiers? ------------------------------------------ How should we track lexical levels? What would be a suitable data structure for a symbol? b. symbol table ------------------------------------------ SYMBOL TABLE A *symbol table* Does each scope have its own symbol table? What operations would a symbol table need? What data structure would be good? ------------------------------------------