I. Overview
------------------------------------------
            SYSTEMS SOFTWARE

       Helping people run programs

 Human -----> [ Translator ]
 Readable        /    |
 Input          /     |
 (Prog. text)  /      |
              /       |
             v        v
          Object   Library
          Code
          (Binary) (Binary)
             \        |
              \       |
               \      v
                \->[ Linker/
                     Loader ]
                      |
                      |
                      v
                   Process
                      |
                      |
                      v
              [ OS + Computer ]
                      |
                      |
                      v
                 Computation
------------------------------------------
        How would a debugger fit into this picture?
        What job or jobs does a translator have?
        What jobs would an IDE (like Visual Studio) do?
   A. Compiler vs. Interpreter
------------------------------------------
            RECALL DEFINITIONS

def: An *assembler* translates a low-level
     language that is close to machine code
     into machine code.

def: A *compiler* translates a (high-level)
     language into a form that can be
     (more easily) executed by a computer.

def: An *interpreter* executes a programming
     language directly, often without
     translating it into object code first.
------------------------------------------
        Which of these is most like what a VM does?
        Does the VM you are writing do loading?
        What are some examples of languages with interpreters?
------------------------------------------
            COMPILER PICTURE

Source -----> [ Lexical Analyzer ]
                       |
                       v
                  Token stream
                       |
                       v
                   [ Parser ]
                       |
                       v
        Parse Tree (AST) + Symbol Table
                       |
                       v
              [ Static Analysis ]
                       |
                       v
        Parse Tree (AST) + Symbol Table
                       |
                       v
               [ Code Generator ]
                       |
                       v
                  Object Code
                       |
                       v
               [ Linker/Loader ]
                       |
                       v
                    Process
                       |
                       v
               [ OS + Computer ]
------------------------------------------
------------------------------------------
           INTERPRETER PICTURE

Source -----> [ Lexical Analyzer ]
                       |
                       v
                  Token stream
                       |
                       v
                   [ Parser ]
                       |
                       v
        Parse Tree (AST) + Symbol Table
                       |
                       v
                [ Interpreter ]
                       |
                       v
               [ OS + Computer ]
------------------------------------------
     1. advantages and disadvantages
------------------------------------------
       ADVANTAGES AND DISADVANTAGES

Compilers
  Advantages:

  Disadvantages:

Interpreters
  Advantages:

  Disadvantages:
------------------------------------------
     2. hybrids of compilers and interpreters
------------------------------------------
   HYBRID COMPILER/INTERPRETER PICTURE

Source -----> [ Lexical Analyzer ]
                       |
                       v
                  Token stream
                       |
                       v
                   [ Parser ]
                       |
                       v
        Parse Tree (AST) + Symbol Table
                       |
                       v
              [ Static Analysis ]
                       |
                       v
        Parse Tree (AST) + Symbol Table
                       |
                       v
               [ Code Generator ]
                       |
                       v
                    VM Code
                       |
                       v
    /------->[ VM + JIT Compiler ]
    |statistics        |
    |                  v
    \--------[ OS + Computer ]
------------------------------------------
------------------------------------------
   HYBRID ADVANTAGES AND DISADVANTAGES

Advantages:

Disadvantages:
------------------------------------------
   B. compiler structure
------------------------------------------
      STANDARD COMPILER ARCHITECTURE
      (Based on Appel's book
       "Modern Compiler Implementation")

      Source code (text)
              |
              v
          [ Lexer ]
              |
              v
       Stream of tokens
              |
              v
         [ Parser ]
              |
              v
             AST -------------\
              |                \
              v                 v
     [ Static Analyzer ] <- Symbol Table
              |                /
              v               /
      Intermediate Rep.      /
              |             /
       [ HL Optimizer ]    /
              |           /
              v          /
      Intermediate Rep. /
              |        /
              v       v
      [ Code Generator ]
              |
              v
       Instruction Rep.
              |
              v
       [ LL Optimizer ]
              |
              v
         Machine Code
------------------------------------------
     1. tokens
       a. definitions
------------------------------------------
                 TOKENS

Represent distinct symbols in the input
    including punctuation and operators

Typically, comments are ignored

Reserved words:

Keywords:

White space delimits identifiers/

      aNumber   vs.   a Number
------------------------------------------
       b. data structures
         i. token types
------------------------------------------
                 TOKENS

Defined in *.tab.h file produced by bison

Example from the SSM assembler (asm.tab.h)

#ifndef YYTOKENTYPE
# define YYTOKENTYPE
  enum yytokentype
  {
    YYEMPTY = -2,
    YYEOF = 0,               /* "end of file"  */
    YYerror = 256,           /* error  */
    YYUNDEF = 257,           /* "invalid token"  */
    eolsym = 258,            /* eolsym  */
    identsym = 259,          /* identsym  */
    unsignednumsym = 260,
    plussym = 261,           /* "+"  */
    minussym = 262,          /* "-"  */
    commasym = 263,          /* ","  */
    dottextsym = 264,        /* ".text"  */
    dotdatasym = 265,        /* ".data"  */
    dotstacksym = 266,       /* ".stack"  */
    dotendsym = 267,         /* ".end"  */
    colonsym = 268,          /* ":"  */
    lbracketsym = 269,       /* "["  */
    rbracketsym = 270,       /* "]"  */
    equalsym = 271,          /* "="  */
    noopsym = 272,           /* "NOP"  */
    addopsym = 273,          /* "ADD"  */
    subopsym = 274,          /* "SUB"  */
    /* ... */
    wordsym = 318,           /* "WORD"  */
    charsym = 319,           /* "CHAR"  */
    stringsym = 320,         /* "STRING"  */
    charliteralsym = 321,
    stringliteralsym = 322
  };
  typedef enum yytokentype yytoken_kind_t;
#endif

Examples:

   Input       yytokentype
   =======================================
   ident       identsym
   34          unsignednumsym
   +           plussym
------------------------------------------
        What would be the token types for the input
             WORD x = +24
                wordsym identsym equalsym plussym unsignednumsym
        How are reserved words represented?
     2. symbols and symbol table
        What part of a compiler should populate the symbol table?
        Why should that be the tool?
       a. symbols
------------------------------------------
                SYMBOLS

What information should be remembered
for identifiers?
------------------------------------------
        What would be a suitable data structure for a symbol?
       b. symbol table
------------------------------------------
              SYMBOL TABLE

A *symbol table*

Does each scope have its own symbol table?

What operations would a symbol table need?

What data structure would be good?
------------------------------------------
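        As one possible starting point for the questions above, here is
        a minimal sketch in C of a symbol record and a single-scope
        symbol table with the two most basic operations: declaring a
        name and looking one up. Everything in it is an assumption made
        for illustration. The names (sym_kind, symbol, symtab,
        symtab_init, symtab_declare, symtab_lookup), the attributes
        recorded, and the fixed-capacity array are hypothetical and are
        not taken from the SSM assembler. Linear search is used only to
        keep the sketch short; a hash table, or one table per scope, are
        alternatives worth comparing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity, chosen only to keep the sketch simple. */
#define MAX_SYMBOLS 64

/* Hypothetical kinds of names a translator might track. */
typedef enum { SYM_LABEL, SYM_DATA } sym_kind;

/* One symbol: the attributes remembered for an identifier.
   (Which attributes are needed depends on the language.) */
typedef struct {
    const char *name;      /* spelling of the identifier (not copied) */
    sym_kind kind;         /* what sort of thing the name denotes */
    unsigned int address;  /* where it lives, e.g., an offset */
    unsigned int line;     /* line of the declaration, for messages */
} symbol;

/* A single-scope symbol table: an array searched linearly. */
typedef struct {
    symbol entries[MAX_SYMBOLS];
    size_t count;
} symtab;

/* Make the table empty. */
void symtab_init(symtab *st)
{
    assert(st != NULL);
    st->count = 0;
}

/* Return the symbol declared with the given name,
   or NULL if there is none. */
const symbol *symtab_lookup(const symtab *st, const char *name)
{
    assert(st != NULL && name != NULL);
    for (size_t i = 0; i < st->count; i++) {
        if (strcmp(st->entries[i].name, name) == 0) {
            return &st->entries[i];
        }
    }
    return NULL;
}

/* Add a declaration. Returns false, leaving the table unchanged,
   if the name is already declared or the table is full. */
bool symtab_declare(symtab *st, const char *name, sym_kind kind,
                    unsigned int address, unsigned int line)
{
    assert(st != NULL && name != NULL);
    if (st->count >= MAX_SYMBOLS || symtab_lookup(st, name) != NULL) {
        return false;
    }
    st->entries[st->count++] = (symbol){
        .name = name, .kind = kind, .address = address, .line = line
    };
    return true;
}
```

        With this sketch, declaring "x" once succeeds, declaring it a
        second time is rejected (which is how a duplicate declaration
        could be detected), and looking up an undeclared name returns
        NULL.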