COP 3402 meeting -*- Outline -*-

* Review for the final exam

  Please ask questions,
  especially about any concepts you are unsure of, as we go...

** integrative question

------------------------------------------
         INTEGRATIVE QUESTION

How would you organize the code for an assembler?
------------------------------------------
        ... 1. tokenize using a lexical analyzer
            2. parse, building ASTs
            3. build a symbol table for identifiers
               (labels, names of data)
               attributes: type, address
            4. translate statements into instructions,
               also mnemonics into op codes
               and literals (numbers) into binary

** Terminology

*** Compilers

**** Semantic analysis and symbol tables

------------------------------------------
          COMPILER CONCEPTS

What is a scope?

What syntactic features start scopes in C?

What starts a scope in PL/0?

How is a symbol table used?

Is there just one symbol table?
------------------------------------------
        ... an area of program text where declarations are effective
            and in which duplicate declarations are prohibited
        ... files, functions, and { and } delimit scopes in C
        ... procedure blocks are scopes in PL/0.
        ... to check that identifiers used are declared
            (by looking up the identifiers used),
            to check that each name declared is unique in its scope
            (however, for some languages, there are multiple
             declarations of the same name allowed for different uses,
             e.g., classes and methods in Java; how to handle that?)
        ... yes, at any moment there is just one symbol table,
            but it changes, especially when scopes are entered or left

**** Code generation

------------------------------------------
           CODE GENERATION

What did we use for an IR in HW4?

What information from a name's use is needed
to generate code in PL/0?

In PL/0, why not have the parser determine
the lexical address of each variable?
------------------------------------------
        ... enhanced ASTs
            how were they enhanced?
               With ASTs for identifier uses,
               and those ASTs contained id_uses,
               and those contained the used identifier's attributes...
            Why were attributes needed in the ASTs?
               so that there was no need to recreate the symbol table
               or to look up a name more than once.
        ... its offset from the start of the AR is needed
            to load/store from/to its location in an AR.
            In a more interesting language the type information
            would be needed (to generate the right opcodes for arithmetic,
            for example, or to understand what kind of register or space
            needed to be reserved for it.)
        ... because knowing the number of static links to follow
            requires knowing how deeply the AR would be nested;
            this is a static property, but one that the declaration checker
            is better suited to determining, as the parser does not need
            to track different scopes for any other reason.

***** code generation for statements

------------------------------------------
     CODE GENERATION FOR STATEMENTS

How is the code generator written in C?

What does the C code look like for generating code
for PL/0 statements?

What does the C code look like for generating code
for a PL/0 while statement?
------------------------------------------
        ... following the grammar of the abstract syntax
        ... it's an alternative in the abstract syntax, so the code does
            a switch based on the AST's type tag to determine
            which function to call for each type of statement
        ... it lays out the generated code as:
               [code to evaluate the condition]
               JPC 2
               JMP [past loop body]
               [code for the loop body]
               JMP [backwards to the start of the condition]

            How is the start of the condition determined?
               It's computed from the size of the condition's code
               plus the size of the loop body
               (plus the two jump instructions between them)
            How is the code generated for a while loop's condition?
               by calling _gen_code_cond with the condition's AST as arg
            How is the code generated for a while loop's body?
               by calling gen_code_stmt with the body's AST as an argument
            What happens if the body is another while loop?
               that loop's code goes in the body, no problem
            Why does the recursion terminate?
               because each recursive call passes a smaller AST

------------------------------------------
   GENERATING CODE FOR IF STATEMENTS

In C there are 2 kinds of if-statements with syntax

     if (Exp) Stmt
and
     if (Exp) Stmt1 else Stmt2

How would these be compiled?

What would the generated code look like for
a C switch statement?
------------------------------------------
        ... There would be 2 different kinds of ASTs for these,
            and the compilation would follow the grammar
        ... - evaluate the expression to get a number
              (order the numbers and/or compute an index from the number)
            - test if the number is in range
              (if not, jump to the default code)
            - use the number to load an offset address from a table,
              and jump to the address given by the offset

------------------------------------------
               LABELS

Would labels (as in HW4) be useful if the VM's ISA
required absolute addresses for jumps?

How would that work?
------------------------------------------
        ... Yes!
        ... For forward jumps, make a second pass to fill in
            the address from the label

***** code generation for nested scopes

------------------------------------------
         ACTIVATION RECORDS

What are ARs used for on the runtime stack?

Why not have all calls of a procedure share
the same storage?

In C, how many levels of static scopes can
there be in a program?

In C, what causes that nesting of scopes?
------------------------------------------
        ... they store local variables;
            they give each recursive call its own local variables
        ... because then recursion would not work
            (each call needs its own return address, at least)
        ... there can be arbitrarily many levels of scopes,
            as there is no limit on how deeply blocks can nest
            (but there is only one level of functions,
             and the compiler may move all locals out to the function level,
             so that at most one static link needs to be followed,
             so as an optimization, that can be stored in a global (register))
        ... blocks ({ ...
            }) with local variable declarations

------------------------------------------
          PROCEDURE CALLS

What does the CAL instruction do?

Does the code for a procedure body need to
reserve space for the links?

In C, when the function main returns,
what is it returning to?
------------------------------------------
        ... It pushes the static link (the BP value) on the stack,
            then it pushes the old value of BP on the stack
            (the dynamic link),
            then it pushes the return address (old PC) on the stack
        ... No, the CAL instruction does that
        ... the OS (which called it),
            and which uses the value as the exit code

------------------------------------------
     PROCEDURE CALLS AND RETURNS

How is the starting address of a procedure obtained
(for use in a CAL instruction)?

Why does the stack need to be trimmed before
a procedure returns?

With static scoping, what does the BP register
point to on the stack?
------------------------------------------
        ... when the starting address is determined
            (after the procedure's code is generated
             and when it is registered),
            the label for that procedure is set with the starting address;
            that label is shared by all calls to the procedure,
            and those that need the address have it filled in
            during another pass
        ... so that the RTN instruction can find the return address
            and old BP value in the top two slots on the stack
        ... the location (at the bottom of the AR)
            where the procedure's static link is stored.

------------------------------------------
       SCOPING AND ADDRESSING

If some variables were dynamically scoped,
how would they be addressed?
------------------------------------------
        ... Use the old BP as the dynamic link,
            and look back through the dynamic links,
            but the compiled code would need to keep variable names

***** generating code for declarations

------------------------------------------
   GENERATING CODE FOR DECLARATIONS

How would the C declaration

     const double e = 2.718281828459;

be compiled for a stack machine?

How would the C declaration

     int i;

be compiled for a stack machine?

How would the C declaration

     void incI() { i += 1; }

be compiled?
------------------------------------------
        ... space would be reserved for a double (2 words)
            and initialized with the given value
            (use something like a LIT instruction)
        ... space would be allocated on the stack, not initialized
        ... the body statement would have code generated
            and a return instruction added to the end,
            then the code generated would be put somewhere for later use

***** generating code for expressions

------------------------------------------
   CODE GENERATION FOR EXPRESSIONS

Where does the compiled code put the result
of an expression (in a stack machine)?

Why are identifier uses important?
------------------------------------------
        ... on top of the stack!
        ... because they are a base case for generating code
            and that is where the identifier's attributes are stored

------------------------------------------
  CODE GENERATION FOR EXPRESSIONS 2

How is a constant's lexical address used to
load the constant's value?

How is a variable's lexical address used in
an assignment statement?
------------------------------------------
        ... If the constant is at lexical address (levels, offset),
            then the generated code follows levels static links
            and loads (with the LOD instruction)
            using the offset from the start of that AR's constant area
            (past the 3 links)
        ... If the variable is at lexical address (levels, offset),
            then the generated code follows levels static links,
            putting the frame pointer for the variable's AR
            on top of the stack,
            then, after evaluating the expression being assigned
            and putting it on top of the stack,
            the store instruction (STO) stores the expression's value
            at the offset (plus the 3 links) past the frame pointer's value

** Assembly Language

------------------------------------------
          ASSEMBLY LANGUAGE

What is an assembly language?

What features does assembly language provide
to help programmers?

How does assembly language differ from C?
------------------------------------------
        ... a language in which each statement corresponds to
            one machine instruction,
            and which is specialized for a particular machine ISA
        ... it provides:
            - mnemonics for instructions (and registers)
            - symbolic names for addresses
            - translation of literals into binary (e.g., chars, floats)
            - (macros for abstracting from sequences of instructions)
        ... assembly language has:
            - no structure
            - no type checking
            - explicit control over temporaries and registers
            - direct encoding of control structures (loops, etc.)
            - direct control over storage layout
            C provides:
            - control structures
            - expression evaluation that manages temporaries, registers
            - code optimizations that are automatically applied
            - type checking and warnings about mistakes
            - function abstractions
            - macros

------------------------------------------
             ASSEMBLERS

How does an assembler translate forward jumps
to a label?

How would you organize an assembler as a program?
------------------------------------------
        ... it uses two passes,
            one to find the address,
            another to translate the instructions
        ... in several passes:
            1. lexer to tokenize and ignore comments
            2. parse and form ASTs for statements
            3. build symbol table to include labels and locations
               (counting to find addresses)
            4. translation and binary output

------------------------------------------
       ASSEMBLER OUTPUT FORMAT

What would be the output format for an assembler
on Unix?

What are the main sections of an object file?

What addresses would need to be relocated
and would need to be marked as such?
------------------------------------------
        ... ELF
        ... header, text (code), data, relocation, debugging info
        ... labels (jump targets), data names (for globals)

** Linkers

------------------------------------------
               LINKERS

What does a linker do?

How many passes would a linker need?

Does a linker treat user program code differently
than library code?
------------------------------------------
        ... it puts together object files,
            resolving symbolic addresses (of data and subroutines)
        ... two: one to lay out the code (and build a symbol table),
            and another to fill in addresses
        ... Yes, it will statically link user code,
            but may dynamically link library code
            Why?
               to share library code

** Loaders

------------------------------------------
               LOADERS

What does a loader do?

What is a boot loader?

How is the boot loader loaded?

Does a boot loader do relocation?
------------------------------------------
        ... it puts a program into computer memory
            and prepares to run it
            Is that just copying code?
               - in simple cases yes, but if it relocates it
                 (as with ASLR), then it must add an offset
                 to each relocatable address
        ... A loader that puts the OS in memory and starts running it
        ... it's automatic in the hardware (copied from ROM)
        ... no, it's an absolute loader,
            and the OS needs to go in certain addresses anyway
            Why?
               so its memory can be protected

** Operating Systems

------------------------------------------
                 OS

What are the goals of an OS?

What is the basic technique an OS uses
for running programs efficiently?

Would it be better if an OS were just
a collection of libraries that worked with the hardware?
------------------------------------------
        ... make it easy to run programs
            provide a standard interface between programs and the hardware
            share resources on the computer
            also:
            - have minimal overhead
            - protect users and their data
            - be reliable (not crash)
        ... limited direct execution,
            running the program in user mode directly on the CPU,
            but in user mode it can't do certain things directly;
            it must ask the OS to do them, which can check permissions,
        ... No, there would be no security or protection
            from other processes,
            it would be hard to make any guarantees about resource sharing,
            it would be too easy to make mistakes and lose information

*** processes

------------------------------------------
              PROCESSES

What is a process?

How is a process different from a program?

How does a program become a process?

Why is the process concept useful for programmers?
------------------------------------------
        ... a program that is executing (or able to execute)
        ... a process is running (or runnable),
            a program is a description of an execution
        ... by the loader loading it, and the OS running it
        ... it gives the illusion of controlling the entire machine
            it protects the program's execution from interference
            by others (so debugging is easier)

------------------------------------------
         PROCESS TRANSITIONS

What states can a process be in?

What causes transitions between states?
------------------------------------------
        ... ready, running, waiting (for I/O),
            suspended (due to interrupt),
            ended (finished),
            abend (due to error or violation)
        ... events (interrupts in general), the specific ones are:

               event             from          to
               ---------------   -----------   -----------
               create            (new)         [Ready]
               dispatch          [Ready]       [Running]
               timer interrupt   [Running]     [Ready]
               SIO               [Running]     [Waiting]
               I/O interrupt     [Waiting]     [Ready]
               stop              [Running]     [Suspended]
               resume            [Suspended]   [Running]
               EOP               [Running]     [End]
               trap              [Running]     [Abend]

*** resource sharing

------------------------------------------
          RESOURCE SHARING

How does an OS facilitate sharing of resources?

Why are interrupts needed?

What are the different kinds of interrupts?
------------------------------------------
        ... by controlling their use:
            only allowing the OS to execute privileged instructions
            (in kernel mode),
            requiring user processes to make system calls
            to ask for permission to execute privileged instructions
        ... to manage asynchronous communication,
            such as from I/O devices
            to stop processes from violating policies
            (related to security or sharing)
            and thus enable sharing of resources
            (because it's impossible to statically check
             for all violations, and because it's impossible to
             share a CPU if some processes can go into infinite loops)
        ...
            traps (program errors or violations)
            I/O interrupts (when device completes a task)
            timer interrupts (when a timeout occurs)
            system calls
            What does (each) do?
            Why are these needed?

------------------------------------------
        INTERRUPT PROCESSING

What is an interrupt handler?

What mode does an interrupt handler execute in?

What does an interrupt handler do to save
the state of the running process?
------------------------------------------
        ... code that is called (run) in response to
            an interrupt occurring
        ... kernel (supervisory) mode
        ... it saves the state on a kernel stack for that process
            Why one per process?
               so that the state of that process doesn't interfere
               with other processes (that may be suspended),
               and so that the OS can decide which process to run next

------------------------------------------
   TRAP TABLES (INTERRUPT VECTORS)

What is a trap table?

What software controls the trap table?
------------------------------------------
        ... an array of addresses for interrupt handlers;
            the OS will use that to call
            the appropriate interrupt handler
        ... the OS
            Why?
               otherwise a user process could take control

*** limited direct execution

------------------------------------------
      LIMITED DIRECT EXECUTION

What is limited direct execution?

How does it work?

What mode does your program run in?

What kinds of instructions are privileged?
------------------------------------------
        ... running user processes directly on the CPU,
            but not permitting them to execute
            dangerous (privileged) instructions
        ... the hardware needs to support 2 modes:
            user mode and kernel mode
            and the hardware needs to protect some memory
            - privileged instructions can only be executed in kernel mode
            - protected memory can only be written in kernel mode
        ... user mode
        ... I/O instructions
            writing to protected memory
            set up of parameters
            (area of protected memory, trap table, ...)
            (manipulation of the PSW and other resources
             that could be used to do privileged actions)

**** system calls

------------------------------------------
            SYSTEM CALLS

What is a system call?

Why are system calls needed?
------------------------------------------
        ... a call to a library routine that sets some parameters
            for the OS and executes a special (trap) instruction
            (int on x86) that transfers control to
            an OS interrupt handler (in kernel mode)
        ... otherwise either:
            - a user process could not do I/O, or
            - the system's resources would be unprotected
            so, to enable controlled sharing of resources

** threads

------------------------------------------
               THREADS

What is a thread?

Does each thread have its own stack?

Why are threads useful?
------------------------------------------
        ... a unit of parallel execution within a process
        ... yes
        ... they are less expensive than processes
            and allow for parallel execution

*** synchronization

------------------------------------------
           SYNCHRONIZATION

What is a race condition?

Why are race conditions a problem?

What is a critical section?
------------------------------------------
        ... when 2 or more threads can update a shared resource
            in ways that can produce multiple final states
        ... because they make debugging (exponentially) harder
            due to state space explosion
        ... an area of code in which only one thread
            should be executing at a time,
            and other threads are suspended
            (unable to proceed, excluded)

------------------------------------------
       CORRECTNESS FOR THREADS

When is an execution serializable?

When is execution atomic?
------------------------------------------
        ... when it is equivalent to a serial execution,
            where first one thread runs, then another,
            with no parallelism
        ... when no other thread can execute during that execution
            (indivisible would be another word for it)

** skills

Q: Would you be able to write code for traversing an AST and checking declarations (given some helping functions)?
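
   A minimal sketch of what such code might look like in C, under
   stated assumptions: block_t is a made-up, simplified AST for a block
   (its declared names, the names it uses, and its nested blocks), and
   symtab_t, declare, lookup_levels, and check_block are hypothetical
   names for illustration, not the homework's actual helpers.
   The symbol table is a stack of scopes that changes as scopes are
   entered and left, and the traversal follows the structure of the AST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCOPES 16 /* assumed limits, chosen only for this sketch */
#define MAX_NAMES 32

/* Names declared in one scope (e.g., one PL/0 procedure block). */
typedef struct {
    const char *names[MAX_NAMES];
    int count;
} scope_t;

/* The symbol table: a stack of scopes, innermost scope on top. */
typedef struct {
    scope_t scopes[MAX_SCOPES];
    int depth; /* number of scopes currently open */
} symtab_t;

/* Hypothetical simplified AST for a block. */
typedef struct block {
    const char **decls;          /* names declared in this block */
    int ndecls;
    const char **uses;           /* names used in this block's body */
    int nuses;
    const struct block **nested; /* nested (procedure) blocks */
    int nnested;
} block_t;

/* Declare name in the current scope; returns false on a duplicate. */
bool declare(symtab_t *st, const char *name) {
    scope_t *s = &st->scopes[st->depth - 1];
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0) return false;
    assert(s->count < MAX_NAMES);
    s->names[s->count++] = name;
    return true;
}

/* Search from the innermost scope outward.
   Returns the number of levels outward, or -1 if name is undeclared. */
int lookup_levels(const symtab_t *st, const char *name) {
    for (int d = st->depth - 1; d >= 0; d--) {
        const scope_t *s = &st->scopes[d];
        for (int i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0) return (st->depth - 1) - d;
    }
    return -1;
}

/* Traverse a block's AST; returns the number of declaration errors. */
int check_block(symtab_t *st, const block_t *b) {
    int errors = 0;
    assert(st->depth < MAX_SCOPES);
    st->scopes[st->depth++].count = 0;            /* enter the scope */
    for (int i = 0; i < b->ndecls; i++)
        if (!declare(st, b->decls[i])) errors++;  /* duplicate declaration */
    for (int i = 0; i < b->nnested; i++)
        errors += check_block(st, b->nested[i]);  /* recursion follows the AST */
    for (int i = 0; i < b->nuses; i++)
        if (lookup_levels(st, b->uses[i]) < 0) errors++; /* undeclared use */
    st->depth--;                                  /* leave the scope */
    return errors;
}
```

   To use it, start with a symtab_t whose depth is 0 and call
   check_block on the outermost block. In a real checker, the result
   of the lookup (levels outward, plus an offset) would be stored in
   the AST as the identifier use's attributes, so that code generation
   never has to look the name up again.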
Q: Would you be able to write code to traverse an enhanced AST and generate stack machine code (given some helping functions)?
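
   A minimal sketch of the while-statement case in C, under stated
   assumptions: the opcodes, instr_t, code_seq_t, and all function
   names below are hypothetical and simplified, not the course VM's
   ISA or the homework's helper functions. The sketch assumes that
   jump offsets are relative to the address of the jump instruction
   itself and that JPC jumps when the popped value is nonzero (true).
   The small interpreter (run) is there only to check the layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified opcodes; not the course VM's actual ISA. */
typedef enum { LIT, LOD, STO, SUB, JPC, JMP } opcode_t;

typedef struct {
    opcode_t op;
    int arg; /* literal, memory address, or relative jump offset */
} instr_t;

#define MAX_CODE 64 /* assumed limit, chosen only for this sketch */

typedef struct {
    instr_t instrs[MAX_CODE];
    int len;
} code_seq_t;

/* Append one instruction to a code sequence. */
void code_seq_add(code_seq_t *cs, opcode_t op, int arg) {
    assert(cs->len < MAX_CODE);
    cs->instrs[cs->len].op = op;
    cs->instrs[cs->len].arg = arg;
    cs->len++;
}

/* Append all of src to the end of dst. */
void code_seq_concat(code_seq_t *dst, const code_seq_t *src) {
    for (int i = 0; i < src->len; i++)
        code_seq_add(dst, src->instrs[i].op, src->instrs[i].arg);
}

/* Lay out a while loop from already-generated condition and body code. */
code_seq_t gen_code_while(const code_seq_t *cond, const code_seq_t *body) {
    code_seq_t ret = { .len = 0 };
    code_seq_concat(&ret, cond);             /* leaves a truth value on the stack */
    code_seq_add(&ret, JPC, 2);              /* true: skip over the next JMP */
    code_seq_add(&ret, JMP, body->len + 2);  /* false: jump past the loop */
    code_seq_concat(&ret, body);
    /* jump back over the body, the two jumps, and the condition */
    code_seq_add(&ret, JMP, -(cond->len + body->len + 2));
    return ret;
}

/* Tiny interpreter, used only to check the layout; mem holds variables. */
void run(const code_seq_t *cs, int *mem) {
    int stack[32];
    int sp = 0;
    int pc = 0;
    while (pc < cs->len) {
        instr_t in = cs->instrs[pc];
        switch (in.op) {
        case LIT: stack[sp++] = in.arg; pc++; break;
        case LOD: stack[sp++] = mem[in.arg]; pc++; break;
        case STO: mem[in.arg] = stack[--sp]; pc++; break;
        case SUB: sp--; stack[sp - 1] -= stack[sp]; pc++; break;
        case JPC: pc += (stack[--sp] != 0) ? in.arg : 1; break;
        case JMP: pc += in.arg; break;
        }
    }
}
```

   In a full code generator, the two arguments would come from
   recursive calls on the condition's AST and the body's AST, which is
   why a while loop nested in the body needs no special handling: its
   code is just part of the body's code sequence, and all of its jumps
   are relative.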