ASSEMBLY LANGUAGE AND ASSEMBLERS

def: *Assembly language* is a language that
     directly manipulates a (virtual) machine's state
     and in which each statement corresponds to one (1) machine instruction.

def: An *assembler* translates from assembly language to machine code.

ASSEMBLERS VS. COMPILERS

Statements correspond to:
   In assembly language:
      one machine instruction
   In a higher-level programming language:
      several machine instructions

Expressions:
   In assembly language:
      must be explicitly programmed with many machine instructions
      and explicit use of temporary storage
   In a higher-level programming language:
      are implicitly computed with many machine instructions
      but with implicit use of temporary storage

Names are abstractions of:
   In assembly language:
      locations (of data) and addresses (of program code)
   In a higher-level programming language:
      variables or constants or computations (procedures, functions)

GOALS OF ASSEMBLERS

 - Relieve tedium of machine code by:
      translating names to locations
      translating mnemonics to opcodes
      translating decimal to binary

 - Help communication between people with:
      comments
      symbolic names (labels)
      translating decimal (or hex, octal) to binary

When writing assembly language, the programmer still needs to know
how the machine works

BASIC PROBLEM FOR ASSEMBLERS

        ...
        JMP ahead    ; forward reference
        ...
ahead:               ; label

How can the assembler know the address of the label "ahead"?
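The forward-reference question above can be explored with a small sketch in Python. It is not from the notes: the toy instruction set (JMP, NOP, HALT), the opcode numbers, the assumption that every instruction occupies one word, and the function name `assemble` are all hypothetical. The sketch takes the common approach of reading the program twice, first to record the address of each label and then to emit code, so a jump to a label defined later can still be resolved.

```python
# Hypothetical toy instruction set: every instruction occupies one word.
OPCODES = {"NOP": 0, "JMP": 1, "HALT": 2}


def strip_comment(line):
    """Remove a ';' comment and surrounding whitespace."""
    return line.split(";", 1)[0].strip()


def assemble(source):
    """Return (machine_code, symbol_table) for a toy assembly program."""
    lines = [strip_comment(raw) for raw in source.splitlines()]
    lines = [line for line in lines if line]

    # Pass 1: count instructions and record the address of each label.
    symbols = {}
    address = 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 1

    # Pass 2: check that labels are defined and generate machine code.
    code = []
    for line in lines:
        if line.endswith(":"):
            continue
        parts = line.split()
        operand = 0
        if len(parts) > 1:
            label = parts[1]
            if label not in symbols:
                raise ValueError(f"undefined label: {label}")
            operand = symbols[label]
        code.append((OPCODES[parts[0]], operand))
    return code, symbols


program = """
        JMP ahead   ; forward reference
        NOP
ahead:              ; label
        HALT
"""
code, symbols = assemble(program)
print(symbols)
print(code)
```

In this sketch the jump is emitted only after the whole program has been scanned once, so the address of "ahead" is already in the symbol table when it is needed.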
TWO PASS DESIGN OF ASSEMBLER

Pass 1:
   count instructions
   determine address of each label

Pass 2:
   check that all labels are defined
   generate machine code

RELATIVE JUMPS

Additional task for assembler:

   If computer architecture has jumps to (absolute) addresses
   Then translate jumps relative to PC into jumps to absolute addresses

SECTIONS OF EXECUTABLE FILES

(header: info about each section)

Text Section: executable instructions (binary format)

Data Section: data (e.g., for constants) in binary format

Relocation Section: identifies locations needing adjustment
   when the program is moved in memory

(debugging sections:
   symbol table section: global labels
   debugging section: file + line info)

EXECUTABLE ELF FILE LAYOUT

|------------------------|
[ General info           ]  (header
[ Program name           ]   section)
[ Start address of text  ]
[ Length of text section ]
[ Start address of data  ]
[ Length of data section ]
[ Start address of reloc.]
[ Length of reloc. sect. ]
|------------------------|
|                        |
|      Text Section      |
|                        |
|------------------------|
|                        |
|      Data Section      |
|                        |
|------------------------|
|                        |
|   Relocation Section   |
|                        |
|------------------------|

RELOCATION OF EXECUTABLE FILES

Assemblers generate code assuming the starting address is 0

Relocation data identifies parts of instructions
that need an offset added
if the starting address is changed to 0 + offset

WHAT A LINKER DOES

Combines object files, resolving symbolic names

For example, a program (P) and a library (L):

  Object Code P                    Executable
  [ header    ]                    [ header    ]
  [ text P    ]                    [ text P    ]
  [ data P    ] \                  [ text L    ]
  [ sym tab P ]  \                 [ data P    ]
                  > Linker ->      [ data L    ]
  Object Code L  >                 [ sym tab P ]
  [ header    ]  /                 [ sym tab L ]
  [ text L    ] /
  [ data L    ]
  [ sym tab L ]

KINDS OF LINKING

def: *static linking* is linking that happens
     before runtime, once and for all time

def: *dynamic linking* is linking that happens
     during runtime or at the beginning of runtime;
     the library code is typically shared

Advantages:
  + static linking:
      faster at runtime
      code is not shared among processes
      may prevent
        security problems of changing code
  + dynamic linking:
      security updates immediately reflected for all
        (so faster updates)
      less memory is used for code

WHAT A LOADER DOES

A loader places a program (its instructions and data)
into memory so it can be run

Types of loaders:

 - Absolute loader:
      puts a program's text (code) into a specified memory address

 - Bootstrap loader:
      loads (a loader to load) the operating system itself

 - Relocating loader:
      puts a program in memory anywhere (that there is space)
      uses information in the relocation section
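
How a relocating loader might use the relocation section can be sketched in Python. This is an illustration under simplifying assumptions, not a real ELF loader: the text section is modeled as a list of (opcode, address) pairs assembled as if the program started at address 0, the relocation section as a list of indexes of instructions whose address field needs adjusting, and the name `relocate` is hypothetical.

```python
def relocate(text, relocations, offset):
    """Return a copy of text adjusted for loading at address 0 + offset.

    text        -- list of (opcode, address_field) pairs, assembled as if
                   the program started at address 0 (assumed format)
    relocations -- indexes of instructions whose address field must change
    offset      -- the actual starting address chosen by the loader
    """
    loaded = list(text)  # leave the original text section untouched
    for index in relocations:
        opcode, address = loaded[index]
        loaded[index] = (opcode, address + offset)
    return loaded


# Toy program: instruction 0 jumps to address 2; the others hold no address.
text = [(1, 2), (0, 0), (2, 0)]
relocations = [0]

print(relocate(text, relocations, 0x1000))
```

Only the entry named in the relocation data changes. Operands that are not addresses are left alone, which is why the relocation section has to say which parts of which instructions need the offset.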