Compilers - Principles, Techniques and Tools

Intro

This is a synopsis of the book "Compilers - Principles, Techniques and Tools". This summary is intended for me to look up some details at a later time. It is more or less in the order as it's written in the book. The text consists mostly of lists and keywords, but very few real sentences. After all, simplification is the real reason behind a synopsis.

By the way, this document is extremely incomplete.

Compilation

Split into two parts:

  1. Analysis: creates intermediate representation
  2. Synthesis: uses the intermediate representation to construct the target program.

Analysis

Lexical analysis

Also called "Linear analysis". Reads from left to right grouping the input into tokens.

Syntax analysis

"Hierarchical analysis".

Semantic analysis

Phases

  1. Lexical analyzer creates symbol table
  2. Syntax analyzer creates parse tree
  3. Semantic analyzer performs checks and is able to solve very simple `errors'
  4. Intermediate code generator generates easy-to-produce and easy-to-translate code
  5. Code optimizer improves the intermediate code.
  6. Code generator

Context-free grammar

Example of a production: stmt -> if ( expr ) stmt else stmt

stmt, expr
sequences of tokens, nonterminals
->
"can have the form"
if, else
lexical element
(, )
token

The left side of a production contains exactly one nonterminal and the right side a sequence of tokens and/or nonterminals. One of the nonterminals in a set of productions is the start symbol.

Parse tree

Ambiguity

If a expression can result in different parse trees, the grammar is ambiguous. The grammar should be edited, this isn't always easy, though.

Associativity of operators

When an operand has the same operator on the left and right side, we need a rule to decide which operator gets the operand.

Precedence

Syntax-directed translation

Syntax-directed definition

For each production rule one semantic rule is defined. The semantic rule defines how the attribute values of a node change. The following shows a syntax-directed definition for an imaginary language defining robot moves.

productions

seq -> seq instr | begin instr -> east | north | west | south

syntax-directed definition

production semantic rule
seq -> begin seq.x = 0
seq.y = 0
seq -> seq1 instr seq.x = seq1.x + instr.dx
seq1.y + instr.dy
seq -> east instr.dx = 1
instr.dy = 0
seq -> north instr.dx = 0
instr.dy = 1
seq -> west instr.dx = -1
instr.dy = 0
seq -> south instr.dx = 0
instr.dy = -1

depth-first traversal

translation-scheme

Parsing

predictive parsing

Top-down method. A set of recursive procedures to process input. Each nonterminal has one procedure. The to-be-called procedure is determined using the lookahead symbol. So it implicitly defines the parse-tree.

The production procedure with the right side α is called, when the lookahead symbol is contained in FIRST(α).
FIRST(α) returns a set with all tokens which could appear as the first token of the given production rule.

When a nonterminal is encountered, the next token is read.

BNF (Backus-Naur Form)
context-free grammar
context-free grammar
used to specify syntax
parsing
Finding a parse tree for a given string
Semantic
What the language means, e.g. that x1+x2 is an addition.
Syntax
What the language looks like