Important Announcement
PubHTML5 Scheduled Server Maintenance on (GMT) Sunday, June 26th, 2:00 am - 8:00 am.
PubHTML5 site will be inoperative during the times indicated!

Home Explore Design and Implementation of Compiler

Design and Implementation of Compiler

Published by Willington Island, 2021-08-15 04:05:39

Description: This well-organized text provides the design techniques of complier in a simple and straightforward manner. It describes the complete development of various phases of complier with their imitation of C language in order to have an understanding of their application. Primarily designed as a text for undergraduate students of Computer Science and Information Technology and postgraduate students of MCA.

Key Features:

Chapter1 covers all formal languages with their properties.
More illustration on parsing to offer enhanced perspective of parser and also more examples in each segment to improve students skill to grasp the content discussed.
Code optimization refers various special techniques in much detail.
One complier project is also included to provide a most excellent simulation of compiler.
Some more instances in C to provide superior simulation of relevant section.

Search

Read the Text Version

122 Design and Implementation of Compiler num → dig num.val = dig.val dig.base = num.base dig → 0 dig.val = 0 dig → 1 dig.val = 1 dig → 2 dig.val = 2 dig → 3 dig.val = 3 dig → 4 dig.val = 4 dig → 5 dig.val = 5 dig → 6 dig.val = 6 dig → 7 dig.val = 7 dig → 8 dig.val = { if dig.base = 8 then error else 8 } dig → 9 dig.val = { if dig.base = 8 then error else 9 } ( ii ) Draw the syntax tree for octal number 345 and convert it into decimal base mm.val = 229 (mm.val = 28*8 + 5 = 229 num.base = 8) (basechar = octal) octal (num.val = 3*8 + 4 = 28 num.base = 8) (dig.val = 5 dig.base = 8) num.val = 3 num.base = 8 dig.val = 4 dig.base = 8 5 dig.val = 3 mm.base = 8 4 3 4.6 DEPENDENCY GRAPH If an attribute ‘y’ is evaluated after the evolution of ‘z’ then we say that y is dependent on z this dependency can be shown by parse tree with some other relation known as dependency graph. Dependency graph exist in both synthesized and inherit attribute as :

Syntax Analysis and Directed Translation 123 num.val = 45 * 10 + 6 = 456 num.val = 40 * 10 + 5 = 45 dig.val = 6 num.val = 4 dig.val = 5 6 dig.val = 4 5 4 Figure 4.4: Dependency graph for synthesized attribute Def Term , 6 4 identifier Len 3 int 5 7 Len 1 , 8 9 Len 1 , identifier c 10 2 identifier b 1 a Figure 4.5: Dependency graph for inherit attribute

124 Design and Implementation of Compiler 4.6.1 Evaluation Order Any topological sort of any dependency graph gives a valid order in which the semantic rules associated with the nodes in a parse tree can be evaluated. Except topological sort some other mehod are also evaluates semantic rules as: • Topological Sort • Depth First Traversal • Parse Tree Method • Rule base Method • Oblivious Method 4.6.1.1 Topological sort A topological sort of a directed acyclic graph (DAG) is a linear ordering of its nodes. Algorithms for topological sorting have running time linear in the number of nodes plus the number of edges (θ(|V|+|E|)). An alternative algorithm for topological sorting is based on depth first search. Topological sort works by choosing vertices in the same order as the eventual topological sort. First, find a list of “start nodes” which have no incoming edges and insert them into a queue Q. Then, Q ← Set of all nodes with no incoming edges while Q is non-empty do remove a node n from Q output n for each node m with an edge e from n to m do remove edge e from the graph if m has no other incoming edges then insert m into Q if graph has edges then output error message (graph has a cycle) If this algorithm terminates without outputting all the nodes of the graph, it means the graph has at least one cycle and therefore is not a DAG, so a topological sort is impossible. 4.6.1.2 Depth-first traversal A depth-first traversal of a parse tree is one way of evaluating attributes. Note that a syntax-directed definition does not impose any particular order as long as order computes attribute of parent after all its children’s attributes. PROCEDURE visit (n: node) BEGIN FOR each child m of n, from left to right Do visit (m); Evaluate semantic rules at node n END 4.6.1.3 Parse tree method These methods obtain an evolution order from a topological sort of dependency graph constructed from the parse tree for each input. This method will fail if parse tree have a cycle.

Syntax Analysis and Directed Translation 125 4.6.1.4 Rule based method At compiler construction time semantic rules associated with production are analysized. For each production, order in which the attributes with that production are evaluated is predetermined at compiler construction time. 4.6.1.5 Oblivious method Evolution order is chosen without considering any semantic rule. For example – If translation takes place during parsing then order of evolution is forced by the parsing method. 4.7 PARSING Parsing is the process of analyzing a sequence of tokens in order to determine its grammatical structure with respect to a given formal grammar. It is formally named syntax analysis. A parser is a computer program that carries out this task. The term parseable is generally applied to text or data which can be parsed. Parsing transforms input text into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. Generally, parsers operate in two stages, first identifying the meaningful tokens in the input, and then building a parse tree from those tokens. So, most common use of parsers is to parse computer programming languages. * Parsing 1* 1 + 2*3 2 3 Figure 4.6 Overview of process The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic. The first stage is the token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions as described in chapter 3. For example, a calculator program would look at an input such as “12*(3+4)^2” and split it into the tokens 12, *, (, 3, +, 4, ), ^ and 2, each of which is a meaningful symbol in the context of an arithmetic expression. The parser would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like “12*” or “(3” will not be generated. The next stage is syntactic parsing or syntax analysis, which is checking that the tokens form an allowable expression. This is usually done with reference to a context-free grammar which recursively defines compo- nents that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammar alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammar. The final phase is semantic parsing or analysis, which is working out the implications of the expression just validated and taking the appropriate action. DISCUSSED IN NEXT CHAPTER. Types of parsers The task of the parser is essentially to determine if and how the input can be derived from the start symbol within the rules of the formal grammar. This can be done in essentially two ways:

126 Design and Implementation of Compiler • Top-down parsing: A parser can start with the start symbol and try to transform it to the input. Intuitively, the parser starts from the largest elements and breaks them down into incrementally smaller parts. LL parsers are examples of top-down parsers. • Bottom-up parsing: A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is Shift-Reduce parsing. Another important distinction is whether the parser generates a leftmost derivation or a rightmost deriva- tion. LL parsers will generate a leftmost derivation and LR parsers will generate a rightmost derivation (al- though usually in reverse). • Top-down parsers • Recursive descent parser • LL parser • Packrat parser • Unger parser • Tail Recursive Parser • Bottom-up parsers • Precedence parsing • BC (bounded context) parsing • LR parser • SLR parser • LALR parser • Canonical LR parser • GLR parser • Earley parser • CYK parser • Top-down vs Bottom-up • Bottom-up more powerful than top-down; • Can process more powerful grammar than LL, will explain later. • Bottom-up parsers too hard to write by hand, but JavaCUP (and yacc) generates parser from spec; • Bottom up parser uses right most derivation; Top down uses left most derivation; • Less grammar translation is required, hence the grammar looks more natural; • Intuition: bottom-up parse postpones decisions about which production rule to apply until it has more data than was available to top-down. 4.7.1 Parsing Table A parsing table is a table describing what action its parser should take when a given input comes while it is in a given state. It is a tabular representation of a pushdown automaton that is generated from the

Syntax Analysis and Directed Translation 127 context-free grammar of the language to be parsed. This is possible because the set of viable prefixes (that is, the sequence of input that can be set aside before a parser needs to either perform a reduction or return an error) of a context-free language is a regular language, so the parser can use a simple parse state table to recognize each viable prefix and determine what needs to be done next. A parsing table is used with a stack and an input buffer. The stack starts out empty, and symbols are shifted onto it one by one from the input buffer with associated states; in each step, the parsing table is consulted to decide what action to take. The parsing table consists of two parts, the action table and the goto table. The action table takes the state at the top of the stack and the next symbol in the input buffer (called the “lookahead” symbol) and returns the action to take, and the next state to push onto the stack. The goto table returns the next state for the parser to enter when a reduction exposes a new state on the top of the stack. The goto table is needed to convert the operation of the deterministic finite automaton of the Action table to a pushdown automaton. Action table consist three operations Reduction, a Shift, or Accept or None. • Reduction : When the parser recognizes a sequence of symbols to represent a single nonterminal, and substitutes that nonterminal for the sequence then reduction take place. For example, in an LR parser, this means that the parser will pop 2*N entries from the stack (N being the length of the sequence recognized - this number is doubled because with each token pushed to the stack, a state is pushed on top, and removing the token requires that the parser also remove the state) and push the recognized nonterminal in its place. This nonterminal will not have an associated state, so to determine the next parse state, the parser will use the goto table. • Shift : In this case, the parser makes the choice to shift the next symbol onto the stack. The token is moved from the input buffer to the stack along with an associated state, also specified in the action table entry, which will be the next parser state. The parser then advances to the next token. • Accept : When a parse is complete and the sequence of tokens in question can be matched to an accept state in the parse table. 4.7.2 Top-Down Parsing Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypoth- esis. It occurs in the analysis of both natural languages and computer languages. The main idea behind the top down parsing is to contruct an efficient non backtracking form of top down parser called a predictive parser or LL parser. Topdown parsing with backtracking • S → cAd • A → ab|a • w = cad

128 Design and Implementation of Compiler SS S cA d cAd cAd ab a Figure 4.5 Parsing trace: Expansion Remaining Input Action S cad Try S → cAd cAd cad Match c Ad ad Try A → ab abd ad Match a bd d Dead end, backtrack ad ad Try A → a ad ad Match a d d Match d Success 4.7.2.1 Recursive descent parser Recursive descent parser is a top-down parser built from a set of mutually-recursive procedures (or a non- recursive equivalent) where each such procedure usually implements one of the production rules of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes. Recursive descent with backup or backtracking is a technique that determines which production to use by trying each production in turn. Although predictive parsers are widely used, programmers often prefer to create LR or LALR parsers via parser generators without transforming the grammar into LL(k)* form. Recursive descent parsers are particularly easy to implement in functional languages such as Haskell or ML. Example 4.19: here is a grammar for recursive descent parser • program → statement program | statement • statement → assignment • assignment → ID EQUAL expr Example 4.20: Consider following grammar as expression → expression op term | term term → term op1 factor | factor factor → expression | number Op → + | - Op1 → *

Syntax Analysis and Directed Translation 129 Now we present a procedure that recognizes a factor. In this procedure there is a ‘token’ variable that keeps the current next token in the input and ‘match’ is an another procedure that match current next token with its parameter. Procedure factor Begin Case token (match) ( ) ; expression; match ( ) ); number : match (number); Else error End case End Procedure match (expected token ); Begin If token = expected token then Gettoken Else Error End if End Example 4.21: Recursive descent calculator for simple integer arithmetic.For this purpose we consider the following grammar as : #include<stdio.h> #include<conio.h> char token; int exp(void); int term(void); void factor(void); void error void { fprintf(stderr, “error”); exit(1); } void match (void) { if(token = = expectedToken) token = getchar( ); else error( ); } int exp (void) { int temp = term ( ); while ((token = = ‘+’ || token = = ‘–’ )); switch(token) { case ‘+’ : match(‘+’);


Like this book? You can publish your book online for free in a few minutes!
Create your own flipbook