SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING 101
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING CS6660 - COMPILER DESIGNCS6660 COMPILER DESIGN LT P C 3003UNIT I INTRODUCTION TO COMPILERS 5 Translators-Compilation and Interpretation-Language processors -The Phases ofCompiler-Errors Encountered in Different Phases-The Grouping of Phases-CompilerConstruction Tools - Programming Language basics.UNIT II LEXICAL ANALYSIS 9 Need and Role of Lexical Analyzer-Lexical Errors-Expressing Tokens by RegularExpressions-Converting Regular Expression to DFA- Minimization of DFA-Language forSpecifying Lexical Analyzers-LEX-Design of Lexical Analyzer for a sample Language.UNIT III SYNTAX ANALYSIS 10 Need and Role of the Parser-Context Free Grammars -Top Down Parsing -General Strategies-Recursive Descent Parser Predictive Parser-LL(1) Parser-Shift ReduceParser-LR Parser-LR (0)Item-Construction of SLR Parsing Table -Introduction to LALRParser - Error Handling and Recovery in Syntax Analyzer-YACC-Design of a syntaxAnalyzer for a Sample Language .UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT 12Syntax directed Definitions-Construction of Syntax Tree-Bottom-up Evaluation of S-Attribute Definitions- Design of predictive translator - Type Systems-Specification of asimple type checker-Equivalence of Type Expressions-Type Conversions.RUN-TIME ENVIRONMENT: Source Language Issues-Storage Organization-StorageAllocation-Parameter Passing-Symbol Tables-Dynamic Storage Allocation-StorageAllocation in FORTAN.UNIT V CODE OPTIMIZATION AND CODE GENERATION 9 Principal Sources of Optimization-DAG- Optimization of Basic Blocks-GlobalData Flow Analysis-Efficient Data Flow Algorithms-Issues in Design of a Code Generator -A Simple Code Generator Algorithm.TOTAL: 45TEXTBOOK1. Alfred V Aho, Monica S. Lam, Ravi Sethi and Jeffrey D Ullman, “Compilers – Principles,Techniques and Tools”, 2nd Edition, Pearson Education, 2007.REFERENCES1. Randy Allen, Ken Kennedy, “Optimizing Compilers for Modern Architectures: ADependence-based Approach”, Morgan Kaufmann Publishers, 2002.2. Steven S. Muchnick, “Advanced Compiler Design and Implementation”, MorganKaufmann Publishers - Elsevier Science, India, Indian Reprint 2003.3. Keith D Cooper and Linda Torczon, “Engineering a Compiler”, Morgan KaufmannPublishers Elsevier Science, 2004.4. Charles N. Fischer, Richard. J. LeBlanc, “Crafting a Compiler with C”, Pearson 102
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING CS6660 - COMPILER DESIGN Index SheetUNIT TOPIC PAGE NUMBER No. Notes PART-A PART-BI Introduction to Compilers 104 105 109 111 112 114II Lexical Analysis 122 124 127III Syntax AnalysisIV Syntax Directed Translation & Run Time 129 131 133 EnvironmentV Code Optimization and Code Generation 137 138 140ANNA UNIVERSITY QUESTION PAPER 145 103
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING UNIT I INTRODUCTION TO COMPILERSTranslators-Compilation and Interpretation-Language processors -The Phases of Compiler-Errors Encountered in Different Phases-The Grouping of Phases-Compiler ConstructionTools - Programming Language basics. KEY POINTSTRANSLATORS: Programming language. Translates source language to target language.COMPILATION AND INTERPRETATION: Both are producing target language. Mapping input to output. Error identification.LANGUAGE PROCESSORS: Language processor is used to produce the target program for the corresponding source program using preprocessor, compiler, assembler and linker/loader.THE PHASES OF COMPILER: Lexical analysis Syntax analysis Semantic analysis Intermediate code generation Code optimization Code generationERRORS ENCOUNTERED IN DIFFERENT PHASES: Syntactic Error Semantic ErrorTHE GROUPING OF PHASES: Analysis Phases Synthesis PhasesCOMPILER CONSTRUCTION TOOLS: Parser Generators Scanner Generators Syntax directed Translation Engine Code Generator Generators Data flow analysis Engine Compiler Construction toolkit.PROGRAMMING LANGUAGE BASICS: The static/Dynamic Distinction Environments and states 104
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING Static scope and block structure Explicit access control Dynamic Scope Parameter passing mechanisms Aliasing PART-A1. What is a Complier? A Complier is a program that reads a program written in one language (the sourcelanguage) and translates it in to an equivalent program in another language (the targetlanguage). As an important part of this translation process, the compiler reports to its user thepresence of errors in the source program Source Program Compiler Target Program Error Message2. What are the cousins of compiler? April/May 2004, April/May 2005, April/May2012 The following are the cousins of compilers i. Preprocessors ii. Assemblers iii. Loaders iv. Link editors.3. What are the main two parts of compilation? What are they performing? The two main parts areAnalysis part breaks up the source program into constituent pieces and creates anintermediate representation of the source program.Synthesis part constructs the desired target program from the intermediate representation4. What is an interpreter?(may 2011) Certain other translators transform a programming language into a simplifiedlanguage called intermediate code, which can directly executed using a program called aninterpreter.5. State some compiler construction tools? April /May 2008 Parse generator Scanner generators Syntax-directed translation engines Automatic code generator Data flow engines. Compiler construction Toolkit6. What is a preprocessor? Nov/Dev 2004 What are the functions ofpreprocessors?(nov 2007) A preprocessor is one, which produces input to compilers. A source program maybe divided into modules stored in separate files. The task of collecting the source program issometimes entrusted to a distinct program called a preprocessor. The preprocessor may alsoexpand macros into source language statements. 105
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING Skeletal Source Program Preprocessor Source Program Produce input to compilers. Functions: Macro processing, file inclusion, rationalpreprocessors and language extensions.7. What is an interpreter?(may 2011) Certain other translators transform a programming language into a simplified language called intermediate code, which can directly executed using a program called an interpreter.8. For the expression a = 50 +c, show how this statement is processed at the first two phases of the compiler.(nov 2011) Source program is a stream of characters: E.g. a =50+c lexical analysis: groups characters into non-separable units, called token, and generates token stream: id1 = const + id2 The information about the identifiers must be stored somewhere (symbol table). Syntax analysis: checks whether the token stream meets the grammatical specification of the language and generates the syntax tree. := id1 + 50 id29. What is the need for separating the analysis phase into lexical analysis and parsing?(Or) What are the issues of lexical analyzer?(may 2009, nov 2011, may 2013) Simpler design is perhaps the most important consideration. The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases. Compiler efficiency is improved. Compiler portability is enhanced.10. Define a symbol table.(nov 2007) A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. Whenever an identifier is detected by a lexical analyzer, it is entered into the symbol table. The attributes of an identifier cannot be determined by the lexical analyzer. 106
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING11. Write the difference between Compiler and Interpreter.(may 2008) Compiler Interpreter Target Program produced faster Target Program produced slower Not better error diagnostics Better error diagnosticsTranslation of source program happening No translationExecution happens on the whole Execution happens in statement by statement12. Define tokens, Patterns and Lexemes.(may 2013)Tokens- Sequence of characters that have a collective meaning.Patterns- There is a set of strings in the input for which the same token is producedas output. This set of strings is described by a rule called a pattern associated with thetokenLexeme- A sequence of characters in the source program that is matched by thepattern for a token.13. How will you group the phases of compiler?(nov 2013)Grouping of PhasesFront end : machine independent phases Lexical analysis Syntax analysis Semantic analysis Intermediate code generation Some code optimizationBack end : machine dependent phases Final code generation Machine-dependent optimizations14. Write the regular expression for identifier. (nov 2013)An identifier is defined as a letter followed by zero or more letters or digits. Theregular expression for an identifier is given as letter (letter | digit)*15. What are the classifications of compiler? Single pass compiler Multi pass compiler Load-and-go compiler Debugging or optimizing compiler16. Write short notes on error handler?The error handler is invoked when a flaw in the source program is detected. Itmust warn the programmer by issuing a diagnostic, and adjust the information beingpassed from phase to phase so that each phase can proceed. So that as many errors aspossible can be detected in one compilation.17. What do you meant by passes?A pass reads the source program or the output of the previous pass, makes thetransformations specified by its phases and writes output into an intermediate file, which 107
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING may then be read by a subsequent pass. In an implementation of a compiler, portions of one or more phases are combined into a module called pass.18. Why is buffering used in lexical analysis? What are the commonly used buffering methods? (may 2014) Determining the next lexeme requires reading the input beyond the end of the lexeme. Buffer Pairs: Concerns with efficiency issues Used with a look ahead on the input19. Write short notes on buffer pair. (may 2008) Concerns with efficiency issues Used with a lookahead on the input It is a specialized buffering technique used to reduce the overhead required to process an input character. Buffer is divided into two N-character halves. Use two pointers. Used at times when the lexical analyzer needs to look ahead several characters beyond the lexeme for a pattern before a match is announced.20. Write short notes on YACC. YACC is an automatic tool for generating the parser program. YACC stands for Yet Another Compiler which is basically the utility available from UNIX. Basically YACC is LALR parser generator. It can report conflict or ambiguities in the form of error messages. 108
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING PART-B1. What is a compiler? State various phases of a compiler and explain them in detail. OrDescribe how various phases could be combined as a pass in a compiler? April/May ‘082. What are the cousins of a Compiler? Explain them in detail. 109
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING3. Explain the various phases of a compiler in detail. Also write down the output for thefollowing expression after each phase a:= b*c-d. OrFor the following expression Position:=initial+ rate*60 Write down the output aftereach phase 110
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING4. Explain Compiler construction tools Parse generator: Automatically produces syntax analyzers from grammatical description of a programming language. Scanner generators: Produce Lexical analyzers from a regular expression description of the tokens of a language. Syntax-directed translation engines: Produces collection of routines for walking a parse tree and generating intermediate code. Automatic code generator: Produces a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine. Data flow engines: Information about how the values are transmitted from one part of a program to each other part. Compiler Construction Toolkit: Provides an integrated set of routines for constructing various phases of compiler. UNIT II LEXICAL ANALYSIS Need and Role of Lexical Analyzer-Lexical Errors-Expressing Tokens by RegularExpressions-Converting Regular Expression to DFA- Minimization of DFA-Language forSpecifying Lexical Analyzers-LEX-Design of Lexical Analyzer for a sample Language. KEYPOINTSNEED AND ROLE OF LEXICAL ANALYZER: Grouped into Lexemes, parser (syntax analysis ) Interaction with Symbol table. Two Processes Scanning, lexical analysis. Lexical analysis Vs Parsing Simplicity Efficiency Portability Token, pattern, attributes and lexemeLEXICAL ERRORS: Source code error Misspelling Recovering strategy- panic code.EXPRESSING TOKENS BY REGULAR EXPRESSIONS: Input Buffering Buffer pairs Sentinels Specification of tokens Strings and Languages Operations on Languages 111
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING Regular Expressions Regular Definitions Extensions of Regular Expressions. Recognition of tokens Transition Diagram Recognition of Reserved Words and Identifiers Completion of the Running Architecture of a transition diagram based lexical analyzerCONVERTING REGULAR EXPRESSION TO DFA: Simulation of an NFA Efficiency of NFA simulation Construction of an NFA from regular expressionMINIMIZATION OF DFA: Minimizing the procedure to generate DFA.LANGUAGE FOR SPECIFYING LEXICAL ANALYZERS: Finite Automata Nondeterministic finite automata Transition tables Acceptance of input string by automataLEX-DESIGN OF LEXICAL ANALYZER FOR A SAMPLE LANGUAGE: Use of lex Structure of lex programs Conflict resolution in lex Lookahead operator PART-A1. What is a lexeme? Define a regular set. Nov/Dec 2006 A Lexeme is a sequence of characters in the source program that is matched by the pattern for a token. A language denoted by a regular expression is said to be a regular set2. What is a sentinel? What is its usage? April/May 2004 A Sentinel is a special character that cannot be part of the source program.Normally we use ‘eof’ as the sentinel. This is used for speeding-up the lexical analyzer.3. What are the Error-recovery actions in a lexical analyzer? Deleting an extraneous character Inserting a missing character Replacing an incorrect character by a correct character Transposing two adjacent characters4. What is recognizer? Recognizers are machines. These are the machines which accept the stringsbelonging to certain language. If the valid strings of such language are accepted by themachine then it is said that the corresponding language is accepted by that machine,otherwise it is rejected. 112
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING5. Construct Regular expression for the languageL= {w ε{a,b}/w ends in abb} Ans: {a/b}*abb.6. What is a regular expression? State the rules, which define regular expression? Regular expression is a method to describe regular language.Rules: ε-is a regular expression that denotes {ε} that is the set containing the empty string If a is a symbol in ∑, then a is a regular expression that denotes {a} Suppose r and s are regular expressions denoting the languages L(r ) and L(s) Then, (r )/(s) is a regular expression denoting L(r) U L(s). (r )(s) is a regular expression denoting L(r )L(s) (r )* is a regular expression denoting L(r)*. (r) is a regular expression denoting L(r ).7. What is the purpose of Scanner? The purpose of scanner is Describe what the tokens of the language are The recognition of tokens8. What are the issues to be considered in the design of lexical analyzer?(may 2009, nov 2011, may 2013) Simpler design Compiler efficiency is improved by specialized buffering techniques for reading input characters and processing tokens and significantly speeds up the performance of a compiler. Compiler portability is enhanced9. What are the possible error recovery actions in lexical analyzer?(may 2012, nov 2012, may 2015) Deleting an extraneous character Inserting a missing character Replacing an incorrect character by a correct character.10. What is the role of lexical analyzer? (nov 2014) Main Task: Take a token sequence from the scanner and verify that it is a syntactically correct program. Secondary Tasks: Process declarations and set up symbol table information accordingly, in preparation for semantic analysis. Construct a syntax tree in preparation for intermediate code generation.11. Define lexeme.(may 2014) Lexeme- A sequence of characters in the source program that is matched by the pattern for a token. 113
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING12. Compare the features of DFA and NFA.(may 2014) “DFA” stands for “Deterministic Finite Automata” while “NFA” stands for “Nondeterministic Finite Automata.” Both are transition functions of automata. In DFA the next possible state is distinctly set while in NFA each pair of state and input symbol can have many possible next states. NFA can use empty string transition while DFA cannot use empty string transition. NFA is easier to construct while it is more difficult to construct DFA. Backtracking is allowed in DFA while in NFA it may or may not be allowed. DFA requires more space while NFA requires less space. While DFA can be understood as one machine and a DFA machine can be constructed for every input and output, 8.NFA can be understood as several little machines that compute together, and there is no possibility of constructing an NFA machine for every input and output.13. Write regular definitions to define a decimal number to be used by Pascal.(nov 2011) digit → 0 | 1 | 2 | . . . | 9 id → letter (letter | digit)* digit → 0 | 1 | 2 | . . . | 9 digit → digit digit* Optimal-fraction→ . digits | ε digit is the regular for the set of all decimal digits. PART-B1. Explain specification of tokens. Regular expressions are the notations for specifying the patterns. Each patternmatches a set of stringsStrings and languages: An alphabet is a finite set of symbols. A string over an alphabet is a finitesequence of symbols from the alphabet. Terms for parts of a string: Prefix, Suffix, Substring,Proper prefix and proper suffix Language: It is a set of strings over some fixed alphabet.Operations on languages: Concatenation Union Kleene closure Positive closure Regular expressions: ε is a regular expression that denotes {ε} If a is a symbol in Ó, then a is a regular expression that denotes {a} suppose r and s are regular expressions denoting the languages L(r) and L(s).Then, 114
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING (r) | (s) is a regular expression denoting L(r) U L(s) (r) (s) is a regular expression denoting L(r) L(s) (r)* is a regular expression denoting L(r)* (r) is a regular expression denoting L(r)A language denoted by a regular expression is said to be a regular set. Unary operator * has the highest precedence and is left associative Concatenation has the second highest precedence and is left associative has lowest precedence and is left associativeRegular definitions: It is a sequence of definitions of the form d1->r1, d2->r2 . dn->rnWhere each di is a distinct name and each ri is a regular expression over the symbols in ÓU {d1, d2, .. di-1}2. Construct a NFA using Thompson.s construction algorithm for the regularexpression (a|b)*abb(a|b)* and convert it into DFA.NFA:The start state of DFA is ε-closure (0)ε-closure (0) = {0, 1, 2, 4, 7} = AThe input symbol alphabet is {a, b}ε-closure (move(A, a)) = ε-closure (move({0,1, 2, 4, 7}, a)) = ε-closure (3, 8) = {1, 2, 3, 4, 6,7, 8} = BDTrans [A, a] = Bε-closure (move(A, b)) = ε-closure (move({0,1, 2, 4, 7}, b)) = ε-closure (5) = {1, 2, 4, 5, 6,7} = CDTrans [A, b] = Cε-closure (move(B, a)) = ε-closure (move({1, 2, 3, 4, 6, 7, 8}, a)) = ε-closure (3, 8) = {1, 2,3, 4, 6, 7, 8} = BDTrans [B, a] = Bε-closure (move(B, b)) = ε-closure (move({1, 2, 3, 4, 6, 7, 8}, b)) = ε-closure (5, 9) = {1, 2,4, 5, 6, 7, 9} = CDTrans [B, b] = Dε-closure (move(C, a)) = ε-closure (move({1, 2, 4, 5, 6, 7, 8}, a)) = ε-closure (3, 8) = B 115
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGDTrans [C, a] = Bε-closure (move(C, b)) = ε-closure (move({1, 2, 4, 5, 6, 7, 8}, b)) = ε-closure (5) = CDTrans [C, b] = Cε-closure (move(D, a)) = ε-closure (move({1, 2, 4, 5, 6, 7, 9}, a)) = ε-closure (3, 8) = BDTrans [D, a] = Bε-closure (move(D, b)) = ε-closure (move({1, 2, 4, 5, 6, 7, 9}, b)) = ε-closure (5, 10) = {1, 2,4, 5, 6, 7, 10, 11, 12, 14, 17} = EDTrans [D, b] = Eε-closure (move(E, a)) = ε-closure (move({1, 2, 4, 5, 6, 7, 10, 11, 12, 14, 17}, a)) = ε-closure(3, 8, 13) = {1, 2, 3, 4, 6, 7, 8, 11, 12, 13, 14, 16, 17} = FDTrans [E, a] = F ε-closure (move(E, b)) = ε-closure (move({1, 2, 4, 5, 6, 7, 10, 11, 12, 14,17}, b)) = ε-closure (5, 15) = {1, 2, 4, 5, 6, 7, 11, 12, 14, 15, 16, 17} = GDTrans [E, b] = Gε-closure (move(F, a)) =ε-closure (3, 8, 13) = FDTrans [F, a] = Fε-closure (move(F, b)) =ε-closure (5, 9, 15)={1, 2, 4, 5, 6, 7, 9, 11, 12, 14, 15, 16, 17}= HDTrans [F, b] = Hε-closure (move(G, a)) =ε-closure (3, 8, 13) = FDTrans [G, a] = Fε-closure (move(G, b)) =ε-closure (5, 15) = GDTrans [G, b] = Gε-closure (move(H, a)) =ε-closure (3, 8, 13) = FDTrans [H, a] = Fε-closure (move(H, b)) =ε-closure (5, 10, 15) ={1, 2, 4, 5, 6, 7, 10, 11, 12, 14, 15, 16, 17} = IDTrans [H, b] = Iε-closure (move(I, a)) =ε-closure (3, 8, 13) = FDTrans [I, a] = Fε-closure (move(I, b)) =ε-closure (5, 15) = GDTrans [I, b] = GTransition table for DFA: 116
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGTransition diagram for DFA:3. Discuss about the input buffering scheme in lexical analyzer. Determining the next lexeme requires reading the input beyond the end of thelexeme.Buffer Pairs:Concerns with efficiency issuesUsed with a look ahead on the inputIt is a specialized buffering technique used to reduce the overhead required to process aninput character. Buffer is divided into two N-character halves. Use two pointers. Used attimes when the lexical analyzer needs to look ahead several characters beyond the lexeme fora pattern before a match is announced. One pointer called forward pointer, points to firstcharacter of the next lexeme found. The string of characters between two forms the lexeme.Increment procedure for forward pointer:If forward at end of first half thenreload second halfforward+=1else if forward at end of second halfreload the first halfmove forward to beginning of first halfelseforward+=1Sentinels:It is the special character which cannot be a part of source program. It is usedto reduce the two tests into one. e.g. eofIncrement procedure for forward pointer using sentinels:forward+=1if forward ↑=eof thenIf forward at end of first half thenreload second halfforward+=1else if forward at end of second halfreload the first halfmove forward to beginning of first half 117
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGelseterminate lexical analysis4. Explain with example Minimization of DFA. Minimization Algorithm for DFA Construct a partition = { A, Q - A } of the set of states Q ; new := new_partition( } ;while ( new ):= new ;new := new_partition( )final := ;function new_partition( )for each set S of do partition S into subsets such that two states p and q of S are in the same subset of S if and only if for each input symbol, p and q make a transition to (states of) the sameset of . The subsets thus formed are sets of the output partition in place of S. If S is not partitioned in this process, S remains in the output partition.endMinimum DFA M1 is constructed from final as follows:Select one state in each set of the partition final as the representative for the set. Theserepresentatives are states of minimum DFA M1. Let p and q be representatives i.e. states of minimum DFA M1. Let us also denote by p and q the sets of states of the original DFA M represented by p and q, respectively. Let s be a state in p and t a state in q. If a transition from s to t on symbol a exists in M, then the minimum DFA M1 has a transition from p to q on symbol a. The start state of M1 is the representative which contains the start state of M. The accepting states of M1 are representatives that are in A. Note that the sets of final are either a subset of A or disjoint from A. Remove from M1 the dead states and the states not reachable from the start state, if there are any. Any transitions to a dead state become undefined. A state is a dead state if it is not an accepting state and has no out-going transitions except. Example 1: Let us try to minimize the number of states of the following DFA. 118
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING Initially = { { 1 , 5 } , { 2 , 3 , 4 } }. new_partition is applied to . Since on b state 2 goes to state 1, state 3 goes to state 4 and 1 and 4 are in different sets in , states 2 and 3 are going to be separated from each other in new . Also since on a sate 4 goes to sate 4, state 3 goes to state 5 and 4 and 5 are in different sets in , states 3 and 4 are going to be separated from each other in new. Further, since on b 2 goes to 1, 4 goes to 4 and 1 and 4 are in different sets in , 2 and 4 are separated from each other in new. On the other hand 1 and 5 make the same transitions. So they are not going to be split. Thus the new partition is { { 1 , 5 } , { 2 } , { 3 } , { 4 ] }. This becomes the in the second iteration. When new_partition is applied to this new , since 1 and 5 do the same transitions, remains unchanged. Thus final = { { 1 , 5 } , { 2 } , { 3 } , { 4 ] }. Select 1 as the representative for { 1 , 5 }. Since the rest are singletons, they have the obvious representatives. Note here that state 4 is a dead state because the only transition out of it is to itself. Thus the set of states for the minimized DFA is { 1 , 2 , 3 }. For the transitions, since 1 goes to 3 on a, and to 2 on b in the original DFA, in the minimized DFA transitions are added from 1 to 3 on a, and 1 to 2 on b. Also since 2 goes to 1 on b, and 3 goes to 1 on a in the original DFA, in the minimized DFA transitions are added from 2 to 1 on b, and from 3 to 1 on a. Since the rest of the states are singletons, all transitions between them are inherited for the minimized DFA. 119
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGThus the minimized DFA is as given in the following figure:5. Construct a DFA directly from the regular expression (a|b)*abb without constructing NFA.Syntax tree for (a|b)*abb#:Calculation of firstpos, lastpos and nullable for nodes in syntax tree:Calculation of followpos: 120
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGNow, the start state of DFA is firstpos of the rootSo, A= {1, 2, 3}Consider the input symbol .a.:Position 1 and 3 are for .a. in ASo, let B = followpos(1) U followpos(3) = {1, 2, 3} U {4} = {1, 2, 3, 4}DTrans[A, a] = BConsider the input symbol .b.:Position 2 is for .b. in ASo, let B = followpos(2)= {1, 2, 3} = ADTrans[A, b] = ANow continue with B,Consider the input symbol .a.:Position 1 and 3 are for .a. in ASo, followpos(1) U followpos(3) = {1, 2, 3} U {4} = {1, 2, 3, 4} = BDTrans[B, a] = BConsider the input symbol .b.:Position 2 and 4 are for .b. in BSo, followpos(2) U followpos(4) = {1, 2, 3, 4, 5} = CDTrans[B, b] = CNow continue with C,Consider the input symbol .a.:Position 1 and 3 are for .a. in ASo, followpos(1) U followpos(3) = {1, 2, 3} U {4} = {1, 2, 3, 4} = BDTrans[C, a] = BConsider the input symbol .b.:Position 2 and 5 are for .b. in CSo, followpos(2) U followpos(5) = {1, 2, 3, 6} = DDTrans[C, b] = DNow continue with D,Consider the input symbol .a.:Position 1 and 3 are for .a. in DSo, followpos(1) U followpos(3) = {1, 2, 3} U {4} = {1, 2, 3, 4} = BDTrans[D, a] = BConsider the input symbol .b.:Position 2 is for .b. in D 121
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGSo, followpos(2) = {1, 2, 3} = ADTrans[D, b] = AThe position associated with the end marker #, 6 is in D. So, D is the final state.DFA UNIT III SYNTAX ANALYSIS Need and Role of the Parser-Context Free Grammars -Top Down Parsing -General Strategies-Recursive Descent Parser Predictive Parser-LL(1) Parser-Shift ReduceParser-LR Parser-LR (0)Item-Construction of SLR Parsing Table -Introduction to LALRParser - Error Handling and Recovery in Syntax Analyzer-YACC-Design of a syntaxAnalyzer for a Sample Language . KEYPOINTSNEED AND ROLE OF THE PARSER: Create an intermediate code Interaction with symbol table Generally 3 types of parsers universal, top down and bottom up. Most efficient method. Grammar Representation Syntax error handling Error recovery strategies Panic mode Recovery Phrase level recovery Error productions Global correctionCONTEXT FREE GRAMMARS: Definitions Terminals Nonterminals Start symbol Production of grammar Notational Conventions Derivations Parse tree and derivations 122
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING Ambiguity PREDICTIVE VerificationTOP DOWN PARSING: Preorder construction of tree FIRST and FOLLOWGENERAL STRATEGIES: RECURSIVE DESCENT PARSERPARSER: LL(1)PARSER: Back tracking Nonrecursive predictive parsing Error Recovery Panic mode Phrase levelSHIFT REDUCE PARSER: Bottom up parsing Reductions Handle Pruning Four possible actions Shift Reduce Accept errorLR PARSER: Definition: construct a parsing table using one of the methods. LR is attractive because Visually all programming language construct the tree General nonbacktracking shift reduce parsing method. Detect a syntactic error Proper subset.LR (0)ITEM: Closure of item set Function GOTO Using automataCONSTRUCTION OF SLR PARSING TABLE: Discuss in class with exampleINTRODUCTION TO LALR PARSER: Lookahead LR techniqueERROR HANDLING AND RECOVERY IN SYNTAX ANALYZER: Error Recovery Panic mode Phrase level 123
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGYACC-DESIGN OF A SYNTAX ANALYZER FOR A SAMPLE LANGUAGE: Generator Yacc Same like a C language, declaration, error correction. PART-A1. Define terminal and non terminal symbol.Terminals: A terminal is a symbol which does not appear on the left-hand side of anyproduction. A grammar contains a set of terminal symbols (tokens) such as the plus sign, +,the times sign, *, and other tokens defined by the lexical analyzer such as IdentifiersNon terminals: Non terminals are the non-leaf nodes in a parse tree. In the Expressiongrammar, E, T, and F are non terminals. Sometimes non terminals are enclosed betweenangle brackets to distinguish them from terminals.2. Define derivation and reduction.Derivation: A grammar derives strings by beginning with the start symbol and repeatedlyreplacing a non terminal by the body of a production for that non terminal. The terminalstring derived from the start symbol from the language defined by the grammar.Reduction: The bottom up parsing as the process of reducing a string ω to the start symbolof the grammar. At each reduction step a specific substring matching the body of aproduction is replaced by the non terminal at the head of that production.3. Define context free grammar.Context free grammar has four components: A set of terminal symbols A set of non terminal symbols A set of productions A designation of one of the non terminal as the start symbol.4. What is the output of syntax analysis phase? What are the three general types ofparsers for grammars? Parser (or) parse tree is the output of syntax analysis phase.General types of parsers: 1) Universal parsing 2) Top-down 3) Bottom-up5. What are the different strategies that a parser can employ to recover from a syntactic error? Panic mode Phrase level Error productions Global correction6. What is phrase level error recovery? On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. This is known as phrase level error recovery. 124
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING7. How will you define a context free grammar?A context free grammar consists of terminals, non-terminals, a start symbol, andproductions.i. Terminals are the basic symbols from which strings are formed. “Token” is a synonymfor terminal. Ex: if, then, else.ii. Non terminals are syntactic variables that denote sets of strings, which help define thelanguage generated by the grammar. Ex: stmt, expr.iii. Start symbol is one of the non terminals in a grammar and the set of strings it denotesis the language defined by the grammar. Ex: S.iv. The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings Ex: expr id8. Define context free language. When will you say that two CFGs are equal? A language that can be generated by a grammar is said to be a context freelanguage. If two grammars generate the same language, the grammars are said to beequivalent.9. Differentiate sentence and sentential form. If S Sentence Sentential form w then the string w is called If S α then α is a sententialsentence of G form of G. Sentence is a string of terminals. Sentential form may contain nonSentence is a sentential form with no terminals.nonterminals10. What is a parse tree?A parse tree may be viewed as a graphical representation for a derivation that filtersout the choice regarding replacement order. Each interior node of a parse tree is labeledby some nonterminal A and that the children of the node are labeled from left to right bysymbols in the right side of the production by which this A was replaced in thederivation. The leaves of the parse tree are terminal symbols.11. What is an ambiguous grammar? Give an example. A grammar that produces more than one parse tree for some sentence is said to beambiguous An ambiguous grammar is one that produces more than one leftmost or rightmostderivation for the same sentence.Ex: E E+E / E*E / id12. What is Top down parsing?Starting with the root, labeled, does the top-down construction of a parse tree with thestarting nonterminal, repeatedly performing the following steps.i. At node n, labeled with non terminal “A”, select one of the productions for “A” andconstruct children at n for the symbols on the right side of the production.ii. Find the next node at which a sub tree is to be constructed. 125
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING13. What do you mean by Recursive Descent Parsing? Recursive Descent Parsing is top down method of syntax analysis in which weexecute a set of recursive procedures to process the input. A procedure is associated witheach non terminal of a grammar.14. What is meant by Predictive parsing? Nov/Dec 2007 A special form of Recursive Descent parsing, in which the look-ahead symbolunambiguously determines the procedure selected for each non terminal, where nobacktracking is required.15. Define Bottom Up Parsing. Parsing method in which construction starts at the leaves and proceeds towards theroot is called as Bottom up Parsing.16. What are Shift-Reduce parsing? A general style of bottom-up syntax analysis, which attempts to construct a parse treefor an input string beginning at the leaves and working up towards the root.17. Define handle. What do you mean by handle pruning? Nov/Dec 2004, April/May 2005 A Handle of a string is a sub string that matches the right side of production and whose reduction to the non terminal on the left side of the production represents one step along the reverse of a rightmost derivation. The process of obtaining rightmost derivation in reverse is known as Handle Pruning.18. Define LR (0) items. An LR (0) item of a grammar G is a production of G with a dot at some positionof the right side. Thus the production A → XYZ yields the following four items, A → .XYZ A → X.YZ A → XY.Z A → XYZ.19. What are the disadvantages of operator precedence parsing? May/June 2007 It is hard to handle tokens like the minus sign, which has two different precedences Since the relationship between a grammar for the language being parsed and the operator – precedence parser itself is tenuous, one cannot always be sure the parser accepts exactly the desired language. Only a small class of grammars can be parsed using operator precedence techniques.20. State error recovery in operator-Precedence Parsing. There are two points in the parsing process at which an operator-precedence parsercan discover the syntactic errors: If no precedence relation holds between the terminal on top of the stack and the current input. 126
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING If a handle has been found, but there is no production with this handle as a right side. 21. LR (k) parsing stands for what? The “L” is for left-to-right scanning of the input, the “R” for constructing a rightmost derivation in reverse, and the k for the number of input symbols of lookahead that are used in making parsing decisions. 22. Why LR parsing is attractive one? LR parsers can be constructed to recognize virtually all programming language constructs for which context free grammars can be written. The LR parsing method is the, most general nonbacktracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift reduce methods. The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers. An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input. 23. Why SLR and LALR are more economical to construct than canonical LR? For a comparison of parser size, the SLR and LALR tables for a grammar always have the same number of states, and this number is typically several hundred states for a language like Pascal. The canonical LR table would typically have several thousand states for the same size language. Thus, it is much easier and more economical to construct SLR and LALR tables than the canonical LR tables. 24. What is ambiguous grammer? Give an example. Nov/Dec 2005, Nov/Dec 2007 A grammer G is said to be ambiguous if it generates more than one parse trees for sentence of language L(G). Example: E-> E+E|E*E|id 25. What are the goals of error handler in a parser? The error handler in a parser has simple-to-state goals: It should report the presence of errors clearly and accurately. It should recover from each error quickly enough to be able to detect subsequent errors. It should not significantly slow down the processing of correct programs. PART-B1. Find the SLR parsing table for the given grammar and parse the sentence (a+b)*c. E->E+E | E*E | (E) | id.Given grammar:Given grammar: I5: goto(I1, *)1. E->E+E E->E*.E2. E->E*E E->.E+E3. E->(E) E->.E*E4. E->id E->.(E)Augmented grammar: E->.id 127
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING E.->E I6: goto(I2, E) E->E+E E->(E.) E->E*E E->E.+E E->(E) E->E.*E E->id I7: goto(I4, E) I0: E.->.E E->E+E. E->.E+E E->E.+E E->.E*E E->E.*E E->.(E) I8: goto(I5, E) E->.id E->E*E. I1: goto(I0, E) E->E.+E E.->E. E->E.*E E->E.+E goto(I2, ()=I2 E->E.*E goto(I2, id)=I3 I2: goto(I0, () goto(I4, ()=I2 E->(.E) goto(I4, id)=I3 E->.E+E goto(I5, ()=I2 E->.E*E goto(I5, id)=I3 E->.(E) I9: goto(I6, )) E->.id E->(E). I3: goto(I0, id) goto(I6, +)=I4 E->id. goto(I6, *)=I5 I4: goto(I1, +) goto(I7, +)=I4 E->E+.E goto(I7, *)=I5 E->.E+E goto(I8, +)=I4 E->.E*E goto(I8, *)=I5 E->.(E) E->.id First(E) = {(, id} Follow(E)={+, *, ), $} 128
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING2. Find the predictive parser for the given grammar and parse the sentence (a+b)*c. EE+E | E*E | (E) | id. (Same like previous question) Elimination of left recursion Calculation of First Calculation of Follow Predictive parsing table Parsing the sentence UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT Syntax directed Definitions-Construction of Syntax Tree-Bottom-up Evaluation of S- Attribute Definitions- Design of predictive translator - Type Systems-Specification of a simple type checker-Equivalence of Type Expressions-Type Conversions. RUN-TIME ENVIRONMENT: Source Language Issues-Storage Organization-Storage Allocation-Parameter Passing-Symbol Tables-Dynamic Storage Allocation-Storage Allocation in FORTAN. KEYPOINTS SYNTAX DIRECTED DEFINITIONS: SDD is a context free grammar together with attributes and rules. Attributes are associated with grammar symbol Rules are associated with productions. Two kinds of attributes Synthesized attribute. 129
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING Inherited attribute. Annotated parse tree. Evaluation orders of SDD Dependency graphsCONSTRUCTION OF SYNTAX TREE: Construct the children node represent the meaningful components. Constructor function Leaf (op, val), pointer to a new record for a leaf. A constructor function Node (op,c1,c2,….,ck) creates an object.BOTTOM-UP EVALUATION OF S: ATTRIBUTE DEFINITIONS: Using post order traversal the bottom up evaluation of S attribute could be done. S_Attributed L_AttributedDESIGN OF PREDICTIVE TRANSLATOR: Discuss in classTYPE SYSTEMS: Type checking Type expression Type equivalence Declaration Storage & Access Translation Applications Operations Incremental Translation Addressing using arraySPECIFICATION OF A SIMPLE TYPE CHECKER: Type checking is to find errors. It has two forms. Synthesis InferenceEQUIVALENCE OF TYPE EXPRESSIONS: Type expression are represented by graphs in two types Structurally equivalent Name equivalentTYPE CONVERSIONS: Conversion from one type to another type. Two type Implicit ExplicitRUN-TIME ENVIRONMENT:SOURCE LANGUAGE ISSUES: STORAGE ORGANIZATION: Logical address & Structure of storage 130
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING STORAGE ALLOCATION: DYNAMIC STORAGE ALLOCATION: PARAMETER PASSING: SYMBOL TABLES: Static and dynamic Stack storage Heap storage STORAGE ALLOCATION IN FORTAN: Case study discuss in class. PART-A1. What is the usage of syntax directed translation? Syntax trees for assignment statement are produced by syntax directed definition.2. What are the advantages and disadvantages of register allocation and assignments? Advantages: i. It simplifies the design of a compiler Disadvantages: i. It is applied too strictly. ii. It uses registers in efficiently. Certain registers may go unused over substantial portions of the code, while unnecessary load and stores are generated.3. Discuss back-end and front end? Back-end i. Intermediate to binary translation is usually done by a separate compilation pass called back end. Front end i. There are several back ends for different target machines, all of which use the same parser and code generator called front end.4. Define relocatable object module. The unpatched binary image is usually called as relocatable object module.5. What is meant by multiregister operations? We can modify our labeling algorithm to handle operations like multiplication, division, or function calls which normally requires more than one register to perform. Hence this operation is called multiregister operations.6. What is meant by peephole optimization? Peephole optimization is a technique used in many compliers, in connection with the optimization of either intermediate or object code. It is really an attempt to overcome the difficulties encountered in syntax directed generation of code.7. How the use of registers is subdivided into 2 sub-problems? During register allocation we select the set of variables that will reside in registers at a point in the program. During a subsequent register assignment phase, we pick the specific register that a variable will reside in. 131
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING8. What are the two standard storage allocation strategies? The two standard allocation strategies are 1. Static allocation. 2. Stack allocation9. Discuss about static allocation. In static allocation the position of an activation record in memory is fixed at run time.10. Give the structure of general activation record. Returned value Actual parameters Optional control link Optional access link Saved machine status Local Data Temporaries11. What are the 2 approaches to implement dynamic scope? Deep access Shallow access12. What is padding? Space left unused due to alignment consideration is referred to as padding.13. What are the 3 areas used by storage allocation strategies? Static allocation Stack allocation Heap allocation14. What are the limitations of using static allocation? The size of a data object and constraints on its position in memory must be known at compile time. Recursive procedures are restricted, because all activations of a procedure use the same bindings for local name Data structures cannot be created dynamically since there is no mechanism for storage allocation at run time15. Define calling sequence and return sequence. A call sequence allocates an activation record and enters information into its fields A return sequence restores the state of the machine so that calling procedure can continue execution.16. When dangling reference occurs? A dangling reference occurs when there is storage that has been deallocated. It is logical error to use dangling references, since the value of deallocated storage is undefined according to the semantics of most languages. 132
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING17. Define static scope rule and dynamic rule Lexical or static scope rule determines the declaration that applies to a name by a examining the program text alone. Dynamic scope rule determines the declaration applicable to name at runtime, by considering the current activations.18. Write short notes on global data flow analysis. Collecting information about the way data is used in a program. Takes control flow into account Forward flow vs. backward flow Forward: Compute OUT for given IN, GEN, KILL Information propagates from the predecessors of a vertex. Examples: Reachability, available expressions, constant propagation Backward: Compute IN for given OUT, GEN, KILL Information propagates from the successors of a vertex. Example: Live variable Analysis19. Give syntax directed translation for the following statement Call p1(int a, int b). param a param b call p1 PART-B 1. Generate intermediate code for the following code segment along with the required syntax directed translation scheme: (8) i) if(a>b) x=a+b else x=a-b where a and x are of real and b of int type data. Syntax directed translation scheme for if E then S1 else S2: E.true:= newlabel; E.false:=newlabel; S1.next:=S.next; S2.next:=S.next; S.code:=E.code || gen(E.true .:.) || S1.code || gen(.goto. S.next) || gen(E.false .:.) || S2.code Intermediate code generated: if a>b got L1 goto L2 L1: t1:=inttoreal(b) x:=a+t1 goto L3 L2: t2:=inttoreal(b) x:=a-t2 133
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGL3:ii) int a,b; (8)float c;a=10;switch(a){case 10: c=1;case 20: c=2;}Switch/Case statement in source language form:switch(E){Case V1: S1..Case Vn-1:Sn-1default: Sn}Translation of source switch/case statement into intermediate language 3 A C:Translation 1:code to evaluate E into temporary tgoto testL1: code for S1goto nextL2: code for S2Goto next. Ln-1: code for Sn-1go to nextLn: code for Sngoto nexttest: if t=V1 goto L1if t=V2 goto L2.if t=Vn-1 goto Ln-1goto Lnnext:Translation 2:code to evaluate E into tif t≠V1 goto L1code for S1goto nextL1: if t≠V2 goto L2 134
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGcode for S2goto nextL2: .Ln-2: if t≠Vn-1 goto Ln-1code for Sn-1goto nextLn-1: code for Snnext:Intermediate code generated:int a,b;float c;SYMTAB have the following information for the above declarations:Let offset=0Name Type Offset Widtha integer 0 4b integer 4 4c float 8 83AC:a:=10if a≠10 goto L1c:=1goto nextL1: if a≠20 goto nextc:=2next:2. Generate intermediate code for the following code segment along with the requiredsyntax directed translation scheme: (8)i=1; s=0;while(i<=10)s=s+a[i][i]i=i+1Semantic rules for while E do S1:S.begin:=newlabelE.true:= newlabel;E.false:=S.next;S1.next:=S.begin;S.code:=gen(S.begin.:.) || E.code || gen(E.true .:.) || S1.code || gen(.goto. S.begin)Intermediate code generated:(1) i:=1(2) s:=0(3) if i ≠ 10 goto (5) 135
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING(4) goto (15)(5) t1=i*10(6) t1=t1+i(7) t1=t1+4(8) t2=addr(a)-44 //(1*10+1)*4(9) t3=t2[t1] //a[i][j](10) t4=s+t3(11) s=t4(12) t5=i+1(13) i=t5(14) goto (3)(15) …3. Write short notes on back-patching. Back patching is the activity of filling up unspecified information of labels using appropriate semantic actions in during the code generation process. In the semantic actions the functions used are mklist(i) . create a new list having i, an index into array of quadruples. merge(p1,p2) - merges two lists pointed by p1 and p2 back patch(p,j) inserts the target label j for each list pointed by p. Example: (4) Source: if a or b then if c then x= y+1 Translation: if a go to L1 if b go to L1 go to L3 L1: if c goto L2 goto L3 L2: x= y+1 L3: After Backpatching: 100: if a goto 103 101: if b goto 103 102: goto 106 103: if c goto 105 104: goto 106 105: x=y+1 106: 136
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING UNIT V CODE OPTIMIZATION AND CODE GENERATION Principal Sources of Optimization-DAG- Optimization of Basic Blocks-GlobalData Flow Analysis-Efficient Data Flow Algorithms-Issues in Design of a CodeGenerator - A Simple Code Generator Algorithm. KEYPOINTSPRINCIPAL SOURCES OF OPTIMIZATION: Redundancy Causes Semantics preserving Common subexpressions Copy propagation Dead code elimination Code motionDAG: OPTIMIZATION OF BASIC BLOCKS: Representation of DAG Common sub expression Dead code elimination Algebraic identities Array, pointer referenceGLOBAL DATA FLOW ANALYSIS: Data flow abstraction DF analysis schema Transfer function Control flow constraintsEFFICIENT DATA FLOW ALGORITHMS: Live variable analysis Available expressionsISSUES IN DESIGN OF A CODE GENERATOR: Input to the code generator Target program Instruction selection Register allocation Evaluation orderA SIMPLE CODE GENERATOR ALGORITHM: Uses 4 principals Register and address descriptors Code generation algorithm Design and function 137
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING PART-A1. What are basic blocks? A sequence of consecutive statements which may be entered only at the beginning and when entered are executed in sequence without halt or possibility of branch, are called basic blocks.2. What is a flow graph? The basic block and their successor relationships shown by a directed graph is called a flow graph. The nodes of a flow graph are the basic blocks.3. Mention the applications of DAGs. We can automatically detect common sub expressions. We can determine the statements that compute the values, which could be used outside the block. We can determine which identifiers have their values used in the block.4. What is input to code generator? The input to code generator consists of the intermediate representation of the source program produced by the front end together with information in the symbol table that is used to determine the run time addresses of the data objects denoted by the names in the intermediate representation.5. What are the primary structure preserving transformations on basic blocks? Common sub-expression elimination Dead-code elimination Renaming of temporary variable Interchange of 2 independent adjacent statements.6. What are the characterizations of peephole optimization? Redundant –instruction elimination Flow-of control optimizations Algebraic simplifications Use of machine idioms7. Define DAG. Nov/Dec 2007 A DAG for a basic block is a directed acyclic graph with the following labels on nodes: i) Leaves are labeled by unique identifiers, either variable names or constants. ii) Interior nodes are labeled by an operator symbol. iii) Nodes are also optionally given a sequence of identifiers for labels.8. What are the issues in the design of code generators? Nov/Dec 2007 i) Input to the code generator ii) Target programs iii) Memory management iv) Instruction selection v) Register allocation vi) Choice of evaluation order 138
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING vii) Approaches to code generation14. Explain the principle sources of optimization.Code optimization techniques are generally applied after syntax analysis, usually bothbefore and during code generation. The techniques consist of detecting patterns in theprogram and replacing these patterns by equivalent and more efficient constructs.15. What are the patterns used for code optimization?The patterns may be local or global and replacement strategy may be a machinedependent or independent16. What are the 3 areas of code optimization? Local optimization Global optimization Data flow analysis17. Give the block diagram of organization of code optimizer.18. What are the rules to determine the leaders of basic blocks? The first statement is a leader Any statement that is the target of a conditional or unconditional goto is a leader Any statement that immediately follows a goto or conditional goto statement is a leader.19. What is meant by Common Subexpressions? An occurrence of an expression E is called a common sub expression, if E was previously computed, and the values of variables in E have not changed since the previous computation.20. What is meant by Dead Code? A variable is live at a point in a program if its value can be used subsequently otherwise, it is dead at that point. The statement that computes values that never get used is known Dead code or useless code.21. What are the techniques used for loop optimization? Code motion Induction variable elimination Reduction in strength22. What are the advantages of the organization of code optimizer? The operations needed to implement high level constructs are made explicit in the intermediate code, so it is possible to optimize them. 139
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING The intermediate code can be independent of the target machine, so the optimizer does not have to change much if the code generator is replaced by one for a different machine. 23. Define data flow equations. A typical equation has the form Out[S] = gen[S] U (In[S] – kill[S]) and can be read as, “ the information at the end of a statement is either generated within the statement, or enters at the beginning and is not killed as control flows through the statement”. Such equations are called data flow equations. 24. How will you find the leaders in basic block? Leaders: The first statement of basic blocks. The first statement is a leader Any statement that is the target of a conditional or unconditional goto is a leader Any statement that immediately follows a goto or conditional goto statement is a leader 25. Define code motion. It decreases the amount of code in a loop. Taking the expression which yield the same result independent of the number of times a loop is executed (a loop-invariant computation and places it before the loop. 26. Define basic block and flow graph. A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end. A flow graph is a directed graph in which the flow control information is added to the basic blocks. The nodes in the flow graph are basic blocks The block whose leader is the first statement is called initial block. There is a directed edge from block B1 to block B2 if B2 immediately follows B1 in the some execution sequence. We can say that B1 is a predecessor of B2 and B2 is a successor of B1. PART-B1. Explain the various issues in the design of code generation. Input to the code generator Intermediate representation of the source program, like linear representations such as postfix notation, three address representations such as quadruples, virtual machine representations such as stack machine code and graphical representations such as syntax trees and dags. Target programs it is the output such as absolute machine language, reloadable machine language or assembly language. Memory management Mapping of names in the source program to addresses of data object in runtime memory is done by front end and the code generator. 140
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING Instruction selection Nature of the instruction set of the target machine determines the difficulty of instruction selection. Register allocation Instructions involving registers are shorter and faster. The use of registers is being divided into two sub problems: During register allocation, we select the set of variables that will reside in registers at a point in the program During a subsequent register assignment phase, we pick the specific register that a variable will reside in Choice of evaluation order the order in which computations are performed affect the efficiency of target code. Approaches to code generation2. Explain code generation phase with simple code generation algorithm. It generates target code for a sequence of three address statements. Assumptions: For each operator in three address statement, there is a corresponding target language operator. Computed results can be left in registers as long as possible. E.g. a=b+c: Add Rj,Ri where Ri has b and Rj has c and result in Ri. Cost=1; Add c, Ri where Ri has b and result in Ri. Cost=2; Mov c, Rj; Add Rj, Ri; Cost=3; Register descriptor: Keeps track of what is currently in each register Address descriptor: Keeps tracks of the location where the current value of the name can be found at run time. Code generation algorithm: For x= y op z Invoke the function getreg to determine the location L, where the result of y op z should be stored (register or memory location) Check the address descriptor for y to determine y. Generate the instruction op z., L where z. is the current location of z If the current values of y and/or z have no next uses, alter register descriptor Getreg: If y is in a register that holds the values of no other names and y is not live, return register of y for L If failed, return empty register If failed, if X has next use, find an occupied register and empty it If X is not used in the block, or suitable register is found, select memory location of x as L 141
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING 3.Generate DAG representation of the following code and list out the applications of DAG representation: i=1; s=0; while(i<=10) s=s+a[i][i] i=i+1 Intermediate code generated: (1) i:=1 (2) s:=0 (3) if i ≠ 10 goto (5) (4) goto (15) (5) t1=i*10 (6) t1=t1+i (7) t1=t1+4 (8) t2=addr(a)-44 //(1*10+1)*4 (9) t3=t2[t1] //a[i][j] (10) t4=s+t3 (11) s=t4 (12) t5=i+1 (13) i=t5 (14) goto (3) (15) . DAG generation: Applications of DAG: Determining the common sub-expressions. Determining which identifiers have their values used in the block Determining which statements compute values that could be used outside the block Simplifying the list of quadruples by eliminating the common sub-expressions and not performing the assignment of the form x: = y unless and until it is a must.4. Write short notes on next-use information with suitable example. If the name in a register is no longer needed, then the register can be assigned to some other name. This idea of keeping a name in storage only if it will be used subsequently can be applied in a number of contexts. Computing next uses: The use of a name in a three-address statement is defined as follows: Suppose a three- address statement i assigns a value to x. If statement j has x as an operand and control can flow from statement i to j along a path that has no intervening assignments to x, then we say statement j uses the value of x computed at i. Example: x:=i j:=x op y // j uses the value of x 142
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGAlgorithm to determine next use:The algorithm to determine next uses makes a backward pass over each basic block,recording for each name x whether x has a next use in the block and if not, whether it islive on exit from the block (using data flow analysis). Suppose we reach three-addressstatement i: x: =y op z in our backward scan. Then do the following: Attach to statement i, the information currently found in the symbol table regarding the next use and the liveness of x, y, and z. In the symbol table, set x to .not live. And .no next use. In the symbol table, set y and z to .live. And the next uses of y and z to i.5. Explain - principle sources of optimization.Code optimization is needed to make the code run faster or take less space or both.Function preserving transformations: Common sub expression elimination Copy propagation Dead-code elimination Constant foldingCommon sub expression elimination:E is called as a common sub expression if E was previously computed and the values ofvariables in E have not changed since the previous computation.Copy propagation:Assignments of the form f:=g is called copy statements or copies in short. The idea hereis use g for f wherever possible after the copy statement.Dead code elimination:A variable is live at a point in the program if its value can be used subsequently.Otherwise dead. Deducing at compile time that the value of an expression is a constantand using the constant instead is called constant folding.Loop optimization: Code motion: Moving code outside the loop Takes an expression that yields the same result independent of the number of times a loop is executed (a loop- invariant computation) and place the expression before the loop. Induction variable elimination Reduction in strength: Replacing an expensive operation by a cheaper one.6. Write short notes on:(1) Storage organizationSubdivision of run time memory:Run time storage: The block of memory obtained by compiler from OS to execute thecompiled program. It is subdivided into Generated target code Data objects Stack to keep track of the activations Heap to store all other information 143
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERINGActivation record: (Frame)It is used to store the information required by a single procedure call. Returned value Actual parameters Optional control link Optional access link Saved machine status Local data temporaries Temporaries are used to hold values that arise in the evaluation of expressions. Localdata is the data that is local to the execution of procedure. Saved machine statusrepresents status of machine just before the procedure is called. Control link (dynamiclink) points to the activation record of the calling procedure. Access link refers to thenon-local data in other activation records. Actual parameters are the one which is passedto the called procedure. Returned value field is used by the called procedure to return avalue to the calling procedureCompile time layout of local data:The amount of storage needed for a name is determined by its type. The field for the localdata is laid out as the declarations in a procedure are examined at compile time. Thestorage layout for data objects is strongly influenced by the addressing constraints on thetarget machine.Parameter passing.Call by value A formal parameter is treated just like a local name. Its storage is in the activation record of the called procedure The caller evaluates the actual parameter and place the r-value in the storage for the formalsCall by reference If an actual parameter is a name or expression having L-value, then that lvalue itself is passed However, if it is not (e.g. a+b or 2) that has no l-value, then expression is evaluated in the new location and its address is passed.Copy-Restore: Hybrid between call-by-value and call-by-ref (copy in, copy out) Actual parameters evaluated, its r-value is passed and l-value of the actual are determined When the called procedure is done, r-value of the formals are copied back to the l- value of the actualsCall by name Inline expansion (procedures are treated like a macro) 144
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING Previous Question Paper (2008 Regulation Principles of Compiler Design) PART-A (10*2=20)1. State any two reasons as to why phases of compiler should be grouped.2. Why is buffering used in lexical analysis? What are the commonly used bufferingMethods?3. Define Lexeme.4. Compare the features of DFA and NFA5. What is the significance of intermediate code?6. Write the various three address code form of intermediate code.7. Define symbol table.8. Name the techniques in loop optimation.9. What do you mean by Cross Compiler?10. How would you represent the dummy blocks with no statements indicated in global dataflow analysis? PART –B (5*16=80)11. A) 1) Define the following terms: compiler, interpreter, Translator and differentiatebetween them. (6)2) Differentiate between lexeme, token and pattern. (6)3) What are the issues in lexical analysis? (4) Orb) Explaining detail the process of compilation .Illustrate the output of each phase ofCompilation of the input “a= (b+c)*(b+c)*2”. (16)12. A) Consider the following grammar S->AS|b A->SA|a.Construct the SLR parse table for the grammar .Show the actions of the parser for the inputstring “abab”. (16) Orb) 1) What is an ambiguous grammar? Is the following grammar ambiguous? ProveE->E+|E(E)|id. The grammar should be moved to the next line ,centered. (10)2) Draw NFA for the regular expression ab*/ab. (6)13. a) How would you convert the following into intermediate code? Give a suitableexample. (16)1) Assignment Statements.2) Case Statements Orb) 1) Write notes on backpatching. (16) 145
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING2) Explain the sequence of stack allocation process for a function call.14. a) Discuss the various issues in code generation with examples . (16) Orb) Define a directed acyclic graph. Construct a DAG and write the sequence of instructionsfor the expression a+a*(b-c)+(b-c)*d. (16)15. a) Discuss in detail the process of optimization of basic blocks. Give an example (16)Orb). what is data flow analysis? Explain data flow abstraction with examples. (16) 146
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING CS6659 - ARTIFICIAL INTELLIGENCECS6659 ARTIFICIAL INTELLIGENCE LTPC3003UNIT I INTRODUCTION TO Al AND PRODUCTION SYSTEMS 9Introduction to AI-Problem formulation, Problem Definition -Production systems,Control strategies, Search strategies. Problem characteristics, Production systemcharacteristics -Specialized production system- Problem solving methods - Problemgraphs, Matching, Indexing and Heuristic functions -Hill Climbing-Depth first andBreath first, Constraints satisfaction - Related algorithms, Measure of performance andanalysis of search algorithms.UNIT II REPRESENTATION OF KNOWLEDGE 9Game playing - Knowledge representation, Knowledge representation using Predicatelogic, Introduction to predicate calculus, Resolution, Use of predicate calculus,Knowledge representation using other logic-Structured representation of knowledge.UNIT III KNOWLEDGE INFERENCE 9Knowledge representation -Production based system, Frame based system. Inference -Backward chaining, Forward chaining, Rule value approach, Fuzzy reasoning - Certaintyfactors, Bayesian Theory-Bayesian Network-Dempster - Shafer theory.UNIT IV PLANNING AND MACHINE LEARNING 9Basic plan generation systems - Strips -Advanced plan generation systems – K stripsStrategic explanations -Why, Why not and how explanations. Learning- Machinelearning, adaptive Learning.UNIT V EXPERT SYSTEMS 9Expert systems - Architecture of expert systems, Roles of expert systems - KnowledgeAcquisition – Meta knowledge, Heuristics. Typical expert systems - MYCIN,DART,XOON, Expert systems shells.TOTAL: 45 PERIODSTEXT BOOKS:1. Kevin Night and Elaine Rich, Nair B., ―Artificial Intelligence (SIE)‖, Mc Graw Hill-2008. (Units-I,II,VI & V)2. Dan W. Patterson, ―Introduction to AI and ES‖, Pearson Education, 2007. (Unit-III).REFERENCES:1. Peter Jackson, ―Introduction to Expert Systems‖, 3 rd Edition, Pearson Education,2007.2. Stuart Russel and Peter Norvig ―AI – A Modern Approach‖, 2 nd Edition, PearsonEducation 2007. 3. Deepak Khemani ―Artificial Intelligence‖, Tata Mc Graw HillEducation 2013. 4. 147
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING CS6659- ARTIFICIAL INTELLIGENCE Index SheetUNIT TOPIC PAGE NUMBER No. Notes PART-A PART-B I Introduction to AI &Production Systems 149 154 156II Representation of knowledge 162 164 166III Knowledge inference 169 171 174IV Planning &Machine Learning 178 180 182V Expert Systems 188 190 192 ANNA UNIVERSITY QUESTIONS PAPER 194 148
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING UNIT I INTRODUCTION TO Al AND PRODUCTION SYSTEMSIntroduction to AI-Problem formulation, Problem Definition -Production systems,Control strategies, Search strategies. Problem characteristics, Production systemcharacteristics -Specialized production system- Problem solving methods - Problemgraphs, Matching, Indexing and Heuristic functions -Hill Climbing-Depth first andBreath first, Constraints satisfaction - Related algorithms, Measure of performance andanalysis of search algorithms. KEY NOTESArtificial IntelligenceIt is the study of how to make computers do things right at the moment people do better.State space representation: Allows for formal definition of a problem Permits to define the process of solving a particular problem using a combination of search techniques.Problem DefinitionA problem consists of four parts: Initial state A set of actions A goal test function A path cost functionProduction Systems: Consists of A set of rules each consisting of a left side pattern and a right side that describes the operation to be performed if the rule applied. One or more knowledge or database that contain whatever information is appropriate for the particular task. A control strategy that specifies order of the rules in the database. A rule applier.Control Strategies 1st requirement of good control strategy is that it causes motion. 2nd requirement is that it be systematic.Search Strategies: Two types Uninformed Search Strategies Informed Search StrategiesUninformed search strategies Breadth First Search Depth –First Search Depth Limited Search Iterative deepening depth first search 149
SEMESTER – VI COMPUTER SCIENCE AND ENGINEERING Bidirectional search Generate and Test Hill Climbing Best –first search Problem reduction Constraint Satisfaction Means-ends AnalysisInformed Search strategies best-first search best-first search A* Greedy search memory bounded heuristic searchHeuristic function: Function that maps from problem state description to measures ofdesirability usually represented as numbers.PROBLEM CHARACTERISTICSIn order to choose most appropriate methods for a particular problem several dimensionsare analyzed. Is the problem decomposable into small sets? Can solution steps be ignored if they prove unwise? Is the problem universe predictable? Is a good solution Absolute or relative? Is the solution a State or a Path? Does the task require Interaction with person? Problem ClassificationPRODUCTION SYSTEM CHARACTERISTICS Monotonic production systems Nonmonotonic production systems Commutative production systems Partially Commutative production systemsIssues in the design of Search programs Forward versus Backward reasoning Matching Knowledge representation problem And the Frame problemGENERATE AND TEST Simplest of all approaches.Steps:1.Genearte a path from a start state.2.Test and see if this is a solution by comparing chosen or endpoint of chosen path to theactual goal state. 150
Search
Read the Text Version
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
- 84
- 85
- 86
- 87
- 88
- 89
- 90
- 91
- 92
- 93
- 94
- 95
- 96
- 97
- 98
- 99
- 100
- 101
- 102
- 103
- 104
- 105
- 106
- 107
- 108
- 109
- 110
- 111
- 112
- 113
- 114
- 115
- 116
- 117
- 118
- 119
- 120
- 121
- 122
- 123
- 124
- 125
- 126
- 127
- 128
- 129
- 130
- 131
- 132
- 133
- 134
- 135
- 136
- 137
- 138
- 139
- 140
- 141
- 142
- 143
- 144
- 145
- 146
- 147
- 148
- 149
- 150
- 151
- 152
- 153
- 154
- 155
- 156
- 157
- 158
- 159
- 160
- 161
- 162
- 163
- 164
- 165
- 166
- 167
- 168
- 169
- 170
- 171
- 172
- 173
- 174
- 175
- 176
- 177
- 178
- 179
- 180
- 181
- 182
- 183
- 184
- 185
- 186
- 187
- 188
- 189
- 190
- 191
- 192
- 193
- 194
- 195
- 196
- 197
- 198
- 199
- 200
- 201
- 202
- 203
- 204
- 205
- 206
- 207
- 208
- 209
- 210
- 211
- 212
- 213
- 214
- 215
- 216
- 217
- 218
- 219
- 220
- 221
- 222
- 223
- 224
- 225
- 226
- 227
- 228
- 229
- 230
- 231
- 232
- 233
- 234
- 235
- 236
- 237
- 238
- 239
- 240
- 241
- 242
- 243
- 244
- 245
- 246
- 247
- 248
- 249
- 250
- 251
- 252
- 253
- 254
- 255
- 256
- 257
- 258
- 259
- 260
- 261
- 262
- 263
- 264
- 265
- 266
- 267
- 268
- 269