Published by rsagwal, 2022-04-27 08:37:29


MCA-20-24(i): Principles of Programming Languages
Type: Elective
Course Credits: 04
Contact Hours: 4 hours/week
Examination Duration: 3 Hours
Mode: Lecture
External Maximum Marks: 75
External Pass Marks: 30 (i.e. 40%)
Internal Maximum Marks: 25
Total Maximum Marks: 100
Total Pass Marks: 40 (i.e. 40%)

Instructions to paper setter for End semester exam: The total number of questions shall be nine. Question number one will be compulsory and will consist of short/objective type questions from the complete syllabus. In addition to the compulsory first question there shall be four units in the question paper, each consisting of two questions. The student will attempt one question from each unit in addition to the compulsory question. All questions will carry equal marks.

Course Objectives: The objective of this paper is to make the students familiar with the different elements of programming languages, such as data types, operators, statements and control constructs, and their implementation, with the understanding that it will help them become better programmers.

Course Outcomes (COs): At the end of this course, the student will be able to:
MCA-20-24(i).1 understand the programming language hierarchy and the basics of compilation;
MCA-20-24(i).2 understand the different types of grammar;
MCA-20-24(i).3 understand the features of object-oriented languages and different methods of sequence control;
MCA-20-24(i).4 understand the implementation of different types of functions.
CO-PO Mapping Matrix for the Course Code: MCA-20-24(i)

COs             PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11
MCA-20-24(i).1   3   3   1   3   1   3   2   3   3   1    3
MCA-20-24(i).2   3   3   1   3   1   3   2   3   3   1    3
MCA-20-24(i).3   3   3   1   3   1   3   3   3   3   1    3
MCA-20-24(i).4   3   3   1   3   1   3   3   3   3   1    3
Average          3   3   1   3   1   3  2.5  3   3   1    3

CO-PSO Mapping Matrix for the Course Code: MCA-20-24(i)

COs             PSO1 PSO2 PSO3 PSO4 PSO5
MCA-20-24(i).1    3    3    3    2    3
MCA-20-24(i).2    3    3    3    2    3
MCA-20-24(i).3    3    3    3    2    3
MCA-20-24(i).4    3    3    3    2    3
Average           3    3    3    2    3

Unit – I
Preliminaries: History, Impact of Programming Paradigms, Role of Programming Languages, Good Language, Effects of Programming Environment, Translators and virtual architectures, Binding and Binding time, Language Syntax, Analysis of Program, Synthesis of Object program, Formal translation models: BNF Grammars, General parsing, Language translation, Recursive descent parsing.

PPL- 1 Dr. Rakesh Kumar June 20, 2021

Unit – II
Formal languages and automata: The Chomsky hierarchy of formal languages, Regular grammars, Regular expressions, Finite State Automata, Context-free grammars, Pushdown automata, Ambiguous grammars. Language Semantics: Attribute grammars, Denotational semantics, Program verification and validation, Data objects, variables, constants, data types, declaration, type checking, type casting, type promotion, Enumerators, Composite data types.

Unit – III
Object Oriented concepts: Structured data types, Abstract data types, Information hiding, Subprogram concepts, Good program design, Type definitions, Type equivalence, Inheritance, Derived classes, Abstract classes, Polymorphism, Inheritance and software reuse. Sequence control: Implicit and explicit sequence control, Sequence control within arithmetic expressions, Sequence control between statements, Sequencing with non-arithmetic expressions, Subprogram sequence control.

Unit – IV
Miscellaneous topics: Parameter passing techniques, Static & Dynamic Scoping, Storage of variables, Static storage, Heap storage management, Distributed processing, Exceptions and exception handlers, Co-routines, Scheduled subprograms, Parallel programming, Processor design, Hardware and software architectures, Network programming, Evolution of scripting languages, Applets, XML.

Text Books:
1. Pratt T.W., Zelkowitz M.V., Gopal T.V., Programming Languages: Design and Implementation, Pearson Education.
2. Sebesta W. Robert, Concepts of Programming Languages, Pearson Education.

Reference Books:
1. Appleby Doris & VandeKopple J. Julius, Programming Languages: Paradigm and Practice, Tata McGraw Hill.
2. Sethi Ravi, Programming Languages: Concepts & Constructs, Pearson Education.
3. Scott M., Programming Language Pragmatics, Elsevier India.

Characteristics of a Good Programming Language

Some high-level programming languages are very popular, while others could not become popular in spite of being very powerful. There may be many reasons for the success of a language, but one obvious reason is its characteristics. Several characteristics are believed to be important for making a language good:

1-Clarity, simplicity, unity: A good programming language must be simple and easy to learn and use. It should provide the programmer with a clear, simple and unified set of concepts that can be grasped easily. The overall simplicity of a language strongly affects the readability of the programs written in it, and programs that are easier to read and understand are easier to maintain. It is also easier to develop and implement a compiler or an interpreter for a simple language. However, the power needed in the language should not be sacrificed for simplicity. For example, BASIC is liked by many programmers because of its simplicity. The language must provide a framework for thinking about algorithms, and it should have a minimum number of concepts (conceptual integrity). Language syntax determines the ease of writing, reading, testing and modifying code. Cryptic syntax may be easy to write but is usually hard to read, and constructs that mean different things should look different.

2-Naturalness: A good language should be natural for the application area for which it is designed. That is, it should provide appropriate operators, data structures, control structures and a natural syntax to let programmers code their problems easily and efficiently. FORTRAN and COBOL are good examples of languages possessing a high degree of naturalness in scientific and business application areas, respectively.

3-Abstraction: Abstraction means the ability to define and then use complicated structures or operations in ways that allow many of the details to be ignored.
The degree of abstraction allowed by a language directly affects its ease of programming. For example, object-oriented languages support a high degree of abstraction; hence, writing programs in object-oriented languages is much easier. Object-oriented languages also support reusability of program segments thanks to this feature.

4-Efficiency: Programs written in a good language are translated into machine code efficiently, are executed efficiently and require relatively little space in memory. That is, a good programming language is supported by a good language translator (a compiler or an interpreter) that gives due consideration to space and time efficiency.

5-Structured programming support: A good language should have the necessary features to allow programmers to write their programs based on the concepts of structured programming. This property greatly affects the ease with which a program may be written, tested and maintained. Moreover, it forces a programmer to look at a problem in a logical way, so that fewer errors are made while writing a program for the problem.

6-Compactness: In a good language, programmers should be able to express the intended operations concisely without losing readability. Programmers generally do not like a verbose language because they have to write too much. Many programmers dislike COBOL because it is verbose in nature (it lacks compactness).

7-Locality: A good language should be such that, while writing a program, a programmer need not jump around visually as the text of the program is prepared. This allows the programmer to concentrate almost solely on the part of the program around the statement currently being worked on. COBOL, and to some extent C and Pascal, lack locality because data definitions are separated from processing statements, perhaps by many pages of code, or have to appear before any processing statement in the function/procedure.

8-Extensibility: A good language should also allow extensions through a simple, natural and elegant mechanism. Almost all languages provide subprogram definition mechanisms for this purpose, but some languages are weak in this aspect.

9-Suitability to its environment: Depending upon the type of application for which a programming language has been designed, the language must also be made suitable to its environment. For example, a language designed for real-time applications must be interactive in nature. On the other hand, languages used for data-processing jobs like payroll, stores accounting etc. may be designed to operate in batch mode.

10-Cost of use: The cost can be further classified as:
(a) Cost of program execution: Execution-time efficiency was the main focus of the early programming years, but it is not a high concern anymore due to the exponential growth in hardware technology.
(b) Cost of program translation: If the language is designed to be used by students to learn programming, where programs are compiled frequently but executed only a few times, translation cost is a main concern. A fast compiler is important for programming education.
(c) Cost of program creation, testing and use: Languages such as Smalltalk and Perl can help solve problems with minimum investment in programmer time and energy. Programming time is thus minimized, but execution time may be larger in comparison to other languages.
(d) Cost of program maintenance: Studies show that the largest cost is incurred over the life of the program, not in its initial development. Maintenance includes the repair of errors and enhancements. A language that makes it easy to adjust programs may be cheaper in the long run.

11-Ease of program verification: Reliability of programs is a huge concern. There are several methods of testing program correctness, such as formal verification, desk checking and testing via input data; a combination of these methods is used for large projects. The difficulty of program verification may outweigh other language features. Simple syntax and semantics make program verification simpler.

12-Portability of programs: A program written in a language should be portable, i.e. one should be able to run it on any machine. So a language should be widely available and its definition should be independent of a particular machine. Ada, FORTRAN and C have standardized definitions that support portable application development.

13-Orthogonality: Orthogonality in a programming language means that a relatively small set of primitive constructs can be combined in a relatively small number of ways to build the control and data structures of the language.
It is associated with simplicity; the more orthogonal the design, the fewer the exceptions. This makes it easier to learn, read and write programs in a programming language. The meaning of an orthogonal feature is independent of context; the key parameters are symmetry and consistency (for example, a pointer is an orthogonal concept).

An example from the IBM mainframe and the VAX highlights this concept. An IBM mainframe has two different instructions for adding the contents of a register to a memory cell (or another register):

A  Reg1, memory_cell
AR Reg1, Reg2

In the first case, the contents of Reg1 are added to the contents of a memory cell and the result is stored in Reg1. In the second case, the contents of Reg1 are added to the contents of another register (Reg2) and the result is stored in Reg1. In contrast to the above pair of instructions, the VAX has only one instruction for addition:

ADDL operand1, operand2

Here the two operands (operand1 and operand2) can be registers, memory cells, or a combination of both; the instruction adds the contents of operand1 to the contents of operand2, storing the result in operand1. The VAX's instruction for addition is more orthogonal than the instructions provided by IBM; hence, it is easier for the programmer to remember (and use).

The design of the C language may also be examined from the perspective of orthogonality. C is somewhat inconsistent in its treatment of concepts and language structure, making it difficult for the user to learn (and use) the language. Examples of exceptions follow:
 Structures (but not arrays) may be returned from a function. An array can be returned if it is inside a structure.
 A member of a structure can be of any data type (except void, or the structure of the same type).
 An array element can be of any data type (except void).
 Everything is passed by value (except arrays).

Though this concept was first applied to programming languages, orthogonality has since become recognized as a valuable feature in the design of APIs and even user interfaces. There, too, having a small set of composable primitive operations without surprising cross-linkages is valuable, as it leads to systems that are easier to explain and less frustrating to use.
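As a contrasting illustration (a hypothetical Python sketch of my own, not from the text), Python is comparatively orthogonal: the same small set of operations composes uniformly across types, with few special cases of the kind listed above for C.

```python
# Orthogonality sketch: one rule, few special cases.
# Indexing and len() work the same way for strings, lists and tuples,
# and any value (including a list, unlike a C array) may be returned
# from a function.

def first_and_last(seq):
    # The same code serves every sequence type.
    return seq[0], seq[-1]

print(first_and_last("gray"))         # ('g', 'y')
print(first_and_last([1, 2, 3]))      # (1, 3)
print(first_and_last((True, False)))  # (True, False)
```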
===============================================

Objectives of Studying Principles of Programming Languages

1. To improve your ability to develop effective algorithms: Many languages provide features that can be extremely useful when used properly but waste a large amount of time when used improperly. Algorithms behave similarly, so the habit of making correct decisions with a programming language can carry over to making correct decisions with an algorithm.

2. To improve your use of your existing programming language: By understanding exactly how the features of a programming language are implemented, you learn how to use them effectively. When you know how language components such as lists, arrays and strings work, you can use them more efficiently.

3. To increase your vocabulary of useful programming constructs: When thinking about a set of data and program structures to solve a problem, it is typical to think only of the structures that are immediately available in the programming language you are working with. By knowing the constructs of other programming languages, you can find one that fits your scenario better and possibly implement it yourself.

4. To allow a better choice of programming language: Some languages are better suited than others for particular projects. You can reduce effort by picking the one that works best.

5. To make it easier to learn a new language: A thorough knowledge of programming language constructs and implementation techniques allows programmers to learn new languages more easily.

6. To make it easier to design a new language: Though you may not be designing the next C or Java, it is actually fairly common to create a small-scale form of programming language within a project.

===============================================

A short history of programming languages
 1950s: FORTRAN, LISP
 1970s: Ada, C, Pascal, Prolog, Smalltalk
 1980s: C++, ML, Perl, Postscript
 1990s: Java

Development of Early Languages

Numerically based languages: In the 1940s, the main role of computers during WWII was solving differential equations to determine ballistic trajectories. In the 1950s, symbolic notations started to appear. For example, (i) Grace Hopper developed the A-0 language and (ii) John Backus developed Speedcoding. Both A-0 and Speedcoding were designed to compile simple arithmetic expressions into executable machine language.

FORTRAN: In 1957, Backus managed a team to develop FORTRAN, meaning FORmula TRANslation. While FORTRAN was oriented around numerical calculations, its goal was a full-fledged programming language including control structures, conditionals and input/output statements. Many doubted FORTRAN would compete with hand-coded assembly, so much effort was put into making FORTRAN execution extremely efficient. Some of this efficiency came from tailoring statements specifically to the IBM 704. FORTRAN was extremely successful and dominated into the 1970s. In 1958, FORTRAN was revised as FORTRAN II, and FORTRAN IV came a few years later. Soon, many manufacturers had implemented their own version of the language, as there was no standard. In 1966, FORTRAN IV became the standard and went by the name FORTRAN 66. FORTRAN 66 has since been updated to FORTRAN 77 and then FORTRAN 90.

ALGOL: The success of FORTRAN caused fear of IBM's industry dominance. GAMM (the German Society of Applied Mathematics) organized a committee to design a universal language. The ACM (Association for Computing Machinery) in the US organized a similar committee, and the two committees eventually merged. Under the leadership of Peter Naur, the International Algorithmic Language (IAL) was developed. The name was eventually changed to ALGOL, and the language became known as ALGOL 58. The ALGOL 60 revision became the standard academic computing language from the 1960s to the early 1970s.
FORTRAN was designed for efficient execution on the IBM 704 machine, but ALGOL had different goals:
 ALGOL notation should be close to standard mathematical notation
 ALGOL should be useful for the description of algorithms
 Programs in ALGOL should be compilable into machine language

 ALGOL should not be bound to a single computer architecture

These goals had large implications. In the context of the time period, it meant that input/output was not included in the language, nor were special procedures, so implementations of ALGOL were incompatible. Subprogram parameters were viewed as macro substitutions, a concept named call by name. Backus was editor of the ALGOL report defining the language, and he used a notation similar to that developed by Chomsky for describing context-free languages. This introduced formal grammar theory to programming languages. Peter Naur's and Backus' roles in the development of ALGOL caused the notation used to represent the grammar of a language to be called Backus-Naur Form (BNF).

Business languages: After computers had established their domain in numerical calculations, business data processing came soon after. Grace Hopper developed FLOW-MATIC in 1955, with the goal of developing business applications with a form of English-like text. In 1959, the US DoD sponsored a meeting to develop the Common Business Language (CBL), whose goal was to use English as much as possible for its notation. The specifications for CBL designed in 1960 became the design for COBOL (COmmon Business Oriented Language). COBOL was standardized in 1968.

Artificial-intelligence languages: The Information Processing Language (IPL) was developed by the Rand Corporation. IPL-V was widely known but had limited use. John McCarthy of MIT designed LISP (LISt Processing) for the IBM 704. LISP 1.5 became the standard LISP implementation for many years.

Systems languages: In this early era, efficiency was required, so assembly language was mostly used in the systems area. CPL and BCPL were designed but never gained traction. C changed the systems area, but that did not come until the 1970s.

===============================================

Language Paradigms

There are four basic computational models that describe most programming.
Imperative languages: Imperative programming is a programming paradigm that uses statements to change a program's state; it focuses on describing how a program operates. The term is often used in contrast to declarative programming, which focuses on what the program should accomplish without specifying how the program should achieve the result. Imperative programming and procedural programming, a type of imperative programming, are often used as synonyms. In procedural programming, the program is built from one or more procedures, and state changes are localized to procedures or restricted to explicit arguments and returns from procedures. This paradigm is command-driven or statement-oriented: a program consists of a sequence of statements, and executing each statement causes the machine to enter a new state. The syntax of imperative languages is of the form

statement1;
statement2;
...

Program development thus consists of building the successive states needed to arrive at the solution. Some imperative languages are C, C++, FORTRAN, ALGOL, PL/I, Pascal, Ada, Smalltalk and COBOL.

Applicative languages: These are also known as functional languages. Their characteristic features are:
 Look at the function the program represents
 Look at the desired result rather than the available data
 The program is developed by creating functions from previous functions that manipulate the initial data set until the solution is achieved
 Once the functions are created, they are applied to the initial data
The syntax of functional languages is of the form

function_n(...function_2(function_1(data))...)

LISP and ML are examples of functional languages.

Rule-based languages: These are also known as logic programming languages. Their important features are:
 A check for an enabling condition executes an appropriate action
 The most common rule-based language is Prolog
 In effect, a set of filters is applied to the data store
 Similar to imperative languages, but the statements are not sequential
The syntax of rule-based languages is of the form

enabling condition_1 => action_1
enabling condition_2 => action_2
...
enabling condition_n => action_n

A common business application of rule-based languages is the decision table: programming often consists of building a matrix/table of conditions and the appropriate actions.

Object-oriented programming: The 1980s saw a rapid growth of interest in object-oriented programming. These languages were imperative in style but added features to support objects; the last two decades of the 20th century saw the development of many such languages. The characteristic features are:
 Complex data objects are built, and functions are designed to operate on the data
 Complex objects are extensions of simpler objects, inheriting their properties
 A combination of the applicative and imperative models

===============================================

Programming Environments and Effects on Language Design

A programming environment is the environment in which programs are created and tested. It usually consists of support tools and a command language for invoking them. Typical tools include editors, debuggers, verifiers, pretty printers and test data generators.

Effects on language design: Programming environments have had two large effects on language design: (a) features aiding separate compilation/assembly of a program from components, and (b) features aiding program testing and debugging.

Separate compilation: For large programs, different programmers will work on separate parts. This requires a language in which the parts can be compiled separately and merged later. Separate compilation can be difficult because subprograms might need information about each other. There are several ways to provide such information during separate compilation:
(a) Information may need to be redeclared (FORTRAN)
(b) An order of compilation may be required (Ada)
(c) A library containing the relevant specifications may be required (Java/C++)

Option (a) above uses independent compilation: the subprogram is entirely self-contained. The disadvantage is the inability to check for inconsistency of data between the external declaration and the internal redeclaration; you may get assembly errors even though each subprogram compiled with zero errors. Options (b) and (c) require the use of libraries; the body of a subprogram is usually omitted during the compilation of the subprograms that use it.

Separate compilation has the side effect of enabling name collision issues: several subprograms or portions of programs may have the same name, and this may not be discovered until all subprograms are merged. There are three main ways languages avoid these name collisions: (i) use of naming conventions (the obligation is the programmer's), (ii) use of scoping rules (used by Pascal, C, Ada), and (iii) adding name definitions from an external library (inheritance in object-oriented languages).

Testing and debugging: A few typical examples of testing and debugging features available in languages are:

Execution trace
 Prolog, LISP and many other languages have execution tracing tools.
 Statements and variables can be tagged for tracing.
 When a tagged statement is executed, the program stops and a debug trace is printed.

Breakpoints
 Specified by the programmer.
 When a breakpoint is reached, execution is interrupted and control is given to the user.
 The user can inspect and modify variables and restart the program.

Assertions
 A special conditional expression.
 If an assertion fails, the program is interrupted.
 An exception handler may print a message or take other actions upon a failed assertion.
 After debugging, assertions may be disabled.

===============================================

Formal grammars

A formal grammar consists of a finite set of production rules (left-hand side → right-hand side), where each side consists of a finite sequence of the following symbols:

 a finite set of nonterminal symbols (indicating that some production rule can yet be applied)
 a finite set of terminal symbols (indicating that no production rule can be applied)
 a start symbol (a distinguished nonterminal symbol)

A formal grammar provides an axiom schema for a formal language, which is a set of finite-length sequences of symbols that may be constructed by applying production rules to another sequence of symbols (which initially contains just the start symbol). A rule may be applied by replacing an occurrence of the symbols on its left-hand side with those that appear on its right-hand side. A sequence of rule applications is called a derivation. Such a grammar defines the formal language consisting of all words of solely terminal symbols which can be reached by a derivation from the start symbol.

Nonterminals are often represented by uppercase letters, terminals by lowercase letters, and the start symbol by S. For example, the grammar with terminals {a, b}, nonterminals {S, A, B}, start symbol S and production rules

S → AB
S → ε (where ε is the empty string)
A → aS
B → b

defines the language of all words of the form a^n b^n (i.e. n copies of a followed by n copies of b). The following is a simpler grammar that defines the same language:

Terminals {a, b}, Nonterminals {S}, Start symbol S, Production rules
S → aSb
S → ε

CHOMSKY CLASSIFICATION

Chomsky classified grammars as under:
(1) Recursively enumerable grammars – recognizable by a Turing machine
(2) Context-sensitive grammars – recognizable by a linear bounded automaton
(3) Context-free grammars – recognizable by a pushdown automaton
(4) Regular grammars – recognizable by a finite state automaton
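A derivation under the simpler grammar above can be sketched in Python (a hypothetical illustration; the grammar encoding and the function name are my own):

```python
import random

# The grammar S -> aSb | ε, encoded as a mapping from a nonterminal
# to its alternative right-hand sides ([] encodes the ε-production).
GRAMMAR = {"S": [["a", "S", "b"], []]}

def derive(symbol="S", depth=0, max_depth=5):
    """Expand nonterminals recursively; terminals pass through unchanged."""
    if symbol not in GRAMMAR:              # terminal symbol
        return symbol
    # Pick a production at random, forcing ε once the derivation is deep
    # enough so that it always terminates.
    options = GRAMMAR[symbol] if depth < max_depth else [[]]
    production = random.choice(options)
    return "".join(derive(s, depth + 1, max_depth) for s in production)

# Every derived word has the form a^n b^n.
word = derive()
n = word.count("a")
assert word == "a" * n + "b" * n
```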

0 – Recursively enumerable grammars
Type-0 grammars (unrestricted grammars) include all formal grammars. They generate exactly the languages that can be recognized by a Turing machine, also known as the recursively enumerable languages. Note that this class is different from that of the recursive languages, which can be decided by an always-halting Turing machine. Type-0 grammars are too general to describe the syntax of programming languages or natural languages.

1 – Context-sensitive grammars
Type-1 grammars generate the context-sensitive languages. These grammars have rules of the form αAβ → αγβ, with A a nonterminal and α, β and γ strings of terminals and nonterminals. The strings α and β may be empty, but γ must be nonempty. The languages described by these grammars are exactly the languages that can be recognized by a linear bounded automaton. Examples:
AB → CDB
AB → CdEB
ABcd → abCDBcd
B → b

2 – Context-free grammars
Type-2 grammars generate the context-free languages. These are defined by rules of the form A → γ, with A a nonterminal and γ a string of terminals and nonterminals. These languages are exactly the languages that can be recognized by a non-deterministic pushdown automaton. Context-free languages are the theoretical basis for the syntax of most programming languages. Example:
A → aBc

3 – Regular grammars
Type-3 grammars generate the regular languages. Such a grammar restricts its rules to a single nonterminal on the left-hand side and a right-hand side consisting of a single terminal, possibly followed (or preceded, but not both in the same grammar) by a single nonterminal. The rule S → ε is also allowed here if S does not appear on the right-hand side of any rule. These languages are exactly the languages that can be decided by a finite state

automaton. Additionally, this family of formal languages can be obtained by regular expressions. Regular languages are commonly used to define search patterns and the lexical structure of programming languages. Examples:
A → ε
A → a
A → abc
A → B
A → abcB

Regular Expressions

A regular expression is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations. Regular expressions are a generalized way to match patterns within sequences of characters. They are used in almost every programming language, including C++, Java and Python.

Writing a regular expression

Boolean "or": A vertical bar separates alternatives. For example, gray|grey can match "gray" or "grey".

Grouping: Parentheses are used to define the scope and precedence of the operators (among other uses). For example, gray|grey and gr(a|e)y are equivalent patterns which both describe the set of "gray" or "grey".

Quantification: A quantifier after a token (such as a character) or group specifies how often the preceding

element is allowed to occur. The most common quantifiers are the question mark ?, the asterisk * (derived from the Kleene star), and the plus sign + (the Kleene plus).

The asterisk (*): It tells the computer to match the preceding character (or set of characters) zero or more times (up to infinity). Example: the regular expression ab*c matches ac, abc, abbc, abbbc and so on.

The plus sign (+): It tells the computer to match the preceding character (or set of characters) one or more times (up to infinity). Example: the regular expression ab+c matches abc, abbc, abbbc and so on.

The question mark (?): It indicates zero or one occurrence of the preceding element. For example, colou?r matches both "color" and "colour".

Curly braces { }:
{n} The preceding item is matched exactly n times.
{min,} The preceding item is matched min or more times.
{min,max} The preceding item is matched at least min times, but not more than max times.

Wildcard (.): The wildcard . matches any character. For example, a.b matches any string that contains an "a", then any other character and then a "b"; a.*b matches any string that contains an "a" and a "b" at some later point.

Regular expressions describe regular languages in formal language theory; they have the same expressive power as regular grammars.

Formal definition

Regular expressions consist of constants, which denote sets of strings, and operator symbols, which denote operations over these sets. Given a finite alphabet Σ, the following constants are defined as regular expressions:
 (empty set) ∅, denoting the set ∅.
 (empty string) ε, denoting the set containing only the "empty" string, which has no characters at all.
 (literal character) a in Σ, denoting the set containing only the character a.
Given regular expressions R and S, the following operations over them are defined to

produce regular expressions:

(concatenation) RS denotes the set of strings that can be obtained by concatenating a string in R and a string in S. For example, let R = {"ab", "c"} and S = {"d", "ef"}. Then RS = {"abd", "abef", "cd", "cef"}.

(alternation) R | S denotes the set union of the sets described by R and S. For example, if R describes {"ab", "c"} and S describes {"ab", "d", "ef"}, the expression R | S describes {"ab", "c", "d", "ef"}.

(Kleene star) R* denotes the smallest superset of the set described by R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating any finite number (including zero) of strings from the set described by R. For example, {"0", "1"}* is the set of all finite binary strings (including the empty string), and {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", ...}.

To avoid parentheses, it is assumed that the Kleene star has the highest priority, then concatenation and then alternation. If there is no ambiguity, parentheses may be omitted. For example, (ab)c can be written as abc, and a|(b(c*)) can be written as a|bc*. Many textbooks use the symbols ∪, +, or ∨ for alternation instead of the vertical bar.

Examples:
 a|b* denotes {ε, "a", "b", "bb", "bbb", ...}
 (a|b)* denotes the set of all strings with no symbols other than "a" and "b", including the empty string: {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", ...}
 ab*(c|ε) denotes the set of strings starting with "a", then zero or more "b"s and finally optionally a "c": {"a", "ac", "ab", "abc", "abb", "abbc", ...}
 (0|(1(01*0)*1))* denotes the set of binary numbers that are multiples of 3: {ε, "0", "00", "11", "000", "011", "110", "0000", "0011", "0110", "1001", "1100", "1111", "00000", ...
}
Backus-Naur Form (BNF) notation
Backus–Naur form is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. When describing languages, Backus-Naur form (BNF) is a formal notation for encoding grammars intended for human consumption. Many programming languages, protocols or formats have a BNF description in their specification.
Every rule in Backus-Naur form has the following structure:
name ::= expansion
The symbol ::= means "may expand into" or "may be replaced with." In some texts, a name is also called a non-terminal symbol. Every name in Backus-Naur form is surrounded by angle brackets, < >, whether it appears on the left- or right-hand side of the rule. An expansion is an expression containing terminal symbols and non-terminal symbols, joined together by sequencing and choice. A terminal symbol is a literal (like "+" or "function") or a class of literals (like integer). Simply juxtaposing expressions indicates sequencing. A vertical bar | indicates choice.
For example, in BNF, the classic expression grammar is:
<expr> ::= <term> "+" <expr> | <term>
<term> ::= <factor> "*" <term> | <factor>
<factor> ::= "(" <expr> ")" | <const>
<const> ::= integer
Definition: A Finite Automaton
A finite automaton (FA) is a 5-tuple (Q,Σ,q0,A,δ) where
 Q is a finite set of states;
 Σ is a finite input alphabet;
 q0∈Q is the initial state;
 A⊆Q is the set of accepting states; and
 δ:Q×Σ→Q is the transition function.
For any element q of Q and any symbol σ∈Σ, we interpret δ(q,σ) as the state to which the FA moves if it is in state q and receives the input σ.
Graphical Representation of a DFA: A DFA is represented by a digraph called a state diagram, where (i) the vertices represent the states, (ii) the arcs labeled with an input alphabet show the transitions, (iii) the initial state is denoted by an empty single incoming arc, and (iv) the final state is indicated by double circles.
Example
Let a deterministic finite automaton be: Q = {a, b, c}, ∑ = {0, 1}, q0 = a, A = {c}, and
Transition function δ as shown by the following table −

Present State    Next State for Input 0    Next State for Input 1
a                a                         b
b                c                         a
c                b                         c

Its graphical representation is a state diagram in which the vertices a, b, and c are the states, the initial state a has a single empty incoming arc, and the accepting state c is drawn with a double circle.
Ambiguous Grammar
In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that has more than one leftmost derivation or rightmost derivation (equivalently, more than one parse tree). The context-free grammar
A → A + A | A − A | a
is ambiguous, since there are two leftmost derivations for the string a + a − a (one deriving A + A first, the other A − A first).
Following are some examples of ambiguous grammars:
S → aS | Sa | ε
E → E + E | E * E | id
A → AA | (A) | a
S → SS | AB,  A → Aa | a,  B → Bb | b
Ambiguity of the grammar implies that at least some strings in its language have different structures (parse trees). Thus, such a grammar is unlikely to be useful for a programming language, because two structures for the same string (program) imply two different meanings (two different executable equivalents) for this program.
Pushdown Automata
A pushdown automaton (PDA) is a finite automaton with extra memory called a stack, which enables it to recognize context-free languages. A pushdown automaton implements a context-free grammar in a similar way to how a DFA implements a regular grammar. A DFA can remember only a finite amount of information, but a PDA can remember an unbounded amount of information. Basically, a pushdown automaton is
"Finite state machine" + "a stack"
A pushdown automaton has three components: (a) an input tape, (b) a control unit, and (c) a stack with infinite size. The stack head scans the top symbol of the stack. A stack does two operations:
(a) Push − a new symbol is added at the top.
(b) Pop − the top symbol is read and removed.
A PDA may or may not read an input symbol, but it has to read the top of the stack in every transition.
A PDA can be formally described as a 7-tuple (Q, ∑, S, δ, q0, I, F):
Q is the finite set of states
∑ is the input alphabet
S is a finite set which is called the stack alphabet
δ is the transition function, which maps Q × (∑ ∪ {ε}) × S into Q × S*. In a given state, the PDA reads an input symbol and the stack symbol (top of the stack), moves to a new state, and changes the symbol on the stack.
q0 is the initial state or start state (q0 ∈ Q)
I is the initial stack top symbol (I ∈ S)
F is the set of accepting states (F ⊆ Q)
Instantaneous Descriptions of a PDA
A PDA goes from configuration to configuration when consuming input. The configuration of a PDA is represented by a triple (q, w, S) where
1. q is the state,
2. w is the remaining input, and
3. S is the stack contents.
A configuration triple is called an instantaneous description or ID of the pushdown automaton.
Turnstile Notation
The "turnstile" notation is used for connecting pairs of IDs that represent one or many moves
of a PDA. The process of transition is denoted by the turnstile symbol "⊢". Consider a PDA (Q, ∑, S, δ, q0, I, F). A transition can be mathematically represented by the following turnstile notation −
(p, aw, Tβ) ⊢ (q, w, αβ)
This implies that while taking a transition from state p to state q, the input symbol 'a' is consumed, and the top of the stack 'T' is replaced by a new string 'α'.
Note − If we want to denote zero or more moves of a PDA, we use the symbol ⊢* for it.
Example: Define the pushdown automaton for the language {a^n b^n | n > 0}
Solution: M = (Q, ∑, Γ, δ, q0, Z) where Q = { q0, q1 }, ∑ = { a, b }, Γ = { A, Z } and δ is given by:
δ( q0, a, Z ) = { ( q0, AZ ) }
δ( q0, a, A ) = { ( q0, AA ) }
δ( q0, b, A ) = { ( q1, ε ) }
δ( q1, b, A ) = { ( q1, ε ) }
δ( q1, ε, Z ) = { ( q1, ε ) }
Let us see how this automaton works for aaabbb.

Row   State   Remaining input   δ applied                 Stack   State after move
1     q0      aaabbb            −                         Z       q0
2     q0      aabbb             δ(q0,a,Z)={(q0,AZ)}       AZ      q0
3     q0      abbb              δ(q0,a,A)={(q0,AA)}       AAZ     q0
4     q0      bbb               δ(q0,a,A)={(q0,AA)}       AAAZ    q0
5     q0      bb                δ(q0,b,A)={(q1,ε)}        AAZ     q1
6     q1      b                 δ(q1,b,A)={(q1,ε)}        AZ      q1
7     q1      ε                 δ(q1,b,A)={(q1,ε)}        Z       q1
8     q1      ε                 δ(q1,ε,Z)={(q1,ε)}        ε       q1

Explanation: Initially, the state of the automaton is q0, the symbol on the stack is Z, and the input is aaabbb, as shown in row 1. On reading 'a' (row 2), the state remains q0 and the automaton pushes symbol A on the stack. On the next 'a' (row 3), it pushes another symbol A on the stack. After reading 3 a's, the stack is AAAZ with A on the top. After reading 'b' (row 5), it pops A and moves to state q1; the stack becomes AAZ. When all b's are
read, the state will be q1 and the stack will be Z. In row 8, on input symbol 'ε' and Z on the stack, it pops Z and the stack becomes empty. This type of acceptance is known as acceptance by empty stack.
Note:
 The above pushdown automaton is deterministic in nature because there is only one move from a state on an input symbol and stack symbol.
 A non-deterministic pushdown automaton can have more than one move from a state on an input symbol and stack symbol.
 It is not always possible to convert a non-deterministic pushdown automaton to a deterministic pushdown automaton.
 The expressive power of non-deterministic PDAs is greater than that of deterministic PDAs: some languages are accepted by an NPDA but by no deterministic PDA.
 A pushdown automaton can be implemented using either acceptance by empty stack or acceptance by final state, and each can be converted to the other.
Attribute Grammar
Attribute grammar is a special form of context-free grammar where some additional information (attributes) is attached to one or more of its non-terminals in order to provide context-sensitive information. Each attribute has a well-defined domain of values, such as integer, float, character, string, and expressions.
Attribute grammar is a medium to provide semantics to the context-free grammar and it can help specify the syntax and semantics of a programming language. Attribute grammar (when viewed as a parse tree) can pass values or information among the nodes of a tree.
Example:
E → E + T { E.value = E.value + T.value }
The right part of the CFG contains the semantic rules that specify how the grammar should be interpreted. Here, the values of non-terminals E and T are added together and the result is copied to the non-terminal E. Semantic attributes are assigned values from their domain at the time of parsing and evaluated at the time of assignment or checking of conditions.
Based on the way the attributes get their values, they can be broadly divided into two categories: synthesized attributes and
inherited attributes.
Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate, assume the following production:
S ::= ABC
If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of A, B, and C are synthesized into S. Synthesized attributes never take values from their parent nodes or any sibling nodes. As in our previous example (E → E + T), the parent node E gets its value from its child nodes.
Inherited attributes
In contrast to synthesized attributes, inherited attributes can take values from parent and/or siblings. As in the following production,
S ::= ABC
A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take values from S, A, and B.
Expansion: when a non-terminal is expanded to terminals as per a grammatical rule.
Reduction: when a terminal is reduced to its corresponding non-terminal according to grammar rules. Syntax trees are parsed top-down and left to right. Whenever reduction occurs, we apply the corresponding semantic rules (actions).
Semantic analysis uses syntax-directed translations to perform the above tasks. The semantic analyzer receives an AST (Abstract Syntax Tree) from the previous stage (syntax analysis) and attaches attribute information to it, producing an attributed AST. An attribute is a two-tuple value, <attribute name, attribute value>. For example:
int value = 5;
<type, "integer"> <present value, "5">
For every production, we attach a semantic rule.
S-attributed SDT
If an SDT uses only synthesized attributes, it is called an S-attributed SDT. These attributes
are evaluated using S-attributed SDTs that have their semantic actions written after the production (at the right-hand side's end). Attributes in S-attributed SDTs are evaluated by bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes.
L-attributed SDT
This form of SDT uses both synthesized and inherited attributes, with the restriction that inherited attributes may not take values from right siblings. In L-attributed SDTs, a non-terminal can get values from its parent, child, and sibling nodes. As in the following production
S → ABC
S can take values from A, B, and C (synthesized). A can take values from S only. B can take values from S and A. C can get values from S, A, and B. No non-terminal can get values from the sibling to its right. Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right parsing manner. We may conclude that if a definition is S-attributed, then it is also L-attributed, as the L-attributed class encloses the S-attributed class.
So an attribute grammar may be informally defined as a context-free grammar that has been extended to provide context sensitivity using a set of attributes, assignment of attribute
values, evaluation rules, and conditions. A finite, possibly empty set of attributes is associated with each distinct symbol in the grammar. Each attribute has an associated domain of values, such as integers, character and string values, or more complex structures. Viewing the input sentence (or program) as a parse tree, attribute grammars can pass values from a node to its parent, using a synthesized attribute, or from the current node to a child, using an inherited attribute. In addition to passing attribute values up or down the parse tree, the attribute values may be assigned, modified, and checked at any node in the derivation tree. The following examples should clarify some of these points.
Examples of Attribute Grammars
We will attempt to write a grammar to recognize sentences of the form a^n b^n c^n. The sentences aaabbbccc and abc belong to this language but the sentences aaabbbbcc and aabbbcc do not. Consider this first attempt to describe the language using a context-free grammar:
<letter sequence> ::= <a sequence><b sequence><c sequence>
<a sequence> ::= a | <a sequence>a
<b sequence> ::= b | <b sequence>b
<c sequence> ::= c | <c sequence>c
This grammar can generate the string aaabbbccc. Unfortunately, it can also generate the string aaabbbbcc. It is impossible to write a context-free grammar that generates only those sentences of the form a^n b^n c^n. However, it is possible to write a context-sensitive grammar for sentences of this form. Attribute grammars provide another approach for defining context-sensitivity. If we augment our grammar with an attribute describing the length of a letter sequence, we can use these values to ensure that the sequences of a's, b's, and c's all have the same length. The first solution involves a synthesized attribute Size that is associated with the nonterminals <a sequence>, <b sequence> and <c sequence>. We add the condition that, at the root of the tree, the Size attribute for each of the letter sequences has the same value.
If a character sequence consists of a single character, Size is set to 1; if it consists of a character sequence followed by a single character, Size for the parent character sequence is the Size of the child character sequence plus one. We have added the necessary attribute assignments and conditions to the grammar shown below. Notice that we differentiate a parent sequence from a child sequence by adding subscripts to the nonterminal symbols.
<letter sequence> ::= <a sequence><b sequence><c sequence>
condition: Size(<a sequence>) = Size(<b sequence>) = Size(<c sequence>)
<a sequence> ::= a
    Size(<a sequence>) ← 1
  | <a sequence>2 a
    Size(<a sequence>) ← Size(<a sequence>2) + 1
<b sequence> ::= b
    Size(<b sequence>) ← 1
  | <b sequence>2 b
    Size(<b sequence>) ← Size(<b sequence>2) + 1
<c sequence> ::= c
    Size(<c sequence>) ← 1
  | <c sequence>2 c
    Size(<c sequence>) ← Size(<c sequence>2) + 1
(The subscript 2 marks the child occurrence of the nonterminal.)
This attribute grammar successfully parses the sequence aaabbbccc since the sequence obeys the BNF and satisfies all conditions in the attribute grammar. On the other hand, this attribute grammar cannot parse the sequence aaabbbbcc. Although this sequence satisfies the BNF part of the grammar, it does not satisfy the condition required of the attribute values.
Denotational Semantics
There are two main aspects to a programming language - its syntax and its semantics. The syntax defines the correct form for legal programs and the semantics determines what they compute (if anything).
Programming Language = Syntax + Semantics
The treatment of syntax in programming languages has been very successful. Backus Naur Form (BNF) was first used to describe the grammar of Algol-60. BNF is a meta language for syntax. Once a good notation exists it is natural to use it mathematically to describe classes of grammars (e.g., regular, context free, context sensitive), to investigate parsing algorithms for various classes and so on. Most programming languages from BNF's inception onwards have had "nice" syntax whereas most of those from before it have not - e.g., Fortran. Computer programs to manipulate BNF were written and could check various properties of a grammar (e.g., LL(1)) or translate it into a parser, giving us parser generators. The other popular meta language for grammars is the syntax diagram, which is equivalent to
BNF, and there are easy translations from one to the other. Models for semantics have not caught on to the same extent that BNF and its descendants have in syntax. This may be because semantics does seem to be just plain harder than syntax. The most successful system is denotational semantics, which describes all the features found in imperative programming languages and has a sound mathematical basis. Many denotational definitions can be executed as interpreters or translated into "compilers", but this has not yet led to generators of efficient compilers, which may be another reason that denotational semantics is less popular than BNF.
Motivation
The motivation for denotational semantics is to be able to provide a precise definition of the semantics of a language. This is important in a number of different contexts:
Programmers: Need a precise definition to be sure that their code does what they expect, and to reason about programs (e.g., to prove properties of their code).
Compiler writers: Need a precise definition to implement a correct compiler. Without a precise definition, a program may do different things depending on which compiler is used.
Tool generator writers: Given a way to specify a language's semantics, it might be possible to write a code-generator generator (or an interpreter generator), just as we currently have parser generators that work from a precise definition of a language's syntax.
There are other ways to specify a language's semantics, but they have shortcomings:
English description: For a non-trivial language, an English description is almost sure to be ambiguous, or to have some details omitted.
Operational semantics: This means giving a compiler or interpreter for the language. The semantics of the language are implicitly defined by the behavior of the compiled/interpreted code. The problem is that it provides little help with reasoning about programs, and does not provide an input language for a tool generator.
Axiomatic semantics: This involves giving an axiom or a rule of inference for each construct in the language. It does form a good foundation for proving properties about a program, but does not necessarily specify the precise meaning of a program.
Overview
The basic idea of denotational semantics is, given a language L, to define the meaning of L by supplying a valuation function for each construct. The valuation function for a construct is
defined in terms of the valuation functions for the sub-constructs; thus, this is a kind of syntax-directed translation from a program's abstract-syntax tree to its meaning. Given the valuation functions for a language L, we can determine the meaning of a program written in L: usually a function (from inputs to outputs), though it may be just a value for simple languages. Every denotational definition includes:
1. A definition of the language's abstract syntax (via a set of syntactic domains and a grammar; the elements in the syntactic domains are the nonterminals in the grammar).
2. A definition of the relevant semantic algebras. Each semantic algebra includes:
 A semantic domain (e.g., N, the natural numbers), and
 A set of operators, including constants, which can be thought of as nullary operators (e.g., zero, one, plus, times ...)
A semantic domain can be primitive (i.e., we already understand it, like the domain of natural numbers) or compound (built from existing domains using operators like ×, →, etc).
3. Valuation functions: One function is defined for each "syntactic domain"; i.e., one function for each nonterminal in the grammar. Each function is defined by cases, with one case for each production associated with the nonterminal.
A Simple Example
Here is a very simple example for the language of binary numerals. The meaning of a "program" is the number it represents.
1. Abstract Syntax:
A. Syntactic Domains (nonterminals of the grammar that defines the abstract syntax):
 B ∈ BinaryNumeral;
 D ∈ BinaryDigit
B. Grammar:
B → BD | D
D → 0 | 1
2. Semantic algebras:
A. Natural numbers
Domain Nat = N
Operators
zero, one, two ... : Nat
plus, times : (Nat × Nat) → Nat
Note that we use the words "zero", "one" etc. to distinguish these semantic objects from the corresponding syntactic objects 0, 1, etc.
3. Valuation Functions (one for each syntactic domain D; each function is defined by cases on D, with one case for each grammar rule with D on the left-hand side). We use bold font for the names of the valuation functions to distinguish them from the grammar symbols, and we use double brackets ([[ and ]]) around the arguments to remind us that the arguments themselves are really abstract-syntax-tree representations.
A. B : BinaryNumeral → Nat
B[[BD]] = (B[[B]] times two) plus D[[D]]
B[[D]] = D[[D]]
B. D : BinaryDigit → Nat
D[[0]] = zero
D[[1]] = one
As noted above, objects inside [[ ]] are really abstract-syntax trees. So, for example, B[[BD]] is really B applied to a tree

      B
     / \
    B   D
    t1  t2

which is:

    (B[[t1]] times two) plus D[[t2]]

where t1 and t2 are the subtrees rooted at B and D.
A \"program\" in this language is: 101 Its AST is: B /\\ BD /\\ | BD1 || D0 | 1 And its meaning is the B function applied to the root of the tree: B[[101]]. Since the root production is B→BD, where the right-hand-side B represents 10, and the right-hand-side D represents 1, we have: B[[101]] = (B[[10]] times two) plus D[[1]] Continuing this process: = ((B[[1]] times two) plus D[[0]]) times two) plus one = ((D[[1]] times two) plus zero) times two) plus one = ((one times two) plus zero) times two) plus one = five =============================================== Parameter Passing Techniques There are different ways in which parameter data can be passed into and out of methods and functions. Let us assume that a function B() is called from another function A(). In this case A is called the “caller function” and B is called the “called function or callee function”. Also, the arguments which A sends to B are called actual arguments and the parameters of B are called formal arguments. Terminology (a) Formal Parameter : A variable and its type as they appear in the prototype of the function or method. (b) Actual Parameter : The variable or expression corresponding to a formal parameter that PPL- 31 Dr. Rakesh Kumar June 20, 2021
appears in the function or method call in the calling environment.
The following techniques are used to pass arguments in traditional imperative languages:
(a) call by value
(b) call by result
(c) call by value-result
(d) call by reference
(e) call by name
Call by value
The call by value method of passing arguments to a function copies the actual value of an argument into the formal parameter of the function. In this case, changes made to the formal parameter inside the function have no effect on the argument. Call by value is particularly efficient for small pieces of data, since they are trivial to copy, and since access to the formal parameter can be done efficiently. Java uses call by value for primitive data types. By default, C programming uses call by value to pass arguments. In general, it means the code within a function cannot alter the arguments used to call the function. Consider the function swap() definition as follows.
/* function definition to swap the values */
void swap(int x, int y) {
   int temp;
   temp = x; /* save the value of x */
   x = y;    /* put y into x */
   y = temp; /* put temp into y */
   return;
}
Now, let us call the function swap() by passing actual values as in the following example:
#include <stdio.h>
/* function declaration */
void swap(int x, int y);
int main () {
   /* local variable definition */
   int a = 100;
   int b = 200;
   printf("Before swap, value of a : %d\n", a);
   printf("Before swap, value of b : %d\n", b);
   /* calling a function to swap the values */
   swap(a, b);
   printf("After swap, value of a : %d\n", a);
   printf("After swap, value of b : %d\n", b);
   return 0;
}
If we put the above code in a single C file and compile and execute it, it will produce the following result:
Before swap, value of a :100
Before swap, value of b :200
After swap, value of a :100
After swap, value of b :200
This shows that the values are unchanged, even though they had been changed inside the function.
Call by reference
Both the actual and formal parameters refer to the same location, so any changes made inside the function are actually reflected in the actual parameters of the caller. To pass a value by reference, an argument reference is passed to the function just like any other value. So accordingly you need to declare the function parameters as reference types, as in the following function swap(), which exchanges the values of the two integer variables referred to by its arguments. Call by reference is particularly efficient for large pieces of data (such as large arrays), since they don't need to be copied. Fortran uses call by reference. This was an insecurity in early FORTRANs -- passing a constant allowed the procedure to change the constant! An important related concept is aliasing. Two variables are aliased if they refer to the same storage location. A common way that aliasing can arise is through call by reference. A formal parameter can be aliased to a nonlocal variable, or two formal parameters can be aliased.
// function definition to swap the values.
void swap(int &x, int &y) {
   int temp;
   temp = x; /* save the value referred to by x */
   x = y;    /* put y into x */
   y = temp; /* put x into y */
   return;
}
For now, let us call the function swap() by passing values by reference as in the following example:
#include <iostream>
using namespace std;
// function declaration
void swap(int &x, int &y);
int main () {
   // local variable declaration:
   int a = 100;
   int b = 200;
   cout << "Before swap, value of a :" << a << endl;
   cout << "Before swap, value of b :" << b << endl;
   /* calling a function to swap the values using variable reference */
   swap(a, b);
   cout << "After swap, value of a :" << a << endl;
   cout << "After swap, value of b :" << b << endl;
   return 0;
}
When the above code is put together in a file, compiled and executed, it produces the following result −
Before swap, value of a :100
Before swap, value of b :200
After swap, value of a :200
After swap, value of b :100
Call by result: This method uses out-mode semantics. Just before control is transferred back to the caller, the value of the formal parameter is transmitted back to the actual parameter. In general, the pass-by-result technique is implemented by copy.
Call by value-result: This method uses in/out-mode semantics. It is a combination of call-by-value and call-by-result. Just before control is transferred back to the caller, the value of the formal parameter is transmitted back to the actual parameter.
Call by name: This technique is used in programming languages such as Algol. In this technique, the symbolic "name" of a variable is passed, which allows it both to be accessed and updated. Call by name re-evaluates the actual parameter on every use. For actual parameters that are simple variables, this is the same as call by reference. For actual parameters that are expressions, the expression is re-evaluated on each access. It should be a runtime error to assign into a formal parameter passed by name if the actual parameter is an expression. One possible implementation: use an anonymous function ("thunk") for call-by-name expressions. (This is traditionally used to implement call by name in languages such as Algol and Simula.) Call by name is not supported in recent languages, but is still an important concept in programming language theory. (In newer languages, if you want the effect of call by name, pass a procedure as a parameter.) Algol 60 has call by name and call by value.
Ada uses different designations: IN, OUT, IN OUT. For scalar data types (such as integers), IN is the same as call by value, OUT is the same as call by result, and IN OUT is the same as call by value-result. In addition, IN parameters are local constants -- you can't assign into them.
For compound data types (such as arrays), these can be implemented as above, or using call by reference. For objects (not primitive types), Java uses call-by-value with pointer semantics. In other words, Java passes a copy of the reference to the object to the called method (but doesn't copy the object itself). Smalltalk and Scheme work in exactly the same way. Unless you
assign into the formal parameter, this gives the same answers as call-by-reference. However, if you assign into the formal parameter, the link with the actual parameter is broken. (Actually, Smalltalk won't let you assign into the formal parameter, but Java and Scheme will.)
Example of call by value versus value-result versus reference
The following example is in an Algol-like language.
begin
  integer n;
  procedure p(k: integer);
  begin
    n := n + 1;
    k := k + 4;
    print(n);
  end;
  n := 0;
  p(n);
  print(n);
end;
Note that when using call by reference, n and k are aliased. Output:
call by value: 1 1
call by value-result: 1 4
call by reference: 5 5
===============================================
Type Equality
The meaning of basic operations such as assignment (denoted by = in C) is specified in a language definition. Thus, for example, in a statement such as
x = y;
the value of object y is copied into the memory locations for variable x. However, before an operation such as an assignment can be accepted by the translator, usually the types of the two operands must be the same. Thus a language translator must decide whether two types are equal in some cases. There are two standard ways to
determine whether two types are considered the same: name equivalence and structural equivalence.
Name equivalence: This is the most straightforward: two types are equal if, and only if, they have the same name. Thus, for example, in the code (using C syntax)
typedef struct {
   int data[100];
   int count;
} Stack;
typedef struct {
   int data[100];
   int count;
} Set;
Stack x, y;
Set r, s;
if name equivalence is used in the language then x and y would be of the same type and r and s would be of the same type, but the type of x or y would not be equivalent to the type of r or s. This means that statements such as
x = y;
r = s;
would be valid, but statements such as
x = r;
would not be valid (i.e., would not be accepted by a translator).
Structural equivalence: Using structural equivalence, two types are equal if, and only if, they have the same "structure", which can be interpreted in different ways. A strict interpretation would be that the names and types of each component of the two types must be the same and must be listed in the same order in the type definition. A less stringent requirement would be that the component types must be the same and in the same order in the two types, but the names of the components could be different. Again looking at the example above, using structural equivalence the two types Stack and Set would be considered equivalent, which means that a translator would accept statements such as
x = r;
(Note that C doesn't support structural equivalence and will give an error for the above assignment.)
Type Checking
A type, also known as a data type, is a classification identifying one of various types of data. A type describes the possible values of a structure (such as a variable), the semantic meaning of that structure, and how the values of that structure can be stored in memory. Types can be broken down into categories:
 Primitive types – these vary by language, but some common primitive types are integers, booleans, floats, and characters.
 Composite types – these are composed of more than one primitive type, e.g. an array or record. All composite types are considered data structures.
 Abstract types – types that do not have a specific implementation (and thus can be represented via multiple types), such as a hash, set, queue, and stack.
 Other types – such as pointers (a type which holds as its value a reference to a different memory location) and functions.
A type merely defines a set of rules and protocols behind how a piece of data is supposed to behave.
Type Checking
The existence of types is useless without a process of verifying that those types make logical sense in the program so that the program can be executed successfully. This is where type checking comes in. Type checking is the process of verifying and enforcing the constraints of types, and it can occur either at compile time (i.e. statically) or at runtime (i.e. dynamically). Type checking is all about ensuring that the program is type-safe, meaning that the possibility of type errors is kept to a minimum. A type error is an erroneous program behavior in which an operation occurs (or tries to occur) on a particular data type that it's not meant to occur on.
This could be a situation where an operation is performed on an integer with the intent that it is a float, or even something such as adding a string and an integer together:
x = 1 + "2"
While in many languages both strings and integers can make use of the + operator, this would often result in a type error because this expression is usually not meant to handle multiple data types.

When a program is considered not type-safe, there is no single standard course of action upon reaching a type error. Many programming languages throw type errors which halt the runtime or compilation of the program, while other languages have built-in safety features that handle the type error and continue running, at the cost of weaker type safety. Regardless of the aftermath, the process of type checking is a necessity. The two primary methods of type checking are static type checking and dynamic type checking.
Static Type Checking
A language is statically-typed if the type of a variable is known at compile time instead of at runtime. Common examples of statically-typed languages include Ada, C, C++, C#, JADE, Java, Fortran, Haskell, ML, Pascal, and Scala.
The big benefit of static type checking is that it allows many type errors to be caught early in the development cycle. Static typing usually results in compiled code that executes more quickly, because when the compiler knows the exact data types that are in use, it can produce optimized machine code (i.e. faster and/or using less memory). Static type checkers evaluate only the type information that can be determined at compile time, but are able to verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed. A static type-checker will quickly detect type errors even in rarely used code paths; without static type checking, even code coverage tests with 100% coverage may be unable to find such type errors. However, a detriment to this is that a static type-checker makes it nearly impossible to deliberately leave type-error-raising code in a program: even if that code block is hardly ever called, the type-checker will almost always flag it and prevent the program from being executed.
Dynamic Type Checking
Dynamic type checking is the process of verifying the type safety of a program at runtime. Common dynamically-typed languages include Groovy, JavaScript, Lisp, Lua, Objective-C, PHP, Prolog, Python, Ruby, Smalltalk and Tcl.
Most type-safe languages include some form of dynamic type checking, even if they also have a static type checker. The reason for this is that many useful features or properties are difficult or impossible to verify statically. For example, suppose that a program defines two

types, A and B, where B is a subtype of A. If the program tries to convert a value of type A to type B, which is known as downcasting, then the operation is legal only if the value being converted is actually a value of type B. Therefore, a dynamic check is needed to verify that the operation is safe. Other language features that dynamic typing enables include dynamic dispatch, late binding, and reflection.
In contrast to static type checking, dynamic type checking may cause a program to fail at runtime due to type errors. In some programming languages, it is possible to anticipate and recover from these failures, either through error handling or at the cost of type safety. In others, type checking errors are considered fatal. Because type errors are more difficult to determine in dynamic type checking, it is a common practice to supplement development in these languages with unit testing.
All in all, dynamic type checking typically results in less optimized code than does static type checking; it also includes the possibility of runtime type errors and forces runtime checks to occur for every execution of the program (instead of just at compile time). However, it opens up the doors for more powerful language features and makes certain other development practices significantly easier. For example, metaprogramming, especially when using eval functions, is not impossible in statically-typed languages, but it is much easier to work with in dynamically-typed languages.
Common Misconceptions
(a) A common misconception is to assume that all statically-typed languages are also strongly-typed languages, and that dynamically-typed languages are also weakly-typed languages. This isn't true. A strongly-typed language is one in which variables are bound to specific data types, and will result in type errors if types do not match up as expected in the expression, regardless of when type checking occurs.
A simple way to think of strongly-typed languages is to consider them to have high degrees of type safety. To give an example, in the following code block, a strongly-typed language would produce an explicit type error which ends the program's execution, thus forcing the developer to fix the bug:
x = 1 + "2"
We often think of statically-typed languages such as Java and C# as strongly-typed (which they are) because data types are explicitly defined when initializing a variable. However, Ruby, Python, and JavaScript (all of which are dynamically-typed) are also strongly-typed languages

and the developer makes no verbose statement of data type when declaring a variable. All of these languages are strongly-typed, but employ different type checking methods. Languages such as Ruby, Python, and JavaScript which do not require manually defining a type when declaring a variable make use of type inference – the ability to programmatically infer the type of a variable based on its value. Some programmers automatically use the term weakly typed to refer to languages that make use of type inference, often without realizing that the type information is present but implicit. Type inference is a separate feature of a language that is unrelated to any of its type systems.
A weakly-typed language, on the other hand, is a language in which variables are not bound to a specific data type; they still have a type, but type safety constraints are lower compared to strongly-typed languages. Take the following PHP code for example:
// PHP
$foo = "x";
$foo = $foo + 2; // not an error
echo $foo; // 2
Because PHP is weakly-typed, this would not raise an error. Just as not all strongly-typed languages are statically-typed, not all weakly-typed languages are dynamically-typed; PHP is a dynamically-typed language, but C – also a weakly-typed language – is indeed statically-typed.
(b) Another misconception is that most statically-typed languages are usually compiled when executed, and most dynamically-typed languages are interpreted when executed – but you can't always assume that, and there's a simple reason for this. When we say that a language is statically- or dynamically-typed, we are referring to that language as a whole. For example, no matter what version of Java you use, it will always be statically-typed. This is different from whether a language is compiled or interpreted, because in that statement we are referring to a specific language implementation. Thus, in theory, any language can be compiled or interpreted.
The most common implementation of Java is to compile to bytecode and have the JVM interpret that bytecode, but there are other implementations of Java that compile directly to machine code or that just interpret Java code as is.
===============================================
Type Conversion
Type conversion, type casting, and type coercion are different ways of changing an expression

from one data type to another, such as the conversion of an integer value into a floating point value. Two important aspects of a type conversion are whether it happens implicitly or explicitly. Each programming language has its own rules on how types can be converted. Languages with strong typing typically do little implicit conversion and discourage the reinterpretation of representations, while languages with weak typing perform many implicit conversions between data types. Weakly typed languages often allow forcing the compiler to arbitrarily interpret a data item as having different representations – this can be a non-obvious programming error, or a technical method to directly deal with underlying hardware.
Implicit conversion: For example, in an expression mixing integer and floating point numbers (like 5 + 0.1), the compiler will automatically convert the integer representation into a floating point representation so fractions are not lost. Important takeaways in the C language:
- float to int causes truncation, i.e., removal of the fractional part.
- double to float causes rounding of digits.
- long to int causes dropping of the excess higher-order bits.
One special case of implicit type conversion is type promotion, where the compiler automatically expands the binary representation of objects of integer or floating-point types. Promotions are commonly used with types smaller than the native type of the target platform's arithmetic logic unit (ALU), before arithmetic and logical operations, to make such operations possible, or more efficient if the ALU can work with more than one type. C and C++ perform such promotion for objects of boolean, character, wide character, enumeration, and short integer types, which are promoted to int, and for objects of type float, which are promoted to double. Unlike some other type conversions, promotions never lose precision or modify the value stored in the object.
Explicit type conversions: These are either indicated by writing additional code (e.g.
adding type identifiers or calling built-in routines) or by coding conversion routines for the compiler to use when it otherwise would halt with a type mismatch. In the C family of languages and ALGOL 68, the word cast typically refers to an explicit type conversion (as opposed to an implicit conversion). Example:
double da = 3.3;

double db = 3.3;
double dc = 3.4;
int result = (int)da + (int)db + (int)dc; // result == 9
// if implicit conversion were used (as with "result = da + db + dc"), result would be equal to 10
===============================================
Static and Dynamic Scoping
The scope of a variable x is the region of the program in which uses of x refer to its declaration. Scoping is generally divided into two classes:
1. Static Scoping
2. Dynamic Scoping
Static Scoping
Static scoping is also called lexical scoping. In this scoping a variable always refers to its top-level environment. This is a property of the program text and is unrelated to the run-time call stack. Static scoping also makes it much easier to write modular code, as a programmer can figure out the scope just by looking at the code. In most programming languages, including C, C++ and Java, variables are always statically (or lexically) scoped, i.e., the binding of a variable can be determined from the program text and is independent of the run-time function call stack.
For example, the output of the program below is 10, i.e., the value returned by f() is not dependent on who is calling it (like g(), which calls it and has an x with value 20). f() always returns the value of the global variable x.
// A C program to demonstrate static scoping.
#include <stdio.h>
int x = 10;
// Called by g()
int f()
{
  return x;
}
// g() has its own variable named x and calls f()
int g()
{

  int x = 20;
  return f();
}
main()
{
  printf("%d", g());
}
Output : 10
Dynamic Scoping
A fundamental distinction in scoping is what "part of a program" means. In languages with static scoping, name resolution depends on the location in the source code and the lexical context, which is defined by where the named variable or function is defined. In contrast, in languages with dynamic scoping, name resolution depends upon the program state when the name is encountered, which is determined by the execution context or calling context. In practice, with lexical scoping a variable's definition is resolved by searching its containing block or function, then, if that fails, searching the outer containing block, and so on, whereas with dynamic scoping the calling function is searched, then the function which called that calling function, and so on, progressing up the call stack. Of course, under both rules, we first look for a local definition of a variable.
// Since dynamic scoping is very uncommon in the familiar languages, we consider the following pseudo code as our example. It prints 20 in a language that uses dynamic scoping.
int x = 10;
// Called by g()
int f()
{
  return x;
}
// g() has its own variable named x and calls f()
int g()
{
  int x = 20;
  return f();
}
main()

{
  printf(g());
}
Output in a language that uses dynamic scoping : 20
===============================================
Storage Allocation
It can be categorized as: (a) Static Storage Allocation and (b) Dynamic Storage Allocation.
Static Storage Allocation
Static storage allocation is appropriate when the storage requirements are known at compile time. For a compiled, linked language, the compiler can include the specific memory address for the variable or constant in the code it generates. (This may be adjusted by an offset at link time.) Examples:
(i) code in languages without dynamic compilation
(ii) all variables in FORTRAN IV
(iii) global variables in C, Ada, Algol
(iv) constants in C, Ada, Algol
Dynamic Storage Allocation
Dynamic memory allocation is the process of assigning the memory space during the execution time or the run time. Reasons for, and advantages of, allocating memory dynamically are:
(a) When we do not know beforehand how much memory the program will need.
(b) When we want data structures without any upper limit on memory space.
(c) When we want to use memory space more efficiently. Example: if you have allocated memory space for a 1D array as array[20] and you end up using only 10 memory spaces, then the remaining 10 memory spaces would be wasted, and this wasted memory cannot even be utilized by other program variables.
(d) In dynamically created lists, insertions and deletions can be done very easily just by the manipulation of addresses, whereas in the case of statically allocated memory, insertions and deletions lead to more movement and wastage of memory.
(e) When you want to use the concepts of structures and linked lists in programming, dynamic memory allocation is a must.

There are two types of available memory: the stack and the heap. Static memory allocation can only be done on the stack, whereas dynamic memory allocation can be done on both the stack and the heap. An example of dynamic allocation done on the stack is recursion, where the functions are put onto the call stack in the order of their occurrence and popped off one by one on reaching the base case. So static memory is something that the compiler allocates in advance, while dynamic memory is something that is controlled by the program during execution; the program may ask for more of it or may delete some that was allocated.
Stack-Based Storage Allocation
Stacks in computing architectures are regions of memory where data is added or removed in a last-in-first-out (LIFO) manner. Stack-based storage allocation is appropriate when the storage requests obey a last-in, first-out discipline. Examples: (i) local variables in a procedure in C/C++, Ada, Algol, or Pascal; (ii) procedure call information (return address etc.). Because the data is added and removed in a last-in-first-out manner, stack-based memory allocation is very simple and typically faster than heap-based memory allocation. Another feature is that memory on the stack is automatically, and very efficiently, reclaimed when the function exits, which can be convenient for the programmer if the data is no longer required. If, however, the data needs to be kept in some form, then it must be copied from the stack before the function exits. Therefore, stack-based allocation is suitable for temporary data, or data which is no longer required after the creating function exits.
Heap-Based Storage Allocation
The most flexible allocation scheme is heap-based allocation. Here, storage can be allocated and deallocated dynamically at arbitrary times during program execution. This will be more expensive than either static or stack-based allocation. Heap-based allocation is used ubiquitously in languages such as Scheme and Smalltalk, and for objects in Java.
Here, an important issue is when storage is allocated and deallocated. Allocation is easy: in C, malloc (a standard library function) allocates fresh storage; in Java, new storage is allocated when the program makes a new instance of a class. But deallocation is harder. There are two approaches: programmer-controlled and automatic. In languages such as C, the programmer is in charge of deciding when heap storage can be freed (in C, using the free function). This is efficient but can lead to problems if the programmer makes a mistake: either storage is not

freed even though it is no longer needed (a memory leak), or it is freed but referred to later (a dangling pointer or dangling reference). Dangling pointers can lead to type insecurities or other errors: one can refer to storage that has been re-allocated for another purpose with data of a different type. Some studies estimate that over 30% of development time on large C/C++ projects is related to storage management issues. Java, Scheme, and various other languages use automatic storage management. There is no explicit deallocate function; rather, storage is automatically reclaimed some time after it is no longer accessible.
===============================================
Sequence Control
It is the control of the order of execution of the operations. Sequence control may be categorized into four groups:
(1) Expressions: They form the building blocks for statements. An expression is a combination of variables, constants, and operators according to the syntax of the language. Properties such as precedence rules and parentheses determine how expressions are evaluated.
(2) Statements: The statements (conditional and iterative) determine how control flows from one part of a program to another.
(3) Declarative Programming: This is an execution model of a program which is independent of the program statements, as in the logic programming model of PROLOG.
(4) Subprograms: In structured programming, a program is divided into small sections and each section is called a subprogram. Subprogram calls and coroutines can be invoked repeatedly and transfer control from one part of a program to another.
Implicit and Explicit Sequence Control
Implicit Sequence Control: Implicit or default sequence control structures are those defined by the programming language itself. These structures can be modified explicitly by the programmer. E.g., most languages define physical sequence as the sequence in which statements are executed.
Explicit Sequence Control: Explicit sequence control structures are those that the programmer

may optionally use to modify the implicit sequence of operations defined by the language, e.g., the use of parentheses within expressions, or goto statements and labels.
Sequence Control within Expressions
An expression is a formula for computing a value, and it is represented as a formalized sequence of operators, operands, and parentheses. An expression results in a computed value that resides in a temporary storage location until it is used. So, an expression has an implicit type that is derived from its subexpressions. An expression is written as a sequence of operators, operands, and parentheses. An operator is a primitive function represented by a single symbol or a string of two symbols. An operand, which represents a data value and is actually a means of accessing a value, may be a constant, a variable name, a function reference, an array reference, or even another expression.
Many programming languages do not specify the order in which the operands of an operator are evaluated. That is, in (subexp1 op subexp2), the programmer cannot assume that subexp1 is evaluated before subexp2; this is true for subprogram argument evaluation order, too. In both cases, the compiler is free to choose the order of evaluation: while one compiler behaves in accordance with your expectations, another may choose to upset them. The best way to avoid this is to replace side-effect-producing expressions with expressions without side effects. An operation, function, or expression is said to have a side effect if it modifies some state variable value(s) outside its local environment, that is to say, it has an observable effect besides returning a value (the main effect) to the invoker of the operation. For example,
(i++ > i);
must be replaced with
k = i++;
(k > i);
Representation of Expressions
There are a variety of ways to represent expressions. These are:
(a) Functional form: Functional form represents all operations as functions; the operands are arguments to these functions.
It may also be called applicative form or ordinary prefix form. Using this representation, 9 + 7 * 3 would be expressed as Add(9, Mult(7, 3)).
(b) Cambridge Polish: Being the type of functional form employed in LISP, the parentheses surround the operator and its associated operands. According to this representation, 9 + 7 * 3 is expressed as (Add 9 (Mult 7 3)).
(c) Infix: This is the traditional way of representing expressions, where the operator appears in the middle of the operands.

(d) Prefix: In prefix notation, the operator appears before the operands. It is also referred to as the Polish notation. Using this notation, 9 + 7 * 3 is expressed as + 9 * 7 3.
(e) Postfix: In postfix notation, operands appear before the operator. It is also known as the Reverse Polish notation. 9 + 7 * 3 is expressed as 9 7 3 * +.
Note that the prefix and postfix notations are parenthesis-free. That is because we do not need to impose the order of evaluation explicitly; the sequence of operators and operands alone is sufficient to indicate the order of operation application. Programming languages provide implicit control over expressions by means of precedence and associativity rules. The programmer can impose explicit control through the use of parentheses.
Operator precedence: It dictates the order of evaluation of operators in an expression.
Associativity: It defines the order in which operators of the same precedence are evaluated in an expression. Associativity can be either from left to right or from right to left.
In C, each operator has a fixed priority or precedence in relation to other operators. As a result, an operator with higher precedence is evaluated before an operator with lower precedence. Operators that appear in the same group have the same precedence. The following table lists operator precedence and associativity. Here, operators with the highest precedence appear at the top of the table, and those with the lowest appear at the bottom. Within an expression, higher precedence operators will be evaluated first.
Category         Operator                            Associativity
Postfix          () [] -> . ++ --                    Left to right
Unary            + - ! ~ ++ -- (type) * & sizeof     Right to left
Multiplicative   * / %                               Left to right
Additive         + -                                 Left to right
Shift            << >>                               Left to right
Relational       < <= > >=                           Left to right

Category         Operator                            Associativity
Equality         == !=                               Left to right
Bitwise AND      &                                   Left to right
Bitwise XOR      ^                                   Left to right
Bitwise OR       |                                   Left to right
Logical AND      &&                                  Left to right
Logical OR       ||                                  Left to right
Conditional      ?:                                  Right to left
Assignment       = += -= *= /= %= >>= <<= &= ^= |=   Right to left
Comma            ,                                   Left to right

Expression evaluation order: 9 + 7 * 3 is evaluated as 9 + (7 * 3) because * has higher precedence than +. 9 + 7 + 3 is evaluated as (9 + 7) + 3 because + associates to the left. (9 + 7) * 3 is evaluated differently from the first expression due to the use of parentheses.
Control Over Statements
Control at the statement level is concerned with the ordering of the statements that make up the program – a composition that may be accomplished sequentially, by selection among alternatives, or by iteration.
Simple Sequence
Simple sequence, or sequential flow, is the default control flow at the statement level. Execution of the statements follows their order in the program source. Often, simple sequence alone is insufficient for expressing the necessary computation.
Selection
Selective composition is employed when we wish to choose among two or more alternative statements (or groups of statements). The typical examples of selective composition are:

