MASTER OF COMPUTER            APPLICATIONS             SEMESTER-II    STATISTICAL & NUMERICAL               METHODS                MCA645
CHANDIGARH UNIVERSITY              Institute of Distance and Online Learning                                       Course Development Committee    Chairman    Prof. (Dr.) Parag Diwan    Vice Chancellor, Chandigarh University, Gharuan, Punjab                                           Advisors    Prof. (Dr.) Bharat Bhushan, Director – IGNOU  Prof. (Dr.) Majulika Srivastava, Director – CIQA, IGNOU                             Programme Coordinators & Editing Team    Master of Business Administration (MBA) Bachelor of Business Administration (BBA)    Coordinator – Dr. Rupali Arora         Coordinator – Dr. Simran Jewandah    Master of Computer Applications (MCA)  Bachelor of Computer Applications (BCA)  Coordinator – Dr. Raju Kumar           Coordinator – Dr. Manisha Malhotra    Master of Commerce (M.Com.)            Bachelor of Commerce (B.Com.)  Coordinator – Dr. Aman Jindal          Coordinator – Dr. Minakshi Garg    Master of Arts (Psychology)            Bachelor of Science (Travel &Tourism Management)  Coordinator – Dr. Samerjeet Kaur       Coordinator – Dr. Shikha Sharma    Master of Arts (English)               Bachelor of Arts (General)  Coordinator – Dr. Ashita Chadha        Coordinator – Ms. Neeraj Gohlan                             Academic and Administrative Management    Prof. (Dr.) R. M. Bhagat               Prof. (Dr.) S.S. Sehgal  Executive Director – Sciences          Registrar    Prof. (Dr.) Abhishek                   Prof. (Dr.) Inderpreet Kaur  Executive Director – Management        Director – IDOL    Prof. (Dr.) Manaswini Acharya  Executive Director – Liberal Arts    © No part of this publication should be reproduced, stored in a retrieval system, or transmitted in any     form or by any means, electronic, mechanical, photocopying, recording and/or otherwise without the     prior written permission of the authors and the publisher.                                            SLM SPECIALLY PREPARED FOR                                                      CU IDOL STUDENTS    Printed and Published by:                 SCHOOLGURU EDUSERVE PVT LTD                      B-903, Western Edge II, Western Express Highway, Borivali (E),                    Mumbai - 400066                      Call Us:                    +91 22 4896 8005                      Mail Us:                    [email protected]    For: CHANDIGARH UNIVERSITY                                                                               2                    Institute of Distance and Online Learning                                                               CU IDOL SELF LEARNING MATERIAL (SLM)
First Published in 2020

All rights reserved. No part of this book may be reproduced or transmitted, in any form or by any means, without permission in writing from Chandigarh University. Any person who does any unauthorized act in relation to this book may be liable to criminal prosecution and civil claims for damages. This book is meant for educational and learning purposes. The authors of the book have taken all reasonable care to ensure that the contents of the book do not violate any existing copyright or other intellectual property rights of any person in any manner whatsoever. In the event the authors have been unable to trace any source and if any copyright has been inadvertently infringed, please notify the publisher in writing for corrective action.
CONTENT    Unit 1- Computer Arithmetic ..........................................................................................................5  Unit 2- Iterative Methods ..............................................................................................................33  Unit 3- Linear Equations and Differential Equations 1: ..............................................................68  Unit 4- Linear Equations and Differential Equations 2 ..............................................................83  Unit 5- Numerical Differentiation and Integration: ...................................................................108  Unit 6- Interpolation and Approximation 1: ..............................................................................134  Unit 7- Interpolation and Approximation 2: ..............................................................................151  Unit 8- Statistical methods..........................................................................................................163  Unit 9- Analysis of Variance: ......................................................................................................181  Unit 10- Time Series Analysis: ...................................................................................................209                                          4    CU IDOL SELF LEARNING MATERIAL (SLM)
UNIT 1- COMPUTER ARITHMETIC

Structure
1.0 Learning Objectives
1.1 Introduction
1.2 Floating point representation of numbers
         1.2.1 Fixed-point
         1.2.2 Normalized floating-point
         1.2.3 Representable numbers
         1.2.4 IEEE standard
1.3 Arithmetic operations with normalized floating point numbers and their consequences
         1.3.1 Addition
         1.3.2 Subtraction
         1.3.3 Multiplication
         1.3.4 Division
         1.3.5 Issues in Floating Point
         1.3.6 Consequences
1.4 Error in number representation - pitfalls in computing
         1.4.1 Representation errors
         1.4.2 Numerical Error
         1.4.3 Pitfalls in computing
1.5 Summary
1.6 Keywords
1.7 Learning Activity
1.8 Unit End Questions
1.9 References

1.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:
     Explain the floating point representation of numbers
     Enumerate arithmetic operations with normalized floating point numbers and their consequences
     Comprehend errors in number representation

1.1 INTRODUCTION
Computer arithmetic is a branch of computer engineering that deals with methods of representing  integers and real values (e.g., fixed- and floating-point numbers) in digital systems and efficient  algorithms for manipulating such numbers by means of hardware circuits or software routines. On the  hardware side, various types of adders, subtractors, multipliers, dividers, square-rooters, and circuit  techniques for function evaluation are considered. Both abstract structures and technology-specific  designs are dealt with. Software aspects of computer arithmetic include complexity, error  characteristics, stability, and certifiability of computational algorithms.    1.2 FLOATING POINT REPRESENTATION OF NUMBERS    In computing, floating-point arithmetic (FP) is arithmetic using formulaic representation of real  numbers as an approximation to support a trade-off between range and precision. For this reason,  floating-point computation is often found in systems which include very small and very large real  numbers, which require fast processing times. A number is, in general, represented approximately to  a fixed number of significant digits (the significand) and scaled using an exponent in some fixed  base; the base for the scaling is normally two, ten, or sixteen.    The term floating point refers to the fact that a number's radix point (decimal point, or, more  commonly in computers, binary point) can \"float\"; that is, it can be placed anywhere relative to the  significant digits of the number. This position is indicated as the exponent component, and thus the  floating-point representation can be thought of as a kind of scientific notation.  A floating-point system can be used to represent, with a fixed number of digits, numbers of  different orders of magnitude: e.g. the distance between galaxies or the diameter of an atomic  nucleus can be expressed with the same unit of length. The result of this dynamic range is that the  numbers that can be represented are not uniformly spaced; the difference between two consecutive  representable numbers varies with the chosen scale.    Over the years, a variety of floating-point representations have been used in computers. In 1985,  the IEEE 754 Standard for Floating-Point Arithmetic was established, and since the 1990s, the most  commonly encountered representations are those defined by the IEEE.    1.2.1 Fixed-point    One possibility for handling numbers with fractional parts is to add bits after the decimal point: The  first bit after the decimal point is the halves place, the next bit the quarters place, the next bit the  eighths place, and so on.                                          6    CU IDOL SELF LEARNING MATERIAL (SLM)
Suppose that we want to represent 1.625(10). We would want 1 in the ones place, leaving us with 0.625. Then we want 1 in the halves place, leaving us with 0.625 − 0.5 = 0.125. No quarters will fit, so put a 0 there. We want a 1 in the eighths place, and we subtract 0.125 from 0.125 to get 0.

So the binary representation of 1.625 would be 1.101(2).

The idea of fixed-point representation is to split the bits of the representation between the places to the left of the decimal point and places to the right of the decimal point. For example, a 32-bit fixed-point representation might allocate 24 bits for the integer part and 8 bits for the fractional part.

Fig 1.1: 32 bits fixed-point representation

To represent 1.625, we would use the first 24 bits to indicate 1, and we'd use the remaining 8 bits to represent 0.625. Thus, our 32-bit representation would be:

                             00000000 00000000 00000001 10100000.

Fixed-point representation works reasonably well as long as you work with numbers within the supported range. The 32-bit fixed-point representation described above can represent any multiple of 1/256 from 0 up to 2^24 ≈ 16.7 million. But programs frequently need to work with numbers from a much broader range. For this reason, fixed-point representation isn't used very often in today's computing world.

A notable exception is financial software. Here, all computations must be represented exactly to the penny, and in fact further precision is rarely desired, since results are always rounded to the penny. Moreover, most applications have no requirement for large amounts (like trillions of dollars), so the limited range of fixed-point numbers isn't an issue. Thus, programs typically use a variant of fixed-point representation that represents each amount as an integer multiple of 1/100, just as the fixed-point representation described above represents each number as a multiple of 1/256.
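The fixed-point scheme above is easy to mimic in code. The following C++ sketch is an added illustration (it is not part of the original text); the function names and the 24/8 bit split are assumptions of the example, and rounding of the fractional part is done in the simplest possible way.

#include <cstdint>
#include <iostream>

// Interpret 'raw' as a fixed-point number with 'frac_bits' bits after the binary point.
double from_fixed(std::uint32_t raw, int frac_bits)
{
    return static_cast<double>(raw) / (1u << frac_bits);
}

// Convert a non-negative value to the nearest fixed-point pattern with 'frac_bits' fractional bits.
std::uint32_t to_fixed(double value, int frac_bits)
{
    return static_cast<std::uint32_t>(value * (1u << frac_bits) + 0.5);
}

int main()
{
    // 1.625 with 8 fractional bits: 1.625 * 256 = 416,
    // i.e. the bit pattern 00000000 00000000 00000001 10100000.
    std::uint32_t raw = to_fixed(1.625, 8);
    std::cout << raw << " -> " << from_fixed(raw, 8) << std::endl;   // prints 416 -> 1.625
    return 0;
}

In the same spirit, the financial variant mentioned above would simply store each amount as an integer number of cents (a multiple of 1/100) rather than a multiple of 1/256.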
1.2.2 Normalized floating-point

Floating-point representation is an alternative technique based on scientific notation.

Floating-point basics

Though we'd like to use scientific notation, we'll base our scientific notation on powers of 2, not powers of 10, because we're working with computers that prefer binary. For example, 5.5(10) is 101.1(2) in binary, and it converts to a binary scientific notation of 1.011(2) × 2^2. (In converting to binary scientific notation here, we moved the decimal point to the left two places, just as we would in converting 101.1(10) to scientific notation: It would be 1.011(10) × 10^2.)

Once we have a number in binary scientific notation, we still must have a technique for mapping that into a set of bits. First, let us define the two parts of scientific representation: In 1.011(2) × 2^2, we call 1.011(2) the mantissa (or the significand), and we call 2 the exponent. In this section we'll use 8 bits to store such a number.

Fig 1.2: 8 bit memory

We use the first bit to represent the sign (1 for negative, 0 for positive), the next four bits for the sum of 7 and the actual exponent (we add 7 to allow for negative exponents), and the last three bits for the mantissa's fractional part. Note that we omit the integer part of the mantissa: Since the mantissa must have exactly one nonzero bit to the left of its decimal point, and the only nonzero bit is 1, we know that the bit to the left of the decimal point must be a 1. There's no point in wasting space in inserting this 1 into our bit pattern, so we include only the bits of the mantissa to the right of the decimal point.

For our example of 5.5(10) = 1.011(2) × 2^2, we add 7 to 2 to arrive at 9(10) = 1001(2) for the exponent bits. Into the mantissa bits we place the bits following the decimal point of the scientific notation, 011. This gives us 0 1001 011 as the 8-bit representation of 5.5(10).

We call this floating-point representation because the values of the mantissa bits "float" along with the decimal point, based on the exponent's given value. This is in contrast to fixed-point representation, where the decimal point is always in the same place among the bits given.

Suppose we want to represent −96(10).

    a. First, we convert our desired number to binary: −1100000(2).
    b. Then we convert this to binary scientific notation: −1.100000(2) × 2^6.
    c. Then we fit this into the bits.

1. We choose 1 for the sign bit since the number is negative.
2. We add 7 to the exponent and place the result into the four exponent bits. For this example, we arrive at 6 + 7 = 13(10) = 1101(2).
3. The three mantissa bits are the first three bits following the leading 1: 100. (If it happened that there were bits beyond the 1/8's place, we would need to round the mantissa to the nearest eighth.)

Thus we end up with 1 1101 100.
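These encoding steps translate directly into code. The following C++ fragment is a rough sketch added for illustration (it is not part of the original text): it packs a value into this unit's toy 8-bit format with a 4-bit excess-7 exponent and a 3-bit mantissa. The function name encode8 is invented for the example, and rounding and range checks are deliberately omitted.

#include <cmath>
#include <cstdint>
#include <iostream>

// Pack a value into the toy 8-bit format: 1 sign bit, 4 exponent bits (excess-7),
// 3 mantissa bits with an implied leading 1.  Rounding and range checks are omitted.
std::uint8_t encode8(double value)
{
    std::uint8_t sign = value < 0 ? 1 : 0;
    double mag = std::fabs(value);
    int exp = static_cast<int>(std::floor(std::log2(mag)));            // exponent of the leading 1
    double mant = mag / std::pow(2.0, exp);                            // mantissa in [1, 2)
    std::uint8_t mbits = static_cast<std::uint8_t>((mant - 1.0) * 8);  // 3 fractional bits, truncated
    return static_cast<std::uint8_t>((sign << 7) | ((exp + 7) << 3) | mbits);
}

int main()
{
    // 5.5  ->  0 1001 011 (0x4B);    -96  ->  1 1101 100 (0xEC)
    std::cout << std::hex << int(encode8(5.5)) << " " << int(encode8(-96.0)) << std::endl;
    return 0;
}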
Conversely, suppose we want to decode the number 0 0101 100.

1. We observe that the number is positive, and the exponent bits represent 0101(2) = 5(10). This is 7 more than the actual exponent, and so the actual exponent must be −2. Thus, in binary scientific notation, we have 1.100(2) × 2^−2.

2. We convert this to binary: 1.100(2) × 2^−2 = 0.011(2).

3. We convert the binary into decimal: 0.011(2) = 1/4 + 1/8 = 3/8 = 0.375(10).

Let's illustrate with an example: suppose that the 32-bit pattern is 1 1000 0001 011 0000 0000 0000 0000 0000, with:

           S = 1
           E = 1000 0001
           F = 011 0000 0000 0000 0000 0000

In the normalized form, the actual fraction is normalized with an implicit leading 1 in the form of 1.F. In this example, the actual fraction is 1.011 0000 0000 0000 0000 0000 = 1 + 1×2^-2 + 1×2^-3 = 1.375D.

The sign bit represents the sign of the number, with S=0 for a positive and S=1 for a negative number. In this example with S=1, this is a negative number, i.e., -1.375D.

In normalized form, the actual exponent is E-127 (so-called excess-127 or bias-127). This is because we need to represent both positive and negative exponents. With an 8-bit E, ranging from 0 to 255, the excess-127 scheme could provide actual exponents of -127 to 128. In this example, E-127 = 129-127 = 2D.

Hence, the number represented is -1.375×2^2 = -5.5D.

De-Normalized Form

Normalized form has a serious problem: with an implicit leading 1 for the fraction, it cannot represent the number zero! Convince yourself of this!

De-normalized form was devised to represent zero and other numbers.

For E=0, the numbers are in the de-normalized form. An implicit leading 0 (instead of 1) is used for the fraction; and the actual exponent is always -126. Hence, the number zero can be represented with E=0 and F=0 (because 0.0×2^-126=0).

We can also represent very small positive and negative numbers in de-normalized form with E=0. For example, if S=1, E=0, and F=011 0000 0000 0000 0000 0000, the actual fraction is 0.011 = 1×2^-2 + 1×2^-3 = 0.375D. Since S=1, it is a negative number. With E=0, the actual exponent is -126. Hence the number is -0.375×2^-126 = -4.4×10^-39, which is an extremely small negative number (close to zero).

Example 1: Suppose that IEEE-754 32-bit floating-point representation pattern is 0 10000000 110 0000 0000 0000 0000 0000.
Sign bit S = 0 ⇒ positive number
E = 1000 0000B = 128D (in normalized form)
Fraction is 1.11B (with an implicit leading 1) = 1 + 1×2^-1 + 1×2^-2 = 1.75D
The number is +1.75 × 2^(128-127) = +3.5D

Example 2: Suppose that IEEE-754 32-bit floating-point representation pattern is 1 01111110 100 0000 0000 0000 0000 0000.

Sign bit S = 1 ⇒ negative number
E = 0111 1110B = 126D (in normalized form)
Fraction is 1.1B (with an implicit leading 1) = 1 + 2^-1 = 1.5D
The number is -1.5 × 2^(126-127) = -0.75D

Example 3: Suppose that IEEE-754 32-bit floating-point representation pattern is 1 01111110 000 0000 0000 0000 0000 0001.

Sign bit S = 1 ⇒ negative number
E = 0111 1110B = 126D (in normalized form)
Fraction is 1.000 0000 0000 0000 0000 0001B (with an implicit leading 1) = 1 + 2^-23
The number is -(1 + 2^-23) × 2^(126-127) = -0.500000059604644775390625 (may not be exact in decimal!)

Example 4 (De-Normalized Form): Suppose that IEEE-754 32-bit floating-point representation pattern is 1 00000000 000 0000 0000 0000 0000 0001.

Sign bit S = 1 ⇒ negative number
E = 0 (in de-normalized form)
Fraction is 0.000 0000 0000 0000 0000 0001B (with an implicit leading 0) = 1×2^-23
The number is -2^-23 × 2^(-126) = -2^(-149) ≈ -1.4×10^-45

1.2.3 Representable numbers

This 8-bit floating-point format can represent a wide range of both small numbers and large numbers. To find the smallest possible positive number we can represent, we would want the sign bit to be 0, we would place 0 in all the exponent bits to get the smallest exponent possible, and we would put 0 in all the mantissa bits. This gives us 0 0000 000, which represents

                                     1.000(2) × 2^(0 − 7) = 2^−7 ≈ 0.0078(10).

To determine the largest positive number, we would want the sign bit still to be 0, we would place 1 in all the exponent bits to get the largest exponent possible, and we would put 1 in all the mantissa bits. This gives us 0 1111 111, which represents

                                     1.111(2) × 2^(15 − 7) = 1.111(2) × 2^8 = 111100000(2) = 480(10).

Thus, our 8-bit floating-point format can represent positive numbers from about 0.0078(10) to 480(10). In contrast, the 8-bit two's-complement representation can only represent positive numbers between 1 and 127.

But notice that the floating-point representation can't represent all of the numbers in its range — this would be impossible, since eight bits can represent only 2^8 = 256 distinct values, and there are infinitely many real numbers in the range to represent. What's going on? Let's consider how to represent 51(10) in this scheme. In binary, this is 110011(2) = 1.10011(2) × 2^5. When we try to fit the mantissa into the 3-bit portion of our scheme, we find that the last two bits won't fit: We would be forced to round to 1.101(2) × 2^5, and the resulting bit pattern would be 0 1100 101. That rounding means that we're not representing the number precisely. In fact, 0 1100 101 translates to

                                     1.101(2) × 2^(12 − 7) = 1.101(2) × 2^5 = 110100(2) = 52(10).

Thus, in our 8-bit floating-point representation, 51 equals 52! That's pretty irritating, but it's a price we have to pay if we want to be able to handle a large range of numbers with such a small number of bits.

(By the way, in rounding numbers that are exactly between two possibilities, the typical policy is to round so that the final mantissa bit is 0. For example, taking the number 19, we end up with 1.0011(2) × 2^4, and we would round this up to 1.010(2) × 2^4 = 20(10). On the other hand, rounding
up the number 21 = 1.0101(2) × 2^4 would lead to 1.011(2) × 2^4, leaving a 1 in the final bit of the mantissa, which we want to avoid; so instead we round down to 1.010(2) × 2^4 = 20(10). Doubtless you were taught in grade school to "round up" all the time; computers don't do this because if we consistently round up, all those roundings will bias the total of the numbers upward. Rounding so the final bit is 0 ensures that exactly half of the possible numbers round up and exactly half round down.)

While a floating-point representation can't represent all numbers precisely, it does give us a guaranteed number of significant digits. For this 8-bit representation, we get a single digit of precision, which is pretty limited. To get more precision, we need more mantissa bits. Suppose we defined a similar 16-bit representation with 1 bit for the sign bit, 6 bits for the exponent plus 31, and 9 bits for the mantissa.

Fig 1.3: 16 bit memory

This representation, with its 9 mantissa bits, happens to provide three significant digits. Given a limited length for a floating-point representation, we have to compromise between more mantissa bits (to get more precision) and more exponent bits (to get a wider range of numbers to represent). For 16-bit floating-point numbers, the 6-and-9 split is a reasonable trade-off of range versus precision.

1.2.4 IEEE standard

Nearly all computers today follow the IEEE 754 standard for representing floating-point numbers. This standard was largely developed by 1980 and it was formally adopted in 1985, though several manufacturers continued to use their own formats throughout the 1980's. This standard is similar to the 8-bit and 16-bit formats we've explored already, but the standard deals with longer bit lengths to gain more precision and range; and it incorporates two special cases to deal with very small and very large numbers.

IEEE formats

There are two major varieties of the standard, for 32 bits and 64 bits, although it does define other lengths, like 128 bits.

format          sign bit    exponent bits    mantissa bits    exponent excess    significant digits
Our 8-bit           1             4                3                  7                   1
Our 16-bit          1             6                9                 31                   3
IEEE 32-bit         1             8               23                127                   6
IEEE 64-bit         1            11               52              1,023                  15
IEEE 128-bit        1            15              112             16,383                  34
All formats use an offset for the exponent that is halfway up the range of integers that can fit into the exponent bits; this is called the excess. For the 8-bit format, we had 4 exponent bits; the largest number that can fit into 4 bits is 2^4 − 1 = 15, and so the excess is 7 ≈ 15/2. The IEEE 32-bit format has 8 exponent bits, and so the largest number that fits is 255, and the excess is 127 ≈ 255/2.

In C programming, the float type corresponds to 32 bits, and double corresponds to 64 bits. (Actually, C doesn't specify the length or representation for these types. But this is most common. Java, incidentally, requires that these types correspond to the respective IEEE formats.)

The representation scheme for 64-bit double-precision is similar to the 32-bit single-precision:

           The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
           The following 11 bits represent the exponent (E).
           The remaining 52 bits represent the fraction (F).

The value (N) is calculated as follows:

           Normalized form: For 1 ≤ E ≤ 2046, N = (-1)^S × 1.F × 2^(E-1023).
           Denormalized form: For E = 0, N = (-1)^S × 0.F × 2^(-1022).
           For E = 2047, N represents special values, such as ±INF (infinity) and NaN (not a number).

There are three parts in the floating-point representation:

           The sign bit (S) is self-explanatory (0 for positive numbers and 1 for negative numbers).
           For the exponent (E), a so-called bias (or excess) is applied so as to represent both positive and negative exponents. The bias is set at half of the range. For single precision with an 8-bit exponent, the bias is 127 (or excess-127). For double precision with an 11-bit exponent, the bias is 1023 (or excess-1023).
           The fraction (F) (also called the mantissa or significand) is composed of an implicit leading bit (before the radix point) and the fractional bits (after the radix point). The leading bit for normalized numbers is 1, while the leading bit for denormalized numbers is 0.
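As a rough illustration of these three fields (added here for clarity; it is not part of the original text and assumes the platform's float is the IEEE-754 32-bit format), the following C++ fragment pulls the sign, exponent, and fraction out of a float's bit pattern:

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    float x = -5.5f;                        // the pattern decoded by hand earlier: 1 10000001 011...0
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);    // copy the float's raw bit pattern into an integer

    std::uint32_t S = bits >> 31;               // 1 sign bit
    std::uint32_t E = (bits >> 23) & 0xFF;      // 8 exponent bits (excess-127)
    std::uint32_t F = bits & 0x7FFFFF;          // 23 fraction bits

    std::cout << "S=" << S << " E=" << E
              << " actual exponent=" << static_cast<int>(E) - 127
              << " F=" << F << std::endl;       // prints S=1 E=129 actual exponent=2 F=3145728
    return 0;
}

The same unpacking works for double, using an 11-bit exponent field, an excess of 1023, and 52 fraction bits.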
Normalized Floating-Point Numbers

In normalized form, the radix point is placed after the first non-zero digit, e.g., 9.8765D×10^-23D, 1.001011B×2^11B. For binary numbers, the leading bit is always 1, and need not be represented explicitly - this saves 1 bit of storage.

In IEEE 754's normalized form:

           For single-precision, 1 ≤ E ≤ 254 with an excess of 127. Hence, the actual exponent is from -126 to +127. Negative exponents are used to represent small numbers (< 1.0), while positive exponents are used to represent large numbers (> 1.0).
                N = (-1)^S × 1.F × 2^(E-127)
           For double-precision, 1 ≤ E ≤ 2046 with an excess of 1023. The actual exponent is from -1022 to +1023, and
                N = (-1)^S × 1.F × 2^(E-1023)

Take note that an n-bit pattern has a finite number of combinations (2^n), and so can represent only finitely many distinct numbers. It is not possible to represent the infinitely many numbers on the real axis (even a small range, say 0.0 to 1.0, contains infinitely many numbers). That is, not all real numbers can be accurately represented as floating-point numbers. Instead, the closest approximation is used, which leads to loss of accuracy.

The minimum and maximum normalized floating-point numbers are:

Table 1: The minimum and maximum normalized floating-point numbers
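The limits in Table 1 can be checked against the constants the compiler provides. The short C++ sketch below is an added illustration (not part of the original text, and it assumes the platform uses IEEE-754 float): it computes the smallest and largest normalized single-precision values from the formulas above and compares them with the standard library's values.

#include <cmath>
#include <iostream>
#include <limits>

int main()
{
    // Smallest normalized single: E = 1, F = 0        ->  1.0 x 2^(1-127) = 2^-126
    float min_norm = std::ldexp(1.0f, -126);
    // Largest normalized single:  E = 254, F = all 1s ->  (2 - 2^-23) x 2^127
    float max_norm = (2.0f - std::ldexp(1.0f, -23)) * std::ldexp(1.0f, 127);

    std::cout << min_norm << "  vs  " << std::numeric_limits<float>::min() << std::endl;
    std::cout << max_norm << "  vs  " << std::numeric_limits<float>::max() << std::endl;
    return 0;
}

On an IEEE-754 machine both pairs print the same values, roughly 1.17549e-38 and 3.40282e+38.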
Fig 1.4: Representation of Normalized and Denormalized Floating-point numbers

Denormalized Floating-Point Numbers

If E = 0, but the fraction is non-zero, then the value is in denormalized form, and a leading bit of 0 is assumed, as follows:

           For single-precision, E = 0,
                N = (-1)^S × 0.F × 2^(-126)
           For double-precision, E = 0,
                N = (-1)^S × 0.F × 2^(-1022)

Denormalized form can represent very small numbers close to zero, and zero, which cannot be represented in normalized form, as shown in the above figure.

The minimum and maximum of denormalized floating-point numbers are:

Table 2: Minimum and maximum of denormalized floating-point numbers

Special Values

Zero: Zero cannot be represented in the normalized form, and must be represented in denormalized form with E=0 and F=0. There are two representations for zero: +0 with S=0 and -0 with S=1.

Infinity: The values +infinity (e.g., 1/0) and -infinity (e.g., -1/0) are represented with an exponent of all 1's (E = 255 for single-precision and E = 2047 for double-precision), F=0, and S=0 (for +INF) and S=1 (for -INF).

Not a Number (NaN): NaN denotes a value that cannot be represented as a real number (e.g. 0/0). NaN is represented with an exponent of all 1's (E = 255 for single-precision and E = 2047 for double-precision) and any non-zero fraction.

1.3 ARITHMETIC OPERATIONS WITH NORMALIZED FLOATING POINT NUMBERS AND THEIR CONSEQUENCES

Arithmetic operations on floating point numbers consist of addition, subtraction, multiplication and division. The operations are done with algorithms similar to those used on sign magnitude integers
(because of the similarity of representation) -- for example, only add numbers of the same sign. If the numbers are of opposite sign, we must do subtraction.

1.3.1 Addition

Example 1: on decimal values given in scientific notation:

       3.25 x 10 ** 3
   + 2.63 x 10 ** -1
   -----------------

First step: align decimal points
Second step: add

       3.25     x 10 ** 3
   + 0.000263 x 10 ** 3
   --------------------
     3.250263 x 10 ** 3

(presumes use of infinite precision, without regard for accuracy)

Third step: normalize the result (already normalized!)

Example 2: on fl. pt. values given in binary:

 .25 = 0 01111101 00000000000000000000000
 100 = 0 10000101 10010000000000000000000

To add these fl. pt. representations,

Step 1: align radix points

Shifting the mantissa left by 1 bit decreases the exponent by 1; shifting the mantissa right by 1 bit increases the exponent by 1.

We want to shift the mantissa right, because the bits that fall off the end should come from the least significant end of the mantissa.

-> choose to shift the .25, since we want to increase its exponent.
-> shift by   10000101
            - 01111101
            ----------
              00001000    (8) places.

0 01111101 00000000000000000000000   (original value)
0 01111110 10000000000000000000000   (shifted 1 place)
                  (note that hidden bit is shifted into msb of mantissa)
0 01111111 01000000000000000000000   (shifted 2 places)
0 10000000 00100000000000000000000   (shifted 3 places)
0 10000001 00010000000000000000000   (shifted 4 places)
0 10000010 00001000000000000000000   (shifted 5 places)
0 10000011 00000100000000000000000   (shifted 6 places)
0 10000100 00000010000000000000000   (shifted 7 places)
0 10000101 00000001000000000000000   (shifted 8 places)

step 2: add (don't forget the hidden bit for the 100)

    0 10000101 1.10010000000000000000000  (100)
  + 0 10000101 0.00000001000000000000000  (.25)
  ---------------------------------------
    0 10000101 1.10010001000000000000000

step 3: normalize the result (get the "hidden bit" to be a 1)

it already is for this example.

result is

          0 10000101 10010001000000000000000

Example 3: Floating point addition:

Add the following two decimal numbers in scientific notation:

                                           8.70 × 10^-1 with 9.95 × 10^1

1. Rewrite the smaller number such that its exponent matches with the exponent of the larger number.

                                       8.70 × 10^-1 = 0.087 × 10^1

2. Add the mantissas

                            9.95 + 0.087 = 10.037 and write the sum 10.037 × 10^1

3. Put the result in Normalised Form

                        10.037 × 10^1 = 1.0037 × 10^2 (shift mantissa, adjust exponent)

           check for overflow/underflow of the exponent after normalisation
4. Round the result

   If the mantissa does not fit in the space reserved for it, it has to be rounded off.

   For Example: If only 4 digits are allowed for mantissa

                                          1.0037 × 10^2 ===> 1.004 × 10^2

Example 4: addition in binary

Perform 0.5 + (-0.4375)

                                       0.5 = 0.1 × 2^0 = 1.000 × 2^-1 (normalised)
                               -0.4375 = -0.0111 × 2^0 = -1.110 × 2^-2 (normalised)

    1. Rewrite the smaller number such that its exponent matches with the exponent of the larger number.

                                                 -1.110 × 2^-2 = -0.1110 × 2^-1

    2. Add the mantissas:

                                         1.000 × 2^-1 + -0.1110 × 2^-1 = 0.001 × 2^-1

    3. Normalise the sum, checking for overflow/underflow:

                                                   0.001 × 2^-1 = 1.000 × 2^-4
                                  -126 <= -4 <= 127 ===> No overflow or underflow

    4. Round the sum:

                                      The sum fits in 4 bits so rounding is not required

         Check: 1.000 × 2^-4 = 0.0625 which is equal to 0.5 - 0.4375  (which is correct)
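The four steps used in these addition examples (align exponents, add significands, normalize, round) can be written out directly in code. The following C++ sketch is an added illustration, not part of the original text: it works on a toy decimal "significand × 10^exponent" value with a 4-digit significand, so the struct name, the function name and the digit count are all assumptions of the example (and a result smaller than 1, i.e. cancellation, is not renormalized).

#include <cmath>
#include <iostream>
#include <utility>

// A toy "significand x 10^exponent" value with a 4-digit significand.
struct SciNum { double mant; int exp; };

SciNum add(SciNum a, SciNum b)
{
    if (a.exp < b.exp) std::swap(a, b);           // let 'a' be the number with the larger exponent
    b.mant /= std::pow(10.0, a.exp - b.exp);      // step 1: align the exponents
    SciNum r{a.mant + b.mant, a.exp};             // step 2: add the significands
    while (std::fabs(r.mant) >= 10.0) {           // step 3: normalize (significand overflow)
        r.mant /= 10.0;
        ++r.exp;
    }
    r.mant = std::round(r.mant * 1000.0) / 1000.0;   // step 4: round back to 4 significant digits
    return r;
}

int main()
{
    SciNum x{9.95, 1}, y{8.70, -1};               // 9.95 x 10^1 and 8.70 x 10^-1, as in Example 3
    SciNum s = add(x, y);
    std::cout << s.mant << " x 10^" << s.exp << std::endl;   // prints 1.004 x 10^2
    return 0;
}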
1.3.2 Subtraction

Like addition as far as alignment of radix points; then the algorithm for subtraction of sign magnitude numbers takes over.

Before subtracting, compare magnitudes (don't forget the hidden bit!) and change the sign bit if the order of operands is changed. Don't forget to normalize the number afterward.

Example 1: Let the two numbers be
x = 9.75
y = – 0.5625

Converting them into 32-bit floating point representation:
9.75's representation in 32-bit format = 0 10000010 00111000000000000000000
– 0.5625's representation in 32-bit format = 1 01111110 00100000000000000000000

Now, we find the difference of exponents to know how much shifting is required.
10000010(2) – 01111110(2) = 4(10)

Now, we shift the mantissa of the lesser number right by 4 units.
Mantissa of – 0.5625 = 1.00100000000000000000000
(note that the 1 before the decimal point is understood in 32-bit representation)

Shifting right by 4 units: 0.00010010000000000000000
Mantissa of 9.75 = 1.00111000000000000000000

Subtracting the mantissas (larger − smaller):
   1.00111000000000000000000
 – 0.00010010000000000000000
————————————————
   1.00100110000000000000000

Sign bit of the bigger number = 0

So, finally the answer = x – y = 0 10000010 00100110000000000000000

1.3.3 Multiplication

Example 1 on decimal values given in scientific notation:

       3.0 x 10 ** 1
   x 0.5 x 10 ** 2
   -----------------

algorithm: multiply mantissas, add exponents

       3.0 x 10 ** 1
   x 0.5 x 10 ** 2
   -----------------
    1.50 x 10 ** 3

Example 2 in binary: use a mantissa that is only 4 bits so that I don't spend all day just doing the multiplication part.

     0 10000100 0100
  x 1 00111100 1100
  -----------------

mantissa multiplication:

      1.0100   (don't forget hidden bit)
    x 1.1100
    --------
       00000
      00000
     10100
    10100
   10100
   ----------
   1000110000    becomes 10.00110000

add exponents: always add true exponents
               (otherwise, the bias gets added in twice)

biased:     10000100
          + 00111100
          ----------

    10000100        01111111    (switch the order of the subtraction,
  - 01111111      - 00111100     so that we can get a negative value)
  ----------      ----------
    00000101        01000011

    true exp        true exp
    is 5.           is -67

add true exponents: 5 + (-67) is -62. Re-bias the exponent: -62 + 127 is 65. The unsigned representation for 65 is 01000001. Put the result back together (and add the sign bit).

1 01000001 10.00110000

normalize the result:
        (moving the radix point one place to the left increases the exponent by 1.)

1 01000001 10.00110000   becomes   1 01000010 1.000110000

This is the value stored (not the hidden bit!): 1 01000010 000110000

Example 3: Floating Point Multiplication

Multiply the following two numbers in scientific notation by hand:

                                             1.110 × 10^10 × 9.200 × 10^-5
1. Add the exponents to find

                                             New Exponent = 10 + (-5) = 5

If we add biased exponents, bias will be added twice. Therefore we need to subtract it once to compensate:

                                           (10 + 127) + (-5 + 127) = 259
                              259 - 127 = 132 which is (5 + 127) = biased new exponent

    2. Multiply the mantissas

                                               1.110 × 9.200 = 10.212000

         Can only keep three digits to the right of the decimal point, so the result is

                                                        10.212 × 10^5

    3. Normalise the result

                                                        1.0212 × 10^6

    4. Round it

                                                         1.021 × 10^6

Example 4: multiplication in binary:

                                            1.000 × 2^-1 × -1.110 × 2^-2

    1. Add the biased exponents

                                (-1 + 127) + (-2 + 127) - 127 = 124 ===> (-3 + 127)

    2. Multiply the mantissas

          1.000
        × 1.110
      -----------
           0000
          1000
         1000
      + 1000
      -----------
        1110000 ===> 1.110000

The product is 1.110000 × 2^-3
Need to keep it to 4 bits: 1.110 × 2^-3

    3. Normalise (already normalised)
       At this step check for overflow/underflow by making sure that

                                             -126 <= Exponent <= 127
                                           1 <= Biased Exponent <= 254

    4. Round the result (no change)
    5. Adjust the sign. Since the original signs are different, the result will be negative

                                                      -1.110 × 2^-3
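The same bookkeeping (multiply the significands, add the true exponents) can be watched in action with the standard frexp and ldexp functions. This is an added illustration rather than part of the original text; note that frexp returns a significand in [0.5, 1) instead of the [1, 2) form used above, but the idea is identical.

#include <cmath>
#include <iostream>

int main()
{
    double a = 0.5;        //  1.000 x 2^-1
    double b = -0.4375;    // -1.110 x 2^-2

    int ea, eb;
    double ma = std::frexp(a, &ea);   // significand in [0.5, 1) and exponent, so a = ma * 2^ea
    double mb = std::frexp(b, &eb);

    double m = ma * mb;               // multiply the significands
    int    e = ea + eb;               // add the true exponents
    // (a real implementation would now renormalize and round the significand)

    std::cout << m << " x 2^" << e << " = " << std::ldexp(m, e) << std::endl;
    return 0;
}

The printed value, -0.21875, matches the hand-worked result -1.110 × 2^-3.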
1.3.4 Division

Similar to multiplication.

true division:
do unsigned division on the mantissas (don't forget the hidden bit)
subtract TRUE exponents

The IEEE standard is very specific about how all this is done. Unfortunately, the hardware to do all this is pretty slow.

Some comparisons of approximate times:

2's complement integer add     1 time unit
fl. pt. add                    4 time units
fl. pt. multiply               6 time units
fl. pt. divide                 13 time units

There is a faster way to do division. It's called division by reciprocal approximation. It takes about the same time as a fl. pt. multiply. Unfortunately, the results are not always the same as with true division.

Division by reciprocal approximation:

    instead of doing a / b
    they do a x 1/b.
    figure out a reciprocal for b, and then use the fl. pt. multiplication hardware.

Example of a result that isn't the same as with true division:

true division: 3/3 = 1 (exactly)

reciprocal approx.: 1/3 = .33333333
                    3 x .33333333 = .99999999, not 1

It is not always possible to get a perfectly accurate reciprocal.
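The same effect can be observed with real float arithmetic. The short C++ sketch below is an added illustration (not part of the original text): it compares true division by 3 with multiplication by the rounded reciprocal of 3; for some numerators the two results differ in the last bit.

#include <cstdio>

int main()
{
    float b = 3.0f;
    float recip = 1.0f / b;               // the reciprocal of 3, rounded to the nearest float

    // Compare true division with reciprocal multiplication for a few numerators.
    for (int a = 1; a <= 20; ++a) {
        float q1 = float(a) / b;          // true division
        float q2 = float(a) * recip;      // reciprocal approximation
        if (q1 != q2)
            printf("a = %2d:  a/b = %.9g   a*(1/b) = %.9g\n", a, q1, q2);
    }
    return 0;
}

On an IEEE-754 machine several of these numerators (for example a = 5) produce results that disagree in the least significant bit, which is exactly the problem described above.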
1.3.5 Issues in Floating Point

Note: This discussion only touches the surface of some issues that people deal with. Entire courses could probably be taught on each of the issues.

Rounding
--------
Arithmetic operations on fl. pt. values compute results that cannot be represented in the given amount of precision. So, we must round results.

There are MANY ways of rounding. They each have "correct" uses, and exist for different reasons. The goal in a computation is to have the computer round such that the end result is as "correct" as possible. There are even arguments as to what is really correct.

3 methods of rounding:

Round toward 0 -- also called truncation. Figure out how many bits (digits) are available. Take that many bits (digits) for the result and throw away the rest. This has the effect of making the value represented closer to 0.
Example:
  .7783  if 3 decimal places available, .778
         if 2 decimal places available, .77

Round toward + infinity --
Regardless of the value, round towards +infinity.
Example:
   1.23 if 2 decimal places, 1.3
  -2.86 if 2 decimal places, -2.8

Round toward - infinity --
Regardless of the value, round towards -infinity.
Example:
   1.23 if 2 decimal places, 1.2
  -2.86 if 2 decimal places, -2.9

In binary -- rounding to 2 digits after radix point
----------------------------------------------------

round toward + infinity --

      1.1101
        |
  1.11 | 10.00
         ------

      1.001
        |
  1.00 | 1.01
         -----
Round toward - infinity --              1.1101                         |          1.11 | 10.00       ------              1.001                         |          1.00 | 1.01       -----  Round toward zero (TRUNCATE) --              1.1101                         |          1.11 | 10.00       ------              1.001                         |          1.00 | 1.01       -----        -1.1101                   |    -10.00 | -1.11            ------        -1.001                   |    -1.01 | -1.00           -----    Round toward nearest --  ODD CASE: if there is anything other than 1000... to the right of the number of digits to be kept,  then rounded in IEEE standard such that the least significant bit (to be kept) is a zero.                                                                                                        24                        CU IDOL SELF LEARNING MATERIAL (SLM)
1.1111                   |    1.11 | 10.00            ------         1.1101                   |    1.11 | 10.00  ------         1.001 (ODD CASE)                   |    1.00 | 1.01  -----        -1.1101 (1/4 of the way between)                   |    -10.00 | -1.11            ------        -1.001 (ODD CASE)                   |    -1.01 | -1.00           -----    NOTE: this is a bit different than the \"round to nearest\" algorithm  (for the \"tie\" case, .5) learned in elementary school for decimal numbers.    Use of standards  ----------------  --> allows all machines following the standard to exchange data and to calculate the exact same  results.  --> IEEE fl. pt. standard sets parameters of data representation (# bits for mantissa vs. exponent)  --> Pentium architecture follows the standard    Overflow and Underflow  ----------------------                                                                                                         25                            CU IDOL SELF LEARNING MATERIAL (SLM)
Just as with integer arithmetic, floating point arithmetic operations can cause overflow. Detection of overflow in fl. pt. comes by checking exponents before/during normalization.

Once overflow has occurred, an infinity value can be represented and propagated through a calculation.

Underflow occurs in fl. pt. representations when a number is too small (close to 0) to be represented. If a fl. pt. value cannot be normalized (getting a 1 just to the left of the radix point would cause the exponent field to be all 0's) then underflow occurs.

1.3.6 Consequences

You might wish that the standard laws of arithmetic would apply to numeric representations. For example, it would be convenient if the commutative law (x + y = y + x) applied to addition.

With integer arithmetic using two's-complement representation, all the standard laws apply except those that involve division. Identities involving division fail, of course, because division often results in non-integers, which can only be approximated in a two's-complement representation.

With IEEE floating-point numbers, however, many laws fall apart. The commutative law still holds for both addition and multiplication. But the associative law of addition (x + (y + z) = (x + y) + z) does not: Suppose x and y were the largest possible numeric value, and z were the most negative numeric value. Then the value of x + (y + z) would be x, since y + z is 0 and x + 0 is x. But the value of (x + y) + z would be positive infinity, since x + y results in overflow, which is equivalent to infinity in the IEEE format, and any finite number added to infinity results in infinity again.

The associative law fails for addition even without resorting to such special cases. Going back to our 8-bit representation, consider x = 1, y = 52, z = −52. If you evaluate x + (y + z), you first notice that y + z = 0 and so x + (y + z) is 1. But (x + y) + z is 0, since the value 1 + 52 is still 52 due to rounding.

Another example of something that should be true but isn't in IEEE floating-point arithmetic is the following equation.

                                         1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1.

The reason this fails is that 1/6 has no precise representation with a fixed number of mantissa bits, and so it must be approximated with the IEEE format. As it happens, it underestimates slightly. When we add this underestimation to itself 6 times, the total ends up being less than 1.
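Both effects are easy to reproduce on a real machine. The following C++ fragment is an added illustration (not part of the original text); it uses double, so 2^53 plays the role that 52 played in the toy 8-bit format above.

#include <cstdio>

int main()
{
    // Associativity failure: the grouping of the additions changes the rounded result.
    double x = 1.0, y = 9007199254740992.0, z = -9007199254740992.0;   // y = 2^53
    printf("x + (y + z) = %.1f\n", x + (y + z));    // 1.0
    printf("(x + y) + z = %.1f\n", (x + y) + z);    // 0.0, because 1 + 2^53 rounds back to 2^53

    // Accumulated representation error: 1/6 is rounded, so six copies need not sum to 1.
    double sixth = 1.0 / 6.0;
    double sum = sixth + sixth + sixth + sixth + sixth + sixth;
    printf("six sixths = %.17f\n", sum);            // typically prints 0.99999999999999989
    printf("equal to 1? %s\n", sum == 1.0 ? "yes" : "no");
    return 0;
}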
Such oddities are problems, and good programmers are very aware of them. Nobody has really come up with an alternative that is convincingly better than the IEEE approach to representing numbers: All have some odd anomalies, and the IEEE approach avoids these better than most. Consequently, all modern computers use this representation for non-integral numbers.

1.4 ERROR IN NUMBER REPRESENTATION - PITFALLS IN COMPUTING

1.4.1 Representation errors

Floating-point numbers have a few obvious drawbacks:
    1. Most real numbers cannot be represented exactly
    2. Small representation errors can accumulate systematically during arithmetic operations

Let's write a program that (in a quite naive way) attempts to range over the values -3, -2.9, -2.8, ..., 2.8, 2.9 using a for loop that increments a float variable.

#include <iostream>
using std::cout;
using std::endl;

#include <iomanip>
using std::setw;

int main()
{
    cout.precision(10);
    int n = -30;
    for (float x = -3.0; x < 3.0; x += 0.1, n += 1)
        cout << setw(16) << x << setw(6) << 0.1*n << endl;
    return 0;
}

As the output of this program makes clear, repeatedly adding 0.1 to -3.0 does not give the desired result. Each value is slightly off, and the comparison (x < 3.0) fails to cut off the sequence at 2.9.

              -3      -3
   -2.900000095    -2.9
   -2.800000191    -2.8
   -2.700000286    -2.7
   -2.600000381    -2.6
   ...
   -0.400000602    -0.4
   -0.300000608    -0.3
  -0.2000006139    -0.2
  -0.1000006124    -0.1
-6.124377023e-07      0
  0.09999939054     0.1
    0.199999392     0.2
   0.2999993861     0.3
   0.3999993801     0.4
   ...
    2.699999094     2.7
    2.799998999     2.8
    2.899998903     2.9
    2.999998808       3

Why does this happen? Although -3.0 can be exactly represented, 0.1 cannot. The closest we can get is 0.1000000015, which is represented as follows.

s/eeeeeeee/fffffffffffffffffffffff
0 01111011 10011001100110011001101

The exact number is a non-terminating fraction: 0.1 = 1.100110011001100…(2) × 2^-4.

1.4.2 Numerical Error

Floating point numbers are a peculiar finite subset of the rationals, designed to span many orders of magnitude and to have a consistent number of values in each factor-two interval. Since the exponent field is finite, there are minimum and maximum values that can be represented.

type      largest number    smallest number    epsilon
float     3.40E+38          1.5E-45            1.19E-7
double    1.80E+308         5.0E-324           2.22E-16

This means that any number represented as a float that is greater than 3.4×10^38 is an inf; any that is smaller than 1.5×10^-45 is identically 0. Within the representable range, an arbitrary real number x = x̂ + δ_x will be closest to some floating point value x̂ but a distance δ_x away. It is easy to show that the relative error ε_x = δ_x/x is bounded by the machine epsilon ε_m, defined to be the smallest value such that 1 + ε_m > 1. (That is, the bounds on the absolute error δ_x scale with |x|.)

In the case of a float, 1 = +2^(127-127) × (1 + 0.0…0). The next highest number has an identical bit pattern except that the least significant bit is on.
0 01111111 00000000000000000000000 = 1
0 01111111 00000000000000000000001 = 1 + ε_m

The difference corresponds to ε_m = 2^-23 ≈ 1.19×10^-7. Hence, the boolean test 1.0F + 1E-9F == 1.0F evaluates to true (whereas 1.0 + 1E-9 == 1.0 evaluates to false, because ε_m = 2^-52 ≈ 2.22×10^-16 for doubles).

Such representation errors are only a small part of what you need to worry about. There are many other sources of error.

    1. Rounding and truncation errors occur during conversions between types
    2. Errors are introduced when series expansions are cut off: e.g., cos(0.5*M_PI) != 0.0, even though cos(π/2) is zero; such behaviour is common to all transcendental functions in the cmath library
    3. Addition of floating point values having very different orders of magnitude may lead to (mild) loss of significance
    4. Subtraction of nearly equal values may lead to (severe) loss of significance
    5. Multiplication can lead to underflow or overflow if the exponents of the two operands are both large or both small
    6. Division is never entirely safe, since verifying that the divisor is not zero does not guarantee that a division will not overflow and yield infinity

1.4.3 Pitfalls in computing

HW vs. SW computing

     Floating point operations can be done by hardware (circuitry) or by software (program code).
     A programmer won't know which is occurring, without prior knowledge of the HW.
     SW is much slower than HW, by approximately 1000 times.
     A difficult (but good) exercise for students would be to design a SW algorithm for doing fl. pt. addition using only integer operations.
     SW to do fl. pt. operations is tedious. It takes lots of shifting and masking to get the data in the right form to use integer arithmetic operations to get a result -- and then more shifting and masking to put the number back into fl. pt. format.
     A common thing that manufacturers used to do is to offer 2 versions of the same architecture, one with HW, and the other with SW fl. pt. ops.

1.5 SUMMARY

Computer arithmetic is a field of computer science that investigates how computers should represent numbers and perform operations on them. It includes integer arithmetic, fixed-point arithmetic, and the arithmetic this book focuses on: floating-point (FP) arithmetic. It is the common way computers approximate real numbers and
it is described in the IEEE-754 standard [IEE 08]. As in scientific notation, numbers are represented using an exponent and a significand, except that this significand has to fit in a certain number of bits. As this number of bits (called precision) is limited, each operation may be inexact due to rounding. This makes computer arithmetic sometimes inaccurate: the result of a long computation may be far from the mathematical result that would have been obtained if all the computations were correct. This also makes computer arithmetic unintuitive: for instance, FP addition is not always associative.

1.6 KEYWORDS

   Order of magnitude: An order of magnitude is an exponential change of plus-or-minus 1 in the value of a quantity or unit. The term is generally used in conjunction with powers of 10.
   Non-terminating, non-repeating decimal: A decimal number that continues endlessly, with no group of digits repeating.
   Rounding Error: Because floating-point numbers have a limited number of digits, they cannot represent all real numbers accurately: when there are more digits than the format allows, the leftover ones are omitted - the number is rounded.
   Truncation Error: Truncating or chopping a number at a given digit means that all digits after that digit are removed. For example, 3.14159 truncated after two decimal digits becomes 3.14.
   Numeric Value: Anything of, relating to, or containing numbers.

1.7 LEARNING ACTIVITY

1. Explain the IEEE 754 Floating-Point Standard.

--------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------

2. How can a binary number be represented in 14-bit floating-point form?

--------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------

1.8 UNIT END QUESTIONS

A. Descriptive Type Questions

1. What is Fixed Point?
2. Explain Normalized Floating Point.
3. Discuss Multiplication of Floating Point numbers.
4. What are the consequences of arithmetic operations with normalized floating point numbers? Discuss.
5. Write a note on error in number representation.
B. Multiple Choice Questions:    1. The numbers written to the power of 10 in the representation of decimal numbers are called as      _____      a) Height factors      b) Size factors      c) Scale factors      d) None of the mentioned    2. If the decimal point is placed to the right of the first significant digit, then the number is called      ________      a) Orthogonal      b) Normalized      c) Determinate      d) None of the mentioned    3. _______ constitute the representation of the floating number.      a) Sign      b) Significant digits      c) Scale factor      d) All of the mentioned    4. The sign followed by the string of digits is called as ______      a) Significant      b) Determinant      c) Mantissa      d) Exponent    5. In IEEE 32-bit representations, the mantissa of the fraction is said to occupy ______ bits.      a) 24      b) 23      c) 20      d) 16    Answer:  1 – c; 2 - b; 3 - d; 4 -c; 5 - b;                                                                                                              31    CU IDOL SELF LEARNING MATERIAL (SLM)
1.9 REFERENCES

• Rajaraman V., Computer Oriented Numerical Methods. New Delhi: Prentice Hall.
• Salaria R.S., A Textbook of Statistical and Numerical Methods in Engineering. Delhi: Khanna Book Publishing Company.
• Gupta S.P. and Kapoor V.K. (2014). Fundamentals of Mathematical Statistics. Delhi: Sultan Chand and Sons.
• Anderson (1990). Statistical Modelling. New York: McGraw Publishing House.
• Gupta S.P. and Kapoor V.K. (2015). Fundamentals of Applied Statistics. Delhi: Sultan Chand & Sons.
• Graybill (1990). Introduction to Statistics. New York: McGraw Publishing House.
• Wilkinson J.H., The Algebraic Eigenvalue Problem (Numerical Mathematics and Scientific Computation). Clarendon Press.
• Numerical Methods & Analysis – Engineering App – Google Play Store
• https://en.wikibooks.org/wiki/Statistics/Numerical_Methods/Numerical_Comparison_of_Statistical_Software
UNIT 2 - ITERATIVE METHODS

Structure
2.0 Learning Objectives
2.1 Introduction
2.2 Bisection Method
      2.2.1 Bisection Method Algorithm
      2.2.2 Bisection Method Example
2.3 Method of False Position
2.4 Newton-Raphson Method
2.5 Polynomial Evaluation
2.6 Solving Polynomial Equations (Bairstow's Method)
2.7 Summary
2.8 Keywords
2.9 Learning Activity
2.10 Unit End Questions
2.11 References

2.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:
• Explain the different iterative methods such as the Bisection, False Position and Newton-Raphson methods
• Comprehend polynomial evaluation and the solution of polynomial equations using Bairstow's method

2.1 INTRODUCTION

An iterative method is a mathematical way of solving a problem by generating a sequence of successive approximations. Such methods are applicable to both linear and nonlinear problems with a large number of variables. The words iterative and iteration refer to techniques that solve a problem by improving an approximate solution step by step.

2.2 BISECTION METHOD

The bisection method is used to find a root of an equation f(x) = 0. It repeatedly subdivides the interval in which the root of the equation lies. The principle behind the method is the intermediate value theorem for continuous functions. It works by narrowing the gap between a point where the function is positive and a point where it is negative until it closes in on the root. The method narrows the gap by taking
the average of the points at which the function is positive and negative. It is a simple method, but relatively slow. The bisection method is also known as the interval halving method, the binary search method or the dichotomy method.

Consider a continuous function f defined on the closed interval [a, b] such that f(a) and f(b) have different signs. Then, by the intermediate value theorem, there exists a point x in (a, b) for which f(x) = 0.

2.2.1 Bisection Method Algorithm

Follow the procedure below to obtain the solution for a continuous function f(x):
• Find two points, say a and b, such that a < b and f(a)·f(b) < 0
• Find the midpoint of a and b, say t
• t is the root of the given function if f(t) = 0; otherwise follow the next step
• Divide the interval [a, b]: if f(t)·f(b) < 0, let a = t; else if f(t)·f(a) < 0, let b = t
• Repeat the above steps until f(t) = 0 or the interval becomes sufficiently small

The bisection method is thus an approximation method that finds a root of the given equation by repeatedly halving the interval; the subdivision continues until the resulting interval is extremely small. (A short Python sketch of this procedure is given after Example 4 below.)

2.2.2 Bisection Method Example

Example 1: Find a root of the equation f(x) = x^3 - x - 1 using the bisection method.

Solution:
Here x^3 - x - 1 = 0
Let f(x) = x^3 - x - 1

   x      0    1    2
   f(x)  -1   -1    5

1st iteration :
Here f(1) = -1 < 0 and f(2) = 5 > 0
∴ Now, Root lies between 1 and 2
x0 = (1 + 2)/2 = 1.5
f(x0) = f(1.5) = 0.875 > 0

2nd iteration :
Here f(1) = -1 < 0 and f(1.5) = 0.875 > 0
∴ Now, Root lies between 1 and 1.5
x1 = (1 + 1.5)/2 = 1.25
f(x1) = f(1.25) = -0.29688 < 0

3rd iteration :
Here f(1.25) = -0.29688 < 0 and f(1.5) = 0.875 > 0
∴ Now, Root lies between 1.25 and 1.5
x2 = (1.25 + 1.5)/2 = 1.375
f(x2) = f(1.375) = 0.22461 > 0

4th iteration :
Here f(1.25) = -0.29688 < 0 and f(1.375) = 0.22461 > 0
∴ Now, Root lies between 1.25 and 1.375
x3 = (1.25 + 1.375)/2 = 1.3125
f(x3) = f(1.3125) = -0.05151 < 0
5th iteration :
Here f(1.3125) = -0.05151 < 0 and f(1.375) = 0.22461 > 0
∴ Now, Root lies between 1.3125 and 1.375
x4 = (1.3125 + 1.375)/2 = 1.34375
f(x4) = f(1.34375) = 0.08261 > 0

6th iteration :
Here f(1.3125) = -0.05151 < 0 and f(1.34375) = 0.08261 > 0
∴ Now, Root lies between 1.3125 and 1.34375
x5 = (1.3125 + 1.34375)/2 = 1.32812
f(x5) = f(1.32812) = 0.01458 > 0

7th iteration :
Here f(1.3125) = -0.05151 < 0 and f(1.32812) = 0.01458 > 0
∴ Now, Root lies between 1.3125 and 1.32812
x6 = (1.3125 + 1.32812)/2 = 1.32031
f(x6) = f(1.32031) = -0.01871 < 0

8th iteration :
Here f(1.32031) = -0.01871 < 0 and f(1.32812) = 0.01458 > 0
∴ Now, Root lies between 1.32031 and 1.32812
x7 = (1.32031 + 1.32812)/2 = 1.32422
f(x7) = f(1.32422) = -0.00213 < 0

9th iteration :
Here f(1.32422) = -0.00213 < 0 and f(1.32812) = 0.01458 > 0
∴ Now, Root lies between 1.32422 and 1.32812
x8 = (1.32422 + 1.32812)/2 = 1.32617
f(x8) = f(1.32617) = 0.00621 > 0

10th iteration :
Here f(1.32422) = -0.00213 < 0 and f(1.32617) = 0.00621 > 0
∴ Now, Root lies between 1.32422 and 1.32617
x9 = (1.32422 + 1.32617)/2 = 1.3252
f(x9) = f(1.3252) = 0.00204 > 0

11th iteration :
Here f(1.32422) = -0.00213 < 0 and f(1.3252) = 0.00204 > 0
∴ Now, Root lies between 1.32422 and 1.3252
x10 = (1.32422 + 1.3252)/2 = 1.32471
f(x10) = f(1.32471) = -0.00005 < 0

Approximate root of the equation x^3 - x - 1 = 0 using the bisection method is 1.32471.

Example 2: Determine the root of the given equation x^2 - 3 = 0 for x ∈ [1, 2].

Solution:
Given: x^2 - 3 = 0
Let f(x) = x^2 - 3
Now, find the value of f(x) at a = 1 and b = 2.
f(x = 1) = 1^2 - 3 = 1 - 3 = -2 < 0
f(x = 2) = 2^2 - 3 = 4 - 3 = 1 > 0
The given function is continuous, and the root lies in the interval [1, 2].
Let "t" be the midpoint of the interval,
i.e., t = (1 + 2)/2
t = 3/2
t = 1.5
Therefore, the value of the function at "t" is
f(t) = f(1.5) = (1.5)^2 - 3 = 2.25 - 3 = -0.75 < 0
f(t) is negative, like f(a), so a is replaced with t = 1.5 for the next iteration.
The remaining iterations proceed in the same way; they are tabulated in Example 3 below, which applies the bisection method to the same function on the same interval.
So, at the seventh iteration, we get the final interval [1.7266, 1.7344]. Hence, 1.7344 is the approximate solution.

Example 3: Find the root of f(x) = x^2 - 3. Let εstep = 0.01, εabs = 0.01 and start with the interval [1, 2].

Solution:
Table 1. Bisection method applied to f(x) = x^2 - 3.

   a        b        f(a)      f(b)     c = (a + b)/2   f(c)      Update   new b - a
   1.0      2.0      -2.0      1.0      1.5             -0.75     a = c    0.5
   1.5      2.0      -0.75     1.0      1.75            0.0625    b = c    0.25
   1.5      1.75     -0.75     0.0625   1.625           -0.3594   a = c    0.125
   1.625    1.75     -0.3594   0.0625   1.6875          -0.1523   a = c    0.0625
   1.6875   1.75     -0.1523   0.0625   1.7188          -0.0457   a = c    0.0313
   1.7188   1.75     -0.0457   0.0625   1.7344          0.0081    b = c    0.0156
   1.7188   1.7344   -0.0457   0.0081   1.7266          -0.0189   a = c    0.0078
Thus, with the seventh iteration, we note that the final interval, [1.7266, 1.7344], has a width less than 0.01 and |f(1.7344)| < 0.01, and therefore we choose b = 1.7344 to be our approximation of the root.

Example 4: Find the root of f(x) = e^(-x) (3.2 sin(x) - 0.5 cos(x)) on the interval [3, 4], this time with εstep = 0.001, εabs = 0.001.

Solution:
Table 2. Bisection method applied to f(x) = e^(-x) (3.2 sin(x) - 0.5 cos(x)).

   a        b        f(a)          f(b)           c = (a + b)/2   f(c)           Update   new b - a
   3.0      4.0      0.047127      -0.038372      3.5             -0.019757      b = c    0.5
   3.0      3.5      0.047127      -0.019757      3.25            0.0058479      a = c    0.25
   3.25     3.5      0.0058479     -0.019757      3.375           -0.0086808     b = c    0.125
   3.25     3.375    0.0058479     -0.0086808     3.3125          -0.0018773     b = c    0.0625
   3.25     3.3125   0.0058479     -0.0018773     3.2812          0.0018739      a = c    0.0313
   3.2812   3.3125   0.0018739     -0.0018773     3.2968          -0.000024791   b = c    0.0156
   3.2812   3.2968   0.0018739     -0.000024791   3.289           0.00091736     a = c    0.0078
   3.289    3.2968   0.00091736    -0.000024791   3.2929          0.00044352     a = c    0.0039
   3.2929   3.2968   0.00044352    -0.000024791   3.2948          0.00021466     a = c    0.002
   3.2948   3.2968   0.00021466    -0.000024791   3.2958          0.000094077    a = c    0.001
   3.2958   3.2968   0.000094077   -0.000024791   3.2963          0.000034799    a = c    0.0005

Thus, after the 11th iteration, we note that the final interval, [3.2958, 3.2968], has a width less than 0.001 and |f(3.2968)| < 0.001, and therefore we choose b = 3.2968 to be our approximation of the root.
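The procedure of Section 2.2.1 is straightforward to program. The following Python sketch is illustrative only (the function name bisection and the tolerances eps_step and eps_abs are our own choices, not part of the text); it stops either when the bracketing interval becomes sufficiently small or when |f(c)| is nearly zero, the two criteria used in Examples 3 and 4 above.

def bisection(f, a, b, eps_step=1e-4, eps_abs=1e-4, max_iter=100):
    """Approximate a root of f in [a, b] by repeated interval halving.

    Assumes f is continuous and f(a), f(b) have opposite signs
    (the hypothesis of the intermediate value theorem).
    """
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2.0                         # midpoint of the current interval
        if abs(f(c)) < eps_abs or (b - a) / 2.0 < eps_step:
            return c                              # f(c) nearly zero or interval small enough
        if f(a) * f(c) < 0:                       # root lies in [a, c]
            b = c
        else:                                     # root lies in [c, b]
            a = c
    return (a + b) / 2.0

# Example 1 above: f(x) = x^3 - x - 1 on [1, 2]  -> approx. 1.3247
print(bisection(lambda x: x**3 - x - 1, 1, 2))
# Example 3 above: f(x) = x^2 - 3 on [1, 2]      -> approx. 1.7344
print(bisection(lambda x: x**2 - 3, 1, 2, eps_step=0.01, eps_abs=0.01))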
2.3 METHOD OF FALSE POSITION

Let the root lie in the interval (a, b). Then P(a, f(a)) and Q(b, f(b)) are points on the curve. Join the points P and Q. The point of intersection of this line with the X-axis, c, is taken as the next approximation to the root. We determine, by the intermediate value theorem, whether the root now lies in (a, c) or in (c, b), and we repeat the procedure on that subinterval. If x0, x1, x2, ... are the successive approximations, then we stop the iteration when |x_{k+1} - x_k| < the given error tolerance.

Fig 2.1: Graphical representation of the Method of False Position

The equation of the chord joining (a, f(a)) and (b, f(b)) is

    y - f(a) = [(f(b) - f(a)) / (b - a)] (x - a)

Setting y = 0, we get the point of intersection with the X-axis as

    c = a - f(a)(b - a) / (f(b) - f(a)) = [a f(b) - b f(a)] / [f(b) - f(a)]        ... (1)

If we denote x0 = a, x1 = b, then the iteration formula can be written as

    x2 = x0 - f(x0)·(x1 - x0) / (f(x1) - f(x0))        ... (2)

and at each subsequent step the same formula is applied to the end points of the subinterval that brackets the root.

The rate of convergence is still linear but faster than that of the bisection method. Both these methods will fail if f has a double root.
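Before turning to the worked examples, here is a short Python sketch of the regula falsi iteration described above (again illustrative; the name false_position and the stopping tolerance tol are our own choices). One end point is replaced by the chord's intersection point c at every step, and the iteration stops when successive approximations differ by less than the given tolerance.

def false_position(f, a, b, tol=1e-4, max_iter=100):
    """Regula falsi: approximate a root of f bracketed by [a, b].

    Assumes f is continuous and f(a), f(b) have opposite signs.
    """
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c_old = a
    for _ in range(max_iter):
        # Intersection of the chord through (a, f(a)) and (b, f(b)) with the x-axis
        c = (a * f(b) - b * f(a)) / (f(b) - f(a))
        if abs(c - c_old) < tol:                  # |x_{k+1} - x_k| < tolerance
            return c
        if f(a) * f(c) < 0:                       # root lies in (a, c)
            b = c
        else:                                     # root lies in (c, b)
            a = c
        c_old = c
    return c

# Example 1 below: f(x) = x^2 - 1 on (0, 2)       -> 0.9990 (true root x = 1)
print(false_position(lambda x: x**2 - 1, 0, 2, tol=0.005))
# Example 2 below: f(x) = x^3 - x - 1 on (1, 2)   -> approx. 1.3247
print(false_position(lambda x: x**3 - x - 1, 1, 2))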
Example 1: Obtain the positive root of the equation x^2 - 1 = 0 by the Regula Falsi method.

Solution: Let f(x) = x^2 - 1. Since f(0) = -1 and f(2) = 3, let us take it that the root lies in (0, 2).

We have x0 = 0, x1 = 2.

Then, using (2), we get

    x2 = 0 - (-1)·(2 - 0)/(3 - (-1)) = 0.5
    x3 = 0.5 - (-0.75)·(2 - 0.5)/(3 - (-0.75)) = 0.8
    x4 = 0.8 - (-0.36)·(2 - 0.8)/(3 - (-0.36)) = 0.92857

We obtain the next approximations as x5 = 0.9756, x6 = 0.9918, x7 = 0.9973, x8 = 0.9990. Since |x8 - x7| = 0.0017 < 0.005, the approximation x8 = 0.9990 is correct to two decimal places.

Note that in this problem the lower end of the interval tends to the root and the minimum error tends to zero, but the upper limit and the maximum error remain fixed. In other problems, the opposite may happen. This is a characteristic property of the regula falsi method.

Example 2: Find a root of the equation f(x) = x^3 - x - 1 using the False Position method.

Solution:
Here x^3 - x - 1 = 0
Let f(x) = x^3 - x - 1

   x      0    1    2
   f(x)  -1   -1    5

1st iteration :
Here f(1) = -1 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1 and x1 = 2
x2 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x2 = 1 - (-1)·(2 - 1)/(5 - (-1))
x2 = 1.16667
f(x2) = f(1.16667) = -0.5787 < 0

2nd iteration :
Here f(1.16667) = -0.5787 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1.16667 and x1 = 2
x3 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x3 = 1.16667 - (-0.5787)·(2 - 1.16667)/(5 - (-0.5787))
x3 = 1.25311
f(x3) = f(1.25311) = -0.28536 < 0

3rd iteration :
Here f(1.25311) = -0.28536 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1.25311 and x1 = 2
x4 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x4 = 1.25311 - (-0.28536)·(2 - 1.25311)/(5 - (-0.28536))
x4 = 1.29344
f(x4) = f(1.29344) = -0.12954 < 0

4th iteration :
Here f(1.29344) = -0.12954 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1.29344 and x1 = 2
x5 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x5 = 1.29344 - (-0.12954)·(2 - 1.29344)/(5 - (-0.12954))
x5 = 1.31128
f(x5) = f(1.31128) = -0.05659 < 0

5th iteration :
Here f(1.31128) = -0.05659 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1.31128 and x1 = 2
x6 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x6 = 1.31128 - (-0.05659)·(2 - 1.31128)/(5 - (-0.05659))
x6 = 1.31899
f(x6) = f(1.31899) = -0.0243 < 0

6th iteration :
Here f(1.31899) = -0.0243 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1.31899 and x1 = 2
x7 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x7 = 1.31899 - (-0.0243)·(2 - 1.31899)/(5 - (-0.0243))
x7 = 1.32228
f(x7) = f(1.32228) = -0.01036 < 0

7th iteration :
Here f(1.32228) = -0.01036 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1.32228 and x1 = 2
x8 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x8 = 1.32228 - (-0.01036)·(2 - 1.32228)/(5 - (-0.01036))
x8 = 1.32368
f(x8) = f(1.32368) = -0.0044 < 0

8th iteration :
Here f(1.32368) = -0.0044 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1.32368 and x1 = 2
x9 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x9 = 1.32368 - (-0.0044)·(2 - 1.32368)/(5 - (-0.0044))
x9 = 1.32428
f(x9) = f(1.32428) = -0.00187 < 0

9th iteration :
Here f(1.32428) = -0.00187 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1.32428 and x1 = 2
x10 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x10 = 1.32428 - (-0.00187)·(2 - 1.32428)/(5 - (-0.00187))
x10 = 1.32453
f(x10) = f(1.32453) = -0.00079 < 0

10th iteration :
Here f(1.32453) = -0.00079 < 0 and f(2) = 5 > 0
∴ Now, Root lies between x0 = 1.32453 and x1 = 2
x11 = x0 - f(x0)·(x1 - x0)/(f(x1) - f(x0))
x11 = 1.32453 - (-0.00079)·(2 - 1.32453)/(5 - (-0.00079))
x11 = 1.32464
f(x11) = f(1.32464) = -0.00034 < 0

Approximate root of the equation x^3 - x - 1 = 0 using the False Position method is 1.32464.

2.4 NEWTON-RAPHSON METHOD

This method is also called Newton's method. We assume that f is a differentiable function in some interval [a, b] containing the root. The graph of y = f(x) is plotted in Figure 2.2 (a). The point where the curve crosses the x-axis, x = r, is the required root.
Fig 2.2 (a): Graph of y = f(x)

Let x0 be an initial approximation of r. Then (x0, f(x0)) is a point on the curve.

Fig 2.2 (b): (x0, f(x0)) point on the curve

Draw the tangent line to the curve at the point (x0, f(x0)). This line intersects the x-axis at a new point, say x1.

Now, x1 is a better approximation to r than x0. We now repeat this process, yielding new points x2, x3, ... until we are "close enough" to r. Figure 2.2 (c) shows one more iteration of this process, determining x2.

Fig 2.2 (c): One more iteration determining x2

Now, we derive this method algebraically. The equation of the tangent at (x0, f(x0)) is given by

    y - f(x0) = f'(x0)(x - x0)
where f'(x0) is the slope of the curve at (x0, f(x0)). Setting y = 0, we get the point of intersection of the tangent with the x-axis as

    x = x0 - f(x0)/f'(x0)

But this is our next approximation, that is,

    x1 = x0 - f(x0)/f'(x0)

Iterating this process, we get the Newton-Raphson iteration as

    x_{k+1} = x_k - f(x_k)/f'(x_k),   k = 0, 1, 2, ...

Example 1: Find the smallest positive root of x^7 + 9x^5 - 13x - 17 = 0.

Solution: Let f(x) = x^7 + 9x^5 - 13x - 17. We have f(0) < 0, f(1) < 0 and f(1)·f(2) < 0.
Hence, the smallest positive root lies in (1, 2). We can take any value in (1, 2), or one of the end points, as the initial approximation. Let x0 = 1. We have
f'(x) = 7x^6 + 45x^4 - 13.
The Newton-Raphson iteration becomes

    x_{k+1} = x_k - (x_k^7 + 9 x_k^5 - 13 x_k - 17) / (7 x_k^6 + 45 x_k^4 - 13)

Starting with x0 = 1, we obtain the successive values x1 = 1.5128, x2 = 1.3407, x3 = 1.2694, x4 = 1.2583, x5 = 1.2581, x6 = 1.2581.
After 6 iterations of Newton's method, we have the required root, correct to four decimal places, as x ≈ 1.2581.
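The iteration derived above is easy to code. The Python sketch below is illustrative (the function name newton_raphson and the stopping tolerance tol are our own choices); the derivative is supplied explicitly, as in the worked example. Applied to f(x) = x^7 + 9x^5 - 13x - 17 with x0 = 1, it reproduces the iterates listed above and converges to approximately 1.2581.

def newton_raphson(f, df, x0, tol=1e-6, max_iter=50):
    """Newton-Raphson iteration: x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        dfx = df(x)
        if dfx == 0:                              # tangent is horizontal; the method breaks down
            raise ZeroDivisionError("f'(x) vanished at x = %g" % x)
        x_new = x - fx / dfx                      # intersection of the tangent with the x-axis
        if abs(x_new - x) < tol:                  # successive iterates are close enough
            return x_new
        x = x_new
    return x

# Example 1: smallest positive root of x^7 + 9x^5 - 13x - 17 = 0, starting at x0 = 1
f  = lambda x: x**7 + 9*x**5 - 13*x - 17
df = lambda x: 7*x**6 + 45*x**4 - 13
print(newton_raphson(f, df, 1.0))                 # converges to approximately 1.2581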
                                
                                