
CU-BCA-SEM-III-PROBABILITY AND STATICS- Second Draft-converted

Published by Teamlease Edtech Ltd (Amita Chitroda), 2021-05-10 06:50:16


7.1 INTRODUCTION
In this section we will first discuss correlation analysis, which is used to quantify the association between two continuous variables (e.g., between an independent and a dependent variable, or between two independent variables). Regression analysis is a related technique used to assess the relationship between an outcome variable and one or more risk factors or confounding variables. The outcome variable is also called the response or dependent variable, and the risk factors and confounders are called the predictors, or explanatory or independent variables. In regression analysis, the dependent variable is denoted "y" and the independent variables are denoted "x".

In correlation analysis, we estimate a sample correlation coefficient, more specifically the Pearson product-moment correlation coefficient. The sample correlation coefficient, denoted r, ranges between -1 and +1 and quantifies the direction and strength of the linear association between the two variables. The correlation between two variables can be positive (i.e., higher levels of one variable are associated with higher levels of the other) or negative (i.e., higher levels of one variable are associated with lower levels of the other). The sign of the correlation coefficient indicates the direction of the association, and its magnitude indicates the strength of the association. For example, a correlation of r = 0.9 suggests a strong, positive association between two variables, whereas a correlation of r = -0.2 suggests a weak, negative association. A correlation close to zero suggests no linear association between two continuous variables.

7.2 TYPES OF CORRELATION
Correlation can be classified in several different ways. Three of the important ways are:
1. Positive or negative.
2. Simple, partial and multiple.
3. Linear and non-linear.
Positive or Negative Correlation
Correlation can be positive (direct) or negative (inverse); its type depends on the direction of change of the variables. If both variables vary in the same direction, the correlation is positive: as one variable increases, the other, on average, also increases, or as one decreases, the other, on average, also decreases. If the variables vary in opposite directions, the correlation is negative: as one variable increases the other decreases, or vice versa. The following examples illustrate the concept.
151 CU IDOL SELF LEARNING MATERIAL (SLM)

Positive Correlation:
X: 80 70 60 40 30
Y: 50 44 30 20 10

Negative Correlation:
X: 100 90 60 40 30
Y:  10 20 30 40 50

7.3 SIMPLE, PARTIAL AND MULTIPLE CORRELATIONS
The distinction between simple, partial and multiple correlation is based on the number of variables studied. When only two variables are studied, it is a problem of simple correlation. When three or more variables are studied, it is a problem of either multiple or partial correlation. For example, when we study the relationship between the yield of wheat per acre and both the total rainfall and the amount of fertilizer used, it is a problem of multiple correlation. In partial correlation, by contrast, we recognize more than two variables but consider only two of them as influencing each other, while the effect of the other influencing variables is kept constant. For example, in the wheat problem above, if we limit our correlation analysis of yield and rainfall to periods when a certain average daily temperature existed, it becomes a problem of partial correlation.

Simple Correlation: The simple sample correlation coefficient is

$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}}$$

or, if the intermediate quantities $SS_x = \sum X^2 - n\bar{X}^2$, $SS_y = \sum Y^2 - n\bar{Y}^2$ and $S_{xy} = \sum XY - n\bar{X}\bar{Y}$ are already available, we can write

$$r = \frac{S_{xy}}{\sqrt{SS_x\,SS_y}}$$
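As a quick numerical illustration (a Python sketch added for this edition, not part of the original material), the function below evaluates the simple correlation formula r = S_xy / √(SS_x · SS_y) directly from the sums defined above. The function name `pearson_r` is our own, hypothetical choice. Applied to the positive and negative correlation data of Section 7.2, it should give values close to +1 and -1 respectively.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Simple sample correlation r = S_xy / sqrt(SS_x * SS_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum(x * x for x in xs) - n * x_bar ** 2        # SS_x
    ss_y = sum(y * y for y in ys) - n * y_bar ** 2        # SS_y
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar  # S_xy
    return s_xy / sqrt(ss_x * ss_y)


# Data from the positive and negative correlation illustrations
r_pos = pearson_r([80, 70, 60, 40, 30], [50, 44, 30, 20, 10])
r_neg = pearson_r([100, 90, 60, 40, 30], [10, 20, 30, 40, 50])
print(round(r_pos, 3), round(r_neg, 3))  # 0.989 -0.985
```

The "shortcut" sums are used here only because they mirror the textbook formula; for large values, subtracting n·x̄² from ΣX² can lose floating-point precision, and centring the data first is numerically safer.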

Of course, since the coefficient of determination is

$$R^2 = \frac{\left(\sum XY - n\bar{X}\bar{Y}\right)^2}{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)} = \frac{S_{xy}^2}{SS_x\,SS_y},$$

we have $r^2 = R^2$, and it is often easier to compute

$$r = \sqrt{\frac{S_{xy}^2}{SS_x\,SS_y}}$$

and to give the correlation the sign of $S_{xy}$. But note that the correlation can range from -1 to +1, while the coefficient of determination can only range from 0 to 1. Also note that since the slope in simple regression is

$$b_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{S_{xy}}{SS_x},$$

we have

$$R^2 = b_1^2\,\frac{s_x^2}{s_y^2}, \qquad b_1^2 = \frac{s_y^2}{s_x^2}\,R^2, \qquad b_1 = \frac{s_y}{s_x}\,r,$$

where $s_x$ and $s_y$ are the sample standard deviations of X and Y. The last equation has a counterpart in

$$\beta_1 = \frac{\sigma_y}{\sigma_x}\,\rho,$$

where $\rho$ is the population correlation coefficient, so testing $H_0: \beta_1 = 0$ is equivalent to testing $H_0: \rho = 0$, and the simple regression coefficient and the correlation always have the same sign.

7.4 LINEAR AND NON-LINEAR (CURVILINEAR) CORRELATION
The distinction between linear and non-linear correlation is based on the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. For example, observe the following two variables X and Y:

X: 10  20  30  40  50
Y: 70 140 210 280 350

The ratio of change between the two variables is the same throughout: every increase of 10 in X goes with an increase of 70 in Y. If we plot these variables on graph paper, all the plotted points fall on a straight line. Correlation is called non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. For example, if we double the rainfall, the production of rice or wheat may increase, but not necessarily in the same ratio. It may be pointed out that in most practical situations we find a non-linear relationship. However, techniques for measuring non-linear correlation are more complicated than those for linear correlation.
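The identities r² = R² and b₁ = (s_y / s_x) r, and the behaviour of the linear example, can be checked numerically. The Python sketch below is illustrative only: `sums_of_squares` is a hypothetical helper name, the positive-correlation data of Section 7.2 are reused, and `statistics.stdev` (sample standard deviation with divisor n - 1) stands in for s_x and s_y. Because the same divisor appears in both, it cancels in the ratio s_y / s_x.

```python
from math import sqrt
from statistics import stdev


def sums_of_squares(xs, ys):
    """Return (SS_x, SS_y, S_xy) as defined in the text."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum(x * x for x in xs) - n * x_bar ** 2
    ss_y = sum(y * y for y in ys) - n * y_bar ** 2
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    return ss_x, ss_y, s_xy


# Positive-correlation data from Section 7.2
xs = [80, 70, 60, 40, 30]
ys = [50, 44, 30, 20, 10]
ss_x, ss_y, s_xy = sums_of_squares(xs, ys)

r = s_xy / sqrt(ss_x * ss_y)
r_squared = s_xy ** 2 / (ss_x * ss_y)  # coefficient of determination R^2
b1 = s_xy / ss_x                       # slope of the regression of Y on X

print(abs(r ** 2 - r_squared) < 1e-12)              # True: r^2 = R^2
print(abs(b1 - r * stdev(ys) / stdev(xs)) < 1e-12)  # True: b1 = (s_y / s_x) r

# Linear example of Section 7.4: each rise of 10 in X goes with a rise of 70 in Y,
# so the fitted slope is 7 and the correlation is exactly 1.
lss_x, lss_y, ls_xy = sums_of_squares([10, 20, 30, 40, 50], [70, 140, 210, 280, 350])
print(ls_xy / lss_x, ls_xy / sqrt(lss_x * lss_y))  # 7.0 1.0
```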
We generally assume that the relationship between the variables is linear. The following two diagrams illustrate the difference between linear and curvilinear correlation:

Figure 7.1: Difference between Linear and Curvilinear Correlation

7.5 RANK CORRELATION
Spearman's Rank Correlation Coefficient is a different way of describing the strength of the correlation between two quantities.

Idea: Consider these three scatter graphs:
[Figure: three scatter graphs in each of which y increases as x increases]
In each of these scatter graphs, y increases each time that x increases. Spearman's Rank Correlation Coefficient equals 1 in each of these cases.

Now consider the following three scatter graphs:

[Figure: three scatter graphs in each of which y decreases as x increases]
In each of these, y decreases each time that x increases, and Spearman's Rank Correlation Coefficient equals -1 in each case.

Sometimes there is no connection between the two quantities, as in the following scatter graph:
[Figure: a scatter graph with no apparent pattern]
In this case, Spearman's Rank Correlation Coefficient would be close to 0.

Example: To calculate a Spearman rank-order correlation on data without any ties, we will use the following data:

Marks
English: 56 75 45 71 62 64 58 80 76 61
Maths:   66 70 40 60 65 56 59 77 67 63

Solution:

We then complete the following table:

English   Maths    Rank       Rank      d    d²
(mark)    (mark)   (English)  (Maths)
56        66        9          4        5    25
75        70        3          2        1     1
45        40       10         10        0     0
71        60        4          7        3     9
62        65        6          5        1     1
64        56        5          9        4    16
58        59        8          8        0     0
80        77        1          1        0     0
76        67        2          3        1     1
61        63        7          6        1     1

where d = difference between ranks and d² = difference squared. We then calculate

$$\sum d^2 = 25 + 1 + 0 + 9 + 1 + 16 + 0 + 0 + 1 + 1 = 54$$

We then substitute this into the main equation with the other information (n = 10) as follows:

$$\rho = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 54}{10\left(10^2 - 1\right)} = 1 - \frac{324}{990} \approx 0.67$$
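The ranking step and the no-ties formula can be sketched in Python as follows. This is an illustrative sketch: the helper names are hypothetical, the ranking helper gives rank 1 to the highest mark (as in the table above), and it assumes there are no tied values. With ties, average ranks would be needed and the 6Σd² shortcut is no longer exact.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_no_ties(x, y):
    """Return (rho, sum of d^2) using rho = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return rho, sum_d2


english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]
rho, sum_d2 = spearman_no_ties(english, maths)
print(sum_d2, round(rho, 2))  # 54 0.67

# Any strictly increasing relationship (even a curved one) gives rho = 1,
# and any strictly decreasing one gives rho = -1, as in the scatter graphs.
xs = [1, 2, 3, 4, 5, 6]
print(spearman_no_ties(xs, [v ** 3 for v in xs])[0])  # 1.0
print(spearman_no_ties(xs, [-v for v in xs])[0])      # -1.0
```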

7.6 COEFFICIENT OF CORRELATION
Karl Pearson's Coefficient of Correlation: Of the several mathematical methods of measuring correlation, Karl Pearson's method is the most widely used. The symbol r represents the Pearson coefficient of correlation. The formula for computing Pearson's r is:

$$r = \frac{\sum xy}{N\sigma_x\sigma_y}$$

where
$x = X - \bar{X}$ and $y = Y - \bar{Y}$
$\sigma_x$ = standard deviation of series X
$\sigma_y$ = standard deviation of series Y
N = number of pairs of observations
r = the (product-moment) correlation coefficient

This method is to be applied only where deviations of items are taken from the actual mean and not from an assumed mean. The value of the coefficient of correlation obtained by the above formula always lies between -1 and +1. When r = +1, there is perfect positive correlation between the variables. When r = -1, there is perfect negative correlation between the variables. When r = 0, there is no linear relationship between the two variables. In practice, values of r such as +1, -1 and 0 are rare; we normally get values lying between +1 and -1, such as +0.8 or -0.26. The coefficient of correlation describes not only the magnitude of correlation but also its direction. Thus, +0.8 means that the correlation is positive, because the sign of r is +, and that the magnitude of correlation is 0.8. Similarly, -0.26 means a low degree of negative correlation.

The above formula for computing Pearson's coefficient of correlation can be transformed into the following form, which is easier to apply:

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}, \qquad x = X - \bar{X}, \quad y = Y - \bar{Y}$$

Example 1: Calculate Karl Pearson's coefficient of correlation from the following data:

Roll No. of Students:   1   2   3   4   5
Marks in Accountancy:  48  35  17  23  47
Marks in Statistics:   45  20  40  25  45

Solution: Let marks in Accountancy be denoted by X and marks in Statistics by Y.

Roll No.   X    x = X - 34   x²     Y    y = Y - 35   y²     xy
1          48      +14       196    45      +10       100   +140
2          35       +1         1    20      -15       225    -15
3          17      -17       289    40       +5        25    -85
4          23      -11       121    25      -10       100   +110
5          47      +13       169    45      +10       100   +130
Total   ΣX = 170  Σx = 0  Σx² = 776  ΣY = 175  Σy = 0  Σy² = 550  Σxy = 280

Here $\bar{X} = \sum X / N = 170/5 = 34$ and $\bar{Y} = \sum Y / N = 175/5 = 35$, with $\sum xy = 280$, $\sum x^2 = 776$ and $\sum y^2 = 550$. Therefore

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \frac{280}{\sqrt{776 \times 550}} = 0.429$$

Direct Method of Finding out Correlation Coefficient
The correlation coefficient can also be calculated without taking deviations of items from either the actual mean or an assumed mean, i.e., directly from the actual X and Y values. The formula in such a case is:

$$r = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}}$$

This formula gives the same answer as we get when deviations of items are taken from the actual mean or an assumed mean. The following example illustrates the point.

Example 2: Calculate the correlation coefficient from the data below by the direct method, i.e., without taking the deviations of items from the actual or assumed mean.

Solution: Calculation of Correlation Coefficient (Direct Method)

X    X²    Y    Y²     XY
9    81    15   225    135
8    64    16   256    128
7    49    14   196     98
6    36    13   169     78
5    25    11   121     55
4    16    12   144     48
3     9    10   100     30
2     4     8    64     16
1     1     9    81      9
ΣX = 45   ΣX² = 285   ΣY = 108   ΣY² = 1,356   ΣXY = 597

Here N = 9, so

$$r = \frac{9 \times 597 - (45)(108)}{\sqrt{9 \times 285 - (45)^2}\,\sqrt{9 \times 1356 - (108)^2}} = \frac{513}{\sqrt{540}\,\sqrt{540}} = 0.95$$

When Deviations are taken from an Assumed Mean:
When the actual means are fractions, say 20.167 and 29.23 for the X and Y series, calculating the correlation by the method discussed above would involve many calculations and take a lot of time. In such a case we use the assumed mean method. When deviations are taken from an assumed mean, the following formula is applicable:

$$r = \frac{N\sum d_x d_y - \left(\sum d_x\right)\left(\sum d_y\right)}{\sqrt{N\sum d_x^2 - \left(\sum d_x\right)^2}\,\sqrt{N\sum d_y^2 - \left(\sum d_y\right)^2}}$$

where
$d_x$ = deviations of the X series from an assumed mean
$d_y$ = deviations of the Y series from an assumed mean
$\sum d_x$ = sum of the deviations of the X series from its assumed mean
$\sum d_y$ = sum of the deviations of the Y series from its assumed mean
$\sum d_x d_y$ = sum of the products of the deviations of the X and Y series from their assumed means
$\sum d_x^2$ = sum of the squares of the deviations of the X series from its assumed mean
$\sum d_y^2$ = sum of the squares of the deviations of the Y series from its assumed mean

It may be pointed out that there are many variations of the above formula.

Example 3: Calculate the coefficient of correlation between X and Y from the following data, taking 69 and 112 as the assumed mean values of X and Y respectively.

X:  78  89  99  60  59  79  68  61
Y: 125 137 156 112 107 136 123 108

Solution. Calculation of Correlation Coefficient

X    dx = X - 69   dx²    Y    dy = Y - 112   dy²    dx·dy
78       +9         81   125      +13         169    +117
89      +20        400   137      +25         625    +500
99      +30        900   156      +44        1936   +1320
60       -9         81   112        0           0       0
59      -10        100   107       -5          25     +50
79      +10        100   136      +24         576    +240
68       -1          1   123      +11         121     -11
61       -8         64   108       -4          16     +32
ΣX = 593  Σdx = 41  Σdx² = 1727  ΣY = 1004  Σdy = 108  Σdy² = 3468  Σdxdy = 2248

$$r = \frac{(8)(2248) - (41)(108)}{\sqrt{(8)(1727) - (41)^2}\,\sqrt{(8)(3468) - (108)^2}} = \frac{13556}{\sqrt{12135}\,\sqrt{16080}} = +0.97$$

7.7 INTRODUCTION TO REGRESSION
After establishing that two variables are closely related, we may be interested in estimating (predicting) the value of one variable given the value of the other. For example, if we know that advertising and sales are correlated, we can find the expected sales for a given advertising expenditure, or the advertising expenditure required to attain a given level of sales. Similarly, if we know that the yield of rice and rainfall are closely related, we can find the amount of rain required to achieve a certain production figure. Regression analysis reveals the average relationship between two variables, and this makes estimation or prediction possible.

The dictionary meaning of the term 'regression' is the act of returning or going back. The term was introduced by Sir Francis Galton (1822-1911) while studying the relationship between the heights of fathers and sons, in his paper 'Regression towards Mediocrity in Hereditary Stature' (1886). His study of the heights of about one thousand fathers and sons revealed an interesting relationship: tall fathers tend to have tall sons and short fathers tend to have short sons, but the average height of the sons of a group of tall fathers is less than that of the fathers, and the average height of the sons of a group of short fathers is greater than that of the fathers. Galton called the line describing this tendency to regress, or go back, a 'Regression Line'.
The term is still used to describe a line drawn through a group of points to represent the trend present, but it no longer necessarily carries the original implication of "stepping back" that Galton intended. Some modern writers prefer the term estimating line to regression line, because it describes the purpose of the line more clearly. It is clear from the above discussion that regression analysis is a statistical device with the help of which we can estimate (or predict) the unknown values of one variable from known values of another variable. The variable used to predict the variable of interest is called the independent or explanatory variable, and the variable we are trying to predict is called the dependent or "explained" variable. The independent variable is denoted by X and the dependent variable by Y. The analysis is called simple linear regression analysis: simple because there is only one predictor (independent variable), and linear because of the assumed linear relationship between the dependent and independent variables. The term "linear" means that an equation of a straight line of the form Y = a + bX, where a and b are constants, is used to describe the average relationship between the two variables.

It should be noted that the terms 'dependent' and 'independent' refer to the mathematical or functional meaning of dependence; they do not imply that there is necessarily any cause-and-effect relationship between the variables. What is meant is simply that estimates of values of the dependent variable Y may be obtained for given values of the independent variable X from a mathematical function involving X and Y. In that sense, the values of Y depend upon the values of X. The X variable may or may not be causing change in the Y variable. For example, while estimating sales of a product from figures on advertising expenditure, sales are generally taken as the dependent variable. However, there may or may not be a causal connection between these two factors, in the sense that changes in advertising expenditure cause changes in sales. In fact, in certain cases, the cause-effect relation may be just the opposite of what appears to be the obvious one.

7.8 USES OF REGRESSION
Regression analysis is a branch of statistical theory that is widely used in almost all scientific disciplines.
It is widely used in economics, where it is considered a basic technique for measuring or estimating the relationships among the economic variables that constitute the essence of economic theory and economic life. For example, if we know that two variables, price (X) and demand (Y), are closely related, we can find the most probable value of X for a given value of Y, or the most probable value of Y for a given value of X. Similarly, if we know that the amount of tax and the rise in the price of a commodity are closely related, we can find the expected price for a certain tax levy. Thus, the study of regression is of considerable help to economists and businessmen. Its uses are not confined to economics and business; its applications extend to almost all fields of study. Regression analysis attempts to accomplish the following:

1. Regression analysis provides estimates of values of the dependent variable from values of the independent variable. To do this, we use the regression line, which describes the average relationship between the X and Y variables: it displays the mean values of Y for given values of X. The equation of this line, known as the regression equation, provides estimates of the dependent variable, which are obtained by inserting values of the independent variable into the equation.

2. A second goal of regression analysis is to obtain a measure of the error involved in using the regression line as a basis for estimation. For this we calculate the standard error of estimate, which measures the scatter or spread of the observed values of Y around the corresponding values predicted from the regression line. If the line fits the data closely, that is, if there is little scatter of the observations around the regression line, it gives good estimates of the Y variable.

3. The regression coefficients help in calculating the correlation coefficient, since $r^2 = b_{yx} \times b_{xy}$. The square of the correlation coefficient, $r^2$ (the coefficient of determination), measures the proportion of variance in the dependent variable that is accounted for by the regression equation. In general, the greater the value of $r^2$, the better the fit and the more useful the regression equation as a predictive device.

Example 1: From the following data obtain the two regression equations:

X:  6   2  10   4   8
Y:  9  11   5   8   7

Solution: Obtaining Regression Equations

X     Y     XY    X²     Y²
6     9     54    36     81
2    11     22     4    121
10    5     50   100     25
4     8     32    16     64
8     7     56    64     49
ΣX = 30   ΣY = 40   ΣXY = 214   ΣX² = 220   ΣY² = 340

Regression line of Y on X: Y = a + bX, where a and b are found from the normal equations

$$\sum Y = Na + b\sum X, \qquad \sum XY = a\sum X + b\sum X^2$$

Substituting the values,
40 = 5a + 30b ...... (i)
214 = 30a + 220b ...... (ii)

Multiplying equation (i) by 6,
240 = 30a + 180b ...... (iii)
214 = 30a + 220b ...... (iv)

Subtracting equation (iv) from (iii): 26 = -40b, or b = -0.65

Substituting the value of b in (i): 40 = 5a + 30(-0.65), so a = 11.9

Putting the values of a and b in the equation, the regression line of Y on X is:
Y = 11.9 - 0.65X

Regression line of X on Y: X = a + bY, with normal equations

$$\sum X = Na + b\sum Y, \qquad \sum XY = a\sum Y + b\sum Y^2$$

30 = 5a + 40b ...... (i)
214 = 40a + 340b ...... (ii)

Multiplying equation (i) by 8,
240 = 40a + 320b ...... (iii)
214 = 40a + 340b ...... (iv)

Subtracting equation (iv) from (iii): 26 = -20b, or b = -1.3

Substituting the value of b in equation (i): 30 = 5a + 40(-1.3), so a = 16.4
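The two sets of normal equations in Example 1 can also be solved programmatically. The Python sketch below is illustrative: `fit_line` is a hypothetical helper name, and instead of the elimination steps shown above it uses the closed-form solution of the same 2×2 system, so it should reproduce the same coefficients. It also recovers r from the fact that the product of the two regression coefficients equals r², taking the common sign of the two slopes.

```python
from math import copysign, sqrt


def fit_line(u, v):
    """Least-squares line v = a + b*u from the normal equations
    sum(v) = N*a + b*sum(u) and sum(u*v) = a*sum(u) + b*sum(u**2)."""
    n = len(u)
    s_u, s_v = sum(u), sum(v)
    s_uv = sum(p * q for p, q in zip(u, v))
    s_uu = sum(p * p for p in u)
    # Eliminating a from the two normal equations gives b directly.
    b = (n * s_uv - s_u * s_v) / (n * s_uu - s_u ** 2)
    a = (s_v - b * s_u) / n
    return a, b


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

a_yx, b_yx = fit_line(X, Y)  # regression of Y on X
a_xy, b_xy = fit_line(Y, X)  # regression of X on Y
print(round(a_yx, 2), round(b_yx, 2))  # 11.9 -0.65
print(round(a_xy, 2), round(b_xy, 2))  # 16.4 -1.3

# The product of the two regression coefficients is r^2; r shares their sign.
r = copysign(sqrt(b_yx * b_xy), b_yx)
print(round(r, 3))  # -0.919
```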

The regression line of X on Y is:
X = 16.4 - 1.3Y

Use of multiple regression
(i) You can use this statistical technique when exploring linear relationships between the predictor and criterion variables, that is, when the relationship follows a straight line.
(ii) The criterion variable that you are seeking to predict should be measured on a continuous scale (such as an interval or ratio scale). A separate regression method, called logistic regression, can be used for dichotomous dependent variables.
(iii) The predictor variables that you select should be measured on a ratio, interval or ordinal scale. A nominal predictor variable is legitimate only if it is dichotomous, i.e., it has no more than two categories. For example, sex is acceptable (with male coded as 1 and female as 0), but gender identity (masculine, feminine and androgynous) could not be coded as a single variable. Instead, you would create separate dichotomous (0/1) variables; with three categories, two such variables are enough, the omitted category acting as the reference group. The term dummy variable is used to describe this type of dichotomous variable.
(iv) Multiple regression requires a large number of observations. The number of cases (participants) must substantially exceed the number of predictor variables used in the regression; the absolute minimum is five times as many participants as predictor variables.

7.9 SUMMARY
• Correlation quantifies the strength of the linear relationship between a pair of variables, whereas regression expresses the relationship in the form of an equation.
• Regression analysis is a statistical technique for studying linear relationships.
• The correlation coefficient is best used for variables that have a linear relationship with each other.
• Correlation analysis is a statistical method used to evaluate the strength of the relationship between two quantitative variables.

7.10 KEYWORDS
• Paired data: data in which the same measurement is taken twice on each subject, under different experimental conditions.

• Correlation coefficient: a measure of the strength of association between two variables.
• Mean: the average of the numbers.
• Standard deviation: a statistic that measures the dispersion of a dataset relative to its mean; it is calculated as the square root of the variance.

7.11 LEARNING ACTIVITY
Students will learn the definitions of correlation and regression and will be encouraged to work together to obtain a class average. Students are introduced to statistics and why it is important in daily life. The scores of nine students in physics and mathematics are as follows:

Physics:     35, 23, 47, 17, 10, 43, 9, 6, 28
Mathematics: 30, 33, 45, 23, 8, 49, 12, 4, 31

Compute the students' ranks in the two subjects and compute the Spearman rank correlation.
______________________________________________________________________________
_________________________________________________________________

7.12 UNIT END QUESTIONS
A. Descriptive Questions
Short Questions
1. How is regression calculated?
2. Differentiate between simple and multiple correlation.
3. What is the regression technique? Explain its various uses.
4. What is rank correlation?
5. How do you rank in Spearman's rank correlation coefficient?
6. How do you interpret a rank correlation?
Long Questions
1. Find Karl Pearson's correlation coefficient between X and Y from the following data:

X:  78  89  97  69  59  79  61  61
Y: 125 137 156 112 107 136 123 108

2. The scores of nine students in physics and mathematics are as follows:
Physics:     35, 23, 47, 17, 10, 43, 9, 6, 28
Mathematics: 30, 33, 45, 23, 8, 49, 12, 4, 31
Compute the students' ranks in the two subjects and compute the Spearman rank correlation.

3. The number of visitors to a cycle track and the number of drinks sold by a café at the location are recorded in the table below.

Day:                 Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
Number of visitors:    32      45        39         43       58       84       65
Drinks sold:           17      20        23          7       24       49       38

4. The following table shows the mean weight in kilograms of members of a group of young children of various ages.

Age (x years):   1.6   2.5   3.3   4.4   5.6
Weight (y kg):    12    15    16    17    20

The relationship between the variables is modelled by the regression line with equation y = ax + b.
(a) Find the value of a and of b.
(b) Write down the correlation coefficient.
(c) Use your equation to estimate the mean weight of a child that is four years old.

5. The following table shows the average weights for given heights in a population of men. Find the correlation between the variables.

Heights (x cm):   160    165    170    175    180    185
Weights (y kg):  65.1   67.9   70.1   72.8   75.4   77.2

B. Multiple Choice Questions
1. In the straight-line graph of the linear equation Y = a + bX, the slope will be downward if:
a. b > 0
b. b < 0
c. b = 0
d. b ≠ 0
2. If the regression line of Y on X is Y = 5, then the value of the regression coefficient of Y on X is:
a. 0
b. 0.5
c. 1
d. 5
3. In the regression equation Y = a + bX, Y is called the:
a. Independent variable
b. Dependent variable
c. Continuous variable
d. None of these
4. The process of constructing a mathematical model or function that can be used to predict or determine one variable from another variable is called:
a. residual
b. correlation
c. regression
d. outlier plot

5. For a data set, the regression equation is Y = 21 - 3X. The correlation coefficient for this data:
a. is negative
b. must be 0
c. must be 1
d. is positive

Answers: 1. b, 2. a, 3. b, 4. c, 5. a

UNIT 8: RELATIONSHIP BETWEEN CORRELATION AND REGRESSION ANALYSIS

Structure
8.0 Learning Objectives
8.1 Introduction
8.2 Difference Between Correlation and Regression
8.3 Summary
8.4 Keywords
8.5 Learning Activity
8.6 Unit End Questions
8.7 References

8.0 LEARNING OBJECTIVES
After studying this unit, students will be able to:
• Identify the direction and strength of a linear correlation between two factors.
• Interpret the Pearson correlation coefficient and the coefficient of determination, and test for significance.
• Identify and explain three assumptions and three limitations for evaluating a correlation coefficient.
• Delineate the use of the Spearman, point-biserial and phi correlation coefficients.
• Distinguish between a predictor variable and a criterion variable.
• Identify each source of variation in an analysis of regression.

8.1 INTRODUCTION
There are some key differences between correlation and regression that are important in understanding the two.
• Regression describes how y changes, on average, as x changes, and the results change if x and y are swapped. With correlation, x and y can be interchanged and the result is the same.

• Correlation is a single statistic, whereas regression produces an entire equation, represented by a line fitted through the data points.
• Correlation shows the strength and direction of the relationship between the two variables, while regression shows how one variable changes, on average, with the other.
• With regression, a change in one variable is associated with a predicted change in the other, not always in the same direction; this is dependence in the functional sense and does not by itself establish cause and effect. With correlation, the variables are simply measured as moving together.

Similarities between correlation and regression
In addition to the differences, there are some key similarities between correlation and regression that can help you to better understand your data.
• Both work to quantify the direction and strength of the relationship between two numeric variables.
• Whenever the correlation is negative, the regression slope (the line within the graph) is also negative.
• Whenever the correlation is positive, the regression slope (the line within the graph) is also positive.

8.2 DIFFERENCE BETWEEN CORRELATION AND REGRESSION
Regression Analysis
Regression analysis refers to assessing the relationship between an outcome variable and one or more other variables. The outcome variable is known as the dependent or response variable, and the risk factors and confounders are known as predictors or independent variables. In regression analysis, the dependent variable is denoted by "y" and the independent variables by "x".

In correlation analysis, a sample correlation coefficient is estimated. Denoted by r, it ranges between -1 and +1 and quantifies the strength and direction of the linear association between two variables. The correlation between two variables can be either positive, i.e., a higher level of one variable is related to a higher level of the other, or negative, i.e., a higher level of one variable is related to a lower level of the other. The sign of the coefficient of correlation shows the direction of the association, and its magnitude shows the strength of the association. For example, a correlation of r = 0.8 indicates a strong positive association between two variables, while a correlation of r = -0.3 indicates a weak negative association. A correlation near zero indicates the absence of a linear association between two continuous variables.

Linear Regression
Linear regression is a linear approach to modelling the relationship between a scalar dependent variable and one or more independent variables. If the regression has one independent variable, it is known as simple linear regression; if it has more than one independent variable, it is known as multiple linear regression. Linear regression focuses on the conditional probability distribution of the dependent variable given the values of the predictors, rather than on the joint probability distribution of all the variables. In practice, most real-world regression models involve multiple predictors, so the term linear regression often refers to multiple linear regression.

Correlation and Regression Differences
Fig 8.1 Correlation and Regression Differences
There are some differences between correlation and regression.
• Correlation quantifies the degree to which two variables are associated. It does not fit a line through the data points. A correlation shows how much one variable tends to change when the other changes. When r is 0.0, there is no linear relationship. When r is positive, one variable tends to increase as the other increases. When r is negative, one variable tends to decrease as the other increases.
• Linear regression finds the best line that predicts y from x, but correlation does not fit a line.

• Correlation is used when both variables are measured, while linear regression is mostly applied when x is a variable that is manipulated.
Comparison Between Correlation and Regression
| Basis | Correlation | Regression |
|---|---|---|
| Meaning | A statistical measure that defines the co-relationship or association of two variables. | Describes how an independent variable is associated with the dependent variable. |
| Dependent and independent variables | No difference; both variables are treated alike. | The two variables play different roles (one dependent, one independent). |
| Usage | To describe a linear relationship between two variables. | To fit the best line and estimate one variable based on another variable. |
| Objective | To find a value expressing the relationship between variables. | To estimate values of a random variable based on the values of a fixed variable. |
Table 8.1 Correlation and Regression Differences
Correlation and Regression Statistics
The degree of association is measured by the correlation coefficient r, sometimes called Pearson's correlation coefficient after its originator, Karl Pearson; it is a measure of linear association. If a curved line is needed to represent the relationship, other, more complicated measures of correlation must be used.
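Linear regression, as described above, fits the line that best predicts y from x. Assuming the usual least-squares criterion, the slope of the regression line of y on x is b = S_xy / S_xx and the intercept is a = ȳ − b x̄. A minimal Python sketch follows; `least_squares_line` is a hypothetical helper name and the toy data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares regression line of y on x: y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be paired and contain at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("the x values must not all be equal")
    b = s_xy / s_xx          # slope: regression coefficient of y on x
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(a, b)        # 1.0 2.0
print(a + b * 6)   # predicted y at x = 6: 13.0
```

Because b and r share the numerator S_xy, the slope always has the same sign as the correlation coefficient, which is the similarity between correlation and regression noted in this unit.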

Fig 8.2 Correlation and Regression Statistics
The figure above represents correlation. The correlation coefficient is measured on a scale that varies from +1 through 0 to -1. Perfect correlation between two variables is represented by either +1 or -1. The correlation is positive when one variable increases as the other increases, and negative when one decreases as the other increases. A value of 0 indicates the absence of linear correlation. The differences between correlation and regression are illustrated below:

Fig 8.2 Differences between correlation and regression
8.3 SUMMARY
• Correlation is a single statistic (one number), whereas regression produces an entire equation, represented by a line fitted through all of the data points.
• Correlation shows the relationship between the two variables, while regression describes how one variable changes as the other changes.
• Regression analysis is a statistical forecasting model concerned with describing and evaluating the relationship between a given variable (usually called the dependent variable) and one or more other variables (usually known as the independent variables).
8.4 KEYWORDS
• Standard deviation: A statistic that measures the dispersion of a dataset relative to its mean; it is calculated as the square root of the variance.
• Correlation coefficient: A measure of the strength of association between two variables.
• Paired data: Two measurements of the same quantity taken on the same subject, but under different experimental conditions.
8.5 LEARNING ACTIVITY
1. Even a high degree of correlation does not mean that a relationship of cause and effect exists between the two correlated variables. Discuss.
______________________________________________________________________________
_________________________________________________________________
8.6 UNIT END QUESTIONS
A. Descriptive Questions
Short Questions
1. What is correlation? Explain its different types.
2. Differentiate between simple and multiple correlation.
3. What is the regression technique? Explain its various uses.
4. If X and Y are independent random variables, show that they are uncorrelated. Is the converse true? Justify your answer.

5. What is regression? Give an example.
Long Questions
1. What is the relationship between correlation and regression analysis?
2. What is a correlation problem?
3. What are the limitations of correlation analysis?
4. How is correlation different from regression? Explain with examples.
5. Find Karl Pearson's correlation coefficient between X and Y from the following data:
X: 78 89 97 69 59 79 61 61
Y: 125 137 156 112 107 136 123 108
B. Multiple Choice Questions
1. A process by which we estimate the value of a dependent variable on the basis of one or more independent variables is called:
a. Correlation
b. Regression
c. Residual
d. Slope
2. The slope of the regression line of Y on X is also called the:
a. Correlation coefficient of X on Y
b. Correlation coefficient of Y on X
c. Regression coefficient of X on Y
d. Regression coefficient of Y on X
3. If ρ = 0, the lines of regression are:
a. Coincident
b. Parallel
c. Perpendicular to each other

d. None of these
4. In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
a. By 1
b. No change
c. By its slope
d. By the intercept
5. Suppose that the R-squared value of a bivariate regression of Y on X is 0.64. Which of the following is necessarily true?
a. The correlation coefficient is 0
b. The correlation coefficient is either 0.8 or -0.8
c. The correlation coefficient is -0.8
d. The correlation coefficient between X and Y is 0.8
Answers
1. b, 2. d, 3. c, 4. c, 5. b

UNIT 9: PROBABILITY
Structure
9.0 Learning Objectives
9.1 Introduction
9.2 Probability
9.3 Calculation of Probability
9.4 Theorems of Probability
9.5 Conditional Probability
9.6 Summary
9.7 Keywords
9.8 Learning Activity
9.9 Unit End Questions
9.10 References
9.0 LEARNING OBJECTIVES
After studying this unit students will be able to:
• Explain the concept of probability
• Calculate the probability of simple events
• Calculate the probability of compound events
• Calculate the probability of complementary events
• Compute probabilities of certain events by applying the properties of a binomial random variable.
9.1 INTRODUCTION
We use the words 'probability' and 'chance' very commonly in our day-to-day conversation, but people generally have only a vague idea of their meaning. For example, in a weather forecast we may hear, "There is a probability of heavy rainfall tomorrow"; or "Both teams A and B have a chance of winning tomorrow's match"; "Probably you may be right"; "It is possible that I may not be able to come to your residence for the get-together". In the above statements, terms like probability, chances, possible and likely all convey the same sense, i.e., the event is not certain to take place; in other words, there is uncertainty about the happening of the event. In layman's terminology the word "probability" thus connotes uncertainty about the happening of an event. However, in mathematics and statistics we try to present conditions under which we can make sensible numerical statements about uncertainty, and we apply certain methods of calculating numerical values of probabilities and expectations. In the statistical sense the term probability is thus established by definition and is not connected with beliefs or any form of wishful thinking.
Probability theory is applied in the solution of social, economic, political and business problems. The insurance industry, which emerged in the 19th century, required precise knowledge about the risk of loss in order to calculate premiums. Within a few decades many learning centres were studying probability as a tool for understanding social phenomena. Today the concept of probability has assumed great importance, and the mathematical theory of probability has become the basis for statistical applications in both social and decision-making research. In fact, probability has become a part of our everyday life. In personal and management decisions, we face uncertainty and use probability theory, whether or not we admit the use of something so sophisticated. To quote Levin, we live in a world in which we are unable to forecast the future with complete certainty, and our need to cope with uncertainty leads us to the study and use of probability theory. In many instances we, as concerned citizens, will have some knowledge about the possible outcomes of a decision.
By organizing this information and considering it systematically, we will be able to recognize our assumptions, communicate our reasoning to others and make a sounder decision than we could by using a shot-in-the-dark approach. Probability theory, in fact, is the foundation of statistical inference.
Concept
The probability of a given event is the chance that the event will occur; it expresses the likelihood of the event. For example, when a die is thrown, what is the chance that it shows a 6? There are six possible outcomes, ranging from 1 to 6. If the die shows a 6, the event has occurred; otherwise, it has not. A probability of 0 is assigned to an event that cannot occur and a probability of 1 to an event that is certain to occur. How numbers between these limits are assigned depends on the interpretation of the term 'probability'. Many people associate probability and chance with nebulous and mystic ideas.
9.2 PROBABILITY
The classical approach to probability is the oldest and simplest approach. The term probability was used in the eighteenth century in problems related to games of chance, like

throwing of coins, dice or a deck of cards. The classical theory assumes that the outcomes of a random experiment are "equally likely". The "event" whose probability is sought consists of one or more possible outcomes of the given activity; for example, when a die is rolled once, any one of the six possible outcomes, i.e., 1, 2, 3, 4, 5, 6, can occur. In modern terminology these activities are referred to as "experiments", a term that refers to processes which result in different possible outcomes or observations. The term "equally likely", though undefined, conveys the notion that each outcome of an experiment has the same chance of appearing as any other. Thus, in a throw of a die, the occurrences of 1, 2, 3, 4, 5 and 6 are equally likely events.
The definition of probability given by the French mathematician Laplace, and generally adopted by disciples of the classical school, runs as follows: probability is the ratio of the number of "favourable" cases to the total number of equally likely cases. If the probability of occurrence of A is denoted by P(A), then by this definition we have:
$$P(A) = \frac{\text{Number of favourable cases}}{\text{Total number of equally likely cases}}$$
To calculate probability, we should find out two things:
1. Number of favourable cases.
2. Total number of equally likely cases.
For example, if we toss a coin, there are two equally likely results, i.e., a head or a tail. Thus, the probability of a head is 1/2. Similarly, if a die is thrown, the probability of obtaining an even number is 3/6 or 1/2, as three of the six equally likely results (2, 4, 6) give an even number.
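The classical definition can be checked by listing the sample space and counting favourable cases. The Python sketch below assumes that every listed outcome is equally likely, which is exactly the classical assumption; `classical_probability` is our own illustrative name, and `Fraction` from the standard library keeps the answers exact.

```python
from fractions import Fraction


def classical_probability(sample_space, event):
    """P(A) = favourable cases / total cases, assuming all outcomes are equally likely."""
    outcomes = list(sample_space)
    if not outcomes:
        raise ValueError("the sample space must not be empty")
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


coin = ["H", "T"]
die = range(1, 7)
bag = ["red"] * 10 + ["white"] * 20  # one entry per ball, so each ball is equally likely

print(classical_probability(coin, lambda side: side == "H"))   # 1/2
print(classical_probability(die, lambda face: face % 2 == 0))  # 1/2
print(classical_probability(bag, lambda ball: ball == "red"))  # 1/3
```

Listing the bag as 30 separate balls, rather than as two colours, is the design choice that makes the outcomes equally likely; counting "red" and "white" as two outcomes would wrongly give 1/2.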
Symbolically, if an event A can happen in 'a' ways out of a total of 'n' equally likely and mutually exclusive ways, then the probability of occurrence of the event (called its success) is
$$p = P(A) = \frac{a}{n}$$
and the probability of non-occurrence of the event (called its failure) is
$$q = P(\bar{A}) = \frac{n - a}{n} = 1 - P(A)$$
Example 1: From a bag containing 10 red and 20 white balls, a ball is drawn at random. What is the probability that it is red?
Solution: Total number of balls in the bag = 10 + 20 = 30

Number of red balls = 10
Probability of getting a red ball, p = P(A) = a/n = 10/30 = 1/3
Probability of not getting a red ball, q = 20/30 = 2/3
p + q = 1/3 + 2/3 = 1
9.3 CALCULATION OF PROBABILITY
Before we discuss the procedure of calculating probability, it is necessary to define certain terms, as given below.
Experiment and Events
The term experiment refers to an act which can be repeated under specified conditions and whose results can be recorded to study their behaviour under similar conditions. Random experiments are those experiments whose results depend on chance, such as tossing a coin or throwing a die. The results of a random experiment are called outcomes. If in an experiment all the possible outcomes are known in advance but none of the outcomes can be predicted with certainty, then the experiment is called a random experiment, and its outcomes are called events or chance events. Events are generally denoted by capital letters A, B, C, etc. An event whose occurrence is inevitable when a certain random experiment is performed is called a certain or sure event. An event which can never occur when a certain random experiment is performed is called an impossible event. For example, in a throw of a balanced die, the occurrence of any one of the numbers 1, 2, 3, 4, 5, 6 is a sure event, while the occurrence of 8 is an impossible event. An event which may or may not occur while performing a certain random experiment is known as a random event. The occurrence of 2 is a random event in the above experiment of throwing a die.
Mutually Exclusive Events
Two events are said to be mutually exclusive or incompatible when both cannot happen simultaneously in a single trial; in other words, the occurrence of any one of them precludes the occurrence of the other. For example, if a single coin is tossed, then we can get either a head

or a tail; we can never get both a head and a tail at the same time. Similarly, a person may be either alive or dead at a point of time; he cannot be both alive and dead at the same time. To take another example, if we toss a die and observe 3, we cannot also observe 5 in the same toss. Symbolically, if A and B are mutually exclusive events, P(AB) = 0. The diagram given below illustrates the meaning of mutually exclusive events.
It may be pointed out that mutually exclusive events can always be connected by the words "either ... or". Events A, B, C are mutually exclusive only if no two of them can occur together, i.e., at most one of A, B or C can occur in a single trial.
Equally Likely Events
Events are said to be equally likely when no one of them is expected to occur more often than the others. If an unbiased coin or die is thrown, each face may be expected to be observed about the same number of times in the long run. Similarly, the cards of a pack of playing cards are so alike that we expect each card to appear equally often when a large number of draws are made with replacement. However, if the coin or the die is biased, we should not expect each face to appear the same number of times.
Simple and Compound Events
In the case of simple events, we consider the probability of the happening or not happening of single events. For example, we might be interested in finding the probability of drawing a red ball from a bag containing 10 white and 6 red balls. In the case of compound events, on the other hand, we consider the joint occurrence of two or more events. For example, if a bag contains 10 white and 6 red balls and two successive draws of 3 balls each are made, we may want the probability of getting 3 white balls in the first draw and 3 red balls in the second draw; we are then dealing with a compound event.
Exhaustive Events
Events are said to be exhaustive when their totality includes all the possible outcomes of a random experiment.
For example, when a die is thrown, the possible outcomes are 1, 2, 3, 4, 5 and 6, and hence the exhaustive number of cases is 6. If two dice are thrown once, there are 6 × 6 = 36 possible outcomes:

Figure 9.1: Exhaustive Events
Complementary Events
Two events A and B are called complementary (A is the complementary event of B, and vice versa) if A and B are mutually exclusive and exhaustive. For example, when a die is thrown, the occurrence of an even number (2, 4, 6) and the occurrence of an odd number (1, 3, 5) are complementary events. The simultaneous occurrence of two events A and B is generally written as AB.
9.4 THEOREMS OF PROBABILITY
There are two important theorems of probability, namely:
1. The Addition Theorem; and
2. The Multiplication Theorem.
Addition Theorem
The addition theorem states that if two events A and B are mutually exclusive, the probability of the occurrence of either A or B is the sum of the individual probabilities of A and B. Symbolically,
$$P(A \text{ or } B) = P(A) + P(B)$$
Example 1: One card is drawn from a standard pack of 52. What is the probability that it is either a king or a queen?
Solution: There are 4 kings and 4 queens in a pack of 52 cards.
The probability that the card drawn is a king = 4/52
and the probability that the card drawn is a queen = 4/52

Since the events are mutually exclusive, the probability that the card drawn is either a king or a queen = 4/52 + 4/52 = 8/52 = 2/13.
When events are not mutually exclusive, in other words when it is possible for both events to occur together, the addition rule must be modified. For example, what is the probability of drawing either a king or a heart from a standard pack of cards? The events king and heart can occur together, since we can draw the king of hearts; king and heart are not mutually exclusive events. We must therefore deduct, from the sum of the probabilities of drawing a king and of drawing a heart, the chance of drawing both together. Hence, to find the probability of one or more of two events that are not mutually exclusive, we use the modified form of the addition theorem:
$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$
where
P(A or B) = probability of A or B happening when A and B are not mutually exclusive
P(A) = probability of A happening
P(B) = probability of B happening
P(A and B), also written P(AB) = probability of A and B happening together
In the example taken, the probability of drawing a king or a heart is:
P(king or heart) = P(king) + P(heart) - P(king and heart) = 4/52 + 13/52 - 1/52 = 16/52 = 4/13
In the case of three events:
$$P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)$$
Example 2: The Managing Committee of Vaishali Welfare Association formed a sub-committee of 5 persons to look into the electricity problem. Profiles of the 5 persons are:
1. Male, age 40
2. Male, age 43
3. Female, age 38
4. Female, age 27
5. Male, age 65
If a chairperson has to be selected from this sub-committee, what is the probability that the person selected would be either female or over 30 years of age?
Solution:

P(female or over 30) = P(female) + P(over 30) - P(female and over 30) = 2/5 + 4/5 - 1/5 = 5/5 = 1
Example 3: Calculate the probability of picking a card that is a heart or a spade. Comment on your answer.
Solution: Using the addition rule,
P(heart or spade) = P(heart) + P(spade) - P(heart and spade) = 13/52 + 13/52 - 0/52 = 26/52 = 1/2
The probability that a card will be both a heart and a spade is zero, since each card belongs to one and only one suit. The intersection in this case is empty; it is called the null set because it contains no outcomes, since heart and spade cannot occur simultaneously on the same card.
Example 4: What is the probability of picking a card that is red or black?
Solution: P(red or black) = P(red) + P(black)
Since there are 26 red and 26 black cards, the required probability is 26/52 + 26/52 = 52/52 = 1.
The probability of red or black adds up to 1, which means that this event is certain to happen.
Example 5: A bag contains 30 balls numbered from 1 to 30. One ball is drawn at random. Find the probability that the number of the ball drawn will be a multiple of (a) 5 or 7, and (b) 3 or 7.
Solution: The probability of the number being a multiple of 5 is P(5, 10, 15, 20, 25, 30) = 6/30
The probability of the number being a multiple of 7 is P(7, 14, 21, 28) = 4/30

Since no number from 1 to 30 is a multiple of both 5 and 7 (the smallest common multiple is 35), these events are mutually exclusive, and the probability of the number being a multiple of 5 or 7 is: 6/30 + 4/30 = 10/30 = 1/3
The probability of the number being a multiple of 3 is P(3, 6, 9, 12, 15, 18, 21, 24, 27, 30) = 10/30
The probability of the number being a multiple of 7 is P(7, 14, 21, 28) = 4/30
Since 21 is a multiple of both 3 and 7, drawing the ball numbered 21 entails the occurrence of both events, and hence the probability of getting a number which is a multiple of 3 or 7 is: 10/30 + 4/30 - 1/30 = 13/30
Multiplication Theorem
This theorem states that if two events A and B are independent, then the probability of their occurring together is equal to the product of their individual probabilities. Symbolically, if A and B are independent, then
$$P(A \text{ and } B) = P(A) \times P(B)$$
The theorem can be extended to three or more independent events. Thus,
$$P(A, B \text{ and } C) = P(A) \times P(B) \times P(C)$$
Example 1: A man wants to marry a girl having the following qualities: white complexion (the probability of finding such a girl is one in twenty); handsome dowry (the probability of getting this is one in fifty); and westernized manners and etiquette (the probability here is one in a hundred). Find the probability of his marrying such a girl when the possession of these three attributes is independent.
Solution:
Probability of a girl with white complexion = 1/20 = 0.05
Probability of a girl with handsome dowry = 1/50 = 0.02
Probability of a girl with westernized manners = 1/100 = 0.01

Since the events are independent, the probability of the simultaneous occurrence of all these qualities = (1/20) × (1/50) × (1/100) = 0.00001
Example 2: A box contains 7 red and 3 white marbles. Three marbles are drawn from the box one after the other without replacement. Find the probability of drawing three marbles in alternate colours, with the first marble being red.
Solution: The event of interest is drawing the marbles in alternate colours with the first being red. This event can occur only when the marbles are drawn in the order (Red, White, Red). If A and C represent the events of drawing red marbles in the first and third draws respectively, and B is the event of drawing a white marble in the second draw, then the required event is A ∩ B ∩ C. The probability of A ∩ B ∩ C can be calculated by applying
$$P(A \cap B \cap C) = P(A)\, P(B \mid A)\, P(C \mid A \cap B)$$
Since there are 7 red and 3 white marbles in the box for the first draw, P(A) = 7/10.
Now, there will be 6 red and 3 white marbles in the box for the second draw if the event A has occurred. Hence, P(B | A) = 3/9.
Similarly, there will be 6 red and 2 white marbles in the box for the third draw if the events A and B have occurred. Hence, P(C | A ∩ B) = 6/8, and
P(A ∩ B ∩ C) = (7/10) × (3/9) × (6/8) = 7/40
is the required probability of drawing three marbles in alternate colours with the first marble being red.
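To see why the chain rule P(A) P(B | A) P(C | A ∩ B) works in Example 2, one can count every ordered draw of three marbles directly. This brute-force Python sketch labels each marble individually (an assumption that makes all 10 × 9 × 8 = 720 ordered draws equally likely) and compares the count with the chain-rule product.

```python
from fractions import Fraction
from itertools import permutations

# Label each marble by its position so that every ordered draw is equally likely.
marbles = ["R"] * 7 + ["W"] * 3

# All ordered ways of drawing 3 distinct marbles without replacement.
draws = list(permutations(range(len(marbles)), 3))
favourable = sum(
    1 for draw in draws if [marbles[i] for i in draw] == ["R", "W", "R"]
)

p_enumerated = Fraction(favourable, len(draws))
p_chain_rule = Fraction(7, 10) * Fraction(3, 9) * Fraction(6, 8)

print(len(draws), favourable)      # 720 126
print(p_enumerated, p_chain_rule)  # 7/40 7/40
```

The favourable count is 7 × 3 × 6 = 126, the same three factors that appear as numerators in the chain rule, which is why both approaches agree.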

Example 3: There are 13 boys and 6 girls in a class. Four students are selected randomly one after another from the class. Find the probability that (i) all are girls; (ii) the first two are boys and the next two are girls.
Solution:
(i) Let B denote the event that all the randomly selected students are girls. There will be 6 girls among 19 students in total while selecting the first student; 5 girls among 18 students while selecting the second; 4 girls among 17 students while selecting the third; and 3 girls among the remaining 16 students while selecting the fourth. Then, by applying Theorem 8.5 for the simultaneous occurrence of these four events, it follows that
P(B) = (6/19) × (5/18) × (4/17) × (3/16) = 5/1292
(ii) Let C denote the event that, among the randomly selected students, the first two are boys and the next two are girls. There will be 13 boys among the 19 students in total while selecting the first student; 12 boys among 18 students while selecting the second; 6 girls among 17 students while selecting the third; and 5 girls among the remaining 16 students while selecting the fourth. Then, by applying Theorem 8.5 for the simultaneous occurrence of these four events, it follows that
P(C) = (13/19) × (12/18) × (6/17) × (5/16) = 65/1292
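The step-by-step reasoning in Example 3 (count what is left in the pool before each draw, then multiply) can be written as a small general helper. This Python sketch assumes every remaining item is equally likely to be drawn; `sequential_probability` and its arguments are hypothetical names, not part of any library.

```python
from fractions import Fraction


def sequential_probability(counts, sequence):
    """Probability of drawing the categories in `sequence`, in that order, without replacement.

    `counts` maps each category to the number of items present at the start.
    Applies P(E1 E2 ... Ek) = P(E1) P(E2 | E1) ... P(Ek | E1 ... Ek-1).
    """
    remaining = dict(counts)
    total = sum(remaining.values())
    probability = Fraction(1)
    for category in sequence:
        available = remaining.get(category, 0)
        if available == 0:
            return Fraction(0)  # the requested item is no longer in the pool
        probability *= Fraction(available, total)
        remaining[category] = available - 1
        total -= 1
    return probability


class_counts = {"boy": 13, "girl": 6}
print(sequential_probability(class_counts, ["girl"] * 4))                    # 5/1292
print(sequential_probability(class_counts, ["boy", "boy", "girl", "girl"]))  # 65/1292
```

The same helper reproduces the marble example as well: `sequential_probability({"R": 7, "W": 3}, ["R", "W", "R"])` evaluates to 7/40.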

Example 4: Two cards are drawn in succession from a pack of 52 cards. Find the probability that both are jacks when the first card drawn is (i) replaced; (ii) not replaced.
Solution: Let A be the event of drawing a jack in the first draw and B the event of drawing a jack in the second draw.
Case (i): Card is replaced
n(A) = 4 (jacks), n(B) = 4 (jacks) and n(S) = 52 (total)
Clearly the event A does not affect the probability of the occurrence of event B, and therefore A and B are independent.
P(A ∩ B) = P(A) · P(B) = (4/52) × (4/52) = 1/169
Case (ii): Card is not replaced

In the first draw, there are 4 jacks among 52 cards in total. Since the jack drawn in the first draw is not replaced, in the second draw there are only 3 jacks among 51 cards in total. Therefore, the first event A affects the probability of the occurrence of the second event B. Thus, A and B are not independent; that is, they are dependent events. Therefore,
P(A ∩ B) = P(A) · P(B | A) = (4/52) × (3/51) = 1/221
Example 5: An unbiased die is thrown. If A is the event 'the number appearing is a multiple of 3' and B is the event 'the number appearing is even', determine whether A and B are independent.
Solution: The sample space is S = {1, 2, 3, 4, 5, 6}.
Now, A = {3, 6} and B = {2, 4, 6}, so A ∩ B = {6}.
Then P(A) = 2/6 = 1/3, P(B) = 3/6 = 1/2 and P(A ∩ B) = 1/6.
Clearly P(A ∩ B) = 1/6 = (1/3) × (1/2) = P(A) P(B).
Hence A and B are independent events.
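Independence can be tested mechanically by comparing P(A ∩ B) with P(A) P(B). The Python sketch below does this with exact fractions for the die in Example 5. The event C, 'number greater than 4', is an extra illustration we have added to show a dependent pair, and the function names are our own.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Classical probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


def are_independent(a, b, sample_space):
    """Check the multiplication rule P(A ∩ B) = P(A) P(B) exactly."""
    return probability(a & b, sample_space) == (
        probability(a, sample_space) * probability(b, sample_space)
    )


S = {1, 2, 3, 4, 5, 6}
A = {3, 6}      # multiple of 3
B = {2, 4, 6}   # even number
C = {5, 6}      # hypothetical extra event: number greater than 4

print(are_independent(A, B, S))  # True:  1/6 == 1/3 * 1/2
print(are_independent(A, C, S))  # False: 1/6 != 1/3 * 1/3
```

Exact fractions are used instead of floats so that the equality test is not disturbed by rounding.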

Example 6: Let P(A) = 3/5 and P(B) = 1/5. Find P(A ∩ B) if A and B are independent events.
Solution: Since A and B are independent events, P(A ∩ B) = P(A) P(B) = (3/5) × (1/5) = 3/25.
Example 7: Three coins are tossed simultaneously. Consider the events A 'three heads or three tails', B 'at least two heads' and C 'at most two heads'. Of the pairs (A, B), (A, C) and (B, C), which are independent? Which are dependent?
Solution: Here the sample space of the experiment is
S = {HHH, HHT, HTH, HTT, THH, TTH, THT, TTT}
A = {three heads or three tails} = {HHH, TTT}
B = {at least two heads} = {HHH, HHT, HTH, THH} and
C = {at most two heads} = {HHT, HTH, HTT, THH, TTH, THT, TTT}
Also A ∩ B = {HHH}, A ∩ C = {TTT} and B ∩ C = {HHT, HTH, THH}

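Here P(A) = 2/8 = 1/4, P(B) = 4/8 = 1/2 and P(C) = 7/8, while P(A ∩ B) = 1/8, P(A ∩ C) = 1/8 and P(B ∩ C) = 3/8. Thus,
P(A ∩ B) = 1/8 = P(A) · P(B),
P(A ∩ C) = 1/8 ≠ 7/32 = P(A) · P(C), and
P(B ∩ C) = 3/8 ≠ 7/16 = P(B) · P(C).
Hence the events A and B are independent, while the events A and C, and B and C, are dependent.
Example 8: A can solve 90 per cent of the problems given in a book and B can solve 70 per cent. What is the probability that at least one of them will solve a problem selected at random?
Solution: The probability that A will be able to solve the problem is P(A) = 90/100, and the probability that B will be able to solve it is P(B) = 70/100. Assuming that A and B work independently, the probability that neither of them solves it is (1 - 90/100) × (1 - 70/100) = (10/100) × (30/100) = 3/100.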

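Hence the probability that at least one of them will solve the problem = 97/100
Example 9: In a shooting test, the probability of hitting the target is 3/4 for A, 1/2 for B and 2/3 for C. If all of them fire at the same target, calculate the probabilities that
(i) all three hit the target;
(ii) only one of them hits the target;
(iii) at least one of them hits the target.
Solution: Assuming the three shots are independent,
(i) P(all three hit) = (3/4) × (1/2) × (2/3) = 1/4
(ii) P(only one hits) = (3/4)(1/2)(1/3) + (1/4)(1/2)(1/3) + (1/4)(1/2)(2/3) = 3/24 + 1/24 + 2/24 = 1/4
(iii) P(at least one hits) = 1 - (1/4)(1/2)(1/3) = 1 - 1/24 = 23/24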
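The answers to Examples 8 and 9 can be cross-checked by listing every success/failure pattern and adding up the probabilities of the patterns that qualify. The Python sketch assumes, as the solutions do, that the shooters (or solvers) act independently; the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def distribution_of_successes(p_success):
    """Yield (number of successes, probability) for every success/failure pattern."""
    people = list(p_success)
    for pattern in product([True, False], repeat=len(people)):
        p = Fraction(1)
        for name, success in zip(people, pattern):
            # Independence: multiply the individual probabilities
            p *= p_success[name] if success else 1 - p_success[name]
        yield sum(pattern), p


def prob_successes(p_success, condition):
    """Total probability of the patterns whose success count satisfies `condition`."""
    return sum(
        (p for k, p in distribution_of_successes(p_success) if condition(k)),
        Fraction(0),
    )


shooters = {"A": Fraction(3, 4), "B": Fraction(1, 2), "C": Fraction(2, 3)}
print(prob_successes(shooters, lambda k: k == 3))  # 1/4
print(prob_successes(shooters, lambda k: k == 1))  # 1/4
print(prob_successes(shooters, lambda k: k >= 1))  # 23/24

solvers = {"A": Fraction(90, 100), "B": Fraction(70, 100)}
print(prob_successes(solvers, lambda k: k >= 1))   # 97/100
```

Enumerating all 2^n patterns is only practical for a handful of people, but it makes the "at least one = 1 - P(none)" shortcut easy to verify.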

9.5 CONDITIONAL PROBABILITY
The multiplication theorem explained above is not applicable in the case of dependent events, so in that case we use conditional probability. Two events A and B are said to be dependent when the occurrence of one affects the probability of occurrence of the other. The probability attached to such an event is called the conditional probability and is denoted by P(A | B), in other words, the probability of A

given that B has occurred. If two events A and B are dependent, then the conditional probability of B given A is:
$$P(B \mid A) = \frac{P(AB)}{P(A)}, \quad P(A) > 0$$
Example 1: A bag contains 5 white and 3 black balls. Two balls are drawn at random one after the other without replacement. Find the probability that both balls drawn are black.
Solution: Probability of drawing a black ball in the first attempt is P(A) = 3/(5 + 3) = 3/8
Probability of drawing the second black ball given that the first ball drawn is black is P(B | A) = 2/(5 + 2) = 2/7
The probability that both balls drawn are black is given by P(AB) = P(A) × P(B | A) = (3/8) × (2/7) = 3/28
Example 2: There are 4000 people living in a village, including 1500 females. Among the people in the village, 1000 are above 25 years of age, including 400 females. Suppose a person is chosen and you are told that the chosen person is a female. What is the probability that her age is above 25 years?
Solution: Here, the event of interest is selecting a female with age above 25 years. In connection with the occurrence of this event, the following two events must happen.
A: the person selected is female
B: the person chosen is above 25 years of age
Case (i):

We are interested in the event B, given that A has occurred. This event is denoted by B | A and read as "B given A". It means that the event A occurs first, and then, under that condition, B occurs. Here we want the probability of B | A, i.e., P(B | A); this probability is called a conditional probability:
P(B | A) = P(A ∩ B)/P(A) = (400/4000)/(1500/4000) = 400/1500 = 4/15
Conversely, the probability of selecting a female given that a person aged above 25 years has been selected is denoted by P(A | B) = 400/1000 = 2/5.
Case (ii): Suppose we are interested in selecting a person who is both female and aged above 25 years. This event is denoted by A ∩ B. Calculating probabilities in such situations requires another theorem, namely the multiplication theorem, which is derived from the definition of conditional probability.
Example 3: A pair of dice is rolled and the faces are noted. Let A: the sum of the faces is odd, B: the sum of the faces exceeds 8, and C: the faces are different. Find (i) P(A | C); (ii) P(B | C).
Solution: The outcomes favourable to the occurrence of these events are
A = {(1,2), (1,4), (1,6), (2,1), (2,3), (2,5), (3,2), (3,4), (3,6), (4,1), (4,3), (4,5), (5,2), (5,4), (5,6), (6,1), (6,3), (6,5)}
B = {(3,6), (4,5), (4,6), (5,4), (5,5), (5,6), (6,3), (6,4), (6,5), (6,6)}
C = {(1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,3), (2,4), (2,5), (2,6), (3,1), (3,2), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,5), (4,6), (5,1), (5,2), (5,3), (5,4), (5,6), (6,1), (6,2), (6,3), (6,4), (6,5)}

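Here A is a subset of C (an odd sum is only possible when the faces are different), so A ∩ C = A. B, however, is not a subset of C, since (5,5) and (6,6) belong to B but not to C; hence B ∩ C = {(3,6), (4,5), (4,6), (5,4), (5,6), (6,3), (6,4), (6,5)}, which has 8 outcomes. With n(S) = 36, n(A) = 18 and n(C) = 30, the probability that the sum of the faces is odd given that the faces are different is
P(A | C) = P(A ∩ C)/P(C) = (18/36)/(30/36) = 18/30 = 3/5
Similarly, the probability that the sum of the faces exceeds 8 given that the faces are different is
P(B | C) = P(B ∩ C)/P(C) = (8/36)/(30/36) = 8/30 = 4/15
Example 4: A die is rolled. If it shows an odd number, find the probability of getting 5.
Solution: Sample space S = {1, 2, 3, 4, 5, 6}. Let A be the event that the die shows an odd number, and B the event of getting 5. Then A = {1, 3, 5}, B = {5} and A ∩ B = {5}, so
P(B | A) = P(A ∩ B)/P(A) = (1/6)/(3/6) = 1/3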
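For equally likely outcomes, P(A | C) reduces to n(A ∩ C)/n(C), so conditional probabilities can be checked by filtering the sample space. The Python sketch below enumerates the 36 outcomes for two dice (and the 6 for one die) and reproduces Examples 3 and 4; the function names are our own.

```python
from fractions import Fraction
from itertools import product


def conditional_probability(event, given, sample_space):
    """P(event | given) = n(event ∩ given) / n(given) for equally likely outcomes."""
    given_outcomes = [o for o in sample_space if given(o)]
    if not given_outcomes:
        raise ValueError("the conditioning event has probability 0")
    both = [o for o in given_outcomes if event(o)]
    return Fraction(len(both), len(given_outcomes))


def odd_sum(o):
    return (o[0] + o[1]) % 2 == 1


def sum_exceeds_8(o):
    return o[0] + o[1] > 8


def faces_differ(o):
    return o[0] != o[1]


two_dice = list(product(range(1, 7), repeat=2))  # the 36 equally likely outcomes
print(conditional_probability(odd_sum, faces_differ, two_dice))        # 3/5
print(conditional_probability(sum_exceeds_8, faces_differ, two_dice))  # 4/15

one_die = range(1, 7)
print(conditional_probability(lambda f: f == 5, lambda f: f % 2 == 1, one_die))  # 1/3
```

Filtering by the condition first mirrors the idea that the conditioning event becomes the new, reduced sample space.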

Example 5: A bag contains 5 white and 3 black balls. Two balls are drawn at random one after the other without replacement. Find the probability that both balls drawn are black.
Solution: Let A and B be the events of getting a black ball in the first and second draws respectively.
Probability of drawing a black ball in the first attempt: P(A) = 3/8
Probability of drawing the second black ball given that the first ball drawn is black: P(B | A) = 2/7
∴ The probability that both balls drawn are black is P(A ∩ B) = P(A) × P(B | A) = (3/8) × (2/7) = 3/28
Example 6: Find the probability of drawing a queen, a king and a knave (jack), in that order, from a pack of cards in three consecutive draws, the cards drawn not being replaced.
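Example 6 can be approached with the same chain rule: P(queen first) × P(king second | queen first) × P(jack third | queen and king drawn) = (4/52)(4/51)(4/50). As a hedged cross-check, assuming a well-shuffled standard 52-card deck, the Python sketch below also counts all 132,600 ordered three-card draws; the rank and suit labels are our own encoding.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for suit in "SHDC" for rank in RANKS]  # 52 distinct cards

total = favourable = 0
for first, second, third in permutations(deck, 3):  # 52 * 51 * 50 ordered draws
    total += 1
    if (first[0], second[0], third[0]) == ("Q", "K", "J"):
        favourable += 1

p_enumerated = Fraction(favourable, total)
p_chain_rule = Fraction(4, 52) * Fraction(4, 51) * Fraction(4, 50)
print(total, favourable)           # 132600 64
print(p_enumerated, p_chain_rule)  # 8/16575 8/16575
```

The enumeration is slow compared with the one-line chain-rule product, but it confirms that conditioning on earlier draws simply shrinks the pool by one card each time.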

