UNIT 14 CORRELATION ANALYSIS

Structure
14.0 Learning Objectives
14.1 Introduction
14.2 Measuring of Correlation
14.3 Short-cut Method
14.4 The Probable Error in the Coefficient of Correlation
14.5 Coefficient of Correlation in a Grouped Data
14.6 Rank Correlation
14.7 Additional Problems
14.8 Summary
14.9 Key Words/Abbreviations
14.10 Learning Activity
14.11 Unit End Questions (MCQ and Descriptive)
14.12 References

14.0 Learning Objectives
After studying this unit, you will be able to:
• Explain the concept of correlation and the methods for measuring correlation.
• Calculate Karl Pearson's coefficient of correlation using the direct method and the short-cut method.
• Calculate the probable error in the coefficient of correlation.

CU IDOL SELF LEARNING MATERIAL (SLM)
Business Mathematics and Statistics

14.1 Introduction
Two statistical series are often observed to vary together, in the sense that a variation in one series is accompanied by a variation in the other. In such a case we can study closely the extent of the relationship between the two series. The question is: if a change in one series is accompanied by a change in the other, how large is that change in relative terms? In short, the numerical relationship between the two series is called correlation. The purpose of this unit is to study the methods of measuring such a relationship between two series and to explain the significance of the numerical relationship.

The first set of figures, that is, the set used as the standard for the purpose of comparison, is called the Subject. The second set of figures is called the Relative.

The following are some examples of related variables: Height and Weight, Demand and Price, Income and Expenditure.

If two variables change in the same direction, the correlation is direct or positive; if the variables change in opposite directions, the correlation is inverse or negative.

14.2 Measuring of Correlation
We can study the correlation between two sets of figures by means of the following methods:
(1) By drawing the graph of the data.
(2) By constructing a Scatter Diagram.
(3) By calculating the coefficient of correlation.

14.2.1 Graphical Method
We represent each of the two sets of figures by plotting the relevant points on the graph and joining the points so plotted. The movements of the two curves so formed are then studied closely. If the curves move in the same direction, the correlation between the two sets of figures is positive. If the curves move in opposite directions, the correlation is negative.
Fig. 14.1: Correlation is Positive
Fig. 14.2: Correlation is Negative

14.2.2 Scatter Diagram
Each corresponding pair of numerical values in the two sets of figures can be represented by a point on the graph. By taking the values of the First Series along the X axis and the values of the Second Series along the Y axis, the entire data can be represented by the plotted points. The figure obtained by plotting the points is called a Scatter Diagram.

Fig. 14.3: Correlation is Positive
Fig. 14.4: Correlation is Negative

By drawing a line as closely as possible to the plotted points, we can study the nature of the correlation. If the line slopes upwards from left to right (see Fig. 14.3), the correlation is positive; if the line slopes downwards from left to right (see Fig. 14.4), the correlation is negative. If the points are scattered widely, there is no correlation (see Fig. 14.5).
Fig. 14.5: Correlation is Absent

14.2.3 Coefficient of Correlation
The extent of correlation between two related variables can be determined by means of a coefficient. Karl Pearson devised a formula for this coefficient, which is as follows:

$$ r = \frac{\sum xy}{n\,\sigma_1\,\sigma_2} $$

where
r = the coefficient of correlation
n = the number of pairs of items
x = deviation of the 1st series (Subject) from its arithmetic mean
y = deviation of the 2nd series (Relative) from its arithmetic mean
$\sigma_1$ = S.D. of the 1st series
$\sigma_2$ = S.D. of the 2nd series

Remarks:
(i) If r = + 1 then correlation is perfect positive.
(ii) If r = – 1 then correlation is perfect negative.
(iii) If r = 0 then there is no correlation.
(iv) If r is equal to some number between 0 and + 1 then correlation is limited and positive.
(v) If r is equal to some negative number between – 1 and 0 then correlation is negative and limited.

Problem 1: Calculate the coefficient of correlation for the following data, by Karl Pearson's Method.
X : 46, 52, 63, 59, 73, 70, 87, 86
Y : 39, 41, 52, 68, 60, 64, 76, 80

Solution:

  X       x      x²       Y       y      y²       xy
  46    – 21     441      39    – 21     441    + 441
  52    – 15     225      41    – 19     361    + 285
  63    –  4      16      52    –  8      64    +  32
  59    –  8      64      68    +  8      64    –  64
  73    +  6      36      60       0       0        0
  70    +  3       9      64    +  4      16    +  12
  87    + 20     400      76    + 16     256    + 320
  86    + 19     361      80    + 20     400    + 380
 536       0    1552     480       0    1602     1406

Here n = 8, $\bar{x} = \frac{536}{8} = 67$ and $\bar{y} = \frac{480}{8} = 60$, so that

$$ \sigma_x = \sqrt{\frac{\sum x^2}{n}} = \sqrt{\frac{1552}{8}}, \qquad \sigma_y = \sqrt{\frac{\sum y^2}{n}} = \sqrt{\frac{1602}{8}}, \qquad \sum xy = 1406 $$

Coefficient of correlation is:

$$ r = \frac{\sum xy}{n\,\sigma_x\,\sigma_y} = \frac{\sum xy}{n\sqrt{\dfrac{\sum x^2}{n}}\sqrt{\dfrac{\sum y^2}{n}}} $$
Simplifying,

$$
\begin{aligned}
r &= \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{1406}{\sqrt{1552 \times 1602}} \\
  &= \operatorname{Antilog}\left[\log 1406 - \tfrac{1}{2}\left(\log 1552 + \log 1602\right)\right] \\
  &= \operatorname{Antilog}\left[3.1479 - \tfrac{1}{2}\left(3.1907 + 3.2047\right)\right] \\
  &= \operatorname{Antilog}\left[3.1479 - 3.1977\right] \\
  &= \operatorname{Antilog}\left(\bar{1}.9502\right) = 0.8917
\end{aligned}
$$

Coefficient of correlation is + 0.89 approximately.

14.3 Short-cut Method
When the arithmetic means of the two sets of numerical items are not whole numbers but involve decimals, calculating the coefficient of correlation by the direct method becomes tedious. To overcome this difficulty, the following short-cut formula is used:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

where
r = coefficient of correlation
n = the number of pairs of items
x = deviation of the 1st series from an arbitrary mean
y = deviation of the 2nd series from an arbitrary mean

Problem 2: Calculate the coefficient of correlation for the following data by using the short-cut method:
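X : 47, 53, 64, 60, 74, 71, 89, 88
Y : 41, 43, 53, 69, 61, 65, 77, 81

Solution: Taking 60 and 69 as the arbitrary means of the X and Y series respectively:

  X       x      x²       Y       y      y²       xy
  47    – 13     169      41    – 28     784    + 364
  53    –  7      49      43    – 26     676    + 182
  64    +  4      16      53    – 16     256    –  64
  60       0       0      69       0       0        0
  74    + 14     196      61    –  8      64    – 112
  71    + 11     121      65    –  4      16    –  44
  89    + 29     841      77    +  8      64    + 232
  88    + 28     784      81    + 12     144    + 336
 Total  + 66    2176            – 62    2004      894

Coefficient of correlation is:

$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \\
  &= \frac{8 \times 894 - (66)(-62)}{\sqrt{\left[8 \times 2176 - 66^2\right]\left[8 \times 2004 - (-62)^2\right]}} \\
  &= \frac{11244}{\sqrt{13052 \times 12188}} \\
  &= \operatorname{Antilog}\left[\log 11244 - \tfrac{1}{2}\left(\log 13052 + \log 12188\right)\right] \\
  &= \operatorname{Antilog}\left[4.0508 - \tfrac{1}{2}\left(4.1155 + 4.0856\right)\right] \\
  &= \operatorname{Antilog}\left[4.0508 - 4.1005\right] \\
  &= \operatorname{Antilog}\left[\bar{1}.9503\right] = 0.8919
\end{aligned}
$$

Coefficient of correlation is + 0.89 approximately.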
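The two formulas used in Problems 1 and 2 can be checked numerically. The following is a minimal Python sketch, not part of the original unit; the function names `pearson_direct` and `pearson_shortcut` are illustrative assumptions. It shows the direct method (deviations from the actual means) alongside the short-cut method (deviations from an arbitrary mean), which should agree on the same data whatever arbitrary means are chosen.

```python
from math import sqrt


def pearson_direct(xs, ys):
    """Direct method: r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    with x and y taken as deviations from the actual arithmetic means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)


def pearson_shortcut(xs, ys, assumed_x, assumed_y):
    """Short-cut method: deviations are taken from arbitrary (assumed) means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_x, sum_y = sum(dx), sum(dy)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data of Problem 1 (direct method)
x1 = [46, 52, 63, 59, 73, 70, 87, 86]
y1 = [39, 41, 52, 68, 60, 64, 76, 80]

# Data of Problem 2 (short-cut method, arbitrary means 60 and 69)
x2 = [47, 53, 64, 60, 74, 71, 89, 88]
y2 = [41, 43, 53, 69, 61, 65, 77, 81]

r1 = pearson_direct(x1, y1)
r2 = pearson_shortcut(x2, y2, 60, 69)
print(f"Problem 1: r = {r1:.4f}")
print(f"Problem 2: r = {r2:.4f}")
```

Both values round to + 0.89. Small differences in the fourth decimal place relative to the worked solutions are to be expected, because the text uses four-figure logarithm tables.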
14.4 The Probable Error in the Coefficient of Correlation
In order to draw precise and correct conclusions about the coefficient of correlation, it is necessary to determine the probable error in the coefficient of correlation. The formula is:

$$ \text{P.E.} = \frac{0.6745\left(1 - r^2\right)}{\sqrt{n}} $$

where
P.E. = probable error
n = number of pairs of items
r = coefficient of correlation

Remarks:
(i) If r is less than the P.E., then there is no evidence of correlation.
(ii) If r is greater than six times the P.E., then there is significant correlation.

If the P.E. is comparatively small relative to the coefficient of correlation, then the following rules hold good:
(i) If r is less than 0.3, correlation is insignificant, i.e., there is not much evidence of correlation.
(ii) If r is more than 0.3, then there is good evidence of correlation.

14.5 Coefficient of Correlation in a Grouped Data
We have already considered the method of finding the coefficient of correlation in the case of two series of individual items. In certain types of data, however, the items are classified into discrete or continuous groups in such a way that some of the items belonging to one group in one series also belong to a particular group in the other series. Such data are said to be grouped data or grouped series. For example, the following data are grouped data.
                            Pay
 Age      150–155  155–160  160–165  165–170  170–175   Total
 25–30       3        5        2        -        -        10
 30–35       2        5        2        3        -        12
 35–40       2        3        4        4        2        15
 40–45       -        2        4        4        3        13
 45–50       -        -        3        4        3        10
 Total       7       15       15       15        8        60

The formula for calculating the coefficient of correlation in a grouped series is as under:

$$ r = \frac{n\sum f\,d_x d_y - \sum f d_x \sum f d_y}{\sqrt{\left[n\sum f d_x^2 - \left(\sum f d_x\right)^2\right]\left[n\sum f d_y^2 - \left(\sum f d_y\right)^2\right]}} $$

where
r = coefficient of correlation
f = frequency
$d_x$ = step deviation of X (1st) series
$d_y$ = step deviation of Y (2nd) series
n = total of frequencies

14.6 Rank Correlation
It is not possible to express attributes such as character, conduct, honesty, beauty, morality, intellectual integrity etc. in numerical terms. Such attributes can, however, be ranked. For example, it is quite easy for a class teacher to arrange the students of a class in ascending or descending order of intelligence, that is, to rank them according to their intelligence. Hence, in problems involving attributes of this type, the method of finding the coefficient of correlation is based entirely on the rank differences between the corresponding items.

The following is the procedure for finding the coefficient of correlation by the method of rank differences:
(1) Firstly, assign ranks to the various items of the two series.
(2) Secondly, find the differences between the ranks of the corresponding items (d).
(3) Thirdly, find the square of each of these differences ($d^2$).
(4) Lastly, apply the formula

$$ r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} $$

where
r = coefficient of correlation
d = difference between the ranks of the corresponding items
n = the number of pairs of items

Problem 3: Calculate the coefficient of correlation by the method of rank differences, from the following data:
X : 43 56 29 81 96 34 73 62 48 76
Y : 15 26 34 86 19 29 83 67 51 58

Solution:

  X     Rx      Y     Ry      d      d²
  43     8     15     10      2       4
  56     6     26      8      2       4
  29    10     34      6      4      16
  81     2     86      1      1       1
  96     1     19      9      8      64
  34     9     29      7      2       4
  73     4     83      2      2       4
  62     5     67      3      2       4
  48     7     51      5      2       4
  76     3     58      4      1       1
                            Total   106

(where Rx, Ry denote ranks in the X and Y series respectively, and d is the numerical difference between them)

Coefficient of correlation is:

$$ r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 106}{10\,(100 - 1)} $$
$$ r = 1 - \frac{636}{990} = \frac{354}{990} = +\,0.36 \text{ approximately.} $$

Problem 4: Calculate the coefficient of correlation from the following data by the method of rank differences:
X : 24 29 23 38 46 52 41 36 68 56
Y : 110 126 145 131 163 158 131 129 154 140

Solution:

  X     Rx      Y      Ry      d       d²
  24     9     110    10       1      1
  29     8     126     9       1      1
  23    10     145     4       6     36
  38     6     131     6.5     0.5    0.25
  46     4     163     1       3      9
  52     3     158     2       1      1
  41     5     131     6.5     1.5    2.25
  36     7     129     8       1      1
  68     1     154     3       2      4
  56     2     140     5       3      9
                             Total   64.50

In this data we find common ranks in the second series. Therefore the formula for the coefficient of correlation by the method of rank differences has to be modified as under:

$$ r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}\left(m_1^3 - m_1\right) + \frac{1}{12}\left(m_2^3 - m_2\right) + \dots\right]}{n\left(n^2 - 1\right)} $$

where m stands for the number of items sharing a common rank in a group. In the given data there is one group of two items with a common rank. Therefore $m_1 = 2$ and

$$ \frac{1}{12}\left(m_1^3 - m_1\right) = \frac{1}{12}\,(8 - 2) = \frac{1}{2} = 0.5; \qquad n = 10; \qquad \sum d^2 = 64.50 $$
Substituting in the modified formula,

$$
\begin{aligned}
r &= 1 - \frac{6\left[\sum d^2 + \frac{1}{12}\left(m_1^3 - m_1\right)\right]}{n\left(n^2 - 1\right)} \\
  &= 1 - \frac{6\,[64.50 + 0.50]}{990} = 1 - \frac{6 \times 65}{990} = 1 - \frac{390}{990} = \frac{600}{990} \\
  &= \operatorname{Antilog}\left[\log 600 - \log 990\right] \\
  &= \operatorname{Antilog}\left[2.7782 - 2.9956\right] \\
  &= \operatorname{Antilog}\left[\bar{1}.7826\right] = 0.6061
\end{aligned}
$$

Coefficient of correlation is + 0.61 approximately.

14.7 Additional Problems
Problem 7: For the data given below find out Karl Pearson's coefficient of correlation:
X : 32 35 41 47 52 55 58 64
Y : 14 26 12 20 31 17 36 28

Solution:

         Deviation                   Deviation
  X     from A.M. 48     x²     Y   from A.M. 23     y²      xy
             x                           y
  32       – 16         256    14       –  9         81     144
  35       – 13         169    26       +  3          9    – 39
  41       –  7          49    12       – 11        121      77
  47       –  1           1    20       –  3          9       3
  52       +  4          16    31       +  8         64      32
  55       +  7          49    17       –  6         36    – 42
  58       + 10         100    36       + 13        169     130
  64       + 16         256    28       +  5         25      80
 384          0         896   184          0        514     385
For X series: A.M. = $\frac{384}{8} = 48$
For Y series: A.M. = $\frac{184}{8} = 23$

Karl Pearson's Coefficient of Correlation:

$$
\begin{aligned}
r &= \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{385}{\sqrt{896 \times 514}} \\
  &= \operatorname{Antilog}\left[\log 385 - \tfrac{1}{2}\left(\log 896 + \log 514\right)\right] \\
  &= \operatorname{Antilog}\left[2.5855 - \tfrac{1}{2}\left(2.9523 + 2.7110\right)\right] \\
  &= \operatorname{Antilog}\left[2.5855 - 2.8316\right] \\
  &= \operatorname{Antilog}\left[\bar{1}.7539\right] = 0.5674
\end{aligned}
$$

[Ans. r = + 0.5674]

Problem 8: Psychological tests of intelligence and of arithmetical ability were applied to 10 children. Here is a record of ungrouped data showing intelligence and arithmetic ratios. Calculate the coefficient of correlation:
Child:              A   B   C   D   E   F   G   H   I   J
Intelligence Ratio: 105 104 102 101 100 99  98  96  93  92
Arithmetic Ratio:   101 103 100 98  95  96  104 92  97  94

Solution:

         Intelligence   Deviation from          Arithmetic   Deviation from
 Child    Ratio (X)      Arith. Mean     x²      Ratio (Y)    Arith. Mean     y²     xy
                              x                                    y
   A        105             + 6          36        101            + 3          9    + 18
   B        104             + 5          25        103            + 5         25    + 25
   C        102             + 3           9        100            + 2          4    +  6
   D        101             + 2           4         98              0          0       0
   E        100             + 1           1         95            – 3          9    –  3
   F         99               0           0         96            – 2          4       0
   G         98             – 1           1        104            + 6         36    –  6
   H         96             – 3           9         92            – 6         36    + 18
   I         93             – 6          36         97            – 1          1    +  6
   J         92             – 7          49         94            – 4         16    + 28
 Total      990               0         170        980              0        140      92

For X series: Arith. Mean = $\frac{990}{10} = 99$
For Y series: Arith. Mean = $\frac{980}{10} = 98$
x = deviation of items in X series from A.M.
y = deviation of items in Y series from A.M.

Calculations show that: $\sum x^2 = 170$, $\sum y^2 = 140$, $\sum xy = +\,92$.

The coefficient of correlation is:

$$
\begin{aligned}
r &= \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{92}{\sqrt{170 \times 140}} \\
  &= \operatorname{Antilog}\left[\log 92 - \tfrac{1}{2}\left(\log 170 + \log 140\right)\right] \\
  &= \operatorname{Antilog}\left[1.9638 - \tfrac{1}{2}\left(2.2304 + 2.1461\right)\right] \\
  &= \operatorname{Antilog}\left[1.9638 - 2.1882\right] \\
  &= \operatorname{Antilog}\left[\bar{1}.7756\right] = 0.5965 \text{ approx.}
\end{aligned}
$$

[Ans: r = + 0.5965]

Problem 9: Calculate Karl Pearson's coefficient of correlation from the following data:
X : 55 59 63 68 56 73 82 76 64 74
Y : 60 62 55 54 63 72 78 79 65 82
Solution:

  X       x      x²       Y       y      y²      xy
  55    – 12     144      60    –  7      49      84
  59    –  8      64      62    –  5      25      40
  63    –  4      16      55    – 12     144      48
  68    +  1       1      54    – 13     169    – 13
  56    – 11     121      63    –  4      16      44
  73    +  6      36      72    +  5      25      30
  82    + 15     225      78    + 11     121     165
  76    +  9      81      79    + 12     144     108
  64    –  3       9      65    –  2       4       6
  74    +  7      49      82    + 15     225     105
 670       0     746     670       0     922     617

For X : A.M. = $\frac{670}{10} = 67$. For Y : A.M. = $\frac{670}{10} = 67$.

The coefficient of correlation is:

$$
\begin{aligned}
r &= \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{617}{\sqrt{746 \times 922}} \\
  &= \operatorname{Antilog}\left[\log 617 - \tfrac{1}{2}\left\{\log 746 + \log 922\right\}\right] \\
  &= \operatorname{Antilog}\left[2.7903 - \tfrac{1}{2}\left\{2.8727 + 2.9647\right\}\right] \\
  &= \operatorname{Antilog}\left[2.7903 - 2.9187\right] \\
  &= \operatorname{Antilog}\left(\bar{1}.8716\right) = 0.7440
\end{aligned}
$$

Coefficient of correlation is + 0.74 approx.
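The probable-error test of Section 14.4 can be applied to the result of Problem 9 (r = 0.7440, n = 10). The following Python sketch is an illustration added for that purpose and is not part of the original solution; the function name `probable_error` is an assumption. It evaluates P.E. = 0.6745 (1 – r²)/√n and compares r with six times the P.E.

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of the coefficient of correlation:
    P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


# Result of Problem 9: r = 0.7440 from n = 10 pairs of items
r = 0.7440
n = 10

pe = probable_error(r, n)
is_significant = r > 6 * pe  # rule (ii) of Section 14.4

print(f"P.E. = {pe:.4f}")
print(f"6 x P.E. = {6 * pe:.4f}")
print("significant correlation" if is_significant else "not significant by the 6 x P.E. rule")
```

Here the P.E. is roughly 0.095, so r exceeds six times the P.E. and, by the rule stated in Section 14.4, the correlation in Problem 9 would be regarded as significant.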
Problem 10: Find the coefficient of correlation between the Sales and Expenses of the following 10 firms. Interpret your result.
Firms:    1  2  3  4  5  6  7  8  9  10
Sales:    50 50 55 60 65 65 65 60 60 50
Expenses: 11 13 14 16 16 15 15 14 13 13

Solution:
For Sales: A.M. = $\frac{580}{10} = 58$
For Expenses: A.M. = $\frac{140}{10} = 14$

The coefficient of correlation is:

$$
\begin{aligned}
r &= \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{70}{\sqrt{360 \times 22}} \\
  &= \operatorname{Antilog}\left[\log 70 - \tfrac{1}{2}\left\{\log 360 + \log 22\right\}\right] \\
  &= \operatorname{Antilog}\left[1.8451 - \tfrac{1}{2}\left\{2.5563 + 1.3424\right\}\right] \\
  &= \operatorname{Antilog}\left[1.8451 - 1.9493\right] \\
  &= \operatorname{Antilog}\left(\bar{1}.8958\right)
\end{aligned}
$$

∴ r = 0.7866 (Ans: r = + 0.79 approx.)

The deviations used in this calculation are tabulated below.

  Sales    Deviation             Expenses   Deviation
   (X)      from 58      x²        (Y)       from 14      y²     xy
               x                                y
   50        – 8         64        11          – 3         9     24
   50        – 8         64        13          – 1         1      8
   55        – 3          9        14            0         0      0
   60        + 2          4        16          + 2         4      4
   65        + 7         49        16          + 2         4     14
   65        + 7         49        15          + 1         1      7
   65        + 7         49        15          + 1         1      7
   60        + 2          4        14            0         0      0
   60        + 2          4        13          – 1         1    – 2
   50        – 8         64        13          – 1         1      8
  580          0        360       140            0        22     70

Interpretation: The correlation is positive and, since the value of r is more than 0.5, there is good evidence of correlation.

Problem 11: Calculate the coefficient of correlation for the data given below:
X : 26 29 31 34 42 48 51 54 63 67
Y : 19 21 27 31 36 42 49 51 59 60

Solution:
n = 10, $\sum x = 25$, $\sum x^2 = 1937$, $\sum y = 35$, $\sum y^2 = 2155$, $\sum xy = 2022$.

         Deviation from                   Deviation from
  X     assumed mean 42     x²      Y    assumed mean 36     y²      xy
              x                                y
  26        – 16           256     19        – 17           289     272
  29        – 13           169     21        – 15           225     195
  31        – 11           121     27        –  9            81      99
  34        –  8            64     31        –  5            25      40
  42           0             0     36           0             0       0
  48        +  6            36     42        +  6            36      36
  51        +  9            81     49        + 13           169     117
  54        + 12           144     51        + 15           225     180
  63        + 21           441     59        + 23           529     483
  67        + 25           625     60        + 24           576     600
 Total      + 25          1937               + 35          2155    2022
The coefficient of correlation is:

$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \\
  &= \frac{10 \times 2022 - (25)(35)}{\sqrt{\left[10 \times 1937 - 625\right]\left[10 \times 2155 - 1225\right]}} \\
  &= \frac{20220 - 875}{\sqrt{18745 \times 20325}} = \frac{19345}{\sqrt{18745 \times 20325}} \\
  &= \operatorname{Antilog}\left[\log 19345 - \tfrac{1}{2}\left\{\log 18745 + \log 20325\right\}\right] \\
  &= \operatorname{Antilog}\left[4.2867 - \tfrac{1}{2}\left\{4.2729 + 4.3081\right\}\right] \\
  &= \operatorname{Antilog}\left[4.2867 - 4.2905\right] \\
  &= \operatorname{Antilog}\left[\bar{1}.9962\right] = 0.9913
\end{aligned}
$$

[Ans. r = + 0.99]

Problem 12: Calculate the coefficient of correlation for the following ages of husband and wife.

 Age of Husband (X)   Age of Wife (Y)
        24                  19
        28                  21
        29                  24
        29                  26
        30                  27
        31                  27
        34                  29
        35                  31
        36                  30
        37                  31
Solution:

         Deviation                  Deviation
  X       from 30       x²     Y     from 27       y²      xy
             x                          y
  24       – 6          36    19      – 8          64      48
  28       – 2           4    21      – 6          36      12
  29       – 1           1    24      – 3           9       3
  29       – 1           1    26      – 1           1       1
  30         0           0    27        0           0       0
  31       + 1           1    27        0           0       0
  34       + 4          16    29      + 2           4       8
  35       + 5          25    31      + 4          16      20
  36       + 6          36    30      + 3           9      18
  37       + 7          49    31      + 4          16      28
 Total     + 13        169            – 5         155     138

We have n = 10, $\sum x = 13$, $\sum x^2 = 169$, $\sum y = -5$, $\sum y^2 = 155$, $\sum xy = 138$.

The coefficient of correlation is:

$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \\
  &= \frac{10 \times 138 - (13)(-5)}{\sqrt{\left[10 \times 169 - 13^2\right]\left[10 \times 155 - (-5)^2\right]}} \\
  &= \frac{1380 + 65}{\sqrt{\left[1690 - 169\right]\left[1550 - 25\right]}} = \frac{1445}{\sqrt{1521 \times 1525}} \\
  &= \operatorname{Antilog}\left[\log 1445 - \tfrac{1}{2}\left(\log 1521 + \log 1525\right)\right]
\end{aligned}
$$
$$
\begin{aligned}
r &= \operatorname{Antilog}\left[3.1599 - \tfrac{1}{2}\left(3.1821 + 3.1832\right)\right] \\
  &= \operatorname{Antilog}\left[3.1599 - 3.1826\right] \\
  &= \operatorname{Antilog}\left[\bar{1}.9773\right] = 0.9491 = +\,0.95 \text{ approx.}
\end{aligned}
$$

[Ans: r = + 0.95 approx.]

Problem 13: A random sample of 5 college students is selected and their grades in a high school mathematics course and a college algebra course are recorded as follows:
College Students:  1  2  3  4  5
High School grade: 85 60 73 40 90
College grade:     93 75 65 50 80
Calculate Spearman's rank correlation coefficient.

Solution:

 High school           College
  grade X     R(x)     grade Y     R(y)      d      d²
    85          2        93          1       1      1
    60          4        75          3       1      1
    73          3        65          4     – 1      1
    40          5        50          5       0      0
    90          1        80          2     – 1      1
                                           Total    4

Hence n = 5 and $\sum d^2 = 4$.

Spearman's rank correlation coefficient:

$$ r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 4}{5\left(5^2 - 1\right)} = 1 - \frac{24}{5 \times 24} $$
$$ r = 1 - \frac{1}{5} = \frac{4}{5} = 0.8 $$

Problem 14: Find the coefficient of rank correlation between production cost and critical appraisal of the following ten motion pictures:
Production Cost Rank:       1 2 3  4 5 6 7 8 9 10
Critical Appraisal Ranking: 8 6 10 7 5 2 9 1 4 3

Solution:

 Production Cost   Critical Appraisal   Rank Difference
      Rank               Rank                 (d)            d²
        1                  8                  – 7            49
        2                  6                  – 4            16
        3                 10                  – 7            49
        4                  7                  – 3             9
        5                  5                    0             0
        6                  2                  + 4            16
        7                  9                  – 2             4
        8                  1                  + 7            49
        9                  4                  + 5            25
       10                  3                  + 7            49
                                              Total         266

Coefficient of rank correlation:

$$
\begin{aligned}
r &= 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 266}{10\left(10^2 - 1\right)} \\
  &= 1 - \frac{6 \times 266}{10 \times 99} = 1 - \frac{266}{5 \times 33} \\
  &= 1 - \frac{266}{165} = \frac{165 - 266}{165}
\end{aligned}
$$
$$
\begin{aligned}
r &= -\frac{101}{165} \\
  &= -\operatorname{Antilog}\left[\log 101 - \log 165\right] \\
  &= -\operatorname{Antilog}\left[2.0043 - 2.2175\right] \\
  &= -\operatorname{Antilog}\left[\bar{1}.7868\right] = -\,0.6120
\end{aligned}
$$

[Ans. r = – 0.61]

Problem 15: The following are the ranks of 12 students in two different tests:
(6, 5), (8, 8), (5, 7), (7, 6), (4, 1), (1, 3), (10, 12), (12, 11), (9, 10), (11, 9), (2, 4), (3, 2).
Find the coefficient of correlation by the method of rank differences.

Solution:

 X Rank   Y Rank      d      d²
    6        5       + 1      1
    8        8         0      0
    5        7       – 2      4
    7        6       + 1      1
    4        1       + 3      9
    1        3       – 2      4
   10       12       – 2      4
   12       11       + 1      1
    9       10       – 1      1
   11        9       + 2      4
    2        4       – 2      4
    3        2       + 1      1
                    Total    34

Coefficient of rank correlation:

$$
\begin{aligned}
r &= 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 34}{12\left(12^2 - 1\right)} \\
  &= 1 - \frac{204}{12 \times 143} = 1 - \frac{17}{143} \\
  &= 1 - \operatorname{Antilog}\left[\log 17 - \log 143\right] \\
  &= 1 - \operatorname{Antilog}\left[1.2304 - 2.1553\right]
\end{aligned}
$$
$$ r = 1 - \operatorname{Antilog}\left[\bar{1}.0751\right] = 1 - 0.1189 = 0.8811 $$

∴ r = + 0.88

Problem 16: Ten competitors in a beauty contest are ranked by three judges in the following order.
First Judge:  1 6 5 10 3 2  4 9  7 8
Second Judge: 3 5 8 4  7 10 2 1  6 9
Third Judge:  6 4 9 8  1 2  3 10 5 7
Use the rank correlation coefficient to discuss which pair of judges has the nearest approach to common tastes in beauty.

Solution: We shall calculate the coefficient of correlation in each of the following cases.
(i) Ranks assigned by the First and Second Judges.
(ii) Ranks assigned by the Second and Third Judges.
(iii) Ranks assigned by the First and Third Judges.

Case (i):

 First Judge   Second Judge   Rank difference d     d²
      1             3               – 2              4
      6             5               + 1              1
      5             8               – 3              9
     10             4               + 6             36
      3             7               – 4             16
      2            10               – 8             64
      4             2               + 2              4
      9             1               + 8             64
      7             6               + 1              1
      8             9               – 1              1
                                    Total          200

n = 10, $\sum d^2 = 200$

Coefficient of correlation:

$$ r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} $$
$$
\begin{aligned}
r &= 1 - \frac{6 \times 200}{10\left(10^2 - 1\right)} = 1 - \frac{1200}{990} \\
  &= 1 - \operatorname{Antilog}\left[\log 120 - \log 99\right] \\
  &= 1 - \operatorname{Antilog}\left[2.0792 - 1.9956\right] \\
  &= 1 - \operatorname{Antilog}\left[0.0836\right] \\
  &= 1 - 1.213 = -\,0.213
\end{aligned}
$$

In case (i) r = – 0.213

Case (ii):

 Second Judge   Third Judge   Rank difference d     d²
      3              6              – 3              9
      5              4              + 1              1
      8              9              – 1              1
      4              8              – 4             16
      7              1              + 6             36
     10              2              + 8             64
      2              3              – 1              1
      1             10              – 9             81
      6              5              + 1              1
      9              7              + 2              4
                                    Total          214

n = 10, $\sum d^2 = 214$

Coefficient of correlation:

$$ r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 214}{10\left(10^2 - 1\right)} $$
$$
\begin{aligned}
r &= 1 - \frac{1284}{990} \\
  &= 1 - \operatorname{Antilog}\left[\log 1284 - \log 990\right] \\
  &= 1 - \operatorname{Antilog}\left[3.1086 - 2.9956\right] \\
  &= 1 - \operatorname{Antilog}\left[0.1130\right] \\
  &= 1 - 1.297 = -\,0.297
\end{aligned}
$$

In case (ii) r = – 0.297

Case (iii):

 First Judge   Third Judge   Rank difference d     d²
      1             6              – 5             25
      6             4              + 2              4
      5             9              – 4             16
     10             8              + 2              4
      3             1              + 2              4
      2             2                0              0
      4             3              + 1              1
      9            10              – 1              1
      7             5              + 2              4
      8             7              + 1              1
                                   Total           60

n = 10, $\sum d^2 = 60$

Coefficient of correlation:

$$ r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 60}{10\left(10^2 - 1\right)} = 1 - \frac{360}{990} $$
$$
\begin{aligned}
r &= 1 - \operatorname{Antilog}\left[\log 36 - \log 99\right] \\
  &= 1 - \operatorname{Antilog}\left[1.5563 - 1.9956\right] \\
  &= 1 - \operatorname{Antilog}\left[\bar{1}.5607\right] \\
  &= 1 - 0.3637 = 0.6363
\end{aligned}
$$

In case (iii) r = + 0.6363

Conclusions: In cases (i) and (ii) the correlation is negative, but in case (iii) the correlation is positive. Therefore the First and Third Judges have, to a large extent, the nearest approach to common tastes in beauty.

Problem 17: The coefficient of rank correlation between marks in Statistics and marks in Accountancy obtained by a certain group of students is 0.8. If the sum of squares of differences in ranks is given to be 33, find the number of students in the group.

Solution:

$$ r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} $$

Putting r = 0.8 and $\sum d^2 = 33$ in the above, we get

$$ 0.8 = 1 - \frac{6 \times 33}{n\left(n^2 - 1\right)} = 1 - \frac{198}{n\left(n^2 - 1\right)} $$

$$ \therefore \frac{198}{n\left(n^2 - 1\right)} = 1 - 0.8 = 0.2, \qquad n\left(n^2 - 1\right) = \frac{198}{0.2} = 990 $$

$$ \therefore n^3 - n - 990 = 0, \qquad (n - 10)\left(n^2 + 10n + 99\right) = 0 $$

Since $n^2 + 10n + 99 = 0$ has no real roots, n = 10.
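The rank-difference calculations of Problems 4, 16 and 17 can be checked with a short Python sketch. It is an added illustration, not part of the original solutions; the helper names `average_ranks`, `tie_correction` and `rank_correlation` are assumptions. Ranks are assigned in descending order (largest value = rank 1), tied items share the average of the ranks they occupy, and the correction (m³ – m)/12 is added once for each group of m tied items.

```python
from collections import Counter


def average_ranks(values):
    """Rank values in descending order (largest = 1).
    Tied values receive the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of items tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def tie_correction(ranks):
    """Sum of (m^3 - m) / 12 over every group of m items sharing a rank."""
    return sum((m ** 3 - m) / 12 for m in Counter(ranks).values() if m > 1)


def rank_correlation(rx, ry):
    """Rank correlation by the method of rank differences, with tie correction."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(rx) + tie_correction(ry)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


# Problem 4: raw data with one pair of tied values in the Y series
x4 = [24, 29, 23, 38, 46, 52, 41, 36, 68, 56]
y4 = [110, 126, 145, 131, 163, 158, 131, 129, 154, 140]
r4 = rank_correlation(average_ranks(x4), average_ranks(y4))

# Problem 16: ranks given directly by the three judges
first = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
second = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
third = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
r_12 = rank_correlation(first, second)
r_23 = rank_correlation(second, third)
r_13 = rank_correlation(first, third)

# Problem 17: smallest n for which r = 0.8 when the sum of d^2 is 33
n17 = next(n for n in range(2, 100)
           if abs(1 - 6 * 33 / (n * (n * n - 1)) - 0.8) < 1e-9)

print(f"Problem 4:  r = {r4:.4f}")
print(f"Problem 16: r12 = {r_12:.3f}, r23 = {r_23:.3f}, r13 = {r_13:.3f}")
print(f"Problem 17: n = {n17}")
```

Only the pair (First Judge, Third Judge) gives a positive coefficient, in agreement with the conclusion of Problem 16.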
Problem 18: Calculate the coefficient of correlation between the ages of 100 husbands and wives from the following data:

 Age of husbands            Ages of Wives in Years
    in years       10–20   20–30   30–40   40–50   50–60   Total
     15–25            6       3       -       -       -       9
     25–35            3      16      10       -       -      29
     35–45            -      10      15       7       -      32
     45–55            -       -       7      10       4      21
     55–65            -       -       -       4       5       9
     Total            9      29      32      21       9     100

Solution: We have n = 100, $\sum f d_x d_y = 98$, $\sum f d_x = -8$, $\sum f d_x^2 = 122$, $\sum f d_y = -8$, $\sum f d_y^2 = 122$.

Coefficient of correlation:

$$
\begin{aligned}
r &= \frac{n\sum f\,d_x d_y - \sum f d_x \sum f d_y}{\sqrt{\left[n\sum f d_x^2 - \left(\sum f d_x\right)^2\right]\left[n\sum f d_y^2 - \left(\sum f d_y\right)^2\right]}} \\
  &= \frac{100 \times 98 - (-8)(-8)}{\sqrt{\left[100 \times 122 - 64\right]\left[100 \times 122 - 64\right]}} \\
  &= \frac{9800 - 64}{\sqrt{12136 \times 12136}} = \frac{9736}{12136} \\
  &= \operatorname{Antilog}\left[\log 9736 - \log 12136\right] \\
  &= \operatorname{Antilog}\left[3.9884 - 4.0842\right] \\
  &= \operatorname{Antilog}\left[\bar{1}.9042\right] = 0.8021
\end{aligned}
$$

∴ r = + 0.8 approx.
The calculations for Problem 18 are set out in the following correlation table.

                                        Age of wives in years (X)
 Class                                 10–20   20–30   30–40   40–50   50–60
 Mid-value                               15      25      35      45      55
 Deviation from 35                      – 20    – 10      0     + 10    + 20
 dx                                      – 2     – 1      0      + 1     + 2

 Age of husbands   Mid-    Dev.
  in years (Y)     value  from 40   dy                                            f     f·dy   f·dy²   f·dx·dy
     15–25          20     – 20    – 2      6       3       -       -       -     9     – 18     36       30
     25–35          30     – 10    – 1      3      16      10       -       -    29     – 29     29       22
     35–45          40        0      0      -      10      15       7       -    32        0      0        0
     45–55          50     + 10    + 1      -       -       7      10       4    21     + 21     21       18
     55–65          60     + 20    + 2      -       -       -       4       5     9     + 18     36       28

 f                                          9      29      32      21       9   100      – 8    122       98
 f·dx                                    – 18    – 29       0      21      18   – 8
 f·dx²                                     36      29       0      21      36   122
 f·dx·dy                                   30      22       0      18      28    98
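The column and row totals of the correlation table can be reproduced mechanically. The following Python sketch is an added illustration rather than the book's own procedure; the function name `grouped_correlation` and the list layout are assumptions. It takes the bivariate frequency table of Problem 18 with the step deviations dx and dy, forms the five sums, and applies the grouped-data formula of Section 14.5.

```python
from math import sqrt


def grouped_correlation(table, dx, dy):
    """Coefficient of correlation for a bivariate frequency table.

    table[i][j] is the frequency of row class i and column class j,
    dy[i] is the step deviation of row class i,
    dx[j] is the step deviation of column class j.
    """
    cells = [(f, dx[j], dy[i])
             for i, row in enumerate(table)
             for j, f in enumerate(row)]
    n = sum(f for f, _, _ in cells)
    sum_fdx = sum(f * u for f, u, _ in cells)
    sum_fdy = sum(f * v for f, _, v in cells)
    sum_fdx2 = sum(f * u * u for f, u, _ in cells)
    sum_fdy2 = sum(f * v * v for f, _, v in cells)
    sum_fdxdy = sum(f * u * v for f, u, v in cells)
    numerator = n * sum_fdxdy - sum_fdx * sum_fdy
    denominator = sqrt((n * sum_fdx2 - sum_fdx ** 2) *
                       (n * sum_fdy2 - sum_fdy ** 2))
    return numerator / denominator


# Problem 18: rows are husbands' age classes, columns are wives' age classes;
# empty cells of the printed table are entered as 0.
ages = [
    [6, 3, 0, 0, 0],     # husbands 15-25
    [3, 16, 10, 0, 0],   # husbands 25-35
    [0, 10, 15, 7, 0],   # husbands 35-45
    [0, 0, 7, 10, 4],    # husbands 45-55
    [0, 0, 0, 4, 5],     # husbands 55-65
]
steps = [-2, -1, 0, 1, 2]  # step deviations, class width 10

r18 = grouped_correlation(ages, dx=steps, dy=steps)
print(f"Problem 18: r = {r18:.4f}")
```

The result agrees with 9736/12136, that is, r is about + 0.8.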
14.8 Summary
Two statistical series may vary in tandem: a variation in one series is accompanied by a variation in the other. In such a case we can study closely the relationship between the two related series. The question is: if a change in one series is accompanied by a change in the other, what is the magnitude of the change in a relative sense? In short, the numerical relationship between two series is called correlation. The following are some examples of related variables: height and weight, demand and price, income and expenditure.

Correlation is direct or positive if two variables change in the same direction; it is inverse or negative if the variables change in opposite directions.

We can study the correlation between two sets of figures through the following methods:
1. By drawing the graph of the data.
2. By constructing a scatter diagram.
3. By calculating the coefficient of correlation.

The extent of correlation between two related variables can be determined by a coefficient. Karl Pearson devised a formula for this coefficient, which is as follows:

$$ r = \frac{\sum xy}{n\,\sigma_1\,\sigma_2} $$

where
r = the coefficient of correlation
n = the number of pairs of items
x = deviation of the first series (Subject) from its A.M.
y = deviation of the second series (Relative) from its A.M.
$\sigma_1$ = S.D. of the 1st series
$\sigma_2$ = S.D. of the 2nd series

Interpretation of the value of r:
If r = + 1, correlation is perfect positive.
If r = – 1, correlation is perfect negative.
If r = 0, there is no correlation.
If r is equal to some positive number between 0 and + 1 then correlation is limited and positive.
If r is equal to some negative number between – 1 and 0 then correlation is negative and limited.

Short-cut method: When the arithmetic means of both sets of numerical items are not whole numbers and involve decimals, calculating the coefficient of correlation by the direct method becomes tedious. To overcome this difficulty the following short-cut formula is used:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

where
r = coefficient of correlation
n = the number of pairs of items
x = deviation of the first series from an arbitrary mean
y = deviation of the second series from an arbitrary mean

The formula for calculating the coefficient of correlation in a grouped series is

$$ r = \frac{n\sum f\,d_x d_y - \sum f d_x \sum f d_y}{\sqrt{\left[n\sum f d_x^2 - \left(\sum f d_x\right)^2\right]\left[n\sum f d_y^2 - \left(\sum f d_y\right)^2\right]}} $$

where
r = coefficient of correlation
f = frequency
$d_x$ = step deviation of X (1st) series
$d_y$ = step deviation of Y (2nd) series
n = total of frequencies

The coefficient of correlation by the method of rank differences is found from the formula

$$ r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} $$

where
r = coefficient of correlation
d = difference between the corresponding ranks
n = the number of pairs of items

14.9 Key Words/Abbreviations
Scatter diagram, correlation groups, Karl Pearson's coefficient of correlation, Rank correlation, Probable error, concurrent deviation.
14.10 Learning Activity
1. From the data given below, find out whether there is any significant relationship between age and intelligence.

                   Age in years
 Test Marks    16   17   18   19   20
  150–200       3    2    2    1    3
  200–250       5    3    4    2    2
  250–300       4    1    6    5    1
  300–350       2    3    3    1    2
  350–400       1    1    2    3    1

2. The coefficient of rank correlation of the marks obtained by 10 students in English and Economics was found to be 0.5. It was later discovered that the difference in ranks in the two subjects obtained by one of the students was wrongly taken as 3 instead of 7. Find the correct coefficient of rank correlation.

3. Ten recruits were subjected to a selection test to ascertain their suitability for a certain course of training. At the time of the training they were given a proficiency test. The marks secured by the recruits in the selection test (X) and in the proficiency test (Y) are given below:
X : 44 49 52 52 47 76 65 60 63 58
Y : 48 55 45 60 43 80 58 50 77 46
Calculate the coefficient of correlation and comment on its value.

4. Compute Pearson's coefficient of correlation for the following values of x and y and interpret your result.
X : 78 89 97 69 59 79 68 61
Y : 125 137 156 112 107 136 123 108

5. Ten competitors in a music contest were ranked by three different judges as follows:

 Candidates:            I   II  III  IV   V   VI  VII  VIII  IX   X
 Ranking by judge A:    1   10   9    8   3    5   2    4     6   7
 Ranking by judge B:    3    4   1    9   7    8  10    2     5   6
 Ranking by judge C:    5    7   9    8   2   10   1    4     3   6

Find out which two judges differ most in their liking for music.
14.11 Unit End Questions (MCQ and Descriptive)
A. Descriptive Type: Short Answer Type Questions
1. What is meant by the term correlation coefficient? Explain the reasons for its calculation.
2. What is a 'scatter diagram'? Explain its use in interpreting the correlation between two variables.
3. Write an explanatory note on the coefficient of correlation and its utility.
4. Define Karl Pearson's coefficient of correlation. Interpret the correlation when the correlation coefficient is (i) r = 1, (ii) r = 0 and (iii) r = – 1.
5. What is Spearman's rank correlation coefficient? Explain its usefulness.
6. Interpret the formula $\dfrac{0.6745\left(1 - r^2\right)}{\sqrt{n}}$. State its practical utility in statistics.
7. Define 'Lag' and explain its use.
8. Calculate the coefficient of correlation from the following data and interpret it:

 Yield of Cotton         Price per
 in lakhs of bales         bale
        25                  347
        27                  367
        26                  341
        24                  232
        24                  184
        22                  201
        24                  198
        24                  208
        26                  225
        25                  210

9. Find the coefficient of correlation from the following data:
X : 78 36 98 25 75 82 90 62 65 39
Y : 84 51 91 60 68 62 86 58 53 47
10. Find Karl Pearson's coefficient of correlation between X and Y series:
X series: 17 18 19 19 20 20 21 21 22 23
Y series: 12 16 14 11 15 19 22 16 15 20

11. Find the correlation between height of father (X) and height of son (Y) from the following data and comment upon its value:
X : 65 66 67 67 68 69 70 72 64 61
Y : 67 68 65 56 72 69 71 68 65 60

12. Calculate the correlation coefficient for the following data concerning marks in Statistics and Accountancy of 12 students:
Statistics:  52 74 93 55 41 23 92 64 40 71 33 71
Accountancy: 45 80 63 60 35 40 70 58 43 64 51 75

13. Calculate the coefficient of correlation between the heights and weights of 10 students and, by the test of probable error, show whether or not the relationship is significant:

 S. N. of Students   Height (inches)   Weight (lb)
         1                 57              113
         2                 59              117
         3                 62              126
         4                 63              125
         5                 64              130
         6                 65              128
         7                 58              110
         8                 66              132
         9                 70              140
        10                 72              149
14. Calculate the coefficient of correlation from the following table relating to the marks obtained by 12 students in two subjects. Interpret your result.
Students Seat No.:  1  2  3  4  5  6  7  8  9  10 11 12
Marks in Subject A: 65 40 35 75 63 80 35 20 85 65 55 33
Marks in Subject B: 30 55 68 28 76 25 80 85 20 30 45 65

15. Compute the coefficient of correlation by Pearson's Method and also by the method of ranks, between the variables X and Y from the values given below:
X : 78 36 98 25 75 82 90 62 65 39
Y : 84 51 91 60 68 62 86 58 53 47

16. Draw a scatter diagram to represent the following data:
X : 7 10 17 16 12 13 9
Y : 15 18 30 27 25 23 30
Calculate the coefficient of correlation between x and y from the above data.

17. Calculate the correlation coefficient from the following data:
X : 23 27 28 29 30 31 33 35 36 39
Y : 18 22 23 24 25 26 28 29 30 32

18. Calculate the coefficient of correlation between the values of x and y given below:
X : 78 89 97 69 59 79 68 61
Y : 125 137 156 112 107 136 123 138
You may use 69 as working mean for x and 112 as that for y.

19. State the limits between which the correlation coefficient lies. Compute its value for the following data:
Age of Bridegroom: 22 23 24 25 26 27 28 29 30 31
(second series):   18 18 29 18 21 23 25 24 25 26
20. Find the coefficient of correlation 'r' using the short-cut method.
X : 97 34 68 56 71 43 39 52 69
Y : 110 113 124 101 142 115 131 120 103

21. Find if there is any significant correlation between the following heights and weights.
Height (in ins.): 59 62 60 64 61 67 66 63 65 68
Weight (in lb):   112 118 114 117 110 125 122 116 124 128

22. Find out the coefficient of correlation for the data given below:
X : 68 56 72 80 89 93 94 99
Y : 97 110 112 118 116 122 125 103
Also calculate its Probable Error.

23. Eight participants in a beauty contest were given the following ranks by three different judges:
Judge A: 3 5 4 2 1 6 7 8
Judge B: 5 2 1 6 3 4 8 7
Judge C: 1 3 2 5 6 7 8 4
By using the method of rank differences for finding the coefficient of correlation, find the pair of judges whose tastes of beauty are nearly identical.

24. Calculate the coefficient of concurrent deviations from the following data:
X : 40 44 46 50 53 57 61 66 72 78
Y : 42 45 48 52 55 59 64 70 78 86

25. Calculate the coefficient of correlation from the following table:

                          X
   Y        21   22   23   24   25   26   27   Total
  0–10       -    -    -    -    3    -    5      8
 10–20       -    -    -    6    4    7    2     19
 20–30       -    -    8    7    6    -    -     21
330 Business Mathematics and Statistics 30–40 — 11 9 5 — — — 25 40–50 — 8 7 2 — — — 17 50–60 5 4 6 — — — — 15 60–70 9 4 — — — — — 13 Total 14 27 30 20 13 7 7 118 B. Multiple Choice/Objective Type Questions 1. The following is not a related variables. (a) Height and weight (b) Demand and Price (c) Income and expenditure (d) Weight and price 2. r = coefficient of correlation, If r = +1 the correlation is __________. (a) Perfect positive (b) Perfect negative (c) Imperfect positive (d) None if these 3. If r is equal to some negative number between –1 and 0 then correlation is __________ and limited. (a) Positive (b) Negative (c) Multiple (d) All of these 4. P E is meant for _______________. (a) Public Error (b) Private error (c) Probable error (d) None of these 5. If the plotted points in a scatter diagram move downwards from left top to right bottom, then the condition is __________. (a) Zero (b) Negative (c) Positive (d) None of these Answers: (1) (d); (2) (a); (3) (b); (4) (c); (5) (b) 14.12 References References of this unit have been given at the end of the book. CU IDOL SELF LEARNING MATERIAL (SLM)
UNIT 15 REGRESSION ANALYSIS

Structure
15.0 Learning Objectives
15.1 Introduction
15.2 Regression Equation of Y on X
15.3 Regression Equation of X on Y
15.4 Regression Coefficients
15.5 Summary
15.6 Key Words/Abbreviations
15.7 Learning Activity
15.8 Unit End Questions (MCQ and Descriptive)
15.9 References

15.0 Learning Objectives

After studying this unit, you will be able to:
• Analyse the method of calculating the coefficient of correlation in grouped data.
• Elaborate Spearman's method of calculating rank correlation as well as the method for finding the coefficient of concurrent deviations.
• Draw Galton's graph for the ratio of variation and explain the meaning of lag and lead.
• Explain the method of constructing the two regression equations, X on Y and Y on X.
• Calculate the two regression coefficients and use these values to find the value of the correlation coefficient.
• Calculate the regression coefficients from the values of r, $\sigma_x$ and $\sigma_y$.
15.1 Introduction

The term "regression" was used by Sir Francis Galton to describe a hereditary phenomenon which he observed in his studies relating to the heights of sons and fathers. His main observation was that though tall fathers usually have tall sons, the average height of the sons of tall fathers is less than the average height of the fathers. In short, the average height of the sons of tall fathers will regress, or go back, towards the general average height. This backward or downward tendency in the average height was described by Sir Francis Galton as regression. At present the term regression is used very widely for describing many other types of economic, business and social phenomena.

A line that is drawn as close as possible to the plotted points of the scatter diagram shows the average tendency of the plotted points. This line is known as the regression line and its equation is called the regression equation.

The coefficient of correlation indicates the extent of relationship between two sets of figures, whereas a regression equation enables us to calculate the amount of change in one variable corresponding to a change in the other.

15.2 Regression Equation of Y on X

The regression equation that enables us to find out the amount of change in Y corresponding to a change in X is called the regression equation of Y on X. If this regression equation is represented by $Y = a + bX$, then the constants a and b are determined from the two normal equations:

$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$

15.3 Regression Equation of X on Y

The regression equation that enables us to find out the amount of change in X corresponding to a change in Y is called the regression equation of X on Y. It is represented by $X = a' + b'Y$, where the constants a' and b' are determined from the two normal equations:
$\sum X = Na' + b'\sum Y$
$\sum XY = a'\sum Y + b'\sum Y^2$

Conclusions: From the graph of the two regression lines the following conclusions can be drawn:
(1) The two regression lines intersect at the point $(\bar{x}, \bar{y})$, where $\bar{x}$ = arithmetic mean of the items in the X series and $\bar{y}$ = arithmetic mean of the items in the Y series.
(2) If the two regression lines are close to each other, then the correlation between the X series and the Y series is very high.
(3) If the two regression lines coincide, then there is perfect correlation.
(4) If the two regression lines are at a distance from each other, then the correlation between the X series and the Y series is of a lesser degree.
(5) If the two regression lines cut at right angles, then there is no correlation between the X series and the Y series.

Problem 1: From the data given below, find out:
(1) Karl Pearson's coefficient of correlation
(2) The two regression equations
(3) The two regression coefficients
(4) The most likely value of X when Y = 41
(5) The most likely value of Y when X = 45

X :  52  60  58  39  41  53  47  34
Y :  40  46  43  54  49  55  48  57

  X      x     x²      Y      y     y²     xy
 52     +4     16     40     -9     81    -36
 60    +12    144     46     -3      9    -36
 58    +10    100     43     -6     36    -60
 39     -9     81     54     +5     25    -45
 41     -7     49     49      0      0      0
 53     +5     25     55     +6     36    +30
 47     -1      1     48     -1      1     +1
 34    -14    196     57     +8     64   -112
384           612    392           252   -258

Here x and y are the deviations of X and Y from their arithmetic means.

For X: A.M. = $\bar{x} = \frac{384}{8} = 48$

S.D. = $\sigma_x = \sqrt{\frac{\sum x^2}{n}} = \sqrt{\frac{612}{8}} = \sqrt{76.5} = 8.746$

For Y: A.M. = $\bar{y} = \frac{392}{8} = 49$

S.D. = $\sigma_y = \sqrt{\frac{\sum y^2}{n}} = \sqrt{\frac{252}{8}} = \sqrt{31.5} = 5.612$

Coefficient of correlation:

$r = \frac{\sum xy}{n\,\sigma_x\,\sigma_y} = \frac{-258}{8\,(8.746)\,(5.612)} = -0.66$

Regression Equations:

The regression equation of Y on X is

$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$
$y - 49 = -0.66 \times \frac{5.61}{8.75}\,(x - 48)$
$y - 49 = -0.42x + 20.16$
$y = -0.42x + 69.16$

The regression equation of X on Y is

$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$
$x - 48 = -0.66 \times \frac{8.75}{5.61}\,(y - 49)$
$x - 48 = -1.02y + 49.98$
$x = -1.02y + 97.98$

The coefficient of correlation can also be obtained directly from the sums:

$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \frac{-258}{\sqrt{612 \times 252}} = -0.66$

15.4 Regression Coefficients

The regression coefficient of y on x is

$r\,\frac{\sigma_y}{\sigma_x} = -0.66 \times \frac{5.61}{8.75} = -0.42$

The regression coefficient of x on y is

$r\,\frac{\sigma_x}{\sigma_y} = -0.66 \times \frac{8.75}{5.61} = -1.02$

The most likely value of X when Y = 41:

$X = -1.02Y + 97.98 = (-1.02) \times 41 + 97.98 = -41.82 + 97.98 = 56.16$

The most likely value of Y when X = 45:

$Y = -0.42X + 69.16 = (-0.42) \times 45 + 69.16 = -18.90 + 69.16 = 50.26$

Problem 2: It is given that the two regression coefficients are +0.8 and +0.6. Find the coefficient of correlation.

$r = \sqrt{r\,\frac{\sigma_x}{\sigma_y} \times r\,\frac{\sigma_y}{\sigma_x}} = \sqrt{(+0.8)\,(+0.6)}$
$= \sqrt{0.48} = +0.69$

Problem 3: The following data refer to the years of service in a factory of seven persons in a specialised field and to their monthly incomes:

Years of Service        :  11   7   9   5   8   6  10
Income (in 1000s of ₹)  :   7   5   3   2   6   4   8

Find the regression equation of income on years of service. Using it, what initial start would you recommend for a person applying for a job after having served in another factory in a similar field for twelve years?

Solution:

 X      x     x²     Y*     y     y²    xy
11     +3      9      7    +2      4    +6
 7     -1      1      5     0      0     0
 9     +1      1      3    -2      4    -2
 5     -3      9      2    -3      9    +9
 8      0      0      6    +1      1     0
 6     -2      4      4    -1      1    +2
10     +2      4      8    +3      9    +6
56      0     28     35     0     28    21

X = years of service; x = deviation of X from its arithmetic mean; y = deviation of Y from its arithmetic mean.
* Income in thousands of rupees

For series X: Arith. Mean = $\frac{56}{7} = 8$, i.e., $\bar{x} = 8$

For series Y: Arith. Mean = $\frac{35}{7} = 5$, i.e., $\bar{y} = 5$

For X series: S.D. = $\sigma_x = \sqrt{\frac{\sum x^2}{n}} = \sqrt{\frac{28}{7}} = \sqrt{4} = 2$

For Y series: S.D. = $\sigma_y = \sqrt{\frac{\sum y^2}{n}} = \sqrt{\frac{28}{7}} = \sqrt{4} = 2$
The coefficient of correlation is:

$r = \frac{\sum xy}{n\,\sigma_x\,\sigma_y} = \frac{21}{7 \times 2 \times 2} = \frac{3}{4} = 0.75$

Regression equation of y on x is:

$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$
i.e., $y - 5 = 0.75 \times \frac{2}{2}\,(x - 8)$
i.e., $y - 5 = \frac{3}{4}\,(x - 8)$
i.e., $4y - 20 = 3x - 24$
$4y = 3x - 4$; or $y = \frac{3}{4}x - 1$

When x = 12, the value of y is obtained from the above equation:

i.e., $y = \frac{3}{4} \times 12 - 1 = 9 - 1$, i.e., $y = 8$.

Remark: For a person who has served in another factory in a similar field for 12 years, the initial start that could be recommended is ₹ 8,000.

Problem 4: For a group of children, the mean age is 10 years with a standard deviation of 2.5 years. The average height of the group is 125 cm with a standard deviation of 13 cm. The coefficient of correlation between age and height is 0.6. Write the equations of the two regression lines and explain their use.

Solution: Regression equation of y on x is:

$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

and regression equation of x on y is:

$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$
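The two general forms above can be sketched in Python as a small helper that turns summary statistics into an intercept and a slope. This is only an illustrative sketch: the function name `regression_line` and its parameter names are assumptions, not notation from the text. It is checked here against the figures of Problem 3 (means 8 and 5, both standard deviations equal to 2, r = 0.75).

```python
def regression_line(mean_dep, mean_ind, sd_dep, sd_ind, r):
    """Return (intercept, slope) of the regression of the dependent
    variable on the independent one, using
        dep - mean_dep = r * (sd_dep / sd_ind) * (ind - mean_ind)
    """
    if sd_ind == 0:
        raise ValueError("standard deviation of the independent variable is zero")
    slope = r * sd_dep / sd_ind              # regression coefficient
    intercept = mean_dep - slope * mean_ind  # line passes through the two means
    return intercept, slope


# Problem 3: income (y) on years of service (x)
a, b = regression_line(mean_dep=5, mean_ind=8, sd_dep=2, sd_ind=2, r=0.75)
print(a, b)        # -1.0 0.75
print(a + b * 12)  # 8.0, i.e. the estimate for twelve years of service
```

Swapping the roles of the two variables in the call gives the regression of x on y, so one helper covers both lines.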
∴ The given values are:

Mean age: $\bar{x} = 10$, S.D.: $\sigma_x = 2.5$
Mean height: $\bar{y} = 125$, S.D.: $\sigma_y = 13$
$r = +0.6$

Regression equation of y on x is:

$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$
i.e., $y - 125 = 0.6 \times \frac{13}{2.5}\,(x - 10)$
$y - 125 = \frac{7.8}{2.5}\,(x - 10)$
∴ $y = 3.12\,(x - 10) + 125 = 3.12x - 31.2 + 125$
∴ $y = 3.12x + 93.8$

Regression equation of x on y is:

$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$
i.e., $x - 10 = 0.6 \times \frac{2.5}{13}\,(y - 125)$
$x - 10 = \frac{1.50}{13}\,(y - 125)$
∴ $x = 0.11\,(y - 125) + 10$
$x = 0.11y - 13.75 + 10$
∴ $x = 0.11y - 3.75$

By using the regression equation of y on x it is possible to estimate the value of y given the value of x. Similarly, using the regression equation of x on y we can estimate the value of x on the
basis of the given value of y. Corresponding to these two regression equations we have two regression lines. These enable us to find the relationship between the two variables.

Problem 5: Given that the regression equation of Y on X is 5y – 3x = 5 and that of X on Y is 3y – 5x – 2 = 0, find the coefficient of correlation between X and Y.

Solution: Regression equation of y on x is:

$5y - 3x = 5$ ∴ $y = \frac{3}{5}x + 1$

The regression coefficient of y on x is:

$r\,\frac{\sigma_y}{\sigma_x} = \frac{3}{5}$

Regression equation of x on y is:

$3y - 5x - 2 = 0$ ∴ $x = \frac{3}{5}y - \frac{2}{5}$

The regression coefficient of x on y is:

$r\,\frac{\sigma_x}{\sigma_y} = \frac{3}{5}$

We have

$r = \sqrt{r\,\frac{\sigma_x}{\sigma_y} \times r\,\frac{\sigma_y}{\sigma_x}} = \sqrt{\frac{3}{5} \times \frac{3}{5}} = \frac{3}{5} = 0.6$ [Ans. r = +0.6]

Problem 6: For two variables x and y the regression equation of x on y is x = 4y – 3 and the regression equation of y on x is 9y = x + 13. Find the means of x and y and the coefficient of correlation between x and y.
Solution: Solving the two given regression equations we get the mean values of x and y.

$4y = x + 3$ … (1)
$9y = x + 13$ … (2)

Subtracting equation (1) from (2):

$5y = 10$ ∴ $y = 2$

Substituting y = 2 in equation (1):

$8 = x + 3$ ∴ $x = 8 - 3 = 5$

∴ The mean values are $\bar{x} = 5$ and $\bar{y} = 2$.

Regression coefficient of y on x is $b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{1}{9}$

Regression coefficient of x on y is $b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 4$

$r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{\frac{1}{9} \times 4} = \sqrt{\frac{4}{9}} = \frac{2}{3} = 0.67$ [Ans: r = +0.67]

Problem 7: For the data given below obtain the two regression equations.

X :  2  3  1  4
Y :  4  5  1  2

Solution:

 x     x²     y     y²    xy
 2      4     4     16     8
 3      9     5     25    15
 1      1     1      1     1
 4     16     2      4     8
10     30    12     46    32

Let the regression equation of y on x be y = a + bx; then a and b are determined from the two normal equations.
$\sum y = na + b\sum x$ … (1)
$\sum xy = a\sum x + b\sum x^2$ … (2)

$12 = 4a + 10b$ … (1)
$32 = 10a + 30b$ … (2)

Multiply equation (1) by 3 and subtract equation (2) from it; then

$36 = 12a + 30b$
$32 = 10a + 30b$
$4 = 2a$ ∴ $a = 2$

From equation (1), we get

$12 = 4 \times 2 + 10b$
∴ $12 = 8 + 10b$
∴ $12 - 8 = 10b$
∴ $b = \frac{4}{10} = 0.4$

The regression equation of y on x is y = a + bx, i.e., $y = 2 + 0.4x$.

Let the regression equation of x on y be x = a' + b'y; then the constants a' and b' are determined from the two normal equations

$\sum x = na' + b'\sum y$ … (1)
$\sum xy = a'\sum y + b'\sum y^2$ … (2)

Therefore,

$10 = 4a' + 12b'$ … (1)
$32 = 12a' + 46b'$ … (2)

Multiply equation (1) by 3 and subtract it from (2):

$30 = 12a' + 36b'$
$32 = 12a' + 46b'$
$2 = 10b'$ ∴ $b' = \frac{2}{10} = 0.2$
From equation (1):

$10 = 4a' + 12\,(0.2)$
$10 = 4a' + 2.4$
∴ $10 - 2.4 = 4a'$
∴ $4a' = 7.6$
$a' = \frac{7.6}{4} = 1.9$

The regression equation of x on y is x = a' + b'y, i.e., $x = 1.9 + 0.2y$.

Problem 8: Given the following data, estimate the marks in Mathematics obtained by a student who has scored 60 marks in English.

Arithmetic average of marks in Mathematics (all students) : 80
Arithmetic average of marks in English (all students)     : 50
S.D. of marks in Mathematics                              : 15
S.D. of marks in English                                  : 10
Coefficient of correlation between marks in Mathematics and marks in English : 0.4

Solution:

Maths: $\bar{x} = 80$, $\sigma_x = 15$
English: $\bar{y} = 50$, $\sigma_y = 10$
$r = 0.4$

Regression equation of x on y is:

$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$
∴ $x - 80 = 0.4 \times \frac{15}{10}\,(y - 50) = 0.6\,(y - 50)$
$x - 80 = 0.6y - 30$
∴ $x = 0.6y - 30 + 80$
$x = 0.6y + 50$
If y = 60, then $x = 0.6 \times 60 + 50 = 36 + 50$ ∴ $x = 86$

Marks obtained by the student in Maths = 86.

Relation between Correlation and Regression

1. Correlation is the relationship between two or more variables in which the movement of one tends to correspond to the movement of the other, whereas regression is the return to the average value and expresses the average mathematical relationship between the two variables.
2. Correlation need not imply a cause and effect relationship between the variables under study, whereas regression presupposes one: a definite cause and effect relationship is assumed between the dependent and the independent variable.
3. The correlation coefficient between two variables is a measure of the direction and degree of the linear relationship between them, which is mutual and symmetric, i.e., $r_{xy} = r_{yx}$. In regression, however, the dependent and independent variables have a definite direction. Hence there are two distinct lines of regression, y on x and x on y, and in general $b_{yx} \neq b_{xy}$.
4. The correlation coefficient $r_{xy}$ is a relative measure of the linear relationship of x and y and is independent of the unit of measurement, whereas the regression coefficients $b_{yx}$ and $b_{xy}$ are absolute measures representing the change in the value of the variable y (x) for a unit change in x (y).
5. There can be a nonsense correlation between two variables; there is no such nonsense regression.
6. Correlation analysis centres around only the linear relationship between the variables, whereas in regression there can be linear and non-linear relationships.

Probable Error

The probable error of the coefficient of correlation gives us the two limits within which the coefficient of correlation of series selected at random from the same universe is likely to fall. The formula for the probable error of r is as follows.
$PE = 0.6745\,\frac{1 - r^2}{\sqrt{N}}$

Where,
r = Coefficient of correlation
N = Number of pairs of observations

Significance of Probable Error

Probable error is useful in the following interpretation:
1. If the value of 'r' is less than the probable error, there is no evidence of correlation, i.e., the value of 'r' is not at all significant.
2. If the value of 'r' is more than 6 times the probable error, the coefficient of correlation is practically certain, i.e., the value of 'r' is significant.
3. By adding and subtracting the value of P.E. from the value of 'r', we get the upper limit and the lower limit, respectively, within which the coefficient of correlation is expected to lie. Symbolically, it can be expressed as:

$\rho = r \pm P.E.(r)$

where $\rho$ (rho) denotes the correlation in the population.

The Conditions Necessary for the Use of Probable Error

The measure of probable error can be properly used only when the following three conditions exist:
1. The data must approximate a normal frequency curve, i.e., a bell-shaped curve.
2. The statistical measure for which the P.E. is computed must have been calculated from a sample.
3. The sample must have been selected in an unbiased manner and the individual items must be independent.

Practical Problems on Correlation Analysis

Illustration 1: If r = 0.6 and N = 64 for a distribution, find out the probable error.

Solution:

$PE = 0.6745\,\frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - (0.6)^2}{\sqrt{64}} = 0.6745 \times \frac{0.64}{8} = 0.6745 \times 0.08 \approx 0.054$
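The arithmetic of Illustration 1 can be checked with a few lines of Python. This is a minimal sketch: the function names `probable_error` and `is_significant` are illustrative and do not come from the text, and the use of the absolute value of r in the significance check is an assumption made so that negative correlations are handled as well. The check applies the "more than 6 times the probable error" rule stated above.

```python
import math


def probable_error(r: float, n: int) -> float:
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(N)."""
    if n <= 0:
        raise ValueError("n must be a positive number of pairs")
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


def is_significant(r: float, n: int) -> bool:
    """Rule of thumb: r counts as significant if |r| > 6 * P.E.
    (taking the absolute value is an assumption of this sketch)."""
    return abs(r) > 6.0 * probable_error(r, n)


pe = probable_error(0.6, 64)
print(round(pe, 4))             # 0.054
print(is_significant(0.6, 64))  # True, since 0.6 > 6 * 0.054
```

With the same r but a much smaller sample the probable error grows, so the same coefficient may no longer pass the six-times rule.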