
BUSINESS STATICS

Published by International College of Financial Planning, 2020-06-07 13:30:34


Probability Theory, Permutations and Combinations

Based upon this information, the probability that a student picked at random will be female is 30/50 or 0.6, since there are 30 females in the total class of 50 students.

Now, suppose that we are given the additional information that the person picked at random is Indian. What is the probability that this person is a female? This additional information results in a revised probability, or posterior probability, in the sense that it is assigned to the outcome of the event after the additional information is made available. We are interested in the revised probability of picking a female student at random, given that we know that the student is Indian.

Let A1 be the event female, A2 be the event male and B the event Indian. Then, based upon our knowledge of conditional probability, Bayes' theorem can be stated as follows,

$$P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)}$$

In the example discussed here, there are two basic events, which are A1 (female) and A2 (male). However, if there are n basic events, A1, A2, ..., An, then Bayes' theorem can be generalized as,

$$P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}$$

Solving the case of 2 events, where P(B | A1) = 20/30 and P(B | A2) = 15/20 are the proportions of female and male students who are Indian, we have,

$$P(A_1 \mid B) = \frac{(30/50)(20/30)}{(30/50)(20/30) + (20/50)(15/20)} = \frac{20}{35} = \frac{4}{7} = 0.57$$

This example shows that while the prior probability of picking a female student is 0.6, the posterior probability becomes 0.57 after the additional information that the student is Indian is incorporated in the problem.

Another example of application of Bayes' theorem is as follows:

Example 6.2: A businessman wants to construct a hotel in New Delhi. He generally builds three types of hotels. These are 50-room, 100-room and 150-room hotels, depending upon the demand for the rooms, which is a function of the area in which the hotel is located and the traffic flow. The demand can be categorized as low, medium or high.
Depending upon these various demands, the businessman has made a preliminary assessment of his net profits and possible losses (in thousands of dollars) for these various types of hotels. These pay-offs are shown in the following table.

                        States of Nature (Demand for Rooms)
Number of Rooms         Low (A1)    Medium (A2)    High (A3)
Demand probability        0.2          0.5            0.3
R1 = (50)                 25           35             50
R2 = (100)               -10           40             70
R3 = (150)               -30           20            100

Solution: The businessman has also assigned 'prior probabilities' to the demand structure for rooms. These probabilities reflect the initial judgement of the businessman based upon his intuition and his degree of belief regarding the outcomes of the states of nature.

Demand for rooms    Probability of demand
Low (A1)                  0.2
Medium (A2)               0.5
High (A3)                 0.3
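The expected pay-off of each hotel size is the probability-weighted sum of its row in the pay-off table. The following is a minimal sketch of that calculation under the prior probabilities; the variable and function names (`payoffs`, `priors`, `expected_payoff`) are illustrative assumptions, not part of the original text.

```python
# Pay-offs in thousands of dollars: one row per hotel size, one value per
# demand state in the order (low, medium, high), copied from the table above.
payoffs = {
    50: [25, 35, 50],
    100: [-10, 40, 70],
    150: [-30, 20, 100],
}

# Prior probabilities of low, medium and high demand.
priors = [0.2, 0.5, 0.3]


def expected_payoff(row, probs):
    """Probability-weighted sum of the pay-offs in one row."""
    return sum(p * v for p, v in zip(probs, row))


prior_ev = {rooms: expected_payoff(row, priors) for rooms, row in payoffs.items()}
best_rooms = max(prior_ev, key=prior_ev.get)

for rooms, ev in prior_ev.items():
    print(f"EV({rooms}) = {ev:.2f}")
print("Hotel size with the largest expected pay-off:", best_rooms)
```

Keeping the pay-offs in a dictionary keyed by hotel size makes it easy to rerun the same function later with a different set of probabilities.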

Based upon these values, the expected pay-offs for the various hotel sizes can be computed as follows,

EV(50) = (25 × 0.2) + (35 × 0.5) + (50 × 0.3) = 37.50
EV(100) = (-10 × 0.2) + (40 × 0.5) + (70 × 0.3) = 39.00
EV(150) = (-30 × 0.2) + (20 × 0.5) + (100 × 0.3) = 34.00

This gives us the maximum expected pay-off of $39,000 for building a 100-room hotel.

Now, the hotelier must decide whether to gather additional information regarding the states of nature, so that these states can be predicted more accurately than in the preliminary assessment. The basis of such a decision would be the cost of obtaining the additional information. If this cost is less than the increase in maximum expected profit, then such additional information is justified.

Suppose that the businessman asks a consultant to study the market and predict the states of nature more accurately. This study is going to cost the businessman $10,000. This cost would be justified if the maximum expected profit with the new information is at least $10,000 more than the expected pay-off with the prior probabilities.

The consultant made some studies and came up with estimates of low demand (X1), medium demand (X2) and high demand (X3), with a degree of reliability in these estimates. This degree of reliability is expressed as a conditional probability: for example, the probability that the consultant's estimate will be of low demand when the demand is actually low. Similarly, there will be a conditional probability of the consultant's estimate being of medium demand when the demand is actually low, and so on. These conditional probabilities are expressed as follows.
Conditional Probabilities P(Xj | Ai)

States of Nature (Demand)     X1      X2      X3
Low (A1)                      0.5     0.3     0.2
Medium (A2)                   0.2     0.6     0.2
High (A3)                     0.1     0.3     0.6

The values in the preceding table are conditional probabilities and are interpreted as follows: The upper north-west value of 0.5 is the probability that the consultant's prediction will be for low demand (X1) when the demand is actually low. Similarly, the probability is 0.3 that the consultant's estimate will be for medium demand (X2) when in fact the demand is low, and so on. In other words, P(X1 | A1) = 0.5 and P(X2 | A1) = 0.3. Similarly, P(X1 | A2) = 0.2 and P(X2 | A2) = 0.6, and so on.

Our objective is to obtain posteriors, which are computed by taking the additional information into consideration. One way to reach this objective is to first compute the joint probability, which is the product of the prior probability and the conditional probability for each state of nature. The joint probabilities as computed are given as,

State of    Prior          Joint Probabilities
Nature      Probability    P(Ai and X1)         P(Ai and X2)         P(Ai and X3)
A1          0.2            0.2 × 0.5 = 0.10     0.2 × 0.3 = 0.06     0.2 × 0.2 = 0.04
A2          0.5            0.5 × 0.2 = 0.10     0.5 × 0.6 = 0.30     0.5 × 0.2 = 0.10
A3          0.3            0.3 × 0.1 = 0.03     0.3 × 0.3 = 0.09     0.3 × 0.6 = 0.18
Total marginal probabilities      = 0.23               = 0.45               = 0.32
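The joint and marginal probabilities above, and the posterior probabilities that Bayes' theorem derives from them, can be reproduced with a short sketch. The dictionary layout and the names `likelihood`, `joint`, `marginal` and `posterior` are assumptions made for illustration only.

```python
# Prior probabilities of the states of nature (low, medium, high demand).
priors = {"A1": 0.2, "A2": 0.5, "A3": 0.3}

# Consultant reliability: likelihood[a][x] = P(x | a), from the table above.
likelihood = {
    "A1": {"X1": 0.5, "X2": 0.3, "X3": 0.2},
    "A2": {"X1": 0.2, "X2": 0.6, "X3": 0.2},
    "A3": {"X1": 0.1, "X2": 0.3, "X3": 0.6},
}
estimates = ("X1", "X2", "X3")

# Joint probability P(a and x) = P(a) * P(x | a).
joint = {a: {x: priors[a] * likelihood[a][x] for x in estimates} for a in priors}

# Marginal probability of each estimate: the column totals of the joint table.
marginal = {x: sum(joint[a][x] for a in priors) for x in estimates}

# Bayes' theorem: posterior P(a | x) = P(a and x) / P(x).
posterior = {x: {a: joint[a][x] / marginal[x] for a in priors} for x in estimates}

for x in estimates:
    row = ", ".join(f"P({a}|{x}) = {posterior[x][a]:.3f}" for a in priors)
    print(f"P({x}) = {marginal[x]:.2f}; {row}")
```

Each posterior column sums to 1, which is a quick check that the joint table and its column totals were entered consistently.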

Now, the posterior probabilities for each state of nature Ai are calculated as follows:

$$P(A_i \mid X_j) = \frac{\text{Joint probability of } A_i \text{ and } X_j}{\text{Marginal probability of } X_j}$$

By using this formula, the joint probabilities are converted into posterior probabilities, and the computed table for these posterior probabilities is given as,

States of     Posterior Probabilities
Nature        P(Ai | X1)            P(Ai | X2)            P(Ai | X3)
A1            0.10/0.23 = 0.435     0.06/0.45 = 0.133     0.04/0.32 = 0.125
A2            0.10/0.23 = 0.435     0.30/0.45 = 0.667     0.10/0.32 = 0.312
A3            0.03/0.23 = 0.130     0.09/0.45 = 0.200     0.18/0.32 = 0.563
Total               = 1.0                 = 1.0                 = 1.0

Now, we have to compute the expected pay-offs for each course of action with the new posterior probabilities assigned to each state of nature. The net profits for each course of action for a given state of nature are the same as before and are restated as follows. These net profits are expressed in thousands of dollars.

Number of Rooms    Low (A1)    Medium (A2)    High (A3)
(R1)                 25           35             50
(R2)                -10           40             70
(R3)                -30           20            100

Let Oij be the monetary outcome of course of action Ri when Aj is the corresponding state of nature, so that in the above case O11 will be the outcome of course of action R1 and state of nature A1, which in our case is $25,000. Similarly, O21 will be the outcome of action R2 and state of nature A1, which in our case is -$10,000, and so on.

The expected value EV (in thousands of dollars) of each course of action is calculated for each estimate Xk provided by the consultant, by weighting the pay-offs of that course of action with the posterior probabilities of the actual states of nature:

$$EV(R_i \mid X_k) = \sum_{j=1}^{3} P(A_j \mid X_k)\, O_{ij}, \quad i, k = 1, 2, 3$$

(A) Course of action = R1 = Build 50-room hotel

EV(R1 | X1) = 0.435(25) + 0.435(35) + 0.130(50) = 10.875 + 15.225 + 6.5 = 32.6
EV(R1 | X2) = 0.133(25) + 0.667(35) + 0.200(50) = 3.325 + 23.345 + 10.0 = 36.67
EV(R1 | X3) = 0.125(25) + 0.312(35) + 0.563(50) = 3.125 + 10.92 + 28.15 = 42.195

(B) Course of action = R2 = Build 100-room hotel

EV(R2 | X1) = 0.435(-10) + 0.435(40) + 0.130(70) = -4.35 + 17.4 + 9.1 = 22.15
EV(R2 | X2) = 0.133(-10) + 0.667(40) + 0.200(70) = -1.33 + 26.68 + 14.0 = 39.35
EV(R2 | X3) = 0.125(-10) + 0.312(40) + 0.563(70) = -1.25 + 12.48 + 39.41 = 50.64

(C) Course of action = R3 = Build 150-room hotel

EV(R3 | X1) = 0.435(-30) + 0.435(20) + 0.130(100) = -13.05 + 8.7 + 13.0 = 8.65
EV(R3 | X2) = 0.133(-30) + 0.667(20) + 0.200(100) = -3.99 + 13.34 + 20.0 = 29.35
EV(R3 | X3) = 0.125(-30) + 0.312(20) + 0.563(100) = -3.75 + 6.24 + 56.3 = 58.79

The expected values in thousands of dollars, as calculated here, are presented in a tabular form.

Expected Posterior Pay-offs

Estimate    EV(R1 | Xk)    EV(R2 | Xk)    EV(R3 | Xk)
X1             32.6           22.15           8.65
X2             36.67          39.35          29.35
X3             42.195         50.64          58.79

This table can now be analysed in the following manner. If the consultant's estimate is X1 (low demand), it is desirable to build a 50-room hotel, since the expected pay-off for this course of action is the maximum, at $32,600. Similarly, if the estimate is X2 (medium demand), the course of action should be R2, since the maximum expected pay-off is $39,350. Finally, if the estimate is X3 (high demand), the maximum expected pay-off is $58,790 for course of action R3.

Weighting these maxima by the marginal probabilities of the estimates gives the expected pay-off when the consultant's information is used: 0.23(32.6) + 0.45(39.35) + 0.32(58.79) = 44.02, or about $44,020. This exceeds the $39,000 obtained with the prior probabilities by only about $5,000, which is less than the $10,000 cost of the study. Accordingly, given these conditions and pay-offs, the study is not justified at this price, and it would be advisable to build the 100-room hotel indicated by the prior probabilities.

Check Your Progress
6. When is the law of multiplication applied?
7. What is Bayes' theorem?

6.5 PERMUTATIONS AND COMBINATIONS

If there are n objects and they can be placed in any arrangement or order, then any given order of these n objects is called a permutation of the n objects.

For example, assume that there are 4 persons A, B, C and D who can sit on any of the 4 chairs for a group photograph. Since the first person can sit on any one of these chairs, there are 4 ways that person A can be seated. Now, there are 3 chairs left and person B can be seated in 3 ways. Similarly, C can be seated in 2 ways and D must take the last seat left. Hence, by the fundamental counting principle, there are 4 × 3 × 2 × 1 = 24 possible arrangements of these 4 people occupying 4 seats in any given order. Therefore, the number of permutations of 4 things taken four at a time is 24. This sequential multiplication of 4 × 3 × 2 × 1 is also known as 4 factorial, symbolized as 4!.

In general, the number of different permutations of n distinct items, taken all at a time, is given by,

$$n! = n(n-1)(n-2)\cdots 1$$

For example, if there are 6 horses running in a race, so that n = 6, then there are 6! orders in which these horses can finish as first, second, third, fourth, fifth and sixth.
In other words, the total number of such finishing orders is

6! = 6 × 5 × 4 × 3 × 2 × 1 = 720

6.5.1 Permutation of x out of n Distinct Items

Now, let us consider a situation in which we are not interested in taking all n items in a given order, but only some x items in a given order out of the total of n items, so that x ≤ n.

For example, in the horse race of 6 horses, we may be interested only in the order of the first, second and third finish, for which prizes can be awarded. Since there are 6 horses, any one of these horses could finish first. Hence, there are 6 ways to finish in the first place, 5 ways to finish in the second place, since the first place has already been filled by one horse, and 4 ways to finish in the third place. Hence, the total number of distinct orders of finish for the first 3 places is,

6 × 5 × 4 = 120
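The counts in the seating and horse-race examples can be checked with Python's standard library. This is a small sketch that assumes Python 3.8 or later for `math.perm`; the horse labels are hypothetical placeholders.

```python
import math
from itertools import permutations

# 4 people on 4 chairs: all arrangements of 4 distinct items.
seating = math.factorial(4)      # 4 x 3 x 2 x 1 = 24

# 6 horses finishing in all 6 places.
finishing = math.factorial(6)    # 720

# Only the first three places matter: 6 x 5 x 4 ordered selections.
podium = math.perm(6, 3)         # 120

# Cross-check by listing every ordered first-second-third finish explicitly.
horses = ["H1", "H2", "H3", "H4", "H5", "H6"]  # hypothetical labels
podium_orders = list(permutations(horses, 3))

print(seating, finishing, podium, len(podium_orders))
```

Enumerating with `itertools.permutations` is practical only for small n, but it confirms that the multiplication rule counts each ordered finish exactly once.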

In general, the number of different permutations of x out of n distinct items is given by,

$$n(n-1)(n-2)\cdots(n-x+1) \quad \text{or} \quad \frac{n!}{(n-x)!}$$

Check Your Progress
8. What is permutation?
9. Define a moment generating function.

6.5.2 Combination of x Items out of n Distinct Items

So far, we have taken either x items out of n distinct items or all the n items in a given order. The order of items has been necessary in the permutation formula. In many cases, however, the order is unimportant. For example, the probability of any two heads out of three tosses would be different from having two heads and a tail in that order. Accordingly, the number of combinations of n distinct items taken x at a time without any given order is given by:

$$\frac{n!}{x!\,(n-x)!}, \quad \text{where } x \le n$$

The notation $\frac{n!}{x!\,(n-x)!}$ is also simply written as $^{n}C_{x}$ or $\binom{n}{x}$. (For x = 0 or x = n, we define 0! = 1.)

Example 6.3: A committee of ten Members of Parliament (MPs) has been selected to investigate the ethical conduct of the Ministers. A sub-committee of four MPs is to be selected out of the ten MPs to investigate one Minister. Determine the number of ways in which any four members can be selected out of these ten.

Solution: Since the order of such selection is unimportant, the number of ways of choosing the sub-committee is given by:

$$^{n}C_{x} = {}^{10}C_{4} = \binom{10}{4} = \frac{10!}{4!\,(10-4)!} = \frac{10!}{4!\,6!} = \frac{10 \times 9 \times 8 \times 7}{4 \times 3 \times 2 \times 1} = \frac{5040}{24} = 210$$

6.6 SOLVED PROBLEMS

Problem 1. A card is drawn at random from a well-shuffled pack of 52 cards. What is the probability of getting:
(a) A black queen
(b) A queen, a king, or an ace of any suit
(c) A red card

Solution:
(a) Since there are two black queens in the pack of 52 cards, the probability that a card drawn is a black queen is 2/52 = 1/26.
(b) There are a total of 4 queens, 4 kings and 4 aces, making a total of 12 possibilities. Hence, the probability of drawing any one of these 12 cards out of the total 52 cards is 12/52 or 3/13.

(c) There are 26 red cards and 26 black cards in the deck. Hence, the probability of drawing a red card is 26/52 or 1/2.

Problem 2. Lisa's travel club has 1000 members. 60% of the members are males. 45% of the members pay by credit card when they travel, including 175 females. If a member enters the travel club at random, what is the probability that:
(a) The member is a female
(b) The member is a female and pays cash
(c) The member is a male or a credit card user
(d) The member pays cash if we know that the member is a female
(e) Are the sex of the member and the mode of payment statistically independent events?

Solution: In order to solve this problem, it is necessary to identify each category to which each member belongs. This can be shown in the form of a table as follows:

            Cash    Credit Card    Total
Male        325        275          600
Female      225        175          400
Total       550        450         1000

(a) There are 400 females out of a total of 1000 members. Hence, the probability that a member entering the club is a female is 400/1000 = 0.4.
(b) There are 225 female members who pay by cash. Hence, the probability that the entering member is a female paying cash is 225/1000 = 0.225. This probability can also be calculated by the formula P[A ∪ B] = P[A] + P[B] - P[AB], with A = female and B = cash payment, where
725/1000 = 550/1000 + 400/1000 - P[AB]
so that P[AB] = 225/1000.
(c) Let Event A = male and Event B = credit card user. Then
P[A ∪ B] = P[A] + P[B] - P[AB] = 600/1000 + 450/1000 - 275/1000 = 775/1000 = 0.775
(d) Since we do know that the member is a female, our population under consideration is now reduced to the total number of females, which is 400. Hence, the probability that a female member pays cash is 225/400 = 0.5625.
(e) Let Event A = female and Event B = cash payment. Then for events A and B to be independent, the condition that P[AB] = P[A] P[B] must be satisfied. In our case
P[AB] = 225/1000

P[A] = 400/1000
P[B] = 550/1000
Then, P[AB] = P[A] P[B] for independent events. Here,
225/1000 ≠ 400/1000 × 550/1000
or 0.225 ≠ 0.4 × 0.55
or 0.225 ≠ 0.22
Hence, the events are not independent.

Problem 3. A survey of 200 students taking one or more courses in Management, Marketing and Finance during Spring Semester revealed the following numbers of students in various classes.

Management                    100
Marketing                      75
Finance                       125
Management and Marketing       50
Management and Finance         40
Marketing and Finance          40

How many students are taking:
(a) All three subjects
(b) Management but not Finance
(c) Management or Marketing but not Finance
(d) Finance but not Management or Marketing

Solution:
(a) The number of students taking all the three subjects can be calculated by the following formula. Let
Event A = Management
B = Marketing
C = Finance
AB = Management and Marketing
AC = Management and Finance
BC = Marketing and Finance
ABC = All the three subjects.
Then,
[A + B + C] = [A] + [B] + [C] - [AB] - [AC] - [BC] + [ABC]
200 = 100 + 75 + 125 - 50 - 40 - 40 + [ABC]
[ABC] = 30

(b) We can now construct the Venn diagram to identify various events.

[Figure: Venn diagram of the events A (Management), B (Marketing) and C (Finance)]

With 30 students taking all three subjects, 40 students take Management only and 20 take Management and Marketing but not Finance. Hence, the number of students taking Management, but not Finance, is 40 + 20 = 60.
(c) The number of students who are taking Management or Marketing but not Finance can be calculated from the Venn diagram by adding the 15 students who take Marketing only. This number is: 40 + 20 + 15 = 75
(d) The number of students taking only Finance is given as 75.

6.7 SUMMARY

In this unit, you have learned how probability and statistics are closely related. Statistical data is used to draw certain conclusions with the effective use of probability. Various terms and concepts can be applied in the decision-making process and also in inferring the occurrence of several events in the business environment of an organization. You have also learned that the outcomes of most decisions cannot be accurately predicted because of the impact of many uncontrollable and unpredictable variables, so it is necessary to scientifically evaluate the known risks. Probability theory, often known as the science of uncertainty, is helpful for such evaluations. It also helps the decision-maker to analyse the risks, with the help of limited information, and select a strategy of minimum risk.

Now, you can understand probability distributions and Bayes' theorem. Probability distribution refers to the listing of all possible outcomes of an experiment together with their probabilities. Bayes' theorem contributes to the statistical decision theory in revising prior probabilities of outcomes of events.

6.8 ANSWERS TO 'CHECK YOUR PROGRESS'

1. The term simple probability refers to a phenomenon where only a simple or an elementary event occurs. For example, assume that event (E), the drawing of a diamond card from a pack of 52 cards, is a simple event. Since there are 13 diamond cards in the pack and each card is equally likely to be drawn, the probability of event (E) or P[E] = 13/52 or 1/4.
The term joint probability refers to the phenomenon of occurrence of two or more simple events. For example, assume that event (E) is a joint event (or compound event) of drawing a black ace from a pack of cards. There are two simple events involved in the compound event, which are: the card being black and the card being an ace. Hence, P[Black ace] or P[E] = 2/52, since there are two black aces in the pack.

2. Two events are said to be mutually exclusive if both events cannot occur at the same time as the outcome of a single experiment. For example, if we toss a coin, then either event head or event tail would occur, but not both. Hence, these are mutually exclusive events.
3. Two events A and B are said to be independent events if the occurrence of one event is not at all influenced by the occurrence of the other. For example, if two fair coins are tossed, then the result of one toss is totally independent of the result of the other toss. The probability that a head will be the outcome of any one toss will always be 1/2, irrespective of the outcome of the other toss. Hence, these two events are independent.
4. The classical theory of probability is the theory based on the number of favourable outcomes and the number of total outcomes. The probability is expressed as a ratio of these two numbers. The term 'favourable' is not the subjective value given to the outcomes, but is rather the classical terminology used to indicate that an outcome belongs to a given event of interest.
5. The addition rule states that when two events are mutually exclusive, then the probability that either of the events will occur is the sum of their separate probabilities. For example, if you roll a single die, then the probability that it will come up with a face 5 or face 6, where event A refers to face 5 and event B refers to face 6, both events being mutually exclusive events, is given by,
P[A or B] = P[A] + P[B]
Or, P[5 or 6] = P[5] + P[6] = 1/6 + 1/6 = 2/6 = 1/3
6. Multiplication rule is applied when it is necessary to compute the probability in case two events occur at the same time.
7. Bayes' theorem on probability is concerned with a method for estimating the probability of causes which are responsible for the outcome of an observed effect.
The theorem contributes to the statistical decision theory in revising prior probabilities of outcomes of events based upon the observation and analysis of additional information.
8. If there are n objects and they can be placed in any arrangement or order, then any given order of these n objects is called a permutation of the n objects.
9. According to probability theory, a moment generating function generates the moments for the probability distribution of a random variable X and can be defined as,
$$M_X(t) = E\left(e^{tX}\right), \quad t \in \mathbb{R}$$

6.9 QUESTIONS AND EXERCISES

Short-Answer Questions
1. Explain the concept of probability.
2. What are the different theories of probability? Explain briefly.

3. What is a mutually exclusive event?
4. What do you mean by simple probability?
5. Explain the axiomatic approach to probability.
6. Explain the concept of multiplication rule.
7. What is Bayes' theorem? What is its importance in statistical calculations?

Long-Answer Questions
1. A family plans to have two children. What is the probability that both children will be boys? (List all the possibilities and then select the one which would be two boys.)
2. A card is selected at random from an ordinary well-shuffled pack of 52 cards. What is the probability of getting,
(a) A king
(b) A spade
(c) A king or an ace
(d) A picture card
3. A wheel of fortune has numbers 1 to 40 painted on it, each number being at equal distance from the other so that when the wheel is rotated, there is the same chance that the pointer will point at any of these numbers. Tickets have been issued to contestants numbering 1 to 40. The number at which the wheel stops after being rotated would be the winning number. What is the probability that,
(a) Ticket number 29 wins.
(b) One person who bought 5 tickets numbered 18 to 22 (inclusive) wins the prize.
4. The Dean of the School of Business has two secretaries, Mary and Jane. The probability that Mary will be absent on any given day is 0.08. The probability that Jane will be absent on any given day is 0.06. The probability that both the secretaries will be absent on any given day is 0.02. Find the probability that either one of them will be absent on any given day.
5. Two fair dice are rolled. What is the probability of getting:
(a) A sum of 10 or more
(b) A pair of which at least one number is 3
(c) A sum of 8, 9, or 10
(d) One number less than 4
6. An urn contains 12 white balls and 8 red balls. Two balls are to be selected in succession, at random and without replacement. What is the probability that
(a) Both balls are white.
(b) The first ball is white and the second ball is red.
(c) One white ball and one red ball are selected.
Would the probabilities change if the first ball, after being identified, is put back in the urn before the second ball is selected?
7. 200 students from the college were surveyed to find out if they were taking any of the Management, Marketing or Finance courses. It was found that 80 of them were taking Management courses, 70 of them were taking Marketing courses and 50 of them were taking Finance courses. It was also found that 30 of them were taking Management and Marketing courses, 30 of them were taking Management and Finance courses and 25 of them were taking Marketing and

Finance courses. It was further determined that 20 of these students were taking courses in all the three areas. What is the probability that a particular student is not taking any course in any of these areas?
8. A family plans to have three children. List all the possible combinations and find the probability that all the three children will be boys.
9. A movie house is filled with 700 people and 60% of these people are females. 70% of these people are seated in the no smoking area, including 300 females. What is the probability that a person picked at random in the movie house is:
(a) A male.
(b) A female smoker.
(c) A male or a non-smoker.
(d) A smoker if we knew that the person is a male.
(e) Are the events sex and smoking statistically independent?
10. A fair die is rolled once. What is the probability of getting:
(a) An odd number
(b) A number greater than 3
11. In a computer course, the probability that a student will get an A is 0.09. The probability that he will get a B grade is 0.15 and the probability that he will get a C grade is 0.45. What is the probability that the student will get either a D or an F grade?
12. In a statistics class, the probability that a student picked at random comes from a two-parent family is 0.65, and the probability that he will fail the exam is 0.20. What is the probability that such a randomly selected student will be a low achiever given that he comes from a two-parent family?
13. The following is a breakdown of faculty members in various ranks at the college.

Rank                Number of Males    Number of Females
Professor                 20                  12
Assoc. Professor          18                  20
Asst. Professor           25                  30

What is the probability that a faculty member selected at random is:
(a) A female.
(b) A female professor.
(c) A female given that the person is a professor.
(d) A female or a professor.
(e) A professor or an assistant professor.
(f) Are the events of being a male and being an associate professor statistically independent events?
14. A car dealer in a suburban community is interested in making a survey of the number of cars the families in the community owned. He selected 333 families and recorded the following results.

Number of Cars    Number of Families
      0                  20
      1                  44
      2                 170
      3                  63
      4                  36

To promote his dealership, the dealer selects a family at random by the lottery method to award two tickets to Puerto Rico. What is the probability that the family selected:
(a) Owns no car.
(b) Owns 2 cars.
(c) Owns 2 cars or more.
(d) Owns only one car given that the families with no cars are excluded from the process.
15. A part-time student is taking two courses, Statistics and Finance. The probability that the student will pass the Statistics course is 0.60 and the probability of passing the Finance course is 0.70. The probability that the student will pass both courses is 0.50. Find the probability that the student:
(a) Will pass at least one course
(b) Will pass either or both courses
(c) Will fail both courses
16. In how many ways can a person choose 4 books from a list of 8 best-sellers?
17. There are five finalists in a beauty contest. Three of these finalists are to be selected as winners, including the winner of the contest as well as a first runner-up and a second runner-up. In how many different ways can such a combination be obtained?
18. Out of 20 students in a Statistics class, 3 students fail in the course. If 4 students from the class are picked at random, what is the probability that one of the failing students will be among them?
19. The Psychology class has decided to organize a Christmas party. The class has only 18 students including 12 women. The professor has decided to pick a group of 4 students at random and assign this group the responsibility of making all the arrangements. What is the probability that this group consists of,
(a) All women
(b) 2 women and 2 men
20. The New York Pick Five lottery drawing draws five numbers at random out of 39 numbers labelled 1 to 39. How many different outcomes are possible?
21. A company has 18 senior executives. Six of these executives are women including four blacks and two Indians.
Six of these executives are to be selected at random for a Christmas cruise. What is the probability that the selection will include:
(a) All the black and Indian women
(b) At least one Indian woman
(c) Not more than two women
(d) Half men and half women
22. An independent insurance company has 27 employees. 15 of them sell life insurance, 7 of them sell automobile insurance and 4 of them sell both life and auto insurance. The others do the administrative work. If one of the employees is selected at random, what is the probability that such a person does administrative work?
23. The job placement office at City University keeps a record of the graduating students who apply for jobs through this office. The record shows that 70 per cent

of candidates are graduates and 30 per cent are undergraduates. The record also indicates that a graduate applicant has a 65 per cent chance of getting a job while the chance of an undergraduate being placed on a job is 35 per cent.
(a) What is the probability that a student randomly coming to the office to apply for a job will get the job?
(b) A student comes to the office to happily announce that she got the job. What is the probability that she is a graduate student?
24. The Department of Transportation in the city was asked to study the records of all employees who received their training in the city technical institute. It was found that 20 per cent of all such graduates were women and 15 per cent belong to minority groups. Only 10 per cent of the minority graduates were women. Find the probability that a technically trained person, selected at random, is:
(a) A member of the minority group
(b) A female member of a non-minority group
(c) A male, given that the member belongs to a minority group
(d) A female or a member of the non-minority group
25. The probability that a management trainee will remain with the company after the training programme is completed is 0.70. The records indicate that 60 per cent of all managers earn over $60,000 per year. The probability that an employee is a management trainee or earns more than $60,000 per year is 0.80. What is the probability that an employee earns more than $60,000 per year, given that he is a management trainee who stayed with the company after completing the training programme?
26. A meteorologist has forecast the probability of rain on Monday, Tuesday and Wednesday as follows:

Day          Probability of Rain
Monday              0.60
Tuesday             0.50
Wednesday           0.30

Assuming that the weather from day to day is independent, what is the probability that it will rain at least once in these three days?
27. An investor buys 100 shares each of the three stocks A, B and C.
Based on past statistical analysis, the investor has calculated the probabilities of the values of these stocks increasing in a one-week time period as 0.80, 0.70 and 0.60, respectively. Assuming that the movements of these stocks are independent events, what is the probability that:
(a) Exactly two of the three stocks will increase in value in the given week.
(b) At least two stocks will increase in value.
(c) All three stocks will increase in value.
(d) No more than one stock will increase in value.

6.10 FURTHER READING

Chandan, J. S. 1998. Statistics for Business and Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Monga, G. S. 2000. Mathematics and Statistics for Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Kothari, C. R. 1984. Quantitative Technique. New Delhi: Vikas Publishing House Pvt. Ltd.
Hooda, R. P. 2002. Statistics for Business and Economics. New Delhi: Macmillan India Ltd.
Gupta, S. C. 2006. Fundamentals of Statistics. New Delhi: Himalaya Publishing House.
Gupta, S. P. 2005. Statistical Methods. New Delhi: S. Chand and Sons.



UNIT 7 PROBABILITY DISTRIBUTIONS
Structure
7.0 Introduction
7.1 Unit Objectives
7.2 Probability Distribution Functions
7.3 Binomial Distribution
7.4 Poisson Distribution
7.5 Normal Distribution
7.6 Summary
7.7 Answers to 'Check Your Progress'
7.8 Questions and Exercises
7.9 Further Reading

7.0 INTRODUCTION
In this unit, you will study the concept of probability distribution with the help of illustrations and examples. Also, the various types of distribution such as binomial distribution, Poisson distribution and normal distribution, and their features, will be described in detail.
Binomial distribution is one of the simplest and most frequently used discrete probability distributions. It is useful in many practical situations involving either/or types of events. Poisson distribution, another theoretical discrete distribution, is useful for modelling certain real situations. It is particularly useful in waiting line or queuing problems.
Normal distribution is a continuous distribution and plays a pivotal role in statistical theory and practice, particularly in the area of statistical inference and statistical quality control. Its importance is also due to the fact that, in practice, experimental results very often seem to follow the normal distribution or bell-shaped curve. A normal curve is symmetrical and defined by its mean (μ) and standard deviation (σ).
The various techniques involved in probability distributions are necessarily mathematical in nature. However, they give a concrete aspect to the abstract concept of uncertainty by trying to measure it through probability variables and deviations. This makes the job of management much easier while considering the risk factors involved in the operation of a business enterprise.
7.1 UNIT OBJECTIVES
After going through this unit, you will be able to:
• Describe probability distribution functions
• Analyse the distinct properties of binomial distribution
• Apply Poisson distribution for modelling real situations
• Define the characteristics of normal distribution

7.2 PROBABILITY DISTRIBUTION FUNCTIONS
Probability distribution, as the name suggests, is the listing of all possible outcomes of an experiment together with their probabilities. This concept can be illustrated with the help of the following example:
Example 7.1: Let us say that we toss a fair coin two times. The following is the list of all possible outcomes of this experiment and their respective probabilities.
Outcome    Probability
TT         1/4
TH         1/4
HT         1/4
HH         1/4
Then the probability distribution of the number of heads obtained in these two tosses of the coin is given as follows:
Number of heads (X)    Probability p(X)
0                      1/4
1                      1/2
2                      1/4
Total                  1.0
All probabilities must add up to 1.
A discrete random variable takes on discrete values that can be counted, and it can assume values only from a distinct predetermined set. For example, if a quality control inspector examines four radios taken randomly from a production lot and the number of defective radios in this sample is represented by the variable X, then X is a random variable with the following possible discrete values: 0, 1, 2, 3, 4.
Probability distributions are classified as either discrete or continuous, depending upon the nature of the variable being considered. A frequently used discrete probability distribution is the binomial distribution.
On the other hand, a continuous variable is one that can only be measured (as against counted) to some predetermined degree of accuracy. Time, weight or distances are all measured on a continuous scale. The value of a continuous variable cannot be stated precisely at any one point; it can only be said to lie within an interval. For example, a person cannot be said to weigh exactly 150 pounds. His weight can have any value between 149 and 151 pounds, or even between 149.9 and 150.1 pounds. The most frequently used continuous probability distribution is the normal distribution.
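The tabulation in Example 7.1 can be reproduced by brute-force enumeration. The following Python sketch is only an illustration and is not part of the original text; the function name `heads_distribution` is hypothetical. It lists every equally likely sequence of tosses and counts the heads in each.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(tosses):
    """Return {number of heads: probability} for tosses of a fair coin.

    Hypothetical helper for illustration: every sequence of H/T is
    treated as equally likely, as in Example 7.1.
    """
    outcomes = list(product("HT", repeat=tosses))       # e.g. ('H', 'T')
    counts = Counter(seq.count("H") for seq in outcomes)
    return {k: Fraction(v, len(outcomes)) for k, v in sorted(counts.items())}


dist = heads_distribution(2)
for heads, prob in dist.items():
    print(heads, prob)
print("total:", sum(dist.values()))   # the probabilities must add up to 1
```

Exact fractions are used instead of floating-point numbers so that the result can be compared directly with the 1/4, 1/2, 1/4 entries of the table.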
7.3 BINOMIAL DISTRIBUTION
Binomial distribution is one of the simplest and most frequently used discrete probability distributions and is very useful in many practical situations involving either/or types of events. It has certain distinct properties which are enumerated as follows:
1. It describes the distribution of probabilities where there are only two mutually exclusive outcomes for each trial of an experiment. For example, when tossing a coin, there are only two possible outcomes, namely a head and a tail, which are mutually exclusive events. Similarly, when checking the quality of a product, we can see that either the item is good or it is defective. These two possible outcomes are denoted as success or failure. Success is simply the outcome in which we are interested. For example, if we are interested in the probability of a head in the toss of a coin, then the outcome tail will be considered a failure. The probability of success is symbolized by p and the probability of failure is symbolized by q, which is also (1 - p).
2. Each trial is independent of other trials. This means that the outcome of any trial is independent of the outcome of any other trial.
3. The probability of success p remains constant from trial to trial. Similarly, the probability of failure q or (1 - p) remains constant over all observations.
4. The process is performed under the same conditions for a fixed and finite number of trials, say (n).
The concept of binomial distribution may be made clear with the following case:
Suppose we toss a coin five times. The outcome head is designated as success and outcome tail as failure, with probability of success and failure being respectively (p) and (q). Suppose that the following was the sequence of outcomes of these tosses.
H, T, H, T, T
This means that there are two successes and three failures in the given order as above. The probability of this sequence of outcomes can be found by the multiplication rule of joint probability of independent events and is given as:
$p \times q \times p \times q \times q = p^{2} q^{3}$
However, if we are not concerned with any particular sequence of the outcome, but only with the outcome of two successes in any order out of the five tosses, then it can be shown that there are ten different ways of obtaining two heads out of five tosses.
By applying the addition rule of probability, we can see that the probability of getting any sequence with two heads and three tails would be ten times the probability of the single sequence obtained above. Accordingly, the probability of any two heads in five tosses would be,
$10 \times p^{2} q^{3}$
In our case, we have,
p = 0.5
q = (1 - p) = 0.5
x = number of successes desired = 2
n = number of trials undertaken = 5
(n - x) = number of failures = 3
Hence, the probability of two heads out of five tosses is,
$10(0.5)^{2}(0.5)^{3} = 0.3125$
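The count of ten sequences need not be taken on trust; it can be confirmed by listing all the sequences of five tosses. The short Python sketch below is an illustration added under that assumption and is not taken from the text. It keeps only the sequences with exactly two heads and adds up their probabilities, which is the addition rule applied to mutually exclusive sequences.

```python
from itertools import product

p = 0.5        # probability of success (head)
q = 1 - p      # probability of failure (tail)

# All 2**5 sequences of five tosses, written as strings such as "HTHTT"
sequences = ["".join(s) for s in product("HT", repeat=5)]
two_heads = [s for s in sequences if s.count("H") == 2]

# Probability of one particular sequence with two heads and three tails
single = p**2 * q**3

# Addition rule: the sequences are mutually exclusive, so their
# probabilities are simply summed
total = sum(p ** s.count("H") * q ** s.count("T") for s in two_heads)

print(len(two_heads))   # number of sequences with exactly two heads
print(single, total)
```

Listing sequences in this way becomes impractical as the number of trials grows, which is why a counting formula is preferred for larger n.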

In our case, we have the number of successes (x) = 2 and the number of trials or tosses (n) = 5, and there are only ten possible combinations in which two heads can occur. However, as the number of trials increases, it becomes more and more difficult to list all the possible sequences. In such cases, a counting method is required. The number of sequences for any number of tosses or trials (n) and any number of successes (x) is given by the formula,
$^{n}C_{x} = \frac{n!}{x!\,(n-x)!}$
The symbol $^{n}C_{x}$ is also simply written as $\binom{n}{x}$. Hence, the general formula for calculating the probability of x successes out of n trials is given by,
$P(x) = \binom{n}{x} p^{x} q^{n-x}$
This expression is known as the binomial formula. Based upon this formula, the probability in our example of two heads in five tosses can be calculated as follows,
$P(2) = \binom{5}{2}(0.5)^{2}(0.5)^{3}$
$= \frac{5!}{2!\,(5-2)!}(0.5)^{2}(0.5)^{3}$
$= \frac{5 \times 4 \times 3 \times 2 \times 1}{(2 \times 1)(3 \times 2 \times 1)}(0.5)^{2}(0.5)^{3}$
$= 10(0.5)^{2}(0.5)^{3}$
$= 0.3125$
Example 7.2: If a new drug is found to be effective 40% of the time, then what is the probability that in a random sample of 4 patients, it will be effective on 2 of them?
Solution: Let us define effective as success and non-effective as failure. Then,
p = 0.4 (since the drug is effective 40% of the time)
q = (1 - p) = (1 - 0.4) = 0.6
x = 2
n = 4
Now,

$P(2) = \binom{4}{2}(0.4)^{2}(0.6)^{2}$
$= \frac{4!}{2!\,(4-2)!}(0.4)^{2}(0.6)^{2}$
$= 6 \times 0.16 \times 0.36$
$= 0.3456$
Mean and Standard Deviation of Binomial Distribution
The binomial distribution has an expected value or mean (μ) and a standard deviation (σ), and both these statistical measures can be computed. The mean can be calculated intuitively by the following reasoning. Suppose a fair coin is tossed 400 times. Then the number of heads (x) has the binomial distribution with p = 0.5 and n = 400. If we ask the question, 'How many heads can we expect in 400 tosses?', our intuitive answer is 200 because we expect the head to occur 50% of the time, since the probability of a head occurring in any one toss is (1/2). This logic is clearly the same as the mathematical computation (np), since n = 400 and p = 1/2, hence np = 400(1/2) = 200, which is the mean. Symbolically, we can represent the mean of a binomial distribution as:
$\mu = np$
The variance of the binomial distribution is given as:
$\sigma^{2} = npq$
Hence, the standard deviation is
$\sigma = \sqrt{npq}$
Example 7.3: In a manufacturing process, a packaging machine produces 5% defective packages. Find the mean and the standard deviation of the number of defective packages in a random sample of 60 packages.
Solution: As explained earlier,
$\mu = np$ and $\sigma = \sqrt{npq}$
In our case,
p = 0.05
n = 60
q = (1 - p) = 0.95
Hence,
$\mu = np = 60(0.05) = 3$
$\sigma = \sqrt{npq} = \sqrt{60(0.05)(0.95)} = \sqrt{2.85} = 1.69$
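The binomial formula, together with the expressions μ = np and σ = √(npq), translates directly into code. The sketch below is a minimal Python illustration, not part of the original text; the function names `binomial_pmf` and `binomial_mean_sd` are hypothetical, and `math.comb` from the standard library is used for the number of combinations.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials.

    Illustrative helper: implements C(n, x) * p**x * q**(n - x)
    with q = 1 - p.
    """
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) = (n*p, sqrt(n*p*q))."""
    q = 1 - p
    return n * p, sqrt(n * p * q)


# Two heads in five tosses of a fair coin
print(round(binomial_pmf(2, 5, 0.5), 4))

# Example 7.2: drug effective on 2 of 4 patients, p = 0.4
print(round(binomial_pmf(2, 4, 0.4), 4))

# Example 7.3: 60 packages, 5% defective
mean, sd = binomial_mean_sd(60, 0.05)
print(round(mean, 2), round(sd, 2))
```

The results are rounded only for display, so that they can be compared with the hand calculations in the examples.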

7.4 POISSON DISTRIBUTION
Poisson distribution is another theoretical discrete distribution, which is useful for modelling certain real situations. It differs from the binomial distribution in the sense that in the binomial distribution we must be able to count the number of successes and the number of failures, while in Poisson distribution, all we want to know is the average number of successes in a given unit of time or space.
In many situations, it is not possible to count the number of failures even though we can know the number of successes. For example, in the case of patients coming to the hospital for emergency treatment, we can always count the number of patients arriving in any given hour. If the number of patients arriving is considered as the number of successes, then we cannot know the number of failures because it is not possible to count the number of patients not coming for emergency treatment in that hour. Accordingly, it is not possible to determine the total number of possible outcomes (successes and failures), and hence, binomial distribution cannot be applied as a decision-making tool. In such a situation, we can use Poisson distribution, if we know the average number of patients arriving for emergency treatment per hour. It is assumed that such arrival of patients is a random phenomenon and hence, the exact number of patients arriving in any hour is not predictable.
Other examples of Poisson distribution are telephone calls going through a switchboard system, the number of cars passing through India Gate, the number of customers coming to a bank for service and so on. All these arrivals can be described by a discrete random variable that takes on integer values (0, 1, 2, 3, ...).
Characteristics of the Poisson Distribution
A physical situation must possess certain characteristics before it can be characterized by Poisson distribution.
Some of these characteristics are:
(a) In a very small time interval between $t$ and $(t + \Delta t)$, where $\Delta t$ is infinitesimally small, the probability that exactly one event will occur is a very small number (the event is a rare event) and is constant for every such small time period.
(b) The probability that two or more events will occur in this small time period ($t$ to $t + \Delta t$) is so small that it can be assigned a value of zero.
(c) The events must be random and independent of each other. The occurrence of one event cannot influence the chances of another event occurring, nor can the occurrence of any one event be predicted in advance.
With the characteristics as described here, we need to know the average number of events per unit of time. The symbol for this average is λ (lambda), and it could be the average number of cars passing under a bridge in any given hour, or the average number of machine breakdowns per month, or the average number of customers arriving at a bank per day, and so on. The probability that exactly (x) events will occur in a given time is given as follows:
$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
where λ is the average number of occurrences per unit of time and e is the base of the natural logarithms and is equal to 2.71828...

Example 7.4: Assume that on an average 3 persons enter a bank for service every 10 minutes. What is the probability that exactly 5 customers will enter the bank in a given 10-minute period, assuming that the process can be described by Poisson distribution?
Solution:
$P(5) = \frac{(3)^{5}(2.71828)^{-3}}{5!} = \frac{(243)(0.0498)}{120} = 0.1008$
Thus, the probability of 5 customers arriving in a bank in any given 10-minute period is 0.1008. Other computations can be made the same way to show the probability of arrival of 0, 1, 2, 3, 4 ... customers.
Poisson distribution is found to be particularly useful in waiting line or queuing type of problems where, by knowing the rate of arrival of units at a service station and the rate at which these units are served, the number of service stations as well as the average waiting time for service for each arriving unit can be reasonably determined.
Example 7.5: Customers arrive at a photocopying machine at an average rate of two every 10 minutes. The number of arrivals is distributed according to Poisson distribution. What is the probability that:
(a) There will be no arrivals during any period of ten minutes.
(b) There will be exactly one arrival during this time period.
(c) There will be more than two arrivals during this time period.
Solution: We know that with λ = 2,
$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$ for x = 0, 1, 2, ...
(a) When x = 0, $P(0) = \frac{(2)^{0}(2.71828)^{-2}}{0!} = 0.1353$
(b) When x = 1, $P(1) = \frac{(2)^{1}(2.71828)^{-2}}{1!} = 0.2707$
(c) When x = 2, $P(2) = \frac{(2)^{2}(2.71828)^{-2}}{2!} = 0.2707$
Then, the probability of more than two arrivals in a 10-minute period is:
$P(x > 2) = 1 - [P(0) + P(1) + P(2)]$
$= 1 - [0.1353 + 0.2707 + 0.2707]$
$= 1 - 0.6767 = 0.3233$
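The Poisson computations in Examples 7.4 and 7.5 can be checked with a few lines of code. The following Python sketch is an illustration only; the function name `poisson_pmf` is hypothetical, and the exact value of e from `math.exp` is used in place of the rounded 2.71828.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x events when the average is lam per unit.

    Illustrative helper implementing lam**x * e**(-lam) / x!.
    """
    return lam**x * exp(-lam) / factorial(x)


# Example 7.4: average of 3 arrivals per 10 minutes, exactly 5 arrivals
print(round(poisson_pmf(5, 3), 4))

# Example 7.5: average of 2 arrivals per 10 minutes
p0 = poisson_pmf(0, 2)
p1 = poisson_pmf(1, 2)
p2 = poisson_pmf(2, 2)
more_than_two = 1 - (p0 + p1 + p2)   # complement of 0, 1 or 2 arrivals
print(round(p0, 4), round(p1, 4), round(p2, 4), round(more_than_two, 4))
```

The probability of more than two arrivals is obtained through the complement because the Poisson variable has no upper limit, so the terms for x = 3, 4, 5, ... cannot all be listed.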
7.5 NORMAL DISTRIBUTION
Among all the probability distributions, the normal probability distribution is by far the most important and frequently used continuous probability distribution. This is so because this distribution fits well in many types of problems. This distribution is of special significance in inferential statistics since it describes probabilistically the link between a statistic and a parameter (i.e., between the sample results and the population from which the sample is drawn). The name of Karl Gauss, the 18th century mathematician-astronomer, is associated with this distribution, and in honour of his contribution, this distribution is often known as the Gaussian distribution.
The normal distribution can be theoretically derived as the limiting form of many discrete distributions. For instance, if in the binomial expansion of $(p + q)^{n}$, the value of 'n' is infinity and p = q = 1/2, then a perfectly smooth symmetrical curve would be obtained. Even if the values of p and q are not equal, but the value of the exponent 'n' happens to be very, very large, we get a curve of normal probability, smooth and symmetrical. Such curves are called normal probability curves (or at times known as normal curves of error) and such curves represent the normal distributions.¹
The probability function in case of normal probability distribution² is given as:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
Where,
μ = Mean of the distribution
σ² = Variance of the distribution
The normal distribution is thus defined by two parameters viz., μ and σ². This distribution can be represented graphically as follows:
[Fig. 7.1 Curve Representing Normal Distribution]
¹ Quite often mathematicians use the normal approximation of the binomial distribution whenever 'n' is equal to or greater than 30 and np and nq each are greater than 5.
² Equation of the normal curve in its simplest form is
$y = y_{0}\, e^{-\frac{x^{2}}{2\sigma^{2}}}$
where y = The computed height of an ordinate at a distance of X from the mean.
$y_{0}$ = The height of the maximum ordinate at the mean.
It is a constant in the equation and is worked out as follows:
$y_{0} = \frac{N i}{\sigma\sqrt{2\pi}}$
where N = total number of items in the sample and i = class interval
π = 3.1416
$\sqrt{2\pi} = \sqrt{6.2832} = 2.5066$
e = 2.71828, base of natural logarithms
σ = Standard deviation
X = Any given value of the dependent variable expressed as a deviation from the mean.

Characteristics of Normal Distribution
The characteristics of the normal distribution or that of the normal curve are given as follows:
1. It is a symmetric distribution.³
2. The mean μ defines where the peak of the curve occurs. In other words, the ordinate at the mean is the highest ordinate. The height of the ordinate at a distance of one standard deviation from the mean is 60.653 per cent of the height of the mean ordinate and similarly, the height of other ordinates at various standard deviations (σ) from the mean bears a fixed relationship to the height of the mean ordinate.
3. The curve is asymptotic to the base line, which means that it continues to approach but never touches the horizontal axis.
4. The variance (σ²) defines the spread of the curve.
5. The area enclosed between the mean ordinate and an ordinate at a distance of one standard deviation from the mean is always 34.134 per cent of the total area of the curve. It means that the area enclosed between two ordinates at one sigma (S.D.) distance from the mean on either side would always be 68.268 per cent of the total area. This can be shown as follows:
[Figure: normal curve with the area between μ ± 1σ shaded; (34.134% + 34.134%) = 68.268% of the total area of the curve]
Similarly, the other area relationships are as follows:
Between            Area covered to total area of the normal curve⁴
μ ± 1 S.D.         68.27%
μ ± 2 S.D.         95.45%
μ ± 3 S.D.         99.73%
μ ± 1.96 S.D.      95%
μ ± 2.576 S.D.     99%
μ ± 0.6745 S.D.    50%
³ A symmetric distribution is one which has no skewness. As such it has the following statistical properties:
(a) Mean = Mode = Median (i.e., X̄ = Z = M)
(b) (Upper Quartile - Median) = (Median - Lower Quartile)
(c) Mean Deviation = 0.7979 (Standard Deviation)
(d) (Q₃ - Q₁)/2 = 0.6745 (Standard Deviation)
⁴ This also means that in a normal distribution the probability of area lying between various limits is as follows:
Limits         Probability of area lying within the stated limits
μ ± 1 S.D.     0.6827
μ ± 2 S.D.     0.9545
μ ± 3 S.D.     0.9973 (This means that almost all cases lie within μ ± 3 S.D. limits)
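The ordinate and area relationships listed above can be checked numerically. The sketch below is an illustration in Python and is not taken from the text; the helper names `normal_pdf` and `area_within` are assumptions, and the areas are obtained from the error function `math.erf` rather than from a printed table.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x (illustrative helper)."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


def area_within(k):
    """Area under a normal curve between mu - k*sigma and mu + k*sigma.

    The result does not depend on mu or sigma, only on k.
    """
    return erf(k / sqrt(2))


# Ordinate one standard deviation from the mean, relative to the peak
ratio = normal_pdf(1.0) / normal_pdf(0.0)
print(round(100 * ratio, 3), "per cent of the mean ordinate")

# Areas within k standard deviations on either side of the mean
for k in (0.6745, 1, 1.96, 2, 3):
    print(k, round(100 * area_within(k), 2), "per cent")
```

Because `area_within` depends only on the number of standard deviations, the same percentages hold for every member of the family of normal curves, whatever the values of the mean and standard deviation.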

6. The normal distribution has only one mode since the curve has a single peak. In other words, it is always a unimodal distribution.
7. The maximum ordinate divides the graph of the normal curve into two equal parts.
8. In addition to all these stated characteristics, the curve has the following properties:
(i) μ = x̄
(ii) μ₂ = σ² = variance
(iii) μ₄ = 3σ⁴
(iv) Moment Coefficient of Kurtosis = 3
Family of Normal Distributions
We can have several normal probability distributions, but each particular normal distribution is defined by its two parameters viz., the mean (μ) and the standard deviation (σ). There is, thus, not a single normal curve but rather a family of normal curves. We can exhibit some of these as under:
[Figure: Normal curves with identical means but different standard deviations: a curve having small standard deviation, say (σ = 1); a curve having large standard deviation, say (σ = 5); a curve having very large standard deviation, say (σ = 10)]
[Figure: Normal curves with identical standard deviation but each with different means: Curve A with the smallest mean (μ = 15); Curve B with mean between the means of curve A and curve C (μ = 30); Curve C with the largest mean (μ = 50)]
[Figure: Normal curves each with different standard deviations and different means: a curve with smaller mean and smaller standard deviation; a curve with larger mean and larger standard deviation; a curve with very large mean and very large standard deviation]
How to Measure the Area under the Normal Curve
We have earlier stated some of the area relationships involving certain intervals of standard deviations (plus and minus) from the means that are true in case of a normal curve. But what should be done in all other cases? We can make use of the statistical tables constructed by mathematicians for the purpose. Using these tables, we can find the area (or probability, taking the entire area of the curve as equal to 1) that the normally distributed random variable will lie within certain distances from the mean. These distances are defined in terms of standard deviations. While using the tables showing the area under the normal curve, we talk in terms of the standard variate (symbolically Z), which really means standard deviations without units of measurement, and this Z is worked out as follows:
$Z = \frac{x - \mu}{\sigma}$
Where,
Z = The standard variate (or number of standard deviations from x to the mean of the distribution)
x = Value of the random variable under consideration
μ = Mean of the distribution of the random variable
σ = Standard deviation of the distribution
The table showing the area under the normal curve (often termed as the standard normal probability distribution table) is organized in terms of standard variate (or Z) values. It gives the values for only half the area under the normal curve, beginning with Z = 0 at the mean. Since the normal distribution is perfectly symmetrical, the values true for one half of the curve are also true for the other half. We now illustrate the use of such a table for working out certain problems.
Example 7.6: A banker claims that the life of a regular savings account opened with his bank averages 18 months with a standard deviation of 6.45 months. Answer the following:
(a) What is the probability that there will still be money in 22 months in a savings account opened with the said bank by a depositor?
(b) What is the probability that the account will have been closed before two years?
Solution: (a) For finding the required probability, we are interested in the shaded area of the normal curve, i.e., the area to the right of x = 22.
Let us calculate Z.
$Z = \frac{x - \mu}{\sigma} = \frac{22 - 18}{6.45} = 0.62$
The value from the table showing the area under the normal curve for Z = 0.62 is 0.2324.
This means that the area of the curve between μ = 18 and x = 22 is 0.2324. Hence, the area of the shaded portion of the curve is (0.5) - (0.2324) = 0.2676, since the area of the entire right hand portion of the curve always happens to be 0.5. Thus, the probability that there will still be money in 22 months in a savings account is 0.2676.
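The table lookup used in part (a) can be reproduced numerically. The Python sketch below is an illustration only and is not part of the original solution; the function name `standard_normal_cdf` is hypothetical, and the cumulative area is computed from `math.erf`. Z is rounded to two decimals, as it would be before consulting a printed table.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma = 18.0, 6.45             # mean and S.D. of account life (months)
z = round((22 - mu) / sigma, 2)    # standard variate for x = 22

# A half-curve table gives the area between the mean (Z = 0) and z
table_area = standard_normal_cdf(z) - 0.5

# Right-hand tail: the whole right half (0.5) minus the table area
tail = 0.5 - table_area

print(z, round(table_area, 4), round(tail, 4))
```

The same function can serve for areas to the left of a value, in which case the table area is added to 0.5 instead of being subtracted from it.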

(b) For finding the required probability, we are interested in the area of the shaded portion of the normal curve, i.e., the area to the left of x = 24.
For this purpose we calculate,
$Z = \frac{24 - 18}{6.45} = 0.93$
The value from the concerning table, when Z = 0.93, is 0.3238, which refers to the area of the curve between μ = 18 and x = 24. The area of the entire left hand portion of the curve is 0.5 as usual.
Hence, the area of the shaded portion is (0.5) + (0.3238) = 0.8238, which is the required probability that the account will have been closed before two years, i.e., before 24 months.
Example 7.7: Regarding a certain normal distribution concerning the income of the individuals, we are given that mean = 500 rupees and standard deviation = 100 rupees. Find the probability that an individual selected at random will belong to the income group, (a) ₹ 550 to ₹ 650; (b) ₹ 420 to ₹ 570.
Solution: (a) For finding the required probability, we are interested in the area of the shaded portion of the normal curve, i.e., the area between x = 550 and x = 650.
For finding the area of the curve between x = 550 and x = 650, let us do the following calculations:
$Z = \frac{550 - 500}{100} = \frac{50}{100} = 0.50$

Corresponding to which the area between μ = 500 and x = 550 in the curve as per table is equal to 0.1915, and
$Z = \frac{650 - 500}{100} = \frac{150}{100} = 1.5$
Corresponding to which the area between μ = 500 and x = 650 in the curve as per table is equal to 0.4332.
Hence, the area of the curve that lies between x = 550 and x = 650 is,
(0.4332) - (0.1915) = 0.2417
This is the required probability that an individual selected at random will belong to the income group of ₹ 550 to ₹ 650.
(b) For finding the required probability, we are interested in the area of the shaded portion of the normal curve, i.e., the area between x = 420 and x = 570.
To find the area of the shaded portion we make the following calculations:
$Z = \frac{570 - 500}{100} = 0.70$
Corresponding to which the area between μ = 500 and x = 570 in the curve as per table is equal to 0.2580, and
$Z = \frac{420 - 500}{100} = -0.80$
Corresponding to which the area between μ = 500 and x = 420 in the curve as per table is equal to 0.2881.
Hence, the required area in the curve between x = 420 and x = 570 is,
(0.2580) + (0.2881) = 0.5461
This is the required probability that an individual selected at random will belong to the income group of ₹ 420 to ₹ 570.
Example 7.8: A certain company manufactures 1½" all-purpose rope made from imported hemp. The manager of the company knows that the average load-bearing capacity of the rope is 200 lbs. Assuming that normal distribution applies, find the standard deviation of load-bearing capacity for the 1½" rope if it is given that the rope has a 0.1210 probability of breaking with 68 lbs or less pull.

Solution: The given information can be depicted in a normal curve as follows:
[Figure: normal curve with μ = 200 (Z = 0) and x = 68 marked; the probability of the area to the left of x = 68 (68 lbs or less) as given is 0.1210; the probability of the area between x = 68 and μ is (0.5) - (0.1210) = 0.3790; σ = ? (to be found out)]
If the probability of the area falling within μ = 200 and x = 68 is 0.3790 as stated earlier, the corresponding value of Z as per the table⁵ showing the area of the normal curve is -1.17 (the minus sign indicates that we are in the left portion of the curve).
Now to find σ we can write,
$Z = \frac{x - \mu}{\sigma}$
or $-1.17 = \frac{68 - 200}{\sigma}$
or $-1.17\sigma = -132$
or σ = 112.8 lbs approx.
Thus, the required standard deviation is 112.8 lbs approximately.
⁵ Refer to the Z table in the Appendix given at the end of this book.
Example 7.9: In a normal distribution, 31% items are below 45 and 8% are above 64. Find the X̄ and σ of this distribution.
Solution: We can depict the given information in the following normal curve:
[Figure: normal curve with x = 45 and x = 64 marked on either side of μ = ? (X̄, to be found out); the probability of the shaded area below x = 45 as given is 0.31 and of the shaded area above x = 64 as given is 0.08; the probability of the area between μ and x = 45 is (0.5) - (0.31) = 0.19; the probability of the area between μ and x = 64 is (0.5) - (0.08) = 0.42]
If the probability of the area falling within μ and x = 45 is 0.19 as stated here, the corresponding value of Z from the table showing the area of the normal curve is -0.50. Since we are in the left portion of the curve, we can express this as,
$-0.50 = \frac{45 - \mu}{\sigma}$    (1)
Similarly, if the probability of the area falling within μ and x = 64 is 0.42, the corresponding value of Z from the area table is +1.41. Since we are in the right portion of the curve, we can express this as,
$1.41 = \frac{64 - \mu}{\sigma}$    (2)
If we solve equations (1) and (2) to obtain the value of μ or X̄, we have,
$-0.5\sigma = 45 - \mu$    (3)
$1.41\sigma = 64 - \mu$    (4)
By subtracting equation (4) from (3) we have,
$-1.91\sigma = -19$
σ = 10 (approximately)
Putting σ = 10 in equation (3) we have,
$-5 = 45 - \mu$
μ = 50
Hence, X̄ (or μ) = 50 and σ = 10 for the concerning normal distribution.

Check Your Progress
1. What is binomial distribution?
2. What do you understand by the term 'continuous probability distribution'?
3. What is Poisson distribution used for?
4. In which field is normal distribution applicable?
5. Describe normal distribution and bell-shaped curves.

7.6 SUMMARY
In this unit, you have learned about theoretical distribution. The theoretical distribution theory is used for approximation of distributions of a large number of empirical variables. You have learned that probability distribution analysis can be done with the help of distribution theories. Theoretical distribution is based on mathematical formulae and logic. Now, you know the formulae for calculating the mean and standard deviation of a discrete random variable. The mean of the probability distribution is also known as the expected value. Binomial distribution is useful in many practical situations involving either/or types of events. In it, every trial is independent of other trials. Poisson distribution is useful for modelling certain real situations. If we know the average number of successes in a given unit of time and space, we can use Poisson distribution.

7.7 ANSWERS TO 'CHECK YOUR PROGRESS'
1. Binomial distribution is one of the simplest and most frequently used discrete probability distributions and is very useful in many practical situations involving either/or types of events.
2. The term continuous probability distribution implies that it can only be measured (as against counted) to some predetermined degree of accuracy. Time, weight or distances are all measured on a continuous scale. The most frequently used continuous probability distribution is termed as normal distribution.
3. Poisson distribution is applied in waiting line or queuing type of problems. In such situations, by knowing the rate of arrival of units at a service station, for example, and by knowing the rate at which these units are served, the number of service stations as well as the average waiting time for service for each arriving unit can be reasonably determined.

4. Normal distribution is most commonly applied in statistical quality control and is useful in many sociological studies.
5. The normal distribution is a continuous distribution and plays an important and pivotal role in statistical theory and practice, particularly in the area of statistical inference and statistical quality control. The results of the experiment are depicted as normal distribution or the bell-shaped curve. The normal curve is symmetrical and is defined by its mean (μ) and its standard deviation (σ).

7.8 QUESTIONS AND EXERCISES
Short-Answer Questions
1. Briefly explain probability distribution and its types.
2. Explain binomial distribution.
3. How will you calculate the mean and standard deviation of binomial distribution?
4. Describe Poisson distribution and its characteristics.
5. Define normal distribution and its characteristics.
6. What are bell-shaped curves?
7. What is Z-score?
8. Differentiate between binomial distribution and Poisson distribution.
9. Differentiate between Poisson distribution and normal distribution.
Long-Answer Questions
1. A fair coin is tossed 16 times. What is the probability of getting no more than 2 heads?
2. A manufacturer of laptop computer monitors has determined that on an average, 3% of screens produced are defective. A sample of one dozen monitors from a production lot was taken at random. What is the probability that in this sample fewer than 2 defectives will be found?
3. On any given day, the demand for refrigerators by an appliance store can be regarded as a function of a random variable. The retailer has determined that on some days he makes no sale at all and the maximum number of refrigerators sold on any given day is 7. Considering the variable x as the demand, the probability distribution of demand is provided in the following table.
Demand (x)    Probability
0             0.05
1             0.10
2             0.15
3             0.25
4             0.20
5             0.10
6             0.10
7             0.05
Total         1.00

Calculate the expected demand, E(x), per day.

4. The premium for a general health insurance plan with unlimited visits to the doctor's office is $650 per year. Suppose each visit of the insured to the doctor costs $50, of which 20 per cent is paid by the insured and 80 per cent by the insurance company. Assume further that each policy contains a $100 deductible clause, so that the insured has to pay for the first two visits to the doctor. The following table represents the probability distribution of the number of visits to the doctor by an insured individual during a period of one year.

Number of Visits    Probability
0                   0.10
1                   0.15
2                   0.20
3                   0.30
4                   0.10
5                   0.07
6                   0.04
7                   0.02
8                   0.02

Calculate the expected income of the insurance company.
5. A fair coin is tossed 4 times. If x is the number of heads occurring in these tosses, record the probability of getting 0, 1, 2, 3 or 4 heads in these 4 tosses and find the expected value (mean) of this probability distribution.
6. A student is given 4 True or False questions. The student does not know the answer to any of the questions. He tosses a fair coin. Each time he gets a head, he selects True. What is the probability that he will get:
(a) Only one correct answer
(b) At most 2 correct answers
(c) At least 3 correct answers
(d) All correct answers
7. A newly married couple plans to have 5 children. An astrologer tells them that, based on an astrological reading, they have an 80% chance of having a baby boy on any particular birth. The couple would like to have 3 boys and 2 girls. Find the probability of this event.
8. The number of cars pulling into a petrol pump follows a Poisson distribution with a mean of three cars every ten minutes. What is the probability that exactly two cars will arrive in the next ten minutes?
9. An automatic machine makes paper clips from coils of wire. On average, one in 400 paper clips is defective. If the paper clips are packed in small boxes of 100 clips each, what is the probability that any given box of clips contains:
(a) No defectives
(b) One or more defectives
(c) Less than two defectives
(d) Two or less defectives

10. Because of recycling campaigns, a number of empty glass soda bottles are being returned for refilling. It has been found that 10 per cent of the incoming bottles are chipped and hence are discarded. In the next batch of 20 bottles, what is the probability that:
(a) None will be chipped
(b) Two or fewer will be chipped
(c) Three or more will be chipped
(d) What is the expected number of chipped bottles in a batch of 20?
11. Customers arrive at a drive-in window of Apple Bank at an average rate of one customer per minute.
(a) What is the probability that exactly two customers arrive in a given minute?
(b) What is the probability of no arrivals in a particular minute?
12. The local telephone switchboard in a small village receives on average five calls per hour for information. Assuming that these calls follow a Poisson distribution, what is the probability that:
(a) More than half an hour elapses between two successive calls.
(b) In a particular hour, five calls are received.
13. A certain hospital usually admits 50 patients per day. On average, 3 per cent of incoming patients require rooms provided with special facilities. On the morning of a given day, it is found that three rooms with special facilities are available. Assuming that 50 patients will be admitted on that day, what is the probability that more than three patients will require special rooms?
14. An automatic machine produces 100 spools of brass wire per hour. Studies have shown that, on average, three spools of wire turn out to be defective among these 100 spools. Assuming a Poisson distribution, find the probability of the machine producing the following number of defective spools per hour.
(a) Exactly three
(b) Three or fewer
(c) More than three
(d) Less than two or more than three
(e) What is the probability that the machine will produce seven defective spools in a period of two hours?
15. The grade point average (GPA) of Business students is 3 out of a total of 4 points, with a standard deviation of 0.5. If a policy has been adopted that all students with a GPA of 2 or less during any semester will be put on probation, what percentage of the students are expected to be put on probation during one semester of 2008?
16. The IQ scores of students are normally distributed with mean μ and standard deviation σ. Find the areas under the curve over the following intervals from the table of Z-scores.
(a) Area to the right of Z = 2.58.
(b) Area to the left of Z = -2.33.
(c) Area between Z = 1 and Z = 1.96.

(d) Area between Z = 0 and Z = 1.2.
(e) Area between Z = 1 and Z = 1.96.
(f) Area between Z = -1.28 and Z = 1.28.
(g) Area between Z = -1.96 and Z = -1.28.
17. The price of a one-gallon bottle of milk is normally distributed in Nassau County with an average price of $2.00 and a standard deviation of 20 cents. A family driving within the county stops at a booth to buy a gallon of milk. What is the probability that they will pay:
(a) More than $2.10
(b) Less than $1.75
(c) Between $1.85 and $2.15
(d) Between $1.85 and $1.95
18. The heights of soldiers are normally distributed with a mean of 68 inches and a variance of 9 square inches. What is the probability that a soldier picked at random is:
(a) Less than 5 feet 1 inch tall
(b) Between 63 inches and 66 inches
(c) Taller than 6 feet
(d) What must be the height of a soldier so that only 30% of the soldiers are taller than him?
(e) What should be the height of the doorway so that 70% of the soldiers have to duck when entering the room?
19. The tensile strengths of a large number of steel wires were determined and were found to follow a normal distribution with a mean of 250 pounds and a standard deviation of 15 pounds.
(a) What per cent of the wires tested were between 245 pounds and 260 pounds?
(b) What per cent of the wires tested were over 280 pounds?
(c) 80 per cent of the wires tested were over what amount?
20. The diameters of ball bearings are normally distributed with a mean of 2.42 inches and a standard deviation of 0.01 inches. Determine the percentage of ball bearings with diameter:
(a) Between 2.40 and 2.43 inches
(b) Greater than 2.43 inches
(c) Less than 2.39 inches
21. Steel rods are manufactured to be 3 inches in diameter, but they are acceptable if they are within the limits of 2.99 inches and 3.01 inches. It is observed that 5 per cent of these rods are rejected because they are oversized and 5 per cent are rejected because they are undersized. Assuming that these diameters are normally distributed, determine the standard deviation of the distribution. Calculate the percentage of steel rods that would be rejected if the acceptable limits were changed to between 2.985 inches and 3.015 inches.
22. The height of soldiers is normally distributed. If 12.1% of the soldiers are taller than 72 inches and 20.33% of the soldiers are shorter than 67 inches, find the average and the standard deviation of the heights of soldiers.

23. A large department store has 4500 accounts receivable. The dollar amounts of these accounts are known to be normally distributed with a mean of $150 and a standard deviation of $25.
(a) How many accounts are expected to be between $115 and $165?
(b) How many accounts are either less than $100 or more than $200?
(c) How many accounts are between $125 and $190?
(d) How many accounts are between $190 and $200?
(e) What is the dollar amount such that 10 per cent of accounts receivable exceed this amount?
24. Sarko Airlines, flying between two cities, books on average 6,000 passengers per week with a standard deviation of 1,250.
(a) During what percentage of weeks does the airline book more than 5,500 passengers?
(b) During what percentage of weeks does it book fewer than 7,000 passengers?
(c) During what percentage of weeks does it book between 5,000 and 6,000 passengers?
(d) During what percentage of weeks does it book between 7,250 and 8,000 passengers?
(e) During what percentage of weeks does it book more than 8,000 passengers or fewer than 4,850 passengers?
25. The Small Business Center of a city reports that 30 per cent of all small businesses (20 or fewer employees) are owned by women. What is the probability that a random sample of 50 small businesses will show:
(a) More than 42 owned by men
(b) Fewer than 30 owned by men
(c) Between 30 and 42 owned by men
26. Let Z be the set of integers. Prove that if f: Z → Z satisfies f(f(n)) = f(f(n + 2) + 2) = n for all n, and f(0) = 1, then f(n) = 1 - n. [Hint: Use the induction method.]
27. Prove that the sequence $a_0 = 2, 3, 6, 14, 40, 152, 784, \ldots$ with general term $a_n = (n + 4)a_{n-1} - 4n\,a_{n-2} + (4n - 8)a_{n-3}$ is the sum of two known sequences. [Hint: Subtract various simple sequences until the required result is seen. Then prove it by the induction method.]
28. An event is considered either a hit or a miss. Assume that the first event is a hit and the second is a miss. Thereafter, the probability of a hit equals the proportion of hits in the previous sequence of trials. Calculate the probability of exactly 70 hits in the first 200 trials.
29. You are given a set of n biased coins. The mth coin has probability $1/(2m + 1)$ of showing heads (m = 1, 2, ..., n) and the results for each coin are independent. Calculate the probability that, if each coin is tossed once, you will get an odd number of heads.
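Most of the exercises above reduce to one of three computations: a binomial probability, a Poisson probability, or an area under the normal curve. The following is a minimal sketch of these three building blocks, using only the Python standard library. The function names are illustrative assumptions and do not come from the text. The inputs at the bottom are taken from Questions 1, 8 and 16(a).

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) when X counts successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) when X follows a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_cdf(x, mean=0.0, sd=1.0):
    """Area under the normal curve to the left of x, computed via the error function."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Question 1: fair coin tossed 16 times, no more than 2 heads (k = 0, 1, 2).
p_at_most_2_heads = sum(binomial_pmf(k, 16, 0.5) for k in range(3))

# Question 8: Poisson arrivals with a mean of 3 per ten minutes, exactly 2 arrivals.
p_two_cars = poisson_pmf(2, 3.0)

# Question 16(a): area to the right of Z = 2.58 under the standard normal curve.
area_right_of_2_58 = 1.0 - normal_cdf(2.58)

print(p_at_most_2_heads, p_two_cars, area_right_of_2_58)
```

A table of Z-scores gives the same areas as `normal_cdf` to about four decimal places; the function simply replaces the table lookup.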

7.9 FURTHER READING

Chandan, J. S. 1998. Statistics for Business and Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Monga, G. S. 2000. Mathematics and Statistics for Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Kothari, C. R. 1984. Quantitative Technique. New Delhi: Vikas Publishing House Pvt. Ltd.
Hooda, R. P. 2002. Statistics for Business and Economics. New Delhi: Macmillan India Ltd.
Gupta, S. C. 2006. Fundamentals of Statistics. New Delhi: Himalaya Publishing House.
Gupta, S. P. 2005. Statistical Methods. New Delhi: S. Chand and Sons.



UNIT 8 BUSINESS FORECASTING TECHNIQUES: CORRELATION AND REGRESSION

Structure
8.0 Introduction
8.2 Unit Objectives
8.3 Correlation
8.4 Types of Correlation
8.5 Methods of Studying Correlation
8.6 Regression Analysis
8.6.1 Two Regression Lines
8.6.2 Formulae used in Regression
8.7 Concept of Error
8.8 Coefficient of Determination
8.9 Applications of Correlation and Regression
8.10 Summary
8.11 Answers to 'Check Your Progress'
8.12 Questions and Exercises
8.13 Further Reading

8.0 INTRODUCTION

In this unit you will learn about correlation analysis techniques, which analyse indirect relationships in survey data and establish the variables that are most closely associated with a given action or mindset. Regression is a technique used to determine the statistical relationship between two (or more) variables and to predict one variable on the basis of one or more other variables. You will also learn about the scatter diagram, the least squares method and the standard error of estimate. You will be able to interpret the coefficient of correlation, r.

8.2 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Define correlation analysis
• Describe the different methods of studying correlation
• Differentiate between regression analysis and correlation
• Define the concept of error
• Define coefficient of determination

8.3 CORRELATION

Correlation analysis is the statistical tool generally used to describe the degree to which one variable is related to another. The relationship, if any, is usually assumed to be a linear one. This analysis is used frequently in conjunction with regression

analysis to measure how well the regression line explains the variations of the dependent variable. In fact, the word correlation refers to the relationship or interdependence between two variables. There are various phenomena which are related to each other. For instance, when demand for a certain commodity increases, its price goes up, and when its demand decreases, the price also comes down. Similarly, with increased money supply the general level of prices goes up. Such a relationship can also be noticed for several other phenomena. The theory by means of which quantitative connections between two sets of phenomena are determined is called the 'Theory of Correlation'.

On the basis of the theory of correlation, one can study the comparative changes occurring in two related phenomena, and their cause-effect relation can be examined. It should, however, be borne in mind that relationships like 'black cat causes bad luck', 'filled up pitchers result in good fortune' and similar other beliefs cannot be explained by the theory of correlation, since they are all imaginary and are incapable of being justified mathematically. Thus, correlation is concerned with the relationship between two related and quantifiable variables. If two quantities vary in sympathy, so that a movement (an increase or decrease) in the one tends to be accompanied by a movement in the same or opposite direction in the other, and the greater the change in one, the greater is the change in the other, the quantities are said to be correlated. This type of relationship is known as correlation or what is sometimes called, in statistics, covariation.

For correlation, it is essential that the two phenomena should have a cause-effect relationship. If such a relationship does not exist, then one should not talk of correlation. For example, if the height of the students as well as the height of the trees increases, then one should not call it a case of correlation, because the two phenomena, viz., the height of students and the height of trees, are not causally related. But the relationship between the price of a commodity and its demand, the price of a commodity and its supply, the rate of interest and savings, etc., are examples of correlation, since in all such cases the change in one phenomenon is explained by a change in the other phenomenon.

It is appropriate here to mention that correlation in the case of phenomena pertaining to natural sciences can be reduced to absolute mathematical terms, e.g., heat always increases with light. But in phenomena pertaining to social sciences it is often difficult to establish any absolute relationship between two phenomena. Hence, in social sciences we must understand that correlation is established if, in a large number of cases, two variables tend to move in the same direction or in the opposite direction.

8.4 TYPES OF CORRELATION

Correlation can be either positive or negative. Whether correlation is positive or negative depends upon the direction in which the variables are moving. If both variables are changing in the same direction, then correlation is said to be positive, but when the variations in the two variables take place in opposite directions, the correlation is termed negative. This can be explained as follows:

Changes in Independent Variable    Changes in Dependent Variable    Nature of Correlation
Increase (+) ↑                     Increase (+) ↑                   Positive (+)
Decrease (-) ↓                     Decrease (-) ↓                   Positive (+)
Increase (+) ↑                     Decrease (-) ↓                   Negative (-)
Decrease (-) ↓                     Increase (+) ↑                   Negative (-)

Correlation can be either linear or non-linear. Non-linear correlation is also known as curvilinear correlation. The distinction is based upon the constancy of the ratio of change between the variables. When the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. In such a case, if the values of the variables are plotted on a graph, a straight line is obtained. This is why the correlation is known as linear correlation. But when the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, i.e., the ratio happens to be variable instead of constant, the correlation is said to be non-linear or curvilinear. In such a situation, we obtain a curve if the values of the variables are plotted on a graph.

Correlation can be simple, partial or multiple. The study of correlation for two variables (of which one is independent and the other dependent) involves the application of simple correlation. When more than two variables are involved in a study relating to correlation, the study can be either of multiple correlation or of partial correlation. Multiple correlation studies the relationship between a dependent variable and two or more independent variables. In partial correlation, we measure the correlation between a dependent variable and one particular independent variable, assuming that all other independent variables remain constant.

Statisticians have developed two measures for describing the correlation between two variables, viz. the coefficient of determination and the coefficient of correlation.

8.5 METHODS OF STUDYING CORRELATION

Scatter Diagram

The scatter diagram is a graph of observed plotted points where each point represents the values of X and Y as a coordinate. It portrays the relationship between these two variables graphically. By looking at the scatter of the various points on the chart, it is possible to determine the extent of association between these two variables. The wider the scatter on the chart, the less close is the relationship. On the other hand, the closer the points and the closer they come to falling on a line passing through them, the higher the degree of relationship. If all the points fall on a line, the relationship is perfect. If this line goes up from the lower left-hand corner to the upper right-hand corner, i.e., if the slope of the line is positive, then the correlation between the two variables is considered to be perfect positive. Similarly, if this line starts at the upper left-hand corner and comes down to the lower right-hand corner of the diagram, i.e., if the slope is negative, and all points fall on the line, then their correlation is said to be perfect negative.

Example 8.1: The following data represents the money spent on advertising a product and the respective profits realized from each advertising period for the given product. The amounts are in thousands of dollars. Assume profit to be the dependent variable and advertising the independent variable.

Advertising (X)    Profit (Y)
5                  8
6                  7
7                  9
8                  10
9                  13
10                 12
11                 13

Solution: We shall draw a scatter diagram for this data.

[Scatter diagram: the seven data points plotted against advertising (X), from 5 to 11]

We can see that the trend in the relationship is increasing, and even though this relationship is not perfect, i.e., all the points do not lie on a straight line, the profits in general do increase as the advertising budget increases. This gives us a reasonable visual idea of the relationship between X and Y.

Linear Regression Equation

The pattern of the scatter diagram indicates a linear relationship between X and Y, and this relationship can be described by a straight line through these points. This line is known as the line of regression. This line should be the most representative of the data. There are an infinite number of lines that can approximately pass through this pattern, and we are looking for the one line, out of these, that is most suitable as the representative of all the data. This line is known as the line of best fit.

But how do we find this regression line or line of best fit? The best line would be the one that passes through all the points. Since that is not possible, we must find a line which is closest to all the points. A line will be closest to all these points if the total distance between the line and all the points is minimum. However, some points will be above the line, so that the differences between the line and those points would be positive, and some points will be below the line, so that these differences would be negative. Accordingly, for the best line through this data, these differences will cancel each other, and the total sum of differences would not be valid as a measure of best fit. However, if we took these differences individually and squared them, this would eliminate the problem of positive and negative differences, since the square of a negative difference is also positive; hence, the total sum of squares would be positive. We now have to find a line which is closest to all the points. For such a line, the sum of the absolute differences between the line and the points would be minimum, and so would the sum of squares of these differences. Hence, this method of finding the line of best fit is known as the method of least squares.

This line of best fit is known as the regression line, and the algebraic expression that identifies this line is a general straight line equation given as,

$$Y_c = b_0 + b_1 X$$

where $b_0$ and $b_1$ are the two pieces of information, called parameters, which determine the position of the line completely. Parameter $b_0$ is known as the Y-intercept (or the

value of $Y_c$ at X = 0), and parameter $b_1$ determines the slope of the regression line, which is the change in $Y_c$ for each unit change in X.

Also, X represents a given value of the independent variable, and $Y_c$ represents the computed value of the dependent variable based upon this relationship. This regression would have the following properties.
(a) $\sum (Y - Y_c) = 0$
(b) $\sum (Y - Y_c)^2$ = Minimum
where Y is the observed value of the dependent variable for a given value of X, and $Y_c$ is the computed value of the dependent variable for the same value of X. This relation between Y and $Y_c$ is shown in the following figure.

[Fig. 8.1 Observed and Computed Values of Dependent Variable: line AB drawn through the plotted points, with X on the horizontal axis and Y on the vertical axis]

The line AB is the line of best fit when,
(a) $\sum (Y - Y_c) = 0$
(b) $\sum (Y - Y_c)^2$ = Minimum
where Y is the actual observation and $Y_c$ is the corresponding computed value, based upon the method of least squares.

Now, since $Y_c = b_0 + b_1 X$ is the algebraic equation for any line, we must find the unique values of $b_0$ and $b_1$, which would automatically give us the regression line. These unique values of $b_0$ and $b_1$, based upon the least squares principle, are calculated according to the following formulae:

$$b_0 = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2}$$

and

$$b_1 = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$

The value of $b_0$ can also be calculated easily, once the value of $b_1$ has been calculated, as follows:

$$b_0 = \bar{Y} - b_1 \bar{X}$$

where $\bar{Y}$ and $\bar{X}$ are the simple arithmetic means of the Y data and X data respectively, and n represents the number of paired observations. We can illustrate these calculations by an example.
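The two formulae for b0 and b1 translate directly into a few lines of code. The following is a minimal sketch rather than a definitive implementation. The function name `least_squares_line` and the small data set are hypothetical and chosen only for illustration: the four points lie exactly on Y = 1 + 2X, so the fitted line should recover those two parameters.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line Yc = b0 + b1*X using the summation formulae."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Common denominator n*sum(X^2) - (sum X)^2; it is zero only if all X are equal.
    denom = n * sum_xx - sum_x**2
    if denom == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    b1 = (n * sum_xy - sum_x * sum_y) / denom
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return b0, b1


# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = least_squares_line(xs, ys)

# Property (a): the deviations of Y from the computed values sum to zero.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1, sum(residuals))
```

Computing b0 from the summation formula and from the shortcut (mean of Y minus b1 times mean of X) gives the same value; the shortcut is usually preferred in hand calculation because it reuses b1.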

Example 8.2: A researcher wants to find out if there is a relationship between the heights of sons and the heights of their fathers. In other words, do tall fathers have tall sons? He took a random sample of 6 fathers and their 6 sons. Their heights in inches are given below in an ordered array.

Father (X)    Son (Y)
63            66
65            68
66            65
67            67
67            69
68            70

(a) For this data, compute the regression line.
(b) Based upon the relationship between the heights, what would be the estimate of the height of the son if the father's height is 70 inches?

Solution: (a) We can start by showing the scatter diagram for this data.

[Scatter diagram of the data, with heights of fathers (X) on the horizontal axis and the line of best fit AB]

The scatter diagram shows an increasing trend through which the line of best fit AB can be established. This line is identified by:

$$Y_c = b_0 + b_1 X$$

where,

$$b_1 = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$

and,

$$b_0 = \bar{Y} - b_1 \bar{X}$$

Let us make a table to calculate all these values:

X           Y           X²            XY            Y²
63          66          3969          4158          4356
65          68          4225          4420          4624
66          65          4356          4290          4225
67          67          4489          4489          4489
67          69          4489          4623          4761
68          70          4624          4760          4900
ΣX = 396    ΣY = 405    ΣX² = 26152   ΣXY = 26740   ΣY² = 27355

Then,

$$b_1 = \frac{6(26740) - (396)(405)}{6(26152) - (396)(396)} = \frac{160440 - 160380}{156912 - 156816} = \frac{60}{96} = 0.625$$

and,

$$b_0 = \frac{405}{6} - 0.625\left(\frac{396}{6}\right) = 67.5 - 41.25 = 26.25$$

Hence, the line of regression equation would be:

$$Y_c = b_0 + b_1 X = 26.25 + 0.625X$$

(b) If the father's height is 70 inches, i.e., if X = 70, then $Y_c$ would be:

$$Y_c = 26.25 + 0.625(70) = 26.25 + 43.75 = 70$$

Thus, the computed height of the son is 70 inches.

The coefficient of correlation, symbolically denoted by 'r', is another important measure to describe how well one variable is explained by another. It measures the degree of relationship between the two causally related variables. The value of this coefficient can never be more than +1 or less than -1. Thus, +1 and -1 are the limits of this coefficient. For a unit change in the independent variable, if there happens to be a constant change in the dependent variable in the same direction, then the value of the coefficient will be +1, indicative of perfect positive correlation; but if such a change occurs in the opposite direction, the value of the coefficient will be -1, indicating perfect negative correlation. In practical life, the possibility of obtaining either a perfect positive or perfect negative correlation is very remote, particularly in respect of phenomena concerning social sciences. If the coefficient of correlation has a zero value, then it means that there exists no correlation between the variables under study.

There are several methods of finding the coefficient of correlation, but the following ones are considered important (see footnote 1):
(i) Coefficient of correlation by the method of least squares
(ii) Coefficient of correlation through the product moment method, or Karl Pearson's coefficient of correlation
(iii) Coefficient of correlation using simple regression coefficients
Whichever of these three methods we adopt, we get the same value of r.

Footnote 1: The methods are self-explanatory and hence only a brief outline of these has been given along with the commonly used formulae. One can find numerous illustrations pertaining to these in any elementary book on statistics.
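The arithmetic of Example 8.2, and the claim that different methods give the same value of r, can be checked with a short script. This is a hedged sketch: the variable names are illustrative, and the two expressions used for r are the standard product moment formula in raw sums and the standard identity that r is the square root of the product of the two regression slopes (taking the sign of the slopes). Neither expression is quoted from the text at this point.

```python
import math

# Heights in inches from Example 8.2.
fathers = [63, 65, 66, 67, 67, 68]  # X
sons = [66, 68, 65, 67, 69, 70]     # Y
n = len(fathers)

sum_x = sum(fathers)
sum_y = sum(sons)
sum_xx = sum(x * x for x in fathers)
sum_yy = sum(y * y for y in sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))

# Building blocks shared by the slope and correlation formulae.
sxy = n * sum_xy - sum_x * sum_y
sxx = n * sum_xx - sum_x**2
syy = n * sum_yy - sum_y**2

# Regression of Y on X, as in the worked solution.
b1 = sxy / sxx
b0 = sum_y / n - b1 * sum_x / n
predicted_son_height = b0 + b1 * 70

# Method 1: product moment (Karl Pearson) formula written with raw sums.
r_product_moment = sxy / math.sqrt(sxx * syy)

# Method 2: from the two regression slopes (Y on X, and X on Y).
slope_x_on_y = sxy / syy
r_from_slopes = math.copysign(math.sqrt(b1 * slope_x_on_y), b1)

print(b0, b1, predicted_son_height)
print(r_product_moment, r_from_slopes)
```

Both routes share the same numerator, which is why they agree: the slopes differ only in whether the X or the Y variation appears in the denominator.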

Coefficient of Correlation by the Method of Least Squares

The least squares method of fitting a line (the line of best fit or the regression line) through the scatter diagram is a method which minimizes the sum of the squared vertical deviations from the fitted line. In other words, the line to be fitted will pass through the points of the scatter diagram in such a fashion that the sum of the squares of the vertical deviations of these points from the line will be minimum.

[Fig. 8.2 Scatter Diagram (X-axis: Income ('00 Rs))]

The meaning of the least squares criterion can be more easily understood through reference to the figure drawn below, where the earlier scatter diagram (see footnote 2) has been reproduced along with a line which represents the least squares fit to the data. In the figure, the vertical deviations of the individual points from the line are shown as the short vertical lines joining the points to the least squares line. These deviations will be denoted by the symbol 'e'. The value of 'e' varies from one point to another. In some cases it is positive, in others it is negative. If the line drawn happens to be the least squares line, then the value of $\sum e_i^2$ is the least possible. It is because of this feature that the method is known as the Least Squares method.

Why we insist on minimizing the sum of squared deviations is a question that needs explanation. If we denote the deviation of the actual value Y from the estimated value $\hat{Y}$ as $(Y - \hat{Y})$ or $e_i$, it is logical that we want $\sum (Y - \hat{Y})$ or $\sum_{i=1}^{n} e_i$ to be as small as possible. However, merely examining $\sum (Y - \hat{Y})$ or $\sum_{i=1}^{n} e_i$ is inappropriate, since any $e_i$ can be positive or negative, and large positive values and large negative values could cancel one another out.

[Five small scatter diagrams, numbered (1) to (5)]

Footnote 2: Five possible forms which a scatter diagram may assume have been depicted in these five diagrams. The first is indicative of a perfect positive relationship, the second shows a perfect negative relationship, the third shows no relationship, the fourth shows a positive relationship and the fifth shows a negative relationship between the two variables under consideration.

[Fig. 8.3 Scatter Diagram, Regression Line and Short Vertical Lines Representing 'e' (X-axis: Income ('00 Rs))]

But large values of $e_i$, regardless of their sign, indicate a poor prediction. Even if we ignore the signs while working out $\sum_{i=1}^{n} |e_i|$, the difficulties may continue to be there. Hence, the standard procedure is to eliminate the effect of signs by squaring each observation. Squaring each term accomplishes two purposes, viz. (i) it magnifies (or penalizes) the larger errors, and (ii) it cancels the effect of the positive and negative values (since a negative error squared becomes positive). The choice of minimizing the squared sum of errors rather than the sum of the absolute values implies that we would rather make many small errors than a few large errors. Hence, in obtaining the regression line we follow the approach that the sum of the squared deviations be minimum, and on this basis work out the values of its constants, viz. 'a' and 'b', or what are known as the intercept and the slope of the line. This is done with the help of the following two normal equations (see footnote 3):

$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$

Footnote 3: If we proceed by centring each variable, i.e., setting its origin at its mean, then the two equations will be as follows:

$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$

But since $\sum y$ and $\sum x$ will be zero, the first equation and the first term of the second equation will disappear and we shall simply have the following:

$$\sum xy = b\sum x^2, \qquad b = \frac{\sum xy}{\sum x^2}$$

and the value of 'a' can then be worked out as:

$$a = \bar{Y} - b\bar{X}$$

Alternatively, we can also write as under:

$$\sum xy = \sum XY - \frac{\sum X \sum Y}{n} = \sum XY - n\bar{X}\bar{Y}$$

and

$$\sum x^2 = \sum X^2 - \frac{(\sum X)^2}{n} = \sum X^2 - n\bar{X}^2$$

so that

$$b = \frac{\sum xy}{\sum x^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$

and

$$a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$

In these two equations, 'a' and 'b' are unknowns and all other values, viz. $\sum X$, $\sum Y$, $\sum X^2$ and $\sum XY$, are sums, sums of squares and sums of cross products to be calculated from the sample data, and 'n' means the number of observations in the sample. Hence, one can solve these two equations for the unknown values. Once these values are found, the regression line is said to have been defined for the given problem. Statisticians have also derived a short cut method through which these two equations can be rewritten so that the values of 'a' and 'b' can be directly obtained as follows:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$

$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$

Example 8.3: Fit a regression line $\hat{Y} = a + bX_i$ by the method of least squares to the following sample information:

Observations                              1    2    3    4    5    6    7    8    9    10
Income (X) ('00 Rs)                       41   65   50   57   96   94   110  30   79   65
Consumption Expenditure (Y) ('00 Rs)      44   60   39   51   80   68   84   34   55   48

Solution: We are to fit a regression line $\hat{Y} = a + bX_i$ to the given data by the method of least squares. Accordingly, we shall work out the 'a' and 'b' values with the help of the normal equations as stated above, and for this purpose work out the $\sum X$, $\sum Y$, $\sum XY$ and $\sum X^2$ values from the given sample information.

Summations for Regression Equation

Observations    Income X      Consumption Expenditure Y    XY            X²            Y²
                ('00 Rs)      ('00 Rs)
1               41            44                           1804          1681          1936
2               65            60                           3900          4225          3600
3               50            39                           1950          2500          1521
4               57            51                           2907          3249          2601
5               96            80                           7680          9216          6400
6               94            68                           6392          8836          4624
7               110           84                           9240          12100         7056
8               30            34                           1020          900           1156
9               79            55                           4345          6241          3025
10              65            48                           3120          4225          2304
n = 10          ΣX = 687      ΣY = 563                     ΣXY = 42358   ΣX² = 53173   ΣY² = 34223

Check Your Progress
1. Explain the different types of correlation.
2. Explain the meaning of correlation analysis.
3. What is a scatter diagram method?
4. What is a least squares method?
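The column totals for Example 8.3, and the values of 'a' and 'b' given by the short cut formulae above, can be recomputed with a brief script. This is a sketch for checking the hand calculation, with variable names chosen for illustration. It uses the ten income and consumption figures exactly as listed in the statement of the example.

```python
# Sample data from Example 8.3, in hundreds of rupees.
income = [41, 65, 50, 57, 96, 94, 110, 30, 79, 65]       # X
consumption = [44, 60, 39, 51, 80, 68, 84, 34, 55, 48]   # Y
n = len(income)

# Column totals of the summation table.
sum_x = sum(income)
sum_y = sum(consumption)
sum_xy = sum(x * y for x, y in zip(income, consumption))
sum_xx = sum(x * x for x in income)
sum_yy = sum(y * y for y in consumption)

# Short cut formulae for the slope b and the intercept a.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x**2)
a = sum_y / n - b * sum_x / n

# Both normal equations should balance once a and b are substituted back.
first_equation_gap = sum_y - (n * a + b * sum_x)
second_equation_gap = sum_xy - (a * sum_x + b * sum_xx)

print(sum_x, sum_y, sum_xy, sum_xx, sum_yy)
print(round(a, 3), round(b, 3))
```

Substituting a and b back into the two normal equations, as the last two assignments do, is a quick way to catch an arithmetic slip in a hand-worked table.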

Putting the values in the required normal equations we have,

$$563 = 10a + 687b$$

$$42358 = 687a + 53173b$$

Solving these two equations for a and b we obtain,

a = 14.000 and b = 0.616

Hence, the equation for our required regression line is,

$$\hat{Y} = a + bX_i$$

or,

$$\hat{Y} = 14.000 + 0.616X_i$$

This equation is known as the regression equation of Y on X, from which Y values can be estimated for given values of the X variable.4

4. It should be pointed out that the equation used to estimate variable Y values from the values of X should not be used to estimate the values of variable X from the given values of variable Y. Another regression equation (known as the regression equation of X on Y, of the type X = a + bY) that reverses the roles of the two variables should be used if it is desired to estimate X from the value of Y.

Coefficient of Correlation by Karl Pearson's Method

Karl Pearson's method is the most widely used method of measuring the relationship between two variables. This coefficient is based on the following assumptions:

(i) There is a linear relationship between the two variables, which means that a straight line would be obtained if the observed data is plotted on a graph.
(ii) The two variables are causally related, which means that one of the variables is independent and the other is dependent.
(iii) A large number of independent causes operate in both the variables so as to produce a normal distribution.

According to Karl Pearson, 'r' can be worked out as follows:

$$r = \frac{\sum xy}{n\,\sigma_x\,\sigma_y}$$

where
$x = (X - \bar{X})$
$y = (Y - \bar{Y})$
$\sigma_x$ = Standard deviation of the X series, equal to $\sqrt{\sum x^2 / n}$
$\sigma_y$ = Standard deviation of the Y series, equal to $\sqrt{\sum y^2 / n}$
$n$ = Number of pairs of X and Y observed

A short-cut formula known as the Product Moment Formula can be derived from this formula as follows:

$$r = \frac{\dfrac{\sum xy}{n}}{\sqrt{\dfrac{\sum x^2}{n}}\,\sqrt{\dfrac{\sum y^2}{n}}} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

These formulae are based on obtaining true means (viz. $\bar{X}$ and $\bar{Y}$) first and then doing all other calculations.

8.6 REGRESSION ANALYSIS

8.6.1 Two Regression Lines

The relationship between Y and X is not perfect. The average relationship of Y and X in which X is the independent and Y the dependent variable is not the same as the average relationship of X and Y in which Y is the independent and X the dependent variable. The two regression lines, which best describe these two average relationships, are given by the regression equations:

$$Y = a + bX \qquad (4.1)$$

$$X = a' + b'Y \qquad (4.2)$$

Figure: Scatter diagram with lines of best fit

a, b in (4.1) are obtained by minimizing $\sum (Y - \hat{Y})^2$
a', b' in (4.2) are obtained by minimizing $\sum (X - \hat{X})^2$

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} \qquad b' = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}$$

$$a = \bar{Y} - b\bar{X} \qquad a' = \bar{X} - b'\bar{Y}$$

The signs of b and b' indicate whether the slopes of the lines of best fit are positive or negative. It may be recalled that the regression coefficient measures the
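To illustrate the Product Moment Formula and the two regression lines of this section together, the following minimal Python sketch applies them to the Example 8.3 data (income as X, consumption expenditure as Y). The function name `pearson_and_regression_lines` and its return layout are assumptions made for illustration, not taken from the text.

```python
import math

# Example 8.3 data (both series in '00 Rs).
income = [41, 65, 50, 57, 96, 94, 110, 30, 79, 65]        # X
consumption = [44, 60, 39, 51, 80, 68, 84, 34, 55, 48]    # Y


def pearson_and_regression_lines(xs, ys):
    """Return r, (a, b) for Y = a + bX, and (a', b') for X = a' + b'Y.

    Hypothetical helper: works with deviations from the true means,
    x = X - mean(X) and y = Y - mean(Y).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)

    # Product Moment Formula: r = Sum(xy) / sqrt(Sum(x^2) * Sum(y^2))
    r = sum_xy / math.sqrt(sum_x2 * sum_y2)

    # Regression of Y on X: slope b, intercept a
    b = sum_xy / sum_x2
    a = mean_y - b * mean_x

    # Regression of X on Y: slope b', intercept a'
    b_prime = sum_xy / sum_y2
    a_prime = mean_x - b_prime * mean_y

    return r, (a, b), (a_prime, b_prime)


r, (a, b), (a_prime, b_prime) = pearson_and_regression_lines(income, consumption)
print(f"r = {r:.3f}")
print(f"Y on X: Y = {a:.3f} + {b:.3f} X")
print(f"X on Y: X = {a_prime:.3f} + {b_prime:.3f} Y")
```

For this data both slopes come out positive, so both lines of best fit slope upward, and the two fitted lines are visibly different equations, which is the point made above about the two average relationships not being the same.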

