
MCM 602 Quantitative Techniques for Managers

Published by Teamlease Edtech Ltd (Amita Chitroda), 2020-12-04 12:01:48

Description: MCM 602 Quantitative Techniques for Managers


Probability Distributions

$$p(x) = P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}, \qquad x = 0, 1, 2, 3, 4, 5, 6$$

Here we have been given $9P(X = 4) = P(X = 2)$, or

$$9 \times {}^{6}C_{4}\, p^{4} q^{2} = {}^{6}C_{2}\, p^{2} q^{4}$$

Since ${}^{6}C_{4} = {}^{6}C_{2} = 15$, this reduces to $9p^{2} = q^{2}$. Substituting the value $q = 1 - p$,

$$9p^{2} = (1 - p)^{2} = 1 + p^{2} - 2p$$

or $8p^{2} + 2p - 1 = 0$, i.e. $8p^{2} + 4p - 2p - 1 = 0$, or $(4p - 1)(2p + 1) = 0$, so that $p = \frac{1}{4}$ or $p = -\frac{1}{2}$.

Since p cannot be negative, $p = -\frac{1}{2}$ is rejected. Hence $p = \frac{1}{4}$.

Problem 5
Comment on the following: For a binomial distribution, mean = 7 and variance = 11.
Solution: For a binomial distribution, mean = np = 7 and variance = npq = 11. Dividing the second relation by the first, we get q = 11/7 ≈ 1.6 (impossible). Since a probability cannot be greater than 1, the statement is wrong.

CU IDOL SELF LEARNING MATERIAL (SLM)
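The two results above can be checked numerically. The following is a minimal sketch, assuming Python's standard library only; the helper names `binom_pmf` and `implied_q` are illustrative and do not come from the text.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Check that p = 1/4 satisfies the given condition 9 P(X = 4) = P(X = 2) for n = 6.
n, p = 6, 0.25
lhs = 9 * binom_pmf(4, n, p)
rhs = binom_pmf(2, n, p)
print(lhs, rhs)

def implied_q(mean, variance):
    """For a binomial distribution, q = npq / np = variance / mean."""
    return variance / mean

# Problem 5: mean = 7 and variance = 11 imply q > 1, which is not a valid probability.
q = implied_q(7, 11)
print(q, 0 <= q <= 1)
```

The same check rejects any claimed binomial distribution whose variance exceeds its mean.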

Problem 6
8 coins are tossed at a time, 256 times. Find the expected frequencies of success (getting a head) and tabulate the results obtained.
Solution: Here N = 256, n = 8 and p = q = 1/2 (getting a head or a tail).
∴ The frequency of x successes will be

$$N\,p(x) = 256 \times {}^{8}C_{x} \left(\tfrac{1}{2}\right)^{8} = {}^{8}C_{x}$$

The expected binomial frequencies can now be tabulated as under:

Number of heads (x):   0   1   2    3    4    5    6   7   8
Expected frequency:    1   8   28   56   70   56   28  8   1

Problem 7
Fit a Poisson Distribution to the following data and calculate the theoretical frequencies.

x:   0     1    2    3   4
f:   123   59   14   3   1

Solution: Given data:

x:    0     1    2    3   4
f:    123   59   14   3   1
fx:   0     59   28   9   4

From the above table, Σf = 200 and Σfx = 100.

Now, to calculate the mean,

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{100}{200} = 0.5$$

If the data follow the Poisson Distribution, $\bar{x} = \lambda$, so λ = 0.5.
The theoretical frequencies will be given by

$$f(x) = N\,p(x) = N\,\frac{e^{-\lambda}\lambda^{x}}{x!}$$

The table can now be worked out for the theoretical frequencies.

x = 0:  $f(x) = 200 \times \dfrac{e^{-0.5}(0.5)^{0}}{0!} = 200 \times e^{-0.5} = 121$
x = 1:  $f(x) = 200 \times \dfrac{e^{-0.5}(0.5)^{1}}{1!} = 61$
x = 2:  $f(x) = 200 \times \dfrac{e^{-0.5}(0.5)^{2}}{2!} = 15.2$
x = 3:  $f(x) = 200 \times \dfrac{e^{-0.5}(0.5)^{3}}{3!} = 2.5$
x = 4:  $f(x) = 200 \times \dfrac{e^{-0.5}(0.5)^{4}}{4!} = 0.3$

∴ Σf ≈ 200
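The Poisson fit of Problem 7 can be reproduced in a few lines. This is a sketch using only Python's standard library; the variable names (`x_values`, `observed`, `expected`) are illustrative choices, not notation from the text.

```python
from math import exp, factorial

# Observed data from Problem 7.
x_values = [0, 1, 2, 3, 4]
observed = [123, 59, 14, 3, 1]

# Total frequency N and the sample mean, which estimates the Poisson parameter lambda.
N = sum(observed)
lam = sum(x * f for x, f in zip(x_values, observed)) / N

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Theoretical (expected) frequencies f(x) = N * p(x).
expected = [N * poisson_pmf(x, lam) for x in x_values]

for x, f, e in zip(x_values, observed, expected):
    print(f"x={x}  observed={f:3d}  expected={e:6.1f}")
```

The expected frequencies do not sum to exactly 200 because the Poisson distribution also assigns a small probability to values of x above 4.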

Problem 8
From a population of 25 candles, 5 have been found to be broken. If we examine a sample of 5 candles, what is the probability of having (a) less than 2 broken, (b) no broken candle?
Solution: Given, as per normal notation, N = 25, D = 5, n = 5.
(a) The probability of less than 2 broken will be
P(x < 2) = P(x = 0) + P(x = 1)

$$P(x = 0) = \frac{\binom{5}{0}\binom{25-5}{5-0}}{\binom{25}{5}} = \frac{\binom{20}{5}}{\binom{25}{5}} = 0.292$$

$$P(x = 1) = \frac{\binom{5}{1}\binom{25-5}{5-1}}{\binom{25}{5}} = \frac{5\binom{20}{4}}{\binom{25}{5}} = 0.456$$

P(x < 2) = 0.292 + 0.456 = 0.748
(b) For the case of no broken candles,

$$P(x = 0) = \frac{\binom{5}{0}\binom{20}{5}}{\binom{25}{5}} = 0.292$$

(as worked out under (a)).
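The hypergeometric probabilities in Problem 8 can be verified as follows. This is a minimal sketch with Python's standard library; `hypergeom_pmf` is an illustrative helper name, and its arguments follow the notation N, D, n used in the solution.

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    """Probability of x defectives in a sample of n drawn without replacement
    from a population of N items that contains D defectives."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

# Problem 8: 25 candles, 5 broken, sample of 5.
p0 = hypergeom_pmf(0, N=25, D=5, n=5)
p1 = hypergeom_pmf(1, N=25, D=5, n=5)

print(round(p0, 3))        # P(x = 0)
print(round(p1, 3))        # P(x = 1)
print(round(p0 + p1, 3))   # P(x < 2)
```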

Problem 9
While conducting a population survey of a city, the enumerator noticed that 40% of the male population were illiterate. If the trend continues, work out the probability that, in a random sample of 2,00,000 males, the number of illiterates will be (i) less than 75,000; (ii) more than 82,000.
Solution: In this problem, let X denote the random variable indicating the number of illiterate males in a sample of size 2,00,000. We have
n = 2,00,000
p = 40% or 0.4
Hence q = 1 – 0.4 = 0.6 (literate)
np = 2,00,000 × 0.4 = 80,000
npq = 80,000 × 0.6 = 48,000
√(npq) = √48,000 = 219
Hence X is approximately normally distributed with mean 80,000 and standard deviation 219.
(i) When x < 75,000,

$$Z_1 = \frac{75{,}000 - 80{,}000}{\sqrt{48{,}000}} = \frac{-5{,}000}{219} = -22.8$$

(ii) When x > 82,000,

$$Z_2 = \frac{82{,}000 - 80{,}000}{219} = \frac{2{,}000}{219} = 9.13$$
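The standardised values in Problem 9 can be checked with a short script. This is a sketch under the normal approximation used in the solution, without a continuity correction; the function names are illustrative. The tail areas are computed with `math.erfc` so that very small probabilities are not rounded to zero.

```python
from math import sqrt, erfc

n, p = 200_000, 0.4
mean = n * p                      # np
sd = sqrt(n * p * (1 - p))        # sqrt(npq)

def z_score(x):
    """Number of standard deviations between x and the mean."""
    return (x - mean) / sd

def upper_tail(z):
    """P(Z > z) for a standard normal variate."""
    return 0.5 * erfc(z / sqrt(2))

z1 = z_score(75_000)
z2 = z_score(82_000)

p_less_than_75000 = upper_tail(-z1)   # by symmetry, P(Z < z1) = P(Z > -z1)
p_more_than_82000 = upper_tail(z2)

print(round(z1, 1), round(z2, 2))
print(p_less_than_75000, p_more_than_82000)
```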

Both values of Z are very large in magnitude. Hence the probabilities that the number of illiterates is less than 75,000 or more than 82,000 are negligibly small.

8.8 Summary

Terms Used
• Bernoulli's Process: A process in which each trial has only two possible outcomes, the probability of the outcome of any trial remains the same over time, and the trials are statistically independent.
• Binomial Distribution: A discrete distribution specifying the results of an experiment known as Bernoulli's process.
• Continuous Probability Distribution: A probability distribution in which a variable is allowed to take any value within a specified range.
• Continuous Random Variable: A random variable allowed to take any value within a specified range.
• Expected Value: A weighted average of the outcomes of an experiment.
• Expected Value of a Random Variable: The sum of the products of each value of the random variable with that value's probability of happening.
• Hypergeometric Distribution: The distribution of such events when replacement of the value is not permitted (sampling without replacement).
• Normal Distribution: A distribution of a continuous random variable with a single-peaked, bell-shaped, symmetrical curve. The average value of the random variable lies at the centre of the distribution and the curve is symmetrical around a vertical line drawn at this average value. The two tails extend indefinitely, never touching the horizontal axis.
• Poisson Distribution: A discrete distribution in which the probability of occurrence of an event within a very small time period is a very small number, the probability that two or more such events will occur within the same time interval is effectively zero, and the probability of occurrence of the event within one time period is independent of where that time period is.
• Probability Distribution: A list of the outcomes of an experiment with the probabilities expected to be associated with these outcomes.

• Random Variable: A variable that takes different values as a result of the outcome of a random experiment.
• Standard Normal Probability Distribution: A normal probability distribution with mean µ = 0 and standard deviation σ = 1.

Relationships Used
• Probability of r successes in n Bernoulli trials:
$$p(r) = {}^{n}C_{r}\, p^{r} q^{n-r}$$
where p = probability of success and q = probability of failure (q = 1 – p).
• Mean of a binomial distribution: $\mu = np$
• Standard deviation of a binomial distribution: $\sigma = \sqrt{npq}$
• Probability of a discrete random variable occurring in a Poisson Distribution:
$$p(x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \quad \text{or} \quad \frac{e^{-m} m^{x}}{x!}$$
• Poisson Distribution as an approximation to the binomial distribution:
$$p(x) = \frac{(np)^{x} e^{-np}}{x!}$$
• Normal variate:
$$z = \frac{x - \mu}{\sigma}$$
where x = value of the random variable, µ = mean of the distribution, σ = standard deviation of the distribution, and z = number of standard deviations from x to the mean of this distribution.
• Probability density function of a normally distributed continuous random variable:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$
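The relationships listed above translate directly into code. The sketch below uses only Python's standard library; the function names and the example values n = 100 and p = 0.02 are illustrative assumptions, chosen to show how closely the Poisson formula with λ = np tracks the binomial formula when n is large and p is small.

```python
from math import comb, exp, factorial, pi, sqrt

def binomial_pmf(r, n, p):
    """p(r) = nCr * p^r * q^(n-r), with q = 1 - p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(x, lam):
    """p(x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))"""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Hypothetical example: n = 100 trials with success probability p = 0.02.
n, p = 100, 0.02
mu = n * p                     # mean of the binomial distribution
sigma = sqrt(n * p * (1 - p))  # standard deviation of the binomial distribution

for x in range(4):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, mu), 4))
```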

8.9 Key Words/Abbreviations

The unit is summarised by some of its important points as below:
• Binomial Distribution: A discrete distribution specifying the results of an experiment known as Bernoulli's process.
• Continuous Random Variable: A random variable allowed to take any value within a specified range.
• Random Variable: A variable that takes different values as a result of the outcome of a random experiment.
• Normal Distribution: A distribution of a continuous random variable with a single-peaked, bell-shaped, symmetrical curve.
• Poisson Distribution: A discrete distribution in which the probability of occurrence of an event within a very small time period is a very small number, the probability that two or more such events will occur within the same time interval is effectively zero, and the probability of occurrence of the event within one time period is independent of where that time period is.

8.10 Learning Activity

1. Five fair coins were tossed 100 times. From the following outcomes calculate the expected frequencies.

No. of heads up:        0   1    2    3    4    5
Observed frequencies:   2   10   24   38   18   8

2. Fit a binomial distribution.

X:   0   1   2    3    4    5    6   7
Y:   7   6   19   35   30   23   7   1

8.11 Unit End Questions (MCQ and Descriptive)

A. Descriptive Type Questions

1. A manufacturing process turns out articles of which, on average, 10% are defective. Compute the probability of 0, 1, 2 and 3 defective articles that might occur in a sample of 3 articles.

2. 20% of the bolts produced by a machine are defective. Deduce the probability distribution of the number of defectives in a sample of 25 bolts chosen at random.
3. Eight coins are thrown simultaneously. Show that the probability of obtaining at least 6 heads is 37/256.
4. The average percentage of failures in a certain examination is 40. What is the probability that, out of a group of 6 candidates, at least 4 passed the examination?
5. The incidence of an occupational disease in an industry is such that the workmen have a 20% chance of suffering from it. What is the probability that, out of 6 workmen, 4 or more will contract the disease?
6. Suppose that half the population of a town are consumers of rice. 100 investigators are appointed to verify this. Each investigator interviews 10 individuals. How many investigators do you expect to report that three or fewer of the people interviewed are consumers of rice?
7. Four coins are tossed simultaneously. What is the probability of getting (i) 2 heads and 2 tails, (ii) at least two heads, (iii) at least one head?
8. An oil exploration firm finds that 5% of the test wells it drills yield a deposit of natural gas. If it drills 6 wells, find the probability that at least one well will yield gas. (Simplification is not necessary.)
9. An accountant is to audit 24 accounts of a firm. Sixteen of these are of high-valued customers. If the accountant selects 4 of the accounts at random, what is the probability that he chooses at least one high-valued account?
10. 12% of the items produced by a machine are defective. What is the probability that, out of a random sample of 20 items produced by the machine, 5 are defective? (Simplification is not necessary.)
11. The odds in favour of X winning a game against Y are 4 : 3. Find the probability of Y winning 3 games out of 7 played.
12. On an average, 2% of the population in an area suffers from TB. What is the probability that, out of 5 persons chosen at random from this area, at least two suffer from TB? (Simplification is not necessary.)

13. In a multiple-choice examination, there are 20 questions. Each question has four alternative answers following it and students must select the one correct answer. Four marks are given for a correct answer and one mark is deducted for every wrong answer. A student must secure at least 50% of the maximum possible marks to pass the examination. Suppose the student has not studied at all, so that he decides to select the answers to the questions on a random basis. What is the probability that he will pass the examination?
14. An anti-aircraft battery had 3 out of 5 successes in shooting down the aircraft that came within range. What is the chance that, if 8 aircraft came within range, not more than 2 successes were obtained?
15. A machine produces an average of 20% defective bolts. A batch is accepted if a sample of 5 bolts taken from that batch contains no defective and rejected if the sample contains 3 or more defectives. In other cases, a second sample is taken. What is the probability that the second sample is required?
16. The probability of a man hitting a target is 1/4. (i) If he fires 7 times, what is the probability P of his hitting the target at least twice? (ii) How many times must he fire so that the probability of his hitting the target at least once is greater than 2/3?
17. From past weather records, it has been found that, on an average, rain falls on 12 days in June. Find the probability that in a given week of June
(i) the first 4 days will be dry and the remaining 3 days wet;
(ii) there will be rain on alternate days;
(iii) exactly 3 days will be wet.
18. Assuming that it is true that 2 in 10 industrial accidents are due to fatigue, find the probability that
(i) exactly 2 of 8 industrial accidents will be due to fatigue;
(ii) at least 2 of 8 industrial accidents will be due to fatigue.
19. If hens of a certain breed lay eggs on 5 days a week on an average, find on how many days during a season of 100 days a poultry keeper with 5 hens of this breed can expect to receive at least 4 eggs.

20. Out of 1,000 families of 3 children each, how many families would you expect to have 2 boys and 1 girl, assuming that boys and girls are equally likely?
21. The following statement cannot be true. Why? "The mean of a binomial distribution is 4 and its standard deviation is 3."
22. If the mean of a binomial distribution is 3 and its variance is 3/2, find the probability of at least 4 successes.
23. The mean of a binomial distribution is 4 and its standard deviation is √3. What are the values of n, p and q with the usual notation?
24. A discrete random variable X has mean equal to 6 and variance equal to 2. If it is assumed that the underlying distribution of X is binomial, what is the probability that 5 ≤ X ≤ 7?
25. Five coins are tossed 3200 times. Find the frequencies of the distribution of heads and tails and tabulate the results. Calculate the mean number of successes and the standard deviation.
26. Five fair coins were tossed 100 times. From the following outcomes calculate the expected frequencies.

No. of heads up:        0   1    2    3    4    5
Observed frequencies:   2   10   24   38   18   8

27. Comment on the following: For a Poisson Distribution, mean = 8 and variance = 7.
28. If 5% of the electric bulbs manufactured by a company are defective, use the Poisson Distribution to find the probability that in a sample of 100 bulbs (i) none is defective; (ii) 5 bulbs will be defective. (Given: $e^{-5} = 0.007$)
29. Between the hours of 2 PM and 4 PM, the average number of phone calls per minute coming through the switchboard of a company is 2.35. Find the probability that during one particular minute there will be at least 2 phone calls. (Given: $e^{-2.35} = 0.095374$)
30. A manufacturer of blades knows that 5% of his products are defective. If he sells blades in boxes of 100 and guarantees that not more than 10 blades will be defective, what is the probability (approximately) that a box will fail to meet the guaranteed quality?

31. If a random variable X follows the Poisson Distribution such that P(X = 1) = P(X = 2), find (a) the mean of the distribution; (b) P(X = 0).
32. It is known from past experience that in a certain plant there are, on the average, 4 industrial accidents per month. Find the probability that in a given year there will be less than 4 accidents. Assume a Poisson Distribution. ($e^{-4} = 0.0183$)
33. Assuming that one in 100 births is a case of twins, calculate the probability of 3 or more sets of twins on a day when 40 births occur. Compare the results obtained by using (i) the Binomial distribution, and (ii) the Poisson approximation.
34. Write down the probability function of a Poisson Distribution whose mean is 2. What is its variance? Give 4 examples of a Poisson variable.
35. The standard deviation of a Poisson Distribution is 2. Find the probability that X = 3. (Given: $e^{-4} = 0.0183$)
36. Define a Poisson Distribution. If X is a Poisson variate with parameter 1, find P(3 < X < 5).
37. If 2% of the electric bulbs manufactured by a certain company are defective, find the probability that in a sample of 200 bulbs (i) less than 2 bulbs; (ii) more than 3 bulbs are defective. (Given: $e^{-4} = 0.0183$)
38. A manufacturer of pins knows that, on an average, 5% of his products are defective. He sells pins in boxes of 100 and guarantees that not more than 4 pins will be defective. What is the probability that a box will meet the guaranteed quality? (Given: $e^{-5} = 0.0067$)
39. Find the probability of at least 5 defective bolts being found in a box of 200 bolts if it is known that 2 per cent of such bolts are expected to be defective. (Assume a Poisson Distribution and $e^{-4} = 0.0183$.)
40. Using the Poisson approximation to the Binomial Distribution, solve the following problem: If the probability that an individual suffers a bad reaction from a particular injection is 0.001, determine the probability that out of 2000 individuals (i) exactly three; (ii) more than two individuals will suffer a bad reaction. (Given: $e^{2} = 7.4$)

41. In a certain factory turning out razor blades, there is a small chance, 1/500, for any blade to be defective. The blades are supplied in packets of 10. Use the Poisson distribution to calculate the approximate number of packets containing no defective, one defective and two defective blades respectively in a consignment of 10,000 packets.
42. The distribution of typing mistakes committed by a typist is given below. Assuming a Poisson model, find the expected frequencies.

Mistakes per page:   0     1     2    3    4   5
No. of pages:        142   156   69   27   5   1

43. Five hundred television sets are inspected as they come off the production line and the number of defects per set is recorded below. Estimate the average number of defects per set and the expected frequencies of 0, 1, 2, 3 and 4 defects, assuming a Poisson Distribution.

No. of defects (X):   0     1    2    3   4
No. of sets:          368   72   52   7   1

44. Suppose that the waist measurements W of 800 girls are normally distributed with mean 66 cm and standard deviation 5 cm. Find the number N of girls with waists (i) between 65 and 70 cm; (ii) greater than or equal to 72 cm.
45. Assume the mean height of soldiers to be 68.22 inches with a variance of 10.8 square inches. How many soldiers in a regiment of 1,000 would you expect to be (i) over six feet tall and (ii) below 5.5 feet? Assume heights to be normally distributed.
46. Compare the salient features of the Binomial and Normal Distributions. Give appropriate examples from real-life situations in business and management of the suitability of their applications.
47. Balances in savings accounts in a bank have an average of ₹ 1,200 and an SD of ₹ 400. Assuming that the account balances are normally distributed, estimate the number of accounts having balances between ₹ 1,000 and ₹ 1,500. (Given: the areas under the normal curve between Z = 0 and $Z = \frac{x - \bar{x}}{\sigma}$ are 0.1915 and 0.2734 for Z = 0.5 and Z = 0.75 respectively.)

48. A workshop produces 5,000 units per day. The average weight of the units is 140 kg with a standard deviation of 8 kg. The variable weight follows a normal distribution. Find the number of units weighing less than 145 kg.
49. What are the characteristics of the Binomial and Poisson Probability Distributions?
50. Assuming that the distribution is normal and if a student is selected at random, what is the probability that his IQ will be (i) above 165, (ii) between 140 and 155 and (iii) less than 145?
51. Describe the characteristics of the Normal Distribution. Explain how the t-distribution is different from the Normal Distribution.
52. Fit a binomial distribution.

X:   0   1   2    3    4    5    6   7
Y:   7   6   19   35   30   23   7   1

53. The arithmetic mean of purchases per day by a customer in a large store is ₹ 25 with a standard deviation of ₹ 10. If, on a particular day, 100 customers purchased for ₹ 37.80 or more, estimate the total number of customers who purchased from the store that day. Given that the normal area between t = 0 and t = 1.28 is 0.4000, where t is a standard normal variate.

B. Multiple Choice/Objective Type Questions

1. A variable that can assume any value between two given points is called ____________.
(a) Continuous random variable (b) Discrete random variable (c) Irregular random variable (d) Uncertain random variable
2. Discrete probability distribution depends on the properties of __________________.
(a) data (b) machine (c) discrete variable (d) probability function
3. If m is the mean of a Poisson distribution, then P(0) is given by __________________.
(a) $e^{m}$ (b) $e^{-m}$ (c) $e$ (d) $m^{-e}$

4. Variance of a binomial probability distribution is larger in value if __________________.
(a) q is greater than 0.5 (b) p and q are greater than 0.5 (c) p is greater than 0.5 (d) p and q are equal
5. The parameters of the normal distribution are ______________________.
(a) µ and σ² (b) np and nq (c) µ and σ (d) n and p

Answers
1. (a), 2. (a), 3. (b), 4. (d), 5. (c)

8.12 References

References of this unit have been given at the end of the book.

UNIT 9 SAMPLING AND SAMPLING DISTRIBUTIONS

Structure:
9.0 Learning Objectives
9.1 Introduction
9.2 Sampling Method
9.3 Parameters and Characteristics
9.4 Basic Sampling Concepts
9.5 Sampling Utility
9.6 Steps in Sample Survey
9.7 Errors in Sample Survey
9.8 Types of Sampling
9.9 Sampling Distributions
9.10 Solved Problems
9.11 Summary
9.12 Key Words/Abbreviations
9.13 Learning Activity
9.14 Unit End Questions (MCQ and Descriptive)
9.15 References

9.0 Learning Objectives

After studying this unit, you will be able to:
• Analyse the sampling method
• Explain the sampling utility
• Discuss the errors in sample survey
• Elaborate the probability sampling and non-probability sampling

9.1 Introduction

Gathering and analysing information about a given population is not easy, and may not even be feasible when the population is large. We therefore look for a method that can describe the characteristics of the total population based on the analysis of certain representative members of that population. This gives rise to the 'Theory of Sampling'. A sample is a small part or a representative section selected from a population. The process of such a selection is called 'Sampling'. Therefore, we can say that Sampling Theory is the study of the relationship existing between the population and the samples obtained from the population.

A population, being an aggregate of a very large number of members or a vast body of information that may not even be easy to collect, is often not feasible to study by complete enumeration. Complete enumeration may be cumbersome, time and effort consuming, or very costly, and may not serve the purpose because of the delay in collecting the information, even after heavy cost has been incurred. Information not collected in time may be worthless if decision making is affected by its non-availability. Even after a large amount of information has been collected, structuring and analysing it may take a very long time. Hence 'Sampling Theory' is a very handy tool in a day-to-day decision-making environment.

It therefore becomes essential to draw inferences about the total population from an analysis carried out on some of its members, about whom information can be collected easily and whose selection can itself be structured in a definite way, so that the sample analysis can be relied upon and applied to the total population. Samples also help us in determining the reliability of the estimates. This can be achieved by taking different samples from the same parent population and then comparing the results obtained from the different samples.
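The idea of comparing several samples from the same parent population can be illustrated with a small simulation. This is a sketch only: the population below is synthetic (normally distributed values with an assumed mean of 50 and standard deviation of 10), and the sample size of 100 and the count of 5 samples are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so that the run is repeatable

# Hypothetical parent population of 10,000 measurements.
population = [random.gauss(50, 10) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Draw five independent simple random samples (without replacement) of size 100
# and record the mean of each one.
sample_means = [
    statistics.mean(random.sample(population, 100)) for _ in range(5)
]

print("population mean:", round(pop_mean, 2))
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:", round(m, 2))

# The spread of the sample means indicates how reliable a single sample estimate is.
print("spread of sample means:", round(statistics.stdev(sample_means), 2))
```

If the sample means lie close to each other, an estimate based on any one of them can be treated as fairly reliable; a wide spread suggests that a larger sample is needed.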

9.2 Sampling Method

When we decide to undertake a survey of the entire population, such as the population of a country, data is collected from each and every member of every family across the country. We can also collect information about the salary structure of an organisation by obtaining data on all the employees of that organisation. This is the Population Data Collection Process. We can then use this data to analyse and obtain various characteristics of the population, such as the number of males and females in the country, the range of their ages, the number of persons in an organisation paid in a certain range of salary, the correlation of salary with educational background, or the relationship of salary to age group. The average and the range can be worked out from such vast data. In this process, coherent and correct information is made available about the total population. Thus, we can enumerate the advantages of the population survey as follows.
1. The relevant data for every unit or sub-group of the population can be compiled.
2. The analysis based on this data is very accurate and reliable.
3. The population data so obtained can be used in future studies as an accurate baseline against which changed characteristics can be compared.

Though there are distinct advantages of the population or Census Method, the time, effort and cost required to carry out such a survey are very large and may not be worthwhile every now and then. This method should be used sparingly for a large population, or else be reserved for smaller population groups. Sampling, therefore, should be resorted to, because it can be done faster and the results of the analysis are fairly reliable. Sampling involves the study of a small group of the target population sector or unit. The Sampling Method is most desirable in the following situations.
1. The population is very large (almost infinite) and a survey is either practically impossible to conduct or is undesirable from the cost-effectiveness angle.
2. When information or analysis is required at short notice for quick decision making, it is better to obtain quick results by a Sample Survey.
3. When we are trying to carry out 'destructive testing of a group of units', it is preferable to rely on sample testing rather than a study of all units.

4. When the cost of conducting a total population survey is prohibitive, it is better to rely on a sample survey.
5. The reliability of the Census or Population survey also cannot be guaranteed, due to the vastness of the data and its defective structuring. In such cases, a sample survey can be used with the same degree of usefulness.

Since the data volume in a sample survey is limited, it is possible to pay attention to all relevant elements, and more authentic results can be expected at a limited level of cost and effort. To augment reliability further, a number of sample studies can be compared, again spending limited time and money. Sampling is widely used as a tool in day-to-day activities such as tasting tea and coffee, tasting home food for salt or sugar, testing bulbs in the shop before buying, tasting grapes from the seller's basket, or testing students (on the basis of a 3-hour test) for the quality of their performance.

9.3 Parameters and Characteristics

When we conduct surveys, we aim at obtaining some useful information about the attributes of certain entities. The attributes selected for the study or survey are called characteristics, and the units possessing these characteristics are termed elementary units. We are normally concerned with certain measurable characteristics of these units, or with the number or proportion of such units marked by the presence or absence of some qualitative characteristic. The qualification of workers or employees in an organisation is treated as a qualitative characteristic, whereas the salary structure of a group of people is a quantitative characteristic. The sampling unit is an elementary unit, such as the organisation for which the salary structure is being studied. The sample is then defined as the aggregate of the sampling units actually chosen to obtain a representative subset from which inferences about the population can be drawn.

The statistical constants of the population, like the mean (µ), the variance (σ²), the skewness (β₁), the kurtosis (β₂), the moments (µᵣ), the correlation coefficient (r) etc., are known as parameters, and we can compute similar statistical constants for the sample drawn from the given population. Consider a finite population of N units with observations y1, y2, ..., yN on the population units, from which we select a sample of n units. If x1, x2, x3, ..., xn are the observations on the sample units, then

$$\mu = \frac{1}{N}(y_1 + y_2 + \cdots + y_N) = \frac{1}{N}\sum_{i=1}^{N} y_i$$

$$\sigma^{2} = \frac{1}{N}\left[(y_1 - \mu)^{2} + (y_2 - \mu)^{2} + \cdots\right] = \frac{1}{N}\sum_{i=1}^{N}(y_i - \mu)^{2}$$

From the sample observations,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad s^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^{2}$$

Sample statistics are functions of the sample observations, and we can write t = t(x1, x2, ..., xn). A statistic t = t(x1, x2, x3, ..., xn) is said to be an unbiased estimate of the population parameter θ if E(t) = θ, i.e. E(Statistic) = Parameter.

9.4 Basic Sampling Concepts

1. Sampling Population Relationship: If we draw a sample of size n from a given finite population of size N, then the total number of possible samples is

$${}^{N}C_{n} = \frac{N!}{n!\,(N - n)!} = k$$

We can now compute $\bar{x}$, $s^{2}$ etc. for each of these k samples. For a statistic t = t(x1, x2, ..., xn), with $t_i$ denoting its value in the i-th sample and $\bar{t}$ the mean of the $t_i$,

$$\operatorname{Var}(t) = \frac{1}{k}\sum_{i=1}^{k}(t_i - \bar{t})^{2}$$

2. Standard Error: The standard deviation of the sampling distribution of a statistic is known as its Standard Error. Thus

$$SE(t) = \sqrt{\operatorname{Var}(t)} = \sqrt{\frac{1}{k}\sum_{i=1}^{k}(t_i - \bar{t})^{2}}$$

This concept is extremely useful in testing a statistical hypothesis by means of

$$Z = \frac{t - E(t)}{SE(t)} \sim N(0, 1) \quad \text{if } n \text{ is large.}$$

If |Z| < 1.96, then |t – E(t)| < 1.96 SE(t), and the difference t – E(t) is not significant at the 5% level of significance. The difference is just the fluctuation of sampling and the data do not provide any evidence against the null hypothesis, which may, therefore, be accepted. However, if |Z| > 1.96, i.e. |t – E(t)| > 1.96 SE(t), then the difference is regarded as significant and the null hypothesis is rejected at the 5% level of significance. The reciprocal of the SE of a statistic gives a measure of the precision or the reliability of the estimate of the parameter.

3. Law of Statistical Regularity: In the words of L.R. Conner, "The law of statistical regularity lays down that a group of objects chosen at random from a higher group tends to possess the characteristic of that larger group". Thus, as the sample size increases, the sample is more likely to reveal the true characteristics of the population. The law also establishes that the sample should be selected at random from the population.

4. Principle of Inertia of Large Numbers: From the above rule, i.e. the "Law of Statistical Regularity", emerges the Principle of Inertia of Large Numbers, which can be stated as "other things being equal, as the sample size increases, the results tend to be more reliable and accurate".

5. Principle of Persistence of Small Numbers: If some of the items in a population possess characteristics distinct from those of the remaining items, this tendency will be revealed in the sample values also. In other words, the characteristic will persist in the sample observations, even in large samples.

6. Principle of Validity: A sample design is said to be valid if it yields valid results and estimates of the population parameters.

7. Principle of Optimisation: This stresses the necessity of obtaining optimum results, in terms of the efficiency and the cost of the sampling design, with the resources available.

9.5 Sampling Utility

As described earlier in the chapter, there are distinct advantages of the sampling method over the population enumeration method. These can be summarised as speed, economy, adaptability and scientific approach. Sampling also ensures great administrative convenience by avoiding the requirement of large resources. Though there are chances of large sampling errors, a carefully designed and scientifically executed sample survey can provide very reliable results. It also offers scope for better results, since a large number of samples can be executed and their results compared to establish reliability.

However, the sample method can be made more useful by:
1. Drawing the sample in a scientific manner.
2. Using an appropriate sampling design.
3. Ensuring an adequate (fairly large) sample size.

In the words of Frederick F. Stephen, "Samples are like medicines. They can be harmful when they are taken carelessly or without the knowledge of their effects. Every good sample should have a proper label with instructions about its use".

9.6 Steps in Sample Survey

For the reasons stated in Section 9.5 above, the planning and execution of a sample survey should follow these structured steps:

1. Objectives and Scope of the Survey: The survey should have specific, clear and concrete objectives and scope, so that collection of irrelevant data is avoided and wastage of resources is eliminated.

2. Defining the Population to be Sampled: With the aim being clear, the target population for the relevant statistic should be well defined, so that unnecessary population units can be left out of the survey.

3. The Frame and Sampling Units: The population should be capable of division into small sampling units, so that enumeration is easy and can proceed simultaneously. A sampling unit should be specific, unambiguous, stable and appropriate for the requirement. The list or other acceptable material that serves as a guide for information collection is called a Frame. A good, updated frame helps in collecting quality data and hence in achieving good results from the sample survey.

4. Data Collection: Once the aim is known and the frame is updated, the data to be collected can be specified. This helps in getting all the relevant information within the cost and time estimates of the survey.

5. Schedule: Data can be collected through a structured schedule or a questionnaire, which is completed in writing by the respondents. Care should be taken to frame the questionnaire clearly, to the point and without ambiguity, keeping in view the knowledge, understanding and general level of the respondents.

6. Collection of Information: The relevant information, whether as a structured data set or a questionnaire, can be collected by direct personal interviews or by the mail questionnaire method. The choice between the two depends on the time, cost, urgency or ambiguity of the survey. Non-response can distort the information collected and must be guarded against.

7. Selection of Sampling Design: The sampling plan should be decided before execution of the sample survey. Whether to use Simple Random Sampling, Stratified Random Sampling, Systematic Sampling etc. should be decided in advance, based on the object of the survey, the nature of the population, the cost involved and the time available.

8. Field Work: For reliable results, the data collected should be reliable, and hence errors in data collection should be minimised. For this purpose, the field work should be properly designed, organised and monitored through proper selection and training of the field workers. Adequate and frequent supervision of the field work is very desirable.

9. Pilot Survey: A pre-test or guide survey should first be conducted on a miniature scale to establish the usefulness of the survey, the soundness of the method, the level of the questionnaire, and the training and knowledge of the field staff. It improves the quality of the final results.

10. Summary and Analysis: Once the planning and execution of the survey have been completed, the collected data must be analysed. The analysis involves the following:
(a) Scrutiny and editing of the data
(b) Proper tabulation of data as per parameters
(c) Statistical analysis
(d) Reports, summary, conclusions and recommendations.

For this purpose, appropriate statistical techniques must be designed and used so as to minimise errors at every stage. The report should be concise, logical and clear cut so that its usefulness can be established.

9.7 Errors in Sample Survey

Depending on the type of population, the sampling technique, the competence of the surveyor, and the availability and structure of the data, inaccuracies may arise in any statistical investigation, during any phase such as collection of information, processing, analysis or interpretation of the data. These inaccuracies are called 'Errors' and are of two types: (a) Sampling errors and (b) Non-sampling errors.

1. Sampling Errors: While studying the characteristics of a population, we take only a small portion of the data in the form of a sample. The sample may not truly represent the basic structure of the population, so the results differ from the population results. This is called Sampling Error. Even if the sample is highly random and representative of the population, some error is natural; it is attributed to the fluctuations of the sample data. Thus sampling error will be present in the results of any sample survey, but not in the census method. These errors may creep in through faulty selection of the sample or through deliberate substitution of units in the sample. Even the choice of analysis method may lead to different results with built-in sampling errors. The errors reduce as we increase the sample size, because the sample data then tend towards the population. This is demonstrated in Fig. 9.1.

Fig. 9.1. Sampling Errors (sampling error plotted against sample size)

2. Non-Sampling Errors: These errors are a consequence of factors that can be controlled by human intervention, such as proper selection, planning and careful execution of the sample survey. They are due to assignable causes, which can be traced and controlled. They will obviously be present in a sample survey as well as in a census survey, and in fact more so in a census survey because of the large volume of data. The possible causes include faulty planning (aim of the survey not very clear), an imperfect questionnaire, unplanned selection of interviewers, and incoherent answers by the respondents, whether due to self-interest, inadequate knowledge, or even ego or prestige problems. A similar bias may be introduced by the investigator himself, due to self-interest, poor training, non-response or improper coverage (to suit his convenience). There may also be publication errors arising out of poor printing or proof-reading.

These errors, in a sample survey or a census survey, may be biased or unbiased. Bias may be introduced by the intention of the investigator or by faulty instrumentation, by poor or faulty response of the respondents, or by the processing techniques. Unbiased errors can creep in during an investigation based on small samples, but are minimised when the sample size is increased. They do not grow with the number of observations, but tend to reduce in the final analysis. Unbiased errors are thus inversely proportional to the number of observations.

9.8 Types of Sampling

To achieve correct results from a sample survey, the execution of the sample design is of utmost importance, and hence proper selection of the sampling method becomes imperative. Sampling techniques can be broadly classified into two categories, probability and non-probability sampling. Some methods under these two headings are listed below:

1. Probability Sampling:
(a) Simple Random Sampling
(b) Stratified Sampling
(c) Cluster Sampling
(d) Multi-stage Sampling
(e) Area Sampling
(f) Multi-phase Sampling
(g) Systematic Sampling

2. Non-Probability Sampling:
(a) Convenience Sampling
(b) Haphazard Sampling

(c) Purposive Sampling
(d) Expert Sampling
(e) Heterogeneity Sampling
(f) Modal Instance Sampling
(g) Quota Sampling
(h) Snowball Sampling

Probability Sampling

Probability sampling is the scientific technique of drawing a sample from the population by applying probability methods, wherein each unit of the population has some predefined probability of being included in the sample. The samples are therefore selected in the following manner:
1. Each unit is drawn on the basis of randomness.
2. Each unit has the same chance of being selected.
3. Probability of selection of a unit is proportional to the sample size.

Thus the samples are drawn by a random procedure and not by any judgemental method. This attaches an objective measure of precision to the results achieved by such samples. These sampling techniques are described below:

(a) Simple Random Sampling: The sampling design in this case is based on probability laws. In random sampling, the elements of the sample are drawn at random and the choice of each element follows the same probability law. Thus every element has the same chance of being drawn as the others. If we have a population of N elements, we can select a sample of n elements from it (where n is fairly large); the number of possible samples of n elements is $^{N}C_{n}$, each with the same probability of selection. The basic aim is to achieve randomness in drawing the elements of a sample, so that all possible samples have the same chance of being selected. We can use either the lottery system or the Random Number Table system, in both cases either with or without replacement of the drawn number.

In the lottery system, all the elements of the population are given identical identification, say slips of paper of the same type and size with an element number written on each. After folding the slips in the same manner and mixing them thoroughly, we can draw slips at random without any bias, either from a container or blindly one at a time. Thus, if we have 200 students and have to nominate only 10 to the students' welfare council, we can write the 200 numbers or names on slips, place them in a container and draw any 10; the council is then constituted in a very fair manner.

When N is very large, this method becomes cumbersome and difficult to manage. In that case, we use Random Number Tables (tables attached at the end of the book), in which the numbers have already been arranged in random order. We can select numbers by reading either along the rows or down the columns. Random Number Tables in use include:
(i) Tippett (1927): 10,400 sets of four-digit random numbers.
(ii) Fisher and Yates (1938): table of random numbers with 1500 sets of ten-digit random numbers.
(iii) Kendall and Babington Smith (1939): 100,000 digits grouped into 25,000 sets of four-digit random numbers.
(iv) Rand Corporation (1955): table of 2,00,000 sets of five-digit random numbers.
(v) Table of Random Numbers (ISI Series, Calcutta) by C.R. Rao, Mitra and Mathai.

(b) Stratified Sampling: In Simple Random Sampling, the population is treated as one homogeneous group and the individual elements are drawn from the whole universe or population. In some situations, we can classify the population into distinct classes of elements, such as men and women (out of the total population). When we draw a random sample from each of these classified groups, we call the samples Stratified Samples. Thus, if we divide the seats on the welfare council, say 5 for men and 5 for women, we draw random samples from two different groups (men and women). There is then no possibility of drawing more than 5 elements from the men's group, or from the women's group, so the council cannot have more than 5 men or more than 5 women. Such a sample has a distinct advantage in specific situations. Stratification is thus an effective sampling tool that creates samples from homogeneous classes rather than from the population as a whole; homogeneity is ensured only among the elements within a class. In this case it gives a more representative sample for the study.
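The lottery draw and the stratified draw for the welfare council can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the roll numbers, the split of 120 men and 80 women, and the function names `simple_random_sample` and `stratified_sample` are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Lottery-style draw: n units without replacement, equal chance for each."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, quota, seed=None):
    """Draw a simple random sample of quota[name] units from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, quota[name]) for name, units in strata.items()}

# Hypothetical roll of 200 students; numbers 1-120 are men, 121-200 are women
students = list(range(1, 201))
council = simple_random_sample(students, 10, seed=1)

strata = {"men": list(range(1, 121)), "women": list(range(121, 201))}
seats = stratified_sample(strata, {"men": 5, "women": 5}, seed=1)

print("Simple random council:", sorted(council))
print("Stratified council:", {k: sorted(v) for k, v in seats.items()})
```

The simple random draw can return any mix of men and women, whereas the stratified draw always returns exactly five from each group, which is the property described above.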

(c) Cluster Sampling: Cluster sampling consists of forming suitable groups or clusters and then collecting the relevant information on all the elements in a sample of clusters, as per the appropriate sampling design. The advantage of cluster sampling is the cost involved, because collecting data from nearby elements is easier, cheaper, faster and more convenient than observing units scattered over a wide region. To illustrate the concept, we can divide a town into various blocks and then use the Random Sampling Technique for each block to collect information about income groups. For similarly placed people in a block, this type of sampling will be more useful.

(d) Multi-stage Sampling: As described above, the cluster sampling technique is cheaper under certain circumstances, but it is less efficient than individual sampling. As a combination, we can use Multistage Sampling, in which we select a sample of clusters and then study only a sample of units in each cluster. This is called Two-stage Sampling. The concept can be extended to Multistage Sampling, where the sampling units at each stage are clusters of the units of the next stage and the observations are selected in stages, sampling at each stage being done from each of the sampling units. This method is more efficient than the direct sampling and less efficient than the cluster sampling. The diagrammatic representation is given in Fig. 9.2 below:

Fig. 9.2 Multistage sampling units (clusters of units at the first and second stages, elementary units at the third stage)

It may be noted that the sampling process at each stage may be either random or stratified. Multistage sampling is more flexible than other methods of sampling.
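The two-stage idea described above (first sample the clusters, then sample units inside each chosen cluster) can be sketched as follows. The town of six blocks with twenty households each, and the function name `two_stage_sample`, are hypothetical and used only for illustration; simple random sampling is assumed at both stages.

```python
import random

def two_stage_sample(clusters, n_clusters, n_units, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)      # first stage
    return {
        name: rng.sample(clusters[name], min(n_units, len(clusters[name])))
        for name in chosen                                   # second stage
    }

# Hypothetical town: 6 blocks, 20 households in each block
town = {
    f"block_{b}": [f"b{b}_house_{h}" for h in range(1, 21)]
    for b in range(1, 7)
}

sample = two_stage_sample(town, n_clusters=3, n_units=4, seed=7)
for block, households in sample.items():
    print(block, households)
```

Setting `n_units` to the full block size would reduce this to plain cluster sampling, since every household in each selected block would then be observed.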
(e) Area Sampling: When the cluster sampling concept is applied to the elementary units of the population in a particular geographical area, it is called 'Area Sampling'. In this case, we can study the community behaviour index of a particular community living in a particular locality or part of the country, such as the Tamil population in Tamil Nadu, Jats in Haryana, Brahmins in U.P. or Gujjars in Rajasthan or J & K etc. The selection of the sample in each area should, however, be random among the enumerated elements. Thus enumeration of elements is necessary only in the limited number of selected areas.

(f) Multi-phase Sampling: This type of sampling is adopted when sampling units of the same type are the objects of different phases of observation. All the units of a phase are studied with respect to the same characteristics. For example, if we collect information about the capital employed from 300 companies and then collect additional information about the source of financing from 100 companies, the sampling is treated as Two-phase Sampling. The concept can be extended to Multiphase Sampling. Information collected during one phase is then used in the second or subsequent phases.

(g) Systematic Sampling: This very simple form of sampling, in both design and execution, is used when the members of a population are arranged in an order, so that consecutive members can be numbered serially. The first sample unit is selected at random and the remaining units are selected automatically in a definite sequence, at equal spacing from one another. For example, if we want to select 50 candidates for a call centre job out of 1,000 names arranged systematically, the sampling interval is K = N/n = 1,000/50 = 20. We select a number at random from the first 20 (say 4); the selected candidates are then those with serial numbers 4, 24, 44, 64 etc., giving 50 candidates for the first set of interviews.

Non-Probability Sampling

Non-probability sampling is a sampling technique where the odds of any member being selected for a sample cannot be calculated. In addition, probability sampling involves random selection, while non-probability sampling does not; it relies on the subjective judgement of the researcher.

Types of Non-Probability Sampling

(a) Convenience Sampling: As the name suggests, this involves collecting a sample from somewhere convenient to you: the mall, your local school, your church. It is sometimes called accidental sampling, opportunity sampling or grab sampling.

(b) Haphazard Sampling: The researcher chooses items haphazardly, trying to simulate randomness. However, the result may not be random at all and is often tainted by selection bias.

(c) Purposive Sampling: The researcher chooses a sample based on their knowledge of the population and of the study itself. The study participants are chosen based on the study's purpose. There are several types of purposive sampling.

(d) Expert Sampling: In this method, the researcher draws the sample from a list of experts in the field.

(e) Heterogeneity Sampling / Diversity Sampling: A type of sampling where you deliberately choose members so that all views are represented. However, those views may or may not be represented proportionally.

(f) Modal Instance Sampling: The most "typical" members are chosen from a set.

(g) Quota Sampling: The groups (e.g. men and women) in the sample are proportional to the groups in the population.

(h) Snowball Sampling: Research participants recruit other members for the study. This method is particularly useful when participants might be hard to find, for example in a study on working prostitutes or current heroin users.

9.9 Sampling Distributions

We have discussed the laws of large numbers and also the central limit theorem. We now use these concepts to develop the sampling distribution. Sampling distribution means that we take samples from the given population and study the way the various statistics of these samples are distributed. The mean and the standard deviation of the sampling distribution of the mean are written as $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ respectively. Sampling distributions constitute the basis of statistical inference and play an important role in the decision-making process. If we take a number of samples of equal size from the population, the probability distribution of all the possible values of a given statistic from all the possible samples of equal size is termed its sampling distribution.

If $x_1, x_2, \dots, x_n$ are independent random observations in a sample of size n from a population with mean $\mu$, then

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

and

$$\mu_{\bar{x}} = E(\bar{x}) = E\left[\frac{x_1 + x_2 + \dots + x_n}{n}\right] = \frac{1}{n}\left[E(x_1) + E(x_2) + \dots + E(x_n)\right] = \frac{1}{n}\left[\mu + \mu + \dots + \mu\right] = \frac{1}{n}\cdot n\mu = \mu$$

Fig. 9.3. Sampling Distribution of Sample Means

Thus the mean of the sampling distribution of sample means is the same as the mean of the population. From Fig. 9.3, we can observe that sampling distributions closely approximate a normal distribution. The mean of the sampling distribution is given the same symbol as the mean of the population, i.e. $\mu$, but the standard deviation of the sampling distribution is called the standard error of the mean and is denoted by $\sigma_{\bar{x}}$, indicating that it is for the sampling distribution of means. The relation of the standard error to the standard deviation of the population is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

This relationship is valid only when the population is infinite or the samples are chosen from a finite population with replacement. It follows from

$$\sigma_{\bar{x}}^2 = \mathrm{Var}(\bar{x}) = \mathrm{Var}\left[\frac{x_1 + x_2 + \dots + x_n}{n}\right] = \frac{1}{n^2}\left[\mathrm{Var}(x_1) + \mathrm{Var}(x_2) + \dots + \mathrm{Var}(x_n)\right] = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}$$

$$\Rightarrow \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Before embarking on the sampling distribution, it is necessary to make assumptions about the population parameter. We can use any value for a parameter, depending on the nature of the population, but there is no theoretical limit to the number of sampling distributions, for the same sample size, that can be drawn from the given population. The distribution of one statistic may differ from that of another statistic. Thus the shape of the distribution of $\bar{x}$ will differ from that of s, even though both statistics are computed from the same sample.

It is interesting to notice that the mean of the sampling distribution is the same as the mean of the population. Also, the observed standard deviation of a sample is close to the standard deviation of the population values. However, the standard deviation of the sample is calculated as

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

and not

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

Due to the smaller denominator, the first formula gives a slightly larger value of the standard deviation. Thus the estimated standard deviation of the population is slightly larger than the observed standard deviation of the sample.

Sampling Distribution of the Mean

Fig. 9.4. Relationship of population and sampling distribution of mean
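The two results derived in this section, that the mean of the sample means equals the population mean and that the standard error equals $\sigma/\sqrt{n}$ when sampling with replacement, can be checked by listing every possible sample from a tiny population. The population `[2, 4, 6, 8]` and the sample `[2, 4, 8]` below are made-up values chosen only to keep the enumeration small.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev, stdev

population = [2, 4, 6, 8]          # hypothetical population, N = 4
n = 2                              # sample size

mu = mean(population)              # population mean
sigma = pstdev(population)         # population SD (divisor N)

# Every ordered sample of size n drawn WITH replacement: N**n = 16 samples
sample_means = [mean(s) for s in product(population, repeat=n)]

mean_of_means = mean(sample_means)     # should equal mu
se_observed = pstdev(sample_means)     # SD of the sampling distribution
se_formula = sigma / sqrt(n)           # sigma / sqrt(n)

print("mu =", mu, " mean of sample means =", mean_of_means)
print("SE observed =", round(se_observed, 4), " sigma/sqrt(n) =", round(se_formula, 4))

# Divisor n - 1 versus divisor n for one sample
sample = [2, 4, 8]
print("s with n-1:", round(stdev(sample), 4), " s with n:", round(pstdev(sample), 4))
```

In the standard library, `statistics.stdev` uses the divisor n – 1 and `statistics.pstdev` uses the divisor n, so the last line shows the slightly larger estimate produced by the n – 1 form.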

If the population distribution is normal, the sampling distribution of the mean ($\bar{x}$) is also normal for all sample sizes. Important properties of the sampling distribution of the mean are:
(a) Its mean is equal to the population mean: $\mu_{\bar{x}} = \mu$
(b) Its standard deviation is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
(c) It is normally distributed.

In practice, the population standard deviation may not be easily available, and hence the standard deviation of the sample is used in its place:

$$\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$$

where s = standard deviation of the sample.

Normally, for all survey and research work, the samples are chosen without replacement, which is contrary to the assumption of the central limit theorem. Hence a correction factor has to be applied to cater for the proportion of observations not included in the sample. If N is the size of the finite population, we apply the correction factor $\dfrac{N - n}{N - 1}$. When N is large, this can be approximated by $1 - \dfrac{n}{N} = \dfrac{N - n}{N}$. Therefore, when sampling from a finite population without replacement, the sampling distribution of the sample mean has mean $\mu_{\bar{x}} = \mu$ and standard error

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$

Hence

$$Z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}}$$

Distribution of Sample Medians

If the population is large and can be approximated by the normal distribution with mean $\mu$ and standard deviation $\sigma$, the medians of random samples of size n are distributed with mean $\mu$ and standard deviation $1.2533\,\dfrac{\sigma}{\sqrt{n}}$. If the value of n is large, this distribution is nearly normal. Thus

$$\sigma_{\mathrm{Med}} = 1.2533\,\frac{\sigma}{\sqrt{n}}$$
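The standard error formulas above, with and without the finite population correction, can be written as small helper functions. The function names and the illustrative figures ($\sigma$ = 10, n = 100, N = 1,000, sample mean 52 against a population mean of 50) are assumptions for this sketch and do not come from the text.

```python
from math import sqrt

def standard_error_mean(sigma, n, N=None):
    """sigma/sqrt(n); multiplied by sqrt((N-n)/(N-1)) when a finite N is given."""
    se = sigma / sqrt(n)
    if N is not None:                       # sampling without replacement
        se *= sqrt((N - n) / (N - 1))
    return se

def standard_error_median(sigma, n):
    """Large-sample standard error of the median: 1.2533 * sigma / sqrt(n)."""
    return 1.2533 * sigma / sqrt(n)

def z_score(x_bar, mu, sigma, n, N=None):
    """Standardised sample mean, using the corrected SE if N is supplied."""
    return (x_bar - mu) / standard_error_mean(sigma, n, N)

# Illustrative (made-up) figures
se_infinite = standard_error_mean(10, 100)            # no correction
se_finite = standard_error_mean(10, 100, N=1000)      # with correction
print(round(se_infinite, 4), round(se_finite, 4))
print(round(standard_error_median(10, 100), 4))
print(round(z_score(52, 50, 10, 100, N=1000), 3))
```

The corrected standard error is always smaller than the uncorrected one, and it falls to zero when n = N, since a sample covering the whole population leaves no sampling fluctuation.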

Sampling Distribution of the Differences between Two Means

If we are analysing two populations of sizes $N_1$ and $N_2$, their respective parameters can be denoted as $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$. The two populations can be compared on the basis of two independent random samples, of size $n_1$ drawn from the first population and of size $n_2$ from the second. We denote the means of these two samples as $\bar{x}_1$ and $\bar{x}_2$, and we now determine the properties of the sampling distribution of $\bar{x}_1 - \bar{x}_2$.

Important properties of the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ are:

(a) $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$

(b) $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

(c) If $\bar{x}_1$ and $\bar{x}_2$ are means of independent samples drawn from two large populations, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ will be normal if the samples are fairly large.

Sampling Distribution of the Number of Successes

If a random sample of size n is chosen from a population whose elements belong to two mutually exclusive categories, then the sampling distribution of the number of successes is binomial if the sampling is done with replacement. If sampling is done without replacement, the distribution is hypergeometric. For the binomial probability model, $\mu = np$ and $\sigma = \sqrt{npq}$.

Sampling Distribution of Proportions

A population proportion is defined as

$$\pi = \frac{X}{N}$$

where X = number of elements in the population that possess the characteristic of interest
N = total number of items in the population

Similarly, a sample proportion can be written as

$$p = \frac{x}{n}$$

If a random sample of size n is obtained with replacement, the sampling distribution of p obeys the binomial probability law. Then

$$\mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

For large n (n > 30), this is closely approximated by the normal distribution. When the sampling is made without replacement, we apply the correction factor:

$$\sigma_p = \sqrt{\frac{N - n}{N - 1}}\,\sqrt{\frac{\pi(1 - \pi)}{n}}$$

Sampling Distribution of the Differences of Two Proportions

As in the case of the difference of means of samples from two populations, we can obtain similar results for the difference of proportions from two binomially distributed populations with parameters $\pi_1$ and $\pi_2$ respectively, when random samples of sizes $n_1$ and $n_2$ are drawn from the respective populations. Then

$$\mu_{p_1 - p_2} = \mu_{p_1} - \mu_{p_2} = \pi_1 - \pi_2 \quad \text{and} \quad \sigma_{p_1 - p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2}$$

If $n_1$ and $n_2$ are large, i.e. $n_1, n_2 > 30$, the sampling distribution of the difference of proportions can be considered close to the normal distribution.

9.10 Solved Problems

Problem 1

A random sample of 700 units from a large consignment showed that 200 were damaged. Find (i) 95%, (ii) 99% confidence limits for the proportion of damaged units in the consignment.

Solution:

Given n = 700,

$$p = \frac{200}{700} = \frac{2}{7}, \qquad q = 1 - p = \frac{5}{7}$$

$$SE(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{2}{7}\times\frac{5}{7}\times\frac{1}{700}} = 0.017$$

(i) The 95% confidence limits for P are

$$p \pm 1.96\sqrt{\frac{pq}{n}} = 0.286 \pm 1.96 \times 0.017 = (0.253,\ 0.319)$$

(ii) The 99% confidence limits for P are

$$p \pm 2.58\sqrt{\frac{pq}{n}} = 0.286 \pm 2.58 \times 0.017 = (0.242,\ 0.330)$$

Problem 2

A research worker wishes to estimate the mean of a population by using a sufficiently large sample. The probability is 95% that the sample mean will not differ from the true mean by more than 25% of the standard deviation. How large a sample should he obtain?

Solution:

Given

$$P\left(|\bar{x} - \mu| < 25\%\ \text{of SD}\right) = 0.95 \quad \text{or} \quad P\left(|\bar{x} - \mu| < \frac{\sigma}{4}\right) = 0.95$$

For a large sample,

$$P\left(|\bar{x} - \mu| < 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

Hence

$$\frac{\sigma}{4} = 1.96\,\frac{\sigma}{\sqrt{n}} \quad \Rightarrow \quad n = (4 \times 1.96)^2 = 61.47 \approx 62$$
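Both solved problems can be re-computed with a few lines of Python. This is only a check of the arithmetic under the same normal approximation; the function names `proportion_limits` and `sample_size_for_margin` are introduced here for illustration.

```python
from math import ceil, sqrt

def proportion_limits(x, n, z):
    """Confidence limits p -/+ z*sqrt(pq/n) for a sample proportion p = x/n."""
    p = x / n
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def sample_size_for_margin(z, k):
    """Smallest n such that z*sigma/sqrt(n) <= k*sigma, i.e. n >= (z/k)**2."""
    return ceil((z / k) ** 2)

# Problem 1: 200 damaged units in a sample of 700
low95, high95 = proportion_limits(200, 700, 1.96)
low99, high99 = proportion_limits(200, 700, 2.58)
print("95% limits:", round(low95, 3), round(high95, 3))
print("99% limits:", round(low99, 3), round(high99, 3))

# Problem 2: margin of 25% of the standard deviation at 95% confidence
n_required = sample_size_for_margin(1.96, 0.25)
print("Required sample size:", n_required)
```

Because the code keeps p and SE(p) unrounded, the lower 95% limit comes out at about 0.252 rather than the 0.253 obtained in the worked solution from the rounded values 0.286 and 0.017; the other limits and the sample size of 62 agree.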

9.11 Summary

The unit is summarised by some of its important points as below:
• Parameters: Values that indicate the characteristics of a population.
• Random or probability sampling: A method of selecting a sample from a population in which all the items in the population have an equal chance of being included in the sample.
• Sample: A part of the elements or observations in a population, selected for examination or study of the population.
• Sampling distribution of the mean: A probability distribution of all the possible sample means.
• Sampling error: Variation among sample statistics due to chance.
• Simple random sampling: A method of selecting samples that allows each possible sample an equal opportunity of being chosen and each element in the population an equal chance of being included in the sample.
• Standard error: The standard deviation of the sampling distribution of a statistic.
• Standard error of the mean: The standard deviation of the sampling distribution of the mean.

9.12 Key Words/Abbreviations

• Sample: A part of the elements or observations in a population, selected for examination or study of the population.
• Sampling distribution of the mean: A probability distribution of all the possible sample means.
• Sampling error: Variation among sample statistics due to chance.

9.13 Learning Activity

1. Distinguish between simple random sampling and purposive sampling. Describe a procedure for drawing a random sample of size 5 from a population of 17 (with replacement method).

2. Three sampling plans to determine the quality of a manufactured product are given below:
(i) Inspect every 10th item.
(ii) Inspect one item every 10 minutes.
(iii) Inspect a random sample of 6 during each hour's production.
State the sampling design in each case. Which one is the most appropriate? Give reasons.

9.14 Unit End Questions (MCQ and Descriptive)

A. Descriptive Type Questions

1. What is the difference between Statistic and Parameter as used in Sampling Theory? What is the Sampling Distribution of a statistic? Explain it by taking a particular statistic.

2. What are the main objectives of sampling? Compare and contrast the merits and drawbacks of sample and census studies.

3. What is a statistical error? Explain the difference between a statistical error and a 'mistake'. Describe the various measures of statistical errors.

4. What are statistical errors? What are the sources of errors? Explain the methods of measuring them.

5. Bring out the important features of (i) Systematic Sampling (ii) Stratified Sampling.

6. Distinguish between random sampling and stratified sampling. If it is desired to survey the petrol buying habits of car owners in a particular city, how would you proceed? Draw up a brief questionnaire for the purpose.

B. Multiple Choice/Objective Type Questions

1. Sampling error increases as we increase the sample size.
(a) True
(b) False

2. In which of the following types of sampling is the selection carried out on the opinion of an expert?
(a) quota sampling
(b) convenience sampling
(c) purposive sampling
(d) judgement sampling

3. The sampling error is defined as __________.
(a) difference between population and parameter
(b) difference between sample and parameter
(c) difference between population and sample
(d) difference between parameter and sample

4. What does the central limit theorem state?
(a) if the sample size increases, the sampling distribution must approach the normal distribution
(b) if the sample size decreases, the sampling distribution must approach the normal distribution
(c) if the sample size increases, the sampling distribution must approach an exponential distribution
(d) if the sample size decreases, the sampling distribution must approach an exponential distribution

5. Selection of a cricket team for the cricket World Cup is called __________.
(a) random sampling
(b) systematic sampling
(c) purposive sampling
(d) cluster sampling

Answers
1. (b), 2. (d), 3. (c), 4. (a), 5. (c)

9.15 References

References of this unit have been given at the end of the book.

UNIT 10 HYPOTHESIS TESTING PART - I

Structure:
10.0 Learning Objectives
10.1 Introduction
10.2 Types of Hypothesis
10.3 Testing of Hypothesis
10.4 Steps Involved in Hypothesis Testing
10.5 Types of Errors in Testing of Hypothesis
10.6 Power of the Test
10.7 Large Sample Test
10.8 Solved Problems
10.9 Summary
10.10 Key Words/Abbreviations
10.11 Learning Activity
10.12 Unit End Questions (MCQ and Descriptive)
10.13 References

10.0 Learning Objectives

After studying this unit, you will be able to:
• Define the concept of hypothesis testing.
• Discuss the steps involved in hypothesis testing.
• Illustrate the large sample tests for hypothesis.

• Explain the use of these hypotheses for business problems.
• Analyse the capability judgement through self-assessment problems.

10.1 Introduction
So far we have studied how to infer population characteristics from samples drawn in a scientific manner, because the census method is time consuming, costly, and at times impractical. These concepts have already been discussed in the chapter on ‘Sampling Theory’. Samples are expected to give results close to those of the population, provided they are drawn so as to be representative of the population. These results can be generalised if we know how far the conditions for generalisation are valid. We can then estimate the population parameter with a stated degree of confidence. The techniques used for this purpose are dealt with in statistics under “Statistical Inference”, which is classified into two main categories:
1. Theory of Estimation,
2. Testing of Hypothesis.

10.2 Types of Hypothesis
First, we must take a moment to define independent and dependent variables. Simply put, an independent variable is the cause and the dependent variable is the effect. The independent variable can be changed, whereas the dependent variable is what you are watching for change. For example: How does the amount of makeup one applies affect how clear their skin is? Here, the independent variable is the amount of makeup and the dependent variable is the clarity of the skin.

The seven most common forms of hypotheses are:
• Simple Hypothesis
• Complex Hypothesis
• Empirical Hypothesis
• Null Hypothesis (denoted by “H0”)
• Alternative Hypothesis (denoted by “H1”)
• Logical Hypothesis
• Statistical Hypothesis

A simple hypothesis is a prediction of the relationship between two variables: the independent variable and the dependent variable.
• Drinking sugary drinks daily leads to obesity.

A complex hypothesis examines the relationship between two or more independent variables and two or more dependent variables.
• Overweight adults who 1) value longevity and 2) seek happiness are more likely than other adults to 1) lose their excess weight and 2) feel a more regular sense of joy.

A null hypothesis (H0) exists when a researcher believes there is no relationship between the two variables, or there is a lack of information to state a scientific hypothesis. This is something to attempt to disprove or discredit.
• There is no significant change in my health during the times when I drink green tea only or root beer only.

This is where the alternative hypothesis (H1) enters the scene. In an attempt to disprove a null hypothesis, researchers will seek to discover an alternative hypothesis.
• My health improves during the times when I drink green tea only, as opposed to root beer only.

A logical hypothesis is a proposed explanation possessing limited evidence. Generally, you want to turn a logical hypothesis into an empirical hypothesis, putting your theories or postulations to the test.
• Cacti experience more successful growth rates than tulips on Mars. (Until we are able to test plant growth in Martian soil for an extended period of time, the evidence for this claim will be limited and the hypothesis will remain only logical.)

An empirical hypothesis, or working hypothesis, comes to life when a theory is being put to the test, using observation and experiment. It is no longer just an idea or notion. It is actually going through some trial and error, and perhaps changing around those independent variables.
• Roses watered with liquid Vitamin B grow faster than roses watered with liquid Vitamin E.
(Here, trial and error is leading to a series of findings.) A statistical hypothesis is an examination of a portion of a population.

• If you wanted to conduct a study on the life expectancy of Savannians, you would want to examine every single resident of Savannah. This is not practical. Therefore, you would conduct your research using a statistical hypothesis, or a sample of the Savannian population.

10.3 Testing of Hypothesis
We have already discussed that inductive inference can be used for deciding the characteristics of a population based on a sample study. The inherent risks in such decision-making processes may give rise to serious business repercussions. In making such decisions, we normally streamline the analysis by making some assumptions under which the uncertainty of the decisions can be controlled. To reduce risks, we check whether all these assumptions hold good during the pendency of the decision. For this purpose, the theory of probability plays a very prominent role, and the statistical theory used is called “Testing of Hypothesis”.

The theory of testing of hypothesis was used by J. Neyman and E.S. Pearson to arrive at decisions under uncertain circumstances, based on samples of fixed size. For cases where the sample size is not fixed, another technique known as “Sequential Testing” was advocated by Abraham Wald. We are discussing only Testing of Hypothesis in this chapter.

We can define a statistical hypothesis as a statement about the population or the probability distribution defining a population. In testing of hypothesis, we therefore plan techniques which can tell us whether the assumptions made during the study of the population, based only on some representative random sample, are valid or not, or how long they remain valid. Thus the decision problem is either to accept the hypothesis H0 as true or to reject H0 (equivalently, to accept H1 as true). Of these two hypotheses, H0 and H1, one is called the “null hypothesis” and the other the “alternative hypothesis”.
By convention, H0 denotes the null hypothesis and H1 the alternative hypothesis. Suppose a business manager will launch his product only if it is preferred by more than 40% of the potential consumers. Taking $p$ as the true proportion of consumers preferring the product, we write the null hypothesis as $H_0: p < 0.4$ and the alternative hypothesis as $H_1: p \ge 0.4$. The manager holds the view that the preference proportion is less than 0.4 until actual sample results establish the contrary. Thus the decision maker starts with negative thinking and revises it only on getting overwhelming evidence from the sample that it is not true.
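The manager's rule can be sketched as a one-sided test of a proportion. This is only an illustration under stated assumptions: the survey figures (96 of 200 consumers), the function name `proportion_z` and the use of the 5% one-sided normal critical value 1.645 are not taken from the text.

```python
from math import sqrt

def proportion_z(x, n, p0):
    """Z statistic for the sample proportion x/n against the boundary value p0."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error evaluated at the boundary p0
    return (p_hat - p0) / se

# Hypothetical survey (not from the text): 96 of 200 consumers prefer the product.
z = proportion_z(96, 200, 0.4)
launch = z > 1.645  # reject H0: p < 0.4 only when the evidence is strong
print(f"Z = {z:.3f}, launch = {launch}")
```

The product is launched only when Z exceeds the critical value, which mirrors the idea that H0 is retained until the sample gives strong evidence against it.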

When the hypothesis completely specifies the population, it is called a simple hypothesis; otherwise it is called a composite hypothesis. The simple hypothesis for sampling from a normal population $N(\mu, \sigma^2)$ can be specified as $H: \mu = \mu_0$ and $\sigma^2 = \sigma_0^2$.

For accepting or rejecting a hypothesis, we must have definite evidence from the sample. Thus “The acceptance of a statistical hypothesis is due to insufficient evidence provided by the sample to reject it and does not necessarily imply that it is true”.

10.4 Steps Involved in Hypothesis Testing
The process of testing a hypothesis involves the following steps:
1. State clearly the parameters of the hypothesis, i.e., what is to be confirmed or tested.
2. Formulate the null and alternative hypotheses with the help of the problem.
3. Specify the test statistic that best reflects the relative merits of H0 and H1.
4. Separate the set of values of the test statistic into two distinct areas, the rejection area and the acceptance area, so that rejecting H0 means accepting H1.
5. Observe the sample data, compute the value of the test statistic and apply the decision rule in step 4 above.

10.5 Types of Errors in Testing of Hypothesis
Since there is a risk involved in decision-making based on sample theory, there may be errors in deciding to accept or reject a null hypothesis (H0) after knowing the results of a sample from the population. From the point of view of the decision, four alternatives can be thought of:
(i) Reject H0 when it is actually not true.
(ii) Accept H0 when it is true.
(iii) Reject H0 when it is true.
(iv) Accept H0 when it is false.

We can see that (i) and (ii) are correct decisions, whereas (iii) and (iv) are wrong decisions. These can be shown diagrammatically as given in Fig. 10.1 below.

Fig. 10.1: Type I and Type II Errors

As indicated in Fig. 10.1, these errors in taking wrong decisions are termed Type I errors or Type II errors. Rejecting H0 when it is true, case (iii), is a Type I error; accepting H0 when it is false, case (iv), is a Type II error. Type I errors are regarded as more serious, because a correct hypothesis has been rejected; the losses in this case are of far greater consequence than in the case when we accept a proposal that is actually false. As Type I error is more serious, it is customary to control $\alpha$ at a predetermined low level and to choose a test procedure that minimises $\beta$.

Type I or Type II errors are not normally explicitly quantifiable, and hence it is difficult for the decision maker to make a logical assessment of the level of tolerance of such errors. Type I errors are generally kept very low, at the level of 0.01 to 0.05. These values are called the level of significance, which is the maximum probability of risk that we are prepared to accept. By this we mean that if the level of significance is chosen as 0.05 or 5%, there is a 5% chance that we reject the hypothesis when it should have been accepted; i.e., when H0 is true, we make the correct decision 95% of the time.

In summary,
$\alpha$ = Prob. of rejecting a good lot
$\beta$ = Prob. of accepting a bad lot

The size of the Type I error ($\alpha$) is called the producer’s risk, whereas the size of the Type II error ($\beta$) is called the consumer’s risk.
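The two error probabilities can be estimated by simulation. The sketch below is a rough illustration, not a method from the text: it assumes a lower-tailed z-test with known standard deviation, and the values `MU0 = 50`, `SIGMA = 2.5`, `N = 12`, the alternative mean 49 and the number of trials are all chosen for illustration only.

```python
import random
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 50.0, 2.5, 12, 0.05   # illustrative values, assumed
Z_CRIT = NormalDist().inv_cdf(ALPHA)          # lower-tail critical value, about -1.645

def rejects_h0(sample):
    """Lower-tailed z-test of H0: mu = MU0 with known SIGMA."""
    xbar = sum(sample) / len(sample)
    z = (xbar - MU0) / (SIGMA / sqrt(len(sample)))
    return z < Z_CRIT

def rejection_rate(true_mu, trials, rng):
    """Fraction of simulated samples from N(true_mu, SIGMA^2) that reject H0."""
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, SIGMA) for _ in range(N)]
        rejected += rejects_h0(sample)
    return rejected / trials

rng = random.Random(0)                            # fixed seed for repeatability
alpha_hat = rejection_rate(50.0, 20000, rng)      # H0 true: every rejection is a Type I error
beta_hat = 1 - rejection_rate(49.0, 20000, rng)   # H0 false: every acceptance is a Type II error
print(f"estimated alpha = {alpha_hat:.3f}, estimated beta = {beta_hat:.3f}")
```

The estimated Type I rate should sit close to the chosen level of significance, while the Type II rate depends on how far the true mean lies from the hypothesised value.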

10.6 Power of the Test
We have
$\beta$ = Prob. (Type II error) = Prob. (Accept H0 when H0 is false, i.e., H1 is true)
Therefore
Prob. (Reject H0 when H1 is true) = 1 - Prob. (Accept H0 when H0 is false) = $1 - \beta$.
Hence to minimise the Type II error, i.e., $\beta$, we should maximise $1 - \beta$. The term $(1 - \beta)$ is called the Power of the test. Thus in testing of hypothesis, we aim at fixing $\alpha$ and then minimising $\beta$, i.e., maximising $(1 - \beta)$, the Power of the test.

10.7 Large Sample Tests
Hypothesis Testing of Means
Let us consider a normal population, first when the standard deviation is known and then when the standard deviation is not known.

When standard deviation is known:
Consider a normal population $N(\mu, \sigma^2)$ where $\sigma$ is known, but $\mu$ is not known. We choose a random sample $x_1, x_2, x_3, \dots, x_n$ and we wish to test the hypothesis regarding the value of $\mu$. We can formulate the null and alternative hypotheses as
$H_0: \mu = \mu_0$; $H_1: \mu \ne \mu_0$ (i)
or $H_0: \mu = \mu_0$; $H_1: \mu > \mu_0$ (ii)
or $H_0: \mu = \mu_0$; $H_1: \mu < \mu_0$ (iii)
where $\mu_0$ is a specified value. In situation (i), the alternative hypothesis is two sided, whereas in situations (ii) and (iii) it is one sided. We now proceed to locate the critical region or rejection region for the hypothesis.
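The rejection regions for the three forms of H1, and the power of a one-sided test, can be sketched with the standard normal distribution. This is a minimal illustration: the function names and the example values passed to `power_lower_tail` (hypothesised mean 50, true mean 48, standard deviation 2.5, sample size 12) are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)

def critical_region(alternative, alpha=0.05):
    """Describe the rejection region for Z under each form of H1."""
    if alternative == "two-sided":   # H1: mu != mu0, alpha split over both tails
        return f"|Z| > {STD_NORMAL.inv_cdf(1 - alpha / 2):.3f}"
    if alternative == "greater":     # H1: mu > mu0, upper tail
        return f"Z > {STD_NORMAL.inv_cdf(1 - alpha):.3f}"
    if alternative == "less":        # H1: mu < mu0, lower tail
        return f"Z < {STD_NORMAL.inv_cdf(alpha):.3f}"
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

def power_lower_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of the lower-tailed test when the true mean is mu1 < mu0."""
    shift = (mu0 - mu1) / (sigma / sqrt(n))   # distance between means in standard errors
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(alpha) + shift)

for alt in ("two-sided", "greater", "less"):
    print(alt, "->", critical_region(alt))
power = power_lower_tail(50, 48, 2.5, 12)
print(f"power = {power:.3f}")
```

For a fixed level of significance, the power grows as the true mean moves further from the hypothesised value or as the sample size increases.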

10.8 Solved Problems

Problem 1
While checking the quality of a product, one particular dimension was found to vary slightly due to changes in the machine setting, though $\mu$ and $\sigma$ did not vary much. The target value of the mean $\mu$ was $\mu_0 = 50$ and $\sigma = 2.5$. The dimensions checked by the inspector on a sample basis were 43, 51, 50, 41, 53, 52, 47, 54, 51, 45, 48 and 47. Formulate the null hypothesis and test the same.

Solution: The target value of the mean is 50. Hence the null hypothesis will be
$H_0: \mu = 50$
If the production is to be guarded against a decreasing value of $\mu$, the alternative hypothesis will be
$H_1: \mu < 50$
For testing the hypothesis, we work out the standard normal variate Z:

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

Here

$$\bar{x} = \frac{\sum x}{n} = \frac{43 + 51 + 50 + 41 + \dots + 47}{12} = 48.5$$

Now $\bar{x} = 48.5$, $n = 12$ and $\sigma = 2.5$. Hence

$$Z = \frac{48.5 - 50}{2.5/\sqrt{12}} = -2.078$$

For a 5% level of significance, the critical region will be $R: Z \le -1.645$. But $Z = -2.078 < -1.645$. This means that $\mu$ is significantly less than expected. Hence at the 5% level of significance H0 is rejected.

Problem 2
In a factory, the following null hypothesis is formulated for its defect levels.

$H_0: \mu \le 60$
$H_1: \mu > 60$
From an inspection report, the samples showed the following values: $n = 16$, $\sigma = 12$ and $\bar{x} = 62$. Test the hypothesis.

Solution: For the given values, the normal variate is

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{62 - 60}{12/\sqrt{16}} = \frac{2 \times 4}{12} = \frac{8}{12} = 0.667$$

At the 5% level of significance, the rejection region will be $R: Z > 1.64$. Since $Z = 0.667$ is less than 1.64, the value falls in the acceptance region. Hence H0 is accepted.

Problem 3
In a restaurant, the average sales of pizzas is 200 per day. Due to a new office building in the vicinity, the sales increased during the first 27 days, and these were found to be 205, 215, 216, 220, 225, 236, 240, 241, 245, 250, 216, 240, 238, 204, 217, 219, 225, 235, 196, 193, 215, 168, 190, 216, 218, 222, 219. Test whether the sales of pizzas have increased.

Solution: The average sales figure is 200 per day. The hypotheses are
$H_0: \mu = 200$
$H_1: \mu > 200$
For the sample sales for 27 days,

$$\bar{x} = \frac{\sum x}{n} = \frac{5924}{27} \approx 219$$

The standard deviation of the sample is

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

$$= \sqrt{\frac{196 + 16 + 9 + 1 + 36 + 289 + 441 + 484 + 676 + 961 + 9 + 441 + 361 + 225 + 4 + 0 + 36 + 256 + 529 + 676 + 16 + 2601 + 841 + 9 + 1 + 9 + 0}{27 - 1}} = 18.73$$

The calculated value of the test statistic will be

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{219 - 200}{18.73/\sqrt{27}} = \frac{19 \times 5.2}{18.73} = 5.27$$

At the 5% level of significance, with $n - 1 = 26$ degrees of freedom, the rejection region for the one-sided alternative is $t > 1.706$ (table for t-distribution). Since the value of t falls in the rejection region, the null hypothesis is rejected in favour of the alternative hypothesis, i.e., the sales of pizzas have increased.

Problem 4
From a certain process, it was concluded that on the average there are 15 per cent defectives. A new material was purchased and used in the process, and it was noticed that out of a total output of 400 units, 48 were found to be defective. Would you accept the new material?

Solution: From the given data, the sample proportion of defectives is

$$p = \frac{48}{400} = 0.12$$

With $P = 0.15$ and $q = 1 - p = 0.88$, the test statistic is

$$Z = \frac{P - p}{\sqrt{pq/n}} = \frac{(0.15 - 0.12) \times 20}{\sqrt{0.12 \times 0.88}} = \frac{0.6}{0.325} = 1.846$$

For the 5% level of significance, the rejection region is $R: Z < -1.65$. The value of the test statistic is not within this rejection region. Hence the hypothesis that the new material has not increased the proportion of defectives is accepted, and the new material can be accepted.

Problem 5
Before an increase in excise duty on tea, 400 people out of a sample of 500 persons were found to be tea drinkers. After the increase in duty, 400 people were tea drinkers in a sample of 600 people. Using the standard error of proportion, state whether there is a significant decrease in the consumption of tea.

Solution: With standard notation, we are given $n_1 = 500$ and $n_2 = 600$,

$$p_1 = \frac{400}{500} = 0.8 \quad \text{and} \quad p_2 = \frac{400}{600} = 0.67$$

The null hypothesis is $H_0: p_1 = p_2$.
The alternative hypothesis is $H_1: p_1 > p_2$ (equivalently, $p_2 < p_1$).
Under the null hypothesis,

$$Z = \frac{p_1 - p_2}{SE(p_1 - p_2)} = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
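The test statistics used in the solved problems above can be recomputed with a short script. This is a sketch rather than the book's own working: the function names are hypothetical, the second dimension in Problem 1 is read as 51 (the value used in the sum of the solution), and the pooled proportion P = (x1 + x2)/(n1 + n2) in `z_two_proportions` is an assumption, since the text above does not define P.

```python
from math import sqrt
from statistics import mean, stdev

def z_mean(xbar, mu0, sigma, n):
    """Z statistic for a mean when the population standard deviation is known."""
    return (xbar - mu0) / (sigma / sqrt(n))

def t_mean(sample, mu0):
    """t statistic for a mean, with sigma estimated by the sample standard deviation."""
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))

def z_two_proportions(x1, n1, x2, n2):
    """Z statistic for p1 - p2 using a pooled proportion (assumed definition of P)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Problem 1 data, second value read as 51
dimensions = [43, 51, 50, 41, 53, 52, 47, 54, 51, 45, 48, 47]
# Problem 3 data: pizza sales over 27 days
sales = [205, 215, 216, 220, 225, 236, 240, 241, 245, 250, 216, 240, 238, 204,
         217, 219, 225, 235, 196, 193, 215, 168, 190, 216, 218, 222, 219]

z1 = z_mean(mean(dimensions), 50, 2.5, len(dimensions))   # Problem 1
z2 = z_mean(62, 60, 12, 16)                               # Problem 2
t3 = t_mean(sales, 200)                                   # Problem 3
z5 = z_two_proportions(400, 500, 400, 600)                # Problem 5 formula
print(f"Z1 = {z1:.3f}, Z2 = {z2:.3f}, t3 = {t3:.2f}, Z5 = {z5:.2f}")
```

Because the script keeps the unrounded sample mean (5924/27) instead of 219, its t value for Problem 3 comes out slightly higher than the hand calculation; the conclusion is unchanged. The value for Problem 5 can be compared with the one-sided 5% critical value 1.645.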

