Year    Per cent Unemployed
1982    12.6
1983    12.2
1984    13.0
1985    13.5
1986    12.8
1987    12.7
1988    13.1
1989    13.6
1990    13.5
1991    13.8
1992    14.2
1993    14.0

(a) Smooth out the fluctuations using a four-year moving average.
(b) Use the exponential smoothing model to forecast the unemployment rate in South India for the year 1995. Assume α = 0.4.
(c) Calculate the value of MSE.

15. Juhu Chawla has a car dealership for Toyota in Bombay along with her sister Ammu. The numbers of cars sold for the first 7 months of 1995 are as follows:

Month    Cars Sold
Jan      45
Feb      52
Mar      41
Apr      36
May      49
June     47
July     43

Juhu wants to predict the car sales for the month of August by using the exponential smoothing method with an α value of 0.4. Her sister thinks that an α value of 0.8 would be more suitable. What is the forecast in each case, and who do you think is more correct on the basis of these given values?

16. The following table gives a time series of monthly sales of luxury automobiles for a large car dealer for each month of 1994 and 1995.

Month        1994    1995
January       720     780
February      800     840
March        1080    1110
April        1000     980
May          1020    1050
June         1105     905
July          900     880
August        930     910
September     830     780
October      1030     920
November      880     770
December      800     730

Derive the seasonal index using the ratio-to-moving-average method.
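Exercises 14 and 15 (and several that follow) turn on two mechanics: a k-period moving average and the exponential smoothing recursion F(t+1) = αX(t) + (1 − α)F(t). A minimal Python sketch of both is given below; taking the first observation as the initial forecast and using an uncentred average are assumptions, since conventions differ from book to book.

```python
def exp_smooth_forecasts(series, alpha):
    """One-step-ahead forecasts; F(1) is set to the first observation."""
    f = [series[0]]
    for x in series:
        f.append(alpha * x + (1 - alpha) * f[-1])
    return f                    # f[-1] is the forecast for the next period

def mse(series, forecasts):
    """Mean squared one-step-ahead forecast error."""
    return sum((x - f) ** 2 for x, f in zip(series, forecasts)) / len(series)

def moving_average(series, k):
    """Uncentred k-period moving averages."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]

cars = [45, 52, 41, 36, 49, 47, 43]          # exercise 15 data
for alpha in (0.4, 0.8):
    f = exp_smooth_forecasts(cars, alpha)
    print(alpha, round(f[-1], 2), round(mse(cars, f[:-1]), 2))
```

Comparing the printed MSE values for the two values of α is one defensible way of deciding between the sisters' choices in exercise 15: the smaller the MSE, the better the smoothing constant has tracked the series.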
17. The following data represent the index of total industrial production (I) for each quarter of the previous four years.

Year        Quarter    (I)
1st year    1          103.1
            2          107.2
            3          109.0
            4          102.1
2nd year    1          105.9
            2          109.7
            3          112.1
            4          106.0
3rd year    1          110.0
            2          112.6
            3          112.8
            4          104.3
4th year    1          107.0
            2          105.2
            3          104.8
            4           99.6

Calculate the quarter moving averages, quarter centered moving averages and percentages of actual to centered moving averages.

18. Narula restaurant chain operates a Chinese restaurant in the Defence Colony area of New Delhi. The general manager has collected the quarterly revenue data for the years 1992, 1993 and 1994. This revenue data, in thousands of dollars, is shown as follows:

Year    Quarter    Revenues
1992    I          50
        II         55
        III        60
        IV         58
1993    I          61
        II         64
        III        64
        IV         60
1994    I          65
        II         68
        III        72
        IV         68

Calculate for this data:
(a) Quarter moving average
(b) Quarter centered moving average
(c) Percentage of actual to centered moving average

19. The U.S. Department of Commerce, Bureau of Economic Analysis has collected data on the industrial production of the United States in the last 4 years and developed an index for such industrial production. The quarterly index data is presented in the following table:
Year    Quarter    Index
1       I          111.7
        II         113.8
        III        113.5
        IV         107.9
2       I          111.9
        II         107.3
        III        110.9
        IV         107.9
3       I          112.0
        II         114.3
        III        114.6
        IV         104.5
4       I          106.0
        II         105.1
        III        104.4
        IV          97.6

Calculate for this data:
(a) Quarter moving average
(b) Quarter centered moving average
(c) Percentage of actual to centered moving average
(d) The modified mean for each quarter of the data

20. The sales of pianos at Kiran Musical Enterprise for the previous ten months have been 100, 99, 110, 120, 105, 112, 109, 117, 120 and 125 respectively. The owner of the store is interested in using exponential smoothing to aid in analysing this data. The owner is interested in a greater degree of smoothing and hence he has set the value of the smoothing factor at α = 0.3. Calculate the expected sales for the next month.

21. The following data represent the quarterly earnings per share of a software company for the previous four years.

              Quarter
Year        1       2       3       4
1st year    0.27    0.35    0.43    1.25
2nd year    0.40    0.55    0.45    1.35
3rd year    0.52    0.70    0.53    1.55
4th year    0.60    0.80    0.64    1.85

Analyse the quarterly time series to determine the effects of the trend, cyclical, seasonal and irregular components.

22. Consider the time series of quarterly sales (in thousands of dollars) for a local department store for the previous three years, as shown in the following table.

Year    Quarter    Sales
1993    1          -
        2          520
        3          690
        4          570
1994    1          550
        2          600
        3          840
        4          640
1995    1          650
        2          620
        3          990
        4          800

The seasonal indices for each quarter are shown in the following table:

Quarter    Index
1          0.90
2          0.88
3          1.25
4          0.95

Find the seasonally adjusted sales corresponding to each sales value.

23. The following data represent the values of the percentage of actual data to centered moving average for each quarter of the last six years for sales of the 52-inch screen Sony television of a large appliance store.

Year    Spring Quarter    Summer Quarter    Fall Quarter    Winter Quarter
1990    -                 -                 30              35
1991    150               120               90              -
1992    160               115               98              30
1993    152               108               95              -
1994    145               115               100             28
1995    152               120               107             32

Determine the seasonal index for each quarter.

24. A restaurant manager has recorded the daily number of customers for four weeks. He wants to improve customer service and change employee scheduling as necessary, based on the expected number of daily customers in the future. The following data represent the daily number of customers as recorded by the manager for the four weeks.

Week    Mon    Tues    Wed    Thurs    Fri    Sat    Sun
1       440    400     480    510      650    800    710
2       510    430     500    520      740    850    800
3       490    480     410    630      720    810    690
4       500    500     470    540      780    900    850

Determine the daily seasonal indices using the seven-day moving average.

25. The Pacific Amusement Park, located in Silicon Valley, has provided the following data on the number of visitors (in thousands of admissions) during the park's open seasons of Spring, Summer and Fall.

Year    Spring    Summer    Fall
1991    280       610       220
1992    300       725       180
1993    140       600       200
1994    200       580       180

Calculate the seasonal indices for this data.

26. The tourist industry is subject to enormous seasonal variation. The Palace Hotel in Raipur has recorded its occupancy rate (percentage of total rooms) for each quarter during the last 4 years. This data is shown in the following table.
Year    Quarter    Occupancy Rate (%)
1991    I          56
        II         70
        III        80
        IV         60
1992    I          57
        II         73
        III        86
        IV         62
1993    I          60
        II         74
        III        90
        IV         65
1994    I          66
        II         83
        III        87
        IV         67

Calculate the seasonal indices and seasonally adjusted values for each quarter.

27. The following table represents the percentages of moving average values for the sales of Simla textbooks. These quarterly values are calculated for the years 1989 through 1992.

Year    Quarter    Value
1989    I          -
        II         -
        III        160
        IV         52
1990    I          92
        II         110
        III        152
        IV         55
1991    I          90
        II         115
        III        150
        IV         48
1992    I          85
        II         100
        III        -
        IV         -

Calculate:
(a) The seasonal index for each quarter
(b) The seasonally adjusted value for each quarter
(c) The value of irregular variation for each quarter

28. The Department of Health has compiled data on the liquor sales in the United States (in billion dollars) for each quarter of the last four years. This quarterly data is given in the following table.
Year    Quarter    Sales
1991    I          4.5
        II         4.8
        III        5.0
        IV         6.0
1992    I          4.0
        II         4.4
        III        4.9
        IV         5.8
1993    I          4.2
        II         4.6
        III        5.2
        IV         6.1
1994    I          4.5
        II         4.6
        III        4.9
        IV         5.5

(a) Using the moving average method, find the values of the combined trend and cyclical component.
(b) Find the values of the combined seasonal and irregular component.
(c) Find the values of the seasonal indices for each quarter.
(d) Find the seasonally adjusted values for the time series.
(e) Find the value of the irregular component.

29. A real estate agency has been in business for the last 4 years and specializes in the sales of 2-family houses. The sales in the last 4 years have grown from 20 houses in the first year to 105 houses last year. The owner of the agency would like to develop a forecast for the sale of houses in the coming year. The quarterly sales data for the last 4 years are shown as follows.

Year    Quarter (1)    Quarter (2)    Quarter (3)    Quarter (4)
1       8              6              2              4
2       10             8              8              12
3       18             12             15             25
4       25             20             28             32

(a) Using the moving average method, find the values of the combined trend and cyclical component.
(b) Find the values of the combined seasonal and irregular component.
(c) Compute the seasonal indices for the four quarters.
(d) Deseasonalize the data and use the deseasonalized time series to identify the trend.
(e) Find the value of the irregular component.

9.9 FURTHER READING

Chandan, J. S. 1998. Statistics for Business and Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Monga, G. S. 2000. Mathematics and Statistics for Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Kothari, C. R. 1984. Quantitative Techniques. New Delhi: Vikas Publishing House Pvt. Ltd.
Hooda, R. P. 2002. Statistics for Business and Economics. New Delhi: Macmillan India Ltd.
Gupta, S. C. 2006. Fundamentals of Statistics. New Delhi: Himalaya Publishing House.
Gupta, S. P. 2005. Statistical Methods. New Delhi: S. Chand and Sons.
UNIT 10 SAMPLING THEORY AND ITS BASIC CONCEPTS

Structure
10.0 Introduction
10.1 Unit Objectives
10.2 What is Sampling?
10.3 Benefits of Sampling
10.4 Methods of Sampling
     10.4.1 Deliberate Sampling
     10.4.2 Random Sampling
     10.4.3 Mixed Sampling
     10.4.4 Various Other Sampling Techniques/Designs
     10.4.5 Sampling and Non-sampling Errors
10.5 Sampling Theory
     10.5.1 The Two Concepts: Parameter and Statistic
     10.5.2 Objects of Sampling Theory
     10.5.3 Sampling Distribution
     10.5.4 The Concept of Standard Error (or S.E.)
     10.5.5 Procedure of Significance Testing
10.6 Central Limit Theorem
10.7 Tests of Significance
     10.7.1 Sampling of Attributes
     10.7.2 Sampling of Variables (Large Samples)
     10.7.3 Standard Error for Different Statistics
     10.7.4 Sampling of Variables (Small Samples)
10.8 Summary
10.9 Answers to 'Check Your Progress'
10.10 Questions and Exercises
10.11 Further Reading

10.0 INTRODUCTION

In this unit, you will learn the basic concepts of sampling theory and its applications in the real world. A sample is a group of units selected from a larger group (the population). By studying the sample, it is hoped, one can draw valid conclusions about the larger group. Sampling is used in surveys and in studying a variety of other problems concerning production management, time and motion studies, market research, and various areas of accounting and finance. Sampling theory is a study of the relationships existing between a population and the samples drawn from it.

10.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Define the basics of sampling theory
• Explain the benefits and methods of sampling
• Describe various sampling techniques
• Explain sampling theory and the concept of standard error
• Describe tests of significance and their limitations

10.2 WHAT IS SAMPLING?

Simple random sampling is the basic sampling technique whereby we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has an equal chance of being included in the sample. Every possible sample of a given size has the same chance of selection, i.e., each member of the population is equally likely to be chosen at any stage in the sampling process.

Stratified sampling techniques are generally used when the population is heterogeneous, or dissimilar, but certain homogeneous, or similar, subpopulations can be isolated (strata). Simple random sampling is most appropriate when the entire population from which the sample is taken is homogeneous. Some reasons for using stratified sampling over simple random sampling are: (i) the cost per observation in the survey may be reduced, (ii) estimates of the population parameters may be required for each subpopulation, and (iii) increased accuracy may be obtained at a given cost. A one-tailed test looks for an increase or decrease in the parameter, whereas a two-tailed test looks for any change in the parameter (an increase or a decrease).

As has been stated earlier, a statistical inquiry can either be of a census type or of a sample type. Checking every object of the population is the main feature of a census inquiry. However, limitations of time, energy and money may accommodate only a partial review of the population. This can be managed by selecting a representative group from the population, which is called a sample. This sample is studied, and from it conclusions are drawn relating to the universe, i.e., the whole material from which the sample is taken. Thus, when a statistical investigation is done on the basis of a sample or samples, it is known as a sample inquiry. In this context the selection of the sample must be made with great care so that it may be truly representative and the conclusions drawn from it may remain valid. Besides, in several practical problems the statistician is quite often confronted with the necessity of discussing a population of which he cannot examine every member. For instance, an inquirer into the heights of the population of India cannot afford the time or expense required to measure the height of each individual. A farmer who wants to know what proportion of his potato crop is diseased cannot examine every single potato. Similarly, a country's government may obtain a reliable indication of the public reaction to newly imposed taxes by making an enquiry of a small sample of the millions of people living in that country. In such cases, the best an investigator can do is to examine a limited number of individuals and hope that they will tell him with reasonable trustworthiness as much as he wants to know about the population from which they come.

10.3 BENEFITS OF SAMPLING

A sample study usually is substantially less expensive than a census. A sample usually produces information faster than a census does, and thus a sampling inquiry saves time.
The results obtained by sampling are often almost as accurate, and sometimes even more accurate, than those obtained from a census inquiry, because in a sample inquiry the entire work is generally conducted by trained and experienced investigators. More detailed information can be obtained from a sample survey than from a census because a sample in many instances takes less time, is less costly and permits more care to be taken in its execution.
When a test involves the destruction of the item under study, sampling inquiry remains the only choice. This, in other words, means that in many cases sampling inquiry remains the only alternative for taking decisions. Sampling errors can as well be estimated, especially in the case of a random sample inquiry.

10.4 METHODS OF SAMPLING

Before discussing the different methods and techniques of sampling, one should keep in view the important features of a good sampling technique. They are as follows:
(1) The sample should be a true representative of the universe from which it has been taken.
(2) There should remain no bias in selecting a sample.
(3) It should be possible to measure or estimate the sampling error.
(4) The results of the sample study, in general, should be applicable to all items of the universe.

In general, there are two methods of selecting a sample, viz., deliberate sampling and random sampling. A brief description of these two methods is given as follows:

10.4.1 Deliberate Sampling

Deliberate sampling is also known as purposive sampling or non-random sampling. At times it is also known as judgement sampling. Under it, the organizers of the inquiry purposively or deliberately choose the particular units of the universe for constituting a sample, on the basis that the small mass that they so select out of a huge one will be typical or representative of the whole. For example, if the economic conditions of people living in a state are to be studied, a few towns and villages may be purposively selected for intensive study on the principle that they shall be representative of the entire state. The validity of such a sample hinges on the soundness of the judgement of whoever selects the sample. In this method the personal element has a great chance of entering into the selection of the sample. The investigator may select a sample which shall yield results favourable to his point of view, and if that happens, the entire inquiry may be vitiated. In deliberate sampling there is always the danger of bias entering into the sampling technique. But if the investigators are impartial, work without bias and have the necessary experience, then the results obtained from an analysis of a deliberately selected sample may be tolerably reliable. However, the sampling error, in this type of sampling, cannot be estimated, and the element of bias, great or small, is always there. Such samples are risky in the sense that we cannot measure how precise our estimates are. We have no objective measure of the degree of confidence that can be placed in our estimates.

10.4.2 Random Sampling

Random sampling may also be known as chance sampling or probability sampling. Under random sampling each and every item of the universe has an equal chance of inclusion in the sample. It is, so to say, a lottery method in which individual units are picked up from the whole group, not deliberately but by some mechanical process. Here it is blind chance alone that determines whether one unit or another is selected. The results obtained from a random sample can be assured in terms of probability; that is, while we can measure the errors of estimation or the significance of results obtained from a random sample, the same is not possible in the case of deliberate sampling.
Random sampling ensures the Law of Statistical Regularity, which states that if on average the sample chosen is a random one, the sample will have the same composition and characteristics as the universe. This is the reason why random sampling is considered the best technique for selecting a representative sample, a basic essential of a good sampling technique.
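In practice, the 'mechanical process' of random selection is usually delegated to a computer's random number generator. A minimal sketch in Python follows; the numbered population of 500 units is a purely hypothetical illustration.

```python
import random

random.seed(42)                                # fixed seed, for a reproducible draw
population = list(range(1, 501))               # 500 numbered units of the universe

# Without replacement: a unit, once selected, cannot appear in the sample again.
sample_without = random.sample(population, 25)

# With replacement: each selected unit is returned before the next draw.
sample_with = [random.choice(population) for _ in range(25)]

print(sample_without[:5], sample_with[:5])
```

Both draws give every unit an equal chance of inclusion, which is the defining property of random sampling.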
Random sampling from a finite population (i.e., a universe having a finite number of items) refers to that method of sample selection which gives each possible sample combination an equal probability of being chosen. This applies to sampling without replacement, that is, once an item is selected for the sample, it cannot appear in the sample again. Sampling with replacement is used less frequently; in this procedure the element selected for the sample is returned to the population before the next element is selected. In brief, the implications of simple random sampling are:
(1) It gives each element in the population an equal probability of getting into the sample.
(2) It gives each possible sample combination an equal probability of being chosen.

How to select a random sample? That is the basic question. In this context one should answer two questions. How many units of the universe should be included in the sample? How should these units be selected so that the selection is done randomly? So far as the first question is concerned, no hard and fast rule can be laid down as to the number of units to be selected for the sample. As far as the second question is concerned, every item in the universe should be given an equal chance of being included in the sample, so that the results obtained from the sample can be assessed in terms of probability and can be applied to the universe. For such objective selection of a random sample, generally any one of the following methods is used:
(a) Drawing lots, or what is known as lottery sampling.
(b) Arranging the units geographically, numerically or alphabetically and then selecting every 10th or 15th (or any other chosen figure) item from the prepared list indiscriminately.
(c) Utilizing tables of random numbers. Various statisticians like Tippett, Yates and Fisher have prepared tables of random numbers which can be used for selecting a random sample.

10.4.3 Mixed Sampling

Sometimes a combination of the two methods described above, viz., purposive sampling and random sampling, may be tried. In that case the given universe may be divided purposively into homogeneous groups and from each group the sample may be selected randomly. This is known as mixed sampling.

10.4.4 Various Other Sampling Techniques/Designs

Statisticians have developed various other sampling techniques or designs which are of considerable significance. Some of these are as under:
(a) Systematic Sampling. Under systematic sampling only the first unit of the sample is selected at random and the remaining units are selected at fixed intervals. For instance, if questions were to be asked of householders in a road and the 4th house was selected randomly as the starting point, there would be systematic sampling if subsequent interviews were carried out at fixed regular intervals, e.g., at the 8th, 12th, 16th, 20th house, and so on.
(b) Stratified Sampling. If the population from which a sample is to be drawn does not constitute a homogeneous group, then the stratified sampling technique is to be applied so as to obtain a representative sample. This is done by dividing the population into different 'strata' on the basis of certain common characteristics and then selecting items randomly from them to constitute a sample. The sample so constituted is the result of successive application of purposive (involved in the stratification of items) and random sampling methods and as such is an example of mixed sampling.
(c) Cluster and Area Sampling. In all the sample designs discussed so far, the elements selected were the individual elements to be studied. As such, the sampling frame
happens to be a list of all the individual elements of the population. It is sometimes more feasible and economical to develop a sampling frame where the primary sampling units represent groups of elements rather than individual elements of the population. Selection of a primary sampling unit from such a frame then requires investigating all the elements clustered in that primary sampling unit; such sample designs are called cluster sampling. The selection of the clusters can be done by any of the sampling methods we have already explained. Cluster designs where the primary sampling unit represents a cluster of units based on geographic area are often termed area sampling.
(d) Multi-stage Sampling. This is a further development of the principle of cluster sampling. The first stage may be to select large primary sampling units such as states, then districts, then towns and finally certain families within towns. Ordinarily this technique is applied in big inquiries extending to a considerably large geographical area, say, the entire country. There are two advantages of this sample design. First, it is easier to administer, as the sampling frame can be developed in partial units. Second, this design permits a larger number of units to be sampled for a given cost than do simple designs.
(e) Sequential Sampling. This is a complex technique. The ultimate size of the sample under this technique is not fixed in advance but is determined according to mathematical decision rules on the basis of the information yielded as the survey progresses. This is usually adopted in the case of an Acceptance Sampling Plan in the context of Statistical Quality Control. When a particular lot is to be accepted or rejected on the basis of a single sample, it is known as single sampling; when the decision is to be taken on the basis of two samples, it is known as double sampling; and in case the decision rests on the basis of more than two samples, but the number of samples is certain and decided in advance, the sampling is known as multiple sampling. But when the number of samples is more than two and how many is neither certain nor decided in advance, this type of system is often referred to as sequential sampling. Thus, in brief, we can say that in sequential sampling one can go on taking samples one after another as long as one desires to do so.

10.4.5 Sampling and Non-sampling Errors

Since a sample survey implies the study of a small proportion of the total universe, there would naturally be a certain amount of inaccuracy in the information collected. These inaccuracies are of two different types, viz., the systematic bias and the sampling error. A systematic bias results from errors in the sampling procedures. Some of the important causes of systematic bias are:
(a) Inappropriate sampling frame
(b) Natural bias in the reporting of data
(c) Non-respondents
(d) Bias in the instrument of collection

The systematic bias cannot be reduced or eliminated by increasing the sample size, although the causes of these errors can generally be determined and corrected. The sampling errors are random variations in the sample estimates around the true population values. Sampling errors result from a conglomeration of non-determinable effects. The more homogeneous the population, the smaller the sampling error. As the sample size increases, the sampling error decreases, and it is altogether eliminated in a census inquiry.
It should also be remembered that sampling errors occur randomly and are equally likely to be in either direction; their expected value is zero. Certain inaccuracies and mistakes may also creep in during the process of collecting the actual information. Such errors are known as non-sampling errors, and they occur in all surveys, whether census or sample.
The sampling errors under random sampling can be measured mathematically (this measure is called the 'precision' of the sampling plan), as we shall see a little later in this unit, but non-sampling errors cannot be measured. The sampling errors, sometimes referred to as standard errors, also determine the limits within which it is probable that the true value of a statistical measure would lie.

Check Your Progress
1. What is sampling?
2. Explain the uses of sampling.
3. What are the benefits of sampling?
4. Explain the different methods of sampling.
5. What are the various other techniques involved in sampling?
6. What are systematic bias and sampling errors?

10.5 SAMPLING THEORY

A universe is the complete group of items about which knowledge is sought. The universe may be finite or infinite. A finite universe is one which has a definite and certain number of items, but when the number of items is uncertain and infinite, the universe is said to be an infinite universe. Similarly, the universe may be hypothetical or existent. In the former case the universe in fact does not exist and we can only imagine the items constituting it. Tossing of a coin or throwing of a dice are examples of hypothetical universes. An existent universe is a universe of concrete objects, i.e., a universe where the items constituting it really exist. On the other hand, the term sample refers to that part of the universe which is selected for the purpose of investigation. The theory of sampling studies the relationships that exist between the universe and the sample or samples drawn from it.

10.5.1 The Two Concepts: Parameter and Statistic

It would be appropriate to explain the meaning of the two terms, viz., parameter and statistic. All the statistical measures based on all items of the universe are termed as parameters, whereas statistical measures worked out on the basis of sample studies are termed as sample statistics. Thus, a sample mean or a sample standard deviation is an example of a statistic, whereas the universe mean or universe standard deviation is an example of a parameter. The main problem of sampling theory is the problem of the relationship between a parameter and a statistic. The theory of sampling is concerned with estimating the properties of the population from those of the sample, and also with gauging the precision of the estimate. This sort of movement from the particular (sample) towards the general (universe) is what is known as statistical induction or statistical inference. In more clear terms, 'from the sample we attempt to draw inferences concerning the universe. In order to be able to follow this inductive method, we first follow a deductive argument, which is that we imagine a population or universe (finite or infinite) and investigate the behaviour of the samples drawn from this universe applying the laws of probability.' The methodology dealing with all this is known as sampling theory.

10.5.2 Objects of Sampling Theory

Sampling theory is designed to attain one or more of the following objectives:
(a) Statistical Estimation. Sampling theory helps in estimating unknown population quantities, or what are called parameters, from a knowledge of statistical measures based on sample studies, often called 'statistics'. In other words, to obtain the estimate of a parameter from a statistic is the main objective of the sampling theory. The estimate can either be a point estimate or it may be an interval estimate.
A point estimate is a single estimate expressed in the form of a single figure, but an interval estimate has two limits, the upper and the lower. Interval estimates are often used in statistical induction.
(b) Tests of Hypotheses or Tests of Significance. The second objective of sampling theory is to enable us to decide whether to accept or reject a hypothesis, or to determine whether observed samples differ significantly from expected results. The sampling theory helps in determining whether observed differences are actually due to chance or whether they are really significant. Tests of significance are important in the theory of decisions.
(c) Statistical Inference. Sampling theory helps in making generalizations about the universe from studies based on samples drawn from it. It also helps in determining the accuracy of such generalizations.

10.5.3 Sampling Distribution

In sampling theory we are concerned with what is known as the sampling distribution. For this purpose we can take a certain number of samples, and for each sample we can compute various statistical measures such as the mean, standard deviation, etc. It is to be noted that each sample will give its own value for the statistic under consideration. All these values of the statistic, together with the relative frequencies with which they occur, constitute the sampling distribution. We can have a sampling distribution of means, or a sampling distribution of standard deviations, or a sampling distribution of any other statistical measure. The sampling distribution tends quite close to the normal distribution if the number of samples is large. The significance of the sampling distribution follows from the fact that the mean of a sampling distribution is the same as the mean of the universe. Thus, the mean of the sampling distribution can be taken as the mean of the universe.

10.6 CENTRAL LIMIT THEOREM

According to the central limit theorem, the sampling distribution of any statistic will be normal or nearly normal if the sample size is large enough.

Sampling distribution of the sample means

Instead of working with individual scores, statisticians often work with means. Several samples are taken, the mean is computed for each sample, and then the means are used as the data rather than the individual scores. The result is a sampling distribution of the sample means. When all of the possible sample means are computed, the following properties are true:
• The mean of the sample means will be the mean of the population.
• The variance of the sample means will be the variance of the population divided by the sample size.
• The standard deviation of the sample means (known as the standard error of the mean) will be smaller than the population standard deviation and will be equal to the standard deviation of the population divided by the square root of the sample size.
• If the population has a normal distribution, then the sample means will have a normal distribution.
• If the population is not normally distributed, but the sample size is sufficiently large, then the sample means will have an approximately normal distribution. Some books define sufficiently large as at least 30 and others as at least 31.

The formula for a z-score when working with the sample means is:

    z = (x̄ - μ) / (σ/√n)
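These properties are easy to check by simulation. The sketch below is a minimal illustration: the skewed (exponential) population, the sample size and the number of samples are all arbitrary choices, made only to show that the theorem holds even when the population is far from normal.

```python
import random
import statistics

random.seed(1)
# A deliberately non-normal (right-skewed) population of 100,000 values.
population = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n, draws = 36, 5_000
means = [statistics.mean(random.sample(population, n)) for _ in range(draws)]

print(mu, statistics.mean(means))                   # nearly equal
print(sigma / n ** 0.5, statistics.pstdev(means))   # nearly equal: sigma / sqrt(n)
```

A histogram of `means` would also look approximately normal, even though the population itself is skewed.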
Finite population correction factor

If the sample size is more than 5 per cent of the population size and the sampling is done without replacement, then a correction needs to be made to the standard error of the means. In the following, N is the population size and n is the sample size. The adjustment is to multiply the standard error by the square root of the quotient of the difference between the population and sample sizes and one less than the population size:

    σx̄ = (σ/√n) × √((N - n)/(N - 1))

Check Your Progress
7. What is sampling theory?
8. Explain the two concepts involved in sampling theory.
9. What are the objectives of sampling theory?
10. What is a sampling distribution?

10.7 TESTS OF SIGNIFICANCE

The theory of sampling can be studied under two heads, viz., the sampling of attributes and the sampling of variables. Accordingly, we shall study the tests of significance under the following categories:
(a) Tests of significance in respect of samples concerning statistics of attributes.
(b) Tests of significance in respect of samples concerning statistics of variables:
    (i) Concerning large samples
    (ii) Concerning small samples

Statisticians distinguish between large and small samples. By a small sample is commonly understood any sample that includes 30 items or fewer; e.g., a sample of 12 items is a small sample. A large sample is one in which the number of items is more than 30, though this is not the deadline in all cases. A sample containing 200 items is clearly an example of a large sample.

10.7.1 Sampling of Attributes

Sampling of attributes refers to the drawing of samples from a population of A's and not-A's; in the population there can be only two mutually exclusive classes. The presence of an attribute may be termed a 'success' and its absence a 'failure'. Suppose that out of 600 people selected for the sample, 120 are found to possess a certain attribute and 480 are people in whom the attribute is absent. In such a situation we would say that the sample consists of 600 events or items (i.e., n = 600), of which 120 are successes and 480 failures. The probability of success, or p = 120/600 = 0.2, and the probability of failure, or q = 480/600 = 0.8, so that p + q = 0.2 + 0.8 = 1. With such data the sampling distribution may take the form of the binomial probability model, whose mean (or μ) would be equal to np and whose standard deviation (or σ) would be equal to √(npq).

In the case of sampling of attributes, we have to consider the following three types of problems:
(1) The 'parameter' value is given and it is only to be tested whether an observed 'statistic' is its estimate.
(2) The 'parameter' value is not known and we have to estimate it from the sample.
(3) Examination of the reliability of the estimate, i.e., the problem of finding out how far the estimate is expected to deviate from the true value for the population.

In the case of sampling of attributes, the following standard error formulae are important:
(a) Standard Error of Number of Successes. The standard deviation of the simple sampling distribution of attributes is known as the standard error of the number of successes. If we have N samples with n events in each, the chance of success in each event
is p and of its failure is q such that p + q = 1, then the standard error of the number of successes is given by the following formula:

    S.E. of number of successes = √(npq)

If n is large, then the binomial distribution tends to become the normal distribution, and as such for significance testing purposes we generally make use of the criterion np ± 3√(npq). If the observed number of successes falls within the np ± 3√(npq) limits of the expected number of successes, the difference between the observed value and the expected value is considered insignificant and is presumed to have arisen due to sampling fluctuations; but if the observed number of successes does not fall within the said limits, the difference is taken as significant and could not have arisen due to sampling fluctuations.

Example 10.1: Twenty-four dice are thrown 1543 times and a throw of 3, 4 or 5 is reckoned as a success. Suppose that 19142 throws of a 3, 4 or 5 have been made. Do you think that this observed value deviates from the expected value significantly?

Solution: It is given that 24 dice have been thrown 1543 times. This is equivalent to a single dice being thrown 24 x 1543 = 37032 times. Hence, n = 37032. If the dice falls with the number 3 or 4 or 5, it is taken as a success. Hence, the probability of success in a single throw of a dice, or p = 1/2, and q = 1 - p = 1/2.

    Expected number of successes = np = 37032 x 1/2 = 18516

Observed number of successes (as given) = 19142. In order to test whether the difference between the expected and observed numbers of successes is significant or not, we first calculate the S.E. of the number of successes as follows:

    S.E. = √(npq) = √(37032 x 1/2 x 1/2) = 96.2

Taking 3 times the S.E. = 96.2 x 3 = 288.6. This means that a difference as high as 288.6 can take place as a result of sampling. But the observed difference is 19142 - 18516 = 626, which is much greater than 288.6. Hence, the difference is significant, and as such the observed value deviates significantly from the expected value.
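The arithmetic of Example 10.1 can be checked in a few lines; this is only a verification sketch of the np ± 3√(npq) criterion.

```python
import math

n, p = 24 * 1543, 0.5        # 37032 throws; success = a throw of 3, 4 or 5
q = 1 - p
expected = n * p             # 18516
se = math.sqrt(n * p * q)    # ≈ 96.2

observed = 19142
diff = abs(observed - expected)          # 626
print(diff > 3 * se)                     # True: the deviation is significant
```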
Example 10.2: In tossing a rupee coin 100 times, a boy gets heads 66 times. Do you think that the coin is unbiased?

Solution: Let us take the hypothesis that the coin is unbiased. If that is so, the probability of getting a head, or p, should be 1/2, and thus q should also be 1/2.

    Hence, the expected number of heads = np = 100 x 1/2 = 50

    Standard error of the number of successes (i.e., of obtaining heads) = √(npq) = √(100 x 1/2 x 1/2) = 5

    Three times the S.E. = 5 x 3 = 15

Difference between the observed and expected numbers of heads = 66 - 50 = 16. The difference of 16 is significant, because it is more than 3 times the S.E. figure. This means the hypothesis that the coin is unbiased is wrong. In other words, the coin is biased.

(b) Standard Error of Proportion of Successes. Instead of taking the number of successes as stated above, we might record the proportion of successes in each sample. In such a case the mean and the standard deviation (or the standard error) of the proportion of successes are obtained by the following formulae:

    Mean of proportion of successes = np/n = p

    Standard deviation (or S.E.) of the proportion of successes = √(pq/n)

Example 10.3: A sample survey indicates that out of 3232 births, 1705 were boys and 1527 girls. Do these figures conform to the hypothesis that the sex ratio is 50 : 50?

Solution: Starting from the hypothesis that the sex ratio is 50 : 50, the expected proportion of boys (or the proportion of successes, i.e., p) = 1/2 = 0.5. Hence, q = 1 - p = 1/2, and n = (1705 + 1527) = 3232.

The standard error of the proportion of successes

    = √(pq/n) = √((1/2 x 1/2)/3232) = 0.0088

    Three times the S.E. = 0.0088 x 3 = 0.0264

    Observed proportion of boys = 1705/3232 = 0.5275

Difference between the observed and expected proportions = 0.5275 - 0.5000 = 0.0275. The difference is significant, being greater than 3 times the S.E. value, and hence the hypothesis that the sex ratio is 50 : 50 is wrong. Thus, the given figures do not conform to the hypothesis of the sex ratio being 50 : 50.
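The same kind of check for Example 10.3, this time as a sketch of the proportion version of the standard error:

```python
import math

n = 3232                     # total births in the sample
p, q = 0.5, 0.5              # hypothesised sex ratio 50 : 50
se = math.sqrt(p * q / n)    # ≈ 0.0088

observed = 1705 / n          # observed proportion of boys ≈ 0.5275
print(abs(observed - p) > 3 * se)        # True: the hypothesis is rejected
```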
(c) Standard Error of the Difference Between Proportions of Two Samples. If two samples are drawn from different populations, one may be interested in knowing whether the difference between the proportions of successes is significant or not. In such a case we take the hypothesis that the difference between p1 (i.e., the proportion of successes in sample number one) and p2 (i.e., the proportion of successes in sample number two) is due to fluctuations of random sampling. The standard error of the difference between the proportions of two samples is worked out by applying the formula

    S.E. (p1 - p2) = √(p.q (1/n1 + 1/n2))

where
    p = best estimate of the proportion in the population, obtained as p = (n1 p1 + n2 p2)/(n1 + n2)
    q = 1 - p
    n1 = number of events in sample no. one
    n2 = number of events in sample no. two

Note. Sometimes, instead of this formula, the following one is used for the standard error of the difference between the proportions of two samples:

    S.E. (p1 - p2) = √(p1 q1/n1 + p2 q2/n2)

This formula is used when samples are drawn from two heterogeneous universes, where we cannot have a best estimate of the proportion in the universe (or p) on the basis of the given sample data. Such a situation often arises in the study of association of attributes.

Example 10.4: At a certain date, in a large city, 400 out of a random sample of 500 men were found to be smokers. After the tax on tobacco had been heavily increased, another random sample of 600 men in the same city included 400 smokers. Was the observed decrease in the proportion of smokers significant?

Solution: Let us take the hypothesis that the proportion of smokers even after the heavy tax on tobacco remains unchanged.

    Proportion of smokers in sample one, or p1 = 400/500 = 0.800
    Proportion of smokers in sample two, or p2 = 400/600 = 0.667
    Difference |p1 - p2| = 0.800 - 0.667 = 0.133

Best estimate of the proportion of smokers in the universe,

    p = (n1 p1 + n2 p2)/(n1 + n2) = (400 + 400)/(500 + 600) = 8/11
    q = 1 - p = 3/11
    S.E. (p1 - p2) = √((8/11) x (3/11) x (1/500 + 1/600)) = 0.027

    Three times the S.E. (p1 - p2) = 3 x 0.027 = 0.081

Thus, the difference between the two proportions is significant, being higher than 3 times the S.E. value. As such, the hypothesis is wrong. Hence it can be concluded that the observed decrease in the proportion of smokers after the tax on tobacco is significant.

(d) Standard Error of the Difference Between the Proportion of Persons Possessing an Attribute in a Sample and the Proportion Given by the Population. The required formula for this is as under:

    S.E. (p1 - p0) = √(p0 q0 x n2/(n1 (n1 + n2)))

where
    p0 = population proportion, and q0 = 1 - p0
    n1 = number of items in the sample
    n2 = size of population - n1
    n1 + n2 = size of population

Example 10.5: There are 100 students in a university college, and in the whole university the number of students is 2000. In a random sample study, 20 were found to be smokers in the college and 100 in the whole university. Is there a significant difference between the proportions of smokers in the college and in the university?

Solution: Let us start with the hypothesis that there is no significant difference between the proportions of smokers in the college and the university. The given information can be written as follows:

    Proportion of smokers in the college, or p1 = 20/100 = 0.20, and q1 = 0.80; n1 = 100
    Proportion of smokers in the university, or p0 = 100/2000 = 0.05; q0 = 0.95; n1 + n2 = 2000

    Difference between the above two proportions = |0.05 - 0.20| = 0.150

    S.E. (p1 - p0) = √(p0 q0 x n2/(n1 (n1 + n2)))
                   = √((0.05)(0.95) x 1900/(100 x 2000))
                   = √0.00045125 = 0.021

    Three times the S.E. = 3 x 0.021 = 0.063

Since the observed difference (viz., 0.150) is much more than three times the S.E. value (0.063), it could not have arisen due to sampling fluctuations. Thus, the hypothesis is
wrong, and we can conclude that there is a significant difference between the proportions of smokers in the college and the university.

10.7.2 Sampling of Variables (Large Samples)

Let us now consider the problems of sampling of variables such as height, weight, age, etc. Each individual of the population provides a value of the variable, and the population is a frequency distribution of the variable. From the population a random sample can be drawn and the statistic calculated. The object, as in the case of sampling of attributes, is:
(a) To compare the observed and expected values and to find if the difference can be ascribed to the fluctuations of sampling.
(b) To estimate from the sample some parameters (such as the mean, standard deviation, etc.) for the universe.
(c) To find out the degree of reliability of the estimate.

The tests of significance used for dealing with problems relating to large samples (i.e., samples normally having more than 30 items) are different from those used for small samples (i.e., samples having 30 or fewer items), simply because the assumptions that we make in the case of large samples do not hold good for small samples. The following two assumptions are made while studying problems relating to large samples:
(1) The simple sampling distribution of a statistic tends to be normal.
(2) The sample values are approximately close to the population values.

When n is large, the probability of a sample value of the statistic deviating from its mean by more than 3 times its standard error is very small. We therefore apply this test, as adopted in the case of sampling of attributes, to find out the degree of reliability of a statistic. Consequently, the determination of standard errors of statistics is a matter of importance in the sampling of variables (large samples). The following is the list of formulae which are used for calculating standard errors.

10.7.3 Standard Error for Different Statistics

(1) Standard Error of Mean (or S.E. x̄)
    (a) When the standard deviation of the population is known,

        S.E. x̄ = σp/√n

    where σp = standard deviation of the population, and n = number of items in the sample.

    (b) When the standard deviation of the population is not known,

        S.E. x̄ = σs/√n

    where σs = standard deviation of the sample, and n = number of items in the sample. (Note. When both σp and σs are known, we should always use σp.)

All these formulae apply in the case of an infinite population. But in the case of a finite population, where sampling is done without replacement and the sample is more than 5% of the population, we must as well use the finite population multiplier in our standard error formulae. For example, S.E. x̄ in the case of a finite population will be as under:

    S.E. x̄ = (σp/√n) x √((N - n)/(N - 1))

It may be remembered that in cases in which the population is very large in relation to the size of the sample, the finite population multiplier is close to one and has little effect on the calculation of the S.E. As such, when the sampling fraction is less than 0.05, the finite population multiplier is generally not used.
Check Your Progress
11. State the concept of standard error.
12. Describe the steps involved in significance testing.
13. What are one-tailed and two-tailed tests?

(2) Standard Error of Standard Deviation (or S.E. σ)

        S.E. σ = σp/√(2n)

    (If σp is not known, then in its place σs shall be used.)

(3) Standard Error of Variance (or S.E. σ²)

        S.E. σ² = σp² √(2/n)

    (If σp is not known, then in its place σs shall be used.)

(4) Standard Error of Coefficient of Variation (or S.E. V)

        S.E. V = (V/√(2n)) x √(1 + 2(V/100)²)

    where V = coefficient of variation, and n = number of items in the sample.

(5) Standard Error of Coefficient of Correlation (or S.E. r)

        S.E. r = (1 - r²)/√n

    where r = coefficient of correlation, and n = number of items in the sample.

(6) Standard Error of Rank Correlation Coefficient (or S.E. r(rank))

        S.E. r(rank) = 1/√(n - 1)

(7) Standard Error of Skewness (or S.E. sk)

        S.E. sk = √(6/n)

(8) Standard Error of Median (or S.E. median)

        S.E. median = 1.25331 x σp/√n

    (If σp is not known, use σs.)

(9) Standard Error of Quartiles (or S.E. quartiles)

        S.E. quartiles = 1.36263 (S.E. x̄)

(10) Standard Error of Quartile Deviation (or S.E. Q.D.)

        S.E. Q.D. = 0.78672 (S.E. x̄)

(11) Standard Error of Mean Deviation (or S.E. M.D.)

        S.E. M.D. = 0.6028 (S.E. x̄)

(12) Standard Error of Regression Coefficient of Y on X (or S.E. byx)

        S.E. byx = (σy √(1 - r²))/(σx √n)

    where
        n = number of pairs of observations of X and Y
        σx = standard deviation of X
        r = coefficient of correlation
        σy = standard deviation of Y

(13) Standard Error of Regression Coefficient of X on Y (or S.E. bxy)

        S.E. bxy = (σx √(1 - r²))/(σy √n)

(14) Standard Error of Regression Estimate of X on Y (or S.E. xy)

        S.E. xy = σx √(1 - r²)

(15) Standard Error of Regression Estimate of Y on X (or S.E. yx)

        S.E. yx = σy √(1 - r²)

(16) Standard Error of the Difference Between the Means of Two Samples (or S.E. (x̄1 - x̄2))
    (i) When the two samples are drawn from the same universe:

        S.E. (x̄1 - x̄2) = σp √(1/n1 + 1/n2)

    If σp is not known, the standard deviation for the combined samples (or σs1.2) may be substituted. In such a case we can as well use:

        S.E. (x̄1 - x̄2) = √((σs1)²/n1 + (σs2)²/n2)

    (ii) When the two samples are drawn from different universes:

        S.E. (x̄1 - x̄2) = √((σp1)²/n1 + (σp2)²/n2)

    (If σp1 and σp2 are not known, then in their places σs1 and σs2 may be substituted respectively.)

    Note. σs1.2, the standard deviation for the combined samples (no. one and two), can be worked out as follows:

        σs1.2 = √((n1(σs1² + d1²) + n2(σs2² + d2²))/(n1 + n2))

    where d1 = |x̄1 - x̄1.2| and d2 = |x̄2 - x̄1.2|, x̄1.2 being the mean of the two samples combined.

(17) Standard Error of the Difference Between the Standard Deviations of Two Samples (or S.E. (σ1 - σ2))
    (i) When the population standard deviations are known:

        S.E. (σ1 - σ2) = √((σp1)²/(2n1) + (σp2)²/(2n2))
    (ii) When the population standard deviations are not known:

        S.E. (σ1 - σ2) = √((σs1)²/(2n1) + (σs2)²/(2n2))

(18) Standard Error of the Difference Between Two Sample Means where the Mean of Sample No. 1 is Compared with the Combined Mean of the Two Samples

        S.E. (x̄1 - x̄1.2) = √(σp² x n2/(n1(n1 + n2)))

(19) Standard Error of the Difference Between Two Sample Means where the Two Samples are Correlated

        S.E. (x̄1 - x̄2) = √((σs1)²/n1 + (σs2)²/n2 - 2r (σs1/√n1)(σs2/√n2))

(20) Standard Error of the Difference Between Two Sample Standard Deviations where the Standard Deviation of Sample No. 1 is Compared with the Combined Standard Deviation of the Two Samples

        S.E. (σ1 - σ1.2) = √((σp²/2) x n2/(n1(n1 + n2)))

Note. This list of standard errors may still be enlarged. However, the reader should keep in view that the S.E. formulae numbered 1, 2, 5, 16 and 17 stated above are the ones ordinarily used in the case of sampling of variables (large samples).

The standard error formulae given above either enable us to give the limits within which the parameter values would lie, i.e., would be expected to fluctuate, or enable us to judge whether a difference happens to be significant or not at certain confidence levels. For instance, x̄ ± 3 S.E. x̄ would give us the range within which the parameter mean value is expected to vary. Similarly, the 3 (S.E. (x̄1 - x̄2)) value would enable us to say whether the difference between two sample means is significant or insignificant. When we do this we use the z-test. The value of z is to be worked out by the appropriate formula. For example, when we are comparing a sample mean with the population mean or a hypothesised population mean, we can work out z as under:

    z = |x̄ - μ| / S.E. x̄

For comparing the means of two samples, we can work out z as under:

    z = |x̄1 - x̄2| / S.E. (x̄1 - x̄2)

We generally take the table value of z = 3 and derive the inference with 99.73% confidence. For other levels of confidence, the appropriate value of z has to be taken from the table of areas under the standard normal curve given in the appendix. We now give certain examples to illustrate the calculation and use of some of the standard error formulae given above.
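Formulae like these translate directly into code. The sketch below implements formulae (1) and (16)(ii) together with the z criterion; the sample figures fed to it at the end are made up purely for illustration.

```python
import math

def se_mean(sigma, n):                       # formula (1)
    return sigma / math.sqrt(n)

def se_diff_means(s1, n1, s2, n2):           # formula (16)(ii)
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Hypothetical sample mean 52.3 against a population mean of 52.0,
# population standard deviation 2.4, n = 144.
z = abs(52.3 - 52.0) / se_mean(2.4, 144)
print(round(z, 2), z > 3)                    # 1.5 False: insignificant

# Hypothetical two-sample comparison with formula (16)(ii).
z2 = abs(103.0 - 100.0) / se_diff_means(5.0, 50, 6.0, 60)
print(round(z2, 2), z2 > 3)                  # ≈ 2.86 False: not significant
```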
Example 10.6: Given n = 100, σp = 3 inches and x̄ of the sample = 64 inches, set up the probable limits (with 99.73% confidence level) of the mean of the population from this information.

Solution:

    S.E. x̄ = σp/√n = 3/√100 = 0.3

Probable limits (with 99.73% confidence level) of the mean of the population
    = x̄ of the sample ± 3 (S.E. x̄)
    = 64 ± 3 (0.3), i.e., 63.1 and 64.9 inches.

Example 10.7: A sample of 400 male students is found to have a mean height of 67.47 inches. Can it be reasonably regarded as a sample from a large population with mean height 67.39 inches and standard deviation 1.30 inches?

Solution:

    S.E. x̄ = σp/√n = 1.30/√400 = 0.065 inch

    Three times the S.E. = 3 (0.065) = 0.195

    Difference |μ - x̄| = |67.39 - 67.47| = 0.08 inch

The observed difference (viz., 0.08 inch) is less than three times the S.E. value (viz., 0.195) and as such is insignificant. It can therefore be concluded that the sample with a mean height of 67.47 inches can be regarded as having been taken from a large population with mean height 67.39 inches and standard deviation 1.30 inches.

Example 10.8: From a normal population a random sample of size 32 was drawn and the sample standard deviation was found to be 1.38. Using the 1% level of significance, decide if it would be reasonable to adopt the value unity for the population standard deviation.

Solution:

    S.E. σ = σ/√(2n) = 1.38/√(2 x 32) = 1.38/8 = 0.1725

At the 1% level of significance, the probable limits of the standard deviation of the population are

    s ± 2.5758 (S.E. σ) = 1.38 ± 2.5758 (0.1725) = 1.38 ± 0.44, i.e., 0.94 and 1.82

As the value unity lies within these limits, we can infer that it would be reasonable to adopt the value unity for the population standard deviation.

Example 10.9: The mean produce of wheat of a sample of 100 fields is 200 lbs. per acre with a standard deviation of 10 lbs. Another sample of 150 fields gives a mean of 220 lbs. with a standard deviation of 12 lbs. Assuming the standard deviation of the yield for the universe to be 11 lbs., find if the results are consistent.
Solution: The given information can be put as follows:

    n1 = 100; x̄1 = 200 lbs.; σ1 = 10 lbs.
    n2 = 150; x̄2 = 220 lbs.; σ2 = 12 lbs.; and σp = 11 lbs.

    Difference between the sample means = |x̄1 - x̄2| = |200 - 220| = 20

    S.E. (x̄1 - x̄2) = √(σp²/n1 + σp²/n2)    (this formula is used as σp has been given)
                    = √((11)²/100 + (11)²/150)
                    = √(1.210 + 0.806) = 1.42

    Three times the S.E. = 3 (1.42) = 4.26

The observed difference is 20, which is much more than three times the S.E. value. Hence, the difference is significant.

Example 10.10: The following statistical measures were calculated from a random sample of 100 items taken from a large universe:

    X = 0.85 Y,  Y = 0.89 X,  σx = 3

Find (a) the standard errors of the two regression coefficients, (b) the standard error of r, and (c) the limits of the true value of r in the universe.

Solution: On the basis of the given information we can say that the regression coefficient of X on Y (or bxy) = 0.85 and the regression coefficient of Y on X (or byx) = 0.89. Hence,

    r = √(byx . bxy) = √((0.89)(0.85)) = √0.7565 = 0.87

    bxy = r σx/σy = 0.85,  or  0.87 x 3/σy = 0.85,  or  0.85 σy = 2.61,  or  σy = 3.07

Now, the three parts of the question can be answered as follows:

(a) Standard Error of the Regression Coefficient of Y on X (or S.E. byx),

        = (σy √(1 - r²))/(σx √n) = (3.07 x √(1 - (0.87)²))/(3 √100) = 0.050
    Standard Error of the Regression Coefficient of X on Y (or S.E. bxy),

        = (σx √(1 - r²))/(σy √n) = (3 x √(1 - (0.87)²))/(3.07 √100) = 0.048

(b) Standard Error of the Coefficient of Correlation (or S.E. r),

        = (1 - r²)/√n = (1 - (0.87)²)/√100 = 0.0243

(c) Limits of the true value of r in the universe,

        = r ± 3 (S.E. r) = 0.87 ± 3 (0.0243) = 0.87 ± 0.0729, i.e., 0.7971 to 0.9429

10.7.4 Sampling of Variables (Small Samples)

The sampling theory for large samples is not applicable to small samples, because when samples are small we cannot assume that the random sampling distribution is approximately normal or that the sample values are approximately equal to those of the parent universe. In other words, the assumptions of sampling theory concerning large samples do not hold good when we deal with small samples. Hence, a new technique is necessary for handling small samples. In the case of small samples our main object is to test a given hypothesis; in other words, we try to ascertain whether the observed values could have arisen by sampling fluctuations. Moreover, we should use relatively wide confidence intervals, as the results of small samples usually vary widely from sample to sample. Various significance tests have been developed for dealing with the problems of small samples. The t-test developed by Sir William Gosset (pen name 'Student') and the z-test developed by R. A. Fisher are very important tests and deserve special mention in the context of sampling analysis for small samples. These tests are used to determine the reliability of small samples. The two important measures for any sample are the arithmetic mean and the standard deviation; as such, we shall calculate them for small samples. The standard deviation of a small sample is given by

    σs = √(Σ(xi - x̄)²/(n - 1))

whereas in the case of large samples it is worked out as

    σ = √(Σ(xi - x̄)²/n)

The denominator (n - 1) is equivalent to the number of degrees of freedom (df) involved. The number of df means the number of values in a sample which may be assigned arbitrarily. Thus, if there are 10 items in a sample whose total is 100, then we can assign any values to any 9 items of the sample, but the value of the remaining one item is determined automatically, because the total of all the ten items cannot exceed or be less than 100. This means that in a sample of 10 items we have only n - 1 = 10 - 1 = 9 degrees of freedom, or df. We now discuss the important significance tests developed in the context of small samples.

t-Test

Sir William S. Gosset (pen name 'Student') developed a significance test and through it made a significant contribution to the theory of sampling applicable in the case of small samples when the population variance is not known. The test is commonly known as Student's t-test and is based on the t-distribution.
10.7.4 Sampling of Variables (Small Samples)

The sampling theory for large samples is not applicable to small samples, because when samples are small we cannot assume that the random sampling distribution is approximately normal or that the sample values are approximately equal to those of the parent universe. In other words, the assumptions of sampling theory concerning large samples do not hold good when we deal with small samples; hence a new technique is necessary for handling them. In the case of small samples our main object is to test a given hypothesis, i.e., to ascertain whether the observed values could have arisen by sampling fluctuations. Moreover, we should use relatively wide confidence intervals, as the results of small samples usually vary widely from sample to sample.

Various significance tests have been developed for dealing with problems of small samples. The t-test developed by Sir William Gosset (pen name 'Student') and the Z-test developed by R.A. Fisher are very important tests and deserve special mention in the context of sampling analysis for small samples. These tests are used to determine the reliability of small samples in respect of the two important measures of any sample: the arithmetic mean and the standard deviation. The standard deviation of a small sample is given by

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$

whereas in the case of large samples it is worked out as

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

The denominator (n - 1) is equivalent to the number of degrees of freedom (df) involved. The number of df means the number of values in a sample which may be assigned arbitrarily. Thus, if there are 10 items in a sample the total of which is 100, then we can assign any values to any 9 items of the sample, but the value of the remaining item is determined automatically because the total of all ten items cannot exceed or fall short of 100. This means that in a sample of 10 items we have only n - 1 = 10 - 1 = 9 degrees of freedom.

We now discuss the important significance tests developed in the context of small samples.

t-Test

Sir William S. Gosset (pen name 'Student') developed a significance test and through it made a significant contribution to the theory of sampling applicable to small samples when the population variance is not known. The test is commonly known as Student's t-test and is based on the t-distribution.

Like the normal distribution, the t-distribution is symmetrical, but it happens to be flatter than the normal distribution. Moreover, there is a different t-distribution for every possible sample size. As the sample size gets larger, the shape of the t-distribution loses its flatness and becomes approximately equal to the normal distribution. In fact, for sample sizes of more than 30, the t-distribution is so close to the normal distribution that we can use the normal to approximate it. Thus, when n is small the t-distribution is far from normal, but when n is infinite it is identical with the normal distribution.

For applying the t-test in the context of small samples, the t value is calculated first of all and then compared with the table value of t at a certain level of significance for the given degrees of freedom. If the calculated value of t exceeds the table value (say t0.05), we infer that the difference is significant at the 5% level; if it is less than the table value, the difference is not treated as significant.

The t-test is used when two conditions are fulfilled:
(a) The sample size is less than 30, i.e., n < 30.
(b) The population standard deviation (σp) must be unknown.

In using the t-test we assume the following:
(a) That the population is normal or approximately normal.
(b) That the observations are independent and the samples are randomly drawn.
(c) That there is no measurement error.
(d) That, in the case of two samples, the population variances are regarded as equal if the equality of the two population means is to be tested.

The following formulae are commonly used to calculate the t value:

(a) To test the significance of the mean of a random sample:

$$t = \frac{|\bar{X} - \mu|}{\text{S.E.}_{\bar{X}}}$$

where X̄ = mean of the sample, μ = mean of the universe, and S.E. of the mean in the case of a small sample is worked out as

$$\text{S.E.}_{\bar{X}} = \frac{1}{\sqrt{n}}\sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$

and the degrees of freedom = (n - 1).

The above stated formula for t can as well be written as

$$t = \frac{|\bar{X} - \mu|}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n-1}}} \times \sqrt{n}$$

If the standard deviation is calculated by the formula $\sigma = \sqrt{\sum (x_i - \bar{x})^2 / n}$, then the value of t can be written as

$$t = \frac{|\bar{X} - \mu|}{\sigma} \times \sqrt{n-1}$$
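A minimal sketch of formula (a) in Python follows; the sample values here are invented purely for illustration.

```python
import math

sample = [12, 15, 10, 14, 11, 13, 12, 15, 14]   # hypothetical observations
mu = 12                                          # hypothesized population mean

n = len(sample)
mean = sum(sample) / n
# Small-sample standard deviation uses the (n - 1) divisor
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
se_mean = s / math.sqrt(n)

t = abs(mean - mu) / se_mean
print(f"t = {t:.3f} with {n - 1} degrees of freedom")
# Compare t against the table value (e.g., t0.05 = 2.306 for 8 df)
```

The same figure can be cross-checked with scipy.stats.ttest_1samp, which returns the signed t statistic together with a two-sided p-value.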
If we want to work out the probable or fiducial limits of the population mean (μ) in the case of small samples, we can use either of the following:

(i) Probable limits with 95% confidence level:

$$\mu = \bar{X} \pm \text{S.E.}_{\bar{X}}\,(t_{0.05})$$

(ii) Probable limits with 99% confidence level:

$$\mu = \bar{X} \pm \text{S.E.}_{\bar{X}}\,(t_{0.01})$$

At other confidence levels the limits can be worked out in a similar manner, taking the concerning table value of t just as we have taken t0.05 in case (i) and t0.01 in case (ii) above.

(b) To test the difference between the means of two samples:

$$t = \frac{|\bar{X}_1 - \bar{X}_2|}{\text{S.E.}_{\bar{X}_1-\bar{X}_2}}$$

where X̄1 = mean of sample 1, X̄2 = mean of sample 2, and the standard error of the difference between the two sample means is worked out as

$$\text{S.E.}_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sum (x_{1i} - \bar{X}_1)^2 + \sum (x_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

and the degrees of freedom = (n1 + n2 - 2).

(c) To test the significance of an observed correlation coefficient:

$$t = \frac{r}{\sqrt{1-r^2}} \times \sqrt{n-2}$$

Here t is based on (n - 2) degrees of freedom.

The following examples illustrate the application of the t-test using the above stated formulae.

Example 10.11: A sample of 10 measurements of the diameter of a sphere gave a mean X̄ = 4.38 inches and a standard deviation σ = 0.06 inches. Find (a) 95% and (b) 99% confidence limits for the actual diameter.

Solution: On the basis of the given data, the standard error of the mean is

$$\text{S.E.}_{\bar{X}} = \frac{\sigma}{\sqrt{n-1}} = \frac{0.06}{\sqrt{10-1}} = \frac{0.06}{3} = 0.02$$

Assuming the sample mean 4.38 inches to be the population mean, the required limits are as follows:

(a) 95% confidence limits: X̄ ± S.E. (t0.05) with 9 degrees of freedom
= 4.38 ± 0.02(2.262) = 4.38 ± 0.04524, i.e., 4.335 to 4.425
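Example 10.11 can be reproduced with SciPy's t-distribution in place of printed tables. A sketch, assuming SciPy is available and following the book's convention of σ/√(n-1) for the standard error (σ here being the n-divisor standard deviation):

```python
from scipy import stats

n, mean, sigma = 10, 4.38, 0.06      # figures from Example 10.11
se = sigma / (n - 1) ** 0.5          # book's convention: sigma / sqrt(n - 1)

for level in (0.95, 0.99):
    # Two-tailed critical t value for n - 1 degrees of freedom
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    half = t_crit * se
    print(f"{level:.0%} limits: {mean - half:.3f} to {mean + half:.3f}")
```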
(b) 99% confidence limits: X̄ ± S.E. (t0.01) with 9 degrees of freedom
= 4.38 ± 0.02(3.25) = 4.38 ± 0.0650, i.e., 4.3150 to 4.4450

Example 10.12: Samples of sales in similar shops in two towns are taken for a new product, with the following results:

            Mean Sales    Variance    Size of Sample
Town A          57           5.3            5
Town B          61           4.8            7

Is there any evidence of difference in sales in the two towns?

Solution: We take the hypothesis that there is no difference between the two sample means concerning sales in the two towns, i.e., H0: X̄1 = X̄2, Ha: X̄1 ≠ X̄2. Then we work out the concerning t value as follows:

$$t = \frac{|\bar{X}_1 - \bar{X}_2|}{\text{S.E.}_{\bar{X}_1-\bar{X}_2}}$$

where X̄1 = mean of the sample concerning Town A, X̄2 = mean of the sample concerning Town B, and S.E. is the standard error of the difference between the two means.

Since variance = Σ(Xi - X̄)²/n, for Town A we have Σ(X1i - X̄1)² = (5.3)(5) = 26.5, and for Town B, Σ(X2i - X̄2)² = (4.8)(7) = 33.6. Hence,

$$\text{S.E.}_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{26.5 + 33.6}{5 + 7 - 2}} \times \sqrt{\frac{1}{5} + \frac{1}{7}} = \sqrt{6.01} \times \sqrt{0.343} = 2.45 \times 0.58 = 1.421$$

Therefore,

$$t = \frac{|57 - 61|}{1.421} = \frac{4}{1.421} = 2.82$$

Degrees of freedom = (n1 + n2 - 2) = (5 + 7 - 2) = 10. The table value of t at the 5% level of significance for 10 degrees of freedom is 2.228 for a two-tailed test. The calculated value of t is greater than its table value. Hence, the hypothesis is rejected and the difference is significant.
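The pooled two-sample computation of Example 10.12 can be sketched in Python from the summary figures alone. Note that the book's variances use the n divisor, so they are converted back to sums of squared deviations first; this is a sketch, not a general-purpose routine.

```python
import math

# Summary figures (variance computed with the n divisor, as in the book)
n1, mean1, var1 = 5, 57, 5.3    # Town A
n2, mean2, var2 = 7, 61, 4.8    # Town B

# Recover the sums of squared deviations about each mean
ss1, ss2 = var1 * n1, var2 * n2

# Pooled standard error of the difference between the two means
se = math.sqrt((ss1 + ss2) / (n1 + n2 - 2)) * math.sqrt(1 / n1 + 1 / n2)

t = abs(mean1 - mean2) / se
df = n1 + n2 - 2
print(f"t = {t:.2f} with {df} df")   # compare with the table value 2.228 at 5%
```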
Example 10.13: The memory capacity of 9 students was tested before and after training. State whether the training was effective or not from the following scores:

Student        1   2   3   4   5   6   7   8   9
Before (XB)   10  15   9   3   7  12  16  17   4
After (XA)    12  17   8   5   6  11  18  20   3

Solution: We take the hypothesis that the training was not effective. We can write H0: X̄A = X̄B, Ha: X̄A > X̄B. We apply the difference test, for which purpose we first calculate the mean and standard deviation of the differences as follows:

Student    Before (XBi)    After (XAi)    Difference D = XAi - XBi    D²
   1            10              12                  2                  4
   2            15              17                  2                  4
   3             9               8                 -1                  1
   4             3               5                  2                  4
   5             7               6                 -1                  1
   6            12              11                 -1                  1
   7            16              18                  2                  4
   8            17              20                  3                  9
   9             4               3                 -1                  1
 n = 9                                        ΣD = 7             ΣD² = 29

$$\bar{D} = \frac{\sum D}{n} = \frac{7}{9} = 0.78$$

$$\sigma_{\text{diff}} = \sqrt{\frac{\sum D^2 - n\bar{D}^2}{n-1}} = \sqrt{\frac{29 - 9(0.78)^2}{9-1}} = \sqrt{\frac{23.56}{8}} = \sqrt{2.945} = 1.716$$

$$t = \frac{\bar{D}\sqrt{n}}{\sigma_{\text{diff}}} = \frac{0.78 \times \sqrt{9}}{1.716} = \frac{2.34}{1.716} = 1.364$$

Degrees of freedom = (n - 1) = (9 - 1) = 8. The table value of t at the 5% level of significance for 8 degrees of freedom is 1.860 for a one-tailed test. Since the calculated value of t is less than its table value, the difference is insignificant and the hypothesis is accepted. Hence, it can be inferred that the training was not effective.
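A sketch of the paired difference test in Python, using the scores of Example 10.13; the same result can also be obtained from scipy.stats.ttest_rel, which works directly on the raw paired scores.

```python
import math

before = [10, 15, 9, 3, 7, 12, 16, 17, 4]
after  = [12, 17, 8, 5, 6, 11, 18, 20, 3]

d = [a - b for a, b in zip(after, before)]   # per-student differences
n = len(d)
d_bar = sum(d) / n

# Standard deviation of the differences with the (n - 1) divisor
s_d = math.sqrt((sum(x * x for x in d) - n * d_bar ** 2) / (n - 1))

t = d_bar / (s_d / math.sqrt(n))
print(f"d-bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t:.3f} with {n - 1} df")
# The one-tailed table value at 5% for 8 df is 1.860
```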
Z-test for testing the significance of r in case of small samples, or Z-transformation

Prof. R.A. Fisher has given a test, popularly known as the Z-test, to test the significance of the correlation coefficient in small samples. While applying the test, the r of the sample is transformed into Z, on account of which the test is also known as the Z-transformation. The transformation is done as under:

$$Z = \frac{1}{2}\log_e\left(\frac{1+r}{1-r}\right) = 1.15129\,\log_{10}\left(\frac{1+r}{1-r}\right)$$

(Here log_e denotes the natural or Napierian system of logarithms; the relationship between natural and common logarithms is log_e = 2.3026 × log_10.)

Here r represents the correlation coefficient on the basis of the sample. The statistic Z is used to test (i) whether an observed value of r is significantly different from a given hypothetical or known value of the population correlation, and (ii) whether two sample values of r differ significantly from each other.

The standard error of Z is calculated as

$$\text{S.E.}_Z = \frac{1}{\sqrt{n-3}}$$

where n means the number of pairs in a sample, and

$$\mu_Z = 1.15129\,\log_{10}\left(\frac{1+\rho}{1-\rho}\right)$$

where ρ represents the population correlation and μ_Z the corresponding population value of Z. (Note: if ρ is not known, it is taken as zero, in which case μ_Z = 0.)

Finally, the value of the standard normal variate (S.N.V.) is calculated as follows:

$$\text{S.N.V.} = \frac{|Z - \mu_Z|}{1/\sqrt{n-3}} = |Z - \mu_Z|\sqrt{n-3}$$

If the value of the S.N.V. exceeds 1.96, the difference is significant at the 5% level. The following example makes the application of the Z-test clear in testing the significance of r.

Example 10.14: Given is the following information:

            No. of Items in the Sample    Coefficient of Correlation
Sample 1               23                           0.40
Sample 2               19                           0.65

Test the significance of the difference, at the 5% level, between the two given values of the coefficient of correlation, using Fisher's Z-transformation.

Solution: Applying the Z-test, we obtain Z1 and Z2 as under:

$$Z_1 = 1.15129\,\log_{10}\left(\frac{1+0.40}{1-0.40}\right) = 1.15129\,\log_{10} 2.333 = 0.424$$

$$Z_2 = 1.15129\,\log_{10}\left(\frac{1+0.65}{1-0.65}\right) = 1.15129\,\log_{10} 4.71 = 0.775$$

$$\text{S.E.}_{Z_1-Z_2} = \sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}} = \sqrt{\frac{1}{20} + \frac{1}{16}} = \sqrt{\frac{9}{80}} = 0.335$$

We now work out the ratio:

$$\frac{|Z_1 - Z_2|}{\text{S.E.}_{Z_1-Z_2}} = \frac{|0.424 - 0.775|}{0.335} = \frac{0.351}{0.335} = 1.05$$

As this ratio is less than 1.96, the difference between the two given values of the coefficient of correlation is insignificant at the 5% level, and it can be concluded that the two samples come from the same population.
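Python's math.atanh is exactly Fisher's transformation, since atanh(r) = ½ ln((1+r)/(1-r)), so Example 10.14 reduces to a few lines. A sketch with the example's figures:

```python
import math

n1, r1 = 23, 0.40
n2, r2 = 19, 0.65

# Fisher's Z-transformation: z = (1/2) * ln((1 + r) / (1 - r)) = atanh(r)
z1, z2 = math.atanh(r1), math.atanh(r2)

# Standard error of the difference between two transformed values
se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

snv = abs(z1 - z2) / se
print(f"Z1 = {z1:.3f}, Z2 = {z2:.3f}, ratio = {snv:.2f}")
print("Significant at 5%" if snv > 1.96 else "Insignificant at 5%")
```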
F-Test

The F-test is generally known as the variance ratio test and is mostly used in the context of analysis of variance, a technique developed by the great statistician R.A. Fisher and used extensively in agricultural experiments to test the null hypothesis of equality amongst several sample means (i.e., to test H0: X̄1 = X̄2 = X̄3 = X̄4 ...). For a test of the hypothesis of equality amongst several sample means, the F-test is considered to be more appropriate. Moreover, in the case of the F-test there is no assumption of equality of variances as there was in the case of the t-test for testing the equality of the means of two samples. All this has been explained at length in a separate unit entitled 'Analysis of Variance'.

The F-test was initially used to verify the hypothesis of equality between two variances. In fact, the F-test is a test of significance concerning two sample variances. It is based on the F-distribution and is concerned with the F-ratio (or the variance ratio) rather than with the difference between the variances. Prof. Snedecor's name is associated with this test.

The fundamental assumptions of the F-test are the following:
(a) The population is normal.
(b) The observations are independent and the samples drawn are random samples.
(c) There is no measurement error.

The object of the F-test is to test the hypothesis of whether the two samples are from the same normal population with equal variance, or from two normal populations with the same variance. The test statistic F is calculated as under:

$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}, \qquad \hat{\sigma}_1^2 > \hat{\sigma}_2^2$$

where σ̂1² is treated as the greater of the two variances, which means that the numerator is always the greater variance. Tables have been prepared by statisticians for the variance ratio F at different levels of significance. By comparing the observed value of F with the table value, we can conclude whether the difference between the sample variances could have arisen due to chance fluctuations. If the value of F > F0.05 for (n1 - 1) and (n2 - 1) degrees of freedom, we regard the ratio as significant at the 5% level. The degrees of freedom for the greater variance are represented as v1 and the df for the smaller variance as v2. On the other hand, if F < F0.05, we conclude that the samples could have come from two normal populations with the same variance, or from the same normal population with equal variance. The following examples illustrate the application of the F-test.

Example 10.15: Two random samples drawn from two normal populations are:

Sample 1: 20, 16, 26, 27, 23, 22, 18, 24, 25, 19
Sample 2: 27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37

Test, using the variance ratio, at the 5% and 1% levels of significance whether the two populations have the same variance.
Solution: Taking the hypothesis that the two samples are drawn from the same normal population of equal variance, we solve the question as under, with deviations d1 and d2 taken from the assumed means A1 = 20 and A2 = 35:

        Sample 1                          Sample 2
X1i     d1 = X1i - 20    d1²      X2i     d2 = X2i - 35    d2²
20            0            0       27          -8           64
16           -4           16       33          -2            4
26            6           36       42           7           49
27            7           49       35           0            0
23            3            9       32          -3            9
22            2            4       34          -1            1
18           -2            4       38           3            9
24            4           16       28          -7           49
25            5           25       41           6           36
19           -1            1       43           8           64
                                   30          -5           25
                                   37           2            4
ΣX1i = 220      Σd1² = 160        ΣX2i = 420      Σd2² = 314
n1 = 10                           n2 = 12

$$\bar{X}_1 = \frac{\sum X_{1i}}{n_1} = \frac{220}{10} = 22, \qquad \bar{X}_2 = \frac{\sum X_{2i}}{n_2} = \frac{420}{12} = 35$$

$$\hat{\sigma}_1^2 = \frac{\sum (X_{1i}-\bar{X}_1)^2}{n_1-1} = \frac{\sum d_1^2 - n_1(\bar{X}_1 - A_1)^2}{n_1-1} = \frac{160 - 10(22-20)^2}{10-1} = \frac{120}{9} = 13.33$$

$$\hat{\sigma}_2^2 = \frac{\sum (X_{2i}-\bar{X}_2)^2}{n_2-1} = \frac{\sum d_2^2 - n_2(\bar{X}_2 - A_2)^2}{n_2-1} = \frac{314 - 12(35-35)^2}{12-1} = \frac{314}{11} = 28.55$$

$$F = \frac{28.55}{13.33} = 2.14 \qquad (\because \hat{\sigma}_2^2 > \hat{\sigma}_1^2)$$
Degrees of freedom in Sample 1 = (10 - 1) = 9
Degrees of freedom in Sample 2 = (12 - 1) = 11

As the variance of Sample 2 is the greater variance, we have v1 = 11 (df for the greater variance) and v2 = 9 (df for the smaller variance).

The value of F at the 5% level of significance for v1 = 11 and v2 = 9 df is 3.11. The value of F at the 1% level of significance for v1 = 11 and v2 = 9 df is 5.20. Since 2.14 < 3.11 and also < 5.20, the samples may, at the 5% as well as the 1% level, be said to have been drawn from two populations having the same variance.

Example 10.16: Given is the following information:

                        Brand A       Brand B
Sample size             n1 = 21       n2 = 16
Standard deviation      σ1 = 2.5      σ2 = 1.5
Mean                    X̄1 = 100      X̄2 = 95

Apply the F-test at the 5% level to know if the variances of the two makes are significantly different.

Solution: Let us take the hypothesis that the variances of the two makes are the same. For testing it, let us work out the F-ratio as under:

$$F = \frac{n_1\sigma_1^2/(n_1-1)}{n_2\sigma_2^2/(n_2-1)} = \frac{21(2.5)^2/20}{16(1.5)^2/15} = \frac{6.56}{2.40} = 2.73$$

The table value of F at the 5% level for v1 = 20 and v2 = 15 df is 2.35. The computed value of F is greater than the table value of F, and hence the F-ratio is significant. Hence the variances of the two makes cannot be considered the same.

Limitations of Tests of Significance

We have described above the various tests used in large as well as small samples. On the basis of these tests, important decisions are made in different spheres. But there are several limitations of these significance tests, which should always be borne in mind. The important limitations are as follows:

(a) The tests of significance should not be used mechanically. It should be kept in view that testing is not decision-making itself; the significance tests are only useful aids for managerial and economic decision-making.
(b) The significance tests do not explain why the difference exists.
(c) The results of significance tests are based on probabilities and as such cannot be expressed with full certainty.
(d) Statistical inferences based on significance tests cannot be said to be entirely correct evidence concerning the truth of the hypotheses.

All the above limitations thus suggest that, in problems of statistical significance, the inference techniques (or the tests) must be combined with adequate knowledge of the subject matter along with the ability of good judgement.
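Example 10.15 can be reproduced in Python, with the critical values looked up from SciPy's F-distribution rather than printed tables. This sketch computes the variances directly rather than via the book's assumed-mean shortcut, but the result is the same.

```python
from scipy import stats

sample1 = [20, 16, 26, 27, 23, 22, 18, 24, 25, 19]
sample2 = [27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37]

def var_unbiased(xs):
    # Sample variance with the (n - 1) divisor
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

v1, v2 = var_unbiased(sample1), var_unbiased(sample2)
# The greater variance goes in the numerator
if v1 >= v2:
    f, df1, df2 = v1 / v2, len(sample1) - 1, len(sample2) - 1
else:
    f, df1, df2 = v2 / v1, len(sample2) - 1, len(sample1) - 1

for alpha in (0.05, 0.01):
    crit = stats.f.ppf(1 - alpha, df1, df2)
    verdict = "significant" if f > crit else "not significant"
    print(f"F = {f:.2f}, critical F({df1},{df2}) at {alpha:.0%} = {crit:.2f}: {verdict}")
```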
The Concept of Standard Error (or S.E.)

The standard deviation of the sampling distribution of a statistic is known as its standard error, and it is considered the key to sampling theory. The utility of the concept of standard error in statistical induction arises on account of the following reasons:

(a) The standard error helps in testing whether the difference between observed and expected frequencies could arise due to chance. The criterion usually adopted is that if a difference is up to 3 times the S.E., the difference is supposed to exist as a matter of chance, and if the difference is more than 3 times the S.E., chance fails to account for it, and we conclude that the difference is significant. This criterion is based on the fact that at X̄ ± 3(S.E.) the normal curve covers an area of 99.73 per cent. The product of the critical value at a certain level of significance and the S.E. is often described as the sampling error at that particular level of significance. We can test the difference at certain other levels of significance as well, depending upon our requirement.

(b) The standard error gives an idea about the reliability and precision of a sample. If the relationship between the standard deviation and the sample size is kept in view, one would find that the standard error is smaller than the standard deviation. The smaller the S.E., the greater the uniformity of the sampling distribution and hence the greater the reliability of the sample. Conversely, the greater the S.E., the greater the difference between observed and expected frequencies, and in such a situation the unreliability of the sample is greater. The size of the S.E. depends upon the sample size: the greater the number of items included in the sample, the smaller the error to be expected, and vice versa.

(c) The standard error enables us to specify the limits, maximum and minimum, within which the parameters of the population are expected to lie with a specified degree of confidence. Such an interval is usually known as a confidence interval. The degree of confidence with which it can be asserted that a particular value of the population lies within certain limits is known as the level of confidence.

Check Your Progress
14. What is meant by sampling of attributes and sampling of variables?
15. Under what conditions is the t-test used?
16. What is an F-test?
17. What are the fundamental assumptions involved in using F-tests?
18. What are the limitations of tests of significance?
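Point (b) above, that the standard error shrinks as the sample grows, is easy to see numerically. A short illustration, with an arbitrary standard deviation chosen purely for the example:

```python
import math

sigma = 10  # an illustrative population standard deviation

# The standard error of the mean falls as the square root of the sample size
for n in (25, 100, 400, 1600):
    se = sigma / math.sqrt(n)
    print(f"n = {n:5d}: S.E. of mean = {se:.2f}, 99.73% interval = mean +/- {3 * se:.2f}")
```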
Procedure of Significance Testing

The following sequential steps constitute, in general, the procedure of significance testing:

(a) Statement of the Problem. First, the problem has to be stated in clear terms. It should be quite clear in respect of what the statistical decision has to be taken. The problem may be: Is the hypothesis to be rejected or accepted? Is the difference between a parameter and a statistic significant? Or the like.

(b) Defining the Hypothesis. Usually, we start with the null hypothesis, according to which it is presumed that there is no difference between a parameter and a statistic. If we are to take a decision on whether students have benefited from extra coaching, and if we start with the supposition that they have not benefited, then this supposition would be termed the null hypothesis, which in symbolic form is denoted by H0. As against the null hypothesis, the researcher may as well start with some alternative hypothesis (symbolically Ha), which specifies those values that the researcher believes to hold true, and may then test such a hypothesis on the basis of sample data. Only one alternative hypothesis can be tested at one time against the null hypothesis.

(c) Selecting the Level of Significance. The hypothesis is examined on a pre-determined level of significance. Generally, either the 5 per cent level or the 1 per cent level of significance is adopted for the purpose. However, it can be stated here that the level of significance must be adequate, keeping in view the purpose and nature of the enquiry.

(d) Computation of the Standard Error. After determining the level of significance, the standard error of the concerning statistic (mean, standard deviation or any other measure) is computed. There are different formulae for computing the standard errors of different statistics. For example, the standard error of the mean $= \sigma/\sqrt{n}$; the standard error of the standard deviation $= \sigma/\sqrt{2n}$; the standard error of Karl Pearson's coefficient of correlation $= (1-r^2)/\sqrt{n}$; and so on. (A detailed description of the important standard error formulae has been given in the preceding pages.)

(e) Calculation of the Significance Ratio. The significance ratio, symbolically described as z, t, F, etc., depending on the test we use, is calculated by dividing the difference between a parameter and a statistic by the standard error concerned. Thus, in the context of the mean of a small sample when the population variance is not known,

$$t = \frac{|\bar{X} - \mu|}{\text{S.E.}_{\bar{X}}}$$

and in the context of the difference between two sample means,

$$t = \frac{|\bar{X}_1 - \bar{X}_2|}{\text{S.E.}_{\bar{X}_1-\bar{X}_2}}$$

(All this has been fully explained while discussing the sampling theory of small samples of variables earlier in this unit.)

(f) Deriving the Inference. The significance ratio is then compared with the predetermined critical value. If the ratio exceeds the critical value, the difference is taken as significant; if the ratio is less than the critical value, the difference is considered insignificant. For example, the critical value at the 5 per cent level of significance is 1.96. If the computed value exceeds 1.96, the inference would be that the difference is significant at the 5 per cent level, that it is not the result of sampling fluctuations, and that it is a real one and should be understood as such.
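Steps (e) and (f) can be compressed into one small helper for the large-sample case; a sketch of the decision rule only, with the 1.96 critical value of step (f) as the default. The figures in the usage line are invented for illustration.

```python
def significance_test(statistic, parameter, std_error, critical=1.96):
    """Steps (e) and (f): form the significance ratio and compare it
    with the critical value; returns the ratio and the verdict."""
    ratio = abs(statistic - parameter) / std_error
    return ratio, "significant" if ratio > critical else "insignificant"

# Illustrative figures only: sample mean 52, hypothesized mean 50, S.E. 0.9
ratio, verdict = significance_test(52, 50, 0.9)
print(f"ratio = {ratio:.2f} -> the difference is {verdict} at the 5% level")
```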
10.8 SUMMARY

In this unit, you have learnt about sampling theory, tests of significance and statistical inference. Sampling theory is applicable only to random samples. It studies the relationships that exist between the universe and the sample or samples drawn from it. The aim of sampling theory is to attain statistical estimation, tests of hypotheses or tests of significance, and statistical inference. A one-tailed test would be used when we are testing whether the population mean is lower than or higher than some hypothesized value. Important decisions are made in different areas using the tests of significance, but these tests have limitations too. In problems of statistical significance, the inference techniques must be combined with adequate knowledge of the subject matter along with the ability of good judgement.

10.9 ANSWERS TO 'CHECK YOUR PROGRESS'

1. Sampling is a technique for selecting a group of subjects (a sample) for study from a larger group (a population). Here, each individual is chosen entirely by chance and each member of the population has an equal chance of being included in the sample.

2. Sampling is used in surveys and in studying a variety of other problems concerning production management, time and motion studies, market research and various areas of accounting and finance.
3. The various benefits of sampling are:
(a) A sample usually produces information faster than a census does, and thus a sampling inquiry saves time.
(b) The results obtained by sampling are often almost as accurate, and sometimes even more accurate, than those obtained from a census.
(c) More detailed information can be obtained from a sample survey than from a census, because a sample in many instances takes less time, is less costly and permits more care to be taken in its execution.

4. The different methods of sampling are:
(a) Deliberate sampling: In deliberate sampling, the organizers of the inquiry purposively or deliberately choose the particular units of the universe for constituting a sample, on the basis that the small mass they select out of a huge one will be typical or representative of the whole.
(b) Random sampling: In random sampling, each and every item of the universe has an equal chance of inclusion in the sample. Here it is blind chance alone that determines whether one unit or another is selected. The results obtained from a random sample can be assured in terms of probability; that is, we can measure the errors of estimation or the significance of results obtained from a random sample.

5. The other sampling techniques are:
(a) Systematic sampling
(b) Stratified sampling
(c) Cluster and area sampling
(d) Multi-stage sampling
(e) Sequential sampling

6. A systematic bias is an inaccuracy in the information collected for a sample survey; it results from errors in the sampling procedures. Sampling errors, by contrast, are random variations in the sample estimates around the true population values, and they result from a conglomeration of non-determinable effects.

7. Sampling theory is a study of the relationships existing between a population and the samples drawn from the population.

8. The two concepts of sampling theory are parameter and statistic. All the statistical measures based on all items of the universe are termed parameters, while statistical measures worked out on the basis of sample studies are termed sample statistics.

9. The objectives of sampling theory are:
(a) Statistical estimation
(b) Tests of hypotheses or tests of significance
(c) Statistical inference

10. The sampling distribution is the concern of sampling theory where a certain number of samples are taken and, for each sample, various statistical measures such as the mean, standard deviation, etc. are computed.

11. The standard deviation of the sampling distribution of a statistic is known as its standard error and is considered the key to sampling theory. The standard error helps in testing whether the difference between observed and expected frequencies could arise due to chance. It gives an idea about the reliability and precision of a sample: if the relationship between the standard deviation and the sample size is kept in view, one would find that the standard error is smaller than the standard deviation. It also enables us to specify the limits, maximum and minimum, within which the parameters of the population are expected to lie with a specified degree of confidence.
12. The various steps involved in significance testing are:
(a) Statement of the problem: First of all, the problem has to be stated in clear terms. It should be quite clear in respect of what the statistical decision has to be taken. The problem may be: Is the hypothesis to be rejected or accepted? Is the difference between a parameter and a statistic significant?
(b) Defining the hypothesis: Usually we start with the null hypothesis, according to which it is presumed that there is no difference between a parameter and a statistic. If we were to take a decision on whether students have benefited from extra coaching, and if we start with the supposition that they have not benefited, then this supposition would be termed the null hypothesis, which in symbolic form is denoted by H0. As against the null hypothesis, the researcher may as well start with some alternative hypothesis, which specifies those values the researcher believes to hold true, and may then test such a hypothesis on the basis of sample data. Only one alternative hypothesis can be tested at one time against the null hypothesis.
(c) Selecting the level of significance: The hypothesis is examined on a pre-determined level of significance. Generally, either the five per cent level or the one per cent level of significance is adopted for the purpose. However, the level of significance must be adequate, keeping in view the purpose and nature of the enquiry.

13. One-tailed tests are used when we are testing whether the population mean is lower than or higher than some hypothesized value. A two-tailed test will reject the null hypothesis if, say, the sample mean is significantly higher than or lower than the hypothesized value of the mean of the population. Such a test is appropriate when the null hypothesis is some specified value.

14. Sampling of attributes refers to the drawing of samples from a population of A's and not-A's, i.e., items that possess a given attribute and items that do not. In sampling of variables, each individual of the population provides a value of the variable, and the population is a frequency distribution of the variable.

15. The t-test is used under the following two conditions:
(a) The sample size is less than 30, i.e., when n < 30.
(b) The population standard deviation (σp) must be unknown.

16. The F-test is generally known as the variance ratio test and is mostly used in the context of analysis of variance (ANOVA), a technique developed by the great statistician R.A. Fisher.

17. The fundamental assumptions of the F-test are:
(a) The population is normal.
(b) The observations are independent and the samples drawn are random samples.
(c) There is no measurement error.

18. The limitations of tests of significance are:
(a) The tests of significance should not be used mechanically. It should be kept in view that testing is not decision-making itself; the significance tests are only useful aids for managerial and economic decision-making. Hence, proper interpretation of statistical evidence is important to intelligent decisions.
(b) The significance tests do not explain why the difference exists. They simply indicate whether the difference is due to fluctuations of sampling or to some other reason, but they do not tell us which other reason or reasons are causing the difference.
(c) The results of significance tests are based on probabilities and as such cannot be expressed with full certainty. When a test shows that a difference is statistically significant, it simply suggests that the difference is probably not due to chance.
(d) Statistical inferences based on significance tests cannot be said to be entirely correct evidence concerning the truth of the hypotheses.

10.10 QUESTIONS AND EXERCISES

Short-Answer Questions
1. What is sampling?
2. Describe the benefits of sampling in statistics.
3. Explain the methods used in selecting a sample.
4. Differentiate between deliberate and random sampling techniques.
5. What are the implications of simple random sampling?
6. List the various sampling techniques.
7. Analyse sampling and non-sampling errors.
8. Explain the concepts and objects of sampling theory.
9. What is a sampling distribution?
10. How is significance testing done?
11. Describe the significance of sampling of attributes.
12. Explain any six standard errors of statistical calculations.
13. Define the t-test with the help of an example.
14. Describe the role of the Z-test in statistics.
15. What are the limitations of tests of significance?

Long-Answer Questions
1. Explain the meaning and significance of the concept of 'standard error' in sampling analysis.
2. Why is sampling necessary in statistical investigations? Explain the important methods of sampling commonly used.
3. 'Sampling is a necessity under certain conditions.' Explain this by giving examples. Also indicate the advantages of stratified sampling over random sampling.
4. Write short notes on the following:
(a) Objects of sampling
(b) Procedure of significance testing
(c) Sampling distribution
(d) Benefits of sampling
5. Answer the following:
(a) How does the size of the sample affect sampling errors? Explain.
(b) Give your understanding of 'non-sampling errors'.
(c) Distinguish between a statistic and a parameter.
6. (a) Explain briefly Student's t-test, pointing out its salient features.
(b) Why should there be different formulae for testing the significance of the difference between means when the samples are small and when they are large?
7. (a) A coin is tossed 400 times and it turns up heads 216 times. Discuss whether the coin may be an unbiased one, and explain briefly the theoretical principles you would use for this purpose.
(b) A coin is tossed 10,000 times and heads turn up 5195 times. Is the coin unbiased?
8. In some dice-throwing experiments, A threw dice 49,152 times, and of these 25,145 yielded a 4, 5 or 6. Is this consistent with the hypothesis that the dice were unbiased?
9. In a sample of 400 people, 172 were males. Estimate the population proportion at the 95% confidence level.
10. A machine puts out 16 imperfect articles in a sample of 500. After the machine is overhauled, it puts out 3 imperfect articles in a batch of 100. Has the machine been improved?
11. 500 articles were selected at random out of a batch containing 10,000 articles, and 30 were found defective. How many defective articles would you reasonably expect to find in the whole batch?
12. In a simple sample of 600 men from a certain large city, 400 are found to be smokers. In one of 900 from another city, 450 are found to be smokers. Do the data indicate that the cities are significantly different with respect to the prevalence of smoking among men?
13. In two large populations there are 35 and 30 per cent, respectively, of fair-haired people. Is this difference likely to be revealed by simple samples of 1500 and 1000, respectively, from the two populations?
14. The means of simple samples of 1000 and 2000 are 67.5 and 68.0 inches respectively. Can the samples be regarded as drawn from the same population with S.D. 9.5 inches?
15. A random sample of 200 villages was taken from Kanpur district, and the average population per village was found to be 420 with a standard deviation of 50. Another random sample of 200 villages from the same district gave an average population of 480 per village with a standard deviation of 60. Is the difference between the averages of the two samples statistically significant?
16. A random sample of 200 measurements from an infinite population gave a mean value of 50 and a standard deviation of 9. Determine the 95% confidence interval for the mean value of the population.
17. The number of accidents per day was studied for 144 days in town X and for 100 days in town Y, and the following information was obtained:

                              Town X    Town Y
Mean number of accidents        4.5       5.4
Standard deviation              1.2       1.5

Is the difference between the mean accidents of the two towns statistically significant?
18. In a random sample of 64 mangoes taken from a large consignment, some were found to be bad. Deduce that the percentage of bad mangoes in the consignment almost certainly lies between 31.25 and 68.75, given that the standard error of the proportion of bad mangoes in the sample is 1/16.
19. A sample of 600 persons selected at random from a large city gives the result that males are 53 per cent. Is there reason to doubt the hypothesis that males and females are in equal numbers in the city?
20. A random sample of 900 members is found to have a mean of 4.45 cm. Can it be reasonably regarded as a sample from a large population whose mean is 5 cm and variance is 4?
21. Two random samples are drawn from two normal populations, and the sizes of the various items of the two samples are:
First Sample: 20, 16, 26, 27, 23, 22, 18, 24, 25, 19
Second Sample: 27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37
Obtain the estimates of the variances of the populations. Test whether the two populations have the same variance.
22. To study the correlation between the stature of the father and the stature of the son, a sample of 1600 is taken from the universe of fathers and sons. The sample study gives the correlation coefficient r = 0.50. Within what limits does it hold true for the universe?
23. Ten students are selected at random from a school and their heights are found to be, in inches, 50, 52, 52, 53, 55, 56, 57, 58, 58 and 59. In the light of these data, discuss the suggestion that the mean height of the students of the school is 54 inches. (The value of t for 8 degrees of freedom is 2.306, for 9 degrees of freedom 2.262 and for 10 degrees of freedom 2.228.)
24. The following results were obtained from a sample of 10 boxes of biscuits:

Mean weight of contents                        490 gm
Standard deviation of the weights of contents    9 gm

Test the hypothesis that X̄ = 490 gm is the mean of a random sample from a population having a mean of 500 gm. Use the following extract from the t table for the purpose:

Degrees of Freedom    Value of t at 0.05    Value of t at 0.01
        8                   2.306                 3.355
        9                   2.262                 3.250
       10                   2.228                 3.169

25. In a test given to two groups of students, the marks obtained were as follows:
First Group: 18, 20, 36, 50, 49, 36, 34, 49, 41
Second Group: 29, 28, 26, 35, 30, 44, 46
Examine the significance of the difference between the mean marks obtained by the students of the two groups. (At the 5% level of significance the value of t for 14 degrees of freedom is 2.14.)
26. 12 students were given intensive coaching and 5 tests were conducted in a month. The scores of tests 1 and 5 are given below. Does the score from test 1 to test 5 show an improvement?
No. of Students        1   2   3   4   5   6   7   8   9  10  11  12
Marks in First Test   50  42  51  26  35  42  60  41  70  55  62  38
Marks in Fifth Test   62  40  61  35  30  52  68  51  84  63  72  50

(The value of t for 11 degrees of freedom at the 5% level of significance is 2.20.)
27. A sample of 16 measurements of the diameter of a sphere gave a mean X̄ = 4.58 inches and a standard deviation σ = 0.08 inches. Find (a) 95% and (b) 99% confidence limits for the actual diameter.
28. A farmer grows crops on two fields, A and B. On A he puts Rs 10 worth of manure per acre and on B Rs 20 worth. The net returns per acre, exclusive of the cost of manure, on the two fields in the five years are:

Year                     1   2   3   4   5
Field A, Rs per acre    34  28  42  37  44
Field B, Rs per acre    36  33  48  38  50

Other things being equal, discuss the question of whether it is likely to pay the farmer to continue the more expensive dressing. (The value of t for 4 degrees of freedom at the 5% level of significance is 2.776.)
29. The means of random samples of sizes 9 and 7 are 196.42 and 198.42 respectively. The sums of the squares of the deviations from the means are 26.94 and 18.73 respectively. Can the samples be considered to have been drawn from the same normal population?
30. The heights of six randomly chosen sailors are, in inches, 63, 65, 58, 69, 71 and 72. The heights of ten randomly chosen soldiers are, in inches, 61, 62, 65, 66, 69, 69, 70, 71, 72 and 73. Do these figures indicate that soldiers are on average shorter than sailors? (The 5% value of t for 14 degrees of freedom is 2.45.)
31. It is claimed that Americans are 16 pounds overweight on average. To test this claim, 9 randomly selected individuals were examined, and the average excess weight was found to be 18 pounds with a standard deviation of 4 pounds. At the 5% level of significance, is there reason to believe the claim of 16 pounds to be in error?
32. The sales of an item in six shops before and after a special promotional campaign are:

Shop      A   B   C   D   E   F
Before   53  28  31  48  50  42
After    58  29  30  55  56  45

Can the campaign be judged to be a success? Test at the 5% level of significance.
33. From a sample of 19 pairs of observations, the coefficient of correlation is 0.5 and the corresponding population r is 0.3. Is the difference significant at the 1% level? Apply the Z-test.
34. The given value of the population r is 0.22. Show whether the observed value of r = 0.37 (from a sample of 25 pairs) is significantly different from the population value. Apply the Z-test.
35. Two independent samples have 19 and 28 pairs of observations with correlation coefficients 0.50 and 0.65 respectively. Are these values of r consistent with the hypothesis that the samples are drawn from the same population? (Answer applying the Z-transformation.)
36. Answer, using the F-test, whether the following two samples have come from the same population:
Sample 1: 17, 27, 18, 25, 27, 29, 27, 23, 17
Sample 2: 16, 16, 20, 16, 20, 17, 15, 21
37. The following table gives the number of units produced per day by two workers, A and B, for a number of days:
A: 40, 30, 38, 41, 38, 35
B: 39, 38, 41, 33, 32, 49, 49, 34
Should these results be accepted as evidence that B is the more stable worker? Use the F-test.
38. Given n = 11, r = 0.9, test the significance of r at the 5% level with the help of Student's t-test.
39. In a sample of 500 from a village in Haryana, 280 are found to be wheat eaters and the rest rice eaters. Can we assume that both food articles are equally popular?
40. Two random samples drawn from normal populations are:
Sample 1: 19, 20, 24, 27, 21, 18, 20, 19
Sample 2: 26, 37, 40, 35, 30, 30, 40, 26, 30, 37
Obtain the estimates of the variances of the populations. Test whether the two populations have the same variance.
41. Apply the t-test to find whether the correlation is significant if r = 0.7 and n = 28.
42. How many pairs of observations must be included in a sample in order that an observed r = 0.42 shall have a calculated value of t = 2.72?
43. The foreman of a certain mining company has estimated the average quantity of ore extracted to be 34.6 tons per shift and the sample standard deviation to be 2.8 tons per shift, based upon a random selection of 6 shifts. Construct 95% as well as 98% confidence intervals for the mean quantity of ore extracted per shift.
44. A sample of 16 bottles has a mean of 122 ml. Is the sample representative of a large consignment with a mean of 130 ml and a standard deviation of 10 ml? Mention the level of significance you use.
45. Ten cartons are taken at random from an automatic filling machine. The mean net weight of the 10 cartons is 11.8 oz and the standard deviation is 0.15 oz. Does the sample mean differ significantly from the intended weight of 12 oz? You are given that for 9 degrees of freedom t0.05 = 2.26.
46. In a sample of 8 observations, the sum of the squared deviations of items from the mean was 94.5. In another sample of 10 observations, the value was found to be 101.7. Test whether the difference is significant at the 5% level. You are given that at the 5% level the critical value of F for v1 = 7 and v2 = 9 degrees of freedom is 3.29, and for v1 = 8 and v2 = 10 degrees of freedom its value is 3.07.
47. (a) Explain why there is no single level of probability used to reject or accept a hypothesis in hypothesis testing.
(b) How many standard deviations around the hypothesized value should we use to be 95.45% certain that we accept the hypothesis when it is correct?
(c) Formulate the null and alternative hypotheses to test whether the mean life for men is 80 years.
48. The average number of defective articles in a certain factory is claimed to be less than the average for all factories. The average for all factories is 30.5. A random sample of 100 defective articles showed the following distribution:
Class Limits        16-20    21-25    26-30    31-35    36-40    Total
No. of Defectives     12       22       20       30       16      100

Calculate the mean (X̄) and S.D. of the sample and use them to test the claim that the average is less than the figure for all factories, at the 5% level of significance. Given: Z(1.645) = 0.95.

10.11 FURTHER READING

Kothari, C.R. 1984. Quantitative Techniques, 3rd Edition. New Delhi: Vikas Publishing House Pvt. Ltd.
Chandan, J.S. 1998. Statistics for Business and Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Chandan, J.S., Jagjit Singh and K.K. Khanna. 1995. Business Statistics, 2nd Edition. New Delhi: Vikas Publishing House Pvt. Ltd.
Croxton, Frederick E., and Dudley J. Cowden. 1943. Applied General Statistics. New York: Prentice-Hall.
Neter, John, and William Wasserman. 1956. Fundamental Statistics for Business and Economics. New York: Allyn and Bacon.
Chance, William A. 1969. Statistical Methods for Decision Making. Illinois: Richard D. Irwin.
Yule, G.U., and M.G. Kendall. 1950. An Introduction to the Theory of Statistics. London: Griffin.
UNIT 11 HYPOTHESIS

Structure
11.0 Introduction
11.1 Unit Objectives
11.2 What is a Hypothesis?
  11.2.1 Statistical Decision-Making
  11.2.2 Committing Errors: Type I and Type II
11.3 Null and Alternative Hypotheses
  11.3.1 Null Hypothesis and Alternative Hypothesis
  11.3.2 Comparison of Null Hypothesis with Alternative Hypothesis
11.4 Critical Region
11.5 Penalty
11.6 Standard Error
11.7 Decision Rule
  11.7.1 Large Sample Tests
11.8 Summary
11.9 Answers to 'Check Your Progress'
11.10 Questions and Exercises
11.11 Further Reading

11.0 INTRODUCTION

In this unit, you will learn about hypotheses, the types of hypotheses, the critical region, penalty, standard error and the decision rule. A hypothesis is an assumption that is tested to find its logical or empirical consequences; it refers to a provisional idea whose merit needs evaluation. A hypothesis should be clear and accurate. Various concepts, such as the null and alternative hypotheses, enable us to verify the testability of an assumption. Hypotheses are often statements about population parameters, such as the variance and the expected value. In the course of hypothesis testing, inferences about the population, such as the mean and proportion, are made. Any useful hypothesis will enable predictions by reasoning, including deductive reasoning. For the purpose of decision-making, a hypothesis has to be verified and then accepted or rejected. This is done with the help of observations. In this unit you will also learn how decision-making plays a significant role in different areas, such as marketing, industry and management. Testing a statistical hypothesis on the basis of a sample enables us to decide whether the hypothesis should be accepted or rejected. The critical region (CR), or rejection region (RR), is the set of values of the test statistic for which the null hypothesis is rejected in a hypothesis test.

11.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Define the concept of hypothesis and the types of errors
• Explain null and alternative hypotheses
• Describe the critical region or region of hypothesis rejection
• Define penalty
• Calculate standard error
• Explain the decision rule
11.2 WHAT IS A HYPOTHESIS?

A hypothesis is an approximate assumption that a researcher wants to test for its logical or empirical consequences. It refers to a provisional idea whose merit needs evaluation, and it is often a convenient mathematical approach for simplifying cumbersome calculations. Setting up and testing hypotheses is an integral part of statistical inference. Hypotheses are often statements about population parameters, such as the variance and the expected value; in the course of hypothesis testing, inferences about the population, such as the mean and proportion, are made. Any useful hypothesis will enable predictions by reasoning, including deductive reasoning. According to Karl Popper, a hypothesis must be falsifiable: a proposition or theory cannot be called scientific if it does not admit the possibility of being shown false. A hypothesis might predict the outcome of an experiment in a laboratory setting or the observation of a phenomenon in nature. Thus, a hypothesis is a proposed explanation of a phenomenon, suggesting a possible correlation between multiple phenomena.

The characteristics of a hypothesis are:
• Clear and accurate: A hypothesis should be clear and accurate so as to draw a consistent conclusion.
• Statement of relationship between variables: If a hypothesis is relational, it should state the relationship between the different variables.
• Testability: A hypothesis should be open to testing, so that other deductions can be made from it and can be confirmed or disproved by observation. The researcher should do some prior study to make the hypothesis a testable one.
• Specific, with limited scope: A hypothesis that is specific, with limited scope, is more easily testable than a hypothesis with limitless scope. A researcher should therefore be prepared to spend more time researching such a hypothesis.
• Simplicity: A hypothesis should be stated in the simplest and clearest terms to make it understandable.
• Consistency: A hypothesis should be reliable and consistent with established and known facts.
• Time limit: A hypothesis should be capable of being tested within a reasonable time. In other words, the excellence of a hypothesis is judged by the time taken to collect the data needed for the test.
• Empirical reference: A hypothesis should explain or support all the facts needed to understand what the problem is all about.

A hypothesis is a statement or assumption concerning a population. For the purpose of decision-making, a hypothesis has to be verified and then accepted or rejected. This is done with the help of observations: we test a sample and make a decision on the basis of the result obtained. Decision-making plays a significant role in different areas, such as marketing, industry and management.

11.2.1 Statistical Decision-Making

Testing a statistical hypothesis on the basis of a sample enables us to decide whether the hypothesis should be accepted or rejected. The sample data enable us to accept or reject the hypothesis. Since the sample data give incomplete information about the population, the result of the test need not be considered final or unchallengeable. The procedure which, on the basis of sample results, enables us to decide whether a hypothesis is to be accepted or rejected is called hypothesis testing or a test of significance.
Note 1: A test provides evidence, if any, against a hypothesis, usually called a null hypothesis. The test cannot prove the hypothesis to be correct; it can only give some evidence against it. The test of a hypothesis is thus a procedure to decide whether to accept or reject the hypothesis.

Note 2: The acceptance of a hypothesis implies only that there is no evidence from the sample that we should believe otherwise, whereas the rejection of a hypothesis leads us to conclude that it is false. This way of putting the problem is convenient because of the uncertainty inherent in it. In view of this, we must always briefly state the hypothesis that we hope to reject. A hypothesis stated in the hope of being rejected is called a null hypothesis and is denoted by H0. If H0 is rejected, this may lead to the acceptance of an alternative hypothesis, denoted by Ha.

For example, a new fragrance soap is introduced in the market. The null hypothesis H0, which may be rejected, is that the new soap is not better than the existing soap.

Similarly, a die is suspected to be loaded. Roll the die a number of times to test. The null hypothesis is H0: p = 1/6 for showing a six; the alternative hypothesis is Ha: p ≠ 1/6.

As another example, skulls found at an ancient site may all belong to race X or to race Y. On the basis of their diameters, we may test a hypothesis about the mean μ of the population from which the present skulls came. We then have the hypotheses H0: μ = μX, Ha: μ = μY. Here we should not insist on calling either hypothesis null and the other alternative, since the reverse could also be true.

11.2.2 Committing Errors: Type I and Type II

There are two types of errors in statistical hypothesis testing, which are as follows:
• Type I Error: In this type of error, you reject a null hypothesis when it is true. It means the rejection of a hypothesis which should have been accepted. It is denoted by α (alpha) and is also known as alpha error.
• Type II Error: In this type of error, you accept a null hypothesis when it is not true. It means accepting a hypothesis which should have been rejected. It is denoted by β (beta) and is also known as beta error.

A Type I error can be controlled by fixing it at a lower level. For example, if you fix it at 2%, then the maximum probability of committing a Type I error is 0.02. But reducing the Type I error has a disadvantage when the sample size is fixed, as it increases the chances of a Type II error. In other words, both types of errors cannot be reduced simultaneously. The only solution to this problem is to set an appropriate level by considering the costs and penalties attached to the errors, or to strike a proper balance between the two types of errors.

In a hypothesis test, a Type I error occurs when the null hypothesis is rejected when it is in fact true; that is, H0 is wrongly rejected. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug; that is, H0: there is no difference between the two drugs on average. A Type I error would occur if we concluded that the two drugs produced different effects when in fact there was no difference between them.

In a hypothesis test, a Type II error occurs when the null hypothesis H0 is not rejected when it is in fact false. In the same clinical trial, a Type II error would occur if it were concluded that the two drugs produced the same effect, that is, that there is no difference between the two drugs on average, when in fact they produced different ones.

In how many ways can we commit errors? We reject a hypothesis when it may be true; this is a Type I error. We accept a hypothesis when it may be false; this is a Type II error. The other two situations are desirable: we accept a hypothesis when it is true, and we reject a hypothesis when it is false.

              Accept H0          Reject H0
H0 True       Desirable          Type I Error
H0 False      Type II Error      Desirable

The level of significance implies the probability of a Type I error. A five per cent level implies that the probability of committing a Type I error is 0.05; a one per cent level implies a 0.01 probability of committing a Type I error. Lowering the significance level, and hence the probability of a Type I error, is good, but unfortunately it leads to the undesirable situation of a greater chance of committing a Type II error.

To sum up:
• Type I Error: Rejecting H0 when H0 is true.
• Type II Error: Accepting H0 when H0 is false.

Note: The probability of making a Type I error is the level of significance of a statistical test. It is denoted by α, where
α = Prob. (Rejecting H0 | H0 true)
1 - α = Prob. (Accepting H0 | H0 true)
The probability of making a Type II error is denoted by β, where
β = Prob. (Accepting H0 | H0 false)
1 - β = Prob. (Rejecting H0 | H0 false) = Prob. (the test correctly rejects H0 when H0 is false)
1 - β is called the power of the test. It depends on the level of significance α, the sample size n and the parameter value.
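These probabilities can be made concrete with a small simulation. The sketch below draws repeated samples from a population for which H0 is actually true and counts how often a 5% two-tailed test (critical value 1.96) wrongly rejects it; the observed rejection rate should hover around α = 0.05. All figures here are illustrative.

```python
import random
import statistics

random.seed(1)
mu0, sigma, n, trials = 50, 10, 40, 2000
rejections = 0

for _ in range(trials):
    # H0 is true: the sample really comes from a population with mean mu0
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (statistics.mean(sample) - mu0) / (sigma / n ** 0.5)
    if abs(z) > 1.96:          # two-tailed test at the 5% level
        rejections += 1        # a Type I error

print(f"Estimated Type I error rate: {rejections / trials:.3f} (alpha = 0.05)")
```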
11.3 NULL AND ALTERNATIVE HYPOTHESES

The hypothesis is usually considered the principal instrument in research. The basic concepts regarding the testability of a hypothesis are as follows: