BUSINESS STATICS

Published by International College of Financial Planning, 2020-06-07 13:30:34

Measures of Central Tendency

In general, if there are n values in the sample, then

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$

In other words,

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}, \quad i = 1, 2, \dots, n \qquad (3.1)$$

The above formula states: add up all the values of $X_i$, where i starts at 1 and ends at n in unit increments, so that i = 1, 2, 3, ..., n, and divide the total by n.

If instead of taking a sample, we take the entire population in our calculations of the mean, then the symbol for the mean of the population is $\mu$ (mu) and the size of the population is N, so that:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}, \quad i = 1, 2, \dots, N \qquad (3.2)$$

If we have the data in grouped discrete form with frequencies, then the sample mean is given by:

$$\bar{X} = \frac{\sum f(X)}{\sum f} \qquad (3.3)$$

Where
$\sum f$ = Summation of all frequencies = n
$\sum f(X)$ = Summation of each value of X multiplied by its corresponding frequency (f)

Example 3.1: Let us take the ages of 10 students as follows:

19, 20, 22, 22, 17, 22, 20, 23, 17, 18

Solution: This data can be arranged in a frequency distribution as follows:

Age (X)    Frequency (f)    f(X)
17         2                34
18         1                18
19         1                19
20         2                40
22         3                66
23         1                23
Total      10               200

In the given case we have $\sum f = 10$ and $\sum f(X) = 200$, so that:

$$\bar{X} = \frac{\sum f(X)}{\sum f} = \frac{200}{10} = 20$$
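The frequency-table calculation of Example 3.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `frequency_mean` is an assumption made here and does not come from the text.

```python
from collections import Counter


def frequency_mean(values):
    """Build a frequency table and apply X-bar = sum(f*X) / sum(f), equation (3.3)."""
    freq = Counter(values)                            # maps each value X to its frequency f
    total_f = sum(freq.values())                      # sum of f, equal to n
    total_fx = sum(x * f for x, f in freq.items())    # sum of f(X)
    return dict(sorted(freq.items())), total_fx / total_f


# Ages of the 10 students from Example 3.1
ages = [19, 20, 22, 22, 17, 22, 20, 23, 17, 18]
table, mean_age = frequency_mean(ages)
print(table)     # {17: 2, 18: 1, 19: 1, 20: 2, 22: 3, 23: 1}
print(mean_age)  # 20.0
```

The result agrees with the direct formula (3.1), since grouping equal values only changes the order of the additions.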

Example 3.2: Calculate the mean of the marks of 46 students given in the following table.

Frequency of Marks of 46 Students

Marks (X)    Frequency (f)
9            1
10           2
11           3
12           6
13           10
14           11
15           7
16           3
17           2
18           1
Total        46

Solution: This is a discrete frequency distribution, and the mean is calculated using equation (3.3). The following table shows the method of obtaining $\sum f(X)$.

Marks (X)    Frequency (f)    f(X)
9            1                9
10           2                20
11           3                33
12           6                72
13           10               130
14           11               154
15           7                105
16           3                48
17           2                34
18           1                18
             Σf = 46          Σf(X) = 623

Using equation (3.3), we get,

$$\bar{X} = \frac{\sum f(X)}{\sum f} = \frac{623}{46} = 13.54$$

3.4.1 Arithmetic Mean of Grouped Data

If, however, the data is grouped such that we are given the frequency of finite-sized class intervals, we do not know the value of every item. The calculation of the arithmetic mean in such a case is then necessarily a process of estimation, based on some assumption. The standard assumption for this purpose is that all the items within a particular class are concentrated at the midvalue of the class. Thus f(X) corresponding to the items of a class equals f(m), where m is the midpoint of the class interval, and the arithmetic mean is then given by,

$$\bar{X} = \frac{\sum f(m)}{\sum f} \qquad (3.4)$$

The determination of the midpoint of a class interval requires some consideration. The position of the midpoint is determined by the real, as distinguished from the apparent, class limits.

3.4.2 Advantages of Mean

1. Its concept is familiar to most people and is intuitively clear.
2. Every data set has a mean, which is unique and describes the entire data to some degree. For example, when we say that the average salary of a professor is ₹ 25,000 per month, it gives us a reasonable idea about the salaries of professors.

3. It is a measure that can be easily calculated.
4. It includes all values of the data set in its calculation.
5. Its value varies very little from sample to sample taken from the same population.
6. It is useful for performing statistical procedures such as computing and comparing the means of several data sets.

3.4.3 Disadvantages of Mean

1. It is affected by extreme values, and hence is not very reliable when the data set has extreme values, especially when these extreme values are on one side of the ordered data. The mean of such data is not truly representative of it. For example, the average age of three persons of ages 4, 6 and 80 is 30, an age close to none of the three.
2. It is tedious to compute for a large data set, as every point in the data set is to be used in the computations.
3. We are unable to compute the mean for a data set that has open-ended classes either at the high or at the low end of the scale.
4. The mean cannot be calculated for qualitative characteristics such as beauty or intelligence, unless these can be converted into quantitative figures, such as intelligence into IQs.

3.5 PREPARING A FREQUENCY DISTRIBUTION TABLE

If a frequency table records the distribution of a discrete variable, the real and the apparent class limits are the same (unless the class interval is exclusive). This is because discrete data are always expressed in whole numbers and are always characterized by gaps at which no measure may ever be found. Thus, if the class intervals of a discrete variable are:

6-10
11-15
16-20

The apparent limits 6 and 10, 11 and 15, 16 and 20 are real limits also. The class 6-10 includes only those items whose sizes are 6, 7, 8, 9 or 10. Any item whose size is more than 10, i.e., 11, 12, etc., or less than 6, i.e., 5, is not included in this class, but in the next higher or the next lower class. There are, of course, no values between 10 and 11 or 15 and 16.

In such a case, the midpoint is the middle of the five values included in a class, viz., 8 is the midpoint in the 6-10 class, 13 is the midpoint in the 11-15 class, and 18 is the midpoint in the 16-20 class. If, however, the class interval is exclusive, the apparent limits are not real, and the real limits should be determined before finding the midpoint. If the class interval is given in the following manner, it is said to be an exclusive class interval:

(i) 5-10
(ii) 10-15
(iii) 15-20

This means that an item having a value 15 is to be included either in class (ii) or class (iii). If it is included in class (ii), it means value 10 is included in class (i). Hence, the real limits of class (ii) are 11-15 and the midpoint is 13. If 15 is not included in class (ii) but is included in class (iii), the real limits of class (ii) are then 10-14 and the midpoint is 12. It, therefore, follows that, whenever we have an exclusive class interval, we must decide which limit of the class is excluded, and only then should the midpoint be ascertained.

Check Your Progress
1. What is descriptive statistics?
2. Describe the term arithmetic mean.
3. Explain the advantages of mean.

If the frequency table records the distribution of a continuous variable, then the real limits are not the same as the apparent limits. This is because theoretically such variables can be measured to an infinitesimal fraction of a unit, and the measures that are obtained are only approximations to absolute accuracy. While measuring the weight of boys, for

example, we seldom go to a unit smaller than the pound. Thus, when we say that the weight of an individual is 140 pounds, what we really mean is that his weight is nearer to 140 pounds than to 139 or 141 pounds. This means that it is somewhere between 139.5 and 140.5 pounds. From this, it follows that if in any frequency distribution of weights we find a class interval identified by the interval limits (say, 140-144), we must conclude (a) that weights have been measured correct to the nearest pound, and (b) hence the real limits of the interval extend by 0.5 pounds on either side, and the class interval strictly speaking is 139.5-144.5. The midpoint of this class is to be determined from these limits. The method of finding the midvalue in this case is as follows:

$$\text{Lower limit} + \frac{\text{Upper limit} - \text{Lower limit}}{2} = 139.5 + \frac{5}{2} = 139.5 + 2.5 = 142.0$$

or

$$\frac{\text{Upper limit} + \text{Lower limit}}{2} = \frac{139.5 + 144.5}{2} = \frac{284}{2} = 142.0$$

If the weight has been measured correct to the nearest tenth of a pound, we will have class intervals like the following:

140-144.9
145-149.9

On the basis of what has been said earlier, the real limits are:

139.95-144.95
144.95-149.95

Here the midpoint will be,

$$\frac{139.95 + 144.95}{2} = 142.45, \text{ i.e., } 142.5$$

Example 3.3: The first and third columns of the following table give the frequency distribution of the average monthly earnings of male workers. Calculate the mean earnings.

Distribution of Male Workers by Average Monthly Earnings
(Computation of Arithmetic Mean by Long Method)

Monthly Earnings    Midpoint    No. of Workers    f(m)
(1)                 (m)         (f)
27.5-32.5           30          120               3,600
32.5-37.5           35          152               5,320
37.5-42.5           40          170               6,800
42.5-47.5           45          214               9,630
47.5-52.5           50          410               20,500
52.5-57.5           55          429               23,595
57.5-62.5           60          568               34,080
62.5-67.5           65          650               42,250
67.5-72.5           70          795               55,650
72.5-77.5           75          915               68,625
77.5-82.5           80          745               59,600
82.5-87.5           85          530               45,050
87.5-92.5           90          259               23,310
92.5-97.5           95          152               14,440
97.5-102.5          100         107               10,700
102.5-107.5         105         50                5,250
107.5-112.5         110         25                2,750
                                Σf = 6,291        Σf(m) = 4,31,150

Solution: Since the variable is a continuous one, the midpoints are calculated simply as (lower limit + upper limit)/2, and are shown in the second column as m. The f(m) values are calculated in column 4. The mean is calculated as,

$$\bar{X} = \frac{\sum f(m)}{\sum f} = \frac{4,31,150}{6,291} = ₹\,68.5$$

If we compute the arithmetic mean from unclassified data, it may differ slightly from ₹ 68.5. This lack of agreement is due to the inadequacy of the midvalue assumption. It is true that none of the midvalues is actually the true concentration point of its class. But in the case of symmetrical distributions there is a greater possibility of the errors compensating, some of the midpoints erring by being too low and others by being too high. However, if the frequency tails off towards either the high or the low values, i.e., if it departs seriously from a symmetrical distribution, the arithmetic average computed will be somewhat in error because the errors in the midpoint assumption fail to compensate.

3.6 PROPERTIES OF THE MEAN

The arithmetic mean has the following interesting properties.

1. The sum of the deviations of individual values of X from the mean will always add up to zero. This means that if we subtract the mean from each of the individual values, then some differences will be negative and some will be positive, but if all these differences are added together then the total sum will be zero. In other words, the positive deviations must balance the negative deviations, or symbolically:

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0, \quad i = 1, 2, \dots, n$$

2. The second important characteristic of the mean is that it is very sensitive to extreme values. Since the computation of the mean is based upon inclusion of all values in the data, an extreme value in the data would shift the mean towards it, thus making the mean unrepresentative of the data.

3. The third property of the mean is that the sum of squares of the deviations about the mean is minimum. This means that if we take the differences between the individual values and the mean, square these differences individually and then add the squared differences, the final figure will be less than the sum of the squared deviations around any number other than the mean. Symbolically, it means that:

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \text{Minimum}, \quad i = 1, 2, \dots, n$$

The following examples will make the properties of the mean clear.

4. The product of the arithmetic mean and the number of values on which the mean is based is equal to the sum of all given values. In other words, if we replace each item in the series by the mean, then the sum of these substitutions will equal the sum of the individual items. Thus, in the figures 3, 5, 7, 9, if we substitute the mean for each item, 6, 6, 6, 6, then the total is 24 both in the original series and in the substitution series.
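Properties 1, 3 and 4 can be checked numerically on the series 3, 5, 7, 9 used in the text. The following is a minimal sketch; the helper name `sum_sq_dev` and the trial centres 5 and 7 are choices made here for illustration only.

```python
values = [3, 5, 7, 9]
n = len(values)
mean = sum(values) / n                       # 24 / 4 = 6.0

# Property 1: deviations from the mean add up to zero.
dev_sum = sum(x - mean for x in values)


# Property 3: the sum of squared deviations is smallest about the mean.
def sum_sq_dev(center):
    """Sum of squared deviations of the series about an arbitrary centre."""
    return sum((x - center) ** 2 for x in values)


# Property 4: mean times the number of items equals the sum of the items.
product = n * mean

print(dev_sum)                                           # 0.0
print(sum_sq_dev(mean), sum_sq_dev(5), sum_sq_dev(7))    # 20.0 24 24
print(product, sum(values))                              # 24.0 24
```

Any centre other than 6 gives a sum of squares larger than 20, which is what the third property asserts for this series.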

The fourth property can be shown as follows. Since,

$$\bar{X} = \frac{\sum X}{N}$$

$$N\bar{X} = \sum X$$

For example, if we have a series of values 3, 5, 7, 9, the mean is 6. The squared deviations are:

X     X - X̄ = X'     X'²
3     3 - 6 = -3     9
5     5 - 6 = -1     1
7     7 - 6 = 1      1
9     9 - 6 = 3      9
                     ΣX'² = 20

This property provides a test to check if the computed value is the correct arithmetic mean.

Example 3.4: The mean age of a group of 100 persons (grouped in intervals 10-, 12- ... etc.) was found to be 32.02. Later, it was discovered that age 57 was misread as 27. Find the corrected mean.

Solution: Let the mean be denoted by $\bar{X}$. Putting the given values in the formula of the A.M., we have,

$$32.02 = \frac{\sum X}{100}, \text{ i.e., } \sum X = 3202$$

$$\text{Correct } \sum X = 3202 - 27 + 57 = 3232$$

$$\text{Correct A.M.} = \frac{3232}{100} = 32.32$$

Example 3.5: The mean monthly salary paid to all employees in a company is ₹ 500. The monthly salaries paid to male and female employees average ₹ 520 and ₹ 420, respectively. Determine the percentage of males and females employed by the company.

Solution: Let $N_1$ be the number of males and $N_2$ be the number of females employed by the company. Also let $\bar{X}_1$ and $\bar{X}_2$ be the monthly average salaries paid to male and female employees, and $\bar{X}_{12}$ be the mean monthly salary paid to all the employees.

$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$

or

$$500 = \frac{520 N_1 + 420 N_2}{N_1 + N_2}$$

or

$$20 N_1 = 80 N_2$$

or

$$\frac{N_1}{N_2} = \frac{80}{20} = 4$$

Hence, the males and females are in the ratio of 4 : 1, or 80 per cent are males and 20 per cent are females among those employed by the company.

3.7 SHORT-CUT METHODS FOR CALCULATING MEAN

We can simplify the calculation of the mean by noticing that if we subtract a constant amount A from each item X to define a new variable X' = X - A, the mean $\bar{X}'$ of X' differs from

$\bar{X}$ by A. This generally simplifies the calculations, and we can then add back the constant A, termed as the assumed mean¹:

$$\bar{X} = A + \bar{X}' = A + \frac{\sum f(X')}{\sum f}$$

The following table illustrates the procedure of calculation by the short-cut method using the data given in Example 3.2. The choice of A is made in such a manner as to simplify calculation the most, and is generally in the region of the concentration of data.

X     (f)    Deviation from Assumed Mean (13)    f(X')
             X'
9     1      -4                                  -4
10    2      -3                                  -6
11    3      -2                                  -6
12    6      -1                                  -6
13    10     0                                   (-22)
14    11     +1                                  +11
15    7      +2                                  +14
16    3      +3                                  +9
17    2      +4                                  +8
18    1      +5                                  +5
                                                 (+47)
      Σf = 46                                    Σf(X') = +47 - 22 = 25

¹ Since there will not be an entry in the f(X') column corresponding to X' = 0, we write the sum, -22, of the negative entries in the f(X') column. The sum of the positive products in the f(X') column, i.e., 47, is also written as a total. The final sum 25 is then easily obtained.

The mean,

$$\bar{X} = A + \frac{\sum f(X')}{\sum f} = 13 + \frac{25}{46} = 13.54$$

the same as calculated in Example 3.2.

In the case of grouped frequency data, the variable X is replaced by the midvalue m, and in the short-cut technique we subtract a constant value A from each m, so that the formula becomes,

$$\bar{X} = A + \frac{\sum f(m - A)}{\sum f}$$

In cases where the class intervals are equal, we may further simplify the calculation by taking the factor i out of the variable m - A, defining,

$$X' = \frac{m - A}{i}$$

Where i is the class width. It can be verified that when X' is so defined, the mean of the distribution is given by:

$$\bar{X} = A + \frac{\sum f(X')}{\sum f} \times i$$

The following examples will illustrate the use of the short-cut method.

Example 3.6: The ages of twenty husbands and wives are given in the following table. Form a two-way frequency table showing the relationship between the ages of husbands and wives with class intervals 20-24; 25-29, etc.

Calculate the arithmetic mean of the two groups after the classification.

S.No.    Age of Husband    Age of Wife
1        28                23
2        37                30
3        42                40
4        25                26
5        29                25
6        47                41
7        37                35
8        35                25
9        23                21
10       41                38
11       27                24
12       39                34
13       23                20
14       33                31
15       36                29
16       32                35
17       22                23
18       29                27
19       38                34
20       48                47

Solution:

Frequency Distribution of Age of Husbands and Wives

Age of                       Age of Wife
Husband    20-24    25-29    30-34    35-39    40-44    45-49    Total
20-24      3        -        -        -        -        -        3
25-29      2        3        -        -        -        -        5
30-34      -        -        1        1        -        -        2
35-39      -        2        3        1        -        -        6
40-44      -        -        -        1        1        -        2
45-49      -        -        -        -        1        1        2
Total      5        5        4        3        2        1        20

Calculation of Arithmetic Mean of Husbands' Age

Class Intervals    Midvalues    Husband           x1' = (m - 37)/5    f1 x1'
                   m            Frequency (f1)
20-24              22           3                 -3                  -9
25-29              27           5                 -2                  -10
30-34              32           2                 -1                  -2
35-39              37           6                 0                   0
40-44              42           2                 1                   2
45-49              47           2                 2                   4
                                Σf1 = 20                              Σf1 x1' = -15

Husband age, arithmetic mean:

$$\bar{X}_1 = \frac{\sum f_1 x_1'}{N} \times i + A = \frac{-15}{20} \times 5 + 37 = 33.25$$

Calculation of Arithmetic Mean of Wives' Age

Class Intervals    Midvalues    Wife              x2' = (m - 37)/5    f2 x2'
                   m            Frequency (f2)
20-24              22           5                 -3                  -15
25-29              27           5                 -2                  -10
30-34              32           4                 -1                  -4
35-39              37           3                 0                   0
40-44              42           2                 1                   2
45-49              47           1                 2                   2
                                Σf2 = 20                              Σf2 x2' = -25

Wife age, arithmetic mean:

$$\bar{X}_2 = \frac{\sum f_2 x_2'}{N} \times i + A = \frac{-25}{20} \times 5 + 37 = 30.75$$

3.8 WEIGHTED ARITHMETIC MEAN

In the computation of the arithmetic mean we have given equal importance to each observation in the series. This equal importance may be misleading if the individual values constituting the series have different importance, as in the following example:

The Raja Toy Shop sells
Toy cars at ₹ 3 each
Toy locomotives at ₹ 5 each
Toy aeroplanes at ₹ 7 each
Toy double deckers at ₹ 9 each

What shall be the average price of the toys sold, if the shop sells 4 toys, one of each kind?

$$\text{Mean price, i.e., } \bar{X} = \frac{\sum X}{4} = \frac{₹\,24}{4} = ₹\,6$$

In this case the importance of each observation (price quotation) is equal, in as much as one toy of each variety has been sold. In the above computation of the arithmetic mean this fact has been taken care of by including 'once only' the price of each toy.

But if the shop sells 100 toys: 50 cars, 25 locomotives, 15 aeroplanes and 10 double deckers, the importance of the four price quotations to the dealer is not equal as a source of earning revenue. In fact their respective importance is equal to the number of units of each toy sold, i.e.,

The importance of toy car          50
The importance of locomotive       25
The importance of aeroplane        15
The importance of double decker    10

It may be noted that 50, 25, 15, 10 are the quantities of the various classes of toys sold. It is for these quantities that the term 'weights' is used in statistical language. Weight is represented by the symbol w, and $\sum w$ represents the sum of weights.

While determining the 'average price of toy sold', these weights are of great importance and are taken into account in the manner illustrated below:

$$\bar{X}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4}{w_1 + w_2 + w_3 + w_4} = \frac{\sum wx}{\sum w}$$

Where $w_1, w_2, w_3, w_4$ are the respective weights of $x_1, x_2, x_3, x_4$, which in turn represent the prices of the four varieties of toys, viz., car, locomotive, aeroplane and double decker, respectively.

$$\bar{X}_w = \frac{(50 \times 3) + (25 \times 5) + (15 \times 7) + (10 \times 9)}{50 + 25 + 15 + 10} = \frac{150 + 125 + 105 + 90}{100} = \frac{470}{100} = ₹\,4.70$$

The table below summarizes the steps taken in the computation of the weighted arithmetic mean.

Weighted Arithmetic Mean of Toys Sold by the Raja Toy Shop

Toys             Price per Toy (₹)    Number Sold    Price × Weight
                 x                    w              xw
Car              3                    50             150
Locomotive       5                    25             125
Aeroplane        7                    15             105
Double Decker    9                    10             90
                                      Σw = 100       Σxw = 470

$$\sum w = 100; \quad \sum wx = 470$$

$$\bar{X}_w = \frac{\sum wx}{\sum w} = \frac{470}{100} = 4.70$$

The weighted arithmetic mean is particularly useful where we have to compute the mean of means. If we are given two arithmetic means, one for each of two different series, in respect of the same variable, and are required to find the arithmetic mean of the combined series, the weighted arithmetic mean is the only suitable method of its determination.

Example 3.7: The arithmetic mean of daily wages of two manufacturing concerns A Ltd. and B Ltd. is ₹ 5 and ₹ 7, respectively. Determine the average daily wages of both concerns if the number of workers employed were 2,000 and 4,000, respectively.

Solution:
(i) Multiply each average (viz. 5 and 7) by the number of workers in the concern it represents.
(ii) Add up the two products obtained in (i) above, and
(iii) Divide the total obtained in (ii) by the total number of workers.

Weighted Mean of Mean Wages of A Ltd. and B Ltd.

Manufacturing    Mean Wages (₹)    Workers Employed    Mean Wages × Workers Employed
Concern          x                 w                   wx
A Ltd.           5                 2,000               10,000
B Ltd.           7                 4,000               28,000
                                   Σw = 6,000          Σwx = 38,000

$$\bar{X}_w = \frac{\sum wx}{\sum w} = \frac{38,000}{6,000} = ₹\,6.33$$
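Both weighted-mean examples above follow the same three steps, which can be sketched as a small Python function. The name `weighted_mean` is an assumption introduced here for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values x and weights w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_w = sum(weights)                                   # sum of w
    total_wx = sum(w * x for x, w in zip(values, weights))   # sum of wx
    return total_wx / total_w


# Raja Toy Shop: prices weighted by the number of toys sold
toy_price = weighted_mean([3, 5, 7, 9], [50, 25, 15, 10])
# Example 3.7: mean wages weighted by the number of workers
daily_wage = weighted_mean([5, 7], [2000, 4000])

print(toy_price)             # 4.7
print(round(daily_wage, 2))  # 6.33
```

With equal weights the function reduces to the simple arithmetic mean, which is why one toy of each kind gives ₹ 6.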

The above mentioned examples explain that 'Arithmetic Means and Percentages' are not original data. They are derived figures and their importance is relative to the original data from which they are obtained. This relative importance must be taken into account by weighting while averaging them (means and percentages).

3.9 MEDIAN

The second measure of central tendency that has a wide usage in statistical works is the median. The median is that value of a variable which divides the series in such a manner that the number of items below it is equal to the number of items above it. Half the total number of observations lie below the median, and half above it. The median is thus a positional average.

The median of ungrouped data is found easily if the items are first arranged in order of magnitude. The median may then be located simply by counting, and its value can be obtained by reading the value of the middle observation. If we have five observations whose values are 8, 10, 1, 3 and 5, the values are first arrayed: 1, 3, 5, 8 and 10. It is now apparent that the value of the median is 5, since two observations are below that value and two observations are above it. When there is an even number of cases, there is no actual middle item, and the median is taken to be the average of the values of the items lying on either side of (N + 1)/2, where N is the total number of items. Thus, if the values of six items of a series are 1, 2, 3, 5, 8 and 10, the median is the value of item number (6 + 1)/2 = 3.5, which is approximated as the average of the third and the fourth items, i.e., (3 + 5)/2 = 4.

Thus the steps required for obtaining the median are:
1. Arrange the data as an array of increasing magnitude.
2. Obtain the value of the (N + 1)/2th item.

Even in the case of grouped data, the procedure for obtaining the median is straightforward as long as the variable is discrete or non-continuous, as is clear from the following examples.

Example 3.8: Obtain the median size of shoes sold from the following data.

Number of Shoes Sold by Size in One Year

Size     Number of Pairs    Cumulative Total
5        30                 30
5½       40                 70
6        50                 120
6½       150                270
7        300                570
7½       600                1170
8        950                2120
8½       820                2940
9        750                3690
9½       440                4130
10       250                4380
10½      150                4530
11       40                 4570
11½      39                 4609
Total    4609

Solution: The median is the value of the $\frac{N + 1}{2}$th = $\frac{4609 + 1}{2}$th = 2305th item. Since the items are already arranged in ascending order (size-wise), the size of the 2305th item is easily determined by constructing the cumulative frequency. Thus, the median size of shoes sold is 8½, the size of the 2305th item.

In the case of grouped data with a continuous variable, the determination of the median is a bit more involved. Consider an example: the data relating to the distribution of male workers by average monthly earnings is given in the following table.

Distribution of Male Workers by Average Monthly Earnings

Group No.    Monthly Earnings    No. of Workers (f)    Cumulative No. of Workers
1            27.5-32.5           120                   120
2            32.5-37.5           152                   272
3            37.5-42.5           170                   442
4            42.5-47.5           214                   656
5            47.5-52.5           410                   1066
6            52.5-57.5           429                   1495
7            57.5-62.5           568                   2063
8            62.5-67.5           650                   2713
9            67.5-72.5           795                   3508
10           72.5-77.5           915                   4423
11           77.5-82.5           745                   5168
12           82.5-87.5           530                   5698
13           87.5-92.5           259                   5957
14           92.5-97.5           152                   6109
15           97.5-102.5          107                   6216
16           102.5-107.5         50                    6266
17           107.5-112.5         25                    6291
             Total               6291

Clearly the median of 6291 cases is the earnings of the (6291 + 1)/2 = 3146th worker when the workers are arranged in ascending order of earnings. From the cumulative frequency, it is clear that this worker has his income in the class interval 67.5-72.5. But it is impossible to determine his exact income. We, therefore, resort to approximation by assuming that the 795 workers of this class are distributed uniformly across the interval 67.5 to 72.5. The median worker is the (3146 - 2713) = 433rd of these 795, and hence, the value corresponding to him can be approximated as,

$$67.5 + \frac{433}{795} \times (72.5 - 67.5) = 67.5 + 2.72 = 70.22$$

The value of the median can thus be put in the form of the formula,

$$M = l + \frac{\frac{N + 1}{2} - C}{f} \times i$$

Where l is the lower limit of the median class, i its width, f its frequency, C the cumulative frequency up to (but not including) the median class, and N is the total number of cases.
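The interpolation formula for the median can be sketched as follows, using the monthly earnings table. The function name `grouped_median` and its argument layout are assumptions made for this illustration; equal class widths are assumed, as in the table.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of grouped data: M = l + (((N + 1) / 2 - C) / f) * i."""
    n = sum(freqs)
    position = (n + 1) / 2          # rank of the median item, as used in the text
    cumulative = 0                  # C: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:
            # Items are assumed to be spread uniformly across the class.
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one item")


# Real lower limits 27.5, 32.5, ..., 107.5 and the frequencies of the 17 classes
lower_limits = [27.5 + 5 * k for k in range(17)]
workers = [120, 152, 170, 214, 410, 429, 568, 650, 795,
           915, 745, 530, 259, 152, 107, 50, 25]

median_earnings = grouped_median(lower_limits, 5, workers)
print(round(median_earnings, 2))  # 70.22
```

The loop simply walks down the cumulative frequency column until it reaches the class containing the 3146th worker.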

3.10 LOCATION OF MEDIAN BY GRAPHICAL ANALYSIS

The median can quite conveniently be determined by reference to the ogive, which plots the cumulative frequency against the variable. The value of the item below which half the items lie can easily be read from the ogive, as is shown in Example 3.9.

Example 3.9: Obtain the median of the data given in the following table.

Monthly Earnings    Frequency (f)    Less than    More than (greater than)
27.5                -                0            6291
32.5                120              120          6171
37.5                152              272          6019
42.5                170              442          5849
47.5                214              656          5635
52.5                410              1066         5225
57.5                429              1495         4796
62.5                568              2063         4228
67.5                650              2713         3578
72.5                795              3508         2783
77.5                915              4423         1868
82.5                745              5168         1123
87.5                530              5698         593
92.5                259              5957         334
97.5                152              6109         182
102.5               107              6216         75
107.5               50               6266         25
112.5               25               6291         0

Solution: It is clear that this is grouped data. The first class is 27.5-32.5, whose frequency is 120, and the last class is 107.5-112.5, whose frequency is 25. The median can also be determined by plotting both the 'less than' and the 'more than' cumulative frequencies as shown in Fig. 3.3. The two curves intersect at the median of the data, since at that point the number of items below equals the number of items above.

3.11 QUARTILES, DECILES AND PERCENTILES

Just as we have defined the median as the value of the item which is located at the centre of the array, we can define other measures which are located at other specified points. Thus, the nth percentile of an array is the value of the item such that n per cent of the items lie below it. Clearly then the nth percentile $P_n$ of grouped data is given by,

$$P_n = l + \frac{\frac{nN}{100} - C}{f} \times i$$

[Figure: 'less than' and 'more than' ogives of cumulative frequency plotted against monthly earnings in rupees; the two curves intersect at the median.]

Fig. 3.3 Location of Median

In the percentile formula, l is the lower limit of the class in which the (nN/100)th item lies, i its width, f its frequency, C the cumulative frequency up to (but not including) this class, and N is the total number of items.

We similarly define the nth decile as the value of the item below which (nN/10) items of the array lie. Clearly,

$$D_n = P_{10n} = l + \frac{\frac{nN}{10} - C}{f} \times i$$

The other most commonly referred to measures of location are the quartiles. Thus, the nth quartile is the value of the item which lies at the n(N/4)th position. Clearly $Q_2$, the second quartile, is the median. For grouped data,

$$Q_n = P_{25n} = l + \frac{\frac{nN}{4} - C}{f} \times i$$

Example 3.10: Find the first and the third quartiles and the 90th percentile of the monthly earnings data given in the table of Example 3.9.

Solution: The first quartile $Q_1$ is the value of the N/4 = 6291/4 = 1572.75th item. Thus the appropriate class is 57.5-62.5, and by the above equation for quartiles,

$$Q_1 = 57.5 + \frac{1572.75 - 1495}{568} \times 5 = 58.18$$

The third quartile $Q_3$ is the value of the 3N/4 = (3 × 6291)/4 = 4718.25th item, so the $Q_3$ class interval is 77.5-82.5, and

$$Q_3 = 77.5 + \frac{4718.25 - 4423}{745} \times 5 = 79.5$$
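Quartiles, deciles and percentiles differ only in the rank that is looked up, so one function can cover all three. The sketch below is illustrative: `grouped_quantile`, `k` and `parts` are names assumed here, with `parts` equal to 4, 10 or 100, and the rank taken as kN/parts as in the formulas above.

```python
def grouped_quantile(lower_limits, width, freqs, k, parts):
    """k-th of `parts` equal divisions: l + ((k * N / parts - C) / f) * i."""
    n = sum(freqs)
    position = k * n / parts        # e.g. N/4 for Q1, 90N/100 for P90
    cumulative = 0                  # C: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one item")


# Monthly earnings data: real lower limits and class frequencies
lower_limits = [27.5 + 5 * k for k in range(17)]
workers = [120, 152, 170, 214, 410, 429, 568, 650, 795,
           915, 745, 530, 259, 152, 107, 50, 25]

q1 = grouped_quantile(lower_limits, 5, workers, 1, 4)
q3 = grouped_quantile(lower_limits, 5, workers, 3, 4)
print(round(q1, 2))  # 58.18
print(round(q3, 2))  # 79.48
```

The value 79.48 is the third quartile before it is rounded to one decimal place in the worked solution.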

Similarly, $P_{90}$, the value of the (90 × 6291)/100 = 5661.9th item, lies in the 82.5-87.5 class interval, and

$$P_{90} = 82.5 + \frac{5661.9 - 5168}{530} \times 5 = 87.16$$

or 90 per cent of the workers earn less than ₹ 87.16.

3.12 MODE

The mode is that value of the variable which occurs or repeats itself the greatest number of times. The mode is the most 'fashionable' size in the sense that it is the most common and typical, and is defined by Zizek as 'the value occurring most frequently in a series (or group of items) and around which the other items are distributed most densely.'

The mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It is the most frequent or the most common value, provided that a sufficiently large number of items are available to give a smooth distribution. It will correspond to the value of the maximum point (ordinate) of a frequency distribution if it is an 'ideal' or smooth distribution. It may be regarded as the most typical of a series of values. The modal wage, for example, is the wage received by more individuals than any other wage. The modal 'hat' size is that which is worn by more persons than any other single size.

It may be noted that the occurrence of one or a few extremely high or low values has no effect upon the mode. If a series of data are unclassified, not having been either arrayed or put into a frequency distribution, the mode cannot be readily located.

Taking first an extremely simple example, if seven men are receiving daily wages of ₹ 5, 6, 7, 7, 7, 8 and 10, it is clear that the modal wage is ₹ 7 per day. If we have a series such as 2, 3, 5, 6, 7, 10 and 11, it is apparent that there is no mode.

There are several methods of estimating the value of the mode. But it is seldom that the different methods of ascertaining the mode give us identical results. Consequently, it becomes necessary to decide which method would be most suitable for the purpose in hand. In order that a choice of the method may be made, we should understand each of the methods and the differences that exist among them.

The four important methods of estimating the mode of a series are:
(i) Locating the most frequently repeated value in the array;
(ii) Estimating the mode by interpolation;
(iii) Locating the mode by graphic method; and
(iv) Estimating the mode from the mean and the median.

Only the last three methods are discussed in this unit.

Estimating the Mode by Interpolation. In the case of continuous frequency distributions, the problem of determining the value of the mode is not so simple as it might have appeared from the foregoing description. Having located the modal class of the data, the next problem in the case of continuous series is to interpolate the value of the mode within this 'modal' class.

The interpolation is made by the use of any one of the following formulae:

$$\text{(i)} \quad Mo = l_1 + \frac{f_2}{f_0 + f_2} \times i$$

$$\text{(ii)} \quad Mo = l_2 - \frac{f_0}{f_0 + f_2} \times i$$

or

$$\text{(iii)} \quad Mo = l_1 + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times i$$

Where $l_1$ is the lower limit of the modal class, $l_2$ is the upper limit of the modal class, $f_0$ equals the frequency of the preceding class in value, $f_1$ equals the frequency of the modal class in value, $f_2$ equals the frequency of the following class (class next to the modal class) in value, and i equals the interval of the modal class.

Example 3.11: Determine the mode for the data given in the following table.

Wage Group    Frequency (f)
14-18         6
18-22         18
22-26         19
26-30         12
30-34         5
34-38         4
38-42         3
42-46         2
46-50         1
50-54         0
54-58

Solution: In the given data, 22-26 is the modal class, since it has the largest frequency. The lower limit of the modal class is 22, its upper limit is 26, its frequency is 19, the frequency of the preceding class is 18, and that of the following class is 12. The class interval is 4. Using the various methods of determining the mode, we have,

$$\text{(i)} \quad Mo = 22 + \frac{12}{18 + 12} \times 4 = 22 + \frac{8}{5} = 23.6$$

$$\text{(ii)} \quad Mo = 26 - \frac{18}{18 + 12} \times 4 = 26 - \frac{12}{5} = 23.6$$

$$\text{(iii)} \quad Mo = 22 + \frac{19 - 18}{(19 - 18) + (19 - 12)} \times 4 = 22 + \frac{4}{8} = 22.5$$

In formulae (i) and (ii), the frequency of the classes adjoining the modal class is used to pull the estimate of the mode away from the midpoint towards either the upper or the lower class limit. In this particular case, the frequency of the class preceding the modal class is more than the frequency of the class following and, therefore, the estimated mode is less than the midvalue of the modal class. This seems quite logical. If the frequencies are more on one side of the modal class than on the other, it can be reasonably concluded that the items in the modal class are concentrated more towards the class limit of the adjoining class with the larger frequency.

The formula (iii) is also based on a logic similar to that of (i) and (ii). In this case, to interpolate the value of the mode within the modal class, the differences between the frequency of the modal class and the respective frequencies of the classes adjoining it are used. This formula usually gives better results than the other two, and its result is exactly equal to that obtained by the graphic method. The formulae (i) and (ii) give values which are different from the value obtained by formula (iii) and are closer to the central point of the modal class.
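The three interpolation formulae can be compared side by side with a short sketch using the figures of Example 3.11. The function name `mode_estimates` and its parameter order are assumptions made here for illustration.

```python
def mode_estimates(l1, i, f0, f1, f2):
    """Return the mode by formulae (i), (ii) and (iii) for one modal class.

    l1: lower limit of the modal class, i: class interval,
    f0, f1, f2: frequencies of the preceding, modal and following classes.
    """
    l2 = l1 + i                                              # upper limit of the modal class
    by_i = l1 + f2 / (f0 + f2) * i                           # formula (i)
    by_ii = l2 - f0 / (f0 + f2) * i                          # formula (ii)
    by_iii = l1 + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * i    # formula (iii)
    return by_i, by_ii, by_iii


# Modal class 22-26 with frequencies 18, 19 and 12
estimates = mode_estimates(22, 4, 18, 19, 12)
print([round(m, 2) for m in estimates])  # [23.6, 23.6, 22.5]
```

Formulae (i) and (ii) always agree with each other, since the two fractions add up to one and the limits differ by exactly the class interval.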
If the frequencies of the classes adjoining the modal class are equal, the mode is expected to be located at the midvalue of the modal class, but if the frequency on one of the sides is greater, the mode will be pulled away from the central point. The greater the difference between the frequencies of the classes adjoining the modal class, the more it will be pulled. In the example given above, the frequency of the modal class is 19 and that of the preceding class is 18. So, the mode should be quite close to the lower limit of the modal class. The midpoint of the modal class is 24 and the lower limit of the modal class is 22.

Locating the Mode by the Graphic Method. The method of graphic interpolation is illustrated in Fig. 3.4. The upper corners of the rectangle over the modal class have been joined by straight lines to those of the adjoining rectangles as shown in the diagram; the right corner to the corresponding one of the adjoining rectangle on the left, etc. If a perpendicular is drawn from the point of intersection of these lines, we have a value for the mode indicated on the base line. The graphic approach is, in principle, similar to the arithmetic interpolation explained earlier.

Fig. 3.4 Method of Mode Determination by Graphic Interpolation (histogram of frequency, 0 to 20, against wage groups, 10 to 58, with the mode marked on the base line)

The mode may also be determined graphically from an ogive or cumulative frequency curve. It is found by drawing a perpendicular to the base from that point on the curve where the curve is most nearly vertical, i.e., steepest (in other words, where it passes through the greatest distance vertically and the smallest distance horizontally). The point where it cuts the base gives us the value of the mode. How accurately this method determines the mode is governed by: (1) The shape of the ogive, (2) The scale on which the curve is drawn.

Estimating the Mode from the Mean and the Median. There usually exists a relationship among the mean, median and mode for moderately asymmetrical distributions. If the distribution is symmetrical, the mean, median and mode will have identical values, but if the distribution is skewed (moderately) the mean, median and mode will pull apart. If the distribution tails off towards higher values, the mean and the median will be greater than the mode. If it tails off towards lower values, the mode will be greater than either of the other two measures. In either case, the median will be about one-third as far away from the mean as the mode is. This means that,

Mode = Mean - 3 (Mean - Median) = 3 Median - 2 Mean

In the case of the average monthly earnings (refer table of example 3) the mean is 68.53 and the median is 70.2.
If these values are substituted in the above formula, we get,

Mode = 68.5 - 3(68.5 - 70.2) = 68.5 + 5.1 = 73.6

According to the formula used earlier,

$$\text{Mode} = l_1 + \frac{f_2}{f_0 + f_2} \times i = 72.5 + \frac{745}{795 + 745} \times 5 = 72.5 + 2.4 = 74.9$$

or

$$\text{Mode} = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 72.5 + \frac{915 - 795}{2 \times 915 - 795 - 745} \times 5 = 72.5 + \frac{120 \times 5}{290} = 74.57$$

The difference between the estimate from the mean and median and the interpolated estimates is due to the fact that the assumed relationship between the mean, median and mode may not always hold; it is obviously not valid in this case.

Check Your Progress

4. The following are the scores for the mid-term exam given to 13 students in statistics.
42, 42, 68, 80, 75, 54, 62, 89, 72, 80, 80, 75, 65
Calculate: (a) The mean (b) The mode (c) The median

5. The following data represents the number of cars entering a gas station on Bedford Avenue for repairs between 10.00 a.m. and 11.00 a.m. in the last 8 days.
7, 8, 6, 8, 9, 7, 5, 6
Calculate the mean for this data.

6. The following are the monthly salaries, in rupees, of the employees in a branch bank. Calculate the arithmetic mean.
10, 17, 29, 95, 95, 100, 100, 175, 250 and 750

7. The following figures represent the number of books issued at the counter of a commerce library in 11 different days. Calculate the median.
96, 180, 98, 75, 270, 20, 102, 100, 94, 75, 200

Example 3.12: (a) In a moderately asymmetrical distribution, the mode and mean are 32.1 and 35.4 respectively. Calculate the median.
(b) If the mode and median of a moderately asymmetrical series are respectively 16" and 15.7", what would be its most probable mean?
(c) In a moderately skewed distribution, the mean and the median are respectively 25.6 and 26.1 inches. What is the mode of the distribution?

Solution: (a) We know,

Mean - Mode = 3 (Mean - Median)
or 3 Median = Mode + 2 Mean
or $\text{Median} = \dfrac{32.1 + 2 \times 35.4}{3} = \dfrac{102.9}{3} = 34.3$

(b) 2 Mean = 3 Median - Mode
or $\text{Mean} = \dfrac{1}{2}(3 \times 15.7 - 16.0) = \dfrac{31.1}{2} = 15.55$

(c) Mode = 3 Median - 2 Mean = 3 × 26.1 - 2 × 25.6 = 78.3 - 51.2 = 27.1

3.13 GEOMETRIC MEAN

The Geometric Mean (GM) of n positive values is defined as the nth root of their product.
Thus, it is obtained by multiplying together all the values and then extracting the relevant root of the product. It can be represented as:

$$\text{GM} = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$$

Where n stands for the number of items and $x_1, x_2, x_3, \ldots, x_n$ are the various values. For instance, the geometric mean of 4, 8, 16 is,

$$\text{GM} = \sqrt[3]{4 \times 8 \times 16} = \sqrt[3]{512} = 8$$

The above method of calculating the geometric mean is satisfactory only if there are two or three items. But if n is a large number, computing the nth root of the product of these values by simple arithmetic is tedious work. To facilitate the computation of the geometric mean we make use of logarithms. The above formula when reduced to its logarithmic form will be:

$$\log \text{GM} = \frac{\log x_1 + \log x_2 + \log x_3 + \cdots + \log x_n}{n}$$

The logarithm of the geometric mean is equal to the arithmetic mean of the logarithms of the individual values.

Example 3.13: Find the GM of 2, 4, 8, 12, 16, 24.

Solution:

Value    log
2        0.3010
4        0.6021
8        0.9031
12       1.0792
16       1.2041
24       1.3802
Total    5.4697

$$\text{Geometric Mean} = \text{antilog}\,\frac{5.4697}{6} = \text{antilog}\ 0.9116 = 8.158$$

It is easily verified that the geometric mean (GM) of a frequency distribution is given by,

$$\log \text{GM} = \frac{f_1 \log x_1 + f_2 \log x_2 + f_3 \log x_3 + \cdots + f_n \log x_n}{N}$$

Similarly, for grouped data,

$$\log \text{GM} = \frac{\sum f \log m}{N}$$

Where m is the midvalue of a particular class.

3.14 HARMONIC MEAN

Another important mean is the Harmonic Mean (HM), which is used for averaging rates. It is defined by,

$$\frac{1}{\text{HM}} = \frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}\right)$$

Where n is the number of items in the series $x_1, x_2, x_3, \ldots, x_n$. Thus, if a man travels 200 km each on three days at speeds of 60, 50 and 40 kmph, respectively, his average speed is given by the HM of the three speeds, namely

$$\text{HM} = \frac{3}{\dfrac{1}{60} + \dfrac{1}{50} + \dfrac{1}{40}} = 48.65 \text{ kmph}$$

Note: HM gives the correct average speed because the man travelled equal distances at the three speeds. If, however, he had travelled for equal times, the AM would have been the correct average.
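The logarithmic route to the geometric mean and the reciprocal definition of the harmonic mean can both be sketched in a few lines of Python. This is an illustrative check only; the helper names `geometric_mean` and `harmonic_mean` are defined here for the example and are not taken from the text.

```python
import math


def geometric_mean(values):
    """Antilog of the arithmetic mean of the base-10 logarithms."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


def harmonic_mean(values):
    """Number of items divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)


gm = geometric_mean([2, 4, 8, 12, 16, 24])  # data of Example 3.13
hm = harmonic_mean([60, 50, 40])            # the three speeds in kmph
print(round(gm, 2), round(hm, 2))
```

Computed at full precision the geometric mean comes to about 8.16, which agrees with the 8.158 obtained from four-figure logarithm tables, and the harmonic mean of the speeds comes to about 48.65 kmph.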

3.15 CHOICE OF AVERAGE

The choice of a particular measure of central tendency or location depends on the purpose of investigation. It should be noted that the arithmetic mean (AM) is quite precisely defined and is therefore more amenable to further mathematical manipulation. On the other hand, the mode can be located by inspection, but cannot be manipulated easily.

Check Your Progress

8. What are the limitations of geometric mean?
9. What is harmonic mean?

There are certain other properties of these measures which also determine their use. It has been noted that the sum of deviations of individual items from the AM is zero and, therefore, the AM is useful in those situations where the effect of positive deviations cancels out the effect of negative deviations. Thus, the average height of boys or the average marks of students is generally the AM. On the other hand, the mode is the most typical item and thus more representative of the series. Hence, if we want to determine the height of chair seats most suitable to a class room, we should use the most typical value, i.e., the mode. That value will satisfy the largest number of students. If we had chosen the arithmetic mean, it would ensure only that the 'total discomfort' of those who find the seat too high balances the 'total discomfort' of those who find it too low, which obviously has nothing to recommend it. The median, on the other hand, is the value from which the total of the absolute deviations is the minimum. This property makes it useful in those situations where we want to minimize the total 'discomfort'.

It should further be noted that the AM takes all the items into consideration, while the median and mode are unaffected by extreme values. In many situations this makes the AM less attractive. Thus, if we calculate average income for the purposes of determining the general well being of a population, the astronomical incomes of a few individuals give a wrong tilt to the AM income.
In such a situation the median income will serve the purpose better.

3.16 MISUSE OF AVERAGES

A lot of mischief can be done by wrong interpretation of the meaning of averages. It was reported in 1953 that the death rate in the state of Arizona came down to 8.5 per thousand from 16.8 per thousand in a period of 5 years, making it a much healthier place to live in. A closer investigation revealed that the decrease was essentially because of the influx of a large number of young recruits to army posts, swamping a relatively small base of population.

Similarly, it was reported that in a village in Rajasthan, a cloud burst produced a rainfall in one night which was 20 times the average annual rainfall of the last 60 years. This seems like quite a cloud burst till it is mentioned that in those 60 years there was only one other recorded rainfall, of 3 mm, so that a mere 1 mm rainfall gave that dramatic headline.

Summary of the Properties of the Various Measures of Location

Property                             Arithmetic     Median        Mode          Geometric         Harmonic
                                     Mean (AM)      (Md)          (Mo)          Mean (GM)         Mean (HM)
1. Rigidly defined                   yes            yes           not very      yes               yes
2. Based on all values of series     yes            no            no            yes               yes
3. Ease in calculation               very easy      moderate      very easy     quite difficult   very difficult
5. Understanding by common man       most easily    quite         most          not easy          not easily
6. Amenable to algebraic treatment   yes            no            no            yes               yes
7. Effect of sample variations       least          moderate      moderate      moderate          moderate
   (Small effect is desirable)

Contd...

Property                             Arithmetic     Median               Mode            Geometric    Harmonic
                                     Mean (AM)      (Md)                 (Mo)            Mean (GM)    Mean (HM)
8. Effect of extreme values          large          none                 none            very low     very low
   (Small effect is desirable)
9. Difficulties with open-ended      yes            no difficulty        no difficulty   yes          yes
   classes
10. Use as typical value of series   no             more                 most typical    no           no
11. Most useful in                   general        least net            typifying       averaging    averaging
                                     purpose        discomfort/comfort   series          ratios       rates
                                                    problems

3.17 SOLVED PROBLEMS

Problem 1: The following table shows the number of skilled and unskilled workers in two small communities, together with their average hourly wages.

                     Ram Nagar                   Shyam Nagar
Worker Category      Number    Wage Per Hour     Number    Wage Per Hour
Skilled              150       ₹ 1.80            350       ₹ 1.75
Unskilled            850       ₹ 1.30            650       ₹ 1.25

Determine the average hourly wage for each community. Also give reasons why the result shows that the average hourly wage in Shyam Nagar exceeds the average hourly wage in Ram Nagar, even though in Shyam Nagar the average hourly wage of both categories of workers is lower.

Solution:

                     Ram Nagar                                  Shyam Nagar
Worker Category      Number    Wage Per    Total Wage           Number    Wage Per    Total Wage
                               Hour        Per Hour                       Hour        Per Hour
Skilled              150       ₹ 1.80      270                  350       ₹ 1.75      612.50
Unskilled            850       ₹ 1.30      1105                 650       ₹ 1.25      812.50
Total                1000                  1375                 1000                  1425.00

The average hourly wage for Ram Nagar is ₹ 1375/1000 = ₹ 1.375.
The average hourly wage for Shyam Nagar is ₹ 1425/1000 = ₹ 1.425.

The reason for the wage rate in Shyam Nagar being higher than in Ram Nagar is the fact that the number of skilled workers, who get the higher wage rate, is proportionately greater there than in Ram Nagar. In Shyam Nagar their number is 350 out of 1000 as compared to 150 out of 1000 in Ram Nagar.

Problem 2: From the following data calculate the missing frequency.

No. of Tablets          4-8   8-12   12-16   16-20   20-24   24-28   28-32   32-36   36-40
No. of Persons Cured    11    13     16      14      ?       9       17      6       4

The average number of tablets to cure fever was 19.9.

Solution: Let the missing frequency be $f_1$.

Now $\text{AM} = \bar{X} = \dfrac{\sum fX}{N}$

or $19.9 = \dfrac{1772 + 22f_1}{90 + f_1}$

Class Interval    Frequency (f)    Midpoint (X)    f(X)
4-8               11               6               66
8-12              13               10              130
12-16             16               14              224
16-20             14               18              252
20-24             f1               22              22 f1
24-28             9                26              234
28-32             17               30              510
32-36             6                34              204
36-40             4                38              152
Total             Σf = 90 + f1                     Σf(X) = 1772 + 22 f1

or $1791 + 19.9f_1 = 1772 + 22f_1$

or $2.1f_1 = 19$, so that $f_1 \approx 9.0$

Hence, the missing frequency is 9.

Problem 3: Comment on the performance of the students in the three universities given below using simple and weighted averages.

University     Bombay                       Calcutta                     Madras
Course of      Per cent    No. of           Per cent    No. of           Per cent    No. of
Study          of Pass     Students in      of Pass     Students in      of Pass     Students in
                           Hundreds                     Hundreds                     Hundreds
M.A.           71          3                82          2                81          2
M.Com.         83          4                76          3                76          3.5
B.A.           73          5                73          6                74          4.5
B.Com.         74          2                76          7                58          2
B.Sc.          65          3                65          3                70          7
M.Sc.          66          3                60          7                73          2

Solution:

Bombay

Course of    Per cent of     No. of Students      X1' = X1 - 75    w1 X1'
Study        Pass (X1)       in Hundreds (w1)
M.A.         71              3                    -4               -12
M.Com.       83              4                    8                32
B.A.         73              5                    -2               -10
B.Com.       74              2                    -1               -2
B.Sc.        65              3                    -10              -30
M.Sc.        66              3                    -9               -27
                             Σw1 = 20             ΣX1' = -18       Σw1X1' = -49

Simple Arithmetic Average: $\bar{X}_1 = \dfrac{\sum X_1'}{N} + A = \dfrac{-18}{6} + 75 = 72.0$

Weighted Average: $\bar{X}_{1w} = \dfrac{\sum w_1 X_1'}{\sum w_1} + A = \dfrac{-49}{20} + 75 = 72.55$
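The Bombay figures can be reproduced with a short Python sketch of the assumed-mean (short-cut) method used in the table. The function name `simple_and_weighted_mean` and its arguments are hypothetical, chosen only for this illustration; the assumed mean A = 75 is the one used in the solution.

```python
def simple_and_weighted_mean(values, weights, assumed_mean=75):
    """Simple and weighted arithmetic means via deviations from an assumed mean."""
    deviations = [x - assumed_mean for x in values]  # X' = X - A
    simple = assumed_mean + sum(deviations) / len(values)
    weighted = assumed_mean + sum(
        w * d for w, d in zip(weights, deviations)
    ) / sum(weights)
    return simple, weighted


# Bombay: per cent of pass and number of students (in hundreds) per course
bombay_pass = [71, 83, 73, 74, 65, 66]
bombay_students = [3, 4, 5, 2, 3, 3]
simple, weighted = simple_and_weighted_mean(bombay_pass, bombay_students)
print(round(simple, 2), round(weighted, 2))  # 72.0 72.55
```

The choice of assumed mean does not change either result; it only keeps the deviations small enough for hand calculation.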

Calcutta

Course of    Per cent of     No. of Students      X2' = X2 - 75    w2 X2'
Study        Pass (X2)       in Hundreds (w2)
M.A.         82              2                    7                14
M.Com.       76              3                    1                3
B.A.         73              6                    -2               -12
B.Com.       76              7                    1                7
B.Sc.        65              3                    -10              -30
M.Sc.        60              7                    -15              -105
                             Σw2 = 28             ΣX2' = -18       Σw2X2' = -123

Simple Arithmetic Average: $\bar{X}_2 = \dfrac{-18}{6} + 75 = 72.0$

Weighted Average: $\bar{X}_{2w} = \dfrac{-123}{28} + 75 = 70.61$

Madras

Course of    Per cent of     No. of Students      X3' = X3 - 75    w3 X3'
Study        Pass (X3)       in Hundreds (w3)
M.A.         81              2                    6                12
M.Com.       76              3.5                  1                3.5
B.A.         74              4.5                  -1               -4.5
B.Com.       58              2                    -17              -34
B.Sc.        70              7                    -5               -35
M.Sc.        73              2                    -2               -4
                             Σw3 = 21             ΣX3' = -18       Σw3X3' = -62

Simple Arithmetic Average: $\bar{X}_3 = \dfrac{-18}{6} + 75 = 72.0$

Weighted Average: $\bar{X}_{3w} = \dfrac{-62}{21} + 75 = 72.05$

Notes:
1. Since the simple arithmetic mean gives equal importance to all examinations and ignores their importance in terms of the number of students taking each examination, it is a misleading figure.
2. The weighted arithmetic mean, on the other hand, gives the correct percentage of passes of each university.

Problem 4: The mean wage of 100 labourers working in a factory running two shifts of 60 and 40 workers respectively, is ₹ 38. The mean wage of 60 labourers working in the morning shift is ₹ 40. Find the mean wage of 40 labourers working in the evening shift.

Solution: We know,

$$\bar{X} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$

or $38 = \dfrac{(60 \times 40) + (40 \times \bar{X}_2)}{100}$

or $40\bar{X}_2 = 3800 - 2400$

$\bar{X}_2 = 35$

Hence, the mean wage of 40 workers in the evening shift is ₹ 35.

Problem 5: Twenty per cent of the workers in a firm employing a total of 2000 earn less than ₹ 2 per hour, 440 earn from ₹ 2 to 2.24 per hour, 24 per cent earn from ₹ 2.25 to 2.49 per hour, 370 earn from ₹ 2.50 to 2.74 per hour, 12 per cent earn from ₹ 2.75 to 2.99 per hour and the rest earn ₹ 3 or more per hour. Set up a frequency table and calculate the modal wage.

Solution:

Earnings Per Hour       No. of Workers
Less than ₹ 2.00        400
₹ 2.00 to ₹ 2.24        440
₹ 2.25 to ₹ 2.49        480
₹ 2.50 to ₹ 2.74        370
₹ 2.75 to ₹ 2.99        240
₹ 3.00 and more         70
Total                   2000

In the given data 2.25 to 2.49 is the modal class. The real limits of this class are 2.245 to 2.495. Using the mode formula we get,

$$\text{Mode} = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 2.245 + \frac{480 - 440}{960 - 440 - 370} \times 0.25 = 2.245 + \frac{40}{40 + 110} \times 0.25 = 2.245 + 0.067$$

so that the modal wage is approximately ₹ 2.31 per hour.

Problem 6: A manufacturer of hand shovels is deciding what length handles to use. Studies of user preference reveal that the average, the median and the modal preferred length are all different. What are the implications of using each of these values? Which value would you choose?

Solution: If the average length is used, it is quite possible that it may suit none. Those who prefer a length shorter than the average, as also those who prefer a length longer than the average, will be put to discomfort. That is to say, a very large number of users (perhaps all) would not like it. In case the median is used, a similar situation would arise, except that half the users would be uncomfortable because the handle is too short and the other half would be uncomfortable because the handle is too long. If the mode is used, the largest number of persons would be satisfied. Hence, the modal length handles should be used.

Problem 7: Twenty boats make 6 transatlantic trips each per year, and 10 boats make 4 trips per year.
What is the average number of days for a 'turn around' (that is, the time between consecutive departures from the same port)? Take the year as 360 days for convenience.

Solution:

Trips    Boats    Total Trips
6        20       120
4        10       40
Total    30       160

Average trips per boat $= \dfrac{160}{30} = 5.33$

Average number of days for a trip $= \dfrac{360}{16/3} = 67.5$ days

This average may also be determined in another manner, viz., using the harmonic mean, since rates are being averaged. The number of days needed for a trip are 60 and 90 days, respectively, for the groups consisting of 20 and 10 boats.

$$\text{Harmonic Mean} = \frac{20 + 10}{\dfrac{1}{60} \times 20 + \dfrac{1}{90} \times 10} = \frac{30}{\dfrac{1}{3} + \dfrac{1}{9}} = \frac{30 \times 9}{4} = 67.5 \text{ days}$$

Problem 8: A given machine is assumed to depreciate 40 per cent in value in the first year, 25 per cent in the second year and 10 per cent per annum for the next three years, each percentage being calculated on the diminishing value. What is the average percentage depreciation, reckoned on the diminishing value, for the five years?

Solution: Average diminishing value,

$$\sqrt[5]{0.60 \times 0.75 \times 0.90 \times 0.90 \times 0.90} = \text{Antilog}\left[\tfrac{1}{5}\left(\bar{1}.7782 + \bar{1}.8751 + \bar{1}.9542 + \bar{1}.9542 + \bar{1}.9542\right)\right] = \text{Antilog}\left[\bar{1}.9032\right] = 0.80$$

Thus, the average depreciation is 20 per cent.

3.18 SUMMARY

In this unit, you have learnt the measures of central tendency used in statistical analysis. The most important objective of statistical analysis is to determine a single value for the entire mass of data so that it describes the overall level of the group of observations and can be considered a representative value of the whole set of data. It tells us where the centre of the distribution of data is located on the scale that we are using. This unit emphasizes the distribution of data and measures of central location.

3.19 ANSWERS TO 'CHECK YOUR PROGRESS'

1. A single number describing some feature of a frequency distribution is called a descriptive statistic.
2. The arithmetic mean is also commonly known as simply the mean. Even though average, in general, means any measure of central location, when we use the word average in our daily routine, we always mean the arithmetic average.
3. (a) It is a measure that can be easily calculated.
(b) It includes all values of the data set in its calculation.

(c) Its value varies very little from sample to sample taken from the same population.
(d) It is useful for performing statistical procedures such as computing and comparing the means of several data sets.
4. Mean = 68, Mode = 80, Median = 72
5. Mean = 7
6. ₹ 162.1
7. 98
8. The major limitations or drawbacks of the geometric mean are as follows:
(i) It is difficult to use and to compute.
(ii) It is determined for positive values and cannot be used for negative values or zero.
9. Harmonic mean is used for averaging the rates.

3.20 QUESTIONS AND EXERCISES

Short-Answer Questions

1. What do you mean by descriptive statistics?
2. How is central tendency measured?
3. Define the term arithmetic mean.
4. Write three characteristics of mean.
5. What is the importance of arithmetic mean in statistics?
6. Explain the term median with example.
7. How is location of median calculated using graphic analysis?
8. Define quartiles, deciles and percentiles with suitable examples.
9. What is mode? How is it calculated?
10. Differentiate between a mean and a mode.
11. What is geometric mean? How is it calculated?
12. Explain the importance of harmonic mean.
13. Describe the following terms: (a) Choice of Averages (b) Misuse of Averages
14. The following are the monthly salaries, in rupees, of the employees in a branch bank. Calculate the arithmetic mean. 10, 17, 29, 95, 95, 100, 100, 175, 250 and 750.
15. The following figures represent the number of books issued at the counter of a commerce library in 11 different days. Calculate the median. 96, 180, 98, 75, 270, 20, 102, 100, 94, 75, 200.
16. An investor buys ₹ 1200 worth of shares in a company each month. During the first five months, he bought the shares at a price of ₹ 10, ₹ 12, ₹ 15, ₹ 20 and ₹ 24 per share. After 5 months what is the average price paid for the shares by him?
17. The price of a commodity was four times higher in 1970 than what it was a decade back.
Find the average rate of growth of price of the commodity. 18. Arithmetic mean of a group of 100 items is 50 and of another group of 150 items is 100. What will be the mean of all the items?

19. Arithmetic mean of 98 items is 50. Two items 60 and 70 were left out at the time of calculation. What is the correct mean of all the items?

Long-Answer Questions

1. Define the various measures of central tendency. What purposes do their measurement serve?
2. Define geometric and harmonic mean and explain their uses.
3. Show the relative positions of different averages in a moderately asymmetrical series.
4. What do you mean by: (a) Quartiles (b) Deciles (c) Percentiles
5. What are the qualities which an average must possess? Which of the averages possess most of these qualities?
6. What do you mean by 'weights'? Why are they assigned? Point out a few cases in which weighted average should be used.
7. Differentiate between crude and corrected death rates.
8. The expenditure of ten families in rupees are given below:

Family         A    B    C    D    E     F    G    H     I    J
Expenditure    30   70   10   75   500   8    52   250   50   36

Calculate the Arithmetic Average by (a) Direct Method and (b) Short-cut Method.

9. Eight coins were tossed together and the number of heads resulting was observed. The operation was performed 256 times and the frequencies that were obtained for the different values of x, the number of heads, are shown in the following table. Calculate mean, median and quartiles of the distribution of x.

x            0    1    2    3    4    5    6    7    8
Frequency    1    9    26   59   72   52   29   7    1

10. Find the mean of the following distribution:

Breadth in mm    19-21   22-24   25-27   28-30   31-33   34-36   37-39
Frequency        6       13      19      23      18      12      9

11. The following table shows the number of persons employed in certain units of an industry. Find the average number of persons employed.

No. of Persons:    below 20   20-30   30-50   50-100   100-200   200 and above
No. of Units:      25 6 3 2 2

12. Calculate arithmetic mean for the following data:

Class Interval    5-10   10-15   15-20   20-25   25-30   30-35   35-40   40-45
Frequency         6      5       15      10      5       4       3       2

13. From the following table, calculate mean and median.

Crop Cutting Experimental Data on Plot Yields of Wheat

Yield (in lbs)    No. of Plots        Yield (in lbs)    No. of Plots
Over 0            216                 Over 300          31
Over 60           210                 Over 360          13
Over 120          156                 Over 420          7
Over 180          98                  Over 480          2
Over 240          57                  Up to 540         216

14. Find arithmetic mean, median and mode from the following:

Marks below        10   20   30   40   50   60    70    80
No. of Students    15   35   60   84   96   127   198   250

15. The wages of 1060 employees range from ₹ 300 to ₹ 450. They are grouped in 15 classes with a common class interval of ₹ 10. Class frequencies from lowest to the highest are 6, 17, 35, 48, 65, 90, 131, 173, 155, 177, 75, 52, 9, 6. Tabulate the data and calculate the mean wage.

16. From the table given below, find the mean.

Salary Per Day    No. of Persons        Salary Per Day    No. of Persons
1-5               7                     26-30             18
6-10              10                    31-35             10
11-15             16                    36-40             5
16-20             32                    41-45             1
21-25             24

17. Calculate arithmetic mean from the following data:

Temp. °C       -40 to -30   -30 to -20   -20 to -10   -10 to 0   0 to 10   10 to 20   20 to 30
No. of Days    10           28           30           42         65        180        10

18. Calculate the median, 3rd decile and 20th percentile from the following data:

Central Size    2.5   7.5   12.5   17.5   22.5
Frequency       7     18    25     30     20

19. Calculate median, mode, quartile, 7th decile and 87th percentile from the following data:

Variate Value    Frequency        Variate Value    Frequency
7-10.99          5                31-34.99         12
11-14.99         9                35-38.99         7
15-18.99         13               39-42.99         5
19-22.99         21               43-46.99         3
23-26.99         17               47-50.99         2
27-30.99         15               51-54.99         1

20. In the frequency distribution of 100 families given below, the number of families corresponding to expenditure groups 20-40 and 60-80 are missing from the table. However, the median is known to be ₹ 50. Find the missing frequencies.

Expenditure (₹)     0-20   20-40   40-60   60-80   80-100
No. of Families:    14     ?       27      ?       15

21. Obtain the mode of the following data:

Monthly Rent    No. of Families        Monthly Rent    No. of Families
in (₹)          Paying the Rent        in (₹)          Paying the Rent
20-40           6                      100-120         20
40-60           9                      120-140         15
60-80           11                     140-160         10
80-100          14                     160-180         8
                                       180-200         7

22. (a) From the data given below, find the mode.

Ages              20-25   25-30   30-35   35-40   40-45   45-50   50-55   55-60
No. of Persons    50      70      80      180     150     120     70      50

(b) If the mode and mean of a moderately asymmetrical series are respectively 16 inches and 20.2 inches, compute the most probable median.

23. Following is the distribution of the size of certain farms selected at random from a district. Calculate the mode of distribution.

Central Size of the Farm in Acres    10   20   30   40   50   60   70
No. of Farms                         7    12   17   29   31   5    3

24. Draw a histogram from the following data and measure the modal value:

Class Size    Frequency        Class Size    Frequency
0-10          5                50-60         10
10-20         11               60-70         8
20-30         19               70-80         6
30-40         21               80-90         3
40-50         16               90-100        1

25. Monthly incomes of the families are given below in rupees: 2000, 35, 400, 15, 40, 1500, 300, 6, 90, 250, 20, 12, 450, 10, 150, 8, 25, 30, 1200, 60. Calculate the geometric mean and harmonic mean of the above series.

26. The following table gives weights of 31 persons in a sample inquiry. Calculate mean weight using (a) Geometric mean and (b) Harmonic mean.

Weight in lbs.    130 135 140 145 146 148 149 150 157
No. of Persons    3 4 6 6 3 5 2

27. Peter travelled by car for 4 days. He drove 10 hours each day. He drove: first day at the rate of 45 km per hour and fourth day at the rate of 37 km per hour. What was his average speed?

28. The price of certain articles becomes 1 …/2 times in the first year, 1 …/8 times in the second year and 2 …/9 times in the third year. What is the average change per year?

29. You take a trip which entails travelling 900 miles by train at an average speed of 60 mph, 3000 miles by boat at an average of 25 mph, 400 miles by plane at 350 mph, and finally 15 miles by taxi at 25 mph. What is your average speed for the entire distance?

30. Calculate the simple average and weighted average of the following items:

Items      68 85 101 102 108 110 112 113 124 128 143 146 151 153 172
Weights    46 31 11 7 23 17 9 14 2 4 6 5 2

Account for the difference in the two averages.

31. The following is the distribution of 136 individuals by 10 years age groups. Calculate the measure of central tendency which will appropriately describe the distribution.

Age Group         0-9   10-19   20-29   30-39   40-49   50-59   60-69   70 and over
No. of Persons    48    26      27      11      13      4       3       4

32. In a shooting competition a person shoots 200 times at a running object. If he shoots in front of the target it is taken as positive distance and if he shoots behind the target it is taken as negative distance. The distribution of distance missed (in cm) by different shots is as follows:

Distance in cm    -30 to -20   -20 to -10   -10 to 0   0    0 to 10   10 to 20   20 to 30
No. of Shots      5            15           30         90   43        9          8

Find the average distance by which (a) Shot is ahead of the target, (b) Shot is behind the target, (c) Shot missing the target misses it and (d) A shot misses the target.

33. From the following data for calculation of arithmetic mean find the missing item.

Housing Rent in ₹    110   112   113   117   ?    125   128   130
No. of Houses        25    17    13    15    14   8     6     2

Mean Rent = 115.86

34. Monthly expenditure for a group of families is as below:

Expenditure in ₹    100-200   200-300   300-350   350-400   400-500
No. of Families     8         ?         20        12        5

Median of expenditure is known to be 317.5. Determine the number of families having expenditure between ₹ 200 and ₹ 300 per month.

35. Modal marks for a group of 57 students is 27. 5 students got marks between 0 and 10 and 15 students got marks between 20 and 30. Maximum marks in the test were 50, and 7 students got marks between 40 and 50. Tabulate the data in class intervals of size 10 and calculate the missing frequencies.

36. Weekly wages for a group of 100 persons are given below:

Wages in (₹)      0-5   5-10   10-15   15-20   20-25
No. of Persons    7 25 30

The 3rd decile for this group is ₹ 11. Calculate the missing frequency.

37. Do you agree with the following?
(a) Rate for a certain commodity in the first week is 4 kilos for a rupee and in the second week is 8 kilos for a rupee. So the average price is (4+8)/2 = 6 kilos for a rupee.
(b) Usually the attendance of B.Com 1st year class in a college is 60 students per day. Therefore the total attendance for 100 working days is 6000.
(c) An ordinary person consumes 30 g of salt per week. So 32 crores of persons living in India will consume 19.2 crore kg of salt in 5 months. (1 month = 4 weeks)
(d) The increase in the price of commodity x was 20%. Then the price decreased 25%, and again increased 15%. So the resultant increase is (i) 15%, and (ii) 10%.
(e) The rate of increase in the number of cows in India is greater than that of population. So the people of India are now getting more milk per head.

38. (a) The average rainfall for a week excluding Sunday was 0.50 inches. Due to heavy rainfall on Sunday the average for the week rose to 1.5 inches. How much rainfall was there on Sunday?
(b) A train runs 25 miles at a speed of 30 mph, another 50 miles at a speed of 50 mph, then due to repairs of the track, travels for 6 minutes at a speed of 10 mph and finally covers the remaining distance of 25 miles at a speed of 25 mph. What is the average speed in miles per hour?
(c) The annual rates of growth of output of a factory in 5 years are 5.0, 7.5, 2.5, 5.0 and 10.0 per cent respectively. What is the compound rate of growth of output per annum for the period?

39. The population of a country was 300 million in 1951. It became 520 million in 1969. Calculate the percentage compound rate of growth per annum.

3.21 FURTHER READING

Kothari, C.R. 1984. Quantitative Techniques, 3rd Edition. New Delhi: Vikas Publishing House Pvt. Ltd.
Chandan, J.S. 1998. Statistics for Business and Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Chandan, J.S., Jagjit Singh and K.K. Khanna. 1995. Business Statistics, 2nd Edition. New Delhi: Vikas Publishing House Pvt. Ltd.

UNIT 4 MEASURES OF DISPERSION

Structure
4.0 Introduction
4.1 Unit Objectives
4.2 Measure of Dispersion: Definition
4.3 Range
4.4 Quartile Deviation
4.5 Mean Deviation
4.6 Coefficient of Mean Deviation
4.7 Standard Deviation
4.8 Calculation of Standard Deviation by Short-cut Method
4.9 Combining Standard Deviations of Two Distributions
4.10 Comparison of Various Measures of Dispersion
4.11 Solved Problems
4.12 Summary
4.13 Answers to 'Check Your Progress'
4.14 Questions and Exercises
4.15 Further Reading

4.0 INTRODUCTION

In this unit, you will learn about the measures of dispersion. A measure of central tendency is computed to see through the variability or dispersion of the individual values. But dispersion is in itself a very important property of a distribution and needs to be measured by an appropriate statistic. In this unit, you will learn that a measure of dispersion can be expressed in an 'absolute form', or in a 'relative form'. It is said to be in an absolute form when it states the actual amount by which the value of an item on an average deviates from a measure of central tendency. A relative measure of dispersion is a quotient obtained by dividing the absolute measure by the quantity in respect to which the absolute deviation has been computed. Relative measures are used for making comparisons between two or more distributions. The common measures of dispersion are the range, the semi-interquartile range or the quartile deviation, the mean deviation and the standard deviation. Of these, the standard deviation is the best measure. All these measures are discussed in this unit.
4.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Define the concept of measures of dispersion and its significance in statistical analysis
• Define the range of a distribution
• Differentiate between quartile deviation and standard deviation
• Describe how to calculate the coefficient of mean deviation and the standard deviation
• Calculate standard deviation by the short-cut method
• Combine the standard deviations of two distributions
• Compare various measures of dispersion

4.2 MEASURES OF DISPERSION: DEFINITION

A measure of dispersion, or simply dispersion, may be defined as a statistic signifying the extent of the scatteredness of items around a measure of central tendency.

A measure of dispersion may be expressed in an 'absolute form' or in a 'relative form'. It is said to be in an absolute form when it states the actual amount by which the value of an item, on an average, deviates from a measure of central tendency. Absolute measures are expressed in concrete units, i.e., the units in which the data have been expressed, e.g., rupees, centimetres, kilograms, etc., and are used to describe a frequency distribution.

A relative measure of dispersion is a quotient obtained by dividing the absolute measure by the quantity with respect to which the absolute deviation has been computed. It is, as such, a pure number and is usually expressed as a percentage. Relative measures are used for making comparisons between two or more distributions.

A measure of dispersion should possess the following characteristics, which are also considered essential for a measure of central tendency:
(a) It should be based on all observations.
(b) It should be readily comprehensible.
(c) It should be fairly easy to calculate.
(d) It should be affected as little as possible by fluctuations of sampling.
(e) It should be amenable to algebraic treatment.

The following are the common measures of dispersion:
(i) The range,
(ii) The semi-interquartile range or the quartile deviation,
(iii) The mean deviation, and
(iv) The standard deviation.

Of these, the standard deviation is the best measure. All these measures are discussed in this unit.

4.3 RANGE

The crudest measure of dispersion is the range of the distribution. The range of any series is the difference between the highest and the lowest values in the series.
If the marks received in an examination taken by 248 students are arranged in ascending order, then the range will be equal to the difference between the highest and the lowest marks.

In a frequency distribution, the range is taken to be the difference between the lower limit of the class at the lower extreme of the distribution and the upper limit of the class at the upper extreme.

Table 4.1 Weekly Earnings of Labourers in Four Workshops of the Same Type

                            No. of Workers
Weekly earnings   Workshop A   Workshop B   Workshop C   Workshop D
15-16                ...          ...            2          ...
17-18                ...            2            4          ...
19-20                ...            4            4            4
21-22                 10           10           10           14
23-24                 22           14           14           16
25-26                 20           18           16           16
27-28                 14           16           12           12
29-30                 14           10            6           12
31-32                ...            6            6            4
33-34                ...          ...            2            2
35-36                ...          ...            2          ...
37-38                ...          ...            2          ...
Total                 80           80           80           80
Mean                25.5         25.5         25.5         25.5

Consider the data on the weekly earnings of workers in four workshops given in Table 4.1. We note the following:

Workshop   Range
A            9
B           15
C           23
D           15

From these figures, it is clear that the greater the range, the greater is the variation of the values in the group.

The range is a measure of absolute dispersion and as such cannot be usefully employed for comparing the variability of two distributions expressed in different units. The amount of dispersion measured, say, in pounds, is not comparable with dispersion measured in inches. So the need for measuring relative dispersion arises.

An absolute measure can be converted into a relative measure if we divide it by some other value regarded as a standard for the purpose. We may use the mean of the distribution or any other positional average as the standard.

For Table 4.1, the relative dispersion would be:

Workshop A = 9/25.5      Workshop C = 23/25.5
Workshop B = 15/25.5     Workshop D = 15/25.5

An alternative method of converting an absolute variation into a relative one is to use the total of the extremes as the standard. This is equivalent to dividing the difference of the extreme items by the total of the extreme items. Thus,

$$\text{Relative Dispersion} = \frac{\text{Difference of extreme items, i.e., Range}}{\text{Sum of extreme items}}$$

The relative dispersion of the series is called the coefficient or ratio of dispersion. In our example of weekly earnings of workers considered earlier, the coefficients would be:

Workshop A = 9/(21 + 30) = 9/51      Workshop B = 15/(17 + 32) = 15/49
Workshop C = 23/(15 + 38) = 23/53    Workshop D = 15/(19 + 34) = 15/53

Merits and Limitations of Range

Merits. Of the various characteristics that a good measure of dispersion should possess, the range has only two, viz. (i) it is easy to understand, and (ii) its computation is simple.

Limitations.
Besides the aforesaid two qualities, the range does not satisfy the other tests of a good measure, and hence it is often termed a crude measure of dispersion. The following limitations are inherent in the range as a concept of variability:
(i) Since it is based upon the two extreme cases in the entire distribution, the range may be considerably changed if either of the extreme cases happens to drop out, while the removal of any other case would not affect it at all.
(ii) It does not tell anything about the distribution of values in the series relative to a measure of central tendency.
(iii) It cannot be computed when the distribution has open-end classes.
(iv) It does not take into account the entire data.

These limitations can be illustrated with the data given in Table 4.2.

Table 4.2 Distribution with the Same Number of Cases, but Different Variability

Class; No. of Students: Section A, Section B, Section C
0-10 10-20 ... 20-30 ... 30-40 1 ... 40-50 12 19 50-60 17 ... 18 60-70 29 16 70-80 18 12 18 80-90 16 20 18 90-100 6 35 21 11 25 10 ... 8 ...
Total   110   110   110
Range    80    60    60

The table is designed to illustrate three distributions with the same number of cases but different variability. The removal of the two extreme students from section A would make its range equal to that of B or C. The greater range of A is not a description of the entire group of 110 students, but of the two most extreme students only. Further, though sections B and C have the same range, the students in section B cluster more closely around the central tendency of the group than they do in section C. Thus, the range fails to reveal the greater homogeneity of B or the greater dispersion of C. Due to this defect, it is seldom used as a measure of dispersion.

Specific Uses of Range

In spite of the numerous limitations of the range as a measure of dispersion, there are circumstances in which it is the most appropriate one:
(a) In situations where the extremes involve some hazard for which preparation should be made, it may be more important to know the most extreme cases to be encountered than to know anything else about the distribution. For example, an explorer would like to know the lowest and the highest temperatures on record in the region he is about to enter, and an engineer would like to know the maximum rainfall during 24 hours when designing a storm water drain.
(b) In the study of prices of securities, the range has a special field of activity. Thus, to highlight fluctuations in the prices of shares or bullion, it is a common practice to indicate the range over which the prices have moved during a certain period of time.
This information, besides being of use to the operators, gives an indication of the stability of the bullion market, or that of the investment climate.
(c) In statistical quality control, the range is used as a measure of variation. For example, we determine the range over which variations in quality are due to random causes, and this is made the basis for fixing control limits.

4.4 QUARTILE DEVIATION (QD)

Another measure of dispersion, much better than the range, is the semi-interquartile range, usually termed the 'quartile deviation'. As stated in the previous unit, quartiles are the points which divide the array into four equal parts. More precisely, $Q_1$ gives the value of the item 1/4th of the way up the distribution and $Q_3$ the value of the item 3/4th of the way up the distribution. Between $Q_1$ and $Q_3$ are included half the total number of items. The difference between $Q_1$ and $Q_3$ includes only the central items and excludes the extremes. Since under most circumstances the central half of the series tends to be fairly typical of all the items, the interquartile range $(Q_3 - Q_1)$ affords a convenient and often a good indicator of the absolute variability. The larger the interquartile range, the larger the variability.

Usually, one-half of the difference between $Q_3$ and $Q_1$ is used, and it is given the name quartile deviation or semi-interquartile range. The interquartile range is divided by two because half of the interquartile range will, in a normal distribution, be equal to the difference between the median and either quartile. This means that 50 per cent of the items of a normal distribution will lie within the interval defined by the median plus and minus the semi-interquartile range.

Symbolically,

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

Let us find the quartile deviations for the weekly earnings of labourers in the four workshops whose data are given in Table 4.1. The computations are shown in Table 4.3.

As shown in the table, the Q.D. of workshop A is ₹2.12 and the median value is 25.3. This means that if the distribution is symmetrical, the number of workers whose wages lie between (25.3 - 2.1) = ₹23.2 and (25.3 + 2.1) = ₹27.4 shall be just half of the total cases. The other half of the workers will be more than ₹2.1 removed from the median wage. As this distribution is not symmetrical, the distance between $Q_1$ and the median $Q_2$ is not the same as that between $Q_3$ and the median. Hence, the interval defined by the median plus and minus the semi-interquartile range will not be exactly the same as that given by the values of the two quartiles.
Under such conditions the range between ₹23.2 and ₹27.4 will not include precisely 50 per cent of the workers.

If the quartile deviation is to be used for comparing the variability of any two series, it is necessary to convert the absolute measure to a coefficient of quartile deviation. To do this, the absolute measure is divided by the average size of the two quartiles.

Symbolically,

$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Applying this to our illustration of four workshops, the coefficients of Q.D. are as given in Table 4.3.

Table 4.3 Calculation of Quartile Deviation

Location of Q2 = N/2 = 80/2 = 40 (all four workshops)
  Workshop A: Q2 = 24.5 + (40 - 32)/20 × 2 = 24.5 + 0.8  = 25.3
  Workshop B: Q2 = 24.5 + (40 - 30)/18 × 2 = 24.5 + 1.11 = 25.61
  Workshop C: Q2 = 24.5 + (40 - 34)/16 × 2 = 24.5 + 0.75 = 25.25
  Workshop D: Q2 = 24.5 + (40 - 34)/16 × 2 = 24.5 + 0.75 = 25.25

Location of Q1 = N/4 = 80/4 = 20 (all four workshops)
  Workshop A: Q1 = 22.5 + (20 - 10)/22 × 2 = 22.5 + 0.91 = 23.41
  Workshop B: Q1 = 22.5 + (20 - 16)/14 × 2 = 22.5 + 0.57 = 23.07
  Workshop C: Q1 = 20.5 + (20 - 10)/10 × 2 = 20.5 + 2    = 22.5
  Workshop D: Q1 = 22.5 + (20 - 18)/16 × 2 = 22.5 + 0.25 = 22.75

Location of Q3 = 3N/4 = (3 × 80)/4 = 60 (all four workshops)
  Workshop A: Q3 = 26.5 + (60 - 52)/14 × 2 = 26.5 + 1.14 = 27.64
  Workshop B: Q3 = 26.5 + (60 - 48)/16 × 2 = 26.5 + 1.5  = 28.0
  Workshop C: Q3 = 26.5 + (60 - 50)/12 × 2 = 26.5 + 1.67 = 28.17
  Workshop D: Q3 = 26.5 + (60 - 50)/12 × 2 = 26.5 + 1.67 = 28.17

Quartile Deviation = (Q3 - Q1)/2
  Workshop A: (27.64 - 23.41)/2 = 4.23/2 = ₹2.12
  Workshop B: (28 - 23.07)/2    = 4.93/2 = ₹2.46
  Workshop C: (28.17 - 22.5)/2  = 5.67/2 = ₹2.83
  Workshop D: (28.17 - 22.75)/2 = 5.42/2 = ₹2.71

Coefficient of Quartile Deviation = (Q3 - Q1)/(Q3 + Q1)
  Workshop A: (27.64 - 23.41)/(27.64 + 23.41) = 0.083
  Workshop B: (28 - 23.07)/(28 + 23.07)       = 0.097
  Workshop C: (28.17 - 22.5)/(28.17 + 22.5)   = 0.112
  Workshop D: (28.17 - 22.75)/(28.17 + 22.75) = 0.106

Characteristics of Quartile Deviation
(i) The size of the quartile deviation gives an indication of the uniformity, or otherwise, of the size of the items of a distribution. If the quartile deviation is small, it denotes large uniformity. Thus, a coefficient of quartile deviation may be used for comparing uniformity or variation in different distributions.
(ii) Quartile deviation is not a measure of dispersion in the sense that it does not show the scatter around an average, but only a distance on a scale. Consequently, quartile deviation is regarded as a measure of partition.
(iii) It can be computed when the distribution has open-end classes.

Limitations of Quartile Deviation
Except for the fact that its computation is simple and it is easy to understand, a quartile deviation does not satisfy any other test of a good measure of variation.

Check Your Progress
1. What is absolute measure of dispersion?
2. What is relative measure of dispersion?
3. Define range.

4.5 MEAN DEVIATION (MD)

A weakness of the measures of dispersion discussed earlier, based upon the range or a portion thereof, is that the precise size of most of the variates has no effect on the result.
As an illustration, the quartile deviation will be the same whether the variates between $Q_1$ and $Q_3$ are concentrated just above $Q_1$ or are spread uniformly from $Q_1$ to $Q_3$. This is an important defect from the viewpoint of measuring the divergence of the distribution from its typical value. The mean deviation is employed to answer this objection.

The mean deviation, also called the average deviation, of a frequency distribution is the mean of the absolute values of the deviations from some measure of central tendency. In other words, the mean deviation is the arithmetic average of the variations (deviations) of the individual items of the series from a measure of their central tendency.

We can measure the deviations from any measure of central tendency, but the most commonly employed ones are the median and the mean. The median is preferred because it has the important property that the average deviation from it is the least.
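The definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mean_deviation` and its `about` parameter are assumptions made here, not notation from the text, and the sample values are the eleven marks used in Example 4.1.

```python
from statistics import mean, median


def mean_deviation(values, about="median"):
    """Average of the absolute deviations of `values` from their median or mean."""
    centre = median(values) if about == "median" else mean(values)
    return sum(abs(x - centre) for x in values) / len(values)


# Marks of the 11 students in Example 4.1
marks = [14, 15, 23, 20, 10, 30, 19, 18, 16, 25, 12]

md_median = mean_deviation(marks, about="median")  # deviations taken from the median
md_mean = mean_deviation(marks, about="mean")      # deviations taken from the mean

print(round(md_median, 2), round(md_mean, 2))
```

For these marks the deviation about the median comes out slightly smaller than the deviation about the mean, in line with the property stated above.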

Calculation of the mean deviation then involves the following steps:
(a) Calculate the median or the mean, $M_d$ or $\bar{X}$.
(b) Record the deviations $|d| = |x - M_d|$ of each of the items, ignoring the sign.
(c) Find the average value of the deviations:

$$\text{Mean Deviation} = \frac{\sum |d|}{N}$$

Example 4.1: Calculate the mean deviation from the following data giving the marks obtained by 11 students in a class test.

14, 15, 23, 20, 10, 30, 19, 18, 16, 25, 12

Solution: Median = size of the $\frac{11 + 1}{2}$th item = size of the 6th item = 18

Serial No.   Marks   |x - Median| = |d|
 1            10        8
 2            12        6
 3            14        4
 4            15        3
 5            16        2
 6            18        0
 7            19        1
 8            20        2
 9            23        5
10            25        7
11            30       12
                     Σ|d| = 50

$$\text{Mean Deviation from Median} = \frac{\sum |d|}{N} = \frac{50}{11} = 4.5 \text{ marks}$$

For grouped data, it is easy to see that the mean deviation is given by

$$\text{Mean Deviation (M.D.)} = \frac{\sum f|d|}{\sum f}$$

where $|d| = |x - \text{median}|$ for grouped discrete data, and $|d| = |M - \text{median}|$ for grouped continuous data, with $M$ as the mid-value of a particular group. The following examples illustrate the use of this formula.

Example 4.2: Calculate the mean deviation from the following data:

Size of item   6   7   8   9   10   11   12
Frequency      3   6   9  13    8    5    4

Solution:

Size   Frequency (f)   Cumulative frequency   Deviation from median |d|   f|d|
 6          3                  3                       3                    9
 7          6                  9                       2                   12
 8          9                 18                       1                    9
 9         13                 31                       0                    0
10          8                 39                       1                    8
11          5                 44                       2                   10
12          4                 48                       3                   12
           48                                                              60

Median = size of the $\frac{48 + 1}{2}$ = 24.5th item, which is 9.

Therefore, deviations (d) are calculated from 9, i.e., $|d| = |x - 9|$.

$$\text{Mean Deviation} = \frac{\sum f|d|}{\sum f} = \frac{60}{48} = 1.25$$

Example 4.3: Calculate the mean deviation from the following data:

X   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
f    18      16      15      12      10       5       2       2

Solution: This is a frequency distribution with a continuous variable. Thus, deviations are calculated from midvalues.

X       Midvalue    f    Less than c.f.   Deviation from median |d|   f|d|
0-10        5      18        18                   19                   342
10-20      15      16        34                    9                   144
20-30      25      15        49                    1                    15
30-40      35      12        61                   11                   132
40-50      45      10        71                   21                   210
50-60      55       5        76                   31                   155
60-70      65       2        78                   41                    82
70-80      75       2        80                   51                   102
                   80                                                 1182

Median = size of the $\frac{80}{2}$th item $= 20 + \frac{6}{15} \times 10 = 24$

and then,

$$\text{Mean Deviation} = \frac{\sum f|d|}{\sum f} = \frac{1182}{80} = 14.775$$

Merits and Demerits of the Mean Deviation

Merits
(i) It is easy to understand.
(ii) As compared to standard deviation (discussed later), its computation is simple.
(iii) As compared to standard deviation, it is less affected by extreme values.
(iv) Since it is based on all values in the distribution, it is better than the range or quartile deviation.

Demerits
(i) It lacks those algebraic properties which would facilitate its computation and establish its relation to other measures.
(ii) Due to this, it is not suitable for further mathematical processing.

4.6 COEFFICIENT OF MEAN DEVIATION

The coefficient, or relative dispersion, is found by dividing the mean deviation by the mean (if the deviations were recorded from the mean) or by the median (if the deviations were recorded from the median). Thus,

$$\text{Coefficient of M.D.} = \frac{\text{Mean Deviation}}{\text{Mean}} \quad \text{(when deviations were recorded from the mean)}$$

$$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{\text{Median}} \quad \text{(when deviations were recorded from the median)}$$

Applying the above formula to Example 4.3,

$$\text{Coefficient of Mean Deviation} = \frac{14.775}{24} = 0.616$$

4.7 STANDARD DEVIATION

By far the most universally used and the most useful measure of dispersion is the standard deviation, or root mean square deviation about the mean. We have seen that none of the methods of measuring dispersion discussed so far is universally adopted, for want of adequacy and accuracy. The range is not satisfactory as its magnitude is determined by the most extreme cases in the entire group. Further, the range is not stable because it depends on items whose size is largely a matter of chance. The mean deviation is also an unsatisfactory measure of scatter, as it ignores the algebraic signs of the deviations. We desire a measure of scatter which is free from these shortcomings. To some extent, the standard deviation is one such measure.

The calculation of the standard deviation differs in the following respects from that of the mean deviation.
First, in calculating standard deviation, the deviations are squared. This is done so as to get rid of negative signs without committing algebraic violence. Further, the squaring of deviations provides added weight to the extreme items, a desirable feature for certain types of series.

Secondly, the deviations are always recorded from the arithmetic mean, because although the sum of deviations is the minimum from the median, the sum of squares of deviations is minimum when deviations are measured from the arithmetic average. The deviation from $\bar{x}$ is represented by $d$.

Thus, the standard deviation, $\sigma$ (sigma), is defined as the square root of the mean of the squares of the deviations of individual items from their arithmetic mean.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} \tag{4.1}$$

For grouped data (discrete variables),

$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} \tag{4.2}$$

and, for grouped data (continuous variables),

$$\sigma = \sqrt{\frac{\sum f(M - \bar{x})^2}{\sum f}} \tag{4.3}$$

where $M$ is the midvalue of the group.

The use of these formulae is illustrated by the following examples.

Example 4.4: Compute the standard deviation for the following data:

11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21

Solution: Here formula (4.1) is appropriate. We first calculate the mean as $\bar{x} = \sum x / N = 176/11 = 16$, and then calculate the deviations as follows:

  x     (x - x̄)   (x - x̄)²
 11       -5         25
 12       -4         16
 13       -3          9
 14       -2          4
 15       -1          1
 16        0          0
 17       +1          1
 18       +2          4
 19       +3          9
 20       +4         16
 21       +5         25
176                 110

Thus, by formula (4.1),

$$\sigma = \sqrt{\frac{110}{11}} = \sqrt{10} = 3.16$$

Example 4.5: Find the standard deviation of the data in the following distribution:

x   12   13   14   15   16   17   18   20
f    4   11   32   21   15    8    5    4

Solution: For this discrete variable grouped data, we use formula (4.2). Since for the calculation of $\bar{x}$ we need $\sum fx$, and then for $\sigma$ we need $\sum f(x - \bar{x})^2$, the calculations are conveniently made in the following format.

  x     f     fx    d = x - x̄    d²    fd²
 12     4     48       -3         9     36
 13    11    143       -2         4     44
 14    32    448       -1         1     32
 15    21    315        0         0      0
 16    15    240        1         1     15
 17     8    136        2         4     32
 18     5     90        3         9     45
 20     4     80        5        25    100
      100   1500                       304

Here $\bar{x} = \sum fx / \sum f = 1500/100 = 15$

and

$$\sigma = \sqrt{\frac{\sum fd^2}{\sum f}} = \sqrt{\frac{304}{100}} = \sqrt{3.04} = 1.74$$

Check Your Progress
4. How is the variability of any two series compared using quartile deviation?
5. Explain mean deviation.
6. What are the merits of mean deviation?
7. Define standard deviation.

Example 4.6: Calculate the standard deviation of the following data:

Class       1-3   3-5   5-7   7-9   9-11   11-13   13-15
Frequency    1     9    25    35     17      10       3

Solution: This is an example of a continuous frequency series, and formula (4.3) is appropriate.

Class    Midpoint (x)   Frequency (f)    fx    Deviation from mean (d)    d²    fd²
1-3          2               1            2          -6                   36     36
3-5          4               9           36          -4                   16    144
5-7          6              25          150          -2                    4    100
7-9          8              35          280           0                    0      0
9-11        10              17          170           2                    4     68
11-13       12              10          120           4                   16    160
13-15       14               3           42           6                   36    108
                           100          800                                     616

First the mean is calculated as $\bar{x} = \sum fx / \sum f = 800/100 = 8.0$.

Then the deviations are obtained from 8.0. The standard deviation is

$$\sigma = \sqrt{\frac{\sum f(M - \bar{x})^2}{\sum f}} = \sqrt{\frac{616}{100}} = 2.48$$
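Formulae (4.2) and (4.3) can be checked with a short Python sketch. The helper name `grouped_std` is an assumption introduced here for illustration; the data are the values and frequencies from the working tables of Examples 4.5 and 4.6 (class midpoints stand in for the classes in the continuous case).

```python
from math import sqrt


def grouped_std(values, freqs):
    """Return (mean, standard deviation) of values weighted by their frequencies.

    For a continuous series, pass the class midpoints as `values`.
    The divisor is the total frequency, as in formulae (4.2) and (4.3).
    """
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, sqrt(variance)


# Example 4.5: discrete values with frequencies
mean_45, sd_45 = grouped_std([12, 13, 14, 15, 16, 17, 18, 20],
                             [4, 11, 32, 21, 15, 8, 5, 4])

# Example 4.6: class midpoints of 1-3, 3-5, ..., 13-15
mean_46, sd_46 = grouped_std([2, 4, 6, 8, 10, 12, 14],
                             [1, 9, 25, 35, 17, 10, 3])

print(mean_45, round(sd_45, 2))
print(mean_46, round(sd_46, 2))
```

Note that Python's `statistics.pstdev` uses the same divisor N, whereas `statistics.stdev` divides by N - 1 and would give slightly larger values.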

4.8 CALCULATION OF STANDARD DEVIATION BY SHORT-CUT METHOD

The three examples worked out above have one common simplifying feature, namely that $\bar{x}$ in each turned out to be an integer, thus simplifying the calculations. In most cases, it is very unlikely that it will turn out to be so. In such cases, the calculation of $d$ and $d^2$ becomes quite time-consuming. Short-cut methods have consequently been developed. These are on the same lines as those for the calculation of the mean itself.

In the short-cut method, we calculate deviations $x'$ from an assumed mean $A$. Then, for ungrouped data,

$$\sigma = \sqrt{\frac{\sum x'^2}{N} - \left(\frac{\sum x'}{N}\right)^2} \tag{4.4}$$

and for grouped data,

$$\sigma = \sqrt{\frac{\sum fx'^2}{\sum f} - \left(\frac{\sum fx'}{\sum f}\right)^2} \tag{4.5}$$

This formula is valid for both discrete and continuous variables. In the case of continuous variables, $x$ in the equation $x' = x - A$ stands for the midvalue of the class in question.

Note that the second term in each of the formulae is a correction term, needed because of the difference between the values of $A$ and $\bar{x}$. When $A$ is taken as $\bar{x}$ itself, this correction is automatically reduced to zero. The following examples explain the use of these formulae.

Example 4.7: Compute the standard deviation by the short-cut method for the following data:

11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21

Solution: Let us assume that A = 15.

  x     x' = (x - 15)    x'²
 11         -4            16
 12         -3             9
 13         -2             4
 14         -1             1
 15          0             0
 16          1             1
 17          2             4
 18          3             9
 19          4            16
 20          5            25
 21          6            36
N = 11    Σx' = 11     Σx'² = 121

$$\sigma = \sqrt{\frac{\sum x'^2}{N} - \left(\frac{\sum x'}{N}\right)^2} = \sqrt{\frac{121}{11} - \left(\frac{11}{11}\right)^2} = \sqrt{11 - 1} = \sqrt{10} = 3.16$$

Another Method: If we take A as zero, then the deviation of each item from the assumed mean is the same as the value of the item itself. Thus, 11 deviates from the assumed mean of zero by 11, 12 deviates by 12, and so on. As such, we work with the deviations without having to compute them, and the calculation takes the following shape:

  x      x²
 11     121
 12     144
 13     169
 14     196
 15     225
 16     256
 17     289
 18     324
 19     361
 20     400
 21     441
176    2926

$$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{2926}{11} - \left(\frac{176}{11}\right)^2} = \sqrt{266 - 256} = 3.16$$

Check Your Progress
8. Calculate standard deviation for the series 1, 2, 3, 5, 7.

Example 4.8: Calculate the standard deviation of the following data by the short-cut method.

Person                    1     2     3     4     5     6     7
Monthly income (Rupees)  300   400   420   440   460   480   580

Solution: In this data, the values of the variable are very large, making the calculations cumbersome. It is advantageous to take a common factor out. Thus, we use $x' = \frac{x - A}{20}$. The standard deviation is calculated using $x'$, and then the true value of $\sigma$ is obtained by multiplying back by 20. The effective formula used is

$$\sigma = C \times \sqrt{\frac{\sum x'^2}{N} - \left(\frac{\sum x'}{N}\right)^2}$$

where $C$ represents the common factor.

Using $x' = (x - 420)/20$:

  x     Deviation from assumed mean (x - 420)    x'    x'²
 300              -120                           -6     36
 400               -20                           -1      1
 420                 0                            0      0
 440                20                            1      1
 460                40                            2      4
 480                60                            3      9
 580               160                            8     64
N = 7                                         Σx' = 7   Σx'² = 115

$$\sigma = 20\sqrt{\frac{115}{7} - \left(\frac{7}{7}\right)^2} = 78.56$$

Example 4.9: Calculate the standard deviation from the following data:

Size         6    9   12   15   18
Frequency    7   12   19   10    2

Solution:

  x    Frequency (f)   Deviation from assumed mean 12   Deviation divided by common factor 3 (x')    fx'    fx'²
  6         7                    -6                                -2                               -14     28
  9        12                    -3                                -1                               -12     12
 12        19                     0                                 0                                 0      0
 15        10                     3                                 1                                10     10
 18         2                     6                                 2                                 4      8
        N = 50                                                                              Σfx' = -12   Σfx'² = 58

Since the deviations have been divided by a common factor, we use

$$\sigma = C \times \sqrt{\frac{\sum fx'^2}{N} - \left(\frac{\sum fx'}{N}\right)^2} = 3\sqrt{\frac{58}{50} - \left(\frac{-12}{50}\right)^2} = 3\sqrt{1.1600 - 0.0576} = 3 \times 1.05 = 3.15$$

Example 4.10: Obtain the mean and standard deviation of the first N natural numbers, i.e., of 1, 2, 3, ..., N - 1, N.

Solution: Let $x$ denote the variable which assumes the values of the first N natural numbers. Then,

$$\sum x = 1 + 2 + 3 + \cdots + (N - 1) + N = \frac{N(N + 1)}{2}$$

Hence,

$$\bar{x} = \frac{\sum x}{N} = \frac{N(N + 1)}{2N} = \frac{N + 1}{2}$$

To calculate the standard deviation $\sigma$, we use 0 as the assumed mean $A$. Then,

$$\sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2$$

But,

$$\sum x^2 = 1^2 + 2^2 + \cdots + N^2 = \frac{N(N + 1)(2N + 1)}{6}$$

Therefore,

$$\sigma^2 = \frac{N(N + 1)(2N + 1)}{6N} - \frac{N^2(N + 1)^2}{4N^2} = \frac{N + 1}{2}\left[\frac{2N + 1}{3} - \frac{N + 1}{2}\right] = \frac{(N + 1)(N - 1)}{12}$$

Thus, for the first 11 natural numbers,

$$\bar{x} = \frac{11 + 1}{2} = 6$$

and

$$\sigma = \sqrt{\frac{(11 + 1)(11 - 1)}{12}} = \sqrt{10} = 3.16$$

Example 4.11:

Class    Midpoint (x)   Frequency (f)   Deviation from assumed mean in class-interval units (x')    fx'    fx'²
0-10          5             18                 -2                                                   -36     72
10-20        15             16                 -1                                                   -16     16
20-30        25             15                  0                                                     0      0
30-40        35             12                  1                                                    12     12
40-50        45             10                  2                                                    20     40
50-60        55              5                  3                                                    15     45
60-70        65              2                  4                                                     8     32
70-80        75              1                  5                                                     5     25
                         Σf = 79                                                                Σfx' = 8   Σfx'² = 242

Solution: Since the deviations are from an assumed mean and are expressed in terms of class-interval units,

$$\sigma = i \times \sqrt{\frac{\sum fx'^2}{N} - \left(\frac{\sum fx'}{N}\right)^2} = 10 \times \sqrt{\frac{242}{79} - \left(\frac{8}{79}\right)^2} = 10 \times 1.75 = 17.5$$
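The short-cut formulae (4.4) and (4.5), including the common-factor variant, can be sketched in Python as below. The function name `shortcut_std` and its parameters are assumptions made for illustration. The sketch also shows that the choice of assumed mean does not change the result, because the correction term compensates for it.

```python
from math import sqrt


def shortcut_std(values, freqs, assumed_mean, factor=1):
    """Standard deviation from step deviations x' = (x - A) / C."""
    n = sum(freqs)
    steps = [(x - assumed_mean) / factor for x in values]
    sum_fx = sum(f * d for d, f in zip(steps, freqs))        # sum of f * x'
    sum_fx2 = sum(f * d * d for d, f in zip(steps, freqs))   # sum of f * x'^2
    # Scale back by the common factor C; the second term is the correction term.
    return factor * sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


# Example 4.7: ungrouped data (every frequency is 1), A = 15
sd_47 = shortcut_std(list(range(11, 22)), [1] * 11, assumed_mean=15)

# Example 4.9: A = 12, common factor C = 3
sizes, freqs = [6, 9, 12, 15, 18], [7, 12, 19, 10, 2]
sd_49 = shortcut_std(sizes, freqs, assumed_mean=12, factor=3)

# Same data with A = 0 and no common factor: the result should be identical
sd_49_zero = shortcut_std(sizes, freqs, assumed_mean=0)

print(round(sd_47, 2), round(sd_49, 2), round(sd_49_zero, 2))
```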

4.9 COMBINING STANDARD DEVIATIONS OF TWO DISTRIBUTIONS

If we are given two sets of data of $N_1$ and $N_2$ items with means $\bar{x}_1$ and $\bar{x}_2$ and standard deviations $\sigma_1$ and $\sigma_2$ respectively, we can obtain the mean $\bar{x}$ and standard deviation $\sigma$ of the combined distribution by the following formulae:

$$\bar{x} = \frac{N_1\bar{x}_1 + N_2\bar{x}_2}{N_1 + N_2} \tag{4.6}$$

and

$$\sigma = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1(\bar{x} - \bar{x}_1)^2 + N_2(\bar{x} - \bar{x}_2)^2}{N_1 + N_2}} \tag{4.7}$$

Example 4.12: The mean and standard deviations of two distributions of 100 and 150 items are 50, 5 and 40, 6 respectively. Find the standard deviation of all taken together.

Solution: Combined mean,

$$\bar{x} = \frac{100 \times 50 + 150 \times 40}{100 + 150} = 44$$

Combined standard deviation,

$$\sigma = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1(\bar{x} - \bar{x}_1)^2 + N_2(\bar{x} - \bar{x}_2)^2}{N_1 + N_2}} = \sqrt{\frac{100 \times (5)^2 + 150 \times (6)^2 + 100(44 - 50)^2 + 150(44 - 40)^2}{100 + 150}} = 7.46$$

Example 4.13: A distribution consists of three components with 200, 250 and 300 items having means 25, 10 and 15 and standard deviations 3, 4 and 5, respectively. Find the standard deviation of the combined distribution.

Solution: In the usual notation, we are given here:

$N_1 = 200$, $N_2 = 250$, $N_3 = 300$
$\bar{x}_1 = 25$, $\bar{x}_2 = 10$, $\bar{x}_3 = 15$

The formulae (4.6) and (4.7) can easily be extended to the combination of three series as

$$\bar{x} = \frac{N_1\bar{x}_1 + N_2\bar{x}_2 + N_3\bar{x}_3}{N_1 + N_2 + N_3} = \frac{200 \times 25 + 250 \times 10 + 300 \times 15}{200 + 250 + 300} = \frac{12000}{750} = 16$$

and,

$$\sigma = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2 + N_1(\bar{x} - \bar{x}_1)^2 + N_2(\bar{x} - \bar{x}_2)^2 + N_3(\bar{x} - \bar{x}_3)^2}{N_1 + N_2 + N_3}}$$

$$= \sqrt{\frac{200 \times 9 + 250 \times 16 + 300 \times 25 + 200 \times 81 + 250 \times 36 + 300 \times 1}{200 + 250 + 300}} = \sqrt{\frac{38800}{750}} = 7.19$$

Check Your Progress
9. For a group of 50 male workers, the mean and standard deviation of their weekly wages are ₹63 and ₹9 respectively. For a group of 40 female workers these are ₹54 and ₹6 respectively. Find the standard deviation of the combined group of 90 workers.
10. (a) Mean and standard deviations of two distributions of 100 and 150 items are 50, 5 and 40, 6 respectively. Find the mean and standard deviations of all the 250 items taken together.
(b) Mean and standard deviations of 100 items are found by a student as 9 and 5. If at the time of calculations two items are wrongly taken as 40 and 50 instead of 60 and 30, find the correct mean and standard deviations.

4.10 COMPARISON OF VARIOUS MEASURES OF DISPERSION

The range is the easiest measure of dispersion to calculate, but since it depends on extreme values, it is extremely sensitive to the size of the sample and to the sample variability. In fact, as the sample size increases the range tends to increase, because the more items one considers, the more likely it is that some item will turn up which is larger than the previous maximum or smaller than the previous minimum. So it is, in general, impossible to interpret properly the significance of a given range unless the sample size is constant. It is for this reason that there appears to be only one valid application of the range, namely in statistical quality control, where the same sample size is repeatedly used, so that comparisons of ranges are not distorted by differences in sample size.

The quartile deviation and other such positional measures of dispersion are also easy to calculate but suffer from the disadvantage that they are not amenable to algebraic treatment. Similarly, the mean deviation is not suitable because we cannot obtain the mean deviation of a combined series from the deviations of the component series. However, it is easy to interpret and easier to calculate than the standard deviation.

The standard deviation of a set of data, on the other hand, is one of the most important statistics describing it. It lends itself to rigorous algebraic treatment, is rigidly defined and is based on all observations. It is, therefore, quite insensitive to sample size (provided the size is 'large enough') and is least affected by sampling variations.

It is used extensively in testing hypotheses about population parameters based on sampling statistics.

In fact, the standard deviation has such stable mathematical properties that it is used as a standard scale for measuring deviations from the mean. If we are told that the performance of an individual is 10 points better than the mean, it really does not tell us enough, for 10 points may or may not be a large enough difference to be of significance. But if we know that the $\sigma$ for the score is only 4 points, so that on this scale the performance is 2.5$\sigma$ better than the mean, the statement becomes meaningful. This indicates an extremely good performance. This sigma scale is a very commonly used scale for measuring and specifying deviations, as it immediately suggests the significance of the deviation.

The only disadvantages of the standard deviation lie in the amount of work involved in its calculation, and in the large weight it attaches to extreme values because of the squaring involved in its calculation.
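The combination formulae (4.6) and (4.7) of Section 4.9 extend naturally to any number of component series, which the following Python sketch illustrates. The function name `combine` and the tuple layout `(n, mean, sd)` are assumptions chosen here; the inputs are those of Examples 4.12 and 4.13.

```python
from math import sqrt


def combine(groups):
    """Combine component series given as (n, mean, sd) tuples.

    Returns (combined mean, combined standard deviation).
    """
    total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total
    # Each series contributes its own variance plus the squared distance
    # of its mean from the combined mean, weighted by its size.
    variance = sum(n * (s ** 2 + (mean - m) ** 2) for n, m, s in groups) / total
    return mean, sqrt(variance)


# Example 4.12: two distributions
mean_412, sd_412 = combine([(100, 50, 5), (150, 40, 6)])

# Example 4.13: three components
mean_413, sd_413 = combine([(200, 25, 3), (250, 10, 4), (300, 15, 5)])

print(mean_412, round(sd_412, 2))
print(mean_413, round(sd_413, 2))
```

Only the sizes, means and standard deviations of the components are needed; the individual observations are not, which is the algebraic convenience of the standard deviation noted above.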

4.11 SOLVED PROBLEMS

Problem 1: The arithmetic mean and standard deviation of a series of 20 items were calculated by a student as 20 cm and 5 cm respectively. But while calculating them, an item 13 was misread as 30. Find the correct arithmetic mean and standard deviation.

Solution: In the usual notation, we are given $N = 20$, $\bar{X} = 20$ and $\sigma = 5$.

$\sum X = N\bar{X} = 20 \times 20 = 400$

Corrected $\sum X = 400 - 30 + 13 = 383$

$$\text{Corrected } \bar{X} = \frac{\text{Corrected } \sum X}{N} = \frac{383}{20} = 19.15$$

Also we know that

$$\sigma^2 = \frac{\sum X^2}{N} - (\bar{X})^2$$

or

$\sum X^2 = N(\sigma^2 + \bar{X}^2) = 20(25 + 400) = 8500$

Corrected $\sum X^2 = 8500 - (30)^2 + (13)^2 = 8500 - 900 + 169 = 7769$

$$\text{Corrected } \sigma^2 = \frac{\text{Corrected } \sum X^2}{N} - \left(\frac{\text{Corrected } \sum X}{N}\right)^2 = \frac{7769}{20} - \left(\frac{383}{20}\right)^2 = 388.45 - 366.72$$

$\sigma = 4.66$

Hence, the correct mean is 19.15 and the correct standard deviation is 4.66.

Problem 2: The mean and standard deviation of the following continuous series are 31 and 15.9 respectively. The distribution after taking step deviations is as follows:

x'   -3   -2   -1    0    1    2    3
f    10   15   25   25   10   10    5

Determine the actual class intervals.

Solution:

x'      -3    -2    -1     0     1     2     3    Total
f       10    15    25    25    10    10     5     100
fx'    -30   -30   -25     0    10    20    15     -40
fx'²    90    60    25     0    10    40    45     270

$$\text{Standard Deviation} = \sqrt{\frac{\sum fx'^2}{N} - \left(\frac{\sum fx'}{N}\right)^2} \times i, \quad N = 100$$

Putting in the known values, we have

$$15.9 = \sqrt{\frac{270}{100} - \left(\frac{-40}{100}\right)^2} \times i = \sqrt{2.70 - 0.16} \times i = 1.59 \times i$$

$$i = \frac{15.9}{1.59} = 10$$

$$\text{Arithmetic Mean} = A + \frac{\sum fx'}{N} \times i$$
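The correction carried out in Problem 1 can be sketched as a small Python helper. The function name `correct_mean_sd` and its parameters are assumptions introduced for illustration; the idea is simply to rebuild the sums of X and X² from the reported mean and standard deviation, swap the misread items for the true ones, and recompute.

```python
from math import sqrt


def correct_mean_sd(n, mean, sd, wrong, right):
    """Recompute mean and SD after replacing misread items with the true ones."""
    sum_x = n * mean - sum(wrong) + sum(right)
    # Sum of squares recovered from sigma^2 = (sum X^2)/N - mean^2
    sum_x2 = (n * (sd ** 2 + mean ** 2)
              - sum(w * w for w in wrong)
              + sum(r * r for r in right))
    new_mean = sum_x / n
    new_sd = sqrt(sum_x2 / n - new_mean ** 2)
    return new_mean, new_sd


# Problem 1: 20 items, reported mean 20 and SD 5, item 13 misread as 30
corrected_mean, corrected_sd = correct_mean_sd(20, 20, 5, wrong=[30], right=[13])

print(round(corrected_mean, 2), round(corrected_sd, 2))
```

Because `wrong` and `right` are lists, the same helper also covers cases where more than one item was misread.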
