BUSINESS STATISTICS MB 06
INTERNATIONAL COLLEGE OF FINANCIAL PLANNING™
Authors C.R. Kothari: Units (1,2,3,4,5,10,11,12)© CRKothari, 2011 J.S. Chandan: Units (6,7,9) © JS Chandan, 2011 KB Akhilesh & S Balasubrahmanyam: Units (8,13) ©Reserved, 2011 All rights reserved. No part of this publication which is material protected by this copyright notice may be reproduced or transmitted or utilized or stored in any form or by any means now known or hereinafter invented, electronic, digital or mechanical, including photocopying, scanning, recording or by any information storage or retrieval system, without prior written permission from the Publisher. Information contained in this book has been published by VIKAS® Publishing House Pvt. Ltd. and has been obtained by its Authors from sources believed to be reliable and are correct to the best of their knowledge. However, the Publisher and its Authors shall in no event be liable for any errors, omissions or damages arising out of use of this information and specifically disclaim any implied warranties or merchantability or fitness for any particular use. Vikas® is the registered trademark of Vikas® Publishing House Pvt. Ltd. VIKAS® PUBLISHING HOUSE PVT LTD E-28, Sector-8, Noida - 201301 (UP) Phone: 0120-4078900 • Fax: 0120-4078999 Regd. Office: 576, Masjid Road, Jangpura, New Delhi 110 014 • Website: www.vikaspublishing.com • Email: [email protected]
SYLLABI-BOOK MAPPING TABLE
Business Statistics

Syllabi: Collection, Classification and Presentation of Statistical Data: Primary and Secondary data; Methods of data collection; Tabulation of data; Graphs and charts; Frequency distributions; Diagrammatic presentation of frequency distributions.
Mapping in Book: Unit 1: Introduction to Quantitative Techniques (Pages 3-10); Unit 2: Data Collection and Data Presentation (Pages 11-36)

Syllabi: Measures of Central Tendency: Common measures of central tendency - mean, median and mode; Partition values - Quartiles, Deciles, Percentiles.
Mapping in Book: Unit 3: Measures of Central Tendency (Pages 37-71)

Syllabi: Measures of Dispersion: Common measures of dispersion - Range, Quartile deviation, Mean deviation and Standard deviation; Measures of relative dispersion.
Mapping in Book: Unit 4: Measures of Dispersion (Pages 73-97)

Syllabi: Moments, Skewness and Kurtosis: Different types of moments and their relationships; Meaning of skewness and kurtosis; Different measures of Skewness and Kurtosis.
Mapping in Book: Unit 5: Variance, Moments, Skewness and Kurtosis (Pages 99-127)

Syllabi: Probability: Basic concepts; Approaches; Theorems - Addition, Multiplication, Conditional and Bayes; Business applications of probability.
Mapping in Book: Unit 6: Probability Theory, Permutations and Combinations (Pages 129-155)

Syllabi: Probability distributions: Random variable; Expected value of random variable; Binomial distribution; Poisson distribution, Normal distribution and Exponential distribution.
Mapping in Book: Unit 7: Probability Distributions (Pages 157-177)

Syllabi: Correlation and Regression: Scatter diagram; Simple correlation coefficient; Simple regression lines; Spearman's rank correlation; Measures of association of attributes. Simple, Partial and Multiple correlation; Regression analysis; Business application of correlation and regression.
Mapping in Book: Unit 8: Business Forecasting Techniques: Correlation and Regression (Pages 179-206)

Syllabi: Time series analysis: Variations in time series; Trend analysis; Cyclical variation; Seasonal variation; Irregular variation. Time series analysis for forecasting. SPSS for time series analysis.
Mapping in Book: Unit 9: Business Forecasting Techniques: Time Series Analysis (Pages 207-247)

Syllabi: Statistical inference: Basic concepts; Standard error; Central limit theorem; Sampling and types of sampling; Large sample tests, Small sample tests; Tests for means; Tests for proportions; Tests for paired observations; Non-parametric tests - Chi-square test, Sign test, Wilcoxon Signed Rank test, Kruskal-Wallis test, Wald-Wolfowitz test; Analysis of variance.
Mapping in Book: Unit 10: Sampling Theory and its Basic Concepts (Pages 249-285); Unit 11: Hypothesis (Pages 287-300); Unit 12: Testing of Hypothesis and Decision-Making (Pages 301-312); Unit 13: Non-Parametric Tests (Pages 313-344)

Syllabi: SPSS for data analysis: Data entry in SPSS; Data analysis tools in SPSS; Calculation of Descriptive statistics, Correlation and Regression; Regression model for forecasting with SPSS; Multidimensional scaling, Factor analysis and Conjoint analysis with SPSS.
CONTENTS

INTRODUCTION

UNIT 1 INTRODUCTION TO QUANTITATIVE TECHNIQUES
1.0 Introduction
1.1 Unit Objectives
1.2 Overview of Quantitative Techniques
    1.2.1 Need of Quantitative Techniques
    1.2.2 Advantages of Quantitative Techniques
    1.2.3 Limitations of Quantitative Techniques
1.3 Classification of Quantitative Techniques
    1.3.1 Statistical Techniques
    1.3.2 Operations Research Techniques
1.4 Quantitative Methods in Decision-Making
1.5 Summary
1.6 Answers to 'Check Your Progress'
1.7 Questions and Exercises
1.8 Further Reading

UNIT 2 DATA COLLECTION AND DATA PRESENTATION
2.0 Introduction
2.1 Unit Objectives
2.2 Statistical Data Collection: An Overview
2.3 Data Gathering
2.4 Sampling and Non-sampling Errors
2.5 Classification of Data
2.6 Tabulation of Data
    2.6.1 Construction of Tables
    2.6.2 Types of Tables
    2.6.3 Advantages of Tabulation of Data
2.7 Frequency Distribution
    2.7.1 Constructing a Frequency Distribution
2.8 Cumulative Frequency Distribution
2.9 Relative Frequency Distribution
2.10 Cumulative Relative Frequency Distribution
2.11 Graphic Presentation
    2.11.1 Diagrammatic Representation
    2.11.2 Graphic Representation
2.12 Solved Problems
2.13 Summary
2.14 Answers to 'Check Your Progress'
2.15 Questions and Exercises
2.16 Further Reading

UNIT 3 MEASURES OF CENTRAL TENDENCY
3.0 Introduction
3.1 Unit Objectives
3.2 Descriptive Statistics
3.3 Measures of Central Tendency
3.4 Arithmetic Mean
    3.4.1 Arithmetic Mean of Grouped Data
    3.4.2 Advantages of Mean
    3.4.3 Disadvantages of Mean
3.5 Preparing a Frequency Distribution Table
3.6 Properties of the Mean
3.7 Short-Cut Methods for Calculating Mean
3.8 The Weighted Arithmetic Mean
3.9 The Median
3.10 Location of Median by Graphical Analysis
3.11 Quartiles, Deciles and Percentiles
3.12 Mode
3.13 Geometric Mean
3.14 Harmonic Mean
3.15 Choice of Average
3.16 Misuse of Averages
3.17 Solved Problems
3.18 Summary
3.19 Answers to 'Check Your Progress'
3.20 Questions and Exercises
3.21 Further Reading

UNIT 4 MEASURES OF DISPERSION
4.0 Introduction
4.1 Unit Objectives
4.2 Measure of Dispersion: Definition
4.3 Range
4.4 Quartile Deviation
4.5 Mean Deviation
4.6 Coefficient of Mean Deviation
4.7 Standard Deviation
4.8 Calculation of Standard Deviation by Short-cut Method
4.9 Combining Standard Deviations of Two Distributions
4.10 Comparison of Various Measures of Dispersion
4.11 Solved Problems
4.12 Summary
4.13 Answers to 'Check Your Progress'
4.14 Questions and Exercises
4.15 Further Reading

UNIT 5 VARIANCE, MOMENTS, SKEWNESS AND KURTOSIS
5.0 Introduction
5.1 Unit Objectives
5.2 Variance and Coefficient of Variation
5.3 Lorenz Curve
5.4 Moments
5.5 Moments about the Mean
5.6 Skewness
    5.6.1 Karl Pearson's Measure of Skewness
    5.6.2 Bowley's (Quartile) Measure of Skewness
    5.6.3 Kelly's (Percentile) Measure of Skewness
5.7 Kurtosis
5.8 Solved Problems
5.9 Summary
5.10 Answers to 'Check Your Progress'
5.11 Questions and Exercises
5.12 Further Reading

UNIT 6 PROBABILITY THEORY, PERMUTATIONS AND COMBINATIONS
6.0 Introduction
6.1 Unit Objectives
6.2 Elementary Probability Theory
    6.2.1 Probability
    6.2.2 Definition of Probability: Axiomatic, Classical and Frequency
    6.2.3 Random Experiment
    6.2.4 Sample Space
    6.2.5 Events
    6.2.5 Independent and Dependent Events
    6.2.6 Addition Rule
    6.2.7 Multiplication Rule
6.3 Conditional Probability
6.4 Bayes' Theorem
6.5 Permutations and Combinations
    6.5.1 Permutation of x out of n Distinct Items
    6.5.2 Combination of x Items out of n Distinct Items
6.6 Solved Problems
6.7 Summary
6.8 Answers to 'Check Your Progress'
6.9 Questions and Exercises
6.10 Further Reading

UNIT 7 PROBABILITY DISTRIBUTIONS
7.0 Introduction
7.1 Unit Objectives
7.2 Probability Distribution Functions
7.3 Binomial Distribution
7.4 Poisson Distribution
7.5 Normal Distribution
7.6 Summary
7.7 Answers to 'Check Your Progress'
7.8 Questions and Exercises
7.9 Further Reading

UNIT 8 BUSINESS FORECASTING TECHNIQUES: CORRELATION AND REGRESSION
8.0 Introduction
8.2 Unit Objectives
8.3 Correlation
8.4 Types of Correlation
8.5 Methods of Studying Correlation
8.6 Regression Analysis
    8.6.1 Two Regression Lines
    8.6.2 Formulae used in Regression
8.7 Concept of Error
8.8 Coefficient of Determination
8.9 Applications of Correlation and Regression
8.10 Summary
8.11 Answers to 'Check Your Progress'
8.12 Questions and Exercises
8.13 Further Reading

UNIT 9 BUSINESS FORECASTING TECHNIQUES: TIME SERIES ANALYSIS
9.0 Introduction
9.1 Unit Objectives
9.2 Components of Time Series Analysis
9.3 Fitting of Trends
    9.3.1 Trend Analysis
    9.3.2 Smoothing Techniques
    9.3.3 Measuring Cyclical Effect
9.4 Applications of Business Problems
    9.4.1 Simple Averages
    9.4.2 Moving Averages
    9.4.3 Measuring Irregular Variation and Seasonal Adjustments
9.5 Solved Problems
9.6 Summary
9.7 Answers to 'Check Your Progress'
9.8 Questions and Exercises
9.9 Further Reading

UNIT 10 SAMPLING THEORY AND ITS BASIC CONCEPTS
10.0 Introduction
10.1 Unit Objectives
10.2 What is Sampling?
10.3 Benefits of Sampling
10.4 Methods of Sampling
    10.4.1 Deliberate Sampling
    10.4.2 Random Sampling
    10.4.3 Mixed Sampling
    10.4.4 Various Other Sampling Techniques/Designs
    10.4.5 Sampling and Non-sampling Errors
10.5 Sampling Theory
    10.5.1 The Two Concepts: Parameter and Statistic
    10.5.2 Objects of Sampling Theory
    10.5.3 Sampling Distribution
    10.5.4 The Concept of Standard Error (or S.E.)
    10.5.5 Procedure of Significance Testing
10.6 Tests of Significance
    10.6.1 Sampling of Attributes
    10.6.2 Sampling of Variables (Large Samples)
    10.6.3 Standard Error for Different Statistics
    10.6.4 Sampling of Variables (Small Samples)
10.7 Summary
10.8 Answers to 'Check Your Progress'
10.9 Questions and Exercises
10.10 Further Reading

UNIT 11 HYPOTHESIS
11.0 Introduction
11.1 Unit Objectives
11.2 What is a Hypothesis?
    11.2.1 Statistical Decision-Making
    11.2.2 Committing Errors: Type I and Type II
11.3 Null and Alternative Hypotheses
    11.3.1 Null Hypothesis and Alternative Hypothesis
    11.3.2 Comparison of Null Hypothesis with Alternate Hypothesis
11.4 Critical Region
11.5 Penalty
11.6 Standard Error
11.7 Decision Rule
    11.7.1 Large Sample Tests
11.8 Summary
11.9 Answers to 'Check Your Progress'
11.10 Questions and Exercises
11.11 Further Reading

UNIT 12 TESTING OF HYPOTHESIS AND DECISION-MAKING
12.0 Introduction
12.1 Unit Objectives
12.2 A Note on Statistical Decision-Making
12.3 Test for a Sample Mean X̄
12.4 Test for Equality of Two Proportions
12.5 Large Sample Test for Equality of Two Means X̄₁, X̄₂
12.6 Small Sample Tests of Significance
12.7 Paired Observations
12.8 To Test the Significance of an Observed Correlation Coefficient r
12.9 χ² Test of Independence
    12.9.1 Contingency Tables
12.10 Test for a given Population Variance
12.11 F-Test
12.12 Summary
12.13 Answers to 'Check Your Progress'
12.14 Questions and Exercises
12.15 Further Reading

UNIT 13 NON-PARAMETRIC TESTS
13.0 Introduction
13.1 Unit Objectives
13.2 Chi-square Test and Goodness of Fit
13.3 Other Non-parametric Tests of Significance
13.4 Solved Problems
13.5 Summary
13.6 Answers to 'Check Your Progress'
13.7 Questions and Exercises
13.8 Further Reading
INTRODUCTION
Self-Instructional Material

Statistics is considered a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data, and can be categorized as Inferential Statistics and Descriptive Statistics. Statistical analysis is important for taking decisions and is widely used by academic institutions, natural and social sciences departments, governments and business organizations. The word statistics is derived from the Latin word 'status', which means a political state or government. It was originally applied in connection with kings and monarchs collecting data on their citizenry which pertained to state wealth, collection of taxes, study of population, and so on.

In the beginning of the Indian, Greek and Egyptian civilizations, data was collected for the purpose of planning and organizing civilian and military projects. Proper records of such vital events as births and deaths have been kept since the Middle Ages. By the end of the 19th century, the field of statistics had extended from simple data collection and record keeping to interpretation of data and drawing useful conclusions from it.

The subject of statistics is primarily concerned with making decisions in the various disciplines of markets and employment, such as stock market trends, unemployment rates in various sectors of industry, demographic shifts, interest rates and inflation rates over the years. The following examples of statistical statements will make the concept clear:
• The crime rate has gone up by 15 per cent over last year.
• The average salary of a professor at City University of New York is $50,000 per year.
• The rate of inflation in America is expected to remain below 4 per cent per year for the next five years.
• Less than 20 per cent of all high school graduates take admission in different colleges for higher and professional education, and less than 40 per cent of those who do enter colleges actually graduate.
• The majority of Americans consider Japanese cars to be superior in quality to American cars.

All these statements represent some form of statistical conclusions. These conclusions help us in forming specific policies and attitudes with respect to diverse areas of interest.

Statistics, then, is a science that deals with numbers or figures describing the state of affairs of various situations with which we are generally and specifically concerned. To a layman, it often means columns of figures, or perhaps tables, graphs and charts relating to population, national income, expenditures, production, consumption, supply, demand, sales, imports, exports, births, deaths and accidents. Similarly, statistical records kept at universities may reflect the number of students, the percentage of female and male students, the number of divisions and courses in each division, the number of professors, the tuition received, the expenditures incurred, and so on.

Hence, the subject of statistics deals primarily with numerical data gathered from surveys or collected using various statistical methods. Its objective is to summarize such data so that it gives us a good indication of certain characteristics of a population or phenomenon that we wish to study. To ensure that our conclusions are meaningful, it is necessary to subject our data to scientific analyses so that rational decisions can be made. Hence, statistics is concerned with proper collection of data, organization of this data into manageable and presentable form, and analysis and interpretation of the data into conclusions for useful purposes.

Every effort has been made to make this book, Business Statistics, interesting and user-friendly. The concepts are analysed in a logical format, usually beginning with an overview that helps the readers to easily understand the concept, followed by explanations and solved examples. Statistical methods and their interpretation are stressed through different types of questions and examples in the 'Check Your Progress' and 'Questions and Exercises' sections. Problem-solving examples and detailed explanations of statistical formulae based on mathematical functions will stimulate interest in mathematics and statistical formulations. Additional explanations and examples have been provided to clarify those concepts which students often have difficulty in understanding.
UNIT 1 INTRODUCTION TO QUANTITATIVE TECHNIQUES

Structure
1.0 Introduction
1.1 Unit Objectives
1.2 Overview of Quantitative Techniques
    1.2.1 Need for Quantitative Techniques
    1.2.2 Advantages of Quantitative Techniques
    1.2.3 Limitations of Quantitative Techniques
1.3 Classification of Quantitative Techniques
    1.3.1 Statistical Techniques
    1.3.2 Operations Research Techniques
1.4 Quantitative Methods in Decision-Making
1.5 Summary
1.6 Answers to 'Check Your Progress'
1.7 Questions and Exercises
1.8 Further Reading

1.0 INTRODUCTION
In this unit, you will learn about the importance of quantitative techniques and their various applications. Quantitative techniques use mathematical and statistical tools to analyse different business-related problems systematically and help the management in taking appropriate decisions. They also help the management in exploring new policies to achieve predetermined goals. Different types of quantitative techniques are used in business enterprises, depending on the scenarios being analysed. The objective behind using quantitative techniques is to forecast the future of the business by working out the most suitable strategy for obtaining maximum profits from investments.

1.1 UNIT OBJECTIVES
After going through this unit, you will be able to:
• Describe the areas of application of quantitative techniques
• Describe the need, advantages and limitations of quantitative techniques
• Describe the complexities of various managerial decisions
• Explain the various statistical and operations research methods

1.2 OVERVIEW OF QUANTITATIVE TECHNIQUES
Quantitative techniques are those statistical and operations research or programming techniques which help in the decision-making process, especially concerning business and industry.
These techniques involve analysis of data banks, which primarily hold information related to past activities. These techniques are fairly sophisticated and require experts in the field to use them. Using quantitative techniques for solving problems and taking decisions grounds each decision in numerical data and actual facts and figures, and so helps to arrive at the best available solution.

1.2.1 Need for Quantitative Techniques
Nowadays business activities are extensively governed by the use of quantitative techniques. The need to use quantitative techniques for solving business problems arises in the following situations:
• To solve complex problems in which various aspects of the business have to be judged for making smart business decisions.
• To understand the complex interrelationships between different factors affecting a business. These can be effectively expressed in the form of mathematical equations, which can be easily interpreted and used for decision-making.
• To solve problems that are repetitive in nature.

1.2.2 Advantages of Quantitative Techniques
Quantitative techniques are frequently used in organizations for decision-making as they provide several advantages, such as the following:
• Quantitative techniques help to summarize available data.
• Quantitative techniques enable you to handle simultaneously the various factors affecting business.
• The cause and effect relationships related to different operations can be effectively expressed using numerical figures. A large number of mathematical theorems provided by quantitative techniques can be used for deriving conclusions and for testing different hypotheses about available data.
• The use of quantitative techniques aids in proper utilization and allocation of the available resources and thus helps in solving problems regarding scheduling, product-mix, etc.
• Quantitative techniques help in deciding the optimal strategy for executing a business policy to obtain maximum benefits for the organization by planning relevant future actions.
• Quantitative techniques provide alternative strategies and solutions for problems.
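The first advantage listed above, summarizing available data, can be illustrated with a short sketch. It assumes a hypothetical list of monthly sales figures (the numbers are invented for illustration and do not come from this unit) and uses only Python's standard `statistics` module; the helper name `summarize` is likewise an assumption.

```python
import statistics


def summarize(values):
    """Return a few summary measures for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical monthly sales figures (in thousands); not taken from the text.
monthly_sales = [120, 135, 128, 150, 142, 160]

summary = summarize(monthly_sales)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Six raw figures are reduced to a typical level (mean and median) and a measure of spread (standard deviation), which is the kind of condensed information a manager would use when comparing periods or branches.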
1.2.3 Limitations of Quantitative Techniques
Quantitative techniques provide several advantages, but they cannot be relied upon fully because they have certain limitations:
• The use of quantitative techniques for decision-making involves tremendous costs. Experts in the field are required to apply these techniques, and employing such specialists proves very expensive compared with the intuition and judgement that managers otherwise use for taking decisions.
• Quantitative techniques do not take into account intangible factors, which cannot be measured and quantified. Intangible factors, such as the skill and potential of employees, are greatly responsible for the success of a business, yet they are not evaluated by quantitative techniques while making decisions for the organization.
• Quantitative techniques can only be regarded as tools that aid in taking decisions. The final decisions regarding the future of the organization are taken by the executives, and these decisions are always informed by current market trends and by the experience and intuition of the executives. The importance of a decision and the future of the business organization cannot be judged with the help of quantitative techniques.
• Quantitative techniques use mathematical expressions to model different business scenarios. Mathematical theories rest on certain assumptions, which must be fulfilled by the specific scenario about which a decision is to be taken. If these underlying assumptions are not checked, applying mathematical theories in business may lead to wrong decisions.

1.3 CLASSIFICATION OF QUANTITATIVE TECHNIQUES
The use of quantitative techniques for taking business-related decisions lends accuracy to the decisions. Quantitative techniques can be categorized into two broad classes:
• Statistical techniques
• Operations research techniques

Check Your Progress
1. What are quantitative techniques?
2. What are the two categories in which quantitative techniques can be classified?

1.3.1 Statistical Techniques
Statistical techniques involve analysing the collected data for the purpose of summarizing the information. These techniques use probability theory, which measures the uncertainty and randomness associated with the data. Statistical techniques help in classifying and tabulating the collected data. Statistical measures such as the mean and the standard deviation help in deriving inferences from the collected data, which can then be used for decision-making.

There are several statistical techniques that are commonly used in taking business decisions. The important statistical techniques are:
• Sampling analysis: Sampling analysis is the technique of selecting a few items from an entire set of items for the purpose of analysis, so that the time and cost of analysing the entire set can be minimized. The selected sample acts as the representative of the entire set, and the behaviour shown by the selected sample is taken as the behaviour of the entire set. The process of selecting a subset from the entire set is known as sampling. Samples can be selected in two ways:
    o Probability or random samples: In this method of selecting a sample, each element of the entire set has an equal probability of being selected for analysis.
    o Non-probability or purposive samples: In this method of selecting a sample, the selection of a particular element is based entirely on the choice of the selector.
• Regression and correlation analysis: It is concerned with the interrelationship between the factors affecting the business. Regression analysis deals with finding out how one of the factors affecting business is related to the other factors. Correlation analysis deals with measuring how closely the factors are related to one another. Using this technique, it is possible to calculate the value of one of the factors when the value of the other factor is known.
• Index numbers: These measure variations in the price and volume of items, fluctuations in income and gains, etc., over a given period of time. The selection of a component for measuring the fluctuation is based on the aspect of the business about which the decision is to be taken.
• Time series analysis: It deals with the measurement of data associated with a given context at successive intervals of time. For example, in the context of banking, transaction data is gathered by the banks on a daily basis. The interval at which measurement is done can be either regular or irregular. This type of analysis helps the management in analysing the factors responsible for a particular kind of trend shown over an interval and in making predictions regarding the future of the business.
• Interpolation and extrapolation: Interpolation is defined as the process of estimating a value of a function which lies between two given values. Extrapolation is defined as the process of estimating a value of a function which lies outside the given range of values. Extrapolation can also be perceived as calculating the future value of a function based upon the past values that are known. Interpolation helps to find unknown data for making appropriate business decisions, whereas extrapolation helps in determining the future from past and known trends.
• Ratio analysis: It deals with analysing the financial position of the organization. Ratio analysis involves the examination of financial statements, balance sheets, etc., to find out the status of the organization in the market. Various ratios are calculated in ratio analysis to check the growth of the organization.
• Variance analysis: It deals with finding out the reasons and sources behind the variation in given samples of data. Variance analysis helps the organization to take corrective actions against the variations to get maximum benefits.

Check Your Progress
3. Explain regression and correlation analysis.

1.3.2 Operations Research Techniques
Operations research techniques are also known as programming techniques. These techniques are used for building an information model for the available data. Building an information model refers to quantifying the factors affecting the business with the help of different variables and parameters and describing the interrelationships between these variables in the form of mathematical equations. Based upon these equations, an optimal solution for the business-related problem is obtained so as to get maximum profits at minimum cost.

Several operations research techniques are used to take business-related decisions. The commonly used operations research techniques are:
• Linear programming: It aims at optimizing a given goal of the business. For example, the goal of the business at a point of time can be to minimize the cost of business. Linear programming expresses different interrelated factors in terms of linear mathematical equations. It distributes the scarce resources of an organization in an optimal manner for achieving the goal.
• Queuing theory: It is related to real-world situations where customers have to wait in queues to obtain services. Queues are formed when the demand for a service exceeds its supply. An increase in the capacity of the service may lead to wastage of resources when demand is not high, whereas if the capacity is not increased, the waiting time for customers will increase. Queuing theory aims at finding a solution to this scenario so that the overall cost associated with both servicing and waiting is minimized.
• Game theory: It deals with finding out the optimal strategy for winning in competitive scenarios. A lot of competition exists in the market as everybody tries to move ahead of others. Game theory helps in deciding the optimal move that must be taken by a player after foreseeing all possible future moves that can be taken by the opponent.
• Decision theory: It helps in taking optimal decisions under all types of situations. The situations that an organization can face are those of risk, uncertainty and certainty. Decision theory helps to select a strategy for dealing with each kind of situation so as to emerge as a winner.
• Network analysis: It helps to plan a sequence for carrying out related activities of an organization so as to minimize the total time and cost associated with executing these activities. Programme Evaluation and Review Technique (PERT), Critical Path Method (CPM), etc., are some of the techniques that help in planning out the activities.
• Simulation: It deals with producing a model of a given scenario. The model that is produced is tested in the real-world environment and its behaviour is observed. Through the technique of simulation, alternative strategies of management can be tested, and the response to the various strategies can be gathered from the real world, which will help the management in choosing the most profitable strategy.
• Replacement theory: It deals with estimating the cost of replacement of equipment within an organization. Replacement may apply either to equipment that becomes obsolete with the passage of time or to equipment that fails to serve the purpose for which it was employed. Replacement theory helps in selecting the most economic replacement strategy.
• Inventory planning: It deals with minimizing the cost related to inventory in an organization. Inventory can be defined as goods and raw materials in stock. Inventory planning analyses the stock and helps the organization in deciding what inventory is required and when it is required. The idea behind inventory planning is to manage the costs of holding and buying inventories while ensuring that no shortage of stock is faced.
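To make the linear programming technique listed above more concrete, the following is a minimal sketch of a two-product allocation problem. The products, profit figures and resource limits are hypothetical and are not taken from this unit. Instead of a general-purpose solver, the sketch uses the corner-point method: for a linear problem with two variables and a bounded feasible region, the optimum lies at a corner of that region, so it is enough to intersect the constraint lines pairwise and compare the feasible corners. The function name `solve_two_variable_lp` is an assumption.

```python
from itertools import combinations


def solve_two_variable_lp(profit, constraints):
    """Maximise profit[0]*x + profit[1]*y subject to a*x + b*y <= c for each
    (a, b, c) in constraints, with x >= 0 and y >= 0.

    Corner-point method: assumes a bounded, non-empty feasible region.
    """
    # Non-negativity written in the same "<=" form: -x <= 0 and -y <= 0.
    rows = list(constraints) + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    best_point, best_value = None, float("-inf")
    for (a1, b1, c1), (a2, b2, c2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel lines have no single intersection point
        # Intersection of the two boundary lines (Cramer's rule).
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= c + 1e-9 for a, b, c in rows):
            value = profit[0] * x + profit[1] * y
            if value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


# Hypothetical data: profit of 40 and 30 per unit of products X and Y;
# 40 machine-hours (1 per unit of either product) and
# 60 labour-hours (2 per unit of X, 1 per unit of Y) are available.
point, value = solve_two_variable_lp(
    profit=(40.0, 30.0),
    constraints=[(1.0, 1.0, 40.0), (2.0, 1.0, 60.0)],
)
print(f"Produce x = {point[0]:.1f}, y = {point[1]:.1f}; profit = {value:.1f}")
```

With these illustrative figures the best corner is 20 units of each product, for a profit of 1,400. Problems with more than two variables are normally handled with the simplex method or a software solver rather than by enumerating corners.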
1.4 QUANTITATIVE METHODS IN DECISION-MAKING The contribution that quantitative techniques can make to management decision-making Check Your Progress is well researched. There is extensive empirical evidence that the relevant application of such techniques has resulted in significant improvements in efficiency, particularly at the 4. Describe the role of ratio microeconomic level, and has led to improvements in decision-making in both profit and analysis in an organization. not-for-profit organizations. Numerous professional journals regularly provide details of 'successful' applications of such techniques to specific business problems. 5. What is sampling analysis? 6. What is linear programming? D Coupled with this development has been the revolution that has occurred in Self-Instructional Mlterial 7 making available powerful and cost-effective computing power on the manager's desktop. Not only has this meant that the manager now has instant direct access to available business information but also that techniques which used to be the prerogative of the specialist can be applied directly by the manager through the use of appropriate and relatively cheaper and user-friendly computer software. 0 Because of the increasing complexity of the business environment, in which organizations have to function, the information needs of a manager becomes more complex and demanding. With the pace of increasing competition, and with continual improvements in telecommunications, the time available to a manager to assess, analyse and react to a problem or opportunity is much reduced. Managers, and their supporting information systems, need to take fast and appropriate decisions. Finally, to add to the problems, the consequences of taking wrong decisions become more serious and costly. Entering the wrong markets, producing the wrong
products or providing inappropriate services will have major, and often disastrous, consequences for organizations.

This implies that anything which can help the manager of an organization in facing up to these pressures and difficulties in the decision-making process must be seriously considered. Not surprisingly, this is where quantitative techniques have a role to play. This is not to say that such techniques will automatically resolve such problems. But they can provide both information about a situation or problem and a different way of examining that situation that may well help. Naturally, such quantitative analysis will produce information that must be assessed and used in conjunction with other sources. Business problems are rarely, if ever, tackled solely from the quantitative perspective.

Fig. 1.1 The Decision-making Process

Figure 1.1 illustrates that a business situation at the strategic or operational level needs to be examined from both a quantitative and a qualitative perspective. Information and analysis from both these perspectives need to be brought together, assessed and acted upon. These techniques are valuable not only at corporate and strategic levels but also at the operational level in day-to-day management. Knowledge of such techniques, the ability to know when to apply them and the ability to relate their quantitative outputs to business decision-making are critically important for every manager in every organization.

1.5 SUMMARY

In this unit, you have learnt that taking decisions regarding business matters has now become a complex task, as a number of factors affect a business. Appropriate analysis of all the factors has become compulsory for taking optimal decisions. Quantitative techniques help in analysing these factors by quantifying them.
Quantitative techniques can be categorized into statistical techniques and operations research techniques. Statistical techniques use probability theory and different statistical measures for deriving inferences from the available data. Sampling analysis, index numbers and time series analysis, etc., are examples of some statistical techniques. Operations research techniques build a mathematical model using equations, which shows the interrelationship between the different variables. Linear programming, decision theory and game theory, etc., are some of the important operations research techniques. Quantitative techniques help in forecasting, which is required to plan the future of business.
1.6 ANSWERS TO 'CHECK YOUR PROGRESS'

1. Quantitative techniques are techniques that use mathematical and statistical tools for analysing different business-related problems and help management in taking intelligent decisions.
2. Quantitative techniques can be classified as statistical techniques and operations research techniques.
3. Regression and correlation analysis is concerned with the interrelationship between the factors affecting the business. Regression analysis deals with finding out how one of the factors affecting business is related to the other factors. Correlation analysis deals with measuring how closely the factors are related to one another. Using this technique, it is possible to calculate the value of one of the factors when the value of the other factor is known.
4. Ratio analysis deals with analysing the financial position of the organization. It involves the examination of financial statements, balance sheets, etc. to find out the status of the organization in the market. Various ratios are calculated in ratio analysis to check the growth of the organization.
5. Sampling analysis is the technique of selecting a few samples from an entire set for the purpose of analysis, so that the time and cost of analysing the entire set can be minimized.
6. Linear programming is an operations research technique which aims at optimizing a given goal of business.

1.7 QUESTIONS AND EXERCISES

Short-Answer Questions
1. Explain the importance of quantitative techniques.
2. How is the selection of sample done?
3. Differentiate between probability and non-probability samples.
4. What are index numbers?
5. When is time series analysis done?
6. What do you mean by variance analysis?
7. What is the significance of game theory?
8. Describe linear programming.
9. Why is decision theory considered important in selecting a strategy?
10. Which techniques of network analysis help in planning the activities?

Long-Answer Questions
1. What are the various advantages of using quantitative techniques?
2. What are the limitations of quantitative techniques?
3. Explain the difference between interpolation and extrapolation.
4. Define regression and correlation analyses with the help of suitable examples.
5. Explain the various statistical techniques that are commonly used in business.
6. Explain the various operations research techniques that are used while taking business decisions.
7. Describe the contribution that quantitative techniques can make in the managerial decision-making process.

1.8 FURTHER READING

Kothari, C.R. 1984. Quantitative Techniques, 3rd Edition. New Delhi: Vikas Publishing House Pvt. Ltd.
Chandan, J.S. 1998. Statistics for Business and Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Chandan, J.S., Jagjit Singh and K.K. Khanna. 1995. Business Statistics, 2nd Edition. New Delhi: Vikas Publishing House Pvt. Ltd.
UNIT 2 DATA COLLECTION AND DATA PRESENTATION

Structure
2.0 Introduction
2.1 Unit Objectives
2.2 Statistical Data Collection: An Overview
2.3 Data Gathering
2.4 Sampling and Non-sampling Errors
2.5 Classification of Data
2.6 Tabulation of Data
    2.6.1 Construction of Tables; 2.6.2 Types of Tables; 2.6.3 Advantages of Tabulation of Data
2.7 Frequency Distribution
    2.7.1 Constructing a Frequency Distribution
2.8 Cumulative Frequency Distribution
2.9 Relative Frequency Distribution
2.10 Cumulative Relative Frequency Distribution
2.11 Graphic Presentation
    2.11.1 Diagrammatic Representation; 2.11.2 Graphic Representation
2.12 Solved Problems
2.13 Summary
2.14 Answers to 'Check Your Progress'
2.15 Questions and Exercises
2.16 Further Reading

2.0 INTRODUCTION

In this unit, you will learn the data collection and data presentation methods used in statistics for systematic analysis. Statistical investigation is a comprehensive process. It requires systematic collection of data about some group of people or objects, describing and organizing the data, analysing the data with the help of different statistical methods, summarizing the analysis and using the results for making judgements, decisions and predictions. The validity and accuracy of the final judgement is most crucial and depends basically on how well the data was gathered in the first place. The quality of data will greatly affect the conclusions; hence, utmost importance must be given to this process and every possible precaution should be taken to ensure accuracy while gathering and collecting data.

You will also learn how all business decisions are based upon evaluation of some data. Availability of the right information is very important for making the right decisions. The large amount of raw data generated from various business sources is highly cumbersome for the management to use.
As an example, imagine a major hardware distributor dealing with over 10,000 different items. Data related to identification of each item, number of each item in the inventory, replacement of depleted inventory of each item, record of sales from various retail stores, keeping account of all accounts receivables as well as paid bills, and so on, has to be kept and analysed continuously. This would give us an indication of data generated daily in one store. It is almost
impossible for the management to deal with all this data in the raw form. Such data must be presented in a suitable and summarized form without any loss of relevant information so that it can be efficiently used for decision-making.

2.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Describe the process of data collection
• Identify the steps involved in collecting data
• Define and discuss the types of samples
• Explain probable sampling errors
• Define and discuss the various types of data
• Describe how frequency distribution helps in identifying the occurrence of a variable
• Discuss the types of frequency distributions
• Evaluate diagrammatic and graphic representations of data

2.2 STATISTICAL DATA COLLECTION: AN OVERVIEW

Statistical data may be classified under two categories depending upon the sources utilized. These categories are:

1. Primary data. Primary data is data collected by the investigator himself for the purpose of a specific inquiry or study. Such data is original in character and is generated by surveys conducted by individuals or research institutions. For example, if a researcher is interested in knowing what women think about the issue of abortion, he/she must undertake a survey and collect data on the opinions of women by asking relevant questions. Data collected in this way would be considered primary data.

2. Secondary data. When an investigator uses data which has already been collected by others, such data is called secondary data. This data is primary data for the agency that collected it and becomes secondary data for someone else who uses it for his own purposes. Secondary data can be obtained from journals, reports, government publications, publications of professional and research organizations, and so on.
For example, if a researcher desires to analyse the weather conditions of different regions, he can get the required information or data from the records of the meteorology department. Secondary data is less expensive to collect in terms of money and time, and under certain situations its quality may even be better, because it may have been collected by persons who were specifically trained for that purpose. It is nevertheless necessary to critically investigate the validity of the secondary data as well as the credibility of the agency that originally collected it.

2.3 DATA GATHERING

Since the quality of results gained from statistical data depends upon the quality of information collected, it is important that a sound investigative process be established to ensure that the data is highly representative and unbiased. This requires a high degree of skill, and certain precautionary measures may also have to be taken. The following steps may be considered in the primary data collection process:

1. Planning the study. Before any procedures for data collection are undertaken, the purpose and scope of the study must be clearly specified. If any similar studies have been conducted prior to the current one, then the investigator may want to use some
secondary data in his own study and may redefine his objectives on the basis of the previous studies conducted. The scope of the study must take into consideration the field to be covered and the time period in which to conduct the study. The time span is very important because in certain areas the conditions change very quickly and hence, by the time the study is completed, it may become irrelevant. Furthermore, the statistical units and the desired accuracy of such units must be clearly specified.

2. Modes of data collection. There are basically three widely used methods for collection of primary data: telephone survey, mail questionnaire, and personal observations and interviews. Each method has its advantages and disadvantages. In general, telephone surveys and personal interviews elicit a much higher rate of response than mailed questionnaires. Mail questionnaires are generally discarded by most people, resulting in highly selective responses, which itself introduces bias in the results. The telephone interview is convenient but excludes those who do not have a telephone and those who have an unlisted number, thus making the sample unrepresentative to some degree.

The personal interview can usually result in more accurate responses since the interviewer can probe more deeply into answers that show inconsistency in some form. Also, if the questions are not properly understood by the respondent, these can be clarified on the spot. One of the major problems with personal interviews is the degree of bias that can be introduced because the answers can be highly exaggerated. Accordingly, the investigator must be thoroughly trained and psychologically aware of such possible exaggerations. Furthermore, the questions and the language of the questions should be such as to induce confidence in the respondent, so that he is as objective and accurate as possible.
The mail questionnaire method is more often used because of economy and practicability, especially if the population is widely dispersed geographically. However, in such situations, great care should be taken in formulating the questionnaires (or schedules, as they are called) so as to evoke the particular kind of responses required. It is also very important to make sure that the responses are fairly representative of the population under study. The language of the questionnaires or schedules should be specific and clear so that they can be easily and correctly answered. Personal questions and questions which solicit opinions should be avoided. The questions should be such that they elicit either clear yes or no answers or some other statements of fact. They should be short, unambiguous and clearly presented. For example, a question such as 'Do you smoke?' may not lead to a clear-cut answer, depending upon how the respondent understands it. A yes or no answer would not specify whether he is a definite smoker or a non-smoker; some very light and occasional smokers may consider themselves non-smokers. Accordingly, a question such as 'How many cigarettes do you smoke per day?' would result in a more specific answer.

Using a Questionnaire

The accuracy of primary data collected by the questionnaire method depends greatly upon the proper design of the questionnaire. The types of questions in the questionnaire would depend upon the nature of the inquiry. The questionnaire can be totally formal and well structured, with straightforward questions, where the objective of the study is clear and well understood and the types of answers expected are reasonably predetermined. The questionnaire can also be unstructured, where the respondent is free to answer in any way he/she feels. Also, in the case of personal interviews, some new questions can be developed during the interview, so that communication flows easily like a friendly discussion.
The purpose of the study can also be disguised so that the respondent is never sure about the objective of the inquiry. Most questionnaires in statistical studies, however, are structured and undisguised so that the purposes of these studies are fairly clear. One advantage of such questionnaires is that they are simple to administer and
easy to categorize, tabulate and analyse. Some of the following points should be taken into consideration when formulating the questions:

(a) The number of questions should be as few as possible. Most people are turned off by lengthy questionnaires and accordingly, less relevant questions should be avoided as far as possible.

(b) The questions should be short, clear, simple and unambiguous. Technical terms or words with multiple meanings should be avoided. Also, the questions should follow a logical sequence. For example, a question such as 'How many children do you have?' should not come before the question, 'Are you married?', and so on.

(c) Questions of sensitive nature should be avoided. Questions that are of personal nature, such as 'Sources of income' or 'Have you ever defaulted on your taxes?' or 'Do you and your husband have any marital problems?' should not be put in the questionnaires. Such information, if necessary, may be obtained from outside sources.

(d) Mail questionnaires should be accompanied by a covering letter which should state the purpose of the questionnaire, promise anonymity and include such instructions as are necessary for giving correct responses.

The questionnaire, once ready, may be pretested on a small selected sample so as to refine it further on the basis of the responses received.

3. Sample selection. The third step in the primary data collection process is selecting an adequate sample. It is necessary to take a representative sample from the population, since it is extremely costly, time-consuming and cumbersome to do a complete census. Then, depending upon the conclusions drawn from the study of the characteristics of such a sample, we can draw inferences about the similar characteristics of the population.
If the sample is truly representative of the population, then the characteristics of the sample can be considered to be the same as those of the entire population. For example, the taste of soup in the entire pot of soup can be determined by tasting one spoonful from the pot if the soup is well stirred. Similarly, a small amount of blood sample taken from a patient can determine whether the patient's sugar level is normal or not. This is so because the small sample of blood is truly representative of the entire blood supply in the body. Sampling is necessary because of the following reasons 1: First, as discussed earlier, it is not technically or economically feasible to take the entire population into consideration. Second, due to dynamic changes in business, industrial and social environment, it is necessary to make quick decisions based upon the analysis of information. Managers seldom have the time to collect and process data for the entire population. Thus, a sample is necessary to save time. The time element has further importance in that if the data collection takes a long time, then the values of some characteristics may change over the period of time so that data may no longer be up to date, thus defeating the very purpose of data analysis. Third, samples, if representative, may yield more accurate results than the total census. This is due to the fact that samples can be more accurately supervised and data can be more carefully selected. Additionally, because of the smaller size of the samples the routine errors that are introduced in the sampling process can be kept at a minimum. Fourth, the quality of some products must be tested by destroying the products. For example, in testing cars for their ability to withstand accidents at various speeds, the environment of accidents must be simulated. Thus, a sample of cars must be selected and subjected to accidents by remote control. 
Naturally, the entire population of cars cannot be subjected to these accident tests and hence, a sample must be selected.

1. See Campbell, Stephen K. 1987. Applied Statistics. New York: Harper & Row, pp. 32-33.

One important aspect to be considered is the size of the sample. The sample size, which is the number of sampling units selected from the population for investigation,
must be optimum. If the sample size is too small, it may not appropriately represent the population (or the universe, as it is also known), thus leading to incorrect inferences. Too large a sample would be costly in terms of time and money. The optimum sample size should fulfil the requirements of efficiency, representativeness, reliability and flexibility. What is an optimum sample size is also open to question. Some experts have suggested that 5 per cent of the population, properly selected, would constitute an adequate sample, while others have suggested as high as 10 per cent depending upon the size of the population under study. However, proper selection and representation of the sample is more important than the size itself. The following considerations may be taken into account while deciding the sample size:

(a) The larger the size of the population, the larger should be the sample size.
(b) If the resources available do not put a heavy constraint on the sample size, a larger sample would be desirable.
(c) If the samples are selected by scientific methods, a larger sample size would ensure a greater degree of accuracy in conclusions.
(d) A smaller sample could adequately represent the population if the population consists of mostly homogeneous units. A heterogeneous universe would require a larger sample.

Types of Samples

There are basically two types of samples:
(a) Probability samples
(b) Non-probability samples

For detailed discussion on sampling methods, refer to Unit 10.

2.4 SAMPLING AND NON-SAMPLING ERRORS

The basic objective of a sample is to draw inferences about the population from which such sample is drawn. This means that sampling is a technique which helps us in understanding the parameters or characteristics of the universe or the population by examining only a small part of it. Therefore, it is necessary that the sampling technique be a reliable one.
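As a rough illustration of examining only a small part of a population, the sketch below draws random samples of 5 per cent and 10 per cent and compares each sample mean with the population mean. The population of 1,000 outlet sales figures, the fixed seed and the helper name draw_sample are all assumptions made for this illustration; they do not come from the text.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable (an assumption)

# Hypothetical population: monthly sales figures for 1,000 retail outlets.
population = [random.randint(100, 500) for _ in range(1000)]


def draw_sample(units, fraction):
    """Draw a random sample, without replacement, of the given fraction."""
    size = max(1, round(len(units) * fraction))
    return random.sample(units, size)


print(f"Population mean: {mean(population):.1f}")
for fraction in (0.05, 0.10):
    sample = draw_sample(population, fraction)
    print(f"{fraction:.0%} sample (n = {len(sample)}): mean = {mean(sample):.1f}")
```

Running the sketch with different seeds gives slightly different sample means each time, which is one way to see the sampling error discussed in this section.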
The randomness of the sample is especially important because of the principle of statistical regularity, which states that a sample taken at random from a population is likely to possess almost the same characteristics as those of the population. However, in the total process of statistical analysis, some errors are bound to be introduced. These errors may be the sampling errors or the non-sampling errors. The sampling errors arise due to drawing faulty inferences about the population based upon the results of the samples. This sampling error would be less if the sample size is large relative to the population and vice versa. Non-sampling errors, on the other hand, are introduced due to technically faulty observations or during the processing of data. These errors could also arise due to defective methods of data collection, incomplete coverage of the population because some units of the population are not available for study, inaccurate information provided by the participants in the sample and errors occurring during editing, tabulating and mathematical manipulation of data. 2.5 CLASSIFICATION OF DATA When the raw data has been collected and edited, it should be put into an ordered form (ascending or descending order) so that it can be looked at more objectively. The next important step towards processing the data is classification. Classification means separating items according to similar characteristics and grouping them into various
classes. The items in different classes will differ from each other on the basis of some characteristics or attributes. Classification of data is very similar to sorting of mail at a post office, where mail is classified according to its geographical destination and may further be classified into the type of mail such as first class, parcel post, and so on.

Check Your Progress
1. What are the various modes of data collection?
2. What are the two types of samples?
3. What is a simple random sample?

The data may be classified into four broad classes:

(a) Geographical. This classification groups the data according to locational differences among the items. The geographical areas are usually listed in alphabetical order for easy reference. For example, a book listing the colleges and universities in various states in America would first list the states in alphabetical order and then the colleges and universities within these states in alphabetical order.

(b) Chronological. Chronological classification groups data according to the time period in which the items under consideration occurred. For example, the sales of automobiles in America over the last 10 years may be grouped according to the year in which such sales took place.

(c) Qualitative. In this type of classification, the data is grouped together according to some distinguishing characteristic or attribute such as religion, sex, age, national origin, and so on. This classification simply identifies whether a given attribute is present or absent in a given population. For example, the population may be divided into two classes of males and females. Then the attribute of male will go into one class and the attribute of female will go into the other.

(d) Quantitative. It refers to the classification of data according to some attribute which has magnitude and can be measured such as weight, height, income, and so on.
For example, the salaries of professors at a university may be classified according to their rank of instructor, assistant professor, associate professor and professor.

2.6 TABULATION OF DATA

Classification of data is usually followed by tabulation, which is considered the mechanical part of classification. Tabulation is the systematic arrangement of data in columns and rows. The columns and rows are arranged so as to facilitate analysis and comparisons. Tabulation has the following objectives:

(i) Simplicity. The removal of unnecessary details gives a clear and concise picture of the data.
(ii) Economy of space and time.
(iii) Ease in comprehension and remembering.
(iv) Facility of comparisons. Comparisons within a table and with other tables may be made.
(v) Ease in handling of totals, analysis, interpretation, etc.

2.6.1 Construction of Tables

A table is constructed depending on the type of information to be presented and the requirements of statistical analysis. The following are the essential features of a table:

(i) Title. It should have a clear and relevant title, which describes the contents of the table. The title should be brief and self-explanatory.
(ii) Stubs and captions. It should have clear headings and subheadings. Column headings are called captions and row headings are called stubs. The stubs are usually wider than the captions.
(iii) Unit. It should indicate all the units used.
(iv) Body. The body of the table should contain all information arranged according to description.
(v) Headnote. The headnote or prefatory note, placed just below the title in a less prominent type, gives some additional explanation about the table. Sometimes, the headnote consists of the unit of measurement.
(vi) Footnotes. A footnote at the bottom of the table may clarify some omissions or special features. A source note gives information about the source used, if any.
(vii) Arrangement of data. Data may be arranged according to requirements in chronological, alphabetical, geographical, or any other order.
(viii) Emphasis. The items to be emphasized may be put in different print or marked suitably.
(ix) Other details. Percentages, ratios, etc. should be shown in separate columns. Thick and thin lines should be drawn at proper places.

A table should be easy to read and should contain only the relevant details. If the aim of clarification is not achieved, the table should be redesigned.

2.6.2 Types of Tables

Depending on the nature of the data and other requirements, tables may be divided into various types.

General tables or Reference tables. These contain detailed information for general use and reference, e.g., tables published by government agencies.

Specific purpose or Derivative tables. They are usually summarized from general tables and are useful for comparison and analytical purposes. Averages, percentages, etc. are incorporated along with information in these tables.

Simple and Complex tables. A table showing only one characteristic is a simple table. The more common tables are complex and show two or more characteristics or groups of items.
Table 2.1 Simple Table
Cinema Attendance among Adult Male Factory Workers in Bombay, March 1972

Frequency                     Number of Workers
Less than once a month        3780
1 to 4 times a month          1652
More than 4 times a month     926

The following table is the result of a survey on the cinema-going habits of adult factory workers.

Table 2.2 Complex Table
Cinema Attendance among Adult Male Factory Workers in Bombay, March 1972

                                   Single                  Married
Cinema Attendance Frequency    Under 30   Over 30     Under 30   Over 30
Less than once a month            122       374         1404      1880
1-4 times a month                1046       202          289       115
More than 4 times a month         881        23          112        10
Total                            2049       599         1805      2005
It is obvious that the tabular form of classification of data is a great improvement over the narrative form.

Frequently, table construction involves deciding which attribute should be taken as primary and which as secondary. For the previous table, we can also consider whether it would be improved further if 'under 30' and '30 and over' had been the main column headings and 'single' and 'married' the subheadings. The modifications depend on the purpose of the table. If the activities of age groups are to be compared, it is best left as it stands. But if a comparison between men of different marital status is required, the change would be an improvement.

2.6.3 Advantages of Tabulation of Data

(i) Tabulated data can be more easily understood and grasped than untabulated data.
(ii) A table facilitates comparisons between subdivisions and with other tables.
(iii) It enables the required figures to be located easily.
(iv) It reveals patterns within the figures which otherwise might not have been obvious, e.g., from the previous table, we can conclude that regular and frequent cinema attendance is mainly confined to the younger age group.
(v) It makes the summation of items and the detection of errors and omissions easier.
(vi) It obviates repetition of explanatory phrases and headings and hence takes less space.

2.7 FREQUENCY DISTRIBUTION

Statistical data can be organized into a frequency distribution, which simply lists the values of the variable and the frequency of their occurrence in a tabular form. A frequency distribution can then be defined as the list of all the values obtained in the data and the corresponding frequency with which these values occur in the data.
The frequency distribution can be either ungrouped or grouped. When the number of values of the variable is small, we can construct an ungrouped frequency distribution, which simply lists the frequency of occurrence against each value of the given variable. As an example, let us assume that 20 families were surveyed to find out how many children each family had. The raw data obtained from the survey is as follows:

0, 2, 3, 1, 1, 3, 4, 2, 0, 3, 4, 2, 2, 1, 0, 4, 1, 2, 2, 3

This data can be classified into an ungrouped frequency distribution. The number of children becomes our variable (X), for which we can list the frequency of occurrence (f) in a tabular form as follows:

Number of Children (X)     Frequency (f)
0                          3
1                          4
2                          6
3                          4
4                          3
                           Total = 20
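The tally for the 20 surveyed families can be checked with a short script. This is only a minimal sketch in Python using the standard library's Counter; the variable names are chosen for this illustration and are not part of the text.

```python
from collections import Counter

# Raw survey data from the text: number of children in each of 20 families.
children = [0, 2, 3, 1, 1, 3, 4, 2, 0, 3, 4, 2, 2, 1, 0, 4, 1, 2, 2, 3]

# Counter tallies how often each value of the variable X occurs.
frequency = Counter(children)

print("X   f")
for x in sorted(frequency):
    print(f"{x}   {frequency[x]}")
print("Total =", sum(frequency.values()))
```

Sorting the keys before printing keeps the rows in ascending order of X, matching the layout of an ungrouped frequency table.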
The table of children per family above is also known as a discrete frequency distribution, where the variable has discrete numerical values.

However, when the data set is very large, it becomes necessary to condense the data into a suitable number of groups or classes of the variable values and then assign the combined frequencies of these values to their respective classes. As an example, let us assume that 100 employees in a factory were surveyed to find out their ages. The youngest person was 20 years of age and the oldest was 50 years old. We can construct a grouped frequency distribution for this data so that instead of listing frequency by every year of age, we can list frequency according to an age group. Also, since age is a continuous variable, a frequency distribution would be as follows:

Age Group (years)       Frequency (f)
20 to less than 25      5
25 to less than 30      15
30 to less than 35      25
35 to less than 40      30
40 to less than 45      15
45 to less than 50      10
                        Total = 100

In this example, all persons between 20 years (including 20 years old) and 25 years (but not including 25 years old) would be grouped into the first class, and so on. The interval of 20 to less than 25 is known as a Class Interval (CI). A single representation of a class interval would be the midpoint (or average) of that class interval. The midpoint is also known as the class mark.

2.7.1 Constructing a Frequency Distribution

The number of groups and the size of class interval are more or less arbitrary in nature within the general guidelines established for constructing a frequency distribution. The following guidelines for such construction may be considered:

(i) The classes should be clearly defined and each of the observations should be included in only one of the class intervals. This means that the intervals should be chosen in such a manner that one score cannot belong to more than one class interval, so that there is no overlapping of class intervals.
(ii) The number of classes should be neither too large nor too small. Normally, between 6 and 15 classes are considered to be adequate. Fewer class intervals would mean a greater class interval width with consequent loss of accuracy. Too many class intervals result in greater complexity.

(iii) All intervals should be of the same width. This is preferred for easy computations. A suitable class width can be obtained by knowing the range of data (which is the absolute difference between the highest value and the lowest value in the data) and the number of classes, which is predetermined, so that:

$$\text{Width of the interval} = \frac{\text{Range}}{\text{Number of classes}}$$

In the case of ages of factory workers where the youngest worker was 20 years old and the oldest was 50 years old, the range would be 50 - 20 = 30. If we decide to make 10 groups, then the width of each class would be 30/10 = 3.
Similarly, if we decide to make 6 classes instead of 10, then the width of each class interval would be 30/6 = 5.

(iv) Open-ended cases, where there is no lower limit of the first group or no upper limit of the last group, should be avoided since they create difficulty in analysis and interpretation. The lower and upper values of a class interval are known as lower and upper limits.

(v) Intervals should be continuous throughout the distribution. For example, in the case of factory workers, we could group them in groups of 20 to 24 years, then 25 to 29 years, and so on, but this would be highly misleading because it does not accurately represent a person who is between 24 and 25 years or between 29 and 30 years, and so on. Accordingly, it is more accurate and representative to group them as 20 to less than 25 years, 25 to less than 30 years, and so on. In this way, everybody who is 20 years or older but a fraction less than 25 years is included in the first category, and a person who is exactly 25 years or older but a fraction less than 30 years is included in the second category, and so on. This is especially important for continuous distributions.

(vi) The lower limits of class intervals should be simple multiples of the interval width. This is primarily for the purpose of simplicity in construction and interpretation. In our example of 20 years but less than 25 years, 25 years but less than 30 years, and 30 years but less than 35 years, the lower limit values for each class are simple multiples of the class width, which is 5.

2.8 CUMULATIVE FREQUENCY DISTRIBUTION

While the frequency distribution table tells us the number of units in each class interval, it does not tell us directly the total number of units that lie below or above the specified values of class intervals. This can be determined from a cumulative frequency distribution.
When the interest of the investigator focuses on the number of items below a specified value, this specified value is taken as the upper limit of the class interval, and the result is known as a less than cumulative frequency distribution. Similarly, when the interest lies in finding the number of cases above a specified value, this value is taken as the lower limit of the specified class interval, and the result is known as a more than cumulative frequency distribution. The cumulative frequency simply means summing up the consecutive frequencies as follows (taking the example of ages of 30 workers):

Class Interval (CI)    Frequency (f)    Cumulative Frequency
(years)                                 (less than)
15 and up to 25        5                 5 (less than 25)
25 and up to 35        3                 8 (less than 35)
35 and up to 45        7                15 (less than 45)
45 and up to 55        5                20 (less than 55)
55 and up to 65        3                23 (less than 65)
65 and up to 75        7                30 (less than 75)
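The running totals in a cumulative frequency distribution can be sketched as follows. This is a minimal illustration, not a prescribed method: the function names are hypothetical, the class limits and frequencies are those of the 30 workers in the text, and the standard-library `itertools.accumulate` supplies the running sums. The greater than version simply accumulates from the last class upwards.

```python
from itertools import accumulate

# Class limits and frequencies for the 30 workers in the text.
lower_limits = [15, 25, 35, 45, 55, 65]
upper_limits = [25, 35, 45, 55, 65, 75]
frequencies = [5, 3, 7, 5, 3, 7]


def less_than_cumulative(freqs):
    """Running totals taken from the first class downwards."""
    return list(accumulate(freqs))


def greater_than_cumulative(freqs):
    """Running totals taken from the last class upwards."""
    return list(accumulate(reversed(freqs)))[::-1]


less_than = less_than_cumulative(frequencies)
greater_than = greater_than_cumulative(frequencies)

for upper, count in zip(upper_limits, less_than):
    print(f"{count:>2} workers are younger than {upper}")
for lower, count in zip(lower_limits, greater_than):
    print(f"{count:>2} workers are aged {lower} or above")
```

The last less than total and the first greater than total both equal the number of observations, which is a convenient consistency check.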
Similarly, the following is the greater than cumulative frequency distribution:

Class Interval (CI)    Frequency (f)    Cumulative Frequency
(years)                                 (greater than)
15 and up to 25        5                30 (greater than 15)
25 and up to 35        3                25 (greater than 25)
35 and up to 45        7                22 (greater than 35)
45 and up to 55        5                15 (greater than 45)
55 and up to 65        3                10 (greater than 55)
65 and up to 75        7                 7 (greater than 65)

In the above greater than cumulative frequency distribution, 30 persons are 15 years or older, 25 are 25 years or older, and so on.

2.9 RELATIVE FREQUENCY DISTRIBUTION

The frequency distribution, as defined earlier, is a summary table in which the original data is condensed into groups and their frequencies. But if a researcher would like to know the proportion or the percentage of cases in each group, instead of simply the number of cases, he can do so by constructing a relative frequency distribution table. The relative frequency distribution can be formed by dividing the frequency in each class of the frequency distribution by the total number of observations. It can be converted into a percentage frequency distribution by simply multiplying each relative frequency by 100.

The relative frequencies are particularly helpful when comparing two or more frequency distributions in which the number of cases under investigation are not equal. The percentage distributions make such a comparison more meaningful, since percentages are relative frequencies and hence the total number in the sample or population under consideration becomes irrelevant.
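As a rough sketch of the computation just described, each class frequency is divided by the total number of observations and then multiplied by 100 to give a percentage. The function names below are hypothetical, the frequencies are those of the 30 workers used earlier, and rounding to one decimal place is an assumption made only for display.

```python
# Class frequencies for the 30 workers used in the earlier tables.
frequencies = [5, 3, 7, 5, 3, 7]


def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    total = sum(freqs)
    return [f / total for f in freqs]


def percentage_frequencies(freqs, digits=1):
    """Multiply each relative frequency by 100 and round for display."""
    return [round(100 * r, digits) for r in relative_frequencies(freqs)]


relative = relative_frequencies(frequencies)
percentages = percentage_frequencies(frequencies)

for f, r, p in zip(frequencies, relative, percentages):
    print(f"f = {f}  relative = {r:.3f}  percentage = {p}")
```

Because the relative frequencies always sum to 1, two distributions with different numbers of observations can be compared class by class on the same scale.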
Considering the previous example, we can calculate the relative frequency distribution as follows:

Class Interval (CI)    Frequency (f)    Relative Frequency    Percentage Frequency
(years)
15 and up to 25        5                5/30                   16.7
25 and up to 35        3                3/30                   10.0
35 and up to 45        7                7/30                   23.3
45 and up to 55        5                5/30                   16.7
55 and up to 65        3                3/30                   10.0
65 and up to 75        7                7/30                   23.3
Total                  30               30/30                 100.0

Check Your Progress
4. Define the term frequency distribution.
5. Explain ungrouped frequency distribution.
6. Why do we use the grouped frequency distribution method?

2.10 CUMULATIVE RELATIVE FREQUENCY DISTRIBUTION

It is often useful to know the proportion or the percentage of cases falling below a particular score point or falling above a particular score point. A less than cumulative relative frequency distribution shows the proportion of cases lying below the upper limit of a specific class interval. Similarly, a greater than cumulative relative frequency distribution
shows the proportion of cases above the lower limit of a specified class interval. We can develop the cumulative relative frequency distributions from the less than and greater than cumulative frequency distributions constructed earlier. By following the earlier example, we get the following table of cumulative relative frequency (less than):

Class Interval (CI)    Cumulative Frequency    Cumulative Relative
(years)                (less than)             Frequency (less than)
Less than 25            5                       5/30 or 16.7%
Less than 35            8                       8/30 or 26.7%
Less than 45           15                      15/30 or 50.0%
Less than 55           20                      20/30 or 66.7%
Less than 65           23                      23/30 or 76.7%
Less than 75           30                      30/30 or 100%

In the above example, 5 out of 30 or 16.7 per cent of the persons are below 25 years of age. Similarly, 15 out of 30 or 50 per cent of the persons are below 45 years of age, and so on.

Similarly, we can construct a greater than cumulative relative frequency distribution as follows for the same example:

Class Interval (CI)    Cumulative Frequency    Cumulative Relative
(years)                (greater than)          Frequency (greater than)
15 and above           30                      30/30 or 100%
25 and above           25                      25/30 or 83.3%
35 and above           22                      22/30 or 73.3%
45 and above           15                      15/30 or 50.0%
55 and above           10                      10/30 or 33.3%
65 and above            7                       7/30 or 23.3%

In this example, 100 per cent of the persons are 15 years of age or above, 73.3 per cent are 35 years of age or above, and so on.

Note: The less than cumulative frequency distribution is summed up from the top downwards and the greater than cumulative frequency distribution is summed from the bottom upwards.

2.11 GRAPHIC PRESENTATION

The data we collect can often be more easily understood for interpretation if it is presented graphically or pictorially. Diagrams and graphs give visual indications of magnitudes, groupings, trends and patterns in the data. These important features are more simply presented in the form of graphs.
Also, diagrams facilitate comparisons between two or more sets of data. The diagrams should be clear and easy to read and understand. Too much information should not be shown in the same diagram; otherwise, it may become cumbersome and confusing. Each diagram should include a brief and self-explanatory title dealing with the subject matter. The scale of the presentation should be chosen in such a way that the resulting diagram is of appropriate size. The intervals on the vertical
as well as the horizontal axis should be of equal size; otherwise, distortions would occur.

Diagrams are more suitable to illustrate the data which is discrete, while continuous data is better represented by graphs. The following are the diagrammatic and graphic representation methods that are commonly used.

1. Diagrammatic Representation
   (a) Bar Diagram
   (b) Pie Chart
   (c) Pictogram
2. Graphic Representation
   (a) Histogram
   (b) Frequency Polygon
   (c) Cumulative Frequency Curve (Ogive)

Each of these is briefly explained and illustrated below.

2.11.1 Diagrammatic Representation

(a) Bar Diagram. Bars are simply vertical lines where the lengths of the bars are proportional to their corresponding numerical values. The width of the bar is unimportant, but all bars should have the same width so as not to confuse the reader of the diagram. Additionally, the bars should be equally spaced.

Example 2.1: Suppose that the following were the gross revenues (in units of $100,000) for a company XYZ for the years 1989, 1990 and 1991.

Years      1989    1990    1991
Revenue    110     95      65

Construct a bar diagram for this data.

Solution: The bar diagram for this data can be constructed as follows, with the revenues represented on the vertical axis (Y-axis) and the years represented on the horizontal axis (X-axis).

[Fig. 2.1 A Bar Diagram: revenue on the Y-axis, year on the X-axis]

The bars drawn can be further subdivided into components depending upon the type of information to be shown in the diagram. This will be clear from the following example, in which we present three different components in a bar.
Example 2.2: Construct a subdivided bar chart for the three types of expenditures in dollars for a family of four for the years 1988, 1989, 1990 and 1991 as given below:

Years    Food    Education    Other    Total
1988     3000    2000         3000      8000
1989     3500    3000         4000     10500
1990     4000    3500         5000     12500
1991     5000    5000         6000     16000

Solution: The subdivided bar chart would be as follows:

[Fig. 2.2 A Subdivided Bar Diagram: expenditure on the Y-axis, year on the X-axis; legend: Food, Education, Other]

(b) Pie Chart. This type of diagram enables us to show the partitioning of a total into its component parts. The diagram is in the form of a circle and is also called a pie because the entire diagram looks like a pie and the components resemble slices cut from it. The size of the slice represents the proportion of the component out of the whole.

Example 2.3: The following figures relate to the cost of the construction of a house. The various components of cost that go into it are represented as percentages of the total cost.

Item             Labour    Cement, Bricks    Steel    Timber, Glass    Miscellaneous
% Expenditure    25        30                15       20               10

Solution: The pie chart for this data is presented as follows:
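One way to lay out the slices for Example 2.3 is to convert each percentage into a central angle, since the whole circle corresponds to 360 degrees. The sketch below is illustrative only: the helper name `slice_angles` is hypothetical, and the shares are the percentages quoted in the example.

```python
# Percentage shares of construction cost from Example 2.3.
cost_shares = {
    "Labour": 25,
    "Cement, Bricks": 30,
    "Steel": 15,
    "Timber, Glass": 20,
    "Miscellaneous": 10,
}


def slice_angles(shares):
    """Convert each share into a central angle in degrees (full circle = 360)."""
    total = sum(shares.values())
    return {item: 360 * value / total for item, value in shares.items()}


angles = slice_angles(cost_shares)
for item, angle in angles.items():
    print(f"{item:<15} {angle:6.1f} degrees")
```

Dividing by the total rather than by 100 means the same helper also works when the inputs are raw amounts instead of percentages.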
Pie charts are very useful for comparison purposes, especially when there are only a few components. If there are too many components, it may become confusing to differentiate the relative values in the pie.

(c) Pictogram. Pictogram means presentation of data in the form of pictures. It is quite a popular method used by governments and other organizations for informational exhibition. Its main advantage is its visual appeal. Pictograms stimulate interest in the information being presented. News magazines are very fond of presenting data in this form. For example, in comparing the strength of the armed forces of USA and Russia, they will simply show sketches of soldiers where each sketch may represent 100,000 soldiers. Similar comparison for missiles and tanks is also done.

2.11.2 Graphic Representation

(a) Histogram. A histogram is the graphical description of data and is constructed from a frequency table. It displays the distribution of a data set and is used for statistical as well as mathematical calculations. The word histogram is derived from the Greek word histos, which means 'anything set upright', and gramma, which means 'drawing, record, writing'. It is considered the most important basic tool of the statistical quality control process.

In this type of representation the given data are plotted in the form of a series of rectangles. Class intervals are marked along the X-axis and the frequencies along the Y-axis according to a suitable scale. Unlike the bar chart, which is one-dimensional, meaning that only the length of the bar is important and not the width, a histogram is two-dimensional, in which both the length and the width are important. A histogram is constructed from a frequency distribution of grouped data, where the height of the rectangle is proportional to the respective frequency and the width represents the class interval.
Each rectangle is joined with the other and any blank spaces between the rectangles would mean that the category is empty and there are no values in that class interval.

As an example, let us construct a histogram for the previous example of ages of 30 workers. For convenience sake, we will present the frequency distribution along with the midpoint of each interval, where the midpoint is simply the average of the values of the lower and upper boundary of each class interval. The frequency distribution table is shown as follows:

Class Interval (CI)    Midpoint    Frequency
(years)                (X)         (f)
15 and up to 25        20          5
25 and up to 35        30          3
35 and up to 45        40          7
45 and up to 55        50          5
55 and up to 65        60          3
65 and up to 75        70          7
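The midpoints in the table, and a rough text rendering of the rectangles, can be sketched as below. This is an assumption-laden illustration: both helper names are hypothetical, and the row of `#` characters is only a stand-in for the height of each rectangle, not a replacement for a properly drawn histogram.

```python
# Class boundaries and frequencies for the 30 workers in the table above.
boundaries = [15, 25, 35, 45, 55, 65, 75]
frequencies = [5, 3, 7, 5, 3, 7]


def midpoints(bounds):
    """Average the lower and upper boundary of each class interval."""
    return [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]


def text_histogram(bounds, freqs):
    """Return one line per class; the number of '#' marks equals the frequency."""
    lines = []
    for lo, hi, f in zip(bounds, bounds[1:], freqs):
        lines.append(f"{lo:>3} to {hi:<3} | {'#' * f} ({f})")
    return lines


class_marks = midpoints(boundaries)
histogram_lines = text_histogram(boundaries, frequencies)

print("Midpoints:", class_marks)
for line in histogram_lines:
    print(line)
```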
The histogram of this data is shown as follows:

[Fig. 2.3 Histogram: class intervals (CI) from 15 to 75 and their midpoints (X) on the horizontal axis, frequency on the vertical axis]

(b) Frequency Polygon. A frequency polygon is a line chart of a frequency distribution in which either the values of discrete variables or midpoints of class intervals are plotted against the frequencies and these plotted points are joined together by straight lines. Since the frequencies generally do not start at zero or end at zero, this diagram as such would not touch the horizontal axis. However, since the area under the entire curve is the same as that of a histogram, which is 100 per cent of the data presented, the curve can be enclosed so that the starting point is joined with a fictitious preceding point whose value is zero, so that the start of the curve is at the horizontal axis, and the last point is joined with a fictitious succeeding point whose value is also zero, so that the curve ends at the horizontal axis. This enclosed diagram is known as the frequency polygon.

We can construct the frequency polygon from the table presented above as follows:

[Fig. 2.4 Frequency Polygon: frequency plotted against midpoints (X) from 20 to 70; labelled points include (40, 7) and (70, 7)]

Check Your Progress
7. List the types of diagrammatic representation of data.
8. What do you understand by a histogram?
9. Define the term frequency polygon.

(c) Cumulative Frequency Curve (Ogive). The cumulative frequency curve or ogive is the graphic representation of a cumulative frequency distribution. Ogives are of two types. One of these is less than and the other one is greater than ogive. Both
these ogives are constructed based upon the following table of our example of 30 workers.

Class Interval (CI)    Midpoint    Frequency    Cumulative Frequency    Cumulative Frequency
(years)                (X)         (f)          (less than)             (greater than)
15 and up to 25        20          5             5 (less than 25)       30 (more than 15)
25 and up to 35        30          3             8 (less than 35)       25 (more than 25)
35 and up to 45        40          7            15 (less than 45)       22 (more than 35)
45 and up to 55        50          5            20 (less than 55)       15 (more than 45)
55 and up to 65        60          3            23 (less than 65)       10 (more than 55)
65 and up to 75        70          7            30 (less than 75)        7 (more than 65)

(i) Less than ogive. In this case less than cumulative frequencies are plotted against upper boundaries of their respective class intervals.

[Fig. 2.5 'Less than' Ogive: less than cumulative frequency on the vertical axis, upper boundaries of class interval (25 to 75) on the horizontal axis]

(ii) Greater than ogive. In this case greater than cumulative frequencies are plotted against the lower boundaries of their respective class intervals.

[Fig. 2.6 'Greater than' Ogive: greater than cumulative frequency on the vertical axis, lower boundaries of class interval (15 to 65) on the horizontal axis]
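The points to be plotted for the two ogives can be listed as (boundary, cumulative frequency) pairs. The sketch below is illustrative only, with hypothetical function names; it pairs the less than totals with upper boundaries and the greater than totals with lower boundaries, as the two definitions above require. Any plotting tool could then join the points.

```python
from itertools import accumulate

# Class boundaries and frequencies for the 30 workers.
boundaries = [15, 25, 35, 45, 55, 65, 75]
frequencies = [5, 3, 7, 5, 3, 7]


def less_than_ogive_points(bounds, freqs):
    """Pair each upper boundary with the number of cases below it."""
    return list(zip(bounds[1:], accumulate(freqs)))


def greater_than_ogive_points(bounds, freqs):
    """Pair each lower boundary with the number of cases at or above it."""
    total = sum(freqs)
    below = [0] + list(accumulate(freqs))[:-1]
    return [(lo, total - b) for lo, b in zip(bounds[:-1], below)]


less_points = less_than_ogive_points(boundaries, frequencies)
greater_points = greater_than_ogive_points(boundaries, frequencies)

print("Less than ogive:   ", less_points)
print("Greater than ogive:", greater_points)
```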
These ogives can be used for comparison purposes. Several ogives can be drawn on the same grid, preferably with different colours for easier visualization and differentiation.

Although diagrams and graphs are a powerful and effective medium for presenting statistical data, they can only represent a limited amount of information and they are not of much help when intensive analysis of data is required.

2.12 SOLVED PROBLEMS

Problem 1: Standard tests were administered to 30 students to determine their IQ scores. These scores are recorded in the following data.

120  115  118  132  135  125  122  140  137  127
129  130  116  119  132  127  133  126  120  125
130  134  135  127  116  115  125  130  142  140

(a) Arrange this data into an ordered array.
(b) Construct a grouped frequency distribution with suitable class intervals.
(c) Compute for this data:
    (i) Cumulative frequency (less than)
    (ii) Cumulative frequency (greater than)
(d) Compute:
    (i) Relative frequency
    (ii) Cumulative relative frequency (less than)
    (iii) Cumulative relative frequency (greater than)
(e) Construct for this data:
    (i) A histogram
    (ii) A frequency polygon
    (iii) Cumulative frequency ogive (less than)
    (iv) Cumulative frequency ogive (greater than)

Solution:

(a) The ordered array for this data is as follows:

115  115  116  116  118  119  120  120  122  125
125  125  126  127  127  127  129  130  130  130
132  132  133  134  135  135  137  140  140  142

(b) Let there be six groupings, so that the size of the class interval is 5. The frequency distribution is shown as follows:

Class Interval (CI)      Frequency (f)
115 to less than 120     6
120 to less than 125     3
125 to less than 130     8
130 to less than 135     7
135 to less than 140     3
140 to less than 145     3
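Parts (a) and (b) can be reproduced with a short script. This is a sketch under stated assumptions: the helper `grouped_frequency` and its parameters are hypothetical, and each class is taken to include its lower limit and exclude its upper limit, in line with the 'to less than' intervals used in the solution.

```python
# IQ scores of the 30 students in Problem 1.
iq_scores = [
    120, 115, 118, 132, 135, 125, 122, 140, 137, 127,
    129, 130, 116, 119, 132, 127, 133, 126, 120, 125,
    130, 134, 135, 127, 116, 115, 125, 130, 142, 140,
]


def grouped_frequency(values, start, width, n_classes):
    """Count values in classes [start, start + width), [start + width, ...)."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside the class intervals")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


ordered = sorted(iq_scores)  # part (a): the ordered array
table = grouped_frequency(iq_scores, start=115, width=5, n_classes=6)  # part (b)

print(ordered)
for lower, upper, count in table:
    print(f"{lower} to less than {upper}: {count}")
```

Raising an error for a value outside the chosen classes guards against silently losing observations when the starting point or width is chosen badly.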
(c) The required elements are computed in the following table.

Class Interval    Frequency    Cumulative Frequency    Cumulative Frequency
(CI)              (f)          (less than)             (greater than)
115-120           6             6 (less than 120)      30 (more than 115)
120-125           3             9 (less than 125)      24 (more than 120)
125-130           8            17 (less than 130)      21 (more than 125)
130-135           7            24 (less than 135)      13 (more than 130)
135-140           3            27 (less than 140)       6 (more than 135)
140-145           3            30 (less than 145)       3 (more than 140)

(d) The computed values of relative frequency, cumulative relative frequency (less than) and cumulative relative frequency (greater than) are shown in the following table:

Class Interval       Frequency    Relative         Cumulative Relative        Cumulative Relative
(CI)                 (f)          Frequency        Frequency (less than)      Frequency (greater than)
115 and up to 120    6            6/30 or 20%       6/30 or 20% (<120)        30/30 or 100% (>115)
120 and up to 125    3            3/30 or 10%       9/30 or 30% (<125)        24/30 or 80% (>120)
125 and up to 130    8            8/30 or 26.7%    17/30 or 56.7% (<130)      21/30 or 70% (>125)
130 and up to 135    7            7/30 or 23.3%    24/30 or 80% (<135)        13/30 or 43.3% (>130)
135 and up to 140    3            3/30 or 10%      27/30 or 90% (<140)         6/30 or 20% (>135)
140 and up to 145    3            3/30 or 10%      30/30 or 100% (<145)        3/30 or 10% (>140)
Total                30

(e) Before we construct the histogram and other diagrams, let us first determine the midpoint (X) of each class interval.

Class Interval (CI)    Frequency (f)    Midpoint (X)
115-120                6                117.5
120-125                3                122.5
125-130                8                127.5
130-135                7                132.5
135-140                3                137.5
140-145                3                142.5
(i) A Histogram

[Figure: histogram of the IQ scores; midpoints (X) from 117.5 to 142.5 on the horizontal axis, frequency on the vertical axis]

(ii) A Frequency Polygon

[Figure: frequency polygon; frequency plotted against midpoints (X) from 117.5 to 142.5, with the peak at (127.5, 8)]

(iii) A Cumulative Frequency Ogive (less than)

[Figure: less than cumulative frequency plotted against upper boundaries of class interval (CI), 120 to 145]

(iv) A Cumulative Frequency Ogive (greater than)

[Figure: greater than cumulative frequency plotted against lower boundaries of class interval (CI), 115 to 140]
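The percentages in parts (c) and (d) of Problem 1 can be cross-checked with a short calculation. The sketch below is illustrative: the function name is hypothetical, the class frequencies are those of the solution, and rounding to one decimal place matches the precision used in the tables.

```python
from itertools import accumulate

# Class frequencies from the solution to Problem 1 (classes 115-120 ... 140-145).
frequencies = [6, 3, 8, 7, 3, 3]


def cumulative_relative_percentages(freqs, digits=1):
    """Return (less_than, greater_than) cumulative percentages per class."""
    total = sum(freqs)
    running = list(accumulate(freqs))
    less = [round(100 * c / total, digits) for c in running]
    # Cases at or above a class's lower limit = total minus cases in earlier classes.
    greater = [
        round(100 * (total - c + f) / total, digits)
        for c, f in zip(running, freqs)
    ]
    return less, greater


less_pct, greater_pct = cumulative_relative_percentages(frequencies)
print("Less than (%):   ", less_pct)
print("Greater than (%):", greater_pct)
```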
2.13 SUMMARY

In this unit, you have learnt about the data collection and presentation methods. The process of data collection is very important in statistics. There are a variety of ways of collecting data, depending upon the resources, availability of time, purpose of data collection and the level of skill of the concerned data collector. It also involves several steps, which help in collecting the data systematically. You have also learnt that besides collecting the appropriate data, a significant emphasis is also laid on the suitable representation of that data. This calls for using several data representation techniques, which depend on the nature and type of data collected. Data can be represented through frequency distributions, bar diagrams, pictograms, histograms, frequency polygons, ogives, etc. It also requires considering the intelligence level of the target audience to which it will be presented. The statistical treatment of the data is done to present the information in a summarized form without any loss of relevant information, so that it can be efficiently used for effective decision-making.

2.14 ANSWERS TO 'CHECK YOUR PROGRESS'

1. Telephone survey, mail questionnaire, personal observations and interviews are the various modes of data collection.
2. The two types of samples are:
   • Probability samples
   • Non-probability samples
3. A simple random sample is one in which each and every unit of the population has an equal chance of being selected into the sample.
4. Frequency distribution is the list of all the values obtained in the data and the corresponding frequency with which these values occur in the data.
5. When the number of values of the variable is small, then an ungrouped frequency distribution is constructed. It means listing the frequency of occurrence against the value of the given variable.
6.
Since most often the data set is very large, it becomes necessary to condense the data into a suitable number of groups or classes of the variable values and then assign the combined frequencies of these values to their respective classes.
7. Diagrammatic representation of data can be classified into the following three types:
   • Bar diagram
   • Pie chart
   • Pictogram
8. A histogram is the graphical description of data and is constructed from a frequency table. It is a two-dimensional graph in which rectangles are used to represent frequencies of observations within each interval.
9. A frequency polygon is a line chart of a frequency distribution in which either the values of discrete variables or midpoints of class intervals are plotted against the frequencies and these plotted points are joined together by straight lines.

2.15 QUESTIONS AND EXERCISES

Short-Answer Questions

1. What do you understand by the term statistics?
2. What are the various data types?
3. Which data is called primary data?
4. Describe secondary data types.
5. Can secondary data become primary data? Explain.
6. What are the various methods of data gathering?
7. What is the importance of the questionnaire method?
8. Why is sampling necessary?
9. How will you explain sampling and non-sampling errors?
10. Write the definition of frequency distribution.
11. Explain the different types of frequency distributions.
12. How will you construct a frequency distribution?
13. Define the term cumulative frequency distribution.
14. How will you present the data in the graphic form?
15. Differentiate between a pie chart and a pictogram.
16. Define the terms histogram and frequency polygon.
17. What are the different types of ogives?

Long-Answer Questions

1. Differentiate between primary data and secondary data. Under what circumstances would secondary data be more useful than primary data?
2. Describe a project in which it would be necessary to use primary data rather than secondary data. Give reasons.
3. A manager has recommended to the president of a company that one of his subordinates be given a raise of $20,000 per year in order to motivate him to stay with the company and not move to another company. What sources of information, both internal as well as external, would the president use to decide whether such a raise is justified?
4. You are the marketing manager for a large soft drink company. You plan to introduce a soft drink with fewer calories in it for the diet-conscious consumer. You believe that it will capture a large segment of the female consumer market.
   (a) Describe in detail the type and sources of data you would want to use and analyse before investing in the new venture.
   (b) If you find it necessary to gather primary data, what form of primary data collection would you use?
   (c) Develop a questionnaire for obtaining the necessary data for the above project.
5. What are the various modes of data collection? Under what circumstances would each method be more suitable as compared to other methods?
Give reasons for your beliefs.
6. Differentiate between probability samples and non-probability samples. Under what circumstances would non-probability types of samples be more useful in statistical analysis?
7. Your college has a total population of 5000 students. It is desired to estimate the proportion of students who use drugs.
   (a) What type of sampling would be necessary to reach a meaningful conclusion regarding the drug use habits of all students?
   (b) What type of sampling would you select so that the sample is most representative of the population?
   (c) Drug use being a sensitive issue, what type of questions would you include in your questionnaire? What type of questions would you avoid? Give reasons.
8. At New Delhi airport, there is a green channel and a red channel. Passengers without any articles liable to customs duty can go through the green channel. Some passengers are stopped for a random check. What type of random sampling would be appropriate in such situations? Give reasons.
9. A professor in a statistics class was rated by his 20 students on a scale of 1 to 5. The rating of 5 was considered as excellent and the rating of 1 was considered as poor, with an increasing scale from poor to excellent. These ratings are recorded individually in the following table.

   4  3  1  1  4  5  2  2  4  4
   2  3  3  4  3  5  4  4  5  3

   (a) Construct a frequency distribution for this data.
   (b) Draw a frequency polygon for this data.
10. Before admission into Medgar Evers College, the students have to take a Basic Skills Test in fundamentals of mathematics. In one such exam, 40 students appeared in the test. Their scores are recorded below out of a total maximum of 30 points.

   15  12  15  22  28  30  19  25  24  28
   10  15  16  20  26  22  18  20  27  14
   12  19  21  18  19  30  13  10  21  24
   15  20  22  18  20  12  23  29  22  24

   (a) Arrange this data in an ordered array from the lowest value to the highest value.
   (b) Enumerate the steps you would take to convert this data into a frequency distribution. What class interval size would you decide on and why?
   (c) Construct a frequency distribution for the above data with a suitable class interval and suitable number of classes.
   (d) Draw a frequency polygon for this data.
11. The following frequency distribution represents the number of days during a year that the faculty of the college was absent from work due to illness.

   Number of Days    Number of Employees
   0-2                5
   3-5               10
   6-8               20
   9-11              10
   12-14              5
   Total             50

   (a) Construct a frequency distribution for this data.
   (b) Construct a greater than cumulative frequency distribution as well as a less than cumulative frequency distribution for this data.
   (c) How many employees were absent for less than 3 days during the year?
   (d) How many employees were absent for more than 8 days during the year?
   (e) Draw a frequency polygon for this data.
   (f) Draw the cumulative frequency ogive (greater than) for this data.
12. Micro Age Computer Company exports laptop computers to Mexico. Their sales in millions of dollars during the years 2000 through 2009 are shown in the following table.
   Years    Sales ($ millions)
   2000     120.5
   2001     150.0
   2002     203.9
   2003     200.2
   2004     250.0
   2005     273.4
   2006     350.0
   2007     414.7
   2008     516.0
   2009     532.3

   (a) Portray the trend of sales in the form of a graph.
   (b) Convert the frequency distribution into greater than cumulative frequency distribution.
   (c) Draw the greater than cumulative frequency ogive.
13. The Dean of Finance at the college was concerned that the faculty of the Department of Business was making too many long distance telephone calls and that most calls lasted longer than 10 minutes. A random sample of 30 such calls was taken for analysis during a given week to establish a pattern of the length of such calls in minutes. The time recorded for each call, in minutes, is given as follows:

    6.8   2.3   4.8   8.3  15.9  18.7  11.8   5.6  15.9  10.4
   15.3  12.3   9.1  10.4   7.2  14.5  11.2  15.3  19.8   7.6
   17.7  11.1   9.0  13.2  12.0   3.7   8.0  13.4  12.5  15.0

   (a) Convert this data into a continuous grouped frequency distribution using a class interval of size 3, starting from 2 minutes.
   (b) Do you think that the class size of 3 is reasonable? If so, why?
   (c) Convert the frequency distribution so constructed into relative frequency distribution.
   (d) Convert the relative frequency distribution into greater than cumulative relative frequency distribution.
   (e) What percentage of calls lasted between 8 and 11 minutes?
   (f) What percentage of calls lasted less than 11 minutes?
   (g) Draw an ogive for less than cumulative frequency.
14. The following is a frequency distribution of hourly wages paid to 590 workers at Supreme Motors Company in various departments.

   Hourly Wages (in dollars)    Number of Workers
    8.30 to  8.89                60
    8.90 to  9.49                80
    9.50 to 10.09               112
   10.10 to 10.69               120
   10.70 to 11.29                96
   11.30 to 11.89                61
   11.90 to 12.49                26
   12.50 to 13.09                20
   13.10 to 13.69                15

   (a) What is the size of the class interval?
   (b) How many workers are paid between $10.10 and $11.89 per hour?
(c) How many employees earn less than $11.30 per hour?
   (d) How many employees earn more than $11.29 per hour?
   (e) What percentage of workers earn between $8.90 and $9.49 per hour?
   (f) What percentage of workers earn more than $12.49 per hour?
   (g) What percentage of workers earn $11.30 or more per hour?
   (h) Construct the frequency distribution for the above data.
   (i) Draw the greater than cumulative frequency curve for this data.
   (j) Draw a histogram for the above frequency distribution.
15. The following data represents the population of United States in millions based on the census taken every 10 years from 1950 to 1990.

   Years                       1950    1960    1970    1980    1990
   Population (in millions)    166     199     226     252     278

   Construct a bar diagram for this data.
16. The percentage of votes received by Democratic and Republican candidates during Presidential elections for the period 1972 to 1992 is recorded in the following table. (The figures do not add up to 100 per cent because votes cast for other candidates are not included in the study.)

   Years         1972    1976    1980    1984    1988    1992
   Democratic    38      51      42      41      46      43
   Republican    61      48      51      59      54      38

   Construct a contiguous bar diagram for the percentage of votes received by Democratic and Republican candidates for each election year.
17. The following data represents the various expenses of two families earning the same amount of money per month.

   Expenditure ($)    Food    Clothing    Education    Misc.    Savings
   Family (A)         250     200         350          300      200
   Family (B)         200     200         250          400      250

   Construct a subdivided bar diagram for expenses in various categories for these two families.
18. As a professor of statistics, Dr Singh earns a net salary of $2400.00 per month. His family expenses are budgeted each month as follows:

   Rent    Food    School fees    Clothing    Misc.    Savings
   $600    $500    $300           $300        $450     $250

   Construct a pie chart to represent the percentages of various expenses and savings.
19.
The following table represents the racial breakdown of people in the Flushing area in Queens. Race White Black Hispanic Asians Others 30,520 20,300 15,650 5,400 Nimbcr 205,000 Construct a pie chart to represent this data. (Make sure that the slices of the pie proportionately represent the various ethnic populations.)
20. The student placement office at Medgar Evers College conducted a survey of last year's graduates from the School of Business to determine the general areas in which the graduates found jobs. The result of the survey is shown in the following table.

Area                   Accounting   Marketing   Finance   General Management   Others   Total
Number of Graduates    45           32          27        16                   10       130

Construct a suitable pie chart to represent this data.

21. The following data typically represents the yearly expenditure of a student in a private university.

Expenditure           Amount     Per cent of Total
Tuition               14,400     45
Room and Boarding     9,600      30
Books and Supplies    2,560      8
Transportation        640        2
Recreation            2,240      7
Other                 2,560      8
Total                 $32,000    100

Construct a suitable pie chart to represent this data.

22. A recent study showed that a typical American car owner incurs the following expenses, on an average, when he leases a car for three years.

Expenditure item   Lease amount   Gasoline   Insurance   Maintenance   Total
Amount ($)         4,500          1,350      1,800       1,350         $9,000

Draw a pie chart to portray this data.

2.16 FURTHER READING

Kothari, C.R. 1984. Quantitative Techniques, 3rd Edition. New Delhi: Vikas Publishing House Pvt. Ltd.
Chandan, J.S. 1998. Statistics for Business and Economics. New Delhi: Vikas Publishing House Pvt. Ltd.
Chandan, J.S., Jagjit Singh and K.K. Khanna. 1995. Business Statistics, 2nd Edition. New Delhi: Vikas Publishing House Pvt. Ltd.
Campbell, Stephen K. 1987. Applied Statistics. New York: Harper & Row.
UNIT 3 MEASURES OF CENTRAL TENDENCY

Structure
3.0 Introduction
3.1 Unit Objectives
3.2 Descriptive Statistics
3.3 Measures of Central Tendency
3.4 Arithmetic Mean
3.4.1 Arithmetic Mean of Grouped Data
3.4.2 Advantages of Mean
3.4.3 Disadvantages of Mean
3.5 Preparing a Frequency Distribution Table
3.6 Properties of the Mean
3.7 Short-Cut Methods for Calculating Mean
3.8 The Weighted Arithmetic Mean
3.9 The Median
3.10 Location of Median by Graphical Analysis
3.11 Quartiles, Deciles and Percentiles
3.12 Mode
3.13 Geometric Mean
3.14 Harmonic Mean
3.15 Choice of Average
3.16 Misuse of Averages
3.17 Solved Problems
3.18 Summary
3.19 Answers to 'Check Your Progress'
3.20 Questions and Exercises
3.21 Further Reading

3.0 INTRODUCTION

In this unit, you will learn about the measures of central tendency. The various methods of measurement of the central tendency are mean, median and mode. You will learn the arithmetic procedures that can be used for analysing and interpreting quantitative data, i.e., the concept of arithmetic mean, median and mode.

For a proper understanding of quantitative data, they should be classified and converted into a frequency distribution. This process of condensation reduces their bulk and gives prominence to the underlying structure of the data. But classification is only the first step in statistical analysis. If the characteristics of given data are to be properly revealed, or if one distribution is to be compared with another, it is necessary that the frequency distribution itself be summarized and condensed in such a manner that its essence is expressed in as few figures as possible.
In this unit, we shall deal with some arithmetic procedures that can be used for analysing and interpreting quantitative data. These measures and procedures relate to some properties and characteristics of data, which include measures of central location of data, other measures of noncentral location, measures of dispersion of data in itself and around the mean, and the shape of the data. You will also learn to calculate geometric mean and harmonic mean. The unit also explains quartiles, deciles and percentiles. You will learn the method to explain grouped data as a frequency distribution.

3.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Define the concept of descriptive statistics
• Explain the significance of arithmetic mean in measuring central tendency
• Prepare a frequency distribution table
• Calculate mean by short-cut method
• Calculate weighted arithmetic mean
• Explain the location of median by graphical analysis
• Differentiate between quartiles, deciles and percentiles
• Calculate mode
• Explain and calculate geometric and harmonic mean
• Define the concept of choice of average
• Describe how averages are misused

3.2 DESCRIPTIVE STATISTICS

A single number describing some feature of a frequency distribution is called a descriptive statistic. The main aim of a statistician presenting a mass of data is to evolve a few such descriptive statistics which describe the essential nature of the frequency distribution. For a proper appreciation of the various descriptive statistics involved, it is necessary to note that most statistical distributions have some common features.
Though the size of the variable varies from item to item, most of the items are distributed in such a manner that if we move from the lowest value to the highest value of the variable, the number of items at each successive stage increases with a certain amount of regularity till we reach a maximum; and then, as we proceed further, it decreases with similar regularity. If we plot the percentage frequency density, i.e., the percentage of cases in an interval of unit variable width, we get frequency curves of the type shown in Fig. 3.1. (Note that the area under each curve should be equal to 100, the total percentage points.)

Fig. 3.1 Frequency Curves (horizontal axis: Variable; vertical axis: y)
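The percentage frequency density described above can be illustrated with a short computation. The following is only a minimal sketch in Python: the class intervals and frequencies are hypothetical values chosen for illustration, and the function name percentage_frequency_density is an assumption, not something defined in this unit. It shows that when the percentage of cases in each class is divided by the class width, the total area under the resulting curve comes to 100.

```python
# Hypothetical class intervals (lower limit, upper limit) and frequencies,
# invented purely for illustration.
classes = [(0, 10), (10, 20), (20, 40), (40, 50)]
frequencies = [5, 15, 20, 10]


def percentage_frequency_density(classes, frequencies):
    """Return the percentage of cases per unit of variable width for each class."""
    total = sum(frequencies)
    densities = []
    for (lower, upper), freq in zip(classes, frequencies):
        percentage = 100.0 * freq / total              # percentage of all cases in this class
        densities.append(percentage / (upper - lower))  # spread over the class width
    return densities


densities = percentage_frequency_density(classes, frequencies)

# Area under the curve = sum of (density x class width) over all classes.
area = sum(d * (upper - lower) for d, (lower, upper) in zip(densities, classes))

print(densities)
print(area)
```

Dividing by the class width is what makes classes of unequal width comparable on the same vertical scale, which is why the curves for different distributions can be drawn with equal areas.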
There are various 'gross' ways in which frequency curves can differ from one another. Even when the 'general' shapes of the curves are the same (the area under them already made equal by the strategy of plotting the per cent density), the details of the shape may change. Thus, the curve B has a smaller spread than A, the curve C is more peaky and curve E is less symmetrical. Even when the curves have almost the same shape (i.e., same spread, peakiness, symmetry, etc.) as in curves A and D, the two may differ in location along the variable axis. Thus, the items of distribution D are generally larger than those of A. So also are those of B compared to A. Hence, a kind of 'average' location of the distribution along the variable axis is an important descriptive statistic. These statistics are collectively known as measures of location or of central tendency.

3.3 MEASURES OF CENTRAL TENDENCY

As mentioned earlier, these statistics indicate the location of the frequency curve along the X-axis and ignore all other features of the distribution. There are various possible measures that can be used to 'locate' a frequency distribution, as shown in Fig. 3.2.

A, the minimum value.
B, the value of maximum concentration.
C, the value which divides the distribution into half, such that one half of the items have value less than this and the other half more.
D, the average value of all items.
E, the 95th percentile, i.e., the value below which 95 per cent items lie.
F, the maximum value.

Fig. 3.2 Frequency Distribution

If the shape of the frequency distributions were fixed, then all these measures would be equally descriptive and would fix the location of the curve. But the practical distributions that we deal with always have some change in shape depending on the samples we take, even though the general shapes are quite similar.
It is, therefore, necessary that we choose those measures of location which are not very sensitive to the specific values of items, in particular the extreme values. Thus, measures A and F are generally meaningless because they depend on the values of the lowest and the highest items, respectively. The other measures, on the contrary, are less susceptible to extreme values because they are somehow related to the entire distribution. Thus, we treat B, C, D and E as the most common measures of location. There are some more such measures which we will consider later.

The most important object of calculating and measuring central tendency is to determine a 'single figure' which may be used to represent a whole series involving magnitudes of the same variable. In that sense, it is an even more compact description of the statistical data than the frequency distribution.
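The six candidate measures of location listed for Fig. 3.2 can be computed side by side to see how each reacts to an extreme value. The sketch below is illustrative only: the scores are hypothetical, the function name location_measures and the dictionary keys are assumptions, and the 95th percentile uses a simple nearest-rank rule, which is one of several conventions in use.

```python
import statistics


def location_measures(values):
    """Compute the six location measures A to F for a list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank 95th percentile: the smallest item with at least 95 per cent
    # of the items at or below it (integer arithmetic for the ceiling of 0.95 * n).
    rank = (95 * n + 99) // 100
    return {
        "A_minimum": ordered[0],
        "B_mode": statistics.mode(ordered),       # value of maximum concentration
        "C_median": statistics.median(ordered),   # divides the items into two halves
        "D_mean": statistics.mean(ordered),       # average of all items
        "E_95th_percentile": ordered[rank - 1],
        "F_maximum": ordered[-1],
    }


# Hypothetical scores, invented for illustration.
scores = [2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9]
# The same scores with the highest item replaced by an extreme value.
with_extreme = scores[:-1] + [90]

before = location_measures(scores)
after = location_measures(with_extreme)

for name in before:
    print(name, before[name], after[name])
```

In this sketch the maximum jumps from 9 to 90, while the mode, the median and the 95th percentile stay where they were. The mean does shift (from 5.3 to 9.35), but by far less than the maximum, because it is computed from all twenty items rather than from a single one.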
Since an 'average' represents the entire data, it facilitates comparison within one group or between groups of data. Thus, the performance of the members of a group can be compared by relating it to the average performance of the group. Likewise, the achievements of groups can be compared by a comparison of their respective averages.

3.4 ARITHMETIC MEAN

There are several commonly used measures such as arithmetic mean, mode and median. These values are very useful not only in presenting the overall picture of the entire data but also for the purpose of making comparisons among two or more sets of data.

While arithmetic mean is the most commonly used measure of central location, mode and median are more suitable measures under certain conditions and for certain types of data. However, each measure of central tendency should meet the following requisites.

1. It should be easy to calculate and understand.
2. It should be rigidly defined. It should have only one interpretation so that the personal prejudice or bias of the investigator does not affect its usefulness.
3. It should be representative of the data. If it is calculated from a sample, then the sample should be random enough to represent the population accurately.
4. It should have sampling stability. It should not be affected by sampling fluctuations. This means that if we pick 10 different groups of college students at random and compute the average of each group, then we should expect to get approximately the same value from each of these groups.
5. It should not be affected much by extreme values. If a few very small or very large items are present in the data, they will unduly influence the value of the average by shifting it to one side or the other, so that the average would not be really typical of the entire series. Hence, the average chosen should be such that it is not unduly affected by such extreme values.
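The idea of sampling stability in requisite 4 can be tried out with a small simulation. This is a rough sketch under stated assumptions: the population of 1,000 student ages between 17 and 25 is invented, the group size of 50 is an arbitrary choice, and a fixed random seed is used only so that the run is repeatable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (an arbitrary choice) so the sketch is repeatable

# Hypothetical population: ages of 1,000 college students between 17 and 25.
population = [rng.randint(17, 25) for _ in range(1000)]

# Pick 10 different groups of 50 students at random and compute each group's mean age.
group_means = [statistics.mean(rng.sample(population, 50)) for _ in range(10)]

population_range = max(population) - min(population)
spread_of_means = max(group_means) - min(group_means)

print("Population mean:", statistics.mean(population))
print("Group means:", [round(m, 2) for m in group_means])
print("Range of individual ages:", population_range)
print("Range of the 10 group means:", round(spread_of_means, 2))
```

The group means can be expected to cluster closely around the population mean, with a range much narrower than the range of individual ages. That clustering is what 'approximately the same value from each of these groups' amounts to in practice.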
Let us consider the measure of central tendency, arithmetic mean. This is also commonly known as simply the mean. Even though 'average', in general, means any measure of central location, when we use the word average in our daily routine, we always mean the arithmetic average. The term is widely used by almost everyone in daily communication. We speak of an individual being an average student or of average intelligence. We always talk about average family size or average family income or grade point average (GPA) for students and so on.

For discussion purposes, let us assume a variable X which stands for some scores such as the ages of students. Let the ages of 5 students be 19, 20, 22, 22 and 17 years. Then variable X would represent these ages as follows:

X: 19, 20, 22, 22, 17

Placing the Greek symbol Σ (Sigma) before X would indicate a command that all values of X are to be added together. Thus:

$\sum X = 19 + 20 + 22 + 22 + 17$

The mean is computed by adding all the data values and dividing the sum by the number of such values. The symbol used for sample average is $\bar{X}$ so that:

$\bar{X} = \frac{19 + 20 + 22 + 22 + 17}{5}$
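The same computation can be checked in a few lines of Python. This is a minimal sketch using the five ages from the text; the variable names ages, total, n and mean_age are chosen here for illustration.

```python
# Ages of the 5 students, as given in the text.
ages = [19, 20, 22, 22, 17]

total = sum(ages)      # the sum of all values of X (Sigma X)
n = len(ages)          # the number of values
mean_age = total / n   # the sample mean: Sigma X divided by n

print(total, mean_age)  # prints: 100 20.0
```

The sum of the five ages is 100, so the mean age works out to 20 years.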