13.6 DATA TRANSFORMATION

Data transformation is the process of changing the format, structure, or values of data, using some kind of consistent conversion. In data analytics projects, data may be transformed at either of two stages of the data pipeline. Organisations that use on-premises data warehouses generally use an ETL (extract, transform, load) process, in which data transformation is the middle step. Today, most organisations use cloud-based data warehouses, which can scale compute and storage resources with latency measured in seconds or minutes. The scalability of the cloud platform lets organisations skip preload transformations, load raw data into the data warehouse, and transform it at query time, a model called ELT (extract, load, transform).

Data transformation may be constructive (adding, copying, and replicating data), destructive (deleting fields and records), aesthetic (standardising salutations or street names), or structural (renaming, moving, and combining columns in a database). An enterprise can choose among a variety of ETL tools that automate the process of data transformation. Data analysts, data engineers, and data scientists also transform data using scripting languages such as Python or domain-specific languages like SQL.

The advantages and drawbacks of data transformation

There are various advantages to transforming data:

Data is transformed to make it better organised; transformed data may be easier for both humans and computers to use.

Properly structured and validated data improves data quality and protects applications from potential problems such as null values, unexpected duplicates, incorrect indexing, and incompatible formats.

Data transformation makes it easier for applications, systems, and different types of data to work together.

Data that is utilised for several purposes may require different transformations.

However, there are some difficulties in transforming data effectively:

Data transformation can be costly. The price is determined by the infrastructure, software, and tools used to process the data. Licensing, computing resources, and recruiting suitable staff are all possible expenses.

Data transformations can be time-consuming and resource-intensive. Performing transformations after loading data into an on-premises data warehouse, or altering data before feeding it into applications, can put a strain on other operations. Because the platform can
scale up to meet demand, you can perform the transformations after loading if you employ a cloud-based data warehouse.

During transformation, a lack of expertise or simple negligence can cause problems. Data analysts without appropriate subject-matter experience are less likely to spot typos or inaccurate data, because they are unfamiliar with the range of valid and permissible values. Someone dealing with medical data who is inexperienced with the relevant terminology, for example, may misspell disease names or fail to flag disease names that should be mapped to a single value.

Enterprises can undertake transformations that do not meet their requirements. A company may alter information to a specific format for one application, only to restore the information to its previous format for another.

How to transform data

Data transformation can help analytic and business processes run more efficiently and enable improved data-driven decision-making. The first phase of data transformation should include data type conversion and the flattening of hierarchical data. These operations reshape data to make it more compatible with analytics tools. Data analysts and data scientists can apply additional transformations as needed, as distinct layers of processing. Each processing layer should be designed to perform a specific set of operations that satisfies a recognised business or technical requirement.

Within the data analytics stack, data transformation serves a variety of purposes.

Parsing and extraction

In the modern ELT process, data ingestion begins by taking information from a data source and replicating it to its destination. The first transformations focus on shaping the format and structure of the data to ensure its compatibility with both the target system and the data already present there. Extracting fields from comma-delimited log data for loading into a relational database is an example of this type of transformation.

Mapping and translation

Data mapping and translation are two of the most basic data transformations. For example, a column containing integers representing error codes can be mapped to the relevant error descriptions, making the column more understandable and suitable for display in a customer-facing application. Translation converts data from a format used by one system into a format appropriate for a different system. Web data may arrive in the form of hierarchical JSON or XML files even after parsing, but it must be translated into row and column data before being stored in a relational database.

Filtering, aggregation, and summarisation

Data transformation is often used to reduce the volume of data and make it more manageable. Data can be consolidated by filtering out unneeded fields, columns, and records. Omitted data might include numerical
indices in data meant for graphs and dashboards, or records from business locations that are not relevant to a given study. Data can also be summarised or aggregated, for example by converting a time series of customer transactions into hourly or daily sales figures. Although BI systems can perform this filtering and aggregation, it may be more efficient to perform the transformations before a reporting tool accesses the data.

Imputation and enrichment

Data from several sources can be combined to produce denormalised, enriched data. For instance, a customer's transactions can be rolled up into a grand total and stored in a customer information table for quick reference or for use by customer analytics tools. As part of these transformations, long or freeform fields can be split into multiple columns, and missing values or corrupted data can be imputed or replaced.

Organising and indexing

Data can be transformed so that it is ordered logically or suits a data storage scheme. In relational database management systems, for example, creating indexes can improve performance or simplify the management of relationships between tables.

Encryption and anonymisation

Data containing personally identifiable information, or other information that could compromise privacy or security, should be anonymised before being disseminated. Many industries demand encryption of confidential data, and systems can encrypt data at multiple levels, from individual database cells to entire records or fields.

Modelling, typecasting, formatting, and renaming

Finally, a whole set of transformations can reshape data without changing its content. This includes casting and converting data types for compatibility, adjusting dates and times with offsets and format conversion, and renaming schemas, tables, and columns for clarity.

Fine-tuning the data transformation process

Before your company can perform analytics, and even before you transform the data, you must replicate it to a data warehouse architected for analytics. Most organisations today opt for a cloud data warehouse, which allows them to take full advantage of ELT by loading data in a raw state, ready for transformation.
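To make these operations concrete, the short Python sketch below applies several of the transformations just described (parsing, mapping, filtering, and aggregation) using pandas. The file name, column names, and error-code mapping are all hypothetical, chosen purely for illustration.

# A minimal sketch of typical in-pipeline transformations with pandas.
# File name, columns, and the error-code mapping are hypothetical.
import pandas as pd

# Parsing/extraction: read comma-delimited log data into tabular form
logs = pd.read_csv("events.csv", parse_dates=["timestamp"])

# Mapping/translation: map integer error codes to readable descriptions
error_map = {0: "OK", 1: "Timeout", 2: "Auth failure"}
logs["error_desc"] = logs["error_code"].map(error_map)

# Filtering: drop records and fields not needed downstream
logs = logs[logs["region"] != "internal"].drop(columns=["debug_info"])

# Aggregation/summarisation: roll transactions up to daily totals
daily = (logs.set_index("timestamp")
             .resample("D")["amount"]
             .sum()
             .rename("daily_sales"))
print(daily.head())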
13.7 STATISTICS IN RESEARCH

The role of statistics in research is to serve as a tool in designing research, analysing data, and drawing conclusions from it. Most research investigations generate a huge amount of raw data, which must be reduced to a manageable size so that it can be read easily and used for further analysis. Clearly, no researcher can disregard the science of statistics, even if he may not have the occasion to employ statistical procedures in all their ramifications. As previously said, classification and tabulation achieve this goal to some extent, but we must go a step further and build indices or measures to summarise the collected/classified data. Only then can we generalise from small groups (i.e., samples) to the entire population.

In fact, there are two major types of statistics: descriptive statistics and inferential statistics. Descriptive statistics is concerned with the development of particular indices from raw data, whereas inferential statistics is concerned with the process of generalisation. Inferential statistics, also known as sampling statistics, is primarily concerned with two types of problems: (i) the estimation of population parameters, and (ii) the testing of statistical hypotheses.

The important statistical measures that are used to summarise survey/research data are: (1) measures of central tendency or statistical averages; (2) measures of dispersion; (3) measures of asymmetry (skewness); (4) measures of relationship; and (5) other measures.

13.7.1 Measures of Central Tendency

Measures of central tendency (or statistical averages) reveal the point about which items have a tendency to cluster. Such a measure is considered the most representative figure for the entire mass of data. The most commonly used averages are the mean, the median, and the mode.

The most common measure of central tendency is the mean, also known as the arithmetic average. It is defined as the value obtained by dividing the total of the values of the various given items in a series by the total number of items. We can work it out as under:

Mean (X̄) = ΣXi / n

where X̄ = the mean, Σ = the symbol for summation, Xi = the value of the ith item (i = 1, 2, ..., n), and n = the total number of items.
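As a quick illustration, the three averages discussed in this section can be computed with Python's standard library; the data values below are purely hypothetical.

# Computing the three common averages with Python's standard library.
from statistics import mean, median, mode

scores = [60, 74, 80, 88, 88, 95, 100]   # hypothetical series, n = 7

print(mean(scores))    # arithmetic average: sum of the items / n
print(median(scores))  # value of the middle (4th) item of the ordered series
print(mode(scores))    # most frequently occurring value -> 88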
The median is the value of the middle item of a series when it is arranged in ascending or descending order of magnitude. It divides the series into two halves: in one half all items are less than the median, whereas in the other half all items have values higher than the median. If the values of the items arranged in ascending order are 60, 74, 80, 90, 95 and 100, then the median is the mean of the two middle items (the 3rd and 4th values, 80 and 90), i.e., 85. We can also write thus:

Median (M) = value of the ((n + 1)/2)th item

The median is a positional average and is used only in the context of qualitative phenomena, for example in estimating intelligence, which are often encountered in sociological fields. The median is not useful where items need to be assigned relative importance and weights. It is not frequently used in sampling statistics.

The mode is the most commonly or frequently occurring value in a series. The mode in a distribution is that item around which there is maximum concentration. In general, the mode is the size of the item which has the maximum frequency, but at times such an item may not be the mode on account of the effect of the frequencies of the neighbouring items. Like the median, the mode is a positional average and is not affected by the values of extreme items. It is, therefore, useful in all situations where we want to eliminate the effect of extreme variations. The mode is particularly useful in the study of popular sizes. For example, a manufacturer of shoes is usually interested in finding out the size most in demand, so that he may manufacture a larger quantity of that size. In other words, he wants the modal size to be determined, for the median or mean size would not serve his purpose. But there are certain limitations of the mode as well. For example, it is not amenable to algebraic treatment
and sometimes remains indeterminate when we have two or more modal values in a series. It is considered unsuitable in cases where we want to give relative importance to the items under consideration.

The geometric mean is also useful under certain conditions. It is defined as the nth root of the product of the values of the n items in a given series. Symbolically, we can put it thus:

Geometric mean (GM) = (X1 · X2 · X3 · ... · Xn)^(1/n)

where n = the number of items and Xi = the value of the ith item.

13.7.2 Measures of Dispersion

An average can represent a series only as well as a single figure can, and it cannot tell the whole story about any phenomenon under investigation. In particular, it fails to convey any information about the scatter of the values of the items of a variable in the series around the true value of the average.
Measures of dispersion are statistical devices used to measure this scatter. The range, the mean deviation, and the standard deviation are the important measures of dispersion.

(a) Range is the most basic measure of dispersion, and it is defined as the difference between the values of the extreme items of a series. Thus,

Range = highest value of an item in the series − lowest value of an item in the series

The benefit of the range is that it quickly gives an idea of variability; the disadvantage is that it is heavily influenced by fluctuations of sampling. Because it is based on only two values of the variable, its value is never stable. As a result, the range is mostly used as a rough measure of variability and is rarely used in serious research studies.

(b) Mean deviation is the average of the differences of the values of items from some average of the series. Such a difference is technically described as a deviation. When calculating mean deviation, we ignore the minus signs of the deviations while taking their total. As a result, the mean deviation (from the mean) is calculated as follows:

Mean deviation (δ) = Σ|Xi − X̄| / n

When the mean deviation is divided by the average used in finding the mean deviation itself, the resulting quantity is described as the coefficient of mean deviation. The coefficient of mean deviation is a relative measure of dispersion and is comparable to the similar measure of other series. Mean deviation and its coefficient are used in statistical studies for judging variability, and thereby make the study of the central tendency of a series more precise by throwing light on the typicalness of an average. It is a better measure of variability than the range, as it takes into consideration the values of all the items of a series. Even then, it is not a frequently used measure, as it is not amenable to algebraic processes.
(c) Standard deviation is the most widely used measure of dispersion of a series and is commonly denoted by the symbol σ (sigma). The standard deviation is defined as the square root of the average of the squares of deviations, when such deviations for the values of individual items in a series are obtained from the arithmetic average. It is worked out as under:

Standard deviation (σ) = √( Σ(Xi − X̄)² / n )

When we divide the standard deviation by the arithmetic average of the series, the resulting quantity is known as the coefficient of standard deviation, which happens to be a relative measure and is often used for comparison with the similar measure of other series. When this coefficient of standard deviation is multiplied by 100, the resulting figure is known as the coefficient of variation. Sometimes we work out the square of the standard deviation, known as the variance, which is frequently used in the context of the analysis of variance.

The standard deviation (along with several related measures like the variance, the coefficient of variation, etc.) is used mostly in research studies and is regarded as a very satisfactory measure of dispersion in a series. It is amenable to mathematical manipulation because the algebraic signs are not ignored in its calculation (as they are in the case of mean deviation). It is less affected by fluctuations of sampling. These advantages make the standard deviation and its coefficient a very popular measure of the scatteredness of a series. It is popularly used in the context of estimation and the testing of hypotheses.
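The dispersion measures described above can be sketched in a few lines of Python; the series is hypothetical, and the population (rather than sample) forms of the standard deviation and variance are used, matching the formulas given in this section.

# Range, mean deviation, standard deviation, variance and the
# coefficient of variation for a small illustrative series.
from statistics import mean, pstdev, pvariance

x = [12, 15, 11, 18, 14]
xbar = mean(x)                                      # 14.0

rng = max(x) - min(x)                               # range = 18 - 11 = 7
mean_dev = sum(abs(v - xbar) for v in x) / len(x)   # signs of deviations ignored
sd = pstdev(x)                                      # population standard deviation
var = pvariance(x)                                  # square of the standard deviation
coeff_var = sd / xbar * 100                         # coefficient of variation (%)

print(rng, mean_dev, sd, var, coeff_var)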
13.7.3 Measures of Asymmetry (Skewness)

When the distribution of items in a series happens to be perfectly symmetrical, we have the following type of curve for the distribution:

Fig 13.9 Curve showing no skewness

Such a curve is technically described as a normal curve, and the corresponding distribution as a normal distribution. Such a curve is a perfectly bell-shaped curve, in which case the values of X̄, M and Z are all the same and skewness is altogether absent. But if the curve is distorted (whether on the right side or on the left side), we have an asymmetrical distribution, which indicates that there is skewness. If the curve is distorted on the right side, we have positive skewness, but when the curve is distorted towards the left, we have negative skewness, as shown here under:

Fig 13.10 Curve showing positive skewness (left), curve showing negative skewness (right)

Skewness is, thus, a measure of asymmetry and shows the manner in which the items are clustered around the average. In a symmetrical distribution, the items show a perfect balance on either side of the mode, but in a skewed distribution the balance is thrown to one side. The amount by which the balance exceeds on one side measures the skewness of the series. The difference between the mean, the median or the mode provides an easy way of expressing skewness in a series. In the case of positive skewness we have Z < M < X̄, and in the case of negative skewness we have X̄ < M < Z. Usually we measure skewness in this way:

Skewness = X̄ − Z, and its coefficient (j) is worked out as j = 3(X̄ − M)/σ

where X̄ = mean, M = median, Z = mode, and σ = standard deviation.
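A minimal sketch of this measure in Python, using the mean-median form of the coefficient given above; the data are hypothetical, with one large value included to produce positive skewness.

# Karl Pearson's coefficient of skewness for a small series,
# using the form j = 3 * (mean - median) / standard deviation.
from statistics import mean, median, pstdev

x = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]   # one large value drags the mean right

j = 3 * (mean(x) - median(x)) / pstdev(x)
print(j)   # positive value -> positively skewed (tail on the right)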
13.7.4 Measures of Relationship

So far, we have looked at statistical measures used in the context of a univariate population, that is, one in which only one variable is measured. If we have data on two variables, we are said to have a bivariate population, and if we have data on more than two variables, a multivariate population. A bivariate population is formed when, for each measurement of the first variable X, there is a corresponding value of a second variable Y. Furthermore, we may have a corresponding value of a third variable Z, or a fourth variable W, and so on; the resulting sets of values constitute a multivariate population. When dealing with bivariate or multivariate populations, we frequently want to know how the two or more variables in the data relate to one another. For example, we may want to know whether the number of hours students devote to their studies is related to their family's income, age, sex, or some other factor. There are several methods for determining the relationship between variables, but none of them can guarantee that a correlation is a cause-and-effect relationship. Thus, in bivariate or multivariate populations, we must answer two types of questions:

(i) Does there exist association or correlation between the two (or more) variables? If yes, of what degree?

(ii) Is there any cause-and-effect relationship between the two variables in the case of the bivariate population, or between one variable on one side and two or more variables on the other side in the case of the multivariate population? If yes, of what degree and in which direction?

The first question is answered by the use of the correlation technique and the second question by the technique of regression. There are several methods of applying the two techniques, the important ones being as under:

In the case of a bivariate population, correlation can be studied through (a) cross tabulation, (b) Charles Spearman's coefficient of correlation, and (c) Karl Pearson's coefficient of correlation, whereas the cause-and-effect relationship can be studied through simple regression equations.
In the case of a multivariate population, correlation can be studied through (a) the coefficient of multiple correlation and (b) the coefficient of partial correlation, whereas the cause-and-effect relationship can be studied through multiple regression equations.

The cross tabulation approach is especially useful when the data are in nominal form. Under it, we classify each variable into two or more categories and then cross-classify the variables in these subcategories. Then we look for interactions between them, which may be symmetrical, reciprocal or asymmetrical. A symmetrical relationship is one in which the two variables vary together, but we assume that neither variable is due to the other. A reciprocal relationship exists when the two variables mutually influence or reinforce each other. An asymmetrical relationship is said to exist if one variable (the independent variable) is responsible for another variable (the dependent variable). The cross-classification procedure begins with a two-way table which indicates whether there is or is not an interrelationship between the variables. This sort of analysis can be further elaborated by introducing a third factor into the association through cross-classifying the three variables. By doing so we may find a conditional relationship, in which factor X appears to affect factor Y only when factor Z is held constant. The correlation, if any, found through this approach is not considered a very powerful form of statistical correlation, and accordingly we use other methods when the data happen to be either ordinal or interval or ratio data.

Charles Spearman's coefficient of correlation (or rank correlation) is the technique of determining the degree of correlation between two variables in the case of ordinal data, where ranks are given to the different values of the variables. The main objective of this coefficient is to determine the extent to which the two sets of ranking are similar or dissimilar. This coefficient is determined as under:
Spearman's coefficient of correlation (rs) = 1 − (6 Σdi²) / (n(n² − 1))

where di = the difference between the ranks of the ith pair of the two variables, and n = the number of pairs of observations.
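The following Python sketch computes the coefficient directly from this formula. It assumes that all values within each series are distinct (no tied ranks); the judges' marks are hypothetical.

# Spearman's rank correlation from the formula
# r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), assuming no tied ranks.
def ranks(series):
    # map each value to its low-to-high rank (1 = smallest)
    return {v: i + 1 for i, v in enumerate(sorted(series))}

def spearman_rank_corr(x, y):
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_sq = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

# Hypothetical marks given by two judges to five candidates
judge1 = [86, 60, 72, 48, 94]
judge2 = [79, 51, 63, 55, 88]
print(spearman_rank_corr(judge1, judge2))   # close to +1 -> similar rankings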
Simple regression analysis

Regression is the determination of a statistical relationship between two or more variables. In simple regression, we have only two variables: one variable (defined as independent) is the cause of the behaviour of the other (defined as the dependent variable). Regression can only interpret what exists physically; that is, there must be a physical way in which the independent variable X can affect the dependent variable Y. The basic relationship between X and Y is given by

Ŷ = a + bX

where the symbol Ŷ denotes the estimated value of Y for a given value of X. This equation is known as the regression equation of Y on X (it also represents the regression line of Y on X when drawn on a graph), which means that each unit change in X produces a change of b in Y, b being positive for a direct and negative for an inverse relationship. The method generally used to find the 'best' fit that a straight line of this kind can give is the least-squares method. To use it efficiently, we first determine

b = (ΣXiYi − n X̄ Ȳ) / (ΣXi² − n X̄²) and a = Ȳ − b X̄
These measures define a and b, which will give the best possible fit through the original X and Y points, and the value of r can then be worked out as under:

r = b · √( (ΣXi² − n X̄²) / (ΣYi² − n Ȳ²) )

Thus, regression analysis is a statistical method for formulating a mathematical model depicting the relationship amongst variables, which can be used to predict the values of the dependent variable, given the values of the independent variable.

Multiple correlation and regression

When there are two or more independent variables, the analysis of the relationship is known as multiple correlation, and the equation describing such a relationship is the multiple regression equation. We explain multiple correlation and regression here taking only two independent variables and one dependent variable (convenient computer programs exist for dealing with a great number of variables). In this situation the results are interpreted as shown below. The multiple regression equation assumes the form

Y = a + b1X1 + b2X2

where X1 and X2 are the two independent variables and Y is the dependent variable, and the constants a, b1 and b2 can be found by solving the following three normal equations:

ΣYi = n·a + b1·ΣX1i + b2·ΣX2i
ΣX1iYi = a·ΣX1i + b1·ΣX1i² + b2·ΣX1iX2i
ΣX2iYi = a·ΣX2i + b1·ΣX1iX2i + b2·ΣX2i²
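These normal equations can be solved numerically. The sketch below sets them up as a linear system and solves for a, b1 and b2 with numpy; the data values are hypothetical.

# Solving the three normal equations for Y = a + b1*X1 + b2*X2 with numpy.
import numpy as np

X1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0])
X2 = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
Y  = np.array([5.0, 9.0, 10.0, 14.0, 17.0])

n = len(Y)
# Coefficient matrix and right-hand side of the normal equations
A = np.array([[n,         X1.sum(),       X2.sum()],
              [X1.sum(),  (X1**2).sum(),  (X1*X2).sum()],
              [X2.sum(),  (X1*X2).sum(),  (X2**2).sum()]])
rhs = np.array([Y.sum(), (X1*Y).sum(), (X2*Y).sum()])

a, b1, b2 = np.linalg.solve(A, rhs)
print(a, b1, b2)   # fitted constants of the multiple regression equation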
13.7.5 Other measures

1. Index numbers:

When two or more series are expressed in the same units, statistical averages can be used to compare them. However, when the units in which two or more series are expressed differ, statistical averages cannot be used to compare them. In such cases, we must rely on some form of relative measurement, which consists in reducing the figures to a common base: converting the series into a series of index numbers. This means expressing the given figures as percentages of some specific figure on a given date. An index number may thus be defined as a number which is used to measure the level of a given phenomenon as compared with the level of the same phenomenon at some standard date. An index number is really a specialised kind of average, with weights, designed to study changes in the effect of factors that are difficult to measure directly. However, it is important to remember that index numbers measure only relative changes.

Index numbers can be used to measure and compare changes in various economic and social phenomena, and different indices are used for different purposes. Specific commodity indices serve as a measure of change in the phenomenon of that commodity only. The cost of living of various classes of people can be measured using index numbers. In the economic sphere, index numbers are often referred to as economic barometers, because they measure economic phenomena in all their aspects, either directly by measuring the same phenomenon or indirectly by measuring something else that reflects the main phenomenon. However, index numbers have their own limitations, of which the researcher must always be aware. For example, index numbers are only approximate indicators that provide a rough idea of changes but not an exact picture. While there is always the possibility of error in the construction of an index number, this does not negate its utility, as index numbers can still indicate the trend of the phenomenon being measured. To avoid erroneous conclusions, index numbers created for one purpose should not be used for another purpose, or for the same purpose in different places.

2. Time series analysis:

In the context of economic and business research, we frequently obtain data relating to a specific phenomenon over a specific time period. This type of data is referred to as a 'time series'. A time series can be defined as a set of successive observations of a given phenomenon over a period of time. Typically, such series are the result of the effects of one or more of the following factors:

(i) A secular trend, also known as a long-term trend, which shows the direction of a series over a long period of time. The effect of trend (whether it is a growth or a decline factor) is gradual, but it persists more or less consistently over the entire time period under consideration. Secular trend is sometimes referred to simply as trend (or T).

(ii) Short-term oscillations, or changes that occur over a short period of time only, which can be caused by the following factors:
(a) Cyclical fluctuations (or C) are long-term movements that represent consistently recurring rises and declines in an activity as a result of business cycles.

(b) Seasonal fluctuations (or S) are short-term fluctuations that occur in a regular and periodic manner, caused by seasonal changes. Typically, these fluctuations involve patterns of change within a year that tend to repeat themselves from year to year. Cyclical and seasonal fluctuations taken together constitute short-period regular fluctuations.

(c) Irregular fluctuations (or I), also known as random fluctuations, are changes that occur in a completely unpredictable fashion.

All of the aforementioned factors are referred to as the components of a time series, and when we analyse a time series, we try to isolate and measure the effects of these various types of factors on the series. To study the effect of one type of factor, the other types of factors are eliminated from the series, so that the effects of only one type of factor remain in the given series.

We usually have two models for analysing time series: (1) the multiplicative model and (2) the additive model. The multiplicative model assumes that the various components interact in a multiplicative manner to produce the given values of the overall time series, and can be written as

Y = T × C × S × I

where Y = observed time series values, T = trend, C = cyclical fluctuations, S = seasonal fluctuations, and I = irregular fluctuations. The additive model considers the sum of the various components resulting in the given values of the overall time series, and can be written as

Y = T + C + S + I

There are a variety of methods for isolating trend from a given series, including the freehand method, the semi-average method, the method of moving averages, and the method of least squares, as well as methods for measuring cyclical and seasonal variations; whatever variations remain are referred to as random or irregular fluctuations.

Time series analysis is used to better understand the dynamic conditions that a business firm must face in order to achieve its short-term and long-term objectives. Past trends can be used to assess the success or failure of management policies implemented in the past. Future patterns can be predicted on the basis of past trends, and policies can be formulated accordingly. Once the effects of trend have been eliminated, we can properly study the effects of factors causing changes over a short period of time. By studying cyclical variations, we can keep the impact of cyclical changes in mind as we formulate various policies, to ensure that they are as realistic as possible. We will be able to
make better decisions about inventory, production, purchasing, and sales policies if we have a better understanding of seasonal variations. As a result, time series analysis is important in both long-term and short-term forecasting, and it is regarded as a very powerful tool in the hands of business analysts and researchers.

13.8 SUMMARY

A sample design is a definite plan for obtaining a sample from a given population. It refers to the technique or the procedure the researcher would adopt in selecting items for the sample.

The stages of data analysis are as follows: editing, coding, data entry, data analysis, and data interpretation.

Pre- and post-coding and categorisation of a data set are guided by four rules that must be followed. A single variable's categories should be: appropriate to the research problem and purpose; exhaustive; mutually exclusive; and derived from one classification dimension.

Exploratory data analysis is both a data analysis perspective and a set of techniques. Exploratory data analysis is the first step in the search for evidence; without it, confirmatory data analysis will have nothing to evaluate.

Cross-tabulation is a statistical model for the mainframe that follows a similar pattern. It assists you in making informed decisions about your research by identifying patterns, trends, and the relationships between the parameters of your study.

13.9 KEYWORDS

Data Integrity: Data integrity refers to the notion that the data file actually contains the information that the researcher promised the decision maker he or she would obtain.

Coding: The process of assigning a numerical score or other character symbol to previously edited data.

Codes: Rules for interpreting, classifying, and recording data in the coding process; also, the actual numerical or other character symbols assigned to raw data.

Record: A record is a collection of fields that are related to one another.

Data file: A data file is a collection of records that are related to one another and that together form a data set.
13.10 LEARNING ACTIVITY

1. A researcher believes that married men will push the grocery cart when grocery shopping with their wives. How would the hypothesis be tested? What tests of difference are appropriate in the following situations?

13.11 UNIT END QUESTIONS

A. Descriptive Questions

Short Questions

1. Define sample design.
2. What do you mean by measures of central tendency?
3. Write short notes on measures of dispersion.
4. Explain cross-tabulation.
5. What is parsing?

Long Questions

1. Identify the stages of data analysis.
2. Infer about cross-tabulation with an example.
3. How to feed the data and perform the analysis? Explain.
4. Discuss statistics in research.
5. Explain exploratory data analysis.

B. Multiple Choice Questions

1. _____ refers to the notion that the data file actually contains the information that the researcher promised the decision maker he or she would obtain.

a. Data editing
b. Data integrity
c. Data coding
d. All of these

2. _____ are rules for interpreting, classifying, and recording data in the coding process.

a. Principles
b. Codes
c. Code rules
d. Data file

3. _____ is the process of converting information gathered through secondary or primary methods into a format that can be viewed and manipulated.

a. Database
b. Data editing
c. Data entry
d. Data process

4. _____ is a statistical model for the mainframe that follows a similar pattern.

a. Spreadsheets
b. Correlation
c. Cross-tabulation
d. Histograms

5. ______ is a collection of data that has been organised to allow for computerised retrieval of the information.

a. Database
b. Datafile
c. Warehouse
d. Datastore

Answers

1-b, 2-b, 3-c, 4-c, 5-a

13.12 REFERENCES

Reference books

R1, Business Research Methods – Alan Bryman & Emma Bell, Oxford University Press
R2, Research Methodology – C.R. Kothari
R3, Statistics for Managers Using Microsoft Excel – Levine, Stephan, Krehbiel, Berenson

Textbook references

T1, SPSS Explained, ISBN: 9780415274104, Publisher: Tata McGraw Hill
T2, Sancheti & Kapoor, Business Mathematics, Sultan Chand, New Delhi
UNIT 14: PARAMETRIC AND NON-PARAMETRIC TESTS

STRUCTURE

14.0 Learning Objectives
14.1 What is a Hypothesis?
14.2 Basic Concepts concerning Testing of Hypotheses
14.3 Procedure for Hypothesis Testing
14.4 Flow Diagram for Hypothesis Testing
14.5 Tests of Hypotheses
14.6 Parametric Tests
14.6.1 z-test
14.6.2 t-test
14.6.3 Chi-square test
14.6.4 F-test
14.7 Non-Parametric Tests
14.7.1 Kruskal-Wallis Test
14.7.2 Sign Test
14.7.3 Fisher-Irwin Test
14.7.4 McNemar Test
14.7.5 Wilcoxon Signed-Rank Test
14.7.6 Mann-Whitney U Test
14.8 Limitations of the Tests of Hypotheses
14.9 Summary
14.10 Keywords
14.11 Learning Activity
14.12 Unit End Questions
14.13 References

14.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:
Describe the basic concepts concerning the testing of hypotheses.
Explain the procedure for hypothesis testing.
Explain the different types of parametric tests.
List the limitations of the tests of hypotheses.
Describe non-parametric tests briefly.

14.1 WHAT IS A HYPOTHESIS?

In most cases, the hypothesis is considered the most important tool in research. Its main purpose is to suggest new experiments and observations. Indeed, many experiments are conducted with the explicit goal of testing hypotheses. Decision-makers frequently encounter situations in which they want to test hypotheses on the basis of available data and then take decisions based on the results. In social science, where direct knowledge of population parameter(s) is uncommon, hypothesis testing is a common strategy for determining whether sample data support a hypothesis sufficiently to allow generalisation. Hypothesis testing thus enables us to make probability statements about population parameter(s). A hypothesis may not be proved beyond a reasonable doubt, but it is accepted in practice if it has withstood critical scrutiny. Before we go over how hypotheses are tested using the various tests designed for the purpose, it is important to first define what a hypothesis is, along with the concepts that go with it, so that we can better understand the hypothesis-testing techniques.

A hypothesis may be defined as a proposition, or a set of propositions, set forth as an explanation for the occurrence of some specified group of phenomena, either asserted merely as a provisional conjecture to guide some investigation or accepted as highly probable in the light of established facts. Quite often a research hypothesis is a predictive statement, capable of being tested by scientific methods, that relates an independent variable to some dependent variable. For example, consider statements like the following ones:

"Students who receive counselling will show a greater increase in creativity than students not receiving counselling."

"The automobile A is performing as well as automobile B."

Hypothesis characteristics: A hypothesis must have the following characteristics:

(i) The hypothesis should be precise and clear. Inferences drawn on the basis of a hypothesis that is not clear and precise cannot be trusted.

(ii) The hypothesis must be capable of being tested. Many times, research programmes have become bogged down in a swamp of untestable hypotheses. The researcher may conduct some preliminary study in order to make the hypothesis testable. A hypothesis is testable if it can lead to other deductions, which can then be confirmed or refuted by observation.
(iii) If the hypothesis is a relational hypothesis, it should state the relationship between the variables.

(iv) Hypotheses should be specific and limited in scope. A researcher must keep in mind that narrower hypotheses are more testable, and he should develop such hypotheses.

(v) The hypothesis should be stated in as simple a manner as possible, so that it is easily understood by all parties involved. However, it is important to remember that the simplicity of a hypothesis has nothing to do with its importance.

(vi) The hypothesis must be consistent with the majority of known facts, i.e., with a large body of established facts. In other words, it should be the one that judges believe is the most likely.

(vii) The hypothesis should be testable within a reasonable amount of time. Even the best hypothesis should not be used if it cannot be tested in a reasonable amount of time, because one cannot spend a lifetime collecting data to test it.

(viii) The hypothesis must account for the facts that led to the need for an explanation. This means that, by combining the hypothesis with other well-known and accepted generalisations, the original problem condition should be deducible. As a result, a hypothesis must actually explain what it claims to explain, and it must be supported by empirical evidence.

14.2 BASIC CONCEPTS CONCERNING TESTING OF HYPOTHESES

It is necessary to explain some basic concepts in the context of hypothesis testing.

(a) Null hypothesis and alternative hypothesis: In the context of statistical analysis, we often talk about the null hypothesis and the alternative hypothesis. If we are to compare method A with method B regarding its superiority, and if we proceed on the assumption that both methods are equally good, then this assumption is termed the null hypothesis. As against this, if we state that method A is superior or that method B is inferior, we are stating what is termed the alternative hypothesis. The null hypothesis is generally symbolised as H0 and the alternative hypothesis as Ha. Suppose we want to test the hypothesis that the population mean (μ) is equal to the hypothesised mean μH0 = 100. Then we would say that the null hypothesis is that the population mean is equal to the hypothesised mean 100, and symbolically we can express it as:

H0: μ = μH0 = 100

If our sample results do not support this null hypothesis, we should conclude that something else is true. What we conclude on rejecting the null hypothesis is known as the alternative hypothesis. In other words, the set of alternatives to the null hypothesis is referred to as the alternative hypothesis. If we accept H0, then we are rejecting Ha, and if we reject H0, then we are accepting Ha.
For H0: μ = μH0 = 100, we may consider three possible alternative hypotheses as follows:

Ha: μ ≠ μH0 (the population mean is not equal to 100, i.e., it may be more or less than 100);
Ha: μ > μH0 (the population mean is greater than 100);
Ha: μ < μH0 (the population mean is less than 100).

The null hypothesis and the alternative hypothesis are chosen before the sample is drawn (the researcher must avoid the error of deriving hypotheses from the data that he collects and then testing the hypotheses on the same data). In the choice of the null hypothesis, the following considerations are usually kept in view:

(a) The alternative hypothesis is usually the one which one wishes to prove, and the null hypothesis is the one which one wishes to disprove. Thus, a null hypothesis represents the hypothesis we are trying to reject, and the alternative hypothesis represents all other possibilities.

(b) If the rejection of a certain hypothesis when it is actually true involves great risk, it is taken as the null hypothesis, because then the probability of rejecting it when it is true is α (the level of significance), which is chosen very small.

(c) The null hypothesis should always be a specific hypothesis, i.e., it should not state a value only approximately.

Generally, in hypothesis testing we proceed on the basis of the null hypothesis, keeping the alternative hypothesis in view. Why so? The answer is that, on the assumption that the null hypothesis is true, one can assign probabilities to different possible sample results; this cannot be done if we proceed with the alternative hypothesis. Hence the use of the null hypothesis (at times also known as the statistical hypothesis) is quite frequent.

(b) The level of significance: This is a very important concept in the context of hypothesis testing. It is always some percentage (usually 5%), which should be chosen with great care, thought and reason. If we take the significance level at 5 per cent, this implies that H0 will be rejected when the sampling result (i.e., observed evidence) has a probability of less than 0.05 of occurring if H0 is true. In other words, the 5 per cent level of significance means that the researcher is willing to take as much as a 5 per cent risk of rejecting the null hypothesis when it (H0) happens to be true. Thus, the significance level is the maximum value of the probability of rejecting H0 when it is true, and it is usually determined in advance, before testing the hypothesis.
(c) Decision rule or test of hypothesis: Given a hypothesis H0 and an alternative hypothesis Ha, we make a rule, known as the decision rule, according to which we accept H0 (i.e., reject Ha) or reject H0 (i.e., accept Ha). For instance, if H0 is that a certain lot is good (there are very few defective items in it) against Ha that the lot is not good (there are too many defective items in it), then we must decide the number of items to be tested and the criterion for accepting or rejecting the hypothesis. We might test 10 items in the lot and plan our decision by saying that if there are none or only 1 defective item among the 10, we will accept H0; otherwise we will reject H0 (or accept Ha). This sort of basis is known as the decision rule.

(d) Type I and Type II errors: In the context of testing hypotheses, there are basically two types of errors we can make. We may reject H0 when H0 is true, and we may accept H0 when in fact H0 is not true. The former is known as a Type I error and the latter as a Type II error. In other words, a Type I error means rejecting a hypothesis which should have been accepted, and a Type II error means accepting a hypothesis which should have been rejected. A Type I error is denoted by α (alpha), known as the α error, also called the level of significance of the test; a Type II error is denoted by β (beta), known as the β error. In tabular form the said two errors can be presented as follows:

             H0 (true)                H0 (false)
Accept H0    Correct decision         Type II error (β error)
Reject H0    Type I error (α error)   Correct decision

The probability of a Type I error is usually determined in advance and is understood as the level of significance of testing the hypothesis. If the Type I error is fixed at 5 per cent, it means that there are about 5 chances in 100 that we will reject H0 when H0 is true. We can control a Type I error just by fixing it at a lower level. For instance, if we fix it at 1 per cent, we are saying that the maximum probability of committing a Type I error is 0.01. But with a fixed sample size n, when we try to reduce a Type I error, the probability of committing a Type II error increases. Both types of errors cannot be reduced simultaneously. There is a trade-off between the two types of errors, which means that the probability of making one type of error can only be reduced if we are willing to increase the probability of making the other type of error. To deal with this trade-off in business situations, decision-makers decide the appropriate level of Type I error by examining the costs or penalties attached to both types of errors. If a Type I error involves the time and trouble of reworking a batch of
chemicals that should have been accepted, whereas a Type II error means taking a chance that an entire group of users of this chemical compound will be poisoned, then in such a situation one should prefer a Type I error to a Type II error. As a result, one must set a very high level for the Type I error in one's testing technique for the given hypothesis. Hence, in the testing of hypotheses, one must make all possible effort to strike an adequate balance between Type I and Type II errors.

(e) Two-tailed and one-tailed tests: In the context of hypothesis testing, these two terms are quite important and must be clearly understood. A two-tailed test rejects the null hypothesis if, say, the sample mean is significantly higher or lower than the hypothesised value of the population mean. Such a test is appropriate when the null hypothesis is some specified value and the alternative hypothesis is a value not equal to the specified value of the null hypothesis. Symbolically, the two-tailed test is appropriate when we have

H0: μ = μH0 and Ha: μ ≠ μH0

which may mean μ > μH0 or μ < μH0.
Mathematically we can state (at the 5 per cent level of significance):

Acceptance region A: |Z| ≤ 1.96
Rejection region R: |Z| > 1.96

If the significance level is 5 per cent and the two-tailed test is to be applied, the probability of the rejection region will be 0.05 (equally split on both tails of the curve as 0.025) and that of the acceptance region will be 0.95. If we take μH0 = 100 and our sample mean deviates significantly from 100 in either direction, then we shall reject the null hypothesis; but if the sample mean does not deviate significantly from μH0, in that case we shall accept the null hypothesis.

But there are situations when only a one-tailed test is considered appropriate. A one-tailed test would be used when we are to test, say, whether the population mean is either lower than or higher than some hypothesised value. For instance, if our H0: μ = μH0 and Ha: μ < μH0, then we are
interested in what is known as a left-tailed test, wherein there is one rejection region only, on the left tail of the curve.

14.3 PROCEDURE FOR HYPOTHESIS TESTING

To test a hypothesis means to tell (on the basis of the data the researcher has collected) whether or not the hypothesis seems to be valid. In hypothesis testing the main question is: whether to accept the null hypothesis or not to accept the null hypothesis? The procedure for hypothesis testing refers to all those steps that we undertake in making a choice between the two actions, i.e., rejection and acceptance of a null hypothesis. The various steps involved in hypothesis testing are stated below:

(i) Making a formal statement: This step consists in making a formal statement of the null hypothesis (H0) and also of the alternative hypothesis (Ha). This means that the hypotheses should be clearly stated, considering the nature of the research problem. For instance, if Mr Mohan of the Civil Engineering Department wants to test the load-bearing capacity of an old bridge, which must be more than 10 tons, he can state his hypotheses as under:

H0: μ = 10 tons
Ha: μ > 10 tons
Take another example. The average score in an aptitude test administered at the national level is 80. To evaluate a state's education system, the average score of 100 of the state's students selected on a random basis was 75. The state wants to know if there is a significant difference between the local scores and the national scores. In such a situation the hypotheses may be stated as under:

H0: μ = 80
Ha: μ ≠ 80

The formulation of hypotheses is an important step which must be accomplished with due care, in accordance with the object and nature of the problem under consideration. It also indicates whether we should use a one-tailed test or a two-tailed test. If Ha is of the type "greater than" (or of the type "lesser than"), we use a one-tailed test, but when Ha is of the type "whether greater or smaller", we use a two-tailed test.

(ii) Selecting a significance level: The hypotheses are tested on a pre-determined level of significance, which should accordingly be specified. Generally, in practice, either the 5% level or the 1% level is adopted for the purpose. The factors that affect the level of significance are: (a) the magnitude of the difference between sample means; (b) the size of the samples; (c) the variability of measurements within samples; and (d) whether the hypothesis is directional or non-directional (a directional hypothesis is one which predicts the direction of the difference between, say, means). In brief, the level of significance must be adequate in the context of the purpose and nature of the enquiry.

(iii) Deciding the distribution to use: After deciding the level of significance, the next step in hypothesis testing is to determine the appropriate sampling distribution. The choice generally remains between the normal distribution and the t-distribution. The rules for selecting the correct distribution are similar to those which we have stated earlier in the context of estimation.

(iv) Selecting a random sample and computing an appropriate value: The next step is to select a random sample (or samples) and compute an appropriate value from the sample data concerning the test statistic, using the relevant distribution. In other words, draw a sample to furnish empirical data.

(v) Calculation of the probability: One then calculates the probability that the sample result would diverge as widely as it has from expectations, if the null hypothesis were in fact true.

(vi) Comparing the probability:
Yet another step consists in comparing the probability thus calculated with the specified value of α, the significance level. If the calculated probability is equal to or smaller than α in the case of a one-tailed test (and α/2 in the case of a two-tailed test), then reject the null hypothesis (i.e., accept the alternative hypothesis); but if the calculated probability is greater, then accept the null hypothesis. In case we reject H0, we run the risk (of at most the level of significance) of committing a Type I error, but if we accept H0, we run some risk (the size of which cannot be specified as long as H0 happens to be vague rather than specific) of committing a Type II error.

14.4 FLOW DIAGRAM FOR HYPOTHESIS TESTING

The general procedure for hypothesis testing stated above can also be depicted in the form of a flow chart.

Fig 14.1 Flow diagram for hypothesis testing
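As a sketch of how this procedure looks in practice, the following Python fragment applies the steps to the aptitude-test example from the previous section (national mean 80, a random sample of n = 100 students, sample mean 75). The population standard deviation σ = 20 is an assumed figure, introduced only so that a test statistic can be computed; it is not given in the text.

# Hypothesis-testing procedure: H0: mu = 80 vs Ha: mu != 80 (two-tailed).
# sigma = 20 is an assumed population standard deviation.
from math import erf, sqrt

mu_h0, xbar, sigma, n, alpha = 80, 75, 20, 100, 0.05

z = (xbar - mu_h0) / (sigma / sqrt(n))            # compute the test statistic
# two-tailed probability from the standard normal distribution
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(z, p)                                       # z = -2.5, p ~ 0.0124

if p <= alpha:
    print("Reject H0: the state mean differs from the national mean")
else:
    print("Accept H0")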
14.5 TESTS OF HYPOTHESES

As stated above, hypothesis testing determines the validity of an assumption (technically described as the null hypothesis), with a view to choosing between two conflicting hypotheses about the value of a population parameter. Hypothesis testing helps us decide, on the basis of sample data, whether a hypothesis about the population is likely to be true or false. Statisticians have developed several tests of hypotheses (also known as tests of significance) for this purpose, which can be classified as: (a) parametric tests or standard tests of hypotheses; and (b) non-parametric tests or distribution-free tests of hypotheses.

Parametric tests usually assume certain properties of the parent population from which we draw samples. Assumptions such as observations coming from a normal population, the sample size being large, and assumptions about population parameters like the mean, variance, etc., must hold good before parametric tests can be used. But there are situations when the researcher cannot, or does not want to, make such assumptions. In such situations we use statistical methods for testing hypotheses which are called non-parametric tests, because such tests do not depend on any assumption about the parameters of the parent population. Besides, most non-parametric tests assume only nominal or ordinal data, whereas parametric tests require measurement equivalent to at least an interval scale. As a result, non-parametric tests need more observations than parametric tests to achieve the same size of Type I and Type II errors.

14.6 PARAMETRIC TESTS

The important parametric tests are: (1) the z-test; (2) the t-test; (3) the chi-square test; and (4) the F-test. All these tests are based on the assumption of normality, i.e., the source of the data is considered to be normally distributed. In some cases the population may not be normally distributed, yet the tests will be applicable on account of the fact that we mostly deal with samples, and the sampling distributions closely approach normal distributions.

14.6.1 z-test

The z-test is a statistical test that is based on the normal probability distribution and is used to judge the significance of a variety of statistical measures, most notably the mean. The relevant test statistic, z, is calculated and compared with its probable value (which can be read from the table showing the area under the normal curve) at a specified level of significance, for judging the significance of the measure in question. This is the most frequently used test in research studies. It is used even when a binomial or t-
distribution is applicable, on the presumption that such a distribution tends to approximate the normal distribution as 'n' becomes larger. The z-test is commonly used to compare the mean of a sample with some hypothesised mean for the population, in the case of a large sample or when the population variance is known. The z-test can also be used to judge the significance of the difference between the means of two independent samples, in the case of large samples or when the population variance is known. When n is large, the z-test is also used to compare a sample proportion with a theoretical value of the population proportion, or to judge the difference in proportions between two independent samples. This test can also be used to judge the significance of measures like the median, the mode, and the coefficient of correlation, among others.

14.6.2 t-test

The t-test is based on the t-distribution and is considered an appropriate test for judging the significance of a sample mean, or of the difference between the means of two samples, in the case of small sample(s) when the population variance is unknown (in which case we use the sample variance as an estimate of the population variance). When two samples are related, we use the paired t-test (also known as the difference test) to judge the significance of the mean of the difference between the two related samples. It can also be used to judge the significance of simple and partial correlation coefficients. The relevant test statistic, t, is calculated from the sample data and then compared with its probable value based on the t-distribution (read from the table that gives probable values of t at different levels of significance for different degrees of freedom) at a specified level of significance for the relevant degrees of freedom, for accepting or rejecting the null hypothesis. It should be noted that the t-test applies only when the population variance is unknown and the sample size is small.

14.6.3 Chi-square test

The chi-square test is based on the chi-square distribution, and as a parametric test it is used for comparing a sample variance with a theoretical population variance.

14.6.4 F-test

The F-test compares the variances of two independent samples and is based on the F-distribution. This test is also used in conjunction with the analysis of variance (ANOVA) to judge the significance of more than two sample means at the same time. It is also used to judge the significance of multiple correlation coefficients. For accepting or rejecting the null hypothesis, the test statistic F is calculated and compared with its probable value (as given in the F-ratio tables for different degrees of freedom for the greater and the smaller variances, at a specified level of significance).
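As a sketch, scipy can carry out the small-sample t-test for two independent samples directly; the two sets of machine measurements below are hypothetical.

# A two-sample t-test for small independent samples with unknown
# population variance. The measurements are hypothetical.
from scipy import stats

machine_a = [21.5, 22.1, 21.9, 22.4, 21.7]
machine_b = [22.9, 23.1, 22.6, 23.4, 22.8]

t_stat, p_value = stats.ttest_ind(machine_a, machine_b)
print(t_stat, p_value)

if p_value <= 0.05:
    print("Reject H0: the two machines' mean outputs differ")
else:
    print("Accept H0: no significant difference")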
14.7 NON-PARAMETRIC TESTS

14.7.1 Kruskal-Wallis Test (or H test)

This test is conducted in a manner similar to the U test described later in this unit. It is used to test the null hypothesis that all 'k' independent random samples come from identical universes against the alternative hypothesis that the means of these universes are not equal. This test is analogous to one-way analysis of variance, but unlike the latter it does not require the assumption that the samples come from approximately normal populations or that the universes have the same standard deviation. In this test, as in the U test, the data are ranked together from low to high or from high to low, as if they constituted a single sample. The test statistic is H, which is calculated as follows:

H = [12 / (n(n + 1))] · Σ(Ri²/ni) − 3(n + 1)

where n = n1 + n2 + ... + nk is the total number of observations in the k samples, Ri = the sum of the ranks assigned to the observations of the ith sample, and ni = the number of observations in the ith sample.

14.7.2 Sign Test

The sign test is one of the easiest non-parametric tests. Its name comes from the fact that it is based on the direction of the plus or minus signs of observations in a sample, and not on their numerical magnitudes. The sign test may be one of the following two types: (a) the one-sample sign test; and (b) the two-sample sign test.

The one-sample sign test is a very simple non-parametric test, applicable when we sample a continuous symmetrical population, in which case the probability of getting a sample value less than the mean is 1/2, and the probability of getting a sample value greater than the mean is also 1/2. To test the null hypothesis μ = μH0 against an appropriate alternative, on the basis of a random sample of size 'n', we replace the value of each and every item of the sample with a plus (+) sign if it is greater than μH0, and with a minus (−) sign if it is less than μH0. If a value happens to be equal to μH0, we simply discard it. After doing this, we test the null hypothesis that these + and − signs are the values of a random variable having a binomial distribution with p = 1/2. For performing the one-sample sign test when the sample is small, we can use tables of binomial probabilities; when the sample happens to be large, we use the normal approximation to the binomial distribution.
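A minimal sketch of the one-sample sign test, using only the standard library to evaluate the binomial probabilities; the sample values and the hypothesised mean are hypothetical.

# One-sample sign test. Values equal to the hypothesised mean are
# discarded; the remaining signs are tested against a binomial
# distribution with p = 1/2. Data are hypothetical.
from math import comb

mu_h0 = 50.0
sample = [48, 52, 46, 44, 53, 47, 45, 49, 43, 51, 46, 50]

signs = [v for v in sample if v != mu_h0]        # drop ties with mu_h0
n = len(signs)                                   # 11 usable observations
plus = sum(1 for v in signs if v > mu_h0)        # 3 plus signs

k = min(plus, n - plus)
# two-tailed probability of a result at least this unbalanced
p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
print(min(p, 1.0))   # compare with the chosen level of significance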
14.7.3 Fisher-Irwin Test
The Fisher-Irwin test is a distribution-free test for judging whether there is a difference between two sets of dichotomous data. It is used, for example, to see whether two ostensibly different treatments actually differ in terms of the outcomes they produce. Assume that the management of a business unit has developed a new training programme that is now ready to be tested against the performance of the old training programme; the Fisher-Irwin test can then judge whether the proportions of successful outcomes under the two programmes differ significantly.

14.7.4 McNemar Test
When the data are nominal and relate to two related samples, the McNemar test is one of the most important non-parametric tests. It is particularly useful when measuring the same subjects before and after treatment. The experiment is set up in such a way that the subjects are initially divided into groups based on their favourable and unfavourable attitudes toward, say, a given system. After the treatment, the same subjects are asked to express their opinions on the system again, indicating whether they favour it or not.

14.7.5 Wilcoxon Signed-Rank Test
When we can determine both the direction and the magnitude of the difference between matched values in the context of two related samples (i.e., the case of matched pairs, such as a study where husband and wife are matched, where we compare the output of two similar machines, or where subjects are studied in a before-after experiment), we can use an important non-parametric test, the Wilcoxon signed-rank test. When using this test, we first find the differences (di) between each pair of values and rank the differences from smallest to largest, regardless of sign. The actual sign of each difference is then attached to its rank, and the test statistic T is the smaller of the two sums: the sum of the negative ranks and the sum of the positive ranks. We may encounter two types of tie situations while using this test. When the two values of some matched pair(s) are equal, i.e., the difference between the values is zero, we drop the pair(s) from our calculations. When two or more pairs have the same difference value, we assign ranks to them by averaging their rank positions. For example, if two pairs tie for ranks 5 and 6, we assign each of them the rank (5 + 6)/2 = 5.5, and the next largest difference is ranked 7.

14.7.6 Mann Whitney U Test
Among the rank sum tests, this is a very popular one. This test is used to determine whether two independent samples have been drawn from the same population. It uses more information than the sign or Fisher-Irwin tests. This test can be applied in a variety of situations and only requires that the populations being sampled be continuous; in practice, however, even a breach of this assumption has little effect on the outcome. To run this test, we first group the data together, treating them as if they all belonged to a single sample, and then rank them in increasing or decreasing order of magnitude. We usually use a low-to-high ranking system, in which we assign rank 1 to the lowest-valued item, rank 2 to the next higher-valued item, and so on. If there are any ties, we give each of the tied observations the mean of the ranks that they jointly occupy; if the sixth, seventh, and eighth values are all the same, we give each of them the rank (6 + 7 + 8)/3 = 7. After that, we calculate the sum of the ranks assigned to the values of the first sample (call it R1), as well as the sum of the ranks assigned to the values of the second sample (call it R2). Then we calculate the test statistic, U, a measure of the difference between the ranked observations of the two samples, as follows:

U = n1·n2 + n1(n1 + 1)/2 − R1

where n1 and n2 are the sizes of the first and second samples respectively.
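The four tests of this subsection can likewise be sketched in Python with SciPy. Every count and measurement below is an invented example, and the exact McNemar test is computed here as a binomial test on the discordant pairs rather than through a dedicated routine.

```python
# Hedged sketches of the Fisher-Irwin, McNemar, Wilcoxon signed-rank
# and Mann-Whitney U tests; all figures are illustrative assumptions.
from scipy import stats

# Fisher-Irwin (Fisher's exact) test on a 2x2 table of outcomes:
# rows = old / new training programme, columns = pass / fail (hypothetical)
odds_ratio, p_fisher = stats.fisher_exact([[9, 3], [4, 8]])

# Exact McNemar test on before/after opinions, via a binomial test on
# the discordant pairs (b = favourable->unfavourable, c = the reverse)
b, c = 5, 15
p_mcnemar = stats.binomtest(b, n=b + c, p=0.5).pvalue

# Wilcoxon signed-rank test on matched pairs; zero differences are
# dropped and tied differences mid-ranked, as described in the text
before = [72, 68, 75, 70, 74, 69, 71, 73]
after  = [75, 70, 74, 76, 78, 69, 74, 77]
t_stat, p_wilcoxon = stats.wilcoxon(before, after)

# Mann-Whitney U test on two independent samples
x = [14, 18, 16, 21, 15, 19]
y = [20, 25, 23, 22, 26, 24]
u_stat, p_u = stats.mannwhitneyu(x, y, alternative="two-sided")

print(f"Fisher exact p = {p_fisher:.3f}; McNemar p = {p_mcnemar:.3f}")
print(f"Wilcoxon T = {t_stat:.1f} (p = {p_wilcoxon:.3f}); "
      f"U = {u_stat:.1f} (p = {p_u:.3f})")
```

Note that scipy.stats.wilcoxon drops zero differences and averages the ranks of ties automatically, matching the hand procedure described above.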
14.8 LIMITATIONS OF THE TESTS OF HYPOTHESES
We have just gone over a few key tests that are frequently used to test hypotheses on which important decisions are based. However, these tests have several limitations of which a researcher should always be aware. The following are the significant ones:
(i) The tests should not be applied mechanically. It is important to remember that testing is not a substitute for decision-making; tests are merely useful aids to it. As a result, "proper interpretation of statistical evidence is critical to making intelligent decisions."
(ii) Tests do not explain why a difference exists, say, between the means of two samples. They simply indicate whether the difference is due to sampling fluctuations or to something else, but they do not tell us which other factor(s) cause the difference.
(iii) Because the results of significance tests are based on probabilities, they cannot be expressed with absolute certainty. When a test shows a statistically significant difference, it simply means that the difference is unlikely to be due to chance.
(iv) Statistical inferences based on significance tests cannot be said to be entirely correct evidence concerning the truth of a hypothesis. This is especially true for small samples, where the likelihood of drawing erroneous inferences is higher; for greater reliability, the sample size should be suitably increased.
All of these limitations suggest that, in statistical work, inference techniques (or tests) must be combined with adequate subject-matter knowledge and the ability to make sound decisions.
14.9 SUMMARY
A hypothesis may be defined as a proposition, or a set of propositions, set forth as an explanation for the occurrence of some specified group of phenomena, either asserted merely as a provisional conjecture to guide some investigation or accepted as highly probable in the light of established facts.
The alternative hypothesis is usually the one which one wishes to prove, and the null hypothesis is the one which one wishes to disprove. Thus, the null hypothesis represents the hypothesis we are trying to reject, and the alternative hypothesis represents all other possibilities.
If the rejection of a certain hypothesis when it is actually true involves great risk, it is taken as the null hypothesis, because then the probability of rejecting it when it is true is α (the level of significance), which is chosen to be very small.
A null hypothesis should always be a specific hypothesis, i.e., it should not state that a parameter is 'about' or 'approximately' equal to a certain value.
In the context of testing of hypotheses, there are basically two types of errors we can make: we may reject H0 when H0 is true, and we may accept H0 when in fact H0 is not true. The former is known as a Type I error and the latter as a Type II error.
The various steps involved in hypothesis testing are: making a formal statement of the hypotheses; selecting a significance level; deciding the distribution to use; selecting a random sample and computing an appropriate value; calculating the probability; and comparing the probability.

14.10 KEYWORDS
Hypothesis: A proposition, or a set of propositions, set forth as an explanation for the occurrence of some specified group of phenomena, either asserted merely as a provisional conjecture to guide some investigation or accepted as highly probable in the light of established facts.
Null and Alternative Hypothesis: If we are to compare method A with method B in respect of superiority, and we proceed on the assumption that both methods are equally good, this assumption is termed the null hypothesis. The null hypothesis is generally symbolised as H0 and the alternative hypothesis as Ha.
Bivariate correlation analysis: A statistical technique to assess the relationship between two continuous variables measured on an interval or ratio scale.
Cluster sampling: A sampling plan that involves dividing the population into subgroups and then drawing a sample from each subgroup; it may be a single-stage or multistage design.
Clustering: A technique that assigns each data record to a group or segment automatically, using clustering algorithms that identify similar characteristics in the data set and then partition the records into groups.
Voice recognition: Computer systems programmed to record verbal answers to questions.

14.11 LEARNING ACTIVITY
1. A researcher plans to ask employees whether they favour, oppose, or are indifferent about a change in the company retirement programme. Formulate a hypothesis for a chi-square test and the way the variable would be created.

14.12 UNIT END QUESTIONS
A. Descriptive Questions
Short Questions
1. List the different tests of significance.
2. Discuss the chi-square test.
3. Which test compares the variances of two independent samples?
4. Explain the Mann Whitney U test.
5. Mention the types of sign test.
Long Questions
1. Classify the different types of parametric tests.
2. Discuss the basic concepts concerning the testing of hypotheses.
3. Explain the procedure for hypothesis testing.
4. Explain the limitations of the tests of hypotheses.
5. Summarise the non-parametric tests in detail.
B. Multiple Choice Questions
1. What level of significance is generally used in hypothesis testing?
a. 3%
b. 2%
c. 5%
d. 10%
2. What level of significance do pharmaceutical companies generally insist on for drugs and medicines?
a. 3%
b. 2%
c. 5%
d. 1%
3. The procedure for hypothesis testing refers to all those steps we undertake for making a choice between which two actions?
a. Acceptance of a null hypothesis
b. Rejection and acceptance of a null hypothesis
c. Rejection of an alternative hypothesis
d. Rejection of a null hypothesis
4. _____ usually assume certain properties of the parent population from which we draw samples.
a. Parametric tests
b. Non-parametric tests
c. Sign test
d. Rank test
5. There are situations when the researcher cannot, or does not want to, make such assumptions; here we use _____.
a. Parametric tests
b. Non-parametric tests
c. Sign test
d. Rank test
Answers
1-c, 2-d, 3-b, 4-a, 5-b

14.13 REFERENCES
Reference books
R1, Business Research Methods – Alan Bryman & Emma Bell, Oxford University Press.
R2, Research Methodology – C.R. Kothari.
R3, Statistics for Managers Using Microsoft Excel – Levine, Stephan, Krehbiel, Berenson.
Textbook references
T1, SPSS Explained, ISBN: 9780415274104, Publisher: Tata McGraw Hill.
T2, Sancheti & Kapoor, Business Mathematics, Sultan Chand, New Delhi.