MBA610 CU- MBA-Sem 2- MBA610 -Business Research Methods-converted-converted

Published by Teamlease Edtech Ltd (Amita Chitroda), 2021-04-19 08:20:40

Although simple random sampling is intended to be an unbiased approach to surveying, sample selection bias can occur. When a sample set of the larger population is not inclusive enough, representation of the full population is skewed and requires additional sampling techniques.

SYSTEMATIC SAMPLE

Systematic sampling is a type of probability sampling method in which sample members from a larger population are selected according to a random starting point but with a fixed, periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size. Despite the sample population being selected in advance, systematic sampling is still thought of as being random if the periodic interval is determined beforehand and the starting point is random.

How Systematic Sampling Works

Since simple random sampling of a population can be inefficient and time-consuming, statisticians turn to other methods, such as systematic sampling. Choosing a sample through a systematic approach can be done quickly. Once a starting point has been identified, a constant interval is selected to facilitate participant selection.

Systematic sampling is preferable to simple random sampling when there is a low risk of data manipulation. If that risk is high, for example when a researcher can manipulate the interval length to obtain desired results, a simple random sampling technique would be more appropriate.

Systematic sampling is popular with researchers and analysts because of its simplicity. Researchers generally assume the results are representative of most normal populations unless some characteristic disproportionately coincides with every "nth" data sample (which is unlikely). In other words, a population needs to exhibit a natural degree of randomness along the chosen metric. If the population has a type of standardized pattern, the risk of accidentally choosing very common cases is more apparent.
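The selection rule described above (a random starting point followed by a fixed sampling interval equal to the population size divided by the sample size) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name systematic_sample, the seed parameter and the numbered population are assumptions made for the example and do not come from the text.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick every k-th member after a random start, with k = N // n."""
    n_pop = len(population)
    if not 0 < sample_size <= n_pop:
        raise ValueError("sample_size must be between 1 and the population size")
    interval = n_pop // sample_size      # sampling interval k
    rng = random.Random(seed)            # seed is optional, for repeatable runs
    start = rng.randrange(interval)      # random starting point in [0, k)
    return [population[start + i * interval] for i in range(sample_size)]


# Hypothetical population of 100 numbered members, sample of 10 (interval = 10).
members = list(range(1, 101))
print(systematic_sample(members, 10))
```

When the population size is not an exact multiple of the sample size, this sketch rounds the interval down, so a few members at the end of the list can never be selected; a real study would need to decide how to handle that remainder.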
Within systematic sampling, as with other sampling methods, a target population must be selected prior to selecting participants. A population can be identified based on any number of desired characteristics that suit the purpose of the study being conducted. Some selection criteria may include age, gender, race, location, education level and/or profession. 150 CU IDOL SELF LEARNING MATERIAL (SLM)

Advantages of systematic sampling

Easy to Execute and Understand

Systematic samples are relatively easy to construct, execute, compare, and understand. This is particularly important for studies or surveys that operate with tight budget constraints.

Control and Sense of Process

A systematic method also provides researchers and statisticians with a degree of control and sense of process. This might be particularly beneficial for studies with strict parameters or a narrowly formed hypothesis, assuming the sampling is reasonably constructed to fit certain parameters.

Clustered Selection Eliminated

Clustered selection, a phenomenon in which randomly chosen samples are uncommonly close together in a population, is eliminated in systematic sampling. Random samples can only deal with this by increasing the number of samples or running more than one survey. These can be expensive alternatives.

Low Risk Factor

Perhaps the greatest strength of a systematic approach is its low risk factor. The primary potential disadvantages of the system carry a distinctly low probability of contaminating the data.

STRATIFIED RANDOM SAMPLE

Stratified random sampling is a method of sampling that involves the division of a population into smaller sub-groups known as strata. In stratified random sampling, or stratification, the strata are formed based on members' shared attributes or characteristics such as income or educational attainment. Stratified random sampling is also called proportional random sampling or quota random sampling.

How Stratified Random Sampling Works

When completing analysis or research on a group of entities with similar characteristics, a researcher may find that the population is too large to study in full. To save time and money, an analyst may take a more feasible approach by selecting a small

group from the population. The small group is referred to as a sample, which is a subset of the population that is used to represent the entire population. A sample may be selected from a population in a number of ways, one of which is the stratified random sampling method.

Stratified random sampling involves dividing the entire population into homogeneous groups called strata (plural of stratum). Random samples are then selected from each stratum. For example, consider an academic researcher who would like to know the number of MBA students in 2007 who received a job offer within three months of graduation. He will soon find that there were almost 200,000 MBA graduates for the year. He might decide to just take a simple random sample of 50,000 graduates and run a survey. Better still, he could divide the population into strata and take a random sample from each stratum. To do this, he would create population groups based on gender, age range, race, country of nationality, and career background. A random sample from each stratum is taken in a number proportional to the stratum's size when compared to the population. These subsets of the strata are then pooled to form a random sample.

Example of Stratified Random Sampling

Suppose a research team wants to determine the GPA of college students across the U.S. The research team has difficulty collecting data from all 21 million college students; it decides to take a random sample of the population by using 4,000 students. Now assume that the team looks at the different attributes of the sample participants and wonders if there are any differences in GPAs and students' majors. Suppose it finds that 560 students are English majors, 1,135 are science majors, 800 are computer science majors, 1,090 are engineering majors, and 415 are math majors. The team wants to use a proportional stratified random sample, in which each stratum's share of the sample is proportional to its share of the population.
Assume the team researches the demographics of college students in the U.S. and finds the following distribution of majors: 12% major in English, 28% major in science, 24% major in computer science, 21% major in engineering, and 15% major in mathematics. Thus, five strata are created from the stratified random sampling process. The team then needs to confirm that each stratum's share of the sample matches its share of the population; however, they find the proportions are not equal. The team then needs to re-sample 4,000 students from the population and randomly select 480 English, 1,120 science, 960 computer science, 840 engineering, and 600 mathematics students.
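The arithmetic behind the re-sampled counts can be checked with a short Python sketch. It multiplies the total sample size by each stratum's population percentage and compares the result with the counts found in the first sample. The function name proportional_allocation and the dictionary layout are assumptions made for illustration; the percentages and counts are the ones given in the example.

```python
def proportional_allocation(total_sample, percent_by_stratum):
    """Return the number of units to draw from each stratum.

    percent_by_stratum maps a stratum name to its share of the
    population, expressed as a whole-number percentage.
    """
    if sum(percent_by_stratum.values()) != 100:
        raise ValueError("stratum percentages must sum to 100")
    return {
        name: round(total_sample * pct / 100)
        for name, pct in percent_by_stratum.items()
    }


population_percent = {
    "English": 12, "science": 28, "computer science": 24,
    "engineering": 21, "mathematics": 15,
}
first_sample = {
    "English": 560, "science": 1135, "computer science": 800,
    "engineering": 1090, "mathematics": 415,
}

target = proportional_allocation(4000, population_percent)
# Positive values mean the first sample over-represents that major.
mismatch = {name: first_sample[name] - target[name] for name in target}
print(target)
print(mismatch)
```

With other percentages, rounding each stratum separately may produce counts that do not add up exactly to the total sample size, so a real allocation would need a rule for distributing the remainder.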

With these re-sampled counts, the team has a proportionate stratified random sample of college students, which provides a better representation of students' college majors in the U.S. The researchers can then highlight a specific stratum, observe the varying studies of U.S. college students and observe the various grade point averages.

Advantages of stratified sampling

Accurately Reflects Population Studied

Stratified random sampling accurately reflects the population being studied because researchers are stratifying the entire population before applying random sampling methods. In short, it ensures each subgroup within the population receives proper representation within the sample. As a result, stratified random sampling provides better coverage of the population since the researchers have control over the subgroups to ensure all of them are represented in the sampling.

With simple random sampling, there isn't any guarantee that any particular subgroup or type of person is chosen. In our earlier example of the university students, using simple random sampling to procure a sample of 100 from the population might result in the selection of only 25 male undergraduates (25% of the sample). Also, 35 female graduate students might be selected (35% of the sample), resulting in under-representation of male undergraduates and over-representation of female graduate students. Any errors in the representation of the population have the potential to diminish the accuracy of the study.

Disadvantages of Stratified sampling

Can't Be Used in All Studies

Unfortunately, this method of research cannot be used in every study. The method's disadvantage is that several conditions must be met for it to be used properly. Researchers must identify every member of a population being studied and classify each of them into one, and only one, subpopulation.
As a result, stratified random sampling is disadvantageous when researchers can't confidently classify every member of the population into a subgroup. Also, finding an exhaustive and definitive list of an entire population can be challenging.

Overlapping can be an issue if there are subjects that fall into multiple subgroups. When simple random sampling is performed, those who are in multiple subgroups are more likely to be chosen. The result could be a misrepresentation or inaccurate reflection of the population.

The above example makes it easy: Undergraduate, graduate, male, and female are clearly

defined groups. In other situations, however, it might be far more difficult. Imagine incorporating characteristics such as race, ethnicity, or religion. The sorting process becomes more difficult, rendering stratified random sampling an ineffective and less than ideal method.

CLUSTER SAMPLE & MULTI-STAGE SAMPLING

Cluster Sampling

Cluster sampling is a sampling plan used when mutually homogeneous yet internally heterogeneous groupings are evident in a statistical population. It is often used in marketing research. In this sampling plan, the total population is divided into these groups (known as clusters) and a simple random sample of the groups is selected. The elements in each cluster are then sampled. If all elements in each sampled cluster are sampled, then this is referred to as a "one-stage" cluster sampling plan. If a simple random subsample of elements is selected within each of these groups, this is referred to as a "two-stage" cluster sampling plan.

A common motivation for cluster sampling is to reduce the total number of interviews and costs given the desired accuracy. For a fixed sample size, the expected random error is smaller when most of the variation in the population is present internally within the groups, and not between the groups.

Figure 9.1
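The difference between the one-stage and two-stage plans can be illustrated with a minimal Python sketch. It assumes the population is already grouped into named clusters held in a dictionary; the function name cluster_sample, its parameters and the city-block data are hypothetical and used only for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Draw a simple random sample of clusters, then sample their elements.

    per_cluster=None -> one-stage plan: keep every element of each chosen cluster.
    per_cluster=m    -> two-stage plan: simple random subsample of up to m elements.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # first stage: pick clusters
    sample = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample[name] = list(members)                 # one-stage
        else:
            size = min(per_cluster, len(members))
            sample[name] = rng.sample(members, size)     # second stage
    return sample


# Hypothetical population: 6 city blocks with 10 households each.
blocks = {f"block_{i}": list(range(i * 10, i * 10 + 10)) for i in range(6)}
one_stage = cluster_sample(blocks, n_clusters=2)
two_stage = cluster_sample(blocks, n_clusters=2, per_cluster=3)
print(one_stage)
print(two_stage)
```

In both plans the cluster, not the individual element, is the unit selected at the first stage, which is why only the chosen blocks ever need to be listed in full.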

Cluster elements

The population within a cluster should ideally be as heterogeneous as possible, but there should be homogeneity between clusters. Each cluster should be a small-scale representation of the total population. The clusters should be mutually exclusive and collectively exhaustive. A random sampling technique is then used on any relevant clusters to choose which clusters to include in the study. In single-stage cluster sampling, all the elements from each of the selected clusters are sampled. In two-stage cluster sampling, a random sampling technique is applied to the elements from each of the selected clusters.

The main difference between cluster sampling and stratified sampling is that in cluster sampling the cluster is treated as the sampling unit, so sampling is done on a population of clusters (at least in the first stage). In stratified sampling, the sampling is done on elements within each stratum. In stratified sampling, a random sample is drawn from each of the strata, whereas in cluster sampling only the selected clusters are sampled. A common motivation of cluster sampling is to reduce costs by increasing sampling efficiency. This contrasts with stratified sampling, where the motivation is to increase precision.

There is also multistage cluster sampling, where at least two stages are taken in selecting elements from clusters.

When clusters are of different sizes

Without modifying the estimated parameter, cluster sampling is unbiased when the clusters are approximately the same size. In this case, the parameter is computed by combining all the selected clusters. When the clusters are of different sizes, there are several options:

One method is to sample clusters and then survey all elements in that cluster. Another method is a two-stage method of sampling a fixed proportion of units (be it 5% or 50%, or another number, depending on cost considerations) from within each of the selected clusters.
Relying on the sample drawn from these options will yield an unbiased estimator. However, the sample size is no longer fixed upfront. This leads to a more complicated formula for the standard error of the estimator, as well as issues with the optics of the study plan (since the power analysis and the cost estimations often relate to a specific sample size).

A third possible solution is to use probability proportionate to size sampling. In this sampling plan, the probability of selecting a cluster is proportional to its size, so that a large cluster has a greater probability of selection than a small cluster. The advantage here is that when clusters are selected with probability proportionate to size, the same number of interviews should be carried out in each sampled cluster so that each unit sampled has the

same probability of selection.

Applications of cluster sampling

An example of cluster sampling is area sampling or geographical cluster sampling. Each cluster is a geographical area. Because a geographically dispersed population can be expensive to survey, greater economy than simple random sampling can be achieved by grouping several respondents within a local area into a cluster. It is usually necessary to increase the total sample size to achieve equivalent precision in the estimators, but cost savings may make such an increase in sample size feasible.

Cluster sampling is used to estimate high mortalities in cases such as wars, famines and natural disasters.

Advantages

• Can be cheaper than other sampling plans – e.g., fewer travel expenses and administration costs.
• Feasibility: This sampling plan takes large populations into account. Since these groups are so large, deploying any other sampling plan would be very costly.
• Economy: The two major concerns of expenditure, i.e., traveling and listing, are greatly reduced in this method. For example, compiling research information about every household in a city would be very costly, whereas compiling information about various blocks of the city will be more economical. Here, traveling as well as listing efforts will be greatly reduced.
• Reduced variability: In the rare case of a negative intraclass correlation between subjects within a cluster, the estimators produced by cluster sampling will yield more accurate estimates than data obtained from a simple random sample (i.e. the design effect will be smaller than 1). This is not a commonplace scenario.
• Major use: When the sampling frame of all elements is not available, we can resort only to cluster sampling.
Disadvantages

• Higher sampling error, which can be expressed by the design effect: the ratio between the variance of an estimator made from the samples of the cluster study and the variance of an estimator obtained from a sample of subjects in an equally reliable, randomly sampled unclustered study. The larger the intraclass correlation between subjects within a cluster, the worse the design effect becomes (i.e. the further it rises above 1, indicating a larger expected

increase in the variance of the estimator). In other words, the more heterogeneity there is between clusters and the more homogeneity between subjects within a cluster, the less accurate our estimators become. In such cases we are better off sampling as many clusters as we can and making do with a small sample of subjects from within each cluster (i.e. two-stage cluster sampling).
• Complexity: Cluster sampling is more sophisticated and requires more attention to how to plan and how to analyze (i.e. to take into account the weights of subjects during the estimation of parameters, confidence intervals, etc.).

Multistage sampling

In statistics, multistage sampling is the taking of samples in stages using smaller and smaller sampling units at each stage. Multistage sampling can be a complex form of cluster sampling because it is a type of sampling which involves dividing the population into groups (or clusters). Then, one or more clusters are chosen at random and everyone within the chosen cluster is sampled.

Using all the sample elements in all the selected clusters may be prohibitively expensive or unnecessary. Under these circumstances, multistage cluster sampling becomes useful. Instead of using all the elements contained in the selected clusters, the researcher randomly selects elements from each cluster. Constructing the clusters is the first stage. Deciding what elements within the cluster to use is the second stage. The technique is used frequently when a complete list of all members of the population does not exist or is inappropriate.

In some cases, several levels of cluster selection may be applied before the final sample elements are reached. For example, household surveys conducted by the Australian Bureau of Statistics begin by dividing metropolitan regions into 'collection districts' and selecting some of these collection districts (first stage).
The selected collection districts are then divided into blocks, and blocks are chosen from within each selected collection district (second stage). Next, dwellings are listed within each selected block, and some of these dwellings are selected (third stage). This method makes it unnecessary to create a list of every dwelling in the region; a list is needed only for the selected blocks. In remote areas, an additional stage of clustering is used in order to reduce travel requirements.

Although cluster sampling and stratified sampling bear some superficial similarities, they are substantially different. In stratified sampling, a random sample is drawn from all the strata, whereas in cluster sampling only the selected clusters are studied, either in single-stage or multi-stage form.

Advantages

• Cost and speed with which the survey can be done
• Convenience of finding the survey sample
• Normally more accurate than cluster sampling for the same sample size

Disadvantages

• Not as accurate as a simple random sample of the same size
• Further testing is difficult to do

DETERMINING SIZE OF THE SAMPLE – PRACTICAL CONSIDERATIONS IN SAMPLING AND SAMPLE SIZE

Determining sample size is a very important issue because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results. In many cases, we can easily determine the minimum sample size needed to estimate a process parameter, such as the population mean.

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is determined based on the expense of data collection and the need to have sufficient statistical power. In complicated studies there may be several different sample sizes involved: for example, in a stratified survey there would be different sample sizes for each stratum. In a census, data are collected on the entire population, hence the sample size is equal to the population size. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.

Sample sizes may be chosen in several different ways:
• Using experience: small sample sizes, though sometimes necessary, can result in wide confidence intervals or risks of errors in statistical hypothesis testing.
• Using a target variance for an estimate to be derived from the sample eventually obtained, i.e. if a high precision is required (narrow confidence interval) this translates to a low target variance of the estimator.
• Using a target for the power of a statistical test to be applied once the sample is collected.
• Using a confidence level, i.e. the larger the required confidence level, the larger the sample size (given a constant precision requirement).

When sample data is collected and the sample mean is calculated, that sample mean is typically different from the population mean (µ). This difference between the sample and population means can be thought of as an error. The margin of error E is the maximum difference between the observed sample mean and the true value of the population mean (µ):

$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

where:
$z_{\alpha/2}$ is known as the critical value, the positive z value that is at the vertical boundary for the area of $\alpha/2$ in the right tail of the standard normal distribution.
$\sigma$ is the population standard deviation.
$n$ is the sample size.

Rearranging this formula, we can solve for the sample size necessary to produce results accurate to a specified confidence and margin of error:

$$n = \left( \frac{z_{\alpha/2} \, \sigma}{E} \right)^2$$
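As a rough illustration of the rearranged formula, n = (z·σ / E)², the following Python sketch computes the minimum sample size for a given population standard deviation, margin of error and confidence level. The function name required_sample_size and the example figures (σ = 15, E = 2) are assumptions chosen for illustration only; the critical value is taken from the standard normal distribution in Python's statistics module, and the result is rounded up because a sample size must be a whole number.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin_of_error, confidence=0.95):
    """Minimum n so that the sample mean is within margin_of_error of the
    population mean at the given confidence level (sigma assumed known)."""
    if sigma <= 0 or margin_of_error <= 0 or not 0 < confidence < 1:
        raise ValueError("sigma and margin_of_error must be positive, "
                         "confidence must lie strictly between 0 and 1")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Hypothetical inputs: sigma = 15, desired margin of error = 2, 95% confidence.
print(required_sample_size(sigma=15, margin_of_error=2))
```

Halving the margin of error roughly quadruples the required sample size, because E appears squared in the denominator.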

The sample-size formula can be used when you know $\sigma$ and want to determine the sample size necessary to establish, with a confidence of $1 - \alpha$, the mean value to within $\pm E$. You can still use this formula if you don't know your population standard deviation $\sigma$ and you have a small sample size. Although it's unlikely that you know $\sigma$ when the population mean is not known, you may be able to determine $\sigma$ from a similar process or from a pilot test/simulation.

SUMMARY

• Probability sampling is defined as a sampling technique in which the researcher chooses samples from a larger population using a method based on the theory of probability. For a participant to be considered part of a probability sample, he/she must be selected using random selection.
• Simple random sampling, as the name suggests, is an entirely random method of selecting the sample. This sampling method is as easy as assigning numbers to the individuals (sample) and then randomly choosing from those numbers through an automated process. Finally, the numbers that are chosen are the members that are included in the sample.
• There are two ways in which researchers choose the samples in this method of sampling: the lottery system and number-generating software or a random number table. This sampling technique usually works around a large population and has its fair share of advantages and disadvantages.
• Stratified random sampling involves a method where the researcher divides a more extensive population into smaller groups that usually don't overlap but represent the entire population. While sampling, organize these groups and then draw a sample from each group separately.
• A standard method is to arrange or classify by sex, age, ethnicity, and similar ways. Subjects are split into mutually exclusive groups, and simple random sampling is then used to choose members from each group.
• Random cluster sampling is a way to randomly select participants who are spread out geographically.
For example, if you wanted to choose 100 participants from the entire population of the U.S., it is likely impossible to get a complete list of everyone. Instead, the researcher randomly selects areas (e.g., cities or counties) and randomly selects from within

those boundaries.
• Systematic sampling is when you choose every "nth" individual to be a part of the sample. For example, you can select every 5th person to be in the sample. Systematic sampling is an extension of the same probability technique, in which members of the group are selected at regular intervals to form a sample.

KEY WORDS/ABBREVIATIONS

• Population: In statistics, a population is a set of similar items or events which is of interest for some question or experiment.
• Simple random sampling: In statistics, a simple random sample is a subset of individuals chosen from a larger set. Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process.
• Sampling: In statistics, quality assurance, and survey methodology, sampling is the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.
• Individuals: An individual is that which exists as a distinct entity. Individuality is the state or quality of being an individual; particularly of being a person separate from other people and possessing their own needs or goals, rights and responsibilities.
• Exchangeability: In statistics, an exchangeable sequence of random variables is a sequence X1, X2, X3, ... whose joint probability distribution does not change when the positions in the sequence in which finitely many of them appear are altered.

LEARNING ACTIVITY

1. Write and discuss the application of Cluster analysis in real life.
2. Discuss when and how to use Random sampling.

UNIT END QUESTIONS (MCQ AND DESCRIPTIVE)

A. Descriptive Types Questions

1. Define Simple Random Sample.
2. Define Systematic Sample.
3. Discuss Stratified Random Sample along with advantages and disadvantages.
4. Differentiate between Cluster Sample & Multi-stage sampling.
5. Discuss practical considerations in sampling and sample size.

B. Multiple Choice Questions

1. Which of the following is NOT part of the sampling design process?
a. Defining of the population of the study.
b. Specifying the sampling unit.
c. Selection of the sampling technique.
d. Refining the research question.

2. Which of the following is NOT true of probability sampling?
a. Sampling units are selected by chance as opposed to the judgement of the researcher.
b. Estimates are statistically projectable to the population.
c. The number of elements to be included in the sample set can be pre-specified.
d. The results will always be more accurate than non-probability sampling.

3. What is the least expensive and least time-consuming of all sampling techniques?
a. Judgmental sampling.
b. Simple random sampling.
c. Snowball sampling.
d. Convenience sampling.

4. What are the distinguishing features of simple random sampling?
a. Each element in the population has a known and equal probability of selection.

b. A sampling frame must be compiled in which each element has a unique identification number.
c. Random numbers determine which elements are included in the sample.
d. All of these

5. Which of the following are NOT criteria for the selection of stratification variables in stratified sampling?
a. The strata should be mutually exclusive and collectively exhaustive so that every population element should be assigned to one and only one stratum.
b. Elements within a stratum should be as homogeneous as possible.
c. Across the strata, the elements should be as heterogeneous as possible.
d. Stratification variables should be easy to measure and apply.

Answer

1. d 2. d 3. d 4. d 5. d

REFERENCES

• Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-850994-4.
• Baddeley, Adrian; Vedel Jensen, Eva B. (2004). Stereology for Statisticians. p. 334.
• Sarndal; Swenson; Wretman (1992). Model Assisted Survey Sampling. Springer-Verlag. ISBN 0-387-40620-4.

UNIT 10: DATA COLLECTION

Structure

Learning Objectives
Introduction
Primary versus Secondary Data
Collecting Primary Data through Interview Method
Data collection through Questionnaires
    General form
    Question sequence
    Question formulation and wording
    Essentials of a good questionnaire
Data collection through Schedules
Case Study Method
Summary
Key Words/Abbreviations
Learning Activity
Unit End Questions (MCQ and Descriptive)
References

LEARNING OBJECTIVES

After studying this unit, you will be able to:
• Differentiate between Primary and Secondary Data
• Explain collection of data through various modes like Questionnaires, Schedules and Interview

INTRODUCTION

To carry out a research study, you have to collect the relevant data so that the hypotheses or generalizations you hold tentatively can be verified. This involves selection of samples from the population concerned. The increasingly complex nature of business and government has focused attention on the uses of research methodology in solving managerial problems. The credibility of the results derived from the application of such methodology depends on up-to-date information about the various pertinent characteristics included in the analysis.

To illustrate, the demand for disc records dropped dramatically after cassettes entered the market commercially. The government must be aware of the actual scenario of the acceptance of family planning before it can formulate any policy in this matter. The components of this scenario are provided by appropriate data to be collected from various families. In industrial disputes regarding wages, the cost of living index, a data-based indicator of inflation, is often accepted as a guideline for arbitration.

In short, neither a business decision nor a governmental decision can be made in a casual manner in the highly involved environment prevailing in this age. It is through appropriate data and their analysis that the decision maker becomes equipped with proper tools of decision making.

PRIMARY VERSUS SECONDARY DATA

Primary Data

Primary data means the raw data (data that has not been processed or tailored) which has just been collected from the source and has not undergone any kind of statistical treatment like sorting and tabulation. The term primary data may sometimes be used to refer to first-hand information.

Sources of Primary Data

The sources of primary data are primary units such as basic experimental units, individuals and households. The following methods are usually used to collect data from primary units, and the choice of method depends on the nature of the primary unit.
Published data and data collected in the past are called secondary data.

Personal Investigation

The researcher conducts the experiment or survey himself/herself and collects data from it. The collected data is generally accurate and reliable. This method of collecting primary data is feasible only in case of small-scale laboratory, field

experiments or pilot surveys and is not practicable for large-scale experiments and surveys because it takes too much time.

Through Investigators

Trained (experienced) investigators are employed to collect the required data. In case of surveys, they contact the individuals and fill in the questionnaires after asking for the required information, where a questionnaire is an inquiry form having a number of questions designed to obtain information from the respondents. This method of collecting data is employed by most organizations and gives reasonably accurate information, but it is very costly and may be time-consuming too.

Through Questionnaire

The required information (data) is obtained by sending a questionnaire (in printed or soft form) by mail to the selected individuals (respondents), who fill in the questionnaire and return it to the investigator. This method is relatively cheap compared to the "through investigators" method, but the non-response rate is very high, as most of the respondents don't bother to fill in the questionnaire and send it back to the investigator.

Through Local Sources

Local representatives or agents are asked to send the requisite information, which they provide based upon their own experience. This method is quick but it gives rough estimates only.

Through Telephone

The information may be obtained by contacting the individuals by telephone. It is quick and provides accurate information.

Through Internet

With the introduction of information technology, people may be contacted through the internet and asked to provide the pertinent information. Google survey is now widely used as an online method for data collection. There are many paid online survey services too.

It is important to go through the primary data and locate any inconsistent observations before it is given a statistical treatment.
Secondary Data
Secondary data are data that have already been collected by someone else and that may have been sorted, tabulated and given statistical treatment. They are processed, ready-made data rather than raw data.

Sources of Secondary Data
The secondary data may be available from the following sources:
• Government organizations: Federal and Provincial Bureau of Statistics, Crop Reporting Service (Agriculture Department), Census and Registration Organization, etc.
• Semi-government organizations: municipal committees, district councils, commercial and financial institutions such as banks, etc.
• Teaching and research organizations
• Research journals and newspapers
• Internet

Comparison Chart

BASIS FOR COMPARISON     | PRIMARY DATA                                                                  | SECONDARY DATA
-------------------------|-------------------------------------------------------------------------------|------------------------------------------------------------------------------------
Meaning                  | First-hand data gathered by the researcher himself.                           | Data collected by someone else earlier.
Data                     | Real-time data                                                                | Past data
Process                  | Very involved                                                                 | Quick and easy
Source                   | Surveys, observations, experiments, questionnaire, personal interview, etc.   | Government publications, websites, books, journal articles, internal records, etc.
Cost effectiveness       | Expensive                                                                     | Economical
Collection time          | Long                                                                          | Short
Specific                 | Always specific to the researcher's needs.                                    | May or may not be specific to the researcher's needs.
Available in             | Crude form                                                                    | Refined form
Accuracy and reliability | More                                                                          | Relatively less

10.3 COLLECTING PRIMARY DATA THROUGH INTERVIEW METHOD
The interview method of collecting data involves presentation of oral-verbal stimuli and reply in terms of oral-verbal responses. This method can be used through personal interviews and, if possible, through telephone interviews.

Personal interviews: The personal interview method requires a person known as the interviewer to ask questions, generally in face-to-face contact, of the other person or persons. (At times the interviewee may also ask certain questions and the interviewer responds to these, but usually the interviewer initiates the interview and collects the information.) This sort of interview may take the form of direct personal investigation or of indirect oral investigation. In direct personal investigation the interviewer has to collect the information personally from the sources concerned. He has to be on the spot and has to meet the people from whom data have to be collected. This method is particularly suitable for intensive investigations. But in certain cases it may not be possible or worthwhile to contact the persons concerned directly, or, on account of the extensive scope of the enquiry, the direct personal investigation technique may not be usable. In such cases an indirect oral examination can be conducted, under which the interviewer cross-examines other persons who are supposed to have knowledge about the problem under investigation, and the information obtained is recorded. Most of the commissions and committees appointed by government to carry out investigations make use of this method.

The method of collecting information through personal interviews is usually carried out in a structured way. Such interviews are called structured interviews. They involve the use of a set of predetermined questions and of highly standardized techniques of recording. Thus, the interviewer in a structured interview follows a rigid procedure, asking questions in the form and order prescribed. In contrast, unstructured interviews are characterized by a flexibility of approach to questioning. They do not follow a system of pre-determined questions and standardized techniques of recording information. In an unstructured interview, the interviewer is allowed much greater freedom to ask supplementary questions when needed, or at times to omit certain questions if the situation so requires. He may even change the sequence of questions. He also has relatively greater freedom, while recording the responses, to include some aspects and exclude others. But this flexibility results in a lack of comparability of one interview with another, and the analysis of unstructured responses becomes much more difficult and time-consuming than that of the structured responses obtained in structured interviews. Unstructured interviews also demand deep knowledge and greater skill on the part of the interviewer.
The unstructured interview, however, happens to be the central technique of collecting information in exploratory or formulative research studies. In descriptive studies, by contrast, we quite often use the structured interview because it is more economical, provides a safe basis for generalization and requires relatively less skill on the part of the interviewer.

We may as well talk about the focused interview, the clinical interview and the non-directive interview. The focused interview is meant to focus attention on a given experience of the respondent and its effects. Under it the interviewer has the freedom to decide the manner and sequence in which the questions are asked, and also the freedom to explore reasons and motives. The main task of the interviewer in a focused interview is to confine the respondent to a discussion of issues with which he seeks conversance. Such interviews are generally used in the development of hypotheses and constitute a major type of unstructured interview. The clinical interview is concerned with broad underlying feelings or motivations, or with the course of an individual’s life experience. The method of eliciting information under it is generally left to the interviewer’s discretion. In a non-directive interview, the interviewer’s function is simply to encourage the respondent to talk about the given topic with a bare minimum of direct questioning. The interviewer often acts as a catalyst to a comprehensive expression of the respondent’s feelings and beliefs and of the frame of reference within which such feelings and beliefs take on personal significance.

Despite the variations in interview techniques, the major advantages and weaknesses of personal interviews can be enumerated in a general way. The chief merits of the interview method are as follows:
1. More information, and that too in greater depth, can be obtained.
2. The interviewer, by his own skill, can overcome the resistance, if any, of the respondents; the interview method can be made to yield an almost perfect sample of the general population.
3. There is greater flexibility under this method, as the opportunity to restructure questions is always there, especially in unstructured interviews.
4. The observation method can also be applied while recording verbal answers to various questions.
5. Personal information can also be obtained easily under this method.
6. Samples can be controlled more effectively, as the difficulty of missing returns does not arise; non-response generally remains very low.
7. The interviewer can usually control which person(s) will answer the questions. This is not possible in the mailed questionnaire approach. If so desired, group discussions may also be held.
8. The interviewer may catch the informant off-guard and thus may secure more spontaneous reactions than would be the case if a mailed questionnaire were used.
9. The language of the interview can be adapted to the ability or educational level of the person interviewed, so that misinterpretation of questions can be avoided.
10. The interviewer can collect supplementary information about the respondent’s personal characteristics and environment, which is often of great value in interpreting results.

But there are also certain weaknesses of the interview method. Among the important weaknesses, mention may be made of the following:
1. It is a very expensive method, especially when a large and geographically widespread sample is taken.
2. There remains the possibility of bias on the part of the interviewer as well as the respondent; there also remains the headache of supervision and control of interviewers.
3. Certain types of respondents, such as important officials, executives or people in high income groups, may not be easily approachable under this method, and to that extent the data may prove inadequate.
4. This method is relatively more time-consuming, especially when the sample is large and repeat calls upon the respondents are necessary.
5. The presence of the interviewer on the spot may over-stimulate the respondent, sometimes even to the extent that he gives imaginary information just to make the interview interesting.
6. Under the interview method the organization required for selecting, training and supervising the field staff is more complex and poses formidable problems.
7. Interviewing at times may also introduce systematic errors.
8. Effective interviewing presupposes proper rapport with respondents that facilitates free and frank responses. This is often a very difficult requirement.

Pre-requisites and basic tenets of interviewing: For successful implementation of the interview method, interviewers should be carefully selected, trained and briefed. They should be honest, sincere, hardworking and impartial, and must possess the technical competence and necessary practical experience. Occasional field checks should be made to ensure that interviewers are neither cheating nor deviating from the instructions given to them for performing their job efficiently. In addition, some provision should be made in advance so that appropriate action may be taken if some of the selected respondents refuse to cooperate or are not available when an interviewer calls upon them.
In fact, interviewing is an art governed by certain scientific principles. Every effort should be made to create a friendly atmosphere of trust and confidence, so that respondents feel at ease while talking to and discussing with the interviewer. The interviewer must ask questions properly and intelligently and must record the responses accurately and completely. At the same time, the interviewer must answer any legitimate question(s) asked by the respondent and must clear any doubt that the latter has. The interviewer’s approach must be friendly, courteous, conversational and unbiased. The interviewer should not show surprise or disapproval at a respondent’s answer, but he must keep the direction of the interview in his own hands, discouraging irrelevant conversation, and must make every possible effort to keep the respondent on track.

Telephone interviews: This method of collecting information consists of contacting respondents by telephone. It is not a very widely used method, but it plays an important part in industrial surveys, particularly in developed regions. The chief merits of such a system are:
1. It is more flexible in comparison to the mailing method.
2. It is faster than other methods, i.e., a quick way of obtaining information.
3. It is cheaper than the personal interviewing method; here the cost per response is relatively low.
4. Recall is easy; callbacks are simple and economical.
5. There is a higher rate of response than in the mailing method; non-response is generally very low.
6. Replies can be recorded without causing embarrassment to respondents.
7. The interviewer can explain requirements more easily.
8. At times, access can be gained to respondents who otherwise cannot be contacted for one reason or another.
9. No field staff is required.
10. Representative and wider distribution of the sample is possible.

But this system of collecting information is not free from demerits. Some of these may be highlighted:
1. Little time is given to respondents for considered answers; the interview period is not likely to exceed five minutes in most cases.
2. Surveys are restricted to respondents who have telephone facilities.
3. Extensive geographical coverage may be restricted by cost considerations.

4. It is not suitable for intensive surveys where comprehensive answers are required to various questions.
5. The possibility of interviewer bias is relatively greater.
6. Questions have to be short and to the point; probes are difficult to handle.

DATA COLLECTION THROUGH QUESTIONNAIRES
This method of data collection is quite popular, particularly in big enquiries. It is adopted by private individuals, research workers, private and public organizations and even by governments. In this method a questionnaire is sent (usually by post) to the persons concerned with a request to answer the questions and return the questionnaire. A questionnaire consists of a number of questions printed or typed in a definite order on a form or set of forms. The questionnaire is mailed to respondents, who are expected to read and understand the questions and write down their replies in the space provided for the purpose in the questionnaire itself. The respondents have to answer the questions on their own.

The method of collecting data by mailing questionnaires to respondents is most extensively employed in various economic and business surveys. The merits claimed on behalf of this method are as follows:
1. The cost is low even when the universe is large and widely spread geographically.
2. It is free from the bias of the interviewer; answers are in the respondents’ own words.
3. Respondents have adequate time to give well-thought-out answers.
4. Respondents who are not easily approachable can also be reached conveniently.
5. Large samples can be used, and thus the results can be made more dependable and reliable.

The main demerits of this system can also be listed here:
1. The rate of return of duly filled-in questionnaires is low; the bias due to non-response is often indeterminate.
2. It can be used only when respondents are educated and cooperative.
3. Control over the questionnaire may be lost once it is sent.

4. There is inbuilt inflexibility because of the difficulty of amending the approach once questionnaires have been dispatched.
5. There is also the possibility of ambiguous replies or of replies to certain questions being omitted altogether; interpretation of omissions is difficult.
6. It is difficult to know whether willing respondents are truly representative.
7. This method is likely to be the slowest of all.

Before using this method, it is always advisable to conduct a ‘pilot study’ (pilot survey) to test the questionnaires. In a big enquiry the significance of the pilot survey is felt very much. The pilot survey is in fact a replica and rehearsal of the main survey. Such a survey, being conducted by experts, brings to light the weaknesses (if any) of the questionnaires and also of the survey techniques. From the experience gained in this way, improvements can be made.

Main aspects of a questionnaire: The questionnaire is quite often considered the heart of a survey operation. Hence it should be very carefully constructed. If it is not properly set up, the survey is bound to fail. This fact requires us to study the main aspects of a questionnaire, viz., the general form, the question sequence, and question formulation and wording. The researcher should note the following with regard to these three main aspects of a questionnaire:

General form: So far as the general form of a questionnaire is concerned, it can be either a structured or an unstructured questionnaire. Structured questionnaires are those in which there are definite, concrete and pre-determined questions. The questions are presented with exactly the same wording and in the same order to all respondents. This sort of standardization is resorted to in order to ensure that all respondents reply to the same set of questions.
The form of the question may be either closed (i.e., of the type ‘yes’ or ‘no’) or open (i.e., inviting free response), but it should be stated in advance and not constructed during questioning. Structured questionnaires may also have fixed alternative questions in which the responses of the informants are limited to the stated alternatives. Thus a highly structured questionnaire is one in which all questions and answers are specified and comments in the respondent’s own words are held to the minimum. When these characteristics are not present in a questionnaire, it can be termed an unstructured or non-structured questionnaire. More specifically, in an unstructured questionnaire the interviewer is provided with a general guide on the type of information to be obtained, but the exact question formulation is largely his own responsibility, and the replies are to be taken down in the respondent’s own words to the extent possible; in some situations tape recorders may be used to achieve this goal.

Structured questionnaires are simple to administer and relatively inexpensive to analyze. The provision of alternative replies, at times, helps the respondent to understand the meaning of the question clearly. But such questionnaires have limitations too. For instance, a wide range of data, and that too in the respondent’s own words, cannot be obtained with structured questionnaires. They are usually considered inappropriate in investigations where the aim is to probe for attitudes and reasons for certain actions or feelings. They are equally unsuitable when a problem is being explored for the first time and working hypotheses are being sought. In such situations, unstructured questionnaires may be used effectively. Then, on the basis of the results obtained in pretest (testing before final use) operations from the use of unstructured questionnaires, one can construct a structured questionnaire for use in the main study.

Question sequence: In order to make the questionnaire effective and to ensure the quality of the replies received, a researcher should pay attention to the question-sequence in preparing the questionnaire. A proper sequence of questions considerably reduces the chances of individual questions being misunderstood. The question-sequence must be clear and smoothly moving, meaning that the relation of one question to another should be readily apparent to the respondent, with the questions that are easiest to answer being put at the beginning. The first few questions are particularly important because they are likely to influence the attitude of the respondent and to help in securing his cooperation. The opening questions should be such as to arouse human interest. The following types of questions should generally be avoided as opening questions in a questionnaire:
1. questions that put too great a strain on the memory or intellect of the respondent;
2. questions of a personal character;
3. questions related to personal wealth, etc.

Following the opening questions, we should have questions that are really vital to the research problem, and a connecting thread should run through successive questions. Ideally, the question sequence should conform to the respondent’s way of thinking. Knowing what information is desired, the researcher can rearrange the order of the questions (this is possible in the case of an unstructured questionnaire) to fit the discussion in each particular case. But in a structured questionnaire the best that can be done is to determine, with the help of a pilot survey, the question-sequence that is likely to produce good rapport with most respondents. Relatively difficult questions must be relegated towards the end, so that even if the respondent decides not to answer such questions, considerable information will already have been obtained. Thus, the question-sequence should usually go from the general to the more specific, and the researcher must always remember that the answer to a given question is a function not only of the question itself but of all previous questions as well. For instance, if one question deals with the price usually paid for coffee and the next with the reason for preferring that particular brand, the answer to the latter question may be couched largely in terms of price differences.

Question formulation and wording: With regard to this aspect of the questionnaire, the researcher should note that each question must be very clear, for any sort of misunderstanding can do irreparable harm to a survey. Questions should also be impartial in order not to give a biased picture of the true state of affairs. Questions should be constructed with a view to their forming a logical part of a well-thought-out tabulation plan. In general, all questions should meet the following standards:
a. should be easily understood;
b. should be simple, i.e., should convey only one thought at a time;
c. should be concrete and should conform as much as possible to the respondent’s way of thinking. (For instance, instead of asking “How many razor blades do you use annually?”, the more realistic question would be “How many razor blades did you use last week?”)

Concerning the form of questions, we can talk about two principal forms, viz., the multiple choice question and the open-end question. In the former the respondent selects one of the alternative possible answers put to him, whereas in the latter he has to supply the answer in his own words.
The question with only two possible answers (usually ‘Yes’ or ‘No’) can be taken as a special case of the multiple choice question, or can be called a ‘closed question’. Each possible form of question has some advantages and disadvantages. Multiple choice or closed questions are easy to handle, simple to answer, and quick and relatively inexpensive to analyze. They are most amenable to statistical analysis. Sometimes the provision of alternative replies helps to make the meaning of the question clear. But the main drawback of fixed alternative questions is that of “putting answers in people’s mouths”, i.e., they may force a statement of opinion on an issue about which the respondent does not in fact have any opinion. They are not appropriate when the issue under consideration is a complex one, or when the interest of the researcher is in the exploration of a process. In such situations, open-ended questions, which are designed to permit a free response from the respondent rather than one limited to certain stated alternatives, are considered appropriate. Such questions give the respondent considerable latitude in phrasing a reply. Getting the replies in the respondent’s own words is thus the major advantage of open-ended questions. But one should not forget that, from an analytical point of view, open-ended questions are more difficult to handle, raising problems of interpretation, comparability and interviewer bias.

In practice, one rarely comes across a questionnaire that relies on one form of question alone. The various forms complement each other, and so questions of different forms are included in a single questionnaire. For instance, multiple-choice questions constitute the basis of a structured questionnaire, particularly in a mail survey. But even there, various open-ended questions are generally inserted to provide a more complete picture of the respondent’s feelings and attitudes.

The researcher must pay proper attention to the wording of questions, since reliable and meaningful returns depend on it to a large extent. Since words are likely to affect responses, they should be properly chosen. Simple words that are familiar to all respondents should be employed. Words with ambiguous meanings must be avoided. Similarly, danger words, catch-words and words with emotional connotations should be avoided. Caution must also be exercised in the use of phrases which reflect upon the prestige of the respondent. Question wording should in no case bias the answer. In fact, question wording and formulation is an art and can only be learnt by practice.
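The contrast drawn above between closed questions (easy to tabulate) and open-ended questions (needing interpretation before analysis) can be illustrated with a minimal sketch. Everything in it is hypothetical and is not part of the original text: the replies, the `coding_frame` of keywords, and the helper names `tabulate_closed` and `code_open_reply` are illustrative assumptions only. In practice an analyst would build the coding frame after reading the replies, and simple keyword matching is only a rough stand-in for that judgment.

```python
from collections import Counter

# Hypothetical replies to one closed (fixed alternative) question.
closed_replies = ["Yes", "No", "Yes", "Do not know", "Yes", "No"]


def tabulate_closed(replies):
    """Count how often each stated alternative was chosen."""
    return Counter(replies)


# Hypothetical coding frame: keyword -> category (an assumption for illustration).
coding_frame = {
    "price": "Price",
    "cheap": "Price",
    "taste": "Taste",
    "flavour": "Taste",
}


def code_open_reply(reply, frame):
    """Assign an open-ended reply to every category whose keyword it mentions."""
    text = reply.lower()
    categories = {category for keyword, category in frame.items() if keyword in text}
    # Replies matching no keyword are left for manual interpretation.
    return sorted(categories) or ["Uncoded"]


# Hypothetical replies to one open-ended question.
open_replies = [
    "It is cheap and I like the taste",
    "My family has always bought it",
]

closed_table = tabulate_closed(closed_replies)
coded = [code_open_reply(reply, coding_frame) for reply in open_replies]

print(closed_table)
print(coded)
```

The closed replies can be counted directly, whereas the second open-ended reply falls outside the coding frame and has to be interpreted by hand, which is the analytical difficulty the text refers to.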
Essentials of a good questionnaire: To be successful, a questionnaire should be comparatively short and simple, i.e., its size should be kept to the minimum. Questions should proceed in logical sequence, moving from easy to more difficult questions. Personal and intimate questions should be left to the end. Technical terms and vague expressions capable of different interpretations should be avoided. Questions may be dichotomous (yes or no answers), multiple choice (alternative answers listed) or open-ended. Open-ended questions are often difficult to analyze and hence should be avoided in a questionnaire to the extent possible. There should be some control questions in the questionnaire which indicate the reliability of the respondent. For instance, a question designed to determine the consumption of a particular material may be asked first in terms of financial expenditure and later in terms of weight. The control questions thus introduce a cross-check to see whether the information collected is correct or not. Questions affecting the sentiments of respondents should be avoided.
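The control-question idea mentioned above (asking about consumption once in terms of expenditure and again in terms of weight) can be sketched as a simple consistency check. This is only an illustration under stated assumptions: the respondents, the reference `market_price`, the 25% `tolerance` and the function names are all hypothetical and do not come from the text.

```python
def implied_unit_price(expenditure, weight_kg):
    """Price per kilogram implied by a respondent's two answers."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return expenditure / weight_kg


def is_consistent(expenditure, weight_kg, market_price, tolerance=0.25):
    """True when the implied price is within +/- tolerance of the market price."""
    implied = implied_unit_price(expenditure, weight_kg)
    return abs(implied - market_price) <= tolerance * market_price


# Hypothetical answers: respondent id -> (monthly expenditure, monthly weight in kg).
answers = {"R1": (400.0, 2.0), "R2": (400.0, 0.5)}
market_price = 200.0  # assumed reference price per kg

flags = {
    rid: is_consistent(expenditure, weight, market_price)
    for rid, (expenditure, weight) in answers.items()
}
print(flags)
```

A respondent whose two answers imply a price far from the reference price would be flagged for follow-up rather than discarded automatically, since the discrepancy may have a legitimate explanation.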

Adequate space for answers should be provided in the questionnaire to help editing and tabulation. There should always be provision for indications of uncertainty, e.g., “do not know,” “no preference” and so on. Brief directions for filling in the questionnaire should invariably be given in the questionnaire itself. Finally, the physical appearance of the questionnaire affects the cooperation the researcher receives from the recipients, and so an attractive-looking questionnaire, particularly in mail surveys, is a plus point for enlisting cooperation. The quality of the paper, along with its colour, must be good so that it attracts the attention of recipients.

DATA COLLECTION THROUGH SCHEDULE
The schedule method of data collection is very much like the collection of data through questionnaires, with one small difference: schedules (proformas containing a set of questions) are filled in by enumerators who are specially appointed for the purpose. These enumerators go to respondents with the schedules, put to them the questions from the proforma in the order in which they are listed, and record the replies in the space provided in the proforma. In certain situations, schedules may be handed over to respondents, and enumerators may help them in recording their answers to the various questions. Enumerators explain the aims and objects of the investigation and also remove any difficulties a respondent may have in understanding the implications of a particular question or the definition or concept of difficult terms. This method requires the selection of enumerators for filling in schedules or assisting respondents to fill them in, and so enumerators should be very carefully selected.
The enumerators should be trained to perform their job well, and the nature and scope of the investigation should be explained to them thoroughly so that they understand the implications of the different questions put in the schedule. Enumerators should be intelligent and must possess the capacity for cross-examination in order to find out the truth. Above all, they should be honest, sincere and hardworking, and should have patience and perseverance. This method of data collection is very useful in extensive enquiries and can lead to fairly reliable results. It is, however, very expensive and is usually adopted in investigations conducted by governmental agencies or by some big organizations. Population censuses all over the world are conducted through this method.

CASE STUDY METHOD

Meaning: The case study method is a very popular form of qualitative analysis and involves a careful and complete observation of a social unit, be that unit a person, a family, an institution, a cultural group or even the entire community. It is a method of study in depth rather than breadth. The case study places more emphasis on the full analysis of a limited number of events or conditions and their interrelations. The case study deals with the processes that take place and their interrelationship. Thus, the case study is essentially an intensive investigation of the particular unit under consideration. The object of the case study method is to locate the factors that account for the behavior patterns of the given unit as an integrated totality.

Case study method of Data Collection
According to H. Odum, “The case study method of data collection is a technique by which individual factor whether it be an institution or just an episode in the life of an individual or a group is analyzed in its relationship to any other in the group.” Thus, a fairly exhaustive study of a person (as to what he does and has done, what he thinks he does and had done, and what he expects to do and says he ought to do) or group is called a life or case history. Burgess has used the words “the social microscope” for the case study method. Pauline V. Young describes the case study as “a comprehensive study of a social unit be that unit a person, a group, a social institution, a district or a community.” In brief, the case study method is a form of qualitative analysis wherein careful and complete observation of an individual, a situation or an institution is carried out; efforts are made to study each and every aspect of the unit concerned in minute detail, and then generalizations and inferences are drawn from the case data.

Characteristics of Case Study method
The important characteristics of the case study method are as under:
1. Under this method the researcher can take a single social unit, or more than one such unit, for his study; he may even take a situation and study it comprehensively.
2. Here the selected unit is studied intensively, i.e., it is studied in minute detail. Generally, the study extends over a long period of time to ascertain the natural history of the unit so as to obtain enough information for drawing correct inferences.
3. In the context of this method we make a complete study of the social unit covering all facets. Through this method we try to understand the complex of factors that are operative within a social unit as an integrated totality.

4. Under this method the approach is qualitative and not quantitative. Mere quantitative information is not collected. Every possible effort is made to collect information concerning all aspects of life. As such, the case study deepens our perception and gives us a clear insight into life. For instance, when making a case study of a man as a criminal, we study not only how many crimes he has committed but also look into the factors that forced him to commit them. The objective of the study may be to suggest ways to reform the criminal.
5. In the case study method an effort is made to know the mutual inter-relationship of causal factors.
6. Under the case study method the behavior pattern of the unit concerned is studied directly and not by an indirect and abstract approach.
7. The case study method results in fruitful hypotheses along with the data which may be helpful in testing them, and thus it enables generalized knowledge to get richer and richer. In its absence, generalized social science may be handicapped.

Evolution and scope: The case study method is a widely used systematic field research technique in sociology these days. The credit for introducing this method to the field of social investigation goes to Frederic Le Play, who used it as a hand-maiden to statistics in his studies of family budgets. Herbert Spencer was the first to use case material in his comparative study of different cultures. Dr. William Healy resorted to this method in his study of juvenile delinquency, and considered it a better method than the mere use of statistical data. Similarly, anthropologists, historians, novelists and dramatists have used this method for problems pertaining to their areas of interest. Even management experts use case study methods to get clues to several management problems. In brief, the case study method is being used in several disciplines, and its use is increasing day by day.
Assumptions: The case study method is based on several assumptions. The important assumptions may be listed as follows: 1. The assumption of uniformity in the basic human nature in spite of the fact that human behavior may vary according to situations. 2. The assumption of studying the natural history of the unit concerned. 3. The assumption of comprehensive study of the unit concerned.

Major phases involved: Major phases involved in case study are as follows: 1. Recognition and determination of the status of the phenomenon to be investigated or the unit of attention. 2. Collection of data, examination and history of the given phenomenon. 3. Diagnosis and identification of causal factors as a basis for remedial or developmental treatment. 4. Application of remedial measures i.e., treatment and therapy (this phase is often characterized as case work). 5. Follow-up programme to determine effectiveness of the treatment applied. Advantages: There are several advantages of the case study method that follow from the various characteristics outlined above. Mention may be made here of the important advantages. 1. Being an exhaustive study of a social unit, the case study method enables us to learn fully the behavior pattern of the concerned unit. In the words of Charles Horton Cooley, “case study deepens our perception and gives us a clearer insight into life…. It gets at behavior directly and not by an indirect and abstract approach.” 2. Through case study a researcher can obtain a real and enlightened record of personal experiences which would reveal man’s inner strivings, tensions and motivations that drive him to action along with the forces that direct him to adopt a certain pattern of behavior. 3. This method enables the researcher to trace out the natural history of the social unit and its relationship with the social factors and the forces involved in its surrounding environment. 4. It helps in formulating relevant hypotheses along with the data which may be helpful in testing them. Case studies, thus, enable the generalized knowledge to get richer and richer. 5. The method facilitates intensive study of social units which is generally not possible if we use either the observation method or the method of collecting information through schedules. This is the reason why case study method is being frequently used, particularly in social researches. 

6. Information collected under the case study method helps a lot to the researcher in the task of constructing the appropriate questionnaire or schedule for the said task requires thorough knowledge of the concerning universe. 7. The researcher can use one or more of the several research methods under the case study method depending upon the prevalent circumstances. In other words, the use of different methods such as depth interviews, questionnaires, documents, study reports of individuals, letters, and the like is possible under case study method. 8. Case study method has proved beneficial in determining the nature of units to be studied along with the nature of the universe. This is the reason why at times the case study method is alternatively known as “mode of organizing data”. 9. This method is a means to well learn the past of a social unit because of its emphasis of historical analysis. Besides, it is also a technique to suggest measures for improvement in the context of the present environment of the concerned social units. 10. Case studies constitute the perfect type of sociological material as they represent a real record of personal experiences which very often escape the attention of most of the skilled researchers using other techniques. 11. Case study method enhances the experience of the researcher and this in turn increases his analyzing ability and skill. 12. This method makes possible the study of social changes. On account of the minute study of the different facets of a social unit, the researcher can well learn the social change then and now. This also facilitates the drawing of inferences and helps in maintaining the continuity of the research process. In fact, it may be considered the gateway to and at the same time the final destination of abstract knowledge. 13. Case study techniques are indispensable for therapeutic and administrative purposes. They are also of immense value in taking decisions regarding several management problems. 
Case data are quite useful for diagnosis, therapy and other practical case problems. Limitations: Important limitations of the case study method may as well be highlighted. 1. Case situations are seldom comparable, and as such the information gathered in case studies is often not comparable. Since the subject under case study tells his history in his own words, logical concepts and units of scientific classification have to be read into it or out of it by the investigator.

2. Read Bain does not consider case data to be significant scientific data, since they do not provide knowledge of the “impersonal, universal, non-ethical, non-practical, repetitive aspects of phenomena.” Real information is often not collected because the subjectivity of the researcher enters into the collection of information in a case study. 3. The danger of false generalization is always present, because no set rules are followed in collecting the information and only a few units are studied. 4. It consumes more time and requires a lot of expenditure. More time is needed under the case study method since one studies the natural history cycles of social units, and that too minutely. 5. The case data are often vitiated because the subject, according to Read Bain, may write what he thinks the investigator wants; and the greater the rapport, the more subjective the whole process is. 6. The case study method is based on several assumptions which may not be very realistic at times, and as such the usefulness of case data is always subject to doubt. 7. The case study method can be used only in a limited sphere; it is not possible to use it in the case of a big society. Sampling is also not possible under the case study method. 8. The response of the investigator is an important limitation of the case study method. He often thinks that he has full knowledge of the unit and can himself answer about it. If this is not true, adverse consequences follow. In fact, this is more the fault of the researcher than of the case method.
SUMMARY
The difference between primary and secondary data in statistics is that primary data is collected firsthand by a researcher (an organization, person, authority, agency or party) through experiments, surveys, questionnaires, focus groups, interviews and measurements, while secondary data has already been collected by someone else and is available to the public through publications, journals and newspapers. This information must be taken into consideration for formulating marketing strategy by a dealer selling musical products. Information expressed in appropriate quantitative form is known as data. The necessity and usefulness of information gathering or data collection cannot be overemphasized in government policies.

KEY WORDS/ABBREVIATIONS
• Primary Data: data collected by the organization itself.
• Secondary Data: data collected and processed by some other agency.
• Observation Method: the procedure through which the investigator collects information by personal observation.
• Questionnaire: a proforma containing a sequence of questions to elicit information from the interviewees.
• Questionnaire Method: the method of collecting data by personal visit with a questionnaire.
• Mailed Questionnaire Method: the method of collecting data by mailing a questionnaire.
• Telephone Interview Method: the method of collecting data by contacting respondents over the telephone.
LEARNING ACTIVITY
1. You have been assigned the task of finding the various problems of railway commuters in Bombay. Design a suitable questionnaire to be used in this study.
2. Describe how you can ensure the quality of data collected (1) using an interview schedule and (2) using observation.
UNIT END QUESTIONS (MCQ AND DESCRIPTIVE)
A. Descriptive Type Questions
1. Differentiate between primary and secondary data.
2. Explain the process of collecting primary data through the interview method.

3. Explain the questionnaire method in detail.
4. Discuss how data can be collected through schedules.
5. What is the case study method? Explain with relevant examples.
B. Multiple Choice Questions
1. Snowball sampling can help the researcher to: a. Overcome the problem of not having an accessible sampling frame. b. Access difficult or hidden populations. c. Theorize inductively in a qualitative study. d. All of these
2. Which of the following is NOT a type of non-probability sampling? a. Judgmental sampling. b. Convenience sampling. c. Snowball sampling. d. Cluster sampling.
3. Which one of these is a self-administered questionnaire? a. Personal questionnaire. b. Postal questionnaire. c. Telephone questionnaire. d. Face-to-face questionnaire.
4. Which one of these is an interviewer-administered questionnaire? a. Telephone questionnaire. b. Delivery and collection questionnaire. c. On-line questionnaire. d. Postal questionnaire.
5. In situations where not all respondents are sufficiently informed to answer a question: a. filter questions should be used. b. quota sampling should be used. c. open-ended questions should be used. d. multiple questionnaires should be designed.
Answer

1. d 2. d 3. b 4. a 5. a
REFERENCES
• Gopal, M.H. 1964. An Introduction to Research Procedure in Social Sciences, Asia Publishing House: Bombay.
• Kothari, C.R. 1989. Research Methodology: Methods and Techniques, Wiley Eastern Limited: New Delhi.
• Sadhu, A.N. and A. Singh. 1980. Research Methodology in Social Sciences, Sterling Publishers Private Limited: New Delhi.
• Wilkinson, T.S. and P.L. Bhandarkar. 1979. Methodology and Techniques of Social Research, Himalaya Publishing House: Bombay.

UNIT 11: PROCESSING AND ANALYSIS OF DATA
Structure
Learning Objectives
Introduction
Statistics in Research
Data Preparation – Univariate analysis, Bivariate analysis, Multivariate analysis
What Is Bivariate Data?
What Is Bivariate Analysis?
Types Of Bivariate Analysis
What Is Multivariate Analysis?
Computation Of The Chi-Square Statistic For Cross-Tabulation Tables
Statistical Inferences About Two Populations
Determining the Target Parameter
Comparison Of Two Population Proportions
Hypothesis testing for difference between two means using z-statistic and t-statistic
ANOVA
Summary
Key Words/Abbreviations
Learning Activity
Unit End Questions (MCQ and Descriptive)
References
LEARNING OBJECTIVES

After studying this unit, you will be able to:
• State concepts of statistics in research
• Discuss univariate analysis, bivariate analysis and multivariate analysis
• Describe ANOVA
• Explain statistical inferences about two populations
INTRODUCTION
Knowledge of statistics provides you with the necessary tools and conceptual foundations in quantitative reasoning to extract information intelligently from this sea of data. Statistical methods and analyses are often used to communicate research findings, to support hypotheses, and to give credibility to research methodology and conclusions. It is important for researchers and also consumers of research to learn statistics so that they can be informed, evaluate the credibility and usefulness of information, and make appropriate decisions.
Statistics plays a vital role in research. For example, statistics can be used in data collection, analysis, interpretation, explanation and presentation. The use of statistics guides researchers in the proper characterization, summarization, presentation and interpretation of research results. Statistics provides a platform for deciding how to go about your research: whether to consider a sample or the whole population, which techniques to use in data collection and observation, and how to describe the data (for example, using measures of central tendency). Statistics is also very important when it comes to drawing the conclusions of the research. In this respect, the major purposes of statistics are to help us learn about and describe phenomena in our world and to help us draw reliable conclusions about those phenomena.
Statistics has an important role in determining the existing position of per capita income, unemployment, population growth rate, housing, schooling, medical facilities, etc. in a country. Statistics now holds a central position in almost every field, such as industry, commerce, trade, physics, chemistry, economics, mathematics, biology, botany, psychology, astronomy and information technology, so the application of statistics is very wide. Specialties have evolved to apply statistical theory and methods to various disciplines. Some of these fields of application are described below.
• Astrostatistics is the discipline that applies statistical analysis to the study of astronomical data.
• Biostatistics is a branch of biology that studies biological phenomena and observations by means of statistical analysis, and includes medical statistics.
• Econometrics is a branch of economics that applies statistical methods to the empirical study of economic theories and relationships.
• Business analytics is a rapidly developing business process that applies statistical methods to data sets to develop new insights into and understanding of business performance and opportunities.
STATISTICS IN RESEARCH
The role of statistics in research is to function as a tool in designing research, analyzing its data and drawing conclusions therefrom. Most research studies result in a large volume of raw data which must be suitably reduced so that it can be read easily and used for further analysis. Clearly the science of statistics cannot be ignored by any research worker, even though he may not have occasion to use statistical methods in all their details and ramifications. Classification and tabulation, as stated earlier, achieve this objective to some extent, but we have to go a step further and develop certain indices or measures to summarize the collected/classified data. Only after this can we adopt the process of generalization from small groups (i.e., samples) to the population. In fact, there are two major areas of statistics, viz., descriptive statistics and inferential statistics. Descriptive statistics concern the development of certain indices from the raw data, whereas inferential statistics concern the process of generalization. Inferential statistics are also known as sampling statistics and are mainly concerned with two major types of problems: (i) the estimation of population parameters, and (ii) the testing of statistical hypotheses.
The important statistical measures that are used to summarize the survey/research data are:

1. measures of central tendency or statistical averages;
2. measures of dispersion;
3. measures of asymmetry (skewness);
4. measures of relationship; and
5. other measures.
Amongst the measures of central tendency, the three most important ones are the arithmetic average or mean, median and mode. Geometric mean and harmonic mean are also sometimes used. From among the measures of dispersion, variance and its square root, the standard deviation, are the most often used measures. Other measures such as mean deviation, range, etc. are also used. For comparison purposes, we mostly use the coefficient of standard deviation or the coefficient of variation. In respect of the measures of skewness and kurtosis, we mostly use the first measure of skewness based on mean and mode or on mean and median. Other measures of skewness, based on quartiles or on the method of moments, are also used sometimes. Kurtosis is also used to measure the peakedness of the curve of the frequency distribution. Amongst the measures of relationship, Karl Pearson’s coefficient of correlation is the most frequently used measure in the case of statistics of variables, whereas Yule’s coefficient of association is used in the case of statistics of attributes. Multiple correlation coefficient, partial correlation coefficient, regression analysis, etc., are other important measures often used by a researcher. Index numbers, analysis of time series, coefficient of contingency, etc., are other measures that may as well be used by a researcher, depending upon the nature of the problem under study. We give below a brief outline of some important measures (out of the above listed measures) often used in the context of research studies.
DATA PREPARATION – UNIVARIATE ANALYSIS, BIVARIATE ANALYSIS
What is Univariate Analysis?
Univariate analysis is the simplest form of analyzing data. “Uni” means “one”, so in other words your data has only one variable.
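The summary measures named above can be sketched for a single variable as follows. This is only an illustrative sketch: the `ages` values are hypothetical, and the use of Python's standard `statistics` module is an assumption of convenience rather than something the text prescribes. The skewness line uses Karl Pearson's first measure, (mean − mode) / standard deviation.

```python
import statistics

# Hypothetical sample: ages of ten survey respondents (illustrative values only).
ages = [23, 25, 25, 29, 31, 34, 35, 35, 35, 48]

# Measures of central tendency
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Measures of dispersion
value_range = max(ages) - min(ages)
variance = statistics.variance(ages)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)       # square root of the sample variance
coeff_var = std_dev / mean * 100       # coefficient of variation, in percent

# Measure of asymmetry: Pearson's first measure of skewness (mean and mode)
skew_pearson = (mean - mode) / std_dev

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", round(variance, 2))
print("std dev:", round(std_dev, 2), "CV %:", round(coeff_var, 1))
print("Pearson skewness:", round(skew_pearson, 3))
```

A negative skewness value here simply reflects that the mode of the hypothetical sample lies above its mean.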
Univariate analysis does not deal with causes or relationships (unlike regression); its major purpose is to describe. It takes data, summarizes that data and finds patterns in the data.
What is a variable in Univariate Analysis?
A variable in univariate analysis is just a condition or subset that your data falls into. You can think of it as a “category.” For example, the analysis might look at a variable of “age” or it might look at “height” or “weight”. However, it doesn’t look at more than one variable at a time; otherwise it becomes bivariate analysis (or, in the case of 3 or more variables, it would be called multivariate analysis). The following frequency distribution table shows one variable (left column) and the count in the right column.
Figure 11.1 A frequency chart.
You could have more than one variable in the above chart. For example, you could add the variable “Location” or “Age” or something else, and make a separate column for location or age. In that case you would have bivariate data because you would then have two variables.
Univariate Descriptive Statistics
Some ways you can describe patterns found in univariate data include central tendency (mean, mode and median) and dispersion: range, variance, maximum, minimum, quartiles (including the interquartile range), and standard deviation. You have several options for presenting univariate data:
• Frequency Distribution Tables.
• Bar Charts.
• Histograms.
• Frequency Polygons.
• Pie Charts.
WHAT IS BIVARIATE DATA?
Data in statistics is sometimes classified according to how many variables are in a particular study. For example, “height” might be one variable and “weight” might be another variable.

Depending on the number of variables being looked at, the data might be univariate, or it might be bivariate. When you conduct a study that looks at a single variable, that study involves univariate data. For example, you might study a group of college students to find out their average SAT scores or you might study a group of diabetic patients to find their weights. Bivariate data is when you are studying two variables. For example, if you are studying a group of college students to find out their average SAT score and their age, you have two pieces of the puzzle to find (SAT score and age). Or if you want to find out the weights and heights of college students, then you also have bivariate data. Bivariate data could also be two sets of items that are dependent on each other. For example:
• Ice cream sales compared to the temperature that day.
• Traffic accidents along with the weather on a particular day.
Bivariate data has many practical uses in real life. For example, it is pretty useful to be able to predict when a natural event might occur. One tool in the statistician’s toolbox is bivariate data analysis. Sometimes, something as simple as plotting one variable against another on a Cartesian plane can give you a clear picture of what the data is trying to tell you. For example, the scatterplot below shows the relationship between the time between eruptions at Old Faithful and the duration of the eruption.
Figure 11.2

Waiting time between eruptions and the duration of the eruption for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. This scatterplot suggests there are generally two “types” of eruptions: short-wait-short-duration, and long-wait-long-duration.
WHAT IS BIVARIATE ANALYSIS?
Bivariate analysis means the analysis of bivariate data. It is one of the simplest forms of statistical analysis, used to find out if there is a relationship between two sets of values. It usually involves the variables X and Y.
• Univariate analysis is the analysis of one (“uni”) variable.
• Bivariate analysis is the analysis of exactly two variables.
• Multivariate analysis is the analysis of more than two variables.
The results from bivariate analysis can be stored in a two-column data table. For example, you might want to find out the relationship between caloric intake and weight (of course, there is a pretty strong relationship between the two). Caloric intake would be your independent variable, X, and weight would be your dependent variable, Y.
Figure 11.3
Bivariate analysis is not the same as two sample data analysis. With two sample data analysis (like a two sample z test in Excel), the X and Y are not directly related. You can also have a different number of data values in each sample; with bivariate analysis, there is a Y value for each X. Let’s say you had a caloric intake of 3,000 calories per day and a weight of 300 lbs. You would write that with the x-variable followed by the y-variable: (3000, 300).
Two sample data analysis
Sample 1: 100, 45, 88, 99
Sample 2: 44, 33, 101
Bivariate analysis
(X, Y) = (100, 56), (23, 84), (398, 63), (56, 42)
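The distinction between two-sample data and bivariate data can be made concrete with a short sketch. It reuses the numbers listed above; the function name `pearson_r` is hypothetical, and Karl Pearson's coefficient of correlation is used here only as one possible measure of relationship for paired data, written out by hand under the usual textbook definition.

```python
from math import sqrt

# Two-sample data: the samples are unrelated and may differ in size.
sample_1 = [100, 45, 88, 99]
sample_2 = [44, 33, 101]

# Bivariate data: every X value has exactly one matching Y value.
pairs = [(100, 56), (23, 84), (398, 63), (56, 42)]


def pearson_r(pairs):
    """Karl Pearson's coefficient of correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sum_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sum_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sum_xy / sqrt(sum_xx * sum_yy)


r = pearson_r(pairs)
print("sample sizes:", len(sample_1), len(sample_2))  # unequal sizes are allowed
print("Pearson r for the paired data:", round(r, 3))
```

Because the two samples have different lengths, no such pairing (and hence no correlation) can be computed for them; that is the practical difference between the two kinds of data.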

TYPES OF BIVARIATE ANALYSIS
Common types of bivariate analysis include:
1. Scatter plots. These give you a visual idea of the pattern that your variables follow.
Figure 11.4 A simple scatterplot.
2. Regression Analysis. Regression analysis is a catch-all term for a wide variety of tools that you can use to determine how your data points might be related. In the image above, the points look like they could follow an exponential curve (as opposed to a straight line). Regression analysis can give you the equation for that curve or line. It can also give you the correlation coefficient.
3. Correlation Coefficients. Correlation coefficients are usually calculated on a computer, although they can also be calculated by hand. This coefficient tells you if the variables are related. Basically, a zero means they aren’t correlated (i.e. they show no linear relationship), while a 1 (either positive or negative) means that the variables are perfectly correlated (i.e. they are perfectly in sync with each other).
WHAT IS MULTIVARIATE ANALYSIS?

Multivariate analysis is used to study more complex sets of data than what univariate analysis methods can handle. This type of analysis is almost always performed with software (e.g. SPSS or SAS), as working with even the smallest of data sets can be overwhelming by hand. Multivariate analysis can reduce the likelihood of Type I errors. Sometimes, univariate analysis is preferred as multivariate techniques can result in difficulty interpreting the results of the test. For example, group differences on a linear combination of dependent variables in MANOVA can be unclear. In addition, multivariate analysis is usually unsuitable for small sets of data. There are more than 20 different ways to perform multivariate analysis. Which one you choose depends upon the type of data you have and what your goals are. For example, if you have a single data set you have several choices:
• Additive trees, multidimensional scaling and cluster analysis are appropriate when the rows and columns in your data table represent the same units and the measure is either a similarity or a distance.
• Principal component analysis (PCA) decomposes a data table with correlated measures into a new set of uncorrelated measures.
• Correspondence analysis is similar to PCA. However, it applies to contingency tables.
Although there are fairly clear boundaries with one data set (for example, if you have a single data set in a contingency table your options are limited to correspondence analysis), in most cases you’ll be able to choose from several methods.
Figure 11.5 Cluster analysis showing three groups.
Cross tabulations and Chi-square test including testing hypothesis of association

Cross-tabulation is one of the most useful analytical tools and a mainstay of the market research industry. Cross-tabulation analysis, also known as contingency table analysis, is most often used to analyze categorical (nominal measurement scale) data. For reference, a cross-tabulation (or crosstab) is a two- (or more) dimensional table that records the number (frequency) of respondents that have the specific characteristics described in the cells of the table. Cross-tabulation tables provide a wealth of information about the relationship between the variables. Cross-tabulation analysis goes by several names in the research world including crosstab, contingency table, chi-square and data tabulation. Cross-tabulation analysis has its own unique language, using terms such as “banners”, “stubs”, “Chi-Square Statistic” and “Expected Values.” A typical cross-tabulation table comparing the two hypothetical variables “City of Residence” with “Favorite Baseball Team” is shown below. Are city of residence and being a fan of that team independent? The cells of the table report the frequency counts and percentages for the number of respondents in each cell.
Figure 11.6
In this table, the text legend in the crosstab describes the row and column variables. You can create and analyze multiple tables in a side-by-side or sequential format. Tabulation professionals call the column variables in these multiple tables “Banners” and row variables “Stubs”.
Cross-Tabulation with Chi-Square Analysis
The Chi-square statistic is the primary statistic used for testing the statistical significance of the cross-tabulation table. Chi-square tests determine whether or not the two variables are

independent. If the variables are independent (have no relationship), then the results of the statistical test will be “non-significant” and we are not able to reject the null hypothesis, meaning that we have no evidence of a relationship between the variables. If the variables are related, then the results of the statistical test will be “statistically significant” and we are able to reject the null hypothesis, meaning that we can state that there is some relationship between the variables. The chi-square statistic, along with the associated probability of chance observation, may be computed for any table. If the observed table relationships would occur with very low probability under independence (say, 5% or less), then we say that the results are “statistically significant” at the .05 or 5% level. This means that a relationship this strong would rarely arise by chance if the variables were independent. Students of statistics will recall that the probability values (.05 or .01) reflect the researcher’s willingness to accept a type I error, or the probability of rejecting a true null hypothesis (meaning that we thought there was a relationship between the variables when there really wasn’t). Furthermore, these probabilities accumulate: if 20 independent tables are tested at the .05 level when no relationships exist, the researcher should expect about one table to be incorrectly found to have a relationship (20 x .05 = 1), and the chance of at least one such false finding is 1 – 0.95^20, or about 64%. Depending on the cost of making mistakes, the researcher may apply more stringent criteria for declaring significance, such as .01 or .005.
COMPUTATION OF THE CHI-SQUARE STATISTIC FOR CROSS-TABULATION TABLES
The chi-square statistic is computed by first computing a chi-square value for each individual cell of the table and then summing them up to form a total chi-square value for the table. The chi-square value for the cell is computed as: (Observed Value – Expected Value)² / (Expected Value). The chi-square computations are highlighted in gray.
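The cell-by-cell computation just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the observed counts are hypothetical (they are not the figures from the baseball table in this unit), the function name `chi_square` is invented for the example, and each expected value is taken as (row total x column total) / grand total, which is the usual expected count under independence.

```python
def chi_square(observed):
    """Return (total chi-square, degrees of freedom, per-cell chi-square values)
    for a cross-tabulation given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    cell_values = []
    total = 0.0
    for i, row in enumerate(observed):
        cell_row = []
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            # (Observed Value - Expected Value)^2 / (Expected Value)
            cell = (obs - expected) ** 2 / expected
            cell_row.append(cell)
            total += cell
        cell_values.append(cell_row)

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return total, dof, cell_values


# Hypothetical 2 x 2 crosstab: rows are two cities, columns are two teams.
observed = [[30, 10],
            [20, 40]]
total, dof, cells = chi_square(observed)
print("chi-square:", round(total, 2), "degrees of freedom:", dof)
```

The total is then compared with the critical value for the table's degrees of freedom (3.841 for one degree of freedom at the .05 level); a larger total leads to rejecting the null hypothesis of independence.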
In the example table (Figure 11.7), we observe that the chi-square value for the table is 19.35, and has an associated probability of occurring by chance of less than one time in 1000. We therefore reject the null hypothesis of no difference and conclude that there must be a relationship between the variables. We can observe the relationship in two places in the table. The most obvious is in the chi-square value computed for each cell. We observe that the cells “Red Sox and Boston”, “Blue Jays and Montreal” and “Red Sox and Montpelier, Vermont” were the three cells where the number of observed respondents was greater than expected. We further note that when we examine the expected and observed frequencies, the “Yankees and Montreal”, “Red Sox and Montpelier, Vermont”, and “Red Sox and Montreal” frequencies were fewer than expected.
Figure 11.7
Because the cell chi-square and the expected values are often not displayed, these same relationships can be observed by comparing the column total percent to the cell percent (of the row total). In cell “Red Sox and Boston” we would compare 41.10% with 64.71% and observe that more Red Sox fans liked Boston than expected. Caution is urged when interpreting relationships found in any statistical analysis. We often desire to “explain” or conclude “causality” from analyses when the data either are not designed to, or do not have the power to, support such conclusions.

In the current table, we observe that “Red Sox and Boston” had the greatest delta between the number of observed and expected respondents, for any team preference and city of residence. However, we must be careful in concluding that the Red Sox caused respondents to move to Boston, or that Boston as a city of residence causes fan loyalty. Red Sox and Boston are the most observed fan and city relationship but are most likely totally independent when considering other concepts or relationships. Crosstabs and chi-square are powerful ways to analyze your survey data. Another popular tool that makes an impact on research is Conjoint Analysis.
STATISTICAL INFERENCES ABOUT TWO POPULATIONS
Many experiments involve a comparison of two populations. For instance:
• A real estate company may want to estimate the difference in mean sales price between city and suburban homes.
• A consumer group might test whether two major brands of food freezers differ in the average amount of electricity they use.
• A television market researcher wants to estimate the difference in the proportions of younger and older viewers who regularly watch a popular TV program.
The same procedures that are used to estimate and test hypotheses about a single population can be modified to make inferences about two populations.
Determining the Target Parameter
Parameter    Key words                                  Type of Data
μ1 − μ2      Mean difference; difference in averages    Quantitative

