

21ODMMT652-Research Methods and Statistics II

Published by Teamlease Edtech Ltd (Amita Chitroda), 2022-03-02 04:53:25



3.9 CROSS-SECTIONAL RESEARCH

A cross-sectional study is a type of research design in which you collect data from many different individuals at a single point in time. In cross-sectional research, you observe variables without influencing them. Researchers in economics, psychology, medicine, epidemiology, and the other social sciences all make use of cross-sectional studies in their work. For example, epidemiologists who are interested in the current prevalence of a disease in a certain subset of the population might use a cross-sectional design to gather and analyze the relevant data.

A cross-sectional study involves looking at data from a population at one specific point in time. The participants in this type of study are selected based on particular variables of interest. Cross-sectional studies are often used in developmental psychology, but this method is also used in many other areas, including social science and education.

Cross-sectional studies are observational in nature and are known as descriptive research, not causal or relational, meaning that you can't use them to determine the cause of something, such as a disease. Researchers record the information that is present in a population, but they do not manipulate variables. This type of research can be used to describe characteristics that exist in a community, but not to determine cause-and-effect relationships between different variables. This method is often used to make inferences about possible relationships or to gather preliminary data to support further research and experimentation.

Characteristics of a Cross-sectional Study

Some of the key characteristics of a cross-sectional study include:
• The study takes place at a single point in time
• It does not involve manipulating variables
• It allows researchers to look at numerous characteristics at once (age, income, gender, etc.)
• It's often used to look at the prevailing characteristics in a given population
• It can provide information about what is happening in a current population

Advantages and disadvantages of cross-sectional studies

Advantages of cross-sectional studies

CU IDOL SELF LEARNING MATERIAL (SLM)

• Because you collect data at only a single point in time, cross-sectional studies are relatively cheap and less time-consuming than other types of research.
• Cross-sectional studies allow you to collect data from a large pool of subjects and compare differences between groups.
• Cross-sectional studies capture a specific moment in time. National censuses, for instance, provide a snapshot of conditions in that country at that time.

Disadvantages of cross-sectional studies

• It is difficult to establish cause-and-effect relationships using cross-sectional studies, since they only represent a one-time measurement of both the alleged cause and effect.
• Since cross-sectional studies only study a single moment in time, they cannot be used to analyze behavior over a period of time or establish long-term trends.
• The timing of the cross-sectional snapshot may be unrepresentative of the behavior of the group as a whole. For instance, imagine you are looking at the impact of psychotherapy on an illness like depression. If the depressed individuals in your sample began therapy shortly before the data collection, then it might appear that therapy causes depression even if it is effective in the long term.

Differences between Cross-Sectional Study and Longitudinal Study

Cross-sectional and longitudinal studies are both types of observational study, in which the participants are observed in their natural environment. The environment in which the participants exist is not altered or changed. Despite this marked similarity, there are distinct differences between the two forms of study. Let us analyze the differences between cross-sectional study and longitudinal study.

Cross-sectional study | Longitudinal study
Cross-sectional studies are quick to conduct as compared to longitudinal studies. | Longitudinal studies may vary from a few years to even decades.
A cross-sectional study is conducted at a given point in time. | A longitudinal study requires a researcher to revisit participants of the study at proper intervals.
Cross-sectional study is conducted with different samples. | Longitudinal study is conducted with the same sample over the years.
Cross-sectional studies cannot pin down cause-and-effect relationship. | Longitudinal study can justify cause-and-effect relationship.
Multiple variables can be studied at a single point in time. | Only one variable is considered to conduct the study.
Cross-sectional study is comparatively cheaper. | Since the study goes on for years, longitudinal study tends to get expensive.

3.10 QUALITIES OF A GOOD RESEARCH DESIGN

The qualities of a good research design are mentioned below:

1. Good research is systematic: It means that research is structured with specified steps to be taken in a specified sequence in accordance with a well-defined set of rules. The systematic character of research does not rule out creative thinking, but it certainly does reject the use of guessing and intuition in arriving at conclusions.

2. Good research is logical: This implies that research is guided by the rules of logical reasoning, and the logical processes of induction and deduction are of great value in carrying out research. Induction is the process of reasoning from a part to the whole, whereas deduction is the process of reasoning from some premise to a conclusion which follows from that very premise. In fact, logical reasoning makes research more meaningful in the context of decision making.

3. Good research is empirical: It implies that research is related basically to one or more aspects of a real situation and deals with concrete data that provides a basis for external validity to research results.

4. Good research is replicable: This characteristic allows research results to be verified by replicating the study, thereby building a sound basis for decisions.

3.11 SUMMARY

Every researcher has a specific strategy that he or she will use to conduct the research.
This strategy will determine what the researcher will do, and how, at every step in the research

process. The design of the research will also determine the resources required for conducting the research, be it in terms of time frame, monetary resources, materials required or manpower. The research design, in short, is like a blueprint of the entire research project. Without this blueprint the researcher will be clueless and the research will be without any direction.

The function of a research design is to ensure that the evidence obtained enables you to address the research problem as unambiguously as possible. In social sciences research, obtaining evidence relevant to the research problem generally entails specifying the type of evidence needed to test a theory, to evaluate a program, or to accurately describe a phenomenon. However, researchers can often begin their investigations far too early, before they have thought critically about resources and what information is required to answer the study's research questions. Without attending to these design issues beforehand, the conclusions drawn risk being weak and unconvincing and, consequently, will fail to adequately address the overall research problem.

3.12 KEY WORDS/ ABBREVIATIONS

• Correlational method- A research approach in which two or more variables are measured, usually in naturalistic settings, and the covariance of the variables is examined to find relationships between or among them. This approach lacks the control of extraneous variables present in good experimental research, and so causal inferences are seldom properly made from correlational studies. This approach is often used when experiment is impossible, ecological validity is a primary concern, or ethical limitations prevent experimental research.
• Correlational study- A correlational study is a study of the relationships between two or more variables, usually using correlational statistics to describe the relationship.
• Experimental methods- Experimental methods are a system of procedures and materials used systematically to investigate the relationships between controlled (independent) variables and uncontrolled (dependent) ones.

3.13 LEARNING ACTIVITY

1. What is Experimental Research? What are its advantages and disadvantages?
___________________________________________________________________________
___________________________________________________________________________
2. What is Exploratory Research? What are its advantages and disadvantages?

___________________________________________________________________________
___________________________________________________________________________

3.14 UNIT END QUESTIONS (MCQS AND DESCRIPTIVE)

A. Descriptive Questions
1. Write a note on research design.
2. List the characteristics of research design.
3. State the purpose of the research design.
4. Write a short note on Experimental Research.
5. Describe Exploratory Research.

B. Multiple Choice Questions (MCQs)
1. In which method of study of psychology are independent and dependent variables important elements?
(A) Introspection Method
(B) Experimental Method
(C) Observational Method
(D) Case History Method
2. “Controlled Group” is a term used in ____________________.
(A) Survey research
(B) Historical research
(C) Experimental research
(D) Descriptive research
3. The longitudinal approach of research deals with _________.
(A) Horizontal researches
(B) Long-term researches
(C) Short-term researches
(D) None of these
4. Which is not the purpose of research design?
(A) To prepare a budget
(B) To give direction to the research
(C) To minimize expenses
(D) To collect relevant data
5. ______________________ is a study of the relationships between two or more variables, usually using correlational statistics to describe the relationship.
(A) Survey research
(B) Historical research
(C) Experimental research
(D) Correlational research

Answer
1. b 2. c 3. b 4. a 5. d

3.15 SUGGESTED READINGS

• Introduction to Social Research: Quantitative and Qualitative Approaches by Keith F Punch
• Research Methodology: Methods and Techniques by C. R. Kothari
• Research Methodology by D K Bhattacharyya
• Research Methodology: A Step-by-Step Guide for Beginners by Ranjit Kumar
• Research Methodology by P. Sam Daniel, Aroma G. Sam

UNIT 4 SPSS: AN INTRODUCTION

Structure
4.0. Learning Objectives
4.1. Introduction
4.2. What is SPSS?
4.3. Uses of SPSS
4.4. How to start SPSS?
4.5. Layout of SPSS
4.6. Entering Data in SPSS
4.7. Summary
4.8. Key Words/ Abbreviations
4.9. Learning Activity
4.10. Unit End Questions (MCQs and Descriptive)
4.11. Suggested Readings

4.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:
• Outline an introduction to SPSS
• Illustrate the use of SPSS
• Explain the basic functions of the SPSS programme

4.1 INTRODUCTION

SPSS, standing for Statistical Package for the Social Sciences, is a powerful, user-friendly software package for the manipulation and statistical analysis of data. The package is particularly useful for students and researchers in psychology, sociology, psychiatry, and other behavioural sciences, containing as it does an extensive range of both univariate and multivariate procedures much used in these disciplines. Our aim in this handbook is to give brief and straightforward descriptions of how to conduct a range of statistical analyses using the latest version of SPSS, SPSS 11. Each chapter deals with a different type of analytical procedure applied to one or more data sets primarily (although not exclusively) from the social and behavioural areas. Although we concentrate largely on how to use SPSS to get results and on how to correctly interpret these results, the basic theoretical background of many of the techniques used is also described in separate boxes. When more advanced procedures are used, readers are referred to other sources for details. Many of the boxes contain a few mathematical formulae, but by

separating this material from the body of the text, we hope that even readers who have limited mathematical background will still be able to undertake appropriate analyses of their data.

4.2 WHAT IS SPSS?

SPSS is a Windows-based program that can be used to perform data entry and analysis and to create tables and graphs. SPSS is capable of handling large amounts of data and can perform all of the analyses covered in the text and much more. SPSS is commonly used in the Social Sciences and in the business world, so familiarity with this program should serve you well in the future. SPSS is updated often.

The “Statistical Package for the Social Sciences” (SPSS) is a package of programs for manipulating, analysing, and presenting data; the package is widely used in the social and behavioural sciences. There are several forms of SPSS. The core program is called SPSS Base and there are a number of add-on modules that extend the range of data entry, statistical, or reporting capabilities. In our experience, the most important of these for statistical analysis are the SPSS Advanced Models and SPSS Regression Models add-on modules. SPSS Inc. also distributes stand-alone programs that work with SPSS.

SPSS for Windows consists of five different windows, each of which is associated with a particular SPSS file type. This document discusses the two windows most frequently used in analysing data in SPSS, the Data Editor and the Output Viewer windows. In addition, the Syntax Editor and the use of SPSS command syntax are discussed briefly. The Data Editor is the window that is open at start-up and is used to enter and store data in a spreadsheet format. The Output Viewer opens automatically when you execute an analysis or create a graph using a dialog box or command syntax to execute a procedure. The Output Viewer contains the results of all statistical analyses and graphical displays of data.
The Syntax Editor is a text editor where you compose SPSS commands and submit them to the SPSS processor. All output from these commands will appear in the Output Viewer. This document focuses on the methods necessary for inputting, defining, and organizing data in SPSS.

4.3 USES OF SPSS

With SPSS Statistics you can:
• Analyze and better understand your data, and solve complex business and research problems through a user-friendly interface.
• More quickly understand large and complex data sets with advanced statistical procedures that help ensure high accuracy and quality decision making.

• Use extensions, Python and R programming language code to integrate with open source software.
• More easily select and manage your software with flexible deployment options.

4.4 HOW TO START SPSS?

To start SPSS, go to the Start icon on your Windows computer. You should find an SPSS icon under the Programs menu item. You can also start SPSS by double-clicking on an SPSS file. When the program opens, it will present you with a “What would you like to do?” dialog box. For now, hit “Cancel” to dismiss the box.

Depending on how the computer you are working on is set up, you can open SPSS in one of two ways.
1. If there is an SPSS shortcut on the desktop, simply put the cursor on it and double-click the left mouse button.
2. Click the left mouse button on the Start button on your screen, then put your cursor on Programs or All Programs and left-click the mouse. Select SPSS 17.0 for Windows by clicking the left mouse button. (For a while they started calling the program PASW Statistics 17, but they seem to have given that up as a dumb idea when everyone else calls it SPSS. The version number may change by the time you read this.)

Either approach will launch the program. You will see a screen that looks like the image on the next page. The dialog box that appears offers choices of running the tutorial, typing in data, running queries, or opening an existing data source. The window behind this is the Data Editor window, which is used to display the data from whatever file you are using. You could select any one of the options on the start-up dialog box and click OK, or you could simply hit Cancel. If you hit Cancel, you can either enter new data in the blank Data Editor or open an existing file using the File menu as explained later.

4.5 LAYOUT OF SPSS

The Data Editor window has two views that can be selected from the lower left-hand side of the screen. Data View is where you see the data you are using.
Variable View is where you can specify the format of your data when you are creating a file or where you can check the format of a pre-existing file. The data in the Data Editor is saved in a file with the extension .sav.

The other most commonly used SPSS window is the SPSS Viewer window, which displays the output from any analyses that have been run and any error messages. Information from the Output Viewer is saved in a file with the extension .spo. Let’s open an output file and look at it. On the File menu, click Open and select Output. Click OK. The following will appear. The left-hand side is an outline of all of the output in the file. The right side is the actual output. To shrink or enlarge either side, put your cursor on the line that divides them. When the double-headed arrow appears, hold the left mouse button and move the line in either direction. Release the button and the size will be adjusted.

Figure: SPSS Menus and Icons

Now, let’s review the menus and icons. Review the options listed under each menu on the Menu Bar by clicking them one at a time. Follow along with the descriptions below.

File includes all of the options you typically use in other programs, such as open, save, exit. Notice that you can open or create new files of multiple types, as illustrated to the right.

Edit includes the typical cut, copy, and paste commands, and allows you to specify various options for displaying data and output.

Click on Options, and you will see the dialog box to the left. You can use this to format the data, output, charts, etc. These choices are rather overwhelming, and you can simply take the default options for now. The author of your text (me) was too dumb to even know these options could easily be set.

View allows you to select which toolbars you want to show, select font size, add or remove the gridlines that separate each piece of data, and select whether to display your raw data or the data labels.

Data allows you to select several options ranging from displaying data that is sorted by a specific variable to selecting certain cases for subsequent analyses.

Transform includes several options to change current variables. For example, you can change continuous variables to categorical variables, change scores into rank scores, add a constant to variables, etc.

Analyze includes all of the commands to carry out statistical analyses and to calculate descriptive statistics. Much of this book will focus on using commands located in this menu.

Graphs includes the commands to create various types of graphs including box plots, histograms, line graphs, and bar charts.

Utilities allows you to list file information, which is a list of all variables, their labels, values, locations in the data file, and type.

Add-ons are programs that can be added to the base SPSS package. You probably do not have access to any of those.

Window can be used to select which window you want to view (i.e., Data Editor, Output Viewer, or Syntax). Since we have a data file and an output file open, let’s try this. Select Window/Data Editor. Then select Window/SPSS Viewer.

Help has many useful options including a link to the SPSS homepage, a statistics coach, and a syntax guide. Using Topics, you can use the index option to type in any key word and get a list of options, or you can view the categories and subcategories available under Contents. This is an excellent tool and can be used to troubleshoot most problems.

The Icons directly under the Menu bar provide shortcuts to many common commands that are available in specific menus. Take a moment to review these as well. Place your cursor over the Icons for a few seconds, and a description of the underlying command will appear. For example, one of the icons is the shortcut for Save. Review the others yourself.

Exiting SPSS

To close SPSS, you can either left-click on the close button located in the upper right-hand corner of the screen or select Exit from the File menu. A dialog box like the one below will appear for every open window asking you if you want to save it before exiting. You almost always want to save data files. Output files may be large, so you should ask yourself if you need to save them or if you simply want to print them. Click No for each dialog box since we do not have any new files or changed files to save.

4.6 ENTERING DATA IN SPSS

The Data Editor window displays the contents of the working dataset. It is arranged in a spreadsheet format that contains variables in columns and cases in rows. There are two sheets in the window. The Data View is the sheet that is visible when you first open the Data Editor; this sheet contains the data. You can access the second sheet by clicking on the tab labelled Variable View.
While the second sheet is similar in appearance to the first, it does not

actually contain data. Instead, this second sheet contains information about the variables in the dataset.

Beginning with version 14, you can have multiple datasets open at one time in the Data Editor (however, this can be confusing, and we recommend keeping only one dataset open at a time while you are first getting familiar with the program). Datasets that are currently open are called working datasets; all data manipulations, statistical functions, and other SPSS procedures operate on these datasets.

Data can be directly entered in SPSS, or a file containing data can be opened in the Data Editor. From the menu in the Data Editor window, choose the following menu options: File > Open > Data... The Open File dialog box should automatically open to the SPSS directory of example files. Choose Employee data.sav from the list and click Open. Your Data Editor should now look like this:

If the file you want to open is not an SPSS data file, you can often use the Open menu item to import that file directly into the Data Editor. If a data file is not in a format that SPSS recognizes, then try using the software package in which the file was originally created to translate it into a format that can be imported into SPSS (e.g., Excel).

The Syntax Editor

Another important window in the SPSS environment is the Syntax Editor. In earlier versions of SPSS, procedures were run by submitting syntax, which instructed SPSS on how to process your data. More recent versions contain pull-down menus with dialog boxes that allow you to submit commands to SPSS without writing syntax. This SPSS for Windows tutorial focuses on the use of dialog boxes to execute procedures; however, there are at least two reasons why you should be aware of SPSS syntax, even if you plan to primarily use the dialog boxes. First, not all procedures are available through the dialog boxes. Therefore, you may occasionally have to submit commands from the Syntax Editor.
Second, the Syntax Editor is a useful way to save a log of what you have done, and to re-run what you

have done at a later date. The dialog boxes available through the pull-down menus have a button labelled Paste, which will print the syntax for the procedure you are running in the dialog box environment to the Syntax Editor. Thus, you can easily generate SPSS syntax without typing in the Syntax Editor. This process is illustrated below.

The following dialog box is used to generate descriptive statistics. (You can get this dialog box by choosing Analyze, then Descriptive Statistics, then Descriptives, then moving the two variables across using the arrow button.) By clicking on the Paste button, the procedure that the above dialog box is prepared to run will be written in the form of SPSS syntax to the Syntax Editor. Thus, clicking the Paste button in the above example would produce the following syntax:

DESCRIPTIVES VARIABLES=salbegin salary /STATISTICS=MEAN STDDEV MIN MAX.

This syntax will produce exactly the same output as would be generated by clicking the OK button in the above dialog box. The syntax that is printed to the Syntax Editor can then be saved and run at a later time, as long as the same dataset (or at least a dataset containing variables with the same names) is active in the Data Editor window. Saving syntax is useful if you think you may want to rerun your analysis after you add more data, or if you want to run the same analysis on another dataset that contains the same variables.

The Output Viewer

When you execute a command for a statistical analysis, regardless of whether you used syntax or dialog boxes, the output will be printed in the Output Viewer. An example of the Output Viewer is shown below:
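As a rough illustration of the kind of table that the descriptive statistics syntax above requests (mean, standard deviation, minimum and maximum for each variable), here is a minimal sketch in Python rather than SPSS. The salary figures are invented for illustration and are not taken from Employee data.sav, and the function name describe is a hypothetical helper, not part of SPSS.

```python
import statistics

# Hypothetical values, invented for illustration only;
# they are NOT taken from Employee data.sav.
data = {
    "salbegin": [12000, 13500, 15000, 18000, 21000],
    "salary": [21000, 24000, 27000, 30000, 45000],
}


def describe(values):
    """Return N, minimum, maximum, mean and sample standard deviation."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # statistics.stdev uses the n - 1 (sample) denominator
        "Std. Deviation": statistics.stdev(values),
    }


# One row of summary statistics per variable
summary = {name: describe(values) for name, values in data.items()}

for name, stats in summary.items():
    print(name, stats)
```

Each printed row corresponds to one variable, in the same spirit as one row per variable in a descriptive statistics table.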

The left frame of the Output Viewer contains an outline of the objects contained in the window. For example, the icon labelled Log represents the command syntax shown at the top of the figure. Everything under Descriptives in the outline refers to objects associated with the descriptive statistics. The Title object refers to the bold title Descriptives in the output. The Active Dataset object refers to the line in the output that designates which dataset was used to run the analysis. The highlighted icon labelled Descriptive Statistics refers to the table containing descriptive statistics. The Notes icon has no referent in the above example, but it would refer to any notes that appeared between the title and the table.

This outline can be useful for navigating in your Output Viewer when you have large amounts of output. By clicking on an icon, you can move to the location of the output represented by that icon in the Output Viewer. You can also copy, paste, or delete objects by first highlighting them in the outline and then performing the operation you want.

You can control what is displayed in your output by using the Options menu item on the Edit menu: Edit > Options... Selecting this option will produce the following dialog box:

This figure shows the Options dialog box with the Draft Viewer tab selected, to choose which options you want to appear in the Output Viewer. Most commands are selected by default. Here, the Display commands in log option, normally unselected, was selected so that the command syntax will be written to the log in the Output Viewer. This can be useful for keeping track of which procedures you have executed.

Importing Data from Excel Files

Data can be imported into SPSS from Microsoft Excel and several other applications with relative ease. This document describes a method for importing an Excel spreadsheet into SPSS. If you are working with a spreadsheet in another software package, you may want to save your data as an Excel file, then import it into SPSS. If you have a spreadsheet that is arranged in a database format (e.g., you have several tables in your Workbook that are related through identification fields), there is another method for importing Excel files that you might consider, which will merge tables within your database as part of the import procedure. In order to easily import Excel data into SPSS, make sure your Excel spreadsheet is formatted as follows:

(1) The spreadsheet should have a single row of variable names across the top of the file, and each variable name should begin with ordinary letters, rather than with any special characters,
(2) The data should begin in the first column and second row of the Excel file, and
(3) Any graphs, labels, or extra text that is not part of the dataset should be deleted.

To open an Excel file, select the following options from the menu in the Data Editor window in SPSS: File > Open > Data...

First, select the desired location on disk using the Look in option. Next, select Excel from the Files of type drop-down menu. The file you want should now appear in the main box in the Open File dialog box. You can open it by double-clicking on it. You will be presented with one more dialog box:

Your Excel spreadsheet should have all variable names on the top row, so leave the Read variable names option checked. Since Excel files can consist of multiple worksheets, the Worksheet drop-down menu allows you to choose which worksheet you wish to open. You may ignore the remaining options and choose OK. You should now see data in the Data Editor window. Check to make sure that all variables and cases were read correctly. Next, save your dataset in SPSS format by choosing the Save option in the File menu.

Importing data from ASCII files

Data are often stored in an ASCII file format, alternatively known as a text or flat file format. Typically, columns of data in an ASCII file are separated by a space, tab, comma, or some other character. To open these files, choose: File > Read Text Data
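To make the file layout concrete, the following minimal Python sketch (not SPSS) reads a small delimited text file in which the first row holds the variable names and every later row holds one case. The file contents and the function name read_delimited are hypothetical and used only for illustration; this is not how SPSS itself performs the import.

```python
import csv
import io

# Hypothetical file contents, invented for illustration.
raw_text = """id,gender,salary
1,f,21000
2,m,24000
3,f,27000
"""


def read_delimited(text, delimiter=","):
    """Split the first row into variable names and the rest into cases."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    variable_names, cases = rows[0], rows[1:]
    return variable_names, cases


variable_names, cases = read_delimited(raw_text)
print(variable_names)
for case in cases:
    # Pair each value with its variable name, one dictionary per case
    print(dict(zip(variable_names, case)))
```

Passing delimiter="\t" would handle a tab-separated file in the same way. All values are read as text here; deciding which columns are numeric is a separate step.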

The Text Import Wizard will first prompt you to select a file to import. After you have selected a file, you will go through a series of dialog boxes that will provide you with several options for importing data. Once you have imported your data and checked it for accuracy, be sure to save a copy of the dataset in SPSS format by selecting the Save or Save As options from the File menu: File > Save or File > Save As

4.7 SUMMARY

SPSS Statistics is a powerful statistical software platform. It delivers a robust set of features that lets your organization extract actionable insights from its data. SPSS (Statistical Package for the Social Sciences) has now been in development for more than thirty years. Originally developed as a programming language for conducting statistical analysis, it has grown into a complex and powerful application which now uses both a graphical and a syntactical interface and provides dozens of functions for managing, analysing, and presenting data. Its statistical capabilities alone range from simple percentages to complex analyses of variance, multiple regressions, and general linear models. You can use data ranging from simple integers or binary variables to multiple response or logarithmic variables. SPSS also provides extensive data management functions, along with a complex and powerful programming language.

4.8 KEY WORDS/ ABBREVIATIONS

• SPSS- Statistical Package for the Social Sciences

4.9 LEARNING ACTIVITY

1. What are the steps included in importing data in SPSS?
___________________________________________________________________________
___________________________________________________________________________
2. Write in detail about the layout and components of SPSS.
___________________________________________________________________________
___________________________________________________________________________

4.10 UNIT END QUESTIONS (MCQS AND DESCRIPTIVE)
A. Descriptive Questions
1. What is SPSS?
2. What are the uses of SPSS?
3. How do you start SPSS?
4. What are the different components of SPSS?
5. How can we enter data in SPSS?

B. Multiple Choice Questions (MCQs)
1. Which of the following variables cannot be expressed in quantitative terms?
(A) Socio-economic Status (B) Marital Status (C) Numerical Aptitude (D) Professional Attitude
2. SPSS is used for ________________________
(A) Quantitative Research (B) Observation (C) Qualitative Data (D) Experimental Research
3. According to ethical norms, the responsibility of research is on ___________________
(A) Society (B) Researcher (C) Participant (D) University

4. SPSS is a ___________________ based programme
(A) Windows (B) Researcher (C) Participant (D) Qualitative
5. The layout of SPSS does not include _________________
(A) Graph (B) Transform (C) Data (D) Variable

Answers: 1. c  2. a  3. b  4. a  5. d

4.11 SUGGESTED READINGS
• Introduction to Social Research: Quantitative and Qualitative Approaches by Keith F Punch
• Research Methodology: Methods and Techniques by C. R. Kothari
• Research Methodology by D K Bhattacharyya
• Research Methodology: A Step-by-Step Guide for Beginners by Ranjit Kumar
• Research Methodology by P. Sam Daniel, Aroma G. Sam

UNIT 5 SPSS: ANALYSIS AND INTERPRETATION OF RESULTS
Structure
5.0. Learning Objectives
5.1. Introduction
5.2. Creating the data definitions: the variable view
5.3. How SPSS Helps in Research & Data Analysis Programs
5.4. Summary
5.5. Key Words/ Abbreviations
5.6. Learning Activity
5.7. Unit End Questions (MCQs and Descriptive Questions)
5.8. Suggested Readings

5.0 LEARNING OBJECTIVES
After studying this unit, you will be able to:
• Identify the ways to use SPSS for analysing your data.
• Discuss practical usage of SPSS software.

5.1 INTRODUCTION
SPSS (Statistical Package for the Social Sciences) is a set of software programs combined in a single package. Its basic application is the analysis of scientific data related to the social sciences. Such data can be used for market research, surveys, data mining, etc. With the help of the statistical information obtained, researchers can easily understand the demand for a product in the market and change their strategy accordingly. Basically, SPSS first stores and organizes the data provided, then compiles the data set to produce suitable output. SPSS is designed so that it can handle a large set of variable data formats.

5.2 CREATING THE DATA DEFINITIONS: THE VARIABLE VIEW
It’s impossible to talk about SPSS (or any analysis program) without talking about data and types of data. So here goes. Each particular type of information (such as income or gender or temperature or dosage) is called a variable. You can have various types of variables such as numeric variables (any number that you can use in a calculation), string variables (text or

numbers that you can’t use in calculations), currency (numbers with two and only two decimal places) and variables with specific formats.

Variable types
SPSS uses (and insists upon) what are called strongly typed variables. Strongly typed means that you must define your variables according to the type of data they will contain. You can use any of the following types, as defined by the SPSS Help file.

Numeric
A variable whose values are numbers. Values are displayed in standard numeric format. The Data Editor accepts numeric values in standard format or in scientific notation.

Comma
A numeric variable whose values are displayed with commas delimiting every three places, and with the period as a decimal delimiter. The Data Editor accepts numeric values for comma variables with or without commas, or in scientific notation.

Dot
A numeric variable whose values are displayed with periods delimiting every three places, and with the comma as a decimal delimiter. The Data Editor accepts numeric values for dot variables with or without dots, or in scientific notation. (Sometimes known as European notation.)

Scientific notation
A numeric variable whose values are displayed with an embedded E and a signed power-of-ten exponent. The Data Editor accepts numeric values for such variables with or without an exponent. The exponent can be preceded either by E or D with an optional sign, or by the sign alone: for example, 123, 1.23E2, 1.23D2, 1.23E+2, and even 1.23+2.

Date
A numeric variable whose values are displayed in one of several calendar date or clock-time formats. Select a format from the list. You can enter dates with slashes, hyphens, periods, commas, or blank spaces as delimiters. The century range for 2-digit year values is determined by your Options settings (from the Edit menu, choose Options and click the Data tab).

Custom currency

A numeric variable whose values are displayed in one of the custom currency formats that you have defined in the Currency tab of the Options dialog box. Defined custom currency characters cannot be used in data entry but are displayed in the Data Editor.

String
Values of a string variable are not numeric, and hence not used in calculations. They can contain any characters up to the defined length. Uppercase and lowercase letters are considered distinct. Also known as an alphanumeric variable.

5.3 HOW SPSS HELPS IN RESEARCH & DATA ANALYSIS PROGRAMS
SPSS is software used mainly by research scientists to process critical data in simple steps. Working on data is a complex and time-consuming process, but this software can handle and operate on information with the help of a few techniques. These techniques are used to analyze, transform, and reveal characteristic patterns between different data variables. In addition, the output can be presented graphically so that a user can easily understand the result. Read below to understand the techniques involved in data handling and its execution.

1. Data Transformation: This technique is used to convert the format of the data. After changing the data type, it brings data of the same type together in one place, which makes the data easier to manage. You can enter different kinds of data into SPSS and it will change their structure according to the system specification and requirement. This means that even if you change the operating system, SPSS can still work on old data.

2. Regression Analysis: It is used to understand the relation between dependent and independent variables that are stored in a data file. It also explains how a change in the value of an independent variable can affect the dependent variable. The primary purpose of regression analysis is to understand the type of relationship between different variables.

3. ANOVA (Analysis of variance): It is a statistical approach for comparing events, groups or processes by testing whether their means differ. It can help you understand which method is more suitable for executing a task. By looking at the result, you can judge the feasibility and effectiveness of a particular method.

4. MANOVA (Multivariate analysis of variance): This method extends ANOVA to cases with two or more dependent variables, comparing groups on all of them at the same time. The MANOVA technique can also be used to analyze different types of population and the factors that can affect their choices.

5. T-tests: A t-test is used to understand the difference between two samples, and researchers apply this method to find out the difference in the interests of two kinds of groups. The test also indicates whether an observed difference is statistically meaningful or likely to be due to chance.

5.4 SUMMARY
Analysing numeric information produces results from data. Interpreting data through analysis is key to communicating results to stakeholders. The type of analysis depends on the research design, the types of variables, and the distribution of the data. Quantitative Data Analysis is a systematic approach to investigations during which numerical data is collected and/or the researcher transforms what is collected or observed into numerical data. It often describes a situation or event, answering the 'what' and 'how many' questions you may have about something. Descriptive analysis provides information on the basic qualities of data and includes descriptive statistics such as range, minimum, maximum, and frequency. It also includes measures of central tendency such as mean, median, and mode, and measures of dispersion such as standard deviation. There are many ways to describe data, and descriptive analysis can describe what the data look like. Below are some common ways to describe data. Using the set of scores below, the following are examples of descriptive statistics.

5.5 KEY WORDS/ ABBREVIATIONS
• ANOVA - Analysis of variance
• MANOVA - Multivariate analysis of variance
• SPSS - Statistical Package for the Social Sciences

5.6 LEARNING ACTIVITY
1. Make a list of functions possible in SPSS for analysis of data (both descriptive and inferential).
___________________________________________________________________________
___________________________________________________________________________
2. What are the different variables in SPSS? Give examples of each.
___________________________________________________________________________
___________________________________________________________________________

5.7 UNIT END QUESTIONS (MCQS AND DESCRIPTIVE)
A. Descriptive Questions
1. Name five variable types in SPSS.
2. How does SPSS help in data analysis?
3. What is a t-test?
4. What is ANOVA?
5. What is MANOVA?

B. Multiple Choice Questions (MCQs)
1. __________________ is not a variable type in SPSS
(A) Numeric (B) Age (C) Date (D) Comma
2. _______________________ stands for Multivariate analysis of variance
(A) MAV (B) MANOV (C) MAOV (D) MANOVA
3. A variable whose values are numbers is called ______________
(A) Numeric (B) Age (C) Date (D) Comma

4. A numeric variable whose values are displayed with commas delimiting every three places, and with the period as a decimal delimiter is called ______________
(A) Numeric (B) Age (C) Date (D) Comma
5. A numeric variable whose values are displayed with periods delimiting every three places, and with the comma as a decimal delimiter is called ______________
(A) Numeric (B) Age (C) Dot (D) Comma

Answers: 1. b  2. d  3. a  4. d  5. c

5.8 SUGGESTED READINGS
• Introduction to Social Research: Quantitative and Qualitative Approaches by Keith F Punch
• Research Methodology: Methods and Techniques by C. R. Kothari
• Research Methodology by D K Bhattacharyya
• Research Methodology: A Step-by-Step Guide for Beginners by Ranjit Kumar
• Research Methodology by P. Sam Daniel, Aroma G. Sam

UNIT 6 DESCRIPTIVE STATISTICS
Structure
6.0. Learning Objectives
6.1. Introduction
6.2. Levels of Measurement
6.3. Measures of Central Tendency
6.4. Summary
6.5. Key Words/ Abbreviations
6.6. Learning Activity
6.7. Unit End Questions (MCQs and Descriptive)
6.8. Suggested Readings

6.0 LEARNING OBJECTIVES
After studying this unit, you will be able to:
• Explain the Levels of Measurement
• Outline the presentation of Descriptive Data
• Describe Measures of Central Tendency

6.1 INTRODUCTION
Data are a systematic record of the values taken by a variable, or a number of variables, at a particular point in time or over different points in time. Data collected at a single point in time over different sections (which may be classified on demographic, geographic or other considerations) are called cross-section data, whereas data collected over a period of time are called time series data. Data may be quantitative or qualitative in nature. For example, the heights of 50 students of Delhi University are quantitative, whereas their religion is qualitative. Data of a quantitative nature are technically called variables, whereas data of a qualitative nature are called attributes. Variables, in turn, may be discrete or continuous. If a variable can take any value within its range, it is called a continuous variable; otherwise it is called a discrete variable. The height of students of Delhi University is a continuous variable, whereas the number of students in the different universities of India is a discrete variable.

Descriptive statistics are numbers that are used to summarize and describe data. The word “data” refers to the information that has been collected from an experiment, a survey, an historical record, etc.

If we are analysing birth certificates, for example, a descriptive statistic might be the percentage of certificates issued in New York State, or the average age of the mother. Any other number we choose to compute also counts as a descriptive statistic for the data from which the statistic is computed. Several descriptive statistics are often used at one time to give a full picture of the data. Descriptive statistics are just descriptive. They do not involve generalizing beyond the data at hand. Generalizing from our data to another set of cases is the business of inferential statistics, which you'll be studying in another section. Here we focus on (mere) descriptive statistics.
• Discrete data are whole numbers, and are usually a count of objects. (For instance, one study might count how many pets different families own; it wouldn’t make sense to have half a goldfish, would it?)
• Measured data, in contrast to discrete data, are continuous, and thus may take on any real value. (For example, the amount of time a group of children spent watching TV would be measured data, since they could watch any number of hours, even though their watching habits will probably be some multiple of 30 minutes.)
• Numerical data are numbers.
• Categorical data have labels (i.e. words). (For example, a list of the products bought by different families at a grocery store would be categorical data.)

6.2 LEVELS OF MEASUREMENT
Data can be classified according to levels of measurement. The level of measurement determines how data should be summarized and presented. It also indicates the type of statistical analysis that can be performed. There are four levels of measurement: nominal, ordinal, interval, and ratio. The lowest, or the most primitive, measurement is the nominal level. The highest is the ratio level of measurement.

Nominal Scale of Measurement
Categorical measurements (variables), also called nominal measurements, reflect qualitative differences rather than quantitative ones. Common examples include categories such as yes/no, pass/fail, male/female, etc. A key feature of categorical measurements is that there is no necessary sense in which one category has more or less of a particular quality: they are simply different. When setting up a categorical measurement system the only requirements are those of mutual exclusivity and exhaustiveness. Mutual exclusivity means that each observation (person, case, score) cannot fall into more than one category; one cannot, for example, both pass and

fail a test at the same time. Exhaustiveness simply means that your category system should have enough categories for all the observations. For biological sex there should be no observations (in this case people) who are neither male nor female. To process the data for a variable measured at the nominal level, we often numerically code the labels or names. For example, if we are interested in measuring the home state for students at Chandigarh University, we would assign a student’s home state of Punjab a code of 1, Delhi a code of 2, Rajasthan a 3, and so on. We need to realize that the number assigned to each state is still a label or name. The reason we assign numerical codes is to facilitate counting the number of students from each state with statistical software. Note that assigning numbers to the states does not give us license to manipulate the codes as numerical information.

Ordinal Scale of Measurement
The next higher level of measurement is the ordinal level. For this level of measurement a qualitative variable or attribute is either ranked or rated on a relative scale. As before, the assumptions of mutual exclusivity and exhaustiveness apply and cases are still assigned to categories. The big difference is that now the categories themselves can be rank-ordered with reference to some external criterion such that being in one category can be regarded as having more or less of some underlying quality than being in another category. Most psychological test scores should strictly be regarded as ordinal measures. For instance, one of the subscales of the well-known Eysenck Personality Questionnaire (Eysenck & Eysenck, 1975) is designed to measure extroversion. As this measure, and many like it, infer levels of extroversion from responses to items about behavioural propensities, it does not measure extroversion in any direct sense. Since many mental constructs within psychology cannot be observed directly, most measures tend to be ordinal. Attitudes, intentions, opinions, personality characteristics, psychological well-being, depression, etc. are all constructs which are thought to vary in degree between individuals but tend only to allow indirect ordinal measurements.

Interval Scale of Measurement
Like an ordinal scale, the numbers associated with interval measurement reflect more or less of some underlying dimension. The key distinction is that with interval level measures, numerically equal distances on the scale reflect equal differences in the underlying dimension. For example, the 2°C difference in temperature between 38°C and 40°C is the same as the 2°C difference between 5°C and 7°C. Many behavioural researchers are prepared to assume that scores on psychological tests can be treated as interval level measures so that they can carry out more sophisticated analyses on

their data. A well-known example of this practice is the use of IQ test scores. In order to treat scores as interval level measures, the assumption is made that the 5-point difference in IQ between someone who scores 75 and someone who scores 80 means the same difference in intelligence as the difference between someone who scores 155 and someone who scores 160.

Ratio Scale of Measurement
Ratio scale measurement differs from interval measurement only in that it implies the existence of a potential absolute zero value. Good examples of ratio scales are length, time and number of correct answers on a test. It is possible to have zero (no) length, for something to take no time, or for someone to get no answers correct on a test. An important corollary of having an absolute zero is that, for example, someone who gets four questions right has got twice as many questions right as someone who got only two right. The ratio of scores to one another now carries some sensible meaning, which was not the case for the interval scale.

The difference between interval and ratio scales is best explained with an example. Suppose we measure reaction times to dangers in a driving simulator. This could be measured in seconds and would be a ratio scale measurement, as 0 seconds is a possible (if a little unlikely) score and someone who takes 2 seconds is taking twice as long to react as someone who takes 1 second. If, on average, people take 800 milliseconds (0.8 seconds) to react, we could instead look at the difference between the observed reaction time and this average level of performance. In this case the level of measurement is only on an interval scale. Our first person scores 1200 ms (i.e. takes 2 seconds, 1200 ms longer than the average of 800 ms) and the second person scores 200 ms (i.e. takes 1 second, 200 ms more than the average). However, the first person did not take 6 times as long (1200 ms divided by 200 ms) as the second. They did take 1000 ms longer, so the interval remains meaningful but the ratio element does not.

True psychological ratio scale measures are quite rare, though there is often confusion about this when it comes to taking scores from scales made up of individual problem items in tests. We might, for instance, measure the number of simple arithmetic problems that people can get right. We test people on 50 items and simply count the number correct. The number correct is a ratio scale measure since four right is twice as many as two right, and it is possible to get none right at all (absolute zero). As long as we consider our measure to be only an indication of the number correct there is no problem and we can treat the scores as ratio scale measures.

6.3 MEASURES OF CENTRAL TENDENCY
Central tendency is a descriptive summary of a dataset through a single value that reflects the centre of the data distribution. Along with the variability (dispersion) of a dataset, central tendency is a branch of descriptive statistics.

Central tendency is one of the most fundamental concepts in statistics. Although it does not provide information regarding the individual values in the dataset, it delivers a comprehensive summary of the whole dataset. A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode. The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others. In the following sections, we will look at the mean, median and mode, and learn how to calculate them and under what conditions they are most appropriate to be used.

MEAN
The arithmetic mean is the most common measure of central tendency. It is simply the sum of the numbers divided by the number of numbers. The symbol “μ” is used for the mean of a population. The symbol “M” is used for the mean of a sample. The formula for μ is shown below:
Mean = ∑X / N
where ∑X is the sum of all the numbers in the population and N is the number of numbers in the population.
Example: Find the mean of the following values: 5, 10, 15, 20, 30.
Solution: Mean of the above data = (5+10+15+20+30)/5 = 80/5 = 16
Sometimes the question includes the frequency of each value. In that case, the formula changes to
Mean = ∑FiXi / ∑Fi
where Fi = frequency of the ith value of the distribution and Xi = ith value of the distribution.
Example: For the given distribution, find the mean.
Xi    Fi
1     3
2     5
3     8
4     4
      ∑Fi = 20
Solution: Mean = (1×3+2×5+3×8+4×4)/20 = 53/20 = 2.65

Although the arithmetic mean is not the only “mean” (there is also a geometric mean), it is by far the most commonly used. Therefore, if the term “mean” is used without specifying whether it is the arithmetic mean, the geometric mean, or some other mean, it is assumed to refer to the arithmetic mean.

Uses of Mean:
There are certain general rules for using the mean. Some of these uses are as follows:
1. The mean is the centre of gravity of the distribution, and each score contributes to its determination; it is used when the scores are spread symmetrically around a central point.
2. The mean is more stable than the median and mode, so it is used when the measure of central tendency with the greatest stability is wanted.
3. The mean is used to calculate other statistics such as S.D., coefficient of correlation, ANOVA, ANCOVA etc.

Advantages of Mean:
1. The mean is rigidly defined, so there is no question of misunderstanding about its meaning and nature.
2. It is the most popular measure of central tendency as it is easy to understand.
3. It is easy to calculate.
4. It includes all the scores of a distribution.
5. It is least affected by sampling fluctuations, so the result is reliable.
6. The mean is capable of further algebraic treatment, so other statistics such as dispersion, correlation and skewness require the mean for their calculation.

Disadvantages of Mean:

1. The mean is affected by extreme scores.
2. Sometimes the mean is a value which is not present in the series.
3. Sometimes it gives absurd values. For example, if there are 41, 44 and 42 students in classes VIII, IX and X of a school, the average number of students per class is 42.33, which is not a possible class size.
4. In the case of open-ended class intervals, it cannot be calculated without assuming the size of the open-ended classes.

MEDIAN
The median is also a frequently used measure of central tendency. The median is the midpoint of a distribution: the same number of scores is above the median as below it. The median is not affected by the magnitude of the extreme (smallest or largest) values. Thus, it is useful because it is not affected by one or two abnormally small or large values, and because it is very simple to calculate.
When there is an odd number of numbers, the median is simply the middle number. Example: the median of 2, 4, and 7 is 4.
When there is an even number of numbers, the median is the mean of the two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is: (4+7) / 2 = 5.5
When there are numbers with the same values, then the formula for the third definition of the 50th percentile should be used.

Uses of Median:
1. The median is used when the exact midpoint of the distribution is needed or the 50% point is wanted.
2. When extreme scores affect the mean, the median is the best measure of central tendency.
3. The median is used when it is required that certain scores should affect the central tendency, but all that is known about them is that they are above or below the median.
4. The median is used when the classes are open-ended or of unequal size.
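
As a quick check on the worked examples above, the following Python sketch recomputes the simple mean, the frequency-based mean and the two median cases. It is only an illustration: Python and its standard `statistics` module are not part of this unit, and the helper name `frequency_mean` is a hypothetical one chosen for this sketch.

```python
from statistics import mean, median


def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(Fi * Xi) / sum(Fi).

    Hypothetical helper written for this illustration only.
    """
    total = sum(freqs)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    return sum(x * f for x, f in zip(values, freqs)) / total


# Simple mean: (5 + 10 + 15 + 20 + 30) / 5
simple = mean([5, 10, 15, 20, 30])

# Frequency distribution from the text: Xi = 1..4 with Fi = 3, 5, 8, 4
weighted = frequency_mean([1, 2, 3, 4], [3, 5, 8, 4])

# Odd number of values: the middle value of the sorted list
odd_median = median([2, 4, 7])

# Even number of values: the mean of the two middle values
even_median = median([2, 4, 7, 12])

print(simple, weighted, odd_median, even_median)
```

The same functions can be pointed at any list of scores; `median` sorts the values internally, so the input does not need to be arranged in order first.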


Advantages of Median:
1. It is easy to compute and understand.
2. Not all the observations are required for its computation.
3. Extreme scores do not affect the median.
4. It can be determined from open-ended series.
5. It can be determined from unequal class intervals.

Disadvantages of Median:
1. It is not rigidly defined like the mean because its value cannot be computed but only located.
2. It does not include all the observations.
3. It cannot be further treated algebraically like the mean.
4. It requires arrangement of the scores or class intervals in ascending or descending order.
5. Sometimes it produces a value which is not found in the series.

MODE
The mode is the most frequent value in a set. It is useful when the members of a set are very different: take, for example, the statement “there were more Ds on that test than any other letter grade” (that is, in the set {A, B, C, D, E}, D is the mode). On the other hand, the fact that the mode is absolute (for example, 2.9999 and 3 are considered just as different as 3 and 100 are) can make the mode a poor choice for determining a “centre”. For example, the mode of the set {1, 2.3, 2.3, 5.14, 5.15, 5.16, 5.17, 5.18, 10.2} is 2.3, even though there are many values that are close to, but not exactly equal to, 5.16.

Uses of Mode:
The mode is used:
(i) When we want a quick and approximate measure of central tendency.
(ii) When we want a measure of central tendency which should be a typical value, for example when we want to know the typical dress style of Indian women, i.e. the most popular dress style. Similarly, the most frequently occurring mark in a class is called the modal mark.

Advantages of Mode:
1. The mode gives the most representative value of a series.
2. Unlike the mean, the mode is not affected by extreme scores.

3. It can be determined from an open-ended class interval.
4. It helps in analysing qualitative data.
5. The mode can also be determined graphically through a histogram or frequency polygon.
6. The mode is easy to understand.

Disadvantages of Mode:
1. The mode is not rigidly defined like the mean. In certain cases it may come out with different results.
2. It is not based on all the observations of a distribution but only on the concentration of frequencies of the items.
3. Unlike the mean, the mode cannot be treated further algebraically.
4. In multimodal and bimodal cases it is difficult to determine.
5. The mode cannot be determined from unequal class intervals.
6. There are different methods and different formulae which yield different results for the mode, and so it is rightly remarked to be the most ill-defined average.

6.4 SUMMARY
There are two main branches of statistics: descriptive and inferential. Descriptive statistics is used to say something only about the set of information that has been collected. Inferential statistics is used to make predictions or comparisons about a larger group (a population) using information gathered about a small part of that population. Thus, inferential statistics involves generalizing beyond the data, something that descriptive statistics does not do. Quantitative Research is used to quantify the problem by way of generating numerical data or data that can be transformed into usable statistics. It is used to quantify attitudes, opinions, behaviours, and other defined variables, and to generalize results from a larger sample population. Quantitative Research uses measurable data to formulate facts and uncover patterns in research. Quantitative data collection methods are much more structured than Qualitative data collection methods. Quantitative data collection methods include various forms of surveys (online surveys, paper surveys, mobile surveys and kiosk surveys), face-to-face interviews, telephone interviews, longitudinal studies, website interceptors, online polls, and systematic observations. Quantitative analysis deals with data in the form of numbers and uses mathematical operations to investigate their properties. The levels of measurement used in the collection of

the data i.e. nominal, ordinal, interval and ratio, are an important factor in choosing the type of analysis that is applicable, as is the numbers of cases involved. 6.5 KEY WORDS/ ABBREVIATIONS  Frequency Table- Interval Level of Measurement For data recorded at the interval level of measurement, the interval or the distance between values is meaningful. The interval level of measurement is based on a scale with a known unit of measurement.  Histogram- A graphical display of a frequency table in which the unit intervals are mapped on the x-axis and the number of scores in each interval is represented on the y-axis. The purpose of a histogram is to show what proportion of data falls into each interval; all intervals are disjointed, non-overlapping, adjacent categories (also called bins). The width and number of bins will influence the shape and interpretation of a histogram.  Median The median is the middle number of a set of numbers arranged in numerical order. If the number of values in a set is even, then the median is the sum of the two middle values, divided by 2.  Mean The mean is the sum of all the values in a set, divided by the number of values. The mean of a whole population is usually denoted by μ, while the mean of a sample is usually denoted by x.  Mode The mode is the most frequent value in a set. A set can have more than one mode; if it has two, it is said to be bimodal.  Nominal Level of Measurement Data recorded at the nominal level of measurement is represented as labels or names. They have no order. They can only be classified and counted.  Ordinal Level of Measurement Data recorded at the ordinal level of measurement is based on a relative ranking or rating of items based on a defined attribute or qualitative variable. Variables based on this level of measurement are only ranked or counted. 
• Ratio Level of Measurement: Data recorded at the ratio level of measurement are based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale.

6.6 LEARNING ACTIVITY
1. With the help of appropriate diagrams, show the different ways of data presentation.
2. What are the measures of central tendency? What are their advantages and disadvantages?

3. Present the data below in three different forms.

Test              Student 1  Student 2  Student 3  Student 4  Student 5
Locus of control  23         30         14         15         14
Self-control      20         11         9          10         5
Motivation        25         20         21         11         18

6.7 UNIT END QUESTIONS (MCQS AND DESCRIPTIVE)
A. Descriptive Questions
1. What is descriptive data?
2. Explain different ways of presenting descriptive data.
3. List advantages and disadvantages of mean.
4. State advantages and disadvantages of median.
5. Outline advantages and disadvantages of mode.

B. Multiple Choice Questions (MCQs)
1. In a week the prices of a bag of rice were 350, 280, 340, 290, 320, 310, 300. The range is
a. 60 b. 70 c. 80 d. 100
2. Mean, median and mode are _________________________
a. Measures of regression b. Measures of variance c. Measures of central tendency d. Measures of hypothesis testing
3. _________________________ is not a graphical form of data presentation
a. Histogram b. Pie chart c. Line Graph d. Frequency table
4. The median of (3,3,3,5,5,6,5,5,6,10,8,9,10) is ____________
a. 3 b. 6 c. 10 d. 5
5. The mode of (3,3,3,5,5,6,5,5,6,10,8,9,10) is ____________
a. 3 b. 6 c. 10 d. 5

Answers
1. b 2. c 3. d 4. d 5. d

6.8 SUGGESTED READINGS
• Data Presentation and Descriptive Statistics by NCERT
• Statistics for Economics by Dr D P Jain
• Introduction to Social Research: Quantitative and Qualitative Approaches by Keith F Punch
• Research Methodology: Methods and Techniques by C. R. Kothari
• Research Methodology by D K Bhattacharyya
• Research Methodology: A Step-by-Step Guide for Beginners by Ranjit Kumar
• Research Methodology by P. Sam Daniel, Aroma G. Sam

UNIT 7 MEASURES OF VARIABILITY

Structure
7.0. Learning Objectives
7.1. Introduction
7.2. Range
7.3. Variance
7.4. Standard Deviation
7.5. Percentiles
7.6. Quartile Deviation
7.7. Skewness
7.8. Kurtosis
7.9. Summary
7.10. Key Words/ Abbreviations
7.11. Learning Activity
7.12. Unit End Questions (MCQs and Descriptive)
7.13. Suggested Readings

7.0 LEARNING OBJECTIVES
After studying this unit, you will be able to:
• Explain measures of variability
• State the meaning of Range, Variance, Standard Deviation, Percentiles, Quartile deviation
• Discuss Skewness and Kurtosis

7.1 INTRODUCTION
A measure of variability is a summary statistic that represents the amount of dispersion in a dataset. How spread out are the values? While a measure of central tendency describes the typical value, measures of variability define how far away the data points tend to fall from the centre. We talk about variability in the context of a distribution of values. A low dispersion indicates that the data points tend to be clustered tightly around the centre. High dispersion signifies that they tend to fall further away. In statistics, variability, dispersion, and spread are synonyms that denote the width of the distribution.

Just as there are multiple measures of central tendency, there are several measures of variability. In this unit, you will learn why understanding the variability of your data is critical. We then explore the most common measures of variability (the range, interquartile range, variance, and standard deviation) and consider which one is best suited to your data.

7.2 RANGE
The range is the most obvious measure of dispersion: it is the difference between the highest and lowest values in a dataset. The range is simple to compute and gives a quick impression of the overall spread of a dataset. It is useful for showing the spread within a dataset and for comparing the spread between similar datasets. The range is calculated using the following formula:

Range = Highest value - Lowest value

Example: consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. For this set of numbers, the range is 11 - 1 = 10.

7.3 VARIANCE
In a population, variance is the average squared deviation from the population mean, as defined by the following formula:

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$

where $\sigma^2$ is the population variance, $\mu$ is the population mean, $X_i$ is the ith element from the population, and $N$ is the number of elements in the population.

Observations from a simple random sample can be used to estimate the variance of a population. For this purpose, sample variance is defined by a slightly different formula, and uses a slightly different notation:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample, and $n$ is the number of elements in the sample. Using this formula, the sample variance can be considered an unbiased estimate of the true population variance. Therefore, if you need to estimate an unknown population variance, based on data from a simple random sample, this is the formula to use.

7.4 STANDARD DEVIATION
The standard deviation is a measure that summarises the amount by which the values within a dataset vary from the mean. Effectively it indicates how tightly the values in the dataset are bunched around the mean value.
It is the most widely used measure of dispersion since, unlike the range and interquartile range, it takes into account every value in the dataset. When the values in a dataset are tightly bunched together the standard deviation is small. When the values are spread apart the standard deviation will be relatively large. The standard deviation is usually presented in conjunction with the mean and is measured in the same units.

In many datasets the values deviate from the mean value due to chance, and such datasets often display an approximately normal distribution. In a dataset with a normal distribution most of the values are clustered around the mean while relatively few values tend to be extremely high or extremely low. Many natural phenomena display a normal distribution.

The standard deviation is the square root of the variance. Thus, the standard deviation of a population is:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$

where $\sigma$ is the population standard deviation, $\mu$ is the population mean, $X_i$ is the ith element from the population, and $N$ is the number of elements in the population.

Statisticians often use simple random samples to estimate the standard deviation of a population, based on sample data. Given a simple random sample, the usual estimate of the standard deviation of a population is:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where $s$ is the sample standard deviation, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample, and $n$ is the number of elements in the sample.

Let us calculate the standard deviation for a simple sequence, {1, 3, 3, 3, 5}, treating it as a complete population, so that we divide by N = 5. We will divide the entire process into five steps:

Step 1: Calculate the mean value.
Mean = (1 + 3 + 3 + 3 + 5)/5 = 15/5 = 3
Step 2: Take the difference of each term from the mean value (x - x̄).
1 - 3 = -2; 3 - 3 = 0; 3 - 3 = 0; 3 - 3 = 0; 5 - 3 = 2
Step 3: Square all these differences.
(-2)² = 4; 0² = 0; 0² = 0; 0² = 0; 2² = 4
Step 4: Take the average of the squares in step 3.
Average = (4 + 0 + 0 + 0 + 4)/5 = 8/5 = 1.6
Step 5: The square root of this average is the standard deviation.
Standard deviation = √(8/5) ≈ 1.26
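The range, variance, and standard deviation formulas from Sections 7.2 to 7.4 can be checked with a short Python sketch. The function names below are illustrative choices, not part of the unit; the data are the two example sets used in the text. The sketch computes both the population version (divide by N) and the sample version (divide by n - 1) so the difference between them is visible.

```python
import math


def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


# Example set from Section 7.2
range_example = value_range([1, 3, 4, 5, 5, 6, 7, 11])  # 11 - 1 = 10

# Example sequence from Section 7.4
data = [1, 3, 3, 3, 5]
pop_var = population_variance(data)   # 8/5 = 1.6
pop_sd = math.sqrt(pop_var)           # about 1.26
samp_var = sample_variance(data)      # 8/4 = 2.0
samp_sd = math.sqrt(samp_var)         # about 1.41

print("range:", range_example)
print("population variance and SD:", pop_var, round(pop_sd, 2))
print("sample variance and SD:", samp_var, round(samp_sd, 2))
```

Dividing by n - 1 instead of N gives a slightly larger value, and the gap shrinks as the number of observations grows.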

Effect of Changing Units
Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of variability are affected when we change units.
• If you add a constant to every value, the distance between values does not change. As a result, all of the measures of variability (range, interquartile range, standard deviation, and variance) remain the same.
• On the other hand, suppose you multiply every value by a constant. This has the effect of multiplying the range, interquartile range (IQR), and standard deviation by (the absolute value of) that constant. It has an even greater effect on the variance: it multiplies the variance by the square of the constant.

7.5 PERCENTILES
Assume that the elements in a data set are rank ordered from the smallest to the largest. The values that divide a rank-ordered set of elements into 100 equal parts are called percentiles. An element having a percentile rank of Pi would have a greater value than i percent of all the elements in the set. Thus, the observation at the 50th percentile would be denoted P50, and it would be greater than 50 percent of the observations in the set. An observation at the 50th percentile would correspond to the median value in the set.

Percentiles for ungrouped data
To calculate percentiles (a measure of the relative standing of an observation) for ungrouped data, adopt the following procedure:
1. Order the observations.
2. For the mth percentile, determine the product $\frac{m \cdot n}{100}$, where n is the number of observations. If $\frac{m \cdot n}{100}$ is not an integer, round it up and find the corresponding ordered value; if $\frac{m \cdot n}{100}$ is an integer, say k, then calculate the mean of the kth and (k+1)th ordered observations.

Example: For the following height data collected from students, find the 10th and 95th percentiles.
91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112.
Solution: The ordered observations of the data are 87, 87, 88, 89, 89, 90, 91, 91, 92, 95, 96, 96, 97, 98, 98, 98, 99, 99, 100, 100, 101, 101, 102, 103, 105, 105, 106, 107, 107, 112.

For $P_{10}$: $\frac{10 \times 30}{100} = 3$

Since 3 is an integer, the 10th percentile is the mean of the 3rd and 4th ordered observations: $P_{10} = (88 + 89)/2 = 88.5$. This means that 10 percent of the observations in the data set are less than 88.5.

For $P_{95}$: $\frac{95 \times 30}{100} = 28.5$

Since 28.5 is not an integer, we round up: the 29th ordered observation is the 95th percentile, i.e. $P_{95} = 107$.

Percentiles for grouped data
The mth percentile (a measure of the relative standing of an observation) for grouped data is

$$P_m = l + \frac{h}{f}\left(\frac{m \cdot n}{100} - c\right)$$

As with the median, $\frac{m \cdot n}{100}$ is used to locate the class containing the mth percentile. In this formula:
l is the lower class boundary of the class containing $P_m$
h is the width of the class containing $P_m$
f is the frequency of the class containing $P_m$
n is the total number of observations (the sum of the frequencies)
c is the cumulative frequency of the class immediately preceding the class containing $P_m$

Note that the 50th percentile is the median by definition, as half of the values in the data are smaller than the median and half of the values are larger than the median. Similarly, the 25th and 75th percentiles are the lower (Q1) and upper (Q3) quartiles respectively. The quartiles, deciles, and percentiles are also called quantiles or fractiles.

Measure of relative standing of an observation in grouped data
Example: For the following grouped data, compute P10, P25, P50, and P95.

Class boundaries   Frequency (f)   Cumulative frequency
85.5–90.5          6               6
90.5–95.5          4               10
95.5–100.5         10              20
100.5–105.5        6               26
105.5–110.5        3               29
110.5–115.5        1               30

Solution:
1. Locate the 10th percentile (the first decile, D1): $\frac{10 \times n}{100} = \frac{10 \times 30}{100} = 3$, i.e. the 3rd observation. So the P10 class is 85.5–90.5, which contains the 3rd observation.

$$P_{10} = l + \frac{h}{f}\left(\frac{10n}{100} - c\right) = 85.5 + \frac{5}{6}(3 - 0) = 85.5 + 2.5 = 88$$

2. Locate the 25th percentile (the lower quartile, Q1): $\frac{25 \times n}{100} = \frac{25 \times 30}{100} = 7.5$, i.e. the 7.5th observation. So the P25 class is 90.5–95.5, which contains the 7.5th observation.

$$P_{25} = l + \frac{h}{f}\left(\frac{25n}{100} - c\right) = 90.5 + \frac{5}{4}(7.5 - 6) = 90.5 + 1.875 = 92.375$$

3. Locate the 50th percentile (the median, i.e. the 2nd quartile or 5th decile): $\frac{50 \times n}{100} = \frac{50 \times 30}{100} = 15$, i.e. the 15th observation. So the P50 class is 95.5–100.5, which contains the 15th observation.

$$P_{50} = l + \frac{h}{f}\left(\frac{50n}{100} - c\right) = 95.5 + \frac{5}{10}(15 - 10) = 95.5 + 2.5 = 98$$

4. Locate the 95th percentile: $\frac{95 \times n}{100} = \frac{95 \times 30}{100} = 28.5$, i.e. the 28.5th observation. So the P95 class is 105.5–110.5, which contains the 28.5th observation.

$$P_{95} = l + \frac{h}{f}\left(\frac{95n}{100} - c\right) = 105.5 + \frac{5}{3}(28.5 - 26) = 105.5 + 4.1667 = 109.6667$$

7.6 QUARTILE DEVIATION
Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part are called the first, second, and third quartiles; and they are denoted by Q1, Q2, and Q3, respectively. The chart below shows a set of four numbers divided into quartiles.

Note the relationship between quartiles and percentiles. Q1 corresponds to P25, Q2 corresponds to P50, Q3 corresponds to P75. Q2 is the median value in the set. The quartile deviation (also called the semi-interquartile range) is half the difference between the third and first quartiles: Q.D. = (Q3 - Q1)/2.

Consider a data set of the following numbers: 22, 12, 14, 7, 18, 16, 11, 15, 12. You are required to calculate the quartile deviation.

Solution: First, we need to arrange the data in ascending order to find Q1 and Q3:
7, 11, 12, 12, 14, 15, 16, 18, 22

The position of Q1 is ¼ (9 + 1) = ¼ (10) = 2.5, i.e. the 2.5th term.
The position of Q3 is ¾ (9 + 1) = ¾ (10) = 7.5, i.e. the 7.5th term.

• Q1 lies halfway between the 2nd and 3rd terms: take the 2nd term, 11, and add 0.5 times the difference between the 3rd and 2nd terms, (12 - 11) × 0.5 = 0.5, which gives 11.5.
• Q3 lies halfway between the 7th and 8th terms: take the 7th term, 16, and add 0.5 times the difference between the 8th and 7th terms, (18 - 16) × 0.5 = 1, which gives 17.

Using the quartile deviation formula, Q.D. = (Q3 - Q1)/2, we have (17 - 11.5)/2 = 5.5/2, so Q.D. = 2.75.

7.7 SKEWNESS
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, a frequency distribution skewed to

the right means that the right tail is longer than the left tail. If the data are multi-modal, then this may affect the sign of the skewness.

In a normal distribution, the graph appears as a classical, symmetrical "bell-shaped curve." The mean, or average, and the mode, or maximum point on the curve, are equal.
• In a perfect normal distribution (green solid curve in the illustration below), the tails on either side of the curve are exact mirror images of each other.
• When a distribution is skewed to the left (red dashed curve), the tail on the curve's left-hand side is longer or fatter than the tail on the right-hand side, and the mean and median are less than the mode. This situation is also called negative skewness.
• When a distribution is skewed to the right (blue dotted curve), the tail on the curve's right-hand side is longer or fatter than the tail on the left-hand side, and the mean and median are greater than the mode. This situation is also called positive skewness.

So, when is the skewness too much? A common rule of thumb is:
• If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
• If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data are moderately skewed.

• If the skewness is less than -1 (negatively skewed) or greater than 1 (positively skewed), the data are highly skewed.

Example
Let us take a very common example of house prices. Suppose we have house values ranging from $100,000 to $1,000,000 with the average being $500,000. If the peak of the frequency distribution is to the left of the average value, the distribution is positively skewed. It would mean that many houses were being sold for less than the average value, i.e. $500,000. This could be for many reasons, but we are not going to interpret those reasons here. If the peak of the distribution is to the right of the average value, the distribution is negatively skewed. This would mean that many houses were being sold for more than the average value.

7.8 KURTOSIS
Kurtosis is all about the tails of the distribution, not its peakedness or flatness. It describes how much of the distribution lies in the extreme values of the tails; in effect, it is a measure of the outliers present in the distribution.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or a lack of outliers. A uniform distribution would be the extreme case.

High kurtosis in a data set is an indicator that the data have heavy tails or outliers. If kurtosis is high, we need to investigate why there are so many outliers. It may indicate wrong data entry or other problems.

Low kurtosis in a data set is an indicator that the data have light tails or a lack of outliers. If kurtosis is unusually low (too good to be true), we also need to investigate and, where justified, trim the dataset of unwanted results.
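The unit does not give formulas for skewness or kurtosis, so the following Python sketch assumes the common moment-based definitions: skewness as the third central moment divided by the cubed standard deviation, and kurtosis as the fourth central moment divided by the squared variance (under this definition a normal distribution has a kurtosis of 3). The function names and the two small data sets are hypothetical and chosen only to illustrate the rule of thumb from Section 7.7.

```python
def central_moment(values, k):
    """kth central moment: average of (x - mean)**k, dividing by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Assumed definition: third central moment / (variance ** 1.5)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Assumed definition: fourth central moment / (variance ** 2)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def describe_skewness(g):
    """Apply the rule of thumb from Section 7.7 to a skewness value."""
    if abs(g) <= 0.5:
        return "fairly symmetrical"
    if abs(g) <= 1:
        return "moderately skewed"
    return "highly skewed"


# Hypothetical data sets, for illustration only
symmetric = [1, 2, 3, 4, 5]        # mirror image around its mean of 3
right_tailed = [1, 1, 1, 2, 10]    # one large value stretches the right tail

for name, data in [("symmetric", symmetric), ("right_tailed", right_tailed)]:
    g = skewness(data)
    print(name, round(g, 2), describe_skewness(g), round(kurtosis(data), 2))
```

The symmetric set has a skewness of zero, while the single large value in the second set produces a positive skewness above 1, which the rule of thumb labels as highly skewed. Statistical packages may apply small-sample corrections or report excess kurtosis (kurtosis minus 3), so their output can differ from this sketch.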

Mesokurtic: This distribution has a kurtosis statistic similar to that of the normal distribution. It means that the extreme values of the distribution are similar to those of a normal distribution. This definition is used so that the standard normal distribution has a kurtosis of three.

Leptokurtic (kurtosis > 3): The distribution is longer and its tails are fatter. The peak is higher and sharper than in a mesokurtic distribution, which means that the data are heavy-tailed, with a profusion of outliers. Outliers stretch the horizontal axis of the histogram, which makes the bulk of the data appear in a narrow ("skinny") vertical range, thereby giving the "skinniness" of a leptokurtic distribution.

Platykurtic (kurtosis < 3): The distribution is shorter and its tails are thinner than those of the normal distribution. The peak is lower and broader than in a mesokurtic distribution, which means that the data are light-tailed, with a lack of outliers. This is because extreme values are less frequent than in the normal distribution.

7.9 SUMMARY
Measures of variability are statistics that describe the amount of difference and spread in a data set. These measures include the variance and the standard deviation. If the numbers corresponding to these statistics are high, it means that the scores or values in our data set are widely spread out and not tightly centered around the mean.

Variability refers to how spread apart the scores of the distribution are or how much the scores vary from each other. There are four major measures of variability: the range, interquartile range, variance, and standard deviation. The range represents the difference between the highest and lowest score in a distribution. It is of limited use on its own because it considers only the two extreme scores. The interquartile range, on the other hand, measures the difference between the outermost scores in only the middle fifty percent of the scores.
In other words, to determine the interquartile range, the score at the 25th percentile is subtracted from the score at the 75th percentile, representing the range of the middle 50 percent of scores.
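As a rough illustration of the interquartile range described here, the following Python sketch applies the ungrouped-percentile rule from Section 7.5 to the height data used in that section. The function name `percentile_ungrouped` is hypothetical, and restricting m to whole numbers between 1 and 99 is an assumption made to keep the arithmetic exact.

```python
def percentile_ungrouped(values, m):
    """mth percentile of ungrouped data using the rule from Section 7.5.

    m is assumed to be a whole number with 0 < m < 100.
    """
    if not values or not 0 < m < 100:
        raise ValueError("need data and a whole-number m between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    # Split m*n/100 into its whole part and remainder using integers only.
    whole, remainder = divmod(m * n, 100)
    if remainder == 0:
        # m*n/100 is an integer k: mean of the kth and (k+1)th observations.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise round up: list index `whole` is the (whole + 1)th observation.
    return ordered[whole]


# Height data from the example in Section 7.5
heights = [91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101,
           96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112]

p25 = percentile_ungrouped(heights, 25)   # 7.5 rounds up to the 8th value
p50 = percentile_ungrouped(heights, 50)   # mean of the 15th and 16th values
p75 = percentile_ungrouped(heights, 75)   # 22.5 rounds up to the 23rd value
p95 = percentile_ungrouped(heights, 95)   # 28.5 rounds up to the 29th value

iqr = p75 - p25                 # range of the middle 50 percent of scores
quartile_deviation = iqr / 2    # half the interquartile range

print("P25, P50, P75, P95:", p25, p50, p75, p95)
print("IQR:", iqr, "quartile deviation:", quartile_deviation)
```

Section 7.6 locates quartiles with a different convention (positions based on n + 1, with interpolation), so quartiles computed that way can differ slightly from the percentile-based values here.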

The variance is the average of the squared differences of each score from the mean. To calculate the sample variance, the difference between each score and the mean is squared, and the squared differences are added together. This sum is then divided by the number of scores minus one. The square root of the variance is called the standard deviation. Since the variance is expressed in squared units, the standard deviation, which is expressed in the original units of the data, is easier to interpret and much more commonly used. Since the standard deviation relies on the mean of the distribution, however, it is also affected by extreme scores in a skewed distribution.

7.10 KEY WORDS/ ABBREVIATIONS
• Kurtosis: A measure of how heavy the tails of a probability distribution are relative to a normal distribution. A distribution with heavier tails than a normal curve (and typically a sharper peak) is called leptokurtic, one with similar tails is called mesokurtic, and one with lighter tails (and typically a flatter peak) is called platykurtic.

7.11 LEARNING ACTIVITY
1. Explain in detail the different measures of variability.
2. With the help of appropriate diagrams, show the different types of skewness.
3. With the help of appropriate diagrams, show the different types of kurtosis.

7.12 UNIT END QUESTIONS (MCQS AND DESCRIPTIVE)
A. Descriptive Questions
1. What is range? How is it calculated?
2. What are percentiles? How are they calculated?
3. What is a quartile? How are quartiles calculated?
4. What is standard deviation? How is it calculated?