5. Write a brief note on Skewness.
6. Write a short note on Kurtosis.

B. Multiple Choice Questions (MCQs)

1. _________________ is the difference between the lowest and highest values in a dataset.
a. Standard Deviation
b. Mode
c. Range
d. None of these

2. _____________________ is a measure of symmetry.
a. Standard Deviation
b. Skewness
c. Range
d. Kurtosis

3. _____________________ is used to describe the extreme values in one versus the other tail.
a. Standard Deviation
b. Skewness
c. Range
d. Kurtosis

4. __________________ is a type of Skewness.
a. Standard Deviation
b. Positive
c. High
d. Low

5. __________________ is a type of Kurtosis.

101 CU IDOL SELF LEARNING MATERIAL (SLM)
a. Low
b. Positive
c. Negative
d. Symmetrical

Answer
1c 2b 3d 4b 5a

7.13 SUGGESTED READINGS
Introduction to Social Research: Quantitative and Qualitative Approaches by Keith F Punch
Research Methodology: Methods and Techniques by C. R. Kothari
Research Methodology by D K Bhattacharyya
Research Methodology: A Step-by-Step Guide for Beginners by Ranjit Kumar
Research Methodology by P. Sam Daniel, Aroma G. Sam
UNIT 8 CORRELATION

Structure
8.0. Learning Objectives
8.1. Introduction
8.2. Correlation
8.3. Strong and weak correlation
8.4. Positive and Negative Correlation
8.5. Assumption of Correlation
8.6. Correlation Coefficients: Product Moment Method and Rank Order Method
8.7. Uses of Correlation
8.8. Summary
8.9. Key Words/ Abbreviations
8.10. Learning Activity
8.11. Unit End Questions (MCQs and Descriptive)
8.12. Suggested Readings

8.0 LEARNING OBJECTIVES
After studying this unit, you will be able to:
Explain the concept of correlation
Discuss strong and weak correlation
Describe positive and negative correlation
State the correlation coefficient
Outline the uses of correlation

8.1 INTRODUCTION
Are stock prices related to the price of gold? Is unemployment related to inflation? Is the amount of money spent on research and development related to a company’s net worth? Correlation can answer these questions, and there is no statistical technique more useful or more abused than correlation. When the fluctuation of one variable reliably predicts a similar fluctuation in another variable, there is often a tendency to think that the change in one causes the change in the other. However, correlation does not imply causation. There may be an unknown factor that influences both variables similarly.

Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. Although some correlations are fairly obvious, your data may contain
unsuspected correlations. You may also suspect there are correlations, but not know which are the strongest. An intelligent correlation analysis can lead to a greater understanding of your data.

8.2 CORRELATION
The correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables; worked examples of how it is computed appear in Section 8.6. Correlation is also known as a “bivariate” statistic, with bi- meaning two and variate indicating variable or variance. The two variables are usually a pair of scores for a person or object.

8.3 STRONG AND WEAK CORRELATION
The relationship between any two variables can vary from strong to weak or none. When a relationship is strong, knowing a person's or object’s score on one variable helps to predict their score on the second variable. In other words, if a person has a high score on variable A (compared to all the other people's scores on A), then they are likely to have a high score on variable B (compared to the other people's scores on B). This would be considered a strong positive correlation. If the correlation between variables A and B is weak, then knowing a person's score on variable A does not help to predict their score on variable B.

One very nice feature of the correlation coefficient is that it can only range from –1.00 to +1.00. Any values outside this range are invalid. The range runs from –1.00 (a perfect negative relationship) through 0.00 (no relationship) to +1.00 (a perfect positive relationship). Note that the correlation coefficient is represented in a sample by the value “r.” When the correlation coefficient approaches r = +1.00 (or is greater than r = +.50), it means there is a strong positive relationship or high degree of relationship between the two variables.
This also means that the higher the score of a participant on one variable, the higher the score will be on the other variable. Also, if a participant scores very low on one variable then their score will also be low on the other variable. For example, there is a positive correlation between years of education and wealth. Overall, the greater the number of years of education a person has, the greater their wealth.
A strong correlation between these two variables also means the lower the number of years of education, the lower the wealth of that person. If the correlation were perfect (r = +1.00), then there would not be a single exception in the entire sample to increasing years of education and increasing wealth. It would mean that there would be a perfect linear relationship between the two variables. However, perfect relationships do not exist between two variables in the real world of statistical sampling. Thus, a strong but not perfect relationship between education and wealth in the real world would mean that the relationship holds for most people in the sample but there are some exceptions. In other words, some highly educated people are not wealthy, and some uneducated people are wealthy.

When the correlation coefficient approaches r = -1.00 (or is less than r = -.50), it means that there is a strong negative relationship. This means that the higher the score of a person on one variable, the lower the score will be on the other variable. For example, there might be a strong negative relationship between the value of gold and the Dow Jones Industrial Average. In other words, when the value of gold is high, the stock market will be lower, and when the stock market is doing well, the value of gold will be lower.

A correlation coefficient that is close to r = 0.00 (note that the typical correlation coefficient is reported to two decimal places) means knowing a person's score on one variable tells you nothing about their score on the other variable. For example, there might be a zero correlation between the number of letters in a person's last name and the number of miles they drive per day. If you know the number of letters in a last name, it tells you nothing about how many miles they drive per day. There is no relationship between the two variables; therefore, there is a zero correlation.
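The ranges described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson_r` and `describe_strength` are hypothetical, the cut-off of .50 for "strong" follows this section's rule of thumb, and the 0.10 cut-off for "close to zero" is an assumption made for the example, not a standard.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for two equally long lists of paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r, strong=0.50, near_zero=0.10):
    """Attach a verbal label to r; both cut-offs are illustrative assumptions."""
    if abs(r) > 1.0 + 1e-9:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    if abs(r) < near_zero:
        return "little or no relationship"
    direction = "positive" if r > 0 else "negative"
    size = "strong" if abs(r) > strong else "weak to moderate"
    return f"{size} {direction}"


# Made-up scores in which B rises in step with A.
scores_a = [1, 2, 3, 4]
scores_b = [2, 4, 6, 8]
r = pearson_r(scores_a, scores_b)
print(f"r = {r:.2f}: {describe_strength(r)}")
```

Because there are no hard rules for labelling the size of a coefficient, the two cut-offs are left as parameters so a reader can substitute the convention used in their own field.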
It is also important to note that there are no hard rules about labelling the size of a correlation coefficient. Statisticians generally do not get excited about a correlation until it is greater than r = 0.30 or less than r = -0.30.

The correlational statistical technique usually accompanies correlational designs. In a correlational design, the experimenter typically has little or no control over the variables to be studied. The variables may be statistically analysed long after they were initially produced or measured. Such data are called archival. The experimenter no longer has any experimental power to control the gathering of the data. The data have already been gathered, and the experimenter now has only statistical power in his or her control. Cronbach (1967), an American statistician, stated well the difference between the experimental and correlational techniques: “… the experimentalist [is] an expert puppeteer, able to keep untangled the strands to half-a-dozen independent variables. The correlational psychologist is a mere observer of a play where Nature pulls a thousand strings.”

8.4 POSITIVE AND NEGATIVE CORRELATION
To calculate correlation, one must first determine the covariance of the two variables in question. Next, one must calculate each variable's standard deviation. The correlation
coefficient is determined by dividing the covariance by the product of the two variables' standard deviations. Standard deviation is a measure of the dispersion of data from its average. Covariance is a measure of how two variables change together. However, its magnitude is unbounded, so it is difficult to interpret. By dividing covariance by the product of the two standard deviations, one can calculate the normalized version of the statistic. This is the correlation coefficient, written ρ below: $\rho_{XY} = \operatorname{cov}(X, Y) / (\sigma_X \sigma_Y)$.

Positive Correlation
A positive correlation, in which the correlation coefficient is greater than 0, signifies that both variables move in the same direction. When ρ is +1, the two variables being compared have a perfect positive relationship; when one variable moves higher or lower, the other variable moves in the same direction by a proportional amount. The closer the value of ρ is to +1, the stronger the linear relationship. For example, suppose the value of oil prices is directly related to the prices of airplane tickets, with a correlation coefficient of +0.95. The relationship between oil prices and airfares has a very strong positive correlation since the value is close to +1. So if the price of oil decreases, airfares also decrease.

Negative Correlation
A negative (inverse) correlation occurs when the correlation coefficient is less than 0. This indicates that the two variables move in opposite directions. In short, any reading between 0 and -1 means that the two securities move in opposite directions. When ρ is -1, the relationship is said to be perfectly negatively correlated: if one variable increases, the other variable decreases by a proportional amount (and vice versa). However, the degree to which two securities are negatively correlated might vary over time (and they are almost never exactly correlated all the time). For example, suppose a study is conducted to assess the relationship between outside temperature and heating bills.
The study concludes that there is a negative correlation between heating bills and the outdoor temperature. The correlation coefficient is calculated to be -0.96. This strong negative correlation signifies that as the temperature decreases outside, heating bills increase (and vice versa).

When it comes to investing, negative correlation doesn't necessarily mean that the securities should be avoided. The correlation coefficient can help investors diversify their portfolio by
including a mix of investments that have a negative, or low, correlation to the stock market. In short, when reducing volatility risk in a portfolio, sometimes opposites do attract. For example, assume you have a $100,000 balanced portfolio that is invested 60% in stocks and 40% in bonds. In a year of strong economic performance, the stock component of your portfolio might generate a return of 12%, while the bond component may return -2% because interest rates are rising (which means that bond prices are falling). Thus, the overall return on your portfolio would be 6.4% ((12% x 0.6) + (-2% x 0.4)). The following year, as the economy slows markedly and interest rates are lowered, your stock portfolio might generate -5% while your bond portfolio may return 8%, giving you an overall portfolio return of 0.2% ((-5% x 0.6) + (8% x 0.4)).

What if, instead of a balanced portfolio, your portfolio was 100% equities? Using the same return assumptions, your all-equity portfolio would have a return of 12% in the first year and -5% in the second year. These figures are clearly more volatile than the balanced portfolio's returns of 6.4% and 0.2%.

8.5 ASSUMPTION OF CORRELATION
The use of correlation relies on some underlying assumptions. The data are assumed to be independent and to have been randomly selected from the population; the two variables are assumed to be normally distributed; the data are assumed to be homoscedastic (homogeneous), that is, to have the same standard deviation in different groups, whereas heteroscedastic data have different standard deviations in different groups; and the relationship between the two variables is assumed to be linear. The correlation coefficient is also unsatisfactory, and the association between the variables is difficult to interpret, when the data contain outliers. An inspection of a scatterplot can give an impression of whether two variables are related and the direction of their relationship.
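The balanced-portfolio arithmetic from the investing example in Section 8.4 can be checked with a short sketch. The function name `portfolio_return` is hypothetical; the weights and yearly returns are the ones quoted in that example, and the calculation is a plain weighted average, not a full risk model.

```python
def portfolio_return(weights, returns):
    """Weighted average return (in percent) of a portfolio.

    weights: fraction of the portfolio in each asset (must sum to 1)
    returns: return of each asset over the period, in percent
    """
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * r for w, r in zip(weights, returns))


balanced = [0.6, 0.4]      # 60% stocks, 40% bonds
all_equity = [1.0, 0.0]    # 100% stocks
year_1 = [12.0, -2.0]      # stocks +12%, bonds -2%
year_2 = [-5.0, 8.0]       # stocks -5%, bonds +8%

for label, returns in (("year 1", year_1), ("year 2", year_2)):
    print(
        f"{label}: balanced {portfolio_return(balanced, returns):.1f}%, "
        f"all-equity {portfolio_return(all_equity, returns):.1f}%"
    )
# year 1: balanced 6.4%, all-equity 12.0%
# year 2: balanced 0.2%, all-equity -5.0%
```

The swing between the two years is 6.2 percentage points for the balanced portfolio against 17 for the all-equity one, which is the reduction in volatility the example describes.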
But it alone is not sufficient to determine whether there is an association between two variables. The relationship depicted in the scatterplot needs to be described quantitatively. Descriptive statistics that express the degree of relation between two variables are called correlation coefficients. Commonly employed correlation coefficients are the Pearson correlation, the Kendall rank correlation and the Spearman correlation.

Correlation is used to examine the presence of a linear relationship between two variables, provided certain assumptions about the data are satisfied. The results of the analysis, however, need to be interpreted with care, particularly when looking for a causal relationship.

8.6 CORRELATION COEFFICIENTS: PRODUCT MOMENT METHOD AND RANK ORDER METHOD
Correlation is a bivariate analysis that measures the strength of association between two variables. In statistics, the value of the correlation coefficient varies between +1 and -1. When the value of the correlation coefficient lies around ± 1, then it is said to be a perfect
degree of association between the two variables. As the correlation coefficient value goes towards 0, the relationship between the two variables will be weaker. Usually, in statistics, we measure three types of correlations: Pearson correlation, Kendall rank correlation and Spearman correlation.

Pearson r correlation: Pearson correlation is widely used in statistics to measure the degree of the relationship between linearly related variables. For example, in the stock market, if we want to measure how two commodities are related to each other, Pearson correlation is used to measure the degree of relationship between the two commodities. The following formula is used to calculate the Pearson correlation coefficient:

$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$

Where:
N = the number of pairs of scores
Σxy = the sum of the products of paired scores
Σx = the sum of x scores
Σy = the sum of y scores
Σx² = the sum of squared x scores
Σy² = the sum of squared y scores

An example of calculating Pearson's correlation
An experiment conducted on 9 different cigarette smoking subjects resulted in the following data:

Subject   Cigarettes smoked per week                        Number of years lived
          (averaged over the last 5 years of their life)
1         25                                                63
2         35                                                68
3         10                                                72
4         40                                                62
5         85                                                65
6         75                                                46
7         60                                                51
8         45                                                60
9         50                                                55

Calculate the coefficient of correlation between the number of cigarettes smoked and the longevity of a test subject.

Solution
Let us first assign random variables to our data in the following way:
x = the number of cigarettes smoked
y = years lived

We’ll be using the single formula for discrete data points here:

$$r_{xy} = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{N\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{N\sum y_i^2 - \left(\sum y_i\right)^2}}$$

Let us now construct a table to compute all the values we are going to use in our correlation formula. Note that N here = 9.

x      x²      y      y²      xy
25     625     63     3969    1575
35     1225    68     4624    2380
10     100     72     5184    720
40     1600    62     3844    2480
85     7225    65     4225    5525
75     5625    46     2116    3450
60     3600    51     2601    3060
45     2025    60     3600    2700
50     2500    55     3025    2750

Σxi = 425, Σxi² = 24525, Σyi = 542, Σyi² = 33188, Σxiyi = 24640
(Σxi)² = 425² = 180625, (Σyi)² = 542² = 293764

Substituting these values into the formula gives

$$r_{xy} = \frac{9(24640) - (425)(542)}{\sqrt{9(24525) - 180625}\,\sqrt{9(33188) - 293764}} = \frac{-8590}{\sqrt{40100}\,\sqrt{4928}} \approx -0.61$$

that is, a negative correlation between the number of cigarettes smoked and the number of years lived.

Spearman rank correlation: Spearman rank correlation is a non-parametric test that is used to measure the degree of association between two variables. It was developed by Spearman, thus it is called the Spearman rank correlation. The Spearman rank correlation test does not make any assumptions about the distribution of the data and is the appropriate correlation analysis when the
variables are measured on a scale that is at least ordinal. The following formula is used to calculate the Spearman rank correlation coefficient:

$$\rho = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)}$$

where d_i is the difference between the two ranks of each observation and n is the number of observations.

An example of calculating Spearman's correlation
To calculate a Spearman rank-order correlation on data without any ties we will use the following data:

Marks
English   56   75   45   71   62   64   58   80   76   61
Maths     66   70   40   60   65   56   59   77   67   63

We then complete the following table:

English (mark)   Maths (mark)   Rank (English)   Rank (Maths)   d    d²
56               66             9                4              5    25
75               70             3                2              1    1
45               40             10               10             0    0
71               60             4                7              3    9
62               65             6                5              1    1
64               56             5                9              4    16
58               59             8                8              0    0
80               77             1                1              0    0
76               67             2                3              1    1
61               63             7                6              1    1

Where d = difference between ranks and d² = difference squared.

We then calculate the following:

$$\sum d_i^2 = 25 + 1 + 0 + 9 + 1 + 16 + 0 + 0 + 1 + 1 = 54$$

We then substitute this into the main equation with the other information as follows, as n = 10:

$$\rho = 1 - \frac{6 \times 54}{10\left(10^2 - 1\right)} = 1 - \frac{324}{990} \approx 0.67$$

Hence, we have a ρ (or rs) of 0.67. This indicates a strong positive relationship between the ranks individuals obtained in the maths and English exams. That is, the higher you ranked in maths, the higher you ranked in English also, and vice versa.

8.7 USES OF CORRELATION
Correlation is a widely used analysis tool which sometimes is applied inappropriately. Some caveats regarding the use of correlation methods follow.

1. The correlation methods discussed in this chapter should be used only with independent data; they should not be applied to repeated measures data where the data are not independent. For example, it would not be appropriate to use these measures of
correlation to describe the relationship between Week 4 and Week 8 blood pressures in the same patients.

2. Caution should be used in interpreting results of correlation analysis when large numbers of variables have been examined, resulting in a large number of correlation coefficients.

3. The correlation of two variables that both have been recorded repeatedly over time can be misleading and spurious. Time trends should be removed from such data before attempting to measure correlation.

4. To extend correlation results to a given population, the subjects under study must form a representative (i.e., random) sample from that population. The Pearson correlation coefficient can be very sensitive to outlying observations and all correlation coefficients are susceptible to sample selection biases.

5. Care should be taken when attempting to correlate two variables where one is a part and one represents the total. For example, we would expect to find a positive correlation between height at age ten and adult height because the second quantity "contains" the first quantity.

6. Correlation should not be used to study the relation between an initial measurement, X, and the change in that measurement over time, Y - X. X will be correlated with Y - X due to the regression to the mean phenomenon.

7. Small correlation values do not necessarily indicate that two variables are unassociated. For example, Pearson's r will underestimate the association between two variables that show a quadratic relationship. Scatterplots should always be examined.

8. Correlation does not imply causation. If a strong correlation is observed between two variables A and B, there are several possible explanations: (a) A influences B; (b) B influences A; (c) A and B are influenced by one or more additional variables; (d) the relationship observed between A and B was a chance error.

9. "Regular" correlation coefficients are often published when the researcher really intends to compare two methods of measuring the same quantity with respect to their agreement. This is a misguided analysis, because correlation measures only the degree of association; it does not measure agreement. The next section of this lesson will present a measure of agreement.

8.8 SUMMARY
Correlation is a measure of the strength of a linear relationship between two quantitative variables (e.g., height, weight). This unit defined positive and negative correlations, illustrated them with examples, and explained how to measure correlation.
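As a recap of the two coefficients worked through in Section 8.6, the following sketch recomputes both examples from their raw data. The function names are hypothetical, the Pearson function uses the raw-sums formula from that section, and the Spearman function assumes, as the example does, that there are no tied scores.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def ranks_descending(values):
    """Rank the values so that the highest score gets rank 1 (no ties allowed)."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied scores")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Rank order correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    rank_x, rank_y = ranks_descending(xs), ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Data from the two worked examples in Section 8.6.
cigarettes = [25, 35, 10, 40, 85, 75, 60, 45, 50]
years_lived = [63, 68, 72, 62, 65, 46, 51, 60, 55]
english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]

print(round(pearson_r(cigarettes, years_lived), 2))  # -0.61
print(round(spearman_rho(english, maths), 2))        # 0.67
```

Real data usually contain ties, which need averaged ranks; a statistics package would normally be used in that case rather than this simplified ranking.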
Correlations are useful because if you can find out what relationship variables have, you can make predictions about future behaviour. Knowing what the future holds is very important in the social sciences and in fields such as government and healthcare. Businesses also use these statistics for budgets and business plans.

8.9 KEY WORDS/ ABBREVIATIONS
Correlation- 1. The degree of relationship between two or more variables. 2. A mathematical index of association between two or more variables.
Correlation matrix- A square matrix whose margins are identical lists of variables, which presents the correlation between each pair of variables in the cell at the intersection of the row and column of that pair. The left-to-right diagonal of such a matrix represents the correlation of a variable with itself, which would be 1, but is often filled with a reliability correlation of the variable involved.
Correlation, multiple- The degree of relationship between one variable and two or more other variables, usually measured with a linear equation and indicated by the symbol R.
Correlation, negative- The degree of inverse relationship between two variables, usually indicated by a minus sign with a correlation coefficient.
Correlation, Pearson product-moment- The most commonly used correlation coefficient, which measures the degree of linear relationship between two continuous variables scaled so that 0 indicates no relationship and +1 indicates a perfect positive relationship while −1 indicates a perfect inverse relationship.
Correlation, positive- An index of the degree of relationship between two variables in which an increase in one predicts an increase in the other and a decrease in one predicts a decrease in the other.
Pearson product-moment correlation- A numerical index of shared, linear relationship between two variables.
The Pearson product-moment correlation is the most widely used correlation coefficient and is appropriate for use with ratio and interval data. It shows relationship on a scale where 0 is no relationship, +1 is a perfect positive relationship, and −1 is a perfect negative relationship.

8.10 LEARNING ACTIVITY
1. Write a note on the concept of correlation and its application.
__________________________________________________________________________
__________________________________________________________________________
2. With the help of appropriate diagrams show positive, negative and zero correlation.
__________________________________________________________________________
__________________________________________________________________________

8.11 UNIT END QUESTIONS (MCQS AND DESCRIPTIVE)

A. Descriptive Questions
1. What is correlation?
2. Discuss correlation co-efficient.
3. Explain strong correlation.
4. Describe weak correlation.
5. Outline the uses of correlation.

B. Multiple Choice Questions (MCQs)

1. +0.76 is ________________________
a. Strong correlation
b. Negative Correlation
c. Zero Correlation
d. None of these

2. -0.26 is _____________________
a. Strong correlation
b. Negative Correlation
c. Zero Correlation
d. None of these

3. Correlation is __________________________
a. Measure of central tendency
b. Measure of variation
c. Parametric test
d. Non-Parametric test
4. To obtain inter-item correlation, which one of the following correlation coefficients should be used in the above analysis?
a. Biserial Correlation
b. Point biserial correlation
c. Phi-co-efficient
d. Rank difference correlation

5. In the above context, which of the following correlations should be computed to obtain item-remainder correlations?
a. Biserial Correlation
b. Point biserial correlation
c. Phi-co-efficient
d. Rank difference correlation

Answer
1a 2b 3d 4d 5b

8.12 SUGGESTED READINGS
Introduction to Social Research: Quantitative and Qualitative Approaches by Keith F Punch
Research Methodology: Methods and Techniques by C. R. Kothari
Research Methodology by D K Bhattacharyya
Research Methodology: A Step-by-Step Guide for Beginners by Ranjit Kumar
Research Methodology by P. Sam Daniel, Aroma G. Sam
UNIT 9 REGRESSION

Structure
9.0. Learning Objectives
9.1. Introduction
9.2. What is regression?
9.3. Linear regression Analysis
9.4. Multiple Regression Analysis
9.5. Uses of Regression
9.6. Summary
9.7. Key Words/ Abbreviations
9.8. Learning Activity
9.9. Unit End Questions (MCQs and Descriptive)
9.10. Suggested Readings

9.0 LEARNING OBJECTIVES
After studying this unit, you will be able to:
State the concept of regression
Discuss linear regression
Describe multiple regression
Outline the uses of regression

9.1 INTRODUCTION
The regression model is a statistical procedure that allows a researcher to estimate the linear, or straight line, relationship that relates two or more variables. This linear relationship summarizes the amount of change in one variable that is associated with change in another variable or variables. The model can also be tested for statistical significance, to test whether the observed linear relationship could have emerged by chance or not. In this section, the two variable linear regression model is discussed. In a second course in statistical methods, multivariate regression, with relationships among several variables, is examined.

The two variable regression model assigns one of the variables the status of an independent variable, and the other variable the status of a dependent variable. The independent variable may be regarded as causing changes in the dependent variable, or the independent variable may occur prior in time to the dependent variable. It will be seen that the researcher cannot be certain of a causal relationship, even with the regression model. However, if the researcher has reason to make one of the variables an independent variable, then the manner in which
this independent variable is associated with changes in the dependent variable can be estimated.

9.2 WHAT IS REGRESSION?
In statistics, it’s hard to stare at a set of random numbers in a table and try to make any sense of it. For example, global warming may be reducing average snowfall in your town and you are asked to predict how much snow you think will fall this year. Looking at the following table you might guess somewhere around 10-20 inches. That’s a good guess, but you could make a better guess by using regression.

Essentially, regression is the “best guess” at using a set of data to make some kind of prediction. It’s fitting a line to a set of points on a graph. There’s a whole host of tools that can run regression for you, including Excel, which I used here to help make sense of that snowfall data:
Just by looking at the regression line running down through the data, you can fine-tune your best guess a bit. You can see that the original guess (20 inches or so) was way off. For 2015, it looks like the line will be somewhere between 5 and 10 inches! That might be “good enough”, but regression also gives you a useful equation, which for this chart is:

y = -2.2923x + 4624.4

What that means is you can plug in an x value (the year) and get a pretty good estimate of snowfall for any year. For example, 2005:

y = -2.2923(2005) + 4624.4 = 28.3385 inches,

which is pretty close to the actual figure of 30 inches for that year. Best of all, you can use the equation to make predictions. For example, how much snow will fall in 2017?

y = -2.2923(2017) + 4624.4 = 0.8 inches.

Regression also gives you an R squared value, which for this graph is 0.702. This number tells you how good your model is. The values range from 0 to 1, with 0 being a terrible model and 1 being a perfect model. As you can probably see, 0.7 is a fairly decent model so you can be fairly confident in your weather prediction!

9.3 LINEAR REGRESSION ANALYSIS
Linear Relationships
In the regression model, the independent variable is labelled the X variable, and the dependent variable the Y variable. The relationship between X and Y can be shown on a graph, with the independent variable X along the horizontal axis, and the dependent variable Y along the vertical axis. The aim of the regression model is to determine the straight-line relationship that connects X and Y. The straight line connecting any two variables X and Y can be stated algebraically as
Y = a + bX

where a is called the Y intercept, or simply the intercept, and b is the slope of the line. If the intercept and slope for the line can be determined, then this entirely determines the straight line.

In real life we know that although the equation makes a prediction of the true mean of the outcome for any fixed value of the explanatory variable, it would be unwise to use extrapolation to make predictions outside of the range of x values that we have available for study. On the other hand it is reasonable to interpolate, i.e., to make predictions for unobserved x values in between the observed x values. The structural model is essentially the assumption of “linearity”, at least within the range of the observed explanatory data.

It is important to realize that the “linear” in “linear regression” does not imply that only linear relationships can be studied. Technically it only says that the coefficients (the betas, here a and b) must not be in a transformed form. It is OK to transform x or Y, and that allows many non-linear relationships to be represented on a new scale that makes the relationship linear.

The error model that we use is that for each particular x, if we have or could collect many subjects with that x value, their distribution around the population mean is Gaussian with a spread, say σ², that is the same value for each value of x (and corresponding population mean of y). Of course, the value of σ² is an unknown parameter, and we can make an estimate of it from the data. The error model described so far includes not only the assumptions of “Normality” and “equal variance”, but also the assumption of “fixed-x”. The “fixed-x” assumption is that the explanatory variable is measured without error. Sometimes this is possible, e.g., if it is a count, such as the number of legs on an insect, but usually there is some error in the measurement of the explanatory variable. In practice, we need to be sure
that the size of the error in measuring x is small compared to the variability of Y at any given x value. For more on this topic, see the section on robustness, below.

9.4 MULTIPLE REGRESSION ANALYSIS
Multiple regression analysis is used to see if there is a statistically significant relationship between sets of variables. It’s used to find trends in those sets of data. Multiple regression analysis is almost the same as simple linear regression. The only difference between simple linear regression and multiple regression is in the number of predictors (“x” variables) used in the regression.

Simple regression analysis uses a single x variable for each dependent “y” variable. For example: (x1, Y1).
Multiple regression uses multiple “x” variables for each dependent variable: ((x1)1, (x2)1, (x3)1, Y1).

In one-variable linear regression, you would input one dependent variable (e.g. “sales”) against an independent variable (e.g. “profit”). But you might be interested in how different types of sales affect the regression. You could set your X1 as one type of sales, your X2 as another type of sales and so on.

When to Use Multiple Regression Analysis.
Ordinary linear regression usually isn’t enough to take into account all of the real-life factors that have an effect on an outcome. For example, the following graph plots a single variable (number of doctors) against another variable (life-expectancy of women). From this graph it might appear there is a relationship between life-expectancy of women and the number of doctors in the population. In fact, that’s probably true and you could say it’s a
simple fix: put more doctors into the population to increase life expectancy. But the reality is you would have to look at other factors, like the possibility that doctors in rural areas might have less education or experience. Or perhaps they lack access to medical facilities like trauma centers. The addition of those extra factors would cause you to add additional independent variables to your regression analysis and create a multiple regression analysis model.

Multiple Regression Analysis Output

Regression analysis is almost always performed in software, like Excel or SPSS. The output differs according to how many variables you have, but it’s essentially the same type of output you would find in a simple linear regression. There’s just more of it:

Simple regression: Y = b0 + b1 x.
Multiple regression: Y = b0 + b1 x1 + b2 x2 + … + bn xn.

The output would include a summary, similar to a summary for simple linear regression, that includes:
- R (the multiple correlation coefficient),
- R squared (the coefficient of determination),
- adjusted R-squared,
- the standard error of the estimate.

These statistics help you figure out how well a regression model fits the data. The ANOVA table in the output would give you the p-value and F-statistic.

Minimum Sample Size

“The answer to the sample size question appears to depend in part on the objectives of the researcher, the research questions that are being addressed, and the type of model being utilized. Although there are several research articles and textbooks giving recommendations for minimum sample sizes for multiple regression, few agree on how large is large enough and not many address the prediction side of MLR.” ~ Gregory T. 
Knofczynski

If you’re concerned with finding accurate values for the squared multiple correlation coefficient, minimizing the shrinkage of the squared multiple correlation coefficient, or have another specific goal, Gregory Knofczynski’s paper is a worthwhile read and comes with lots of references for further study. That said, many people just want to run MLR (multiple linear regression) to get a general idea of trends and
they don’t need very specific estimates. If that’s the case, you can use a rule of thumb. It’s widely stated in the literature that you should have more than 100 items in your sample. While this is sometimes adequate, you’ll be on the safer side if you have at least 200 observations or, better yet, more than 400.

9.5 USES OF REGRESSION ANALYSIS

Regression analysis refers to a method of mathematically sorting out which variables may have an impact. The importance of regression analysis for a small business is that it helps determine which factors matter most, which it can ignore, and how those factors interact with each other. It provides a powerful statistical method that allows a business to examine the relationship between two or more variables of interest.

The benefits of regression analysis are manifold. The regression method of forecasting is used, as the name implies, for forecasting and for finding relationships between variables (which may or may not be causal). A closely related concept is linear regression, which is a procedure for modeling the value of one variable on the value(s) of one or more other variables. Understanding these methods can help a small business, and indeed any business, gain a far greater understanding of the variables (or factors) that can impact its success in the coming weeks, months and years.

Why Regression Analysis Is Important

The importance of regression analysis is that it is all about data: data means the numbers and figures that actually define your business. The advantage of regression analysis is that it allows you to crunch the numbers to help you make better decisions for your business, now and in the future.
The regression method of forecasting means studying the relationships between data points, which can help you to:
- Predict sales in the near and long term.
- Understand inventory levels.
- Understand supply and demand.
- Review and understand how different variables impact all of these things.

Companies might use regression analysis to understand, for example:
- Why customer service calls dropped in the past year or even the past month.
- What sales will look like in the next six months.
- Whether to choose one marketing promotion over another.
- Whether to expand the business or create and market a new product.

The benefit of regression analysis is that it can be used to understand all kinds of patterns that occur in data. These new insights may often be very valuable in understanding what can make a difference in your business.

How Is Regression Analysis Used in Forecasting?

The regression method of forecasting involves examining the relationship between two different variables, known as the dependent and independent variables. Suppose that you want to forecast future sales for your firm and you've noticed that sales rise or fall depending on whether the gross domestic product goes up or down. (The gross domestic product, or GDP, is the sum of all goods and services produced within a nation's borders. In the U.S., it is calculated quarterly by the Commerce Department.)

Your sales, then, would be the dependent variable, because they "depend" on the GDP, which is the independent variable. (An independent variable is the variable against which you measure something else, in this case your sales.) You would need to figure out how closely these two variables, sales and GDP, are related. If the GDP goes up 2 percent, how much do your sales rise?

Regression Analysis Example

Though this sounds complicated, it's actually fairly simple. You could look back at the activity of the GDP in the last quarter, or in the last three-month period, and compare it to your sales figure. For instance, the government reported that the GDP grew 2.6 percent in the fourth quarter of 2018. If your sales rose 5.2 percent during that same period, you'd have a first indication that your sales rise at twice the rate of GDP growth because:

5.2 percent (your sales) / 2.6 percent (GDP) = 2

The "2" means that your sales are rising at twice the rate of the GDP.
You might want to go back a couple more quarters to be sure this trend continues, say for an entire year. The approach is the same whether you sell car parts, wheat, or forklifts. If your sales do rise at twice the rate of GDP growth, then if the GDP increases 4 percent the next quarter, your sales will likely rise 8 percent. If the GDP goes up 3 percent, your sales would likely rise 6 percent, and so on. In this way, regression analysis can be a valuable tool for forecasting sales and can help you determine whether you need to increase supplies, labor, production hours, and any number of other factors.
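The ratio above rests on a single quarter. One way to go back over several quarters, sketched here in Python, is to estimate the sales-to-GDP multiplier from all of them at once using a least-squares line through the origin. Only the 2.6 / 5.2 pair comes from the text; the other quarterly figures and the function names are invented for illustration.

```python
# Hypothetical (GDP growth %, sales growth %) pairs, one per quarter.
# Only the first pair (2.6, 5.2) comes from the text; the rest are made up.
QUARTERS = [(2.6, 5.2), (2.0, 4.1), (3.1, 6.0), (1.5, 3.1)]


def sales_multiplier(pairs):
    """Least-squares slope of a line through the origin: sum(x*y) / sum(x*x)."""
    sum_xy = sum(gdp * sales for gdp, sales in pairs)
    sum_xx = sum(gdp * gdp for gdp, _ in pairs)
    return sum_xy / sum_xx


def forecast_sales_growth(pairs, gdp_growth):
    """Forecast sales growth (%) for a given GDP growth (%)."""
    return sales_multiplier(pairs) * gdp_growth


if __name__ == "__main__":
    print(f"multiplier: {sales_multiplier(QUARTERS):.2f}")
    print(f"forecast for 4% GDP growth: {forecast_sales_growth(QUARTERS, 4.0):.1f}%")
```

With these made-up figures the multiplier comes out close to 2, so the forecast for 4 percent GDP growth is close to the 8 percent given above. Forcing the line through the origin is a simplifying assumption (no GDP growth, no sales growth); a fit with an intercept would drop that assumption.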
Using Regression Analysis to Formulate Strategies

It's important to understand that a regression analysis is, essentially, a statistical problem. Businesses have adopted many concepts from statistics because they can prove valuable in helping a company determine any number of important things and then make informed, well-studied decisions based on various aspects of data. And data, according to Merriam-Webster, is merely factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.

Regression analysis uses data, specifically two or more variables, to provide some idea of where future data points will be. The benefit of regression analysis is that this type of statistical calculation gives businesses a way to estimate what lies ahead. The regression method of forecasting allows businesses to use specific strategies so that those predictions, such as future sales, future needs for labor or supplies, or even future challenges, will yield meaningful information.

The Five Applications of Regression Analysis

The regression analysis method of forecasting generally involves five basic applications. There are more, but businesses that believe in the advantages of regression analysis generally use the following:

Predictive analytics: This application, which involves forecasting future opportunities and risks, is the most widely used application of regression analysis in business. For example, predictive analytics might involve demand analysis, which seeks to predict the number of items that consumers will purchase in the future. Using statistical formulas, predictive analytics might predict the number of shoppers who will pass in front of a given billboard and then use that information to place billboards where they will be the most visible to potential shoppers. Insurance companies use predictive analysis to estimate the credit standing of policyholders and the likely number of claims in a given time period.
Operational efficiency: Companies use this application to optimize business processes. For example, a factory manager might use regression analysis to see what the impact of oven temperature will be on loaves of bread baked in those ovens, such as how long their shelf life might be. Or a call center can use regression analysis to see the relationship between the wait times of callers and the number of complaints they register. This kind of data-driven decision-making can reduce guesswork and make the pursuit of efficiency less about gut instinct and more about well-crafted predictions based on real data.

Supporting decisions: Many companies and their top managers today are using regression analysis (and other kinds of data analytics) to make informed business decisions and reduce reliance on guesswork and gut intuition. Regression helps businesses adopt a scientific angle in
their management strategies. There is often too much data bombarding both small and large businesses. Regression analysis helps managers sift through the data and pick the right variables to make the most informed decisions.

Correcting errors: Even the most informed and careful managers make mistakes in judgment. Regression analysis helps managers, and businesses in general, recognize and correct errors. Suppose, for example, a retail store manager feels that extending shopping hours will increase sales. Regression analysis may show that the modest rise in sales would not be enough to offset the increased cost of labor and operating expenses (such as using more electricity). Using regression analysis could help the manager determine that an increase in hours would not lead to an increase in profits, and so avoid a costly mistake.

New insights: Looking at the data can provide new and fresh insights. Many businesses gather lots of data about their customers. But that data is of little use without proper analysis, such as regression, which can help find the relationship between different variables to uncover patterns. For example, looking at the data through regression analysis might indicate a spike in sales during certain days of the week and a drop in sales on others. Managers could then make adjustments to compensate, such as making sure to maintain stock on those days, bringing in extra help, or even ensuring that the best sales or service people are working on those days.

What Is the Significance of Regression Analysis in Business?

Regression analysis, then, is clearly significant in business because it is a statistical method that allows firms, and their managers, to make better-informed decisions based on hard numbers.
As Amy Gallo notes in the Harvard Business Review:

"In order to conduct a regression analysis, you gather the data on the variables in question....You take all of your monthly sales numbers for, say, the past three years and any data on the independent variables you’re interested in. So, in this case, let’s say you find out the average monthly rainfall for the past three years. . . Glancing at this data, you probably notice that sales are higher on days when it rains a lot. That’s interesting to know - but by how much? If it rains 3 inches, do you know how much you’ll sell? What about if it rains 4 inches?"

Regression analysis is significant, then, because it forces you, or any business, to take a look at the actual data, rather than simply guessing. In Gallo's example, a business would plot the points showing monthly rainfall for the past three years. That would be the independent variable. Then, you would look at the monthly sales figures for the business for the past three years, which is the dependent variable. In essence, you're saying rising or falling sales depend on the amount of rainfall in a given month.
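Gallo's two questions (how much will you sell if it rains 3 inches, or 4?) are what a fitted line answers. The following Python sketch fits sales = a + b × rainfall by ordinary least squares. The monthly figures and the names `fit_line` and `predict` are assumptions made for illustration, not data from the article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # slope: change in y per unit of x
    a = mean_y - b * mean_x  # intercept: predicted y when x is 0
    return a, b


def predict(a, b, x):
    """Predicted y on the fitted line for a given x."""
    return a + b * x


# Hypothetical data: monthly rainfall (inches) and units sold.
rainfall = [0.5, 1.0, 2.0, 2.5, 3.5, 4.5]
sales = [110, 118, 131, 140, 152, 171]

a, b = fit_line(rainfall, sales)
for inches in (3, 4):
    print(f"{inches} in of rain -> about {predict(a, b, inches):.0f} units")
```

Both 3 and 4 inches lie inside the observed range of rainfall, so these predictions are interpolations; predicting sales for, say, 10 inches would be an extrapolation beyond the data.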
Rain vs. Sales

Suppose your business is selling umbrellas, winter jackets, or spray-on waterproof coating. You might find that sales rise a bit when there are 2 inches of rain in a month. But you might also see that sales rise 25 percent or more during months of heavy rainfall, where there are more than 4 inches of rain. You could, then, be sure to stock up on umbrellas, winter jackets or spray-on waterproof coating during those heavy-rain months. You might also extend business hours during those months and possibly bring in more help.

The example shows the benefits of linear regression; that is, you fit a single straight line through the plotted points. The line might slope up or down, depending on how sales respond to rainfall, but you are essentially comparing two variables: monthly rainfall versus monthly sales. This type of linear regression gives you a clear, visual look at when a company's sales crest and fall.

This example may seem obvious: more rain equals more sales of umbrellas or other rain-related products. But it shows how any business can use regression analysis to make data-driven predictions about the future. Put another way, regression analysis can help your business avoid potentially costly gut-level decisions and instead base your decisions about the future on hard data, giving you a clearer, more accurate path forward.

9.6 SUMMARY

Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).

Regression helps investment and financial managers to value assets and understand the relationships between variables, such as commodity prices and the stocks of businesses dealing in those commodities.
The two basic types of regression are simple linear regression and multiple linear regression, although there are non-linear regression methods for more complicated data and analysis.

Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple linear regression uses two or more independent variables to predict the outcome.

Regression can help finance and investment professionals as well as professionals in other businesses.

Regression can also help predict sales for a company based on weather, previous sales, GDP growth, or other types of conditions.

The capital asset pricing model (CAPM) is an often-used regression model in finance for pricing assets and discovering costs of capital.
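To make the step from simple to multiple linear regression concrete, here is a minimal Python sketch that fits Y = b0 + b1 x1 + b2 x2 by solving the least-squares normal equations. The function name `fit_multiple` and the data (sales explained by rainfall and GDP growth) are hypothetical; the sales figures were constructed from sales = 50 + 10 × rainfall + 5 × GDP growth so that the fit has a known answer. In practice a statistics package would do this work and also report R squared, standard errors and p-values.

```python
def fit_multiple(rows, y):
    """Least squares for y = b0 + b1*x1 + ... + bk*xk.

    Builds the normal equations (X'X) b = X'y and solves them by
    Gauss-Jordan elimination with partial pivoting. Fails with a
    division by zero if the predictors are perfectly collinear.
    """
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    n = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Hypothetical observations: (rainfall in inches, GDP growth in %) per period.
predictors = [(1, 2), (2, 1), (3, 3), (4, 2), (5, 4)]
# Constructed as 50 + 10*rainfall + 5*gdp_growth, with no noise.
sales = [70, 75, 95, 100, 120]

b0, b1, b2 = fit_multiple(predictors, sales)
print(f"sales = {b0:.1f} + {b1:.1f}*rainfall + {b2:.1f}*gdp_growth")
```

Because the data contain no noise, the fitted coefficients recover the constructing equation; with real data the coefficients would only approximate the underlying relationship.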
9.7 KEY WORDS/ ABBREVIATIONS

chi-square test - A statistical test in which the sum of the squared deviations of observed frequencies from theoretically expected frequencies (each scaled by the expected frequency) is compared to that expected in a chi-square distribution, which tests whether or not the observed and expected frequencies are likely to be drawn from the same population.

factor analysis - Factor analysis is any of a set of analytic techniques applied to a group of observed variables seeking to discover a smaller set of artificial variables which capture or explain the relatedness of the observed variables. Artificial variables, called factors, are selected; they pass through the densest areas in the multidimensional space created by all the variables and are then rotated, usually to select the ones which collectively account for the most linear covariance and are most uncorrelated with one another. Some of these procedures can also be used to test whether an a priori factor structure is the best fit.

factorial analysis of variance - Factorial analysis of variance is an analysis of variance which compares the between-groups differences in the dependent variable associated with different levels in each independent variable and all combinations of the levels of the independent variables in an experimental design in which two or more independent variables are simultaneously and systematically varied so as to compare their individual and compounded influences on a dependent variable.

inferential statistics - Inferential statistics is the branch of statistics concerned with using samples to draw conclusions about populations by means of hypothesis testing.

9.8 LEARNING ACTIVITY

1. What is Regression? What are some of the applications of Regression Analysis?
___________________________________________________________________________
___________________________________________________________________________
2. What is over fitting? How can we detect and avoid it?
___________________________________________________________________________
___________________________________________________________________________

9.9 UNIT END QUESTIONS (MCQs AND DESCRIPTIVE)

A. Descriptive Questions

1. What is regression?
2. Discuss linear regression.
3. Explain multiple regression.
4. What is over fitting?
5. State the significance of regression analysis in business.

B. Multiple Choice Questions (MCQs)

1. ____________________ is when a model is too complicated for the data.
a. Over fitting
b. Regression Model
c. Regression
d. Linear Regression

2. ______________________ is a statistical procedure that allows a researcher to estimate the linear, or straight line, relationship that relates two or more variables.
a. Central Tendency
b. Regression Model
c. Tendency of variance
d. Inferential Statistics

3. What is a type of regression?
a. Multiple Regression
b. Least Square Regression Model
c. All of these
d. Linear Regression

4. How can we detect over fitting?
a. Cross-Validation
b. All of the above
c. Shrinkage & Resampling
d. Automated Methods
5. ________________ is the branch of statistics concerned with using samples to draw conclusions about populations by means of hypothesis testing.
a. Central Tendency
b. Regression
c. Regression
d. Inferential Statistics

Answer
1a 2b 3c 4b 5d

9.10 SUGGESTED READINGS

Introduction to Social Research: Quantitative and Qualitative Approaches by Keith F Punch
Research Methodology: Methods and Techniques by C. R. Kothari
Research Methodology by D K Bhattacharyya
Research Methodology: A Step-by-Step Guide for Beginners by Ranjit Kumar
Research Methodology by P. Sam Daniel, Aroma G. Sam
UNIT 10 USE OF TOOLS FOR RESEARCH

Structure
10.0. Learning Objectives
10.1. Introduction
10.2. Mendeley Reference Management
10.3. APA Referencing
10.4. Software for detection of Plagiarism
10.5. Summary
10.6. Key Words/ Abbreviations
10.7. Learning Activity
10.8. Unit End Questions (MCQs and Descriptive)
10.9. Suggested Readings

10.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:
- Explain Mendeley reference management
- Discuss APA referencing
- Identify the software used for plagiarism checks

10.1 INTRODUCTION

The software tools of research are typically more abundant than hardware tools in the social sciences. Software is usually taken to mean computer programs that tell the hardware what to do, but any tool not related to a physical device can be considered software. Included in this category are statistical software, consent forms, published tests, questionnaires, observation forms, and, to a lesser degree, the interview.

Simple statistical problems, such as determining the mean or the median of a small data set, can easily be done with a calculator. Most formulas that will be used in a research report, however, are a lot more complex. While a calculator will work, a statistical program can reduce the computation time by hours, days, or even weeks. Imagine trying to determine the mean, standard deviation, t-score, and z-score conversions of twelve data sets each containing 300 subjects. Even the best statistician will spend many hours on a project that could be done by a computer in a matter of minutes once the data is entered.

Difference between Data Analysis, Data Mining & Data Modeling

Data analysis is done with the purpose of finding answers to specific questions. Data analytics techniques are similar to business analytics and business intelligence.
Data mining is about finding the different patterns in data. For this, various mathematical and computational algorithms are applied to the data and new data is generated.

Data modeling is about how companies organize or manage their data. Here, various methodologies and techniques are applied to the data. Data analysis is required for data modeling.

This unit looks at some of the software tools used in research. The most widely used statistical software for social science research is the Statistical Package for the Social Sciences (SPSS), which is relatively easy to use if you have basic computer knowledge. SPSS can perform hundreds of statistical computations and even graph your data. Another program, SAS, also performs these functions and is gaining popularity with many researchers. Both, however, can be expensive to purchase, so it would be wise to use your school’s software or look into a student version.

10.2 MENDELEY REFERENCE MANAGER

Mendeley Reference Manager is a free web and desktop reference management application. It helps you simplify your reference management workflow so you can focus on achieving your goals. With Mendeley Reference Manager you can:
- Store, organize and search all your references from just one library.
- Seamlessly insert references and bibliographies into your Microsoft® Word documents using Mendeley Cite.
- Read, highlight and annotate PDFs, and keep all your thoughts across multiple documents in one place.
- Collaborate with others by sharing references and ideas.

What is Mendeley Cite?

The Mendeley Cite add-in for Microsoft® Word allows you to easily insert references from your Mendeley library into your Word document, change your citation style and generate a bibliography - all without leaving your document. You can use Mendeley Cite to:
- Search for references in your Mendeley library and insert them into the document you're working on.
- Select and insert individual or multiple references at once.
- Create a bibliography of all the references you've cited.
- Change to any of your preferred citation styles in just a few clicks.
- Cite without having Mendeley Reference Manager open or even installed - once you sign in to Mendeley Cite, your Mendeley library is downloaded from the cloud.
- Keep sight of your Word document at all times - Mendeley Cite opens as a separate panel in Word alongside your document window, not over it.

How To Use Mendeley?

Insert a citation
Position the cursor where you want to insert a citation in your document. Now go to the Mendeley Cite add-in window. On the 'References' tab in Mendeley Cite, select the check box of the reference(s) you wish to insert. Select ‘Insert citation’ to insert the reference into your document. If you wish to insert multiple references, simply select more check boxes. The citation will automatically update with the correct formatting.

Editing a reference in a citation
To edit references within a citation you have already created, position the cursor on the citation you want to edit and click to select it. In the edit panel of Mendeley Cite, you can now see your selected citation. You can now select the reference pill you wish to edit to open the attributes panel.

Creating a bibliography
After you have inserted one or more citations, you can use Mendeley Cite to automatically create a bibliography of all the references you’ve cited. Position the cursor where you want the bibliography to appear in your document and go to the Mendeley Cite add-in window. Select the 'More' menu and select the ‘Insert Bibliography’ button in the drop-down menu.

Choosing and changing citation styles
Mendeley Cite allows you to change the citation style you’re using - even after you’ve finished adding all of your citations and creating your bibliography.
If you need to resubmit to a different journal with a different style, this feature allows you to easily restyle your document to meet different specifications.
The appearance of your citations is determined by the style you currently have selected. To select a new style, go to the 'Citation Style' tab in the Mendeley Cite add-in window. The 'Citation Style' tab displays a list of all the citation styles you currently have installed. When you first access Mendeley Cite with your Mendeley account, the selected style will be APA 6th edition and the tab will display the top 10 most common citation styles. You can change the style by selecting any of the displayed styles and then selecting 'Update citation style'.

10.3 APA REFERENCING

APA referencing style is an author-date referencing system published by the American Psychological Association. There are two components in the APA referencing style: in-text citations and their corresponding reference list entries. With anything that you have read, used and referred to in your academic writing, you must:
- Acknowledge it in text (i.e., in the work / assignment / essay you are writing).
- Include it in your reference list (i.e., the list at the end of your work of all the sources you refer to).

How to cite references in APA format?

In-Text Citation

In-text references must be included following the use of a quote or paraphrase taken from another piece of work. In-text citations are citations within the main body of the text and refer to a direct quote or paraphrase. They correspond to a reference in the main reference list. These citations include the surname of the author and date of publication only.

One author: if there is a single author, James Mitchell, this takes the form:
Mitchell (2017) states… or … (Mitchell, 2017)

Two authors: the surnames of both authors are stated with either ‘and’ or an ampersand between them. For example:
Mitchell and Smith (2017) state… or … (Mitchell & Smith, 2017)

Three, four or five authors: for the first cite, all names should be listed:
Mitchell, Smith, and Thomson (2017) state… or … (Mitchell, Smith, & Thomson, 2017)
Further cites can be shortened to the first author’s surname followed by et al.:
Mitchell et al. (2017) state… or … (Mitchell et al., 2017)
Six or more authors: only the first author’s surname should be stated, followed by et al., even for the first cite. For example: (Mitchell et al., 2017)

If the author is unknown, the first few words of the reference should be used. This is usually the title of the source. If this is the title of a book, periodical, brochure or report, it should be italicized. For example:
(A guide to citation, 2017)
If this is the title of an article, chapter or web page, it should be in quotation marks. For example:
(“APA Citation”, 2017)

Multiple works by the same author in the same year should be cited with a, b, c etc. following the date. These letters are assigned within the reference list, which is sorted alphabetically by the surname of the first author. For example:
(Mitchell, 2017a) or (Mitchell, 2017b)

How to Cite Different Source Types

In-text citation doesn’t vary depending on source type, unless the author is unknown. Reference list citations are highly variable depending on the source.

Referencing Books

Book referencing is the most basic style; it follows the pattern of author, year, title, publisher location and publisher, with no URL section. Examples of book references are as follows:

Book referencing examples:
Mitchell, J.A., Thomson, M., & Coyne, R.P. (2017). A guide to citation. London, England: My Publisher
Jones, A.F & Wang, L. (2011). Spectacular creatures: The Amazon rainforest (2nd ed.). San Jose, Costa Rica: My Publisher

Edited book example:
Williams, S.T. (Ed.). (2015). Referencing: A guide to citation rules (3rd ed.). New York, NY: My Publisher

Edited book chapter example:
In the following example, B.N. Troy is the author of the chapter and S.T. Williams is the editor.
Troy, B.N. (2015). APA citation rules. In S.T. Williams (Ed.). A guide to citation rules (2nd ed., pp. 50-95). New York, NY: Publishers.

An E-Book reference is the same as a book reference except the publisher is swapped for a URL. The basic structure is as follows:
Author surname, initial(s) (Ed(s).*). (Year). Title (ed.*). Retrieved from URL
*optional.

E-Book example:
Mitchell, J.A., Thomson, M., & Coyne, R.P. (2017). A guide to citation. Retrieved from https://www.mendeley.com/reference-management/reference-manager

E-Book chapter example:
Troy, B.N. (2015). APA citation rules. In S.T. Williams (Ed.). A guide to citation rules (2nd ed., pp. 50-95). Retrieved from https://www.mendeley.com/reference-management/reference-manager

Referencing Journal Articles

Articles differ from book citations in that the publisher and publisher location are not included. For journal articles, these are replaced with the journal title, volume number, issue number and page numbers.

Journal Article Examples:
Mitchell, J.A. (2017). Citation: Why is it so important. Mendeley Journal, 67(2), 81-95
Mitchell, J.A. (2017). Citation: Why is it so important. Mendeley Journal, 67(2), 81-95. Retrieved from https://www.mendeley.com/reference-management/reference-manager

Referencing Newspaper Articles
The basic structure is as follows:
Author surname, initial(s). (Year, Month Day). Title. Title of Newspaper, column/section, p. or pp. Retrieved from URL*
*Only include if the article is online.
Note: the date includes the year, month and day.

Newspaper Article Example:
Mitchell, J.A. (2017). Changes to citation formats shake the research world. The Mendeley Telegraph, Research News, pp.9. Retrieved from https://www.mendeley.com/reference-management/reference-manager

Referencing Magazine Articles in Print or Online

The basic structure is as follows:
Author surname, initial(s). (Year, month day). Title. Title of the Magazine, pp.

Magazine Article Example:
Mitchell, J.A. (2017). How citation changed the research world. The Mendeley, pp. 26-28

Referencing Non-Print Material: How to Cite an Image in APA Format

The basic format to cite an image, as the example below illustrates, is:
Creator surname, initial(s). (Year). Title [format]. Retrieved from URL

Image Example:
Millais, J.E. (1851-1852). Ophelia [painting]. Retrieved from www.tate.org.uk/art/artworks/millais-ophelia-n01506

Referencing Websites

When citing a website, the basic structure is as follows:
Author surname, initial(s). (Year, month day). Title. Retrieved from URL

Website example:
Mitchell, J.A. (2017, May 21). How and when to reference. Retrieved from https://www.howandwhentoreference.com.

10.4 SOFTWARE FOR DETECTION OF PLAGIARISM

Plagiarism is seen as academic misconduct. It is not taken lightly by academic and research institutions and is penalized severely. Plagiarism occurs when you copy and paste a large chunk of text from a document written by someone else without giving credit to the author. This is seen as copying and taking credit for somebody else’s work. Even if you paraphrase the text and use it in your own text without citing the source, it will still be seen as plagiarism.

One of the common forms of plagiarism is self-plagiarism. Self-plagiarism is the use of one’s own previous work in another context without citing that it was used previously. This is because once you publish your work, the publisher holds the copyright for your text, so you need to either get permission from the publisher to reuse the text or cite the source.

There are plenty of plagiarism detection software packages and online checking tools available that you can use to check how much of your text overlaps with previously published materials. You can fix these problems before submitting your academic essay or research paper.

In the tech age that we live in, a plagiarism checker is also useful for copyright protection. In some cases, plagiarism can result in lawsuits, criminal charges, and sometimes even imprisonment. Unintentional plagiarism is still viewed as plagiarism, and this is very important to remember. Use one of the trusted tools listed below, according to your needs, to reduce the risk of being penalized for plagiarism. By using plagiarism software, you can check that your content is original before you submit it.
Some examples are:
- Turnitin
- Grammarly Premium
- Plagiarism Checker X

Turnitin
Turnitin is software that focuses on academic integrity, which is one of the reasons it is so popular in academic environments.
The tool identifies potential plagiarism by comparing submitted content against a very large collection of digitized academic material and highlighting similarities. Its key features are:
- Effective plagiarism detection solution
- Implement authorship based on data-driven insights
- Feedback and grading features to facilitate instructional dialog
- Educational and creative resources to enhance academic skills (blogs, guides, white papers, and more)

Grammarly Premium
Grammarly is one of the best-known grammar checkers, suitable for work, personal projects, and academic writing. Less well known is that Grammarly includes a plagiarism checker that compares text against over 16 billion web pages. The only issue is that it is available only to Grammarly Premium users. Its key features are:
- Comprehensive plagiarism detection software
- Automatically checks your grammar for errors
- Set audience, formality, domain, tone, and intent goals
- Get tailored writing suggestions based on your goals
- See correctness, clarity, engagement, and delivery alerts
- Get readability and vocabulary enhancement suggestions
- In-depth spellchecker and writing recommendations
- Create a custom dictionary associated with your account
- Native app for PC and Mac

Plagiarism Checker X
Plagiarism Checker X is our top choice when it comes to plagiarism software. It has an intuitive interface and straightforward options, and it uses a large database to check for text similarity and original content. Its key features are:
- Paste text or load files to check for plagiarism
- Compare any two text documents side by side
- Check several text files at the same time
- Generates reports with plagiarism results
- Runs offline for side-by-side comparisons
- Cheap professional license with lifetime availability
- Reasonable business cost for 5 PCs
- Fully functional free edition, but with limited features

Some other websites to check plagiarism are:
- Scribbr
- Viper
- PaperRater
- WriteCheck
- PlagiarismDetector
- PlagiarismSoftware
- DupliChecker
- Plagium
- PlagTracker

10.5 SUMMARY
Data analysis is the process of working on data in order to arrange it correctly, explain it, make it presentable, and draw conclusions from it. It is done to find useful information in data and so support rational decisions. Because it informs decision making, it is important to understand the purpose of data analysis: the interpretation, evaluation and organization of data, and making the data presentable.

10.6 KEY WORDS/ ABBREVIATIONS
American Psychiatric Association (APA) - A professional organization of psychiatrists in the United States which is responsible for producing the DSM-IV-TR, which is used to define categories of mental disabilities and embodies current research and understanding of pathology in general and of specific pathological processes.
American Psychological Association (APA) - A professional organization of persons holding doctoral degrees in psychology which promotes the clinical practice of psychology (among the many disciplines of psychology), sets ethical guidelines for psychologists, and publishes numerous research journals.
Data Analysis - Data analysis is done with the purpose of finding answers to specific questions. Data analytics techniques are similar to business analytics and business intelligence.
Data Mining - Data mining is about finding patterns in data. Various mathematical and computational algorithms are applied to the data to generate new information.
Data Modeling - Data modeling is about how companies organize or manage their data. Various methodologies and techniques are applied to the data. Data analysis is required for data modeling.

10.7 LEARNING ACTIVITY
1. Have you used any tool or software in your research? What were your experiences while doing so? Discuss with your friends and note your observations.
___________________________________________________________________________
___________________________________________________________________________
2. What are some of the free and paid tools or software used in research?
___________________________________________________________________________
___________________________________________________________________________

10.8 UNIT END QUESTIONS (MCQS AND DESCRIPTIVE)
A. Descriptive Questions
1. Have you heard about plagiarism? What are your thoughts on it?
2. What are the implications of referencing appropriately?
3. How can you use a referencing tool?
4. Imagine you are a research student. With examples, show how you will reference material from printed sources.
5. Some applications help us detect plagiarism. Can you explain the features of one of those websites or applications?
B. Multiple Choice Questions (MCQs)
1. One of the following is a reference management system:
(A) DSpace
(B) Mendeley
(C) Green-stone
(D) Linux
2. ________________________ is a function of Mendeley.
(A) Referencing
(B) Detecting Plagiarism
(C) Data Management
(D) End note
3. _______________ is seen as academic misconduct.
(A) Plagiarism
(B) Data Modelling
(C) Data Mining
(D) Data Analysis
4. _________________________ is about how we mention the authors of the original work in our reports.
(A) Plagiarism
(B) Referencing
(C) Data Mining
(D) Data Analysis
5. ________________ is a free website that helps us in addressing plagiarism.
(A) Turnitin
(B) Grammarly Premium
(C) Green-stone
(D) Duplichecker
Answer
1b 2a 3a 4b 5d

10.9 SUGGESTED READINGS
- Introduction to Social Research: Quantitative and Qualitative Approaches by Keith F Punch
- Research Methodology: Methods and Techniques by C. R. Kothari
- Research Methodology by D K Bhattacharyya
- Research Methodology: A Step-by-Step Guide for Beginners by Ranjit Kumar
- Research Methodology by P. Sam Daniel, Aroma G. Sam
UNIT 11 REPORT WRITING
Structure
11.0. Learning Objectives
11.1. Introduction
11.2. Components of Research Report
11.3. Structure of A Research Report
11.4. Tips for Writing Research Reports
11.5. Summary
11.6. Key Words/ Abbreviations
11.7. Learning Activity
11.8. Unit End Questions (MCQs and Descriptive)
11.9. Suggested Readings

11.0 LEARNING OBJECTIVES
The unit focuses on the different aspects of report writing. In this unit, we will look into:
- Components of Research Report
- Structure of A Research Report
- Tips for Writing Research Reports

11.1 INTRODUCTION
Research reports are records prepared by researchers or statisticians after analyzing information gathered through organized research, typically surveys or qualitative methods. Reports cover a vast range of topics, but each one focuses on communicating information about a particular topic to a specific, often niche, target market.
The primary motive of research reports is to convey integral details about a study for marketers to consider while designing new strategies. Certain events, facts and other information based on incidents need to be relayed to the people in charge, and research reports are the most effective tool for communicating them. Ideal research reports present accurate information with a clear objective and conclusion. They need a clean, structured format to relay information effectively.
A research report is a reliable source for recounting the details of a piece of research and is most often considered a true testimony of all the work done to gather its specifics.
The various sections of a research report are:
1. Summary
2. Background/Introduction
3. Implemented Methods
4. Results based on Analysis
5. Deliberation
6. Conclusion

11.2 COMPONENTS OF RESEARCH REPORT
Research is imperative for launching a new product, service or feature. Today’s markets are extremely volatile and competitive, with new entrants every day who may or may not provide effective products. To stay relevant in such a market, an organization needs to make the right decisions at the right time and keep its products updated to meet customer demands.
The details of a research report may change with the purpose of the research, but the main components of a report remain constant. The research approach of the market researcher also influences the style of writing reports. Here are the main components of a productive research report:
Research Report Summary: The objective and an overview of the research are included in a summary that is a couple of paragraphs in length. All the components of the research are explained in brief in the report summary. It should be interesting enough to hold the reader while capturing all the key elements of the report.
Research Introduction: There is always a primary goal that the researcher is trying to achieve through a report. In the introduction, the researcher can address questions related to this goal and establish a thesis that the rest of the report will answer in detail. This section should answer an integral question: “What is the current situation of the goal?” After the research was conducted, did the organization achieve the goal, or is it still a work in progress? Provide such details in the introduction of the research report.
Research Methodology: This is the most important section of the report, where all the important information lies. Readers can obtain data on the topic and assess the quality of the content provided, and the research can also be validated by other market researchers.
Thus, this section needs to be highly informative, with each aspect of the research discussed in detail. Information should be presented in a clear order, chronologically or according to its priority and importance. Researchers should include references if they drew on existing techniques.
Research Results: A short description of the results, along with the calculations carried out to achieve the goal, forms this section. Usually, the interpretation that follows data analysis is presented in the discussion part of the report.
Research Discussion: The results are discussed in detail in this section, along with a comparative analysis of other reports that may exist in the same domain. Any anomaly uncovered during the research is examined in the discussion section. While writing research reports, the researcher has to connect the dots on how the results apply in the real world.
Research References and Conclusion: Conclude all the research findings, and cite every author, article or other content piece from which references were taken.

11.3 STRUCTURE OF A RESEARCH REPORT
A key feature of reports is that they are formally structured in sections. The use of sections makes it easy for the reader to jump straight to the information they need. Unlike an essay, which is written in a single narrative style from start to finish, each section of a report has its own purpose and needs to be written in a style that suits it. For example, the methods and results sections are mainly descriptive, whereas the discussion section needs to be analytical.
Understanding the function of each section will help you to structure your information and use the correct writing style. Reports for different briefs require different sections, so always check carefully any instructions you’ve been given.

Title
The title needs to state the topic of the report concisely. It needs to be informative and descriptive, so that someone reading only the title will understand the main issue of your report. You don’t need to include excessive detail in your title, but avoid being vague or too general.

Abstract (Also called the Summary or Executive Summary)
This is the ‘shop window’ for your report.
It is the first (and sometimes the only) section to be read and should be the last to be written. It should enable the reader to make an informed decision about whether they want to read the whole report. The length will depend on the extent of the work reported but it is usually a paragraph or two and always less than a page.
A good way to write an abstract is to think of it as a series of brief answers to questions. These would probably include:
- What is the purpose of the work?
- What methods did you use for your research?
- What were the main findings and conclusions reached as a result of your research?
- Did your work lead you to make any recommendations for future actions?

Introduction (Also called Background or Context)
In this section you explain the rationale for undertaking the work reported on, including what you have been asked (or chosen) to do, the reasons for doing it and the background to the study. It should be written in an explanatory style.
- State what the report is about: what is the question you are trying to answer? If it is a brief for a specific reader (e.g. a feasibility report on a construction project for a client), say who they are.
- Describe your starting point and the background to the subject, for instance: what research has already been done (if you have been asked to include a Literature Survey later in the report, you only need a brief outline of previous research in the Introduction); what are the relevant themes and issues; why are you being asked to investigate it now?
- Explain how you are going to go about responding to the brief. If you are going to test a hypothesis in your research, include this at the end of your introduction. Include a brief outline of your method of enquiry.
- State the limits of your research and the reasons for them, for example: “Research will focus on native English speakers only, as a proper consideration of the issues arising from speaking English as a second language is beyond the scope of this project”.

Literature survey (Also called Literature Review or Survey/Review of Research)
This is a survey of publications (books, journals, authoritative websites, sometimes conference papers) reporting work that has already been done on the topic of your report.
It should only include studies that have direct relevance to your research. A literature survey should be written like an essay in a discursive style, with an introduction, a main discussion grouped in themes, and a conclusion. Introduce your review by explaining how you went about finding your materials, and any clear trends in research that have emerged. Group your texts in themes. Write about each theme as a separate section, giving a critical summary of each piece of work and showing its relevance to your research. Conclude
with how the review has informed your research (things you’ll be building on, gaps you’ll be filling, etc.).

Methods (Also called Methodology)
You need to write your Methods section in such a way that a reader could replicate the research you have done. There should be no ambiguity here, so you need to write in a very factual, informative style. State clearly how you carried out your investigation. Explain why you chose this particular method (questionnaires, focus group, experimental procedure, etc.), and include the techniques and any equipment you used. If there were participants in your research, who were they? How many? How were they selected?
Write this section concisely but thoroughly: go through what you did step by step, including everything that is relevant. You know what you did, but could a reader follow your description?

Results (Also called Data or Findings)
This section has only one job, which is to present the findings of your research as simply and clearly as possible. Use the format that will achieve this most effectively, e.g. text, graphs, tables or diagrams. When deciding on a graphical format, think about how the data will look to the reader. Choose just one format: don’t repeat the same information in, for instance, a graph and a table. Label your graphs and tables clearly. Give each figure a title and describe in words what the figure demonstrates. Writing in this section should be clear, factual and informative. Save your interpretation of the results for the Discussion section.

Discussion
This is probably the longest section and worth spending time on. It brings everything together, showing how your findings respond to the brief you explained in your introduction and to the previous research you surveyed in your literature survey. It should be written in a discursive style, meaning you need to discuss not only what your findings show but why they show this, using evidence from previous research to back up your explanations.
This is also the place to mention if there were any problems (for instance, if your results were different from expectations, you couldn’t find important data, or you had to change your method or participants) and how they were or could have been solved.

Conclusion
Your conclusions should be a short section with no new arguments or evidence. Sum up the main points of your research: how do they answer the original brief for the work reported on? This section may also include:
- Recommendations for action
- Suggestions for further research
References (Also called Reference List or Bibliography)
List here full details for any works you have referred to in the report, including books, journals, websites and other materials. You may also need to list works you have used in preparing your report but have not explicitly referred to; check your instructions for this and for the correct style of referencing to use. You can find information about how to reference more unusual materials (television programmes, blogs, etc.) from various websites, including the Learn Higher website on referencing. If you’re not sure, the rule is to be consistent and to give enough details that a reader can find the same piece of information that you used.

Appendices
The appendices hold any additional information that may help the reader but is not essential to the report’s main findings: anything that ‘adds value’. That might include (for instance) interview questions, raw data or a glossary of terms used. Label all appendices and refer to them where appropriate in the main text (e.g., ‘See Appendix A for an example questionnaire’).

Which section should I write first?
It can be helpful to write up sections as you go along. This means that you write about what you’ve done while it’s still fresh in your mind, and you can see more easily if there are any gaps that might need additional research to fill them. In addition, you don’t end up with a large piece of writing to do in one go, which can be overwhelming. Here is a suggested order for writing the main sections:
1. Methods and Data/Results: As a rough guide, the more factual the section, the earlier you should write it. So sections describing ‘what you did and what you found’ are likely to be written first.
2. Introduction and Literature Survey: Sections that explain or expand on the purpose of the research should be next. What questions are you seeking to answer, how did they arise, why are they worth investigating?
These will help you to see how to interpret and analyse your findings.
3. Discussion: Once you’ve established the questions your research is seeking to answer, you will be able to see how your results contribute to the answers and what kind of answers they point to. Write this early enough that you still have time to fill any gaps you find.
4. Conclusions and Recommendations: These should follow logically from your Discussion. They should state your conclusions and recommendations clearly and simply.
5. Abstract/Executive Summary: Once the main body is finished you can write a succinct and accurate summary of the main features.

My report doesn’t seem to fit into these sections
11.4 TIPS FOR WRITING RESEARCH REPORTS
Writing research reports in the wrong manner can lead to all the effort going down the drain. Here are 15 tips for writing impactful research reports:
Prepare the context before starting to write and start from the basics: This was always taught to us in school: be well prepared before taking a plunge into new topics. The order of the survey questions might not be the ideal or most effective order for writing research reports. The idea is to start with a broader topic, work towards a more specific one, and focus on a conclusion that the research supports with facts. The most difficult thing to do in reporting, without a doubt, is to start. Start with the title and the introduction, then document the first discoveries and continue from there. Once the marketers have the information well documented, they can write a general conclusion.
Keep the target audience in mind while selecting a format that is clear, logical and obvious to them: Will the research reports be presented to decision makers or to other researchers? What are the general perceptions around the topic? Answering these questions requires care and diligence. A researcher will need a significant amount of information to start writing the research report. Be consistent with the wording, the numbering of the annexes and so on. Follow the company’s approved format for delivering research reports, and demonstrate how the project aligns with the objectives of the company.
Have a clear research objective: A researcher should read the entire proposal again and make sure that the data they provide contributes to the objectives that were set at the beginning. Remember that speculation is for conversations, not for research reports; a researcher who speculates directly calls their own research into question.
Establish a working model: Each study must have an internal logic, which has to be established in the report and in the evidence.
The researcher’s worst nightmare is to be required to write a research report and realize that key questions were not included.
Gather all the information about the research topic: Who are the competitors of our customers? Talk to other researchers who have studied the subject of the research, and learn the language of the industry. Misuse of terms can discourage the readers of research reports from reading further.
Read aloud while writing: While reading the report aloud, if the researcher hears something that sounds wrong, for example if they stumble over the words, the reader surely will too. If the researcher can’t put an idea in a single