
Earl Babbie, The Practice of Social Research

Published by dinakan, 2021-08-12 20:20:06

Description: This e-book is for reading purposes only and is not for profit.


TABLE 14-13
A Simplification of Table 14-12

Percent Who Attend about Weekly

                  Men      Women
Under 40          20       23
                  (504)    (602)
40 and Older      30       38
                  (695)    (936)

Source: General Social Survey, 2000.

On the basis of this recognition, Table 14-12 could be presented in the alternative format of Table 14-13. In Table 14-13, the percentages of people saying they attend religious services about weekly are reported in the cells representing the intersections of the two independent variables. The numbers presented in parentheses below each percentage represent the number of cases on which the percentages are based. Thus, for example, the reader knows there are 602 women under 40 years of age in the sample, and 23 percent of them attend religious services weekly. We can calculate from this that 138 of those 602 women attend weekly and that the other 464 younger women (or 77 percent) attend less frequently. This new table is easier to read than the former one, and it does not sacrifice any detail.

Sociological Diagnostics

The multivariate techniques we're now exploring can serve as powerful tools for diagnosing social problems. They can be used to replace opinions with facts and to settle ideological debates with data analysis.

For an example, let's return to the issue of gender and income. Many explanations have been advanced to account for the long-standing pattern of women in the labor force earning less than men. One explanation is that, because of traditional family patterns, women as a group have participated less in the labor force and many only began working outside the home after completing certain child-rearing tasks. Thus, women as a group will probably have less seniority at work than men will, and income increases with seniority. A 1984 study by the Census Bureau showed this reasoning to be partly true, as Table 14-14 shows.

TABLE 14-14
Gender, Job Tenure, and Income, 1984 (Full-time workers 21-64)

Years working with      Average hourly income     Women/Men
current employer        Men         Women         ratio
Less than 2 years       $8.46       $6.03         0.71
2 to 4 years            $9.38       $6.78         0.72
5 to 9 years            $10.42      $7.56         0.73
10 years or more        $12.38      $7.91         0.64

Source: U.S. Bureau of the Census, Current Population Reports, Series P-70, No. 10, Male-Female Differences in Work Experience, Occupation, and Earning, 1984 (Washington, DC: U.S. Government Printing Office, 1987), 4.

Table 14-14 indicates, first of all, that job tenure did indeed affect income. Among both men and women, those with more years on the job earned more. This is seen by reading down the first two columns of the table.

The table also indicates that women earned less than men, regardless of job seniority. This can be seen by comparing average wages across the rows of the table, and the ratio of women-to-men wages is shown in the third column. Thus, years on the job was an important determinant of earnings, but seniority did not adequately explain the pattern of women earning less than men. In fact, we see that women with ten or more years on the job earned substantially less ($7.91/hour) than men with less than two years ($8.46/hour).

Although years on the job did not fully explain the difference between men's and women's pay, there are other possible explanations: level of education, child care responsibilities, and so forth. The researchers who calculated Table 14-14 also examined some of the other variables that might reasonably explain the differences in pay without representing gender discrimination, including these:

• Number of years in the current occupation
• Total years of work experience (any occupation)
• Whether they have usually worked full time

Chapter 14: Quantitative Data Analysis

• Marital status
• Size of city or town they live in
• Whether covered by a union contract
• Type of occupation
• Number of employees in the firm
• Whether private or public employer
• Whether they left previous job involuntarily
• Time spent between current and previous job
• Race
• Whether they have a disability
• Health status
• Age of children
• Whether they took an academic curriculum in high school
• Number of math, science, and foreign language classes in high school
• Whether they attended private or public high school
• Educational level achieved
• Percentage of women in the occupation
• College major

Each of the variables listed here might reasonably affect earnings and, if women and men differ in these regards, could help to account for male/female income differences. When all these variables were taken into account, the researchers were able to account for 60 percent of the discrepancy between the incomes of men and women. The remaining 40 percent, then, is a function of other "reasonable" variables and/or prejudice. This kind of conclusion can be reached only by examining the effects of several variables at the same time, that is, through multivariate analysis.

I hope this example shows how the logic implicit in day-to-day conversations can be represented and tested in a quantitative data analysis like this. Along those lines, you might be asking yourself, "These data point to salary discrimination against women in 1984, but hasn't that been remedied?" Not really, as indicated by more recent data.

In 2000 the average full-time, year-round male worker earned $50,557. The average full-time, year-round female worker earned $32,641, or about 69 percent as much as her male counterpart (U.S. Bureau of the Census 2000: 440). But does that difference represent sexual discrimination or does it reflect legitimate factors?

For example, some argue that education affects income and that in the past, women have gotten less education than men. We might start, therefore, by checking whether educational differences explain why women today earn less, on average, than men. Table 14-15 offers data to test this hypothesis. As the table shows, at each level of comparable education, women earn substantially less than men do. Clearly, education does not explain the discrepancy.

TABLE 14-15
Education, Gender, and Income

                        Average Yearly Income     Ratio of Women's
Educational Level       Men         Women         to Men's Income
Less than 9 years       $24,692     $17,131       0.69
9-12 years              28,832      19,063        0.66
HS Graduate             36,770      24,970        0.68
Some College            44,911      29,273        0.65
Associate Degree        46,226      31,681        0.69
Bachelors or more       77,963      47,224        0.61

Source: U.S. Bureau of the Census, Statistical Abstract of the United States (Washington, DC: U.S. Government Printing Office, 2000), Table 666, p. 440.

In fact, education may actually conceal the extent of the gender difference. Without taking education into account, the ratio of women-to-men incomes was 0.69 or 69 percent, but in Table 14-15, four of the six educational categories describe even worse discrepancies. This odd pattern suggests that the women among full-time, year-round workers have more education than men do, rather than less. Because educational level affects income, women's higher educational achievement slightly reduces the apparent income discrepancy when education is not taken into account.

This is the kind of analysis you are now equipped to undertake.

As another example of multivariate data analysis in real life, consider the common observation that minority group members are more likely to be

denied bank loans than white applicants are. A counterexplanation might be that the minority applicants in question were more likely to have had a prior bankruptcy or that they had less collateral to guarantee the requested loan, both reasonable bases for granting or denying loans. However, the kind of multivariate analysis we've just examined could easily resolve the disagreement.

Let's say we look only at those who have not had a prior bankruptcy and who have a certain level of collateral. Are whites and minorities equally likely to get the requested loan? We could conduct the same analysis in subgroups determined by level of collateral. If whites and minorities were equally likely to get their loans in each of the subgroups, we would need to conclude that there was no ethnic discrimination. If minorities were still less likely to get their loans, however, that would indicate that bankruptcy and collateral differences were not the explanation, strengthening the case that discrimination was at work.

All this should make it clear that social research can play a powerful role in serving the human community. It can help us determine the current state of affairs and can often point the way to where we want to go.

Welcome to the world of sociological diagnostics!

MAIN POINTS

Introduction

• Quantitative analysis involves the techniques by which researchers convert data to a numerical form and subject it to statistical analyses.

Quantification of Data

• Some data, such as age and income, are intrinsically numerical.
• Often, quantification involves coding into categories that are then given numerical representations.
• Researchers may use existing coding schemes, such as the Census Bureau's categorization of occupations, or develop their own coding categories. In either case, the coding scheme must be appropriate to the nature and objectives of the study.
• A codebook is the document that describes (1) the identifiers assigned to different variables and (2) the codes assigned to the attributes of those variables.

Univariate Analysis

• Univariate analysis is the analysis of a single variable. Because univariate analysis does not involve the relationships between two or more variables, its purpose is descriptive rather than explanatory.
• Several techniques allow researchers to summarize their original data to make them more manageable while maintaining as much of the original detail as possible. Frequency distributions, averages, grouped data, and measures of dispersion are all ways of summarizing data concerning a single variable.

Subgroup Comparisons

• Subgroup comparisons can be used to describe similarities and differences among subgroups with respect to some variable.

Bivariate Analysis

• Bivariate analysis focuses on relationships between variables rather than on comparisons of groups. Bivariate analysis explores the statistical association between the independent variable and the dependent variable. Its purpose is usually explanatory rather than merely descriptive.
• The results of bivariate analyses often are presented in the form of contingency tables, which are constructed to reveal the effects of the independent variable on the dependent variable.

Introduction to Multivariate Analysis

• Multivariate analysis is a method of analyzing the simultaneous relationships among several variables. It may also be used to understand the relationship between two variables more fully.
• The logic and techniques involved in quantitative research can also be valuable to qualitative researchers.

Sociological Diagnostics

• Sociological diagnostics is a quantitative analysis technique for determining the nature of social problems such as ethnic or gender discrimination.

KEY TERMS

The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

average
bivariate analysis
codebook
contingency table
continuous variable
discrete variable
dispersion
frequency distribution
mean
median
mode
multivariate analysis
quantitative analysis
standard deviation
univariate analysis

REVIEW QUESTIONS AND EXERCISES

1. How might the various majors at your college be classified into categories? Create a coding system that would allow you to categorize them according to some meaningful variable. Then create a different coding system, using a different variable.

2. How many ways could you be described in numerical terms? What are some of your intrinsically numerical attributes? Could you express some of your qualitative attributes in quantitative terms?

3. How would you construct and interpret a contingency table from the following information: 150 Democrats favor raising the minimum wage, and 50 oppose it; 100 Republicans favor raising the minimum wage, and 300 oppose it?

4. Using the hypothetical data in the following table, how would you construct and interpret tables showing the following?

a. The bivariate relationship between age and attitude toward abortion
b. The bivariate relationship between political orientation and attitude toward abortion
c. The multivariate relationship linking age, political orientation, and attitude toward abortion

         Political       Attitude
Age      Orientation     toward Abortion     Frequency
Young    Liberal         Favor               90
Young    Liberal         Oppose              10
Young    Conservative    Favor               60
Young    Conservative    Oppose              40
Old      Liberal         Favor               60
Old      Liberal         Oppose              40
Old      Conservative    Favor               20
Old      Conservative    Oppose              80

ADDITIONAL READINGS

Babbie, Earl, Fred Halley, and Jeanne Zaino. 2000. Adventures in Social Research. Newbury Park, CA: Pine Forge Press. This book introduces you to the analysis of social research data through SPSS for Windows. Several of the basic statistical techniques used by social researchers are discussed and illustrated.

Bernstein, Ira H., and Paul Havig. 1999. Computer Literacy: Getting the Most from Your PC. Thousand Oaks, CA: Sage. Here's a quick overview of the various ways social scientists use computers, including many common applications programs.

Davis, James. 1971. Elementary Survey Analysis. Englewood Cliffs, NJ: Prentice-Hall. An extremely well-written and well-reasoned introduction to analysis. In addition to covering the materials we've just covered in this chapter, Davis's book is well worth reading in terms of measurement and statistics.

Ferrante, Joan, and Angela Vaughn. 1999. Let's Go Sociology: Travels on the Internet. Belmont, CA: Wadsworth. This accessible little book gives an excellent introduction to the Internet and suggests many websites of interest to social researchers.

Lewis-Beck, Michael. 1995. Data Analysis: An Introduction. Volume 103 in the Quantitative Application in the Social Sciences series. Thousand Oaks, CA: Sage. This is a very popular short book that makes statistical language accessible to the novice. You should enjoy the clarity of explanations and the thorough use of examples.

Nardi, Peter. 2006. Interpreting Data: A Guide to Understanding Research. Boston: Pearson. This excellent little book offers an accessible guide to understanding commonly used statistical analyses in the social sciences.

Newton, Rae R., and Kjell Erik Rudestam. 1999. Your Statistical Consultant: Answers to Your Data Analysis Questions. Thousand Oaks, CA: Sage. An excellent reader-friendly manual that will answer all sorts of questions you have or will have as soon as you begin to analyze quantitative data.

Zeisel, Hans. 1957. Say It with Figures. New York: Harper & Row. An excellent discussion of table construction and other elementary analyses. Though many years old, this is still perhaps the best available presentation of that specific topic. It is eminently readable and understandable and has many concrete examples.

SPSS EXERCISES

See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you'll also find a detailed primer on using SPSS.

Online Study Resources

SociologyNow: Research Methods

1. Before you do your final review of the chapter, take the SociologyNow: Research Methods diagnostic quiz to help identify the areas on which you should concentrate. You'll find information on this online tool, as well as instructions on how to access all of its great resources, in the front of the book.

2. As you review, take advantage of the SociologyNow: Research Methods customized study plan, based on your quiz results. Use this study plan with its interactive exercises and other resources to master the material.

3. When you're finished with your review, take the posttest to confirm that you're ready to move on to the next chapter.

WEBSITE FOR THE PRACTICE OF SOCIAL RESEARCH 11TH EDITION

Go to your book's website at http://sociology.wadsworth.com/babbie_practice11e for tools to aid you in studying for your exams. You'll find Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and Chapter Tutorials, as well as Extended Projects, InfoTrac College Edition search terms, Social Research in Cyberspace, GSS Data, Web Links, and primers for using various data-analysis software such as SPSS and NVivo.

WEB LINKS FOR THIS CHAPTER

Please realize that the Internet is an evolving entity, subject to change. Nevertheless, these few websites should be fairly stable. Also, check your book's website for even more Web Links. These websites, current at the time of this book's publication, provide opportunities to learn about quantitative analysis.

University of Michigan, Survey Documentation and Analysis
http://webapp.icpsr.umich.edu/GSS/
Here's a program that will let you analyze General Social Survey data online, without having to buy an expensive program.

Hee-Joe Cho and William Trochim, Categorical Data Analysis
http://www.socialresearchmethods.net/tutorial/Cho/outline.htm
This website offers an extensive examination of the logic and techniques for analyzing nominal and ordinal data.
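As a closing worked sketch for the Sociological Diagnostics section of this chapter, the women-to-men ratios reported in Table 14-14 can be recomputed from the hourly wages in the same table. The dictionary layout and the function name below are illustrative assumptions, not part of the Census Bureau study. Only the wage figures come from the table.

```python
# Average hourly wages (men, women) from Table 14-14, keyed by years
# with the current employer. The data structure is an illustrative choice.
wages = {
    "Less than 2 years": (8.46, 6.03),
    "2 to 4 years": (9.38, 6.78),
    "5 to 9 years": (10.42, 7.56),
    "10 years or more": (12.38, 7.91),
}


def women_to_men_ratio(men_wage, women_wage):
    """Return women's wage as a proportion of men's, rounded to 2 places."""
    return round(women_wage / men_wage, 2)


# Reading across each row: the ratio within the same tenure group.
ratios = {
    tenure: women_to_men_ratio(men, women)
    for tenure, (men, women) in wages.items()
}
for tenure, ratio in ratios.items():
    print(f"{tenure:<18} {ratio:.2f}")

# The cross-tenure comparison made in the text: the most senior women
# versus the least senior men.
senior_women_wage = wages["10 years or more"][1]
junior_men_wage = wages["Less than 2 years"][0]
senior_women_earn_less = senior_women_wage < junior_men_wage
print("Senior women earn less than junior men:", senior_women_earn_less)
```

Holding tenure constant (one ratio per row) is the same move the chapter makes when it controls for a third variable. The ratio stays well below 1.0 in every tenure group, which is why seniority alone cannot account for the gap.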

The Elaboration Model

Introduction
The Origins of the Elaboration Model
The Elaboration Paradigm
    Replication
    Explanation
    Interpretation
    Specification
    Refinements to the Paradigm
Elaboration and Ex Post Facto Hypothesizing

SociologyNow: Research Methods
Use this online tool to help you make the grade on your next exam. After reading this chapter, go to the "Online Study Resources" at the end of the chapter for instructions on how to benefit from SociologyNow: Research Methods.

Introduction

This chapter is devoted to a perspective on social scientific analysis that is referred to variously as the elaboration model, the interpretation method, the Lazarsfeld method, or the Columbia school. Its many names reflect the fact that it aims at elaborating on an empirical relationship among variables in order to interpret that relationship in the manner developed by Paul Lazarsfeld while he was at Columbia University. As such, the elaboration model is one method for doing multivariate analysis.

Researchers use the elaboration model to understand the relationship between two variables through the simultaneous introduction of additional variables. Though developed primarily through the medium of contingency tables, it may be used with other statistical techniques, as Chapter 16 will show.

I firmly believe that the elaboration model offers the clearest available picture of the logic of causal analysis in social research. Especially through the use of contingency tables, this method portrays the logical process of scientific analysis. Moreover, if you can comprehend fully the use of the elaboration model using contingency tables, you should greatly improve your ability to use and understand more-sophisticated statistical techniques, such as partial regressions and log-linear models.

In a sense, this discussion of elaboration analysis is an extension of our earlier examination of spuriousness in Chapter 4. As you'll recall, one of the criteria of causal relations in social research is that the observed relationship between two variables not be an artifact caused by some other variable. In the case of the positive relationship between the number of fire trucks responding to a fire and the amount of damage done, for example, we saw that the size of the fire explained away the apparent relationship between trucks and damage. The bigger the fire, the more trucks responding to it; and the bigger the fire, the more damage done. The logic used in that hypothetical example was the same as the logic of the elaboration model.

Using both hypothetical and real examples, we'll see that the testing of an observed relationship may result in a variety of discoveries and logical interpretations. Spuriousness is only one of the possibilities.

The accompanying box "Why Do Elaboration?" by one of the elaboration model's creators, Patricia Kendall, provides another powerful justification for using this model.

elaboration model  A logical model for understanding the relationship between two variables by controlling for the effects of a third. Developed principally by Paul Lazarsfeld. The various outcomes of an elaboration analysis are replication, specification, explanation, and interpretation.

The Origins of the Elaboration Model

The historical origins of the elaboration model provide a good illustration of how scientific research works in practice. As I mentioned in Chapter 1, during World War II Samuel Stouffer organized and headed a special social research branch within the U.S. Army. Throughout the war, this group conducted a large number and variety of surveys among U.S. servicemen. Although the objectives of these studies varied somewhat, they generally focused on the factors affecting soldiers' combat effectiveness.

Several of the studies examined morale in the military. Because morale seemed to be related positively to combat effectiveness, improving morale would make the war effort more effective. Stouffer and his research staff sought to uncover some of the variables that affected morale. In part, the group sought to confirm empirically some commonly accepted propositions, including the following:

1. Promotions surely affect soldiers' morale, so soldiers serving in units with low promotion rates should have relatively low morale.

Chapter 15: The Elaboration Model

Why Do Elaboration?
by Patricia L. Kendall
Department of Sociology, Queens College, CUNY

There are several aspects of a true controlled experiment. The most crucial are (1) creating experimental and control groups that are identical within limits of chance (this is done by assigning individuals to the two groups through processes of randomization using tables of random numbers, flipping coins, etc.); (2) making sure that it is the experimenter who introduces the stimulus, not external events; and (3) waiting to see whether the stimulus has had its presumed effect.

We may have the hypothesis, for example, that attending Ivy League colleges leads to greater success professionally than attending other kinds of colleges and universities does. How would we study this through a true experiment? Suppose you said, "Take a group of people in their 40s, find out which ones went to Ivy League colleges, and see whether they are more successful than those who went to other kinds of colleges." If that is your answer, you are wrong.

A true experiment would require the investigator to select several classes of high school seniors, divide each class at random into experimental and control groups, send the experimental groups to Ivy League colleges (regardless of their financial circumstances or academic qualifications and regardless of the desire of the colleges to accept them) and the control group to other colleges and universities, wait 20 years or so until the two groups have reached professional maturity, and then measure the relative success of the two groups. Certainly a bizarre process.

Sociologists also investigate the hypothesis that coming from a broken home leads to juvenile delinquency. How would we go about studying this experimentally? If you followed the example above, you would see that studying this hypothesis through a true experiment would be totally impossible. Just think of what the experimenter would have to do!

The requirements of true experiments are so unrealistic in sociological research that we are forced to use other, and less ideal, methods in all but the most trivial situations. We can study experimentally whether students learn more from one type of lecture than another, or whether a film changes viewers' attitudes. But these are not always the sorts of questions in which we are truly interested.

We therefore resort to approximations (generally surveys) that have their own shortcomings. However, the elaboration model allows us to examine survey data, take account of their possible shortcomings, and draw rather sophisticated conclusions about important issues.

2. Given racial segregation and discrimination in the South, African American soldiers being trained in northern training camps should have higher morale than should those being trained in the South.

3. Soldiers with more education should be more likely to resent being drafted into the army as enlisted men than should those with less education.

Each of these propositions made sense logically, and common wisdom held each to be true. Stouffer decided to test each empirically. To his surprise, none of the propositions was confirmed.

We discussed the first proposition in Chapter 1. As you may recall, Stouffer found that soldiers serving in the Military Police (where promotions were the slowest in the army) had fewer complaints about the promotion system than did those serving in the Army Air Corps (where promotions were the fastest in the army). The other propositions fared just as badly. African American soldiers serving in northern training camps and those serving in southern training camps seemed to differ little if at all in their general morale. And less-educated soldiers were more likely to resent being drafted into the army than those with more education were.

Rather than trying to hide the findings or just running tests of statistical significance and publishing the results, Stouffer asked, "Why?" He found the answer to this question within the concepts of reference group and relative deprivation. Put simply, Stouffer suggested that soldiers did not evaluate their positions in life according to absolute, objective standards, but rather on the basis of their relative position vis-à-vis others around them. The

people they compared themselves with were in their reference group, and they felt relative deprivation if they didn't compare favorably in that regard.

Following this logic, Stouffer found an answer to each of the anomalies in his empirical data. Regarding promotion, he suggested that soldiers judged the fairness of the promotion system based on their own experiences relative to others around them. In the Military Police, where promotions were few and slow, few soldiers knew of a less-qualified buddy who had been promoted faster than they had. In the Army Air Corps, however, the rapid promotion rate meant that many soldiers knew of less-qualified buddies who had been promoted faster than seemed appropriate. Thus, ironically, the MPs said the promotion system was generally fair, and the air corpsmen said it was not.

A similar analysis seemed to explain the case of the African American soldiers. Rather than comparing conditions in the North with those in the South, African American soldiers compared their own status with the status of the African American civilians around them. In the South, where discrimination was at its worst, they found that being a soldier insulated them somewhat from adverse cultural norms in the surrounding community. Whereas southern African American civilians were grossly discriminated against and denied self-esteem, good jobs, and so forth, African American soldiers had a slightly better status. In the North, however, many of the African American civilians they encountered held well-paying defense jobs. And with discrimination being less severe, being a soldier did not help one's status in the community.

Finally, the concepts of reference group and relative deprivation seemed to explain the anomaly of highly educated draftees accepting their induction more willingly than those with less education did. Stouffer reasoned as follows:

1. A person's friends, on the whole, have about the same educational status as that person does.

2. Draft-age men with less education are more likely to engage in semi-skilled production-line occupations and farming than more educated men.

3. During wartime, many production-line industries and farming are vital to the national interest; workers in those industries and farmers are exempted from the draft.

4. A man with little education is more likely to have friends in draft-exempt occupations than a man with more education.

5. When each compares himself with his friends, a less educated draftee is more likely to feel discriminated against than a draftee with more education.

(1949-1950: 122-27)

Stouffer's explanations unlocked the mystery of the three anomalous findings. Because they were not part of a preplanned study design, however, he lacked empirical data for testing them. Nevertheless, Stouffer's logical exposition provided the basis for the later development of the elaboration model: understanding the relationship between two variables through the controlled introduction of other variables.

Paul Lazarsfeld and his associates at Columbia University formally developed the elaboration model in 1946. In a methodological review of Stouffer's army studies, Lazarsfeld and Patricia Kendall used the logic of the elaboration model to present hypothetical tables that would have proved Stouffer's contention regarding education and acceptance of induction had the empirical data been available (Kendall and Lazarsfeld 1950).

The central logic of the elaboration model begins with an observed relationship between two variables and the possibility that one variable may be causing the other. In the Stouffer example, the initial two variables were educational level and acceptance of being drafted as fair. Because the soldiers' educational levels were set before they were drafted (and thus before they had an opinion about being drafted), it would seem that educational level was the cause, or independent variable, and acceptance of induction was the effect, or dependent variable. As we just saw, however, the observed relationship countered what the researchers had expected.

The elaboration model examines the impact of other variables on the relationship first observed.

434 Chapter 15: The Elaboration Model

Sometimes this analysis reveals the mechanisms through which the causal relationship occurs. Other times an elaboration analysis disproves the existence of a causal relationship altogether.

In the present example, the additional variable was whether or not a soldier's friends were deferred or drafted. In Stouffer's speculative explanation, this variable showed how it was actually logical that soldiers with more education would be the more accepting of being drafted: because it was likely that their friends would have been drafted. Those with the least education were likely to have been in occupations that often brought deferments from the draft, leading those drafted to feel they had been treated unfairly.

Kendall and Lazarsfeld began with Stouffer's data showing the positive association between education and acceptance of induction (see Table 15-1). In this and the following tables, "should have been deferred" and "should not have been deferred" represent inductees' judgments of their own situation, with the latter group feeling it was fair for them to have been drafted.

TABLE 15-1
Summary of Stouffer's Data on Education and Acceptance of Induction

                                    High Ed.    Low Ed.
Should not have been deferred       88%         70%
Should have been deferred           12          30
                                    100         100
                                    (1,761)     (1,876)

Source: Tables 15-1, 15-2, 15-3, and 15-4 are modified with permission of Macmillan Publishing Co., Inc., from Continuities in Social Research: Studies in the Scope and Method of "The American Soldier" by Robert K. Merton and Paul F. Lazarsfeld (eds.). Copyright 1950 by The Free Press, a Corporation, renewed 1978 by Robert K. Merton.

Then, Kendall and Lazarsfeld created some hypothetical tables to represent what the analysis might have looked like had soldiers been asked whether most of their friends had been drafted or deferred. In Table 15-2, 19 percent of those with high education hypothetically said their friends were deferred, compared with 79 percent of the soldiers with less education.

TABLE 15-2
Hypothetical Relationship between Education and Deferment of Friends

Friends Deferred?                   High Ed.    Low Ed.
Yes                                 19%         79%
No                                  81          21
                                    100         100
                                    (1,761)     (1,876)

Notice that the numbers of soldiers with high and low education are the same as in Stouffer's real data. In later tables, you see that the numbers who accepted or resented being drafted are kept true to the original data. Only the numbers saying that friends were or were not deferred were made up.

Stouffer's explanation next assumed that soldiers with friends who had been deferred would be more likely to resent their own induction than those who had no deferred friends would. Table 15-3 presents the hypothetical data that would have supported that assumption.

TABLE 15-3
Hypothetical Relationship between Deferment of Friends and Acceptance of One's Own Induction

                                    Friends Deferred?
                                    Yes         No
Should not have been deferred       63%         94%
Should have been deferred           37          6
                                    100         100
                                    (1,819)     (1,818)

The hypothetical data in Tables 15-2 and 15-3 would confirm linkages that Stouffer had specified in his explanation. First, soldiers with low education were more likely to have friends who were deferred than soldiers with more education were. Second, having friends who were deferred made a soldier more likely to think he should have been deferred. Stouffer had suggested that these two relationships would clarify the original relationship between education and acceptance of induction.
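As a small aid to reading percentage tables like Table 15-1, the following Python sketch converts a column percentage and its base back into an approximate number of cases and computes the percentage-point difference between the two education groups. The figures come from Table 15-1; the dictionary layout and the function names are illustrative assumptions, not anything used by Stouffer or by Kendall and Lazarsfeld.

```python
# Column percentages and bases taken from Table 15-1.
# The dictionary layout and names below are illustrative assumptions.
TABLE_15_1 = {
    "high_ed": {"accept_pct": 88, "n": 1761},
    "low_ed": {"accept_pct": 70, "n": 1876},
}


def approx_count(pct, n):
    """Approximate number of cases behind a percentage of n cases."""
    return round(pct / 100 * n)


def percentage_point_difference(table, group_a, group_b):
    """Difference in percent accepting induction between two groups."""
    return table[group_a]["accept_pct"] - table[group_b]["accept_pct"]


high_accepting = approx_count(
    TABLE_15_1["high_ed"]["accept_pct"], TABLE_15_1["high_ed"]["n"]
)
low_accepting = approx_count(
    TABLE_15_1["low_ed"]["accept_pct"], TABLE_15_1["low_ed"]["n"]
)
difference = percentage_point_difference(TABLE_15_1, "high_ed", "low_ed")

print(high_accepting, low_accepting, difference)
```

Because the published percentages are rounded, the recovered counts are approximations rather than the original tallies.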

The Origins of the Elaboration Model 435

Kendall and Lazarsfeld created a hypothetical table that would confirm Stouffer's explanation (see Table 15-4).

TABLE 15-4
Hypothetical Data Relating Education to Acceptance of Induction through the Factor of Having Friends Who Were Deferred

Recall that the original finding was that draftees with high education were more likely to accept their induction into the army as fair than those with less education were. In Table 15-4, however, we note that level of education has no effect on the acceptance of induction among those who report having friends deferred: 63 percent among both educational groups indicate that they accept their induction (that is, they say they should not have been deferred). Similarly, educational level has no significant effect on acceptance of induction among those who reported having no friends deferred: 94 and 95 percent say they should not have been deferred.

On the other hand, among those with high education the acceptance of induction is strongly related to whether or not friends were deferred: 63 percent versus 94 percent. And the same is true among those with less education. The hypothetical data in Table 15-4, then, would support Stouffer's contention that education affected acceptance of induction only through the medium of having friends deferred. Highly educated draftees were less likely to have friends deferred and, by virtue of that fact, were more likely to accept their own induction as fair. Those with less education were more likely to have friends deferred and, by virtue of that fact, were less likely to accept their own induction.

Recognize that neither Stouffer's explanation nor the hypothetical data denied the reality of the original relationship. As educational level increased, acceptance of one's own induction also increased. The nature of this empirical relationship, however, was interpreted through the introduction of a third variable. The variable, deferment of friends, did not deny the original relationship; it merely clarified the mechanism through which the original relationship occurred.

This, then, is the heart of the elaboration model and of multivariate analysis. Having observed an empirical relationship between two variables (such as level of education and acceptance of induction), we seek to understand the nature of that relationship through the effects produced by introducing other variables (such as having friends who were deferred). Mechanically, we accomplish this by first dividing our sample into subsets on the basis of the test variable, also called the control variable. In our example, having friends deferred or not is the test variable, and the sample is divided into those who have deferred friends and those who do not. The relationship between the original two variables (acceptance of induction and level of education) is then recomputed separately for each of the subsamples. The tables produced in this manner are called the partial tables, and the relationships found in the partial tables are called the partial relationships, or partials. The partial relationships are then compared with the initial relationship discovered in the total sample, often referred to as the zero-order relationship to indicate that no test variables have been controlled for.

test variable: A variable that is held constant in an attempt to clarify further the relationship between two other variables. Having discovered a relationship between education and prejudice, for example, we might hold gender constant by examining the relationship between education and prejudice among men only and then among women only. In this example, gender would be the test variable.

Although the elaboration was first demonstrated through the use of hypothetical data, it laid out a logical method for analyzing relationships among variables that have been actually measured. As we'll see, our first, hypothetical example describes only one possible outcome in the elaboration model. There are others.

The Elaboration Paradigm

This section presents guidelines for understanding an elaboration analysis. To begin, we must know whether the test variable is antecedent (prior in time) to the other two variables or whether it is intervening between them, because these positions suggest different logical relationships in the multivariate model. If the test variable is intervening, as in the case of education, deferment of friends, and acceptance of induction, then the analysis is based on the model shown in Figure 15-1. The logic of this multivariate relationship is that the independent variable (educational level) affects the intervening test variable (having friends deferred or not), which in turn affects the dependent variable (accepting induction).

FIGURE 15-1
Intervening Test Variable
Independent variable → Test variable → Dependent variable

If the test variable is antecedent to both the independent and dependent variables, a different model must be used (see Figure 15-2). Here the test variable affects both the "independent" and "dependent" variables. Realize, of course, that the terms independent variable and dependent variable are, strictly speaking, used incorrectly in the diagram. In fact, we have one independent variable (the test variable) and two dependent variables. The incorrect terminology has been used only to provide continuity with the preceding example. Because of their individual relationships to the test variable, the "independent" and "dependent" variables are empirically related to each other, but there is no causal link between them. Their empirical relationship is merely a product of their coincidental relationships to the test variable. (Subsequent examples will further clarify this relationship.)

FIGURE 15-2
Antecedent Test Variable
Test variable → "Independent" variable
Test variable → "Dependent" variable

Table 15-5 is a guide to understanding an elaboration analysis. The two columns in the table indicate whether the test variable is antecedent or intervening in the sense described previously. The left side of the table shows the nature of the partial relationships as compared with the original relationship between the independent and dependent variables. The body of the table gives the technical notations (replication, explanation, interpretation, and specification) assigned to each case. We'll discuss each in turn.

partial relationship: In the elaboration model, this is the relationship between two variables when examined in a subset of cases defined by a third variable. Beginning with a zero-order relationship between political party and attitudes toward abortion, for example, we might want to see whether the relationship held true among both men and women (i.e., controlling for gender). The relationship found among men and the relationship found among women would be the partial relationships, sometimes simply called the partials.

zero-order relationship: In the elaboration model, this is the original relationship between two variables, with no test variables controlled for.
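The mechanics of elaboration (divide the sample into subsets on the test variable, then recompute the original relationship within each subset) can be sketched in a few lines of Python. This is only an illustration: the tiny sample below is invented for the purpose and is neither Stouffer's data nor the hypothetical Table 15-4, and the field and function names are assumptions of the sketch.

```python
from collections import defaultdict


def percent_accepting(cases):
    """Percentage of cases that accept their induction."""
    return 100 * sum(1 for case in cases if case["accepts"]) / len(cases)


def relationship(cases, independent):
    """Percent accepting within each category of the independent variable."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[independent]].append(case)
    return {level: percent_accepting(group) for level, group in groups.items()}


def partial_relationships(cases, independent, test):
    """Recompute the relationship separately within each test-variable subset."""
    subsets = defaultdict(list)
    for case in cases:
        subsets[case[test]].append(case)
    return {value: relationship(subset, independent)
            for value, subset in subsets.items()}


def make_cases(education, friends_deferred, accepting, resenting):
    """Build invented cases for one cell of the three-variable table."""
    base = {"education": education, "friends_deferred": friends_deferred}
    return ([dict(base, accepts=True) for _ in range(accepting)]
            + [dict(base, accepts=False) for _ in range(resenting)])


# Invented toy sample, for illustration only.
sample = (
    make_cases("high", True, 1, 1)
    + make_cases("high", False, 8, 0)
    + make_cases("low", True, 4, 4)
    + make_cases("low", False, 2, 0)
)

zero_order = relationship(sample, "education")
partials = partial_relationships(sample, "education", "friends_deferred")

print(zero_order)
print(partials)
```

In this invented sample the zero-order relationship shows a 30-point difference between education groups, while within each partial table the two groups are identical. That is the same pattern the hypothetical Table 15-4 is described as showing.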

TABLE 15-5
The Elaboration Paradigm

                                  Test Variable
Partial Relationships
Compared with Original            Antecedent        Intervening
Same relationship                 Replication       Replication
Less or none                      Explanation       Interpretation
Split*                            Specification     Specification

*One partial is the same or greater, and the other is less or none.

Replication

Whenever the partial relationships are essentially the same as the original relationship, the term replication is assigned to the result, regardless of whether the test variable is antecedent or intervening. This means that the original relationship has been replicated under test conditions. If, in our previous example, education still affected acceptance of induction both among those who had friends deferred and those who did not, then we would say the original relationship had been replicated. Note, however, that this finding would not confirm Stouffer's explanation of the original relationship. Having friends deferred or not would not be the mechanism through which education affected the acceptance of induction.

To see what a replication looks like, turn back to Tables 15-3 and 15-4. Imagine that our initial discovery was that having friends deferred strongly influenced how soldiers felt about being drafted, as shown in Table 15-3. Had we first discovered this relationship, we might have wanted to see whether it was equally true for soldiers of different educational backgrounds. To find out, we would have made education our control or test variable.

Table 15-4 contains the results of such an examination, though it is constructed somewhat differently from what we would have done had we used education as the test variable. Nevertheless, we see in the table that having friends deferred or not still influences attitudes toward being drafted among those soldiers with high education and those with low education. (Compare columns 1 and 3, then 2 and 4.) This result represents a replication of the relationship between having friends deferred and attitude toward being drafted.

Researchers frequently use the elaboration model rather routinely in the hope of replicating their findings among subsets of the sample. If we discovered a relationship between education and prejudice, for example, we might introduce such test variables as age, region of the country, race, religion, and so forth to test the stability of the original relationship. If the relationship were replicated among young and old, among people from different parts of the country, and so forth, we would have grounds for concluding that the original relationship was a genuine and general one.

replication: A technical term used in connection with the elaboration model, referring to the elaboration outcome in which the initially observed relationship between two variables persists when a control variable is held constant, thereby supporting the idea that the original relationship is genuine.

Explanation

Explanation is the term used to describe a spurious relationship: an original relationship shown to be false through the introduction of a test variable. This requires two conditions: (1) The test variable must be antecedent to both the independent and dependent variables. (2) The partial relationships must be zero or significantly less than those found in the original. Several examples will illustrate this situation.

explanation: An elaboration model outcome in which the original relationship between two variables is revealed to have been spurious, because the relationship disappears when an antecedent test variable is introduced.

Let's look at an example we touched on in Chapter 4. There is an empirical relationship between the number of storks in different areas and the birthrates for those areas. The more storks in an area, the higher the birthrate. This empirical relationship might lead one to assume that the number of storks affects the birthrate. An antecedent test explains away this relationship, however. Rural areas have both more storks and higher birthrates than urban areas do. Within rural areas, there is no relationship between the number of storks and the birthrate; nor is there a relationship within urban areas.

Figure 15-3 illustrates how the rural/urban variable causes the apparent relationship between storks and birthrates. Part I of the figure shows the original relationship. Notice that all but one of the entries in the box for towns and cities with many storks have high birthrates and that all but one of those in the box for towns and cities with few storks have low birthrates. In percentage form, we say that 93 percent of the towns and cities with many storks also had high birthrates, contrasted with 7 percent of those with few storks. That's quite a large difference and represents a strong association between the two variables.

FIGURE 15-3
The Facts of Life about Storks and Babies
I. Birthrates of Towns and Cities Having Few or Many Storks
II. Birthrates of Towns and Cities Having Few or Many Storks, Controlling for Rural (Towns) and Urban (Cities)
H = Town or city with high birthrate; L = Town or city with low birthrate

Part II of the figure separates the towns from the cities (the rural from urban areas) and examines storks and babies in each type of place separately. Now we can see that all the rural places have high birthrates, and all the urban places have low birthrates. Also notice that only one rural place had few storks and only one urban place had lots of storks.

Here's a similar example, also mentioned in Chapter 4. There is a positive relationship between the number of fire trucks responding to a fire and the amount of damage done. If more trucks respond, more damage is done. One might assume from this fact that the fire trucks themselves cause the damage. However, an antecedent test variable, the size of the fire, explains away the original relationship. Large fires do more damage than small ones do, and more fire trucks show up at large fires than at small ones. Looking only at large fires, we would see that the original relationship vanishes (or perhaps reverses itself); and the same would be true looking only at small fires.

Finally, let's take a real research example. Years ago, I found an empirical relationship between the region of the country in which medical school faculty members attended medical school and their attitudes toward Medicare (Babbie 1970). To simplify matters, only the East and the South will be examined. Of faculty members attending eastern medical schools, 78 percent said they approved of Medicare, compared with 59 percent of those attending southern medical schools. This finding made sense in view of the fact that the South seemed generally more resistant to such programs than the East did, and medical school training should presumably affect a doctor's medical attitudes. However, this relationship is explained away when we introduce an antecedent test variable: the region of the country in which the faculty member was raised. Of faculty members raised in the East, 89 percent attended medical school in the East and 11 percent in the South. Of those raised in the South, 53 percent attended medical school in the East and 47 percent in the South. Moreover, the areas in which faculty members were raised related to attitudes toward Medicare. Of those raised in the East, 84 percent approved of Medicare, as compared with 49 percent of those raised in the South.

Table 15-6 presents the three-variable relationship among (1) region in which raised, (2) region of medical school training, and (3) attitude toward Medicare. Faculty members raised in the East are quite likely to approve of Medicare, regardless of where they attended medical school. Those raised in the South are relatively less likely to approve of Medicare, but, again, the region of their medical school training has little or no effect. These data indicate, therefore, that the original relationship between region of medical training and attitude toward Medicare was spurious; it was due only to the coincidental effect of region of origin on both region of medical training and attitude toward Medicare. When region of origin is held constant, as in Table 15-6, the original relationship disappears in the partials.

TABLE 15-6
Region of Origin, Region of Schooling, and Attitude toward Medicare

Percent Who Approve of Medicare

                                        Region in Which Raised
Region of Medical School Training       East        South
East                                    84          50
South                                   80          47

Source: Earl R. Babbie, Science and Morality in Medicine (Berkeley: University of California Press, 1970), 181.

In "Attending an Ivy League College and Success in Later Professional Life," Patricia Kendall, one of the founders of the elaboration model, recalls a study in which the researcher suspected an explanation but found a replication. Though the data are no longer current, the topic is still of vital interest to students: To what extent does your professional success depend on attending the "right" school?

Interpretation

Interpretation is similar to explanation, except for the time placement of the test variable and the implications that follow from that difference. Interpretation represents the research outcome in which a test or control variable is discovered to be the mediating factor through which an independent variable has its effect on a dependent variable. The earlier example of education, friends deferred, and acceptance of induction is an excellent illustration of interpretation. In terms of the elaboration model, the effect of education on acceptance of induction is not explained away; it is still a genuine relationship. In a real sense, educational differences cause differential acceptance of induction. The intervening variable, deferment of friends, merely helps to interpret the mechanism through which the relationship occurs. Thus, an interpretation does not deny the validity of the original causal relationship but simply clarifies the process through which that relationship functions.

interpretation: A technical term used in connection with the elaboration model. It represents the research outcome in which a control variable is discovered to be the mediating factor through which an independent variable has its effect on a dependent variable.

Here's another example of interpretation. Researchers have observed that children from broken homes are more likely to become delinquent than those from intact homes are. This relationship may be interpreted, however, through the introduction of supervision as a test variable. Among children who are supervised, delinquency rates are not affected by whether or not their parents are divorced. The same is true among those who are not supervised. It is the relationship between broken homes and the lack of supervision that produced the original relationship.

Specification

Sometimes the elaboration model produces partial relationships that differ significantly from each other. For example, one partial relationship is the same as or stronger than the original two-variable relationship, and the second partial relationship is less than the original and may be reduced to zero. In the elaboration paradigm, this situation is referred to as specification: We have specified the conditions under which the original relationship occurs.

specification: A technical term used in connection with the elaboration model, representing the elaboration outcome in which an initially observed relationship between two variables is replicated among some subgroups created by the control variable but not among others. In such a situation, you will have specified the conditions under which the original relationship exists: for example, among men but not among women.
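The logic of Table 15-5 can be written down as a small decision rule. The Python sketch below is only one way to do so, under loudly stated assumptions: the original relationship is positive, every relationship is summarized as a percentage-point difference, and a partial counts as "the same" if it falls no more than an arbitrary tolerance below the original. The paradigm itself supplies no such cutoff, so the 10-point default and the function name are assumptions of the sketch.

```python
def classify_outcome(original, partials, test_position, tolerance=10.0):
    """Label an elaboration result following the layout of Table 15-5.

    original      -- zero-order percentage-point difference (assumed positive)
    partials      -- percentage-point differences in the partial tables
    test_position -- "antecedent" or "intervening"
    tolerance     -- arbitrary cutoff (an assumption, not part of the paradigm)
    """
    if test_position not in ("antecedent", "intervening"):
        raise ValueError("test_position must be 'antecedent' or 'intervening'")

    # A partial "holds" if it is about as large as, or larger than, the original.
    held = [partial >= original - tolerance for partial in partials]

    if all(held):
        return "replication"
    if not any(held):
        return "explanation" if test_position == "antecedent" else "interpretation"
    return "specification"


# Medicare example: 78 vs. 59 percent originally; Table 15-6 gives the partials.
medicare = classify_outcome(78 - 59, [84 - 80, 50 - 47], "antecedent")

# Induction example: 88 vs. 70 percent originally; the text gives 63 vs. 63
# and 94 vs. 95 percent in the hypothetical partial tables.
induction = classify_outcome(88 - 70, [63 - 63, 94 - 95], "intervening")

print(medicare, induction)
```

With these inputs the Medicare result is labeled an explanation and the induction result an interpretation, matching the discussion in the text. A different tolerance could change the label in less clear-cut cases, which is why the cutoff should be treated as a judgment call.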

Attending an Ivy League College and Success in Later Professional Life
Patricia Kendall, Department of Sociology, Queens College, CUNY

The main danger for survey analysts is that a relationship they hope is causal will turn out to be spurious. That is, the original relationship between X and Y is explained by an antecedent test factor. More specifically, the partial relationships between X and Y reduce to 0 when that antecedent test factor is held constant.

This was a distinct possibility in a major finding from a study carried out several decades ago. One of my fellow graduate students at Columbia University, Patricia Salter West, based her dissertation on questionnaires obtained by Time Magazine from 10,000 of its male subscribers. Among many of the hypotheses developed by West was that male graduates of Ivy League schools (Brown, Columbia, Cornell, Dartmouth, Harvard, University of Pennsylvania, Princeton, and Yale) were more successful in their later professional careers, as defined by their annual earnings, than those who graduated from other colleges and universities.

The initial four-fold table (Table 1) supported West's expectation. Although I made up the figures, they conform closely to what West actually found in her study. Having attended an Ivy League school seems to lead to considerably greater professional success than does being a graduate of some other kind of college or university.

TABLE 1*

                                    College Attended (X)
Later Professional                  Ivy League        Other College
Success (Y)                         College           or University
Successful                          1,300 (65%)       2,000 (25%)
Unsuccessful                        700 (35%)         6,000 (75%)
Total                               2,000 (100%)      8,000 (100%)

*I have had to invent relevant figures because the only published version of West's study contained no totals. See Ernest Havemann and Patricia Salter West, They Went to College (New York: Harcourt, Brace, 1952).

But wait a minute. Isn't this a relationship that typically could be spurious? Who can afford to send their sons to Ivy League schools? Wealthy families, of course.† And who can provide the business and professional connections that could help sons become successful in their careers? Again, wealthy or well-to-do families.

In other words, the socioeconomic status of the student's family may explain away the apparent causal relationship. In fact, some of West's findings suggest that this might indeed be the case.

TABLE 2
Attendance at Ivy League Colleges According to Family Socioeconomic Status (SES)

                                    Family SES (T)
College Attended (X)                High SES          Low SES
Ivy League colleges                 1,500 (33%)       500 (9%)
Other colleges and universities     3,000 (67%)       5,000 (91%)
Total                               4,500 (100%)      5,500 (100%)

According to Table 2, a third of those coming from families defined as wealthy, compared with 1 in 11 coming from less well-to-do backgrounds, attended Ivy League colleges. Thus there is a very high correlation between the two variables, X and T. (There is a similarly high correlation between family socioeconomic status [T] and later professional success [Y].)

The magnitude of these so-called marginal correlations suggests that West's hypothesis regarding the causal nature of having attended an Ivy League college might be incorrect; it suggests instead that the socioeconomic status of the students' families accounted for the original relationship she observed.

We are not done yet, however. The crucial question is what happens to the partial relationships once the test factor is controlled. These are shown in Table 3.

TABLE 3
Partial Relationships between X and Y with T Held Constant

                        High Family SES (T)                Low Family SES (T)
Later Success (Y)       Ivy League       Other             Ivy League       Other
                        College (X)      College (X)       College (X)      College (X)
Successful              1,000 (67%)      1,000 (33%)       300 (60%)        1,000 (20%)
Not successful          500 (33%)        2,000 (67%)       200 (40%)        4,000 (80%)
Total                   1,500 (100%)     3,000 (100%)      500 (100%)       5,000 (100%)

These partial relationships show that, even when family socioeconomic status is held constant, there is still a marked relationship between having attended an Ivy League college and success in later professional life. As a result, West's initial hypothesis received support from the analysis she carried out.

Despite this, West had in no way proved her hypothesis. There are almost always additional antecedent factors that might explain the original relationship. Consider, for example, the intelligence of the students (as measured by IQ tests or SAT scores). Ivy League colleges pride themselves on the excellence of their student bodies. They may therefore be willing to award merit scholarships to students with exceptional qualifications but not enough money to pay tuition and board. Once admitted to these prestigious colleges, bright students may develop the skills (and connections) that will lead to later professional success. Since West had no data on the intelligence of the men she studied, she was unable to study whether the partial relationships disappeared once this test factor was introduced.

In sum, the elaboration paradigm permits the investigator to rule out certain possibilities and to gain support for others. It does not permit us to prove anything.

†Since she had no direct data on family socioeconomic status, West defined as wealthy or high socioeconomic status those who supported their sons completely during all ... She defined as less wealthy or having low socioeconomic status those whose sons worked their way through college, in part or totally.

Now recall the study, cited earlier in this book, of the sources of religious involvement (Glock, Ringer, and Babbie 1967: 92). It was discovered that among Episcopal church members, involvement decreased as social class increased. This finding is reported in Table 15-7, which examines mean levels of church involvement among women parishioners at different levels of social class.

TABLE 15-7
Social Class and Mean Church Involvement among Episcopal Women

                        Social Class Levels
                        Low                                High
                        0       1       2       3       4
Mean involvement        0.63    0.58    0.49    0.48    0.45

Note: Mean scores rather than percentages have been used here.
Source: Tables 15-7, 15-8, and 15-9 are from Charles Y. Glock, Benjamin B. Ringer, and Earl R. Babbie, To Comfort and to Challenge (Berkeley: University of California Press, 1967). Used with permission of the Regents of the University of California.

Glock interpreted this finding in the context of others in the analysis and concluded that church involvement provides an alternative form of gratification for people who are denied gratification in the secular society. This conclusion explained why women were more religious than men, why old people were more religious than young people, and so forth. Glock reasoned that people of lower social class (measured by income and education) had fewer chances to gain self-esteem from the secular society than people of higher social class did. To illustrate this idea, he noted that social class was strongly related to the likelihood that a woman had ever held an office in a secular organization (see Table 15-8).

TABLE 15-8
Social Class and the Holding of Office in Secular Organizations

                                    Social Class Levels
                                    Low                            High
                                    0      1      2      3      4
Percent who have held office
in a secular organization           46     47     54     60     83

Glock then reasoned that if social class were related to church involvement only by virtue of the fact that lower-class women would be denied opportunities for gratification in the secular society, the original relationship should not hold among women who were getting gratification. As a rough indicator of the receipt of gratification from the secular society, he used as a variable the holding of secular office. In this test, social class should be unrelated to church involvement among those who had held such office.

Table 15-9 presents an example of a specification. Among women who have held office in secular organizations, there is essentially no relationship between social class and church involvement. In effect, the table specifies the conditions under which the original relationship holds: among those women lacking gratification in the secular society.

TABLE 15-9
Church Involvement by Social Class and Holding Secular Office

                        Mean Church Involvement for Social Class Levels
                        Low                                High
                        0       1       2       3       4
Have held office        0.46    0.53    0.46    0.46    0.46
Have not held office    0.62    0.55    0.47    0.46    0.40

The term specification is used in the elaboration paradigm regardless of whether the test variable is antecedent or intervening. In either case, the meaning is the same. We have specified the particular conditions under which the original relationship holds.

Refinements to the Paradigm

The preceding sections have presented the primary logic of the elaboration model as developed by Lazarsfeld and his colleagues. Here we look at some logically possible variations, some of which can be found in a book by Morris Rosenberg (1968).

First, the basic paradigm assumes an initial relationship between two variables. It might be useful, however, in a more comprehensive model to differentiate between positive and negative relationships. Moreover, Rosenberg suggests using the elaboration model even with an original relationship of zero. He cites as an example a study of union membership and attitudes toward having Jews on the union staff (see Table 15-10). The initial analysis indicated that length of union membership did not relate to the attitude: Those who had belonged to the union less than four years were just as willing to accept Jews on the staff as were those who had belonged for more than four years. The age of union members, however, was found to suppress the relationship between length of union membership and attitude toward Jews. Overall, younger members were more favorable to Jews than older members were. At the same time, of course, younger members were not likely to have been in the union as long as the old members. Within specific age groups, however, those in the union longest were the most supportive of having Jews on the staff. Age, in this case, was a suppressor variable, concealing the relationship between length of membership and attitude toward Jews.

suppressor variable: In the elaboration model, a test variable that prevents a genuine relationship from appearing at the zero-order level.

Second, the basic paradigm focuses on partials being the same as or weaker than the original relationship but does not provide guidelines for specifying what constitutes a significant difference between the original and the partials. When you use the elaboration model, you'll frequently find yourself making an arbitrary decision about whether a given partial is significantly weaker than the original. This, then, suggests another dimension that could be added to the paradigm.
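Kendall's Ivy League tables lend themselves to a quick recomputation. The following Python sketch starts from the cell counts in her Table 3, collapses them over family SES to recover the zero-order table (her Table 1), and then compares the zero-order percentage-point difference with the two partials. The counts are Kendall's invented figures as given in the box; the data layout and function names are assumptions of this sketch.

```python
# Cell counts from Kendall's Table 3: (successful, not successful).
TABLE_3 = {
    "high_ses": {"ivy": (1000, 500), "other": (1000, 2000)},
    "low_ses": {"ivy": (300, 200), "other": (1000, 4000)},
}


def pct_successful(successful, unsuccessful):
    """Percent successful among all cases in one column."""
    return 100 * successful / (successful + unsuccessful)


def difference(cells):
    """Ivy League minus other colleges, in percentage points."""
    return pct_successful(*cells["ivy"]) - pct_successful(*cells["other"])


def collapse(table):
    """Add the partial tables together, ignoring the test variable."""
    totals = {}
    for cells in table.values():
        for college, (successful, unsuccessful) in cells.items():
            prior_s, prior_u = totals.get(college, (0, 0))
            totals[college] = (prior_s + successful, prior_u + unsuccessful)
    return totals


zero_order_cells = collapse(TABLE_3)
zero_order_diff = difference(zero_order_cells)
partial_diffs = {ses: difference(cells) for ses, cells in TABLE_3.items()}

print(zero_order_cells)
print(zero_order_diff, partial_diffs)
```

The collapsed counts reproduce Table 1, and the partial differences remain large in both SES groups. Kendall reads this as support for West's hypothesis rather than as an explanation by family SES.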

TABLE 15-10
Example of a Suppressor Variable

I: No Apparent Relationship between Attitudes toward Jews and Length of Time in the Union

Percent who don't care if there are Jews on the union staff
  Less than four years    49.2  (126)
  Four years or more      50.5  (256)

II: In Each Age Group, Length of Time in Union Increases Willingness to Have Jews on Union Staff

Percent who don't care if there are Jews on the union staff
                        Less than        Four years
Age                     four years       or more
29 years and under      56.4  (78)       62.7  (51)
30-49 years             37.1  (35)       48.3  (116)
50 years and older      38.4  (13)       56.1  (89)

Source: Adapted from Morris Rosenberg, The Logic of Survey Analysis (New York: Basic Books, 1968), 88-89. Used by permission.

Third, the limitation of the basic paradigm to partials that are the same as or weaker than the original neglects two other possibilities. A partial relationship might be stronger than the original. Or, on the other hand, a partial relationship might be the reverse of the original: for example, negative where the original was positive.

Rosenberg provides a hypothetical example of the latter possibility by first suggesting that a researcher might find that working-class respondents in his study are more supportive of the civil rights movement than middle-class respondents are (see Table 15-11). He further suggests that race might be a distorter variable in this instance, reversing the true relationship between class and attitudes. Presumably, African American respondents would be more supportive of the movement than whites would, but African Americans would also be overrepresented among working-class respondents and underrepresented among the middle class. Middle-class African American respondents might be more supportive than working-class African Americans, however; and the same relationship might be found among whites. Holding race constant, then, the researcher would conclude that support for the civil rights movement was greater among the middle class than among the working class.

Here's another example of a distorter variable at work. When Michel de Seve set out to examine the starting salaries of men and women in the same organization, she was surprised to find the women were receiving higher starting salaries, on the average, than their male counterparts were. The distorter variable was time of first hire. Many of the women had been hired relatively recently, when salaries were higher overall than in the earlier years when many of the men had been hired (reported in E. Cook 1995).

All these new dimensions further complicate the notion of specification. If one partial is the same as the original, and the other partial is even

distorter variable: In the elaboration model, a test variable that reverses the direction of a zero-order relationship.
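As a minimal sketch of how a distorter variable can produce such a reversal, the following Python fragment pools the race-specific percentages from Rosenberg's hypothetical example (Table 15-11) back into the uncontrolled class comparison. The dictionary layout and the function name zero_order are illustrative assumptions, not part of Rosenberg's presentation.

```python
# Hypothetical figures from Table 15-11, part II:
# (share scoring "high" on civil rights, number of cases) by race and class.
partials = {
    "Black": {"middle": (0.70, 20), "working": (0.50, 100)},
    "White": {"middle": (0.30, 100), "working": (0.20, 20)},
}


def zero_order(partials, social_class):
    """Pool the race subgroups to get the uncontrolled share for one class."""
    high = sum(group[social_class][0] * group[social_class][1]
               for group in partials.values())
    cases = sum(group[social_class][1] for group in partials.values())
    return high / cases


# Within each race, the middle class scores higher than the working class.
for race, group in partials.items():
    print(race, group["middle"][0] > group["working"][0])

# Pooled across race, the direction of the class difference flips.
for social_class in ("middle", "working"):
    print(social_class, round(100 * zero_order(partials, social_class)))
```

The pooled percentages come out at 37 for the middle class and 45 for the working class, matching part I of the table, even though the middle class is the more supportive group within each race. The reversal arises only because the two classes differ so sharply in racial composition.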

stronger, how should you react to that situation? You've specified one condition under which the original relationship holds up, but you've also specified another condition under which it holds even more clearly.

TABLE 15-11
Example of a Distorter Variable (Hypothetical)

I: Working-Class Subjects Appear More Liberal on Civil Rights than Middle-Class Subjects

Civil Rights Score    Middle Class    Working Class
High                  37%             45%
Low                   63              55
                      100             100
100% =                (120)           (120)

II: Controlling for Race Shows the Middle Class to Be More Liberal than the Working Class

                                Blacks                        Whites
Civil Rights Score    Middle Class  Working Class   Middle Class  Working Class
High                  70%           50%             30%           20%
Low                   30            50              70            80
                      100           100             100           100
100% =                (20)          (100)           (100)         (20)

Source: Morris Rosenberg, The Logic of Survey Analysis (New York: Basic Books, 1968), 94-95. Used by permission.

Finally, the basic paradigm focuses primarily on dichotomous test variables. In fact, the elaboration model is not so limited, either in theory or in use, but the basic paradigm becomes more complicated when the test variable divides the sample into three or more subsamples. And the paradigm becomes more complicated yet when more than one test variable is used simultaneously.

I'm not saying all this to fault the basic elaboration paradigm. To the contrary, I want to emphasize that the elaboration model is not a simple algorithm, that is, a set of procedures through which to analyze research. Rather, it's primarily a logical device for assisting the researcher in understanding his or her data. A firm understanding of the elaboration model will make a sophisticated analysis easier. However, this model neither suggests which variables should be introduced as controls nor yields definitive conclusions about the nature of elaboration results. For all these things, you must look to your own ingenuity. Such ingenuity, moreover, will come only through extensive experience. By pointing to oversimplifications in the basic elaboration paradigm, I've sought to bring home the point that the model provides only a logical framework. You'll find sophisticated analyses far more complicated than the examples used to illustrate the basic paradigm.

At the same time, if you fully understand the basic model, you'll understand other techniques such as correlations, regressions, and factor analyses a lot more easily. Chapter 16 places such techniques as partial correlations and partial regressions in the context of the elaboration model.

Elaboration and Ex Post Facto Hypothesizing

Before we leave the discussion of the elaboration model, we should look at it in connection with a form of fallacious reasoning called ex post facto hypothesizing. The reader of methodological literature will find myriad references warning against it. But although the intentions of such injunctions are correct, inexperienced researchers can sometimes be confused about its implications.

ex post facto hypothesis: A hypothesis created after confirming data have already been collected. It is a meaningless construct because there is no way for it to be disconfirmed.

"Ex post facto" means "after the fact." When you observe an empirical relationship between two variables and then simply suggest a reason for that relationship, that is sometimes called ex post facto hypothesizing. You've generated a hypothesis linking two variables after their relationship is already known. You'll recall, from an early discussion in this book, that all hypotheses must be subject to

disconfirmation in order to be meaningful. Unless you can specify empirical findings that would disprove your hypothesis, it's not really a hypothesis as researchers use that term. You might reason, therefore, that once you've observed a relationship between two variables, any hypothesis regarding that relationship cannot be disproved.

This is a fair assessment if you're doing nothing more than dressing up your empirical observations with deceptive hypotheses after the fact. Having observed that women are more religious than men, you should not simply assert that women will be more religious than men because of some general dynamic of social behavior and then rest your case on the initial observation.

The unfortunate spin-off of the injunction against ex post facto hypothesizing is in its inhibition of good, honest hypothesizing after the fact. Inexperienced researchers are often led to believe that they must make all their hypotheses before examining their data, even if that process means making a lot of poorly reasoned ones. Furthermore, they're led to ignore any empirically observed relationships that do not confirm some prior hypothesis.

Surely, few researchers would now wish that Sam Stouffer had hushed up his anomalous findings regarding morale among soldiers in the army. Stouffer noted peculiar empirical observations and set about hypothesizing the reasons for those findings. And his reasoning has proved invaluable to researchers ever since. The key is that his "after the fact" hypotheses could themselves be tested.

There is another, more sophisticated point to be made here, however. Anyone can generate hypotheses to explain observed empirical relationships in a body of data, but the elaboration model provides the logical tools for testing those hypotheses within the same body of data. A good example of this testing may be found in the earlier discussion of social class and church involvement. Glock explained the original relationship in terms of social deprivation theory. If he had stopped at that point, his comments would have been interesting but hardly persuasive. He went beyond that point, however. He noted that if the hypothesis was correct, then the relationship between social class and church involvement should disappear among those women who were receiving gratification from the secular society: those who had held office in a secular organization. This hypothesis was then subjected to an empirical test. Had the new hypothesis not been confirmed by the data, he would have been forced to reconsider.

These additional comments should further illustrate the point that data analysis is a continuing process, demanding all the ingenuity and perseverance you can muster. The image of a researcher carefully laying out hypotheses and then testing them in a ritualistic fashion results only in ritualistic research.

In case you're concerned that the strength of ex post facto proofs seems to be less than that of the traditional kinds, let me repeat the earlier assertion that "scientific proof" is a contradiction in terms. Nothing is ever proved scientifically. Hypotheses, explanations, theories, or hunches can all escape a stream of attempts at disproof, but none can be proved in any absolute sense. The acceptance of a hypothesis, then, is really a function of the extent to which it has been tested and not disconfirmed. No hypothesis, therefore, should be considered sound on the basis of one test, whether the hypothesis was generated before or after the observation of empirical data. With this in mind, you should not deny yourself some of the most fruitful avenues available to you in data analysis. You should always try to reach an honest understanding of your data, develop meaningful theories for more general understanding, and not worry about the manner of reaching that understanding.

MAIN POINTS

Introduction

• The elaboration model is a method of multivariate analysis appropriate to social research. It is primarily a logical model that can illustrate the basic logic of other multivariate methods.

The Origins of the Elaboration Model

• Paul Lazarsfeld and Patricia Kendall used the logic of the elaboration model to present hypothetical tables regarding Samuel Stouffer's

work regarding education and acceptance of induction in the Army.

• A partial relationship (or "partial") is the observed relationship between two variables within a subgroup of cases based on some attribute of the test or control variable.

• A zero-order relationship is the observed relationship between two variables without a third variable being held constant or controlled.

The Elaboration Paradigm

• The basic steps in elaboration are as follows: (1) A relationship is observed to exist between two variables, (2) a third variable (the test variable) is held constant in the sense that the cases under study are subdivided according to the attributes of that third variable, (3) the original two-variable relationship is recomputed within each of the subgroups, and (4) the comparison of the original relationship with the relationships found within each subgroup (the partial relationships) provides a fuller understanding of the original relationship itself.

• The logical relationships of the variables differ depending on whether the test variable is antecedent to the other two variables or intervening between them.

• The outcome of an elaboration analysis may be replication (whereby a set of partial relationships is essentially the same as the corresponding zero-order relationship), explanation (whereby a set of partial relationships is reduced essentially to zero when an antecedent variable is held constant), interpretation (whereby a set of partial relationships is reduced essentially to zero when an intervening variable is held constant), or specification (whereby one partial relationship is reduced, ideally to zero, and the other remains about the same as the original relationship or is stronger).

• A suppressor variable conceals the relationship between two other variables; a distorter variable causes an apparent reversal in the relationship between two other variables (from negative to positive or vice versa).

Elaboration and Ex Post Facto Hypothesizing

• Ex post facto hypothesizing, or the development of hypotheses "predicting" relationships that have already been observed, is invalid in science, because disconfirming such hypotheses is impossible. Although nothing prevents us from suggesting reasons that observed relationships may be the way they are, we should not frame those reasons in the form of "hypotheses." More important, one observed relationship and possible reasons for it may suggest hypotheses about other relationships that have not been examined. The elaboration model is an excellent logical device for this kind of unfolding analysis of data.

KEY TERMS

The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

distorter variable
elaboration model
ex post facto hypothesis
explanation
interpretation
partial relationship
replication
specification
suppressor variable
test variable
zero-order relationship

REVIEW QUESTIONS AND EXERCISES

1. Review the Stouffer-Kendall-Lazarsfeld example of education, friends deferred, and attitudes toward being drafted. Suppose they had begun with an association between friends deferred and attitudes toward being drafted, and then they had controlled for education. What conclusion would they have reached?

2. In your own words describe the elaboration logic of (a) replication, (b) interpretation, (c) explanation, and (d) specification.

3. Review the box on Ivy League colleges and success in later professional life. In your own words, explain what Patricia Kendall means when she says, "Despite this [support from the

analysis of partial relationships], West had in no way proved her hypothesis." What conclusions can one reasonably draw from West's study?

4. Construct hypothetical examples of suppressor and distorter variables.

5. Search the web for a research report on the discovery of a spurious relationship. Give the web address of the document and quote or paraphrase what was discovered.

ADDITIONAL READINGS

Glock, Charles, ed. 1967. Survey Research in the Social Sciences. New York: Russell Sage Foundation. An excellent discussion of the logic of elaboration. Glock's own chapter in this book presents the elaboration model, providing concrete illustrations.

Lazarsfeld, Paul F. 1982. The Varied Sociology of Paul F. Lazarsfeld. Edited by Patricia L. Kendall. New York: Columbia University Press. A collection of essays on the logic of social research by the principal founder of the elaboration model, edited by his cofounder.

Lazarsfeld, Paul, Ann Pasanella, and Morris Rosenberg, eds. 1972. Continuities in the Language of Social Research. New York: Free Press. An excellent and classic collection of conceptual discussions and empirical illustrations. Section II is especially relevant, though the logic of elaboration runs throughout most of the volume.

Merton, Robert K., James S. Coleman, and Peter H. Rossi, eds. 1979. Qualitative and Quantitative Social Research. New York: Free Press. This collection of articles written in honor of Lazarsfeld illustrates the logic he brought to social research.

Rosenberg, Morris. 1968. The Logic of Survey Analysis. New York: Basic Books. The most comprehensive statement of elaboration available. Rosenberg presents the basic paradigm and goes on to suggest logical extensions of it. It's difficult to decide which is more important, this aspect of the book or its voluminous illustrations. Both are excellent.

Tacq, Jacques. 1997. Multivariate Analysis Techniques in Social Science Research: From Problem to Analysis. Thousand Oaks, CA: Sage. This is a very accessible book for those who want to gain a more statistical grounding for multivariate analysis through the elaboration model. The author uses real research examples from various disciplines to explain why and how to use multivariate analysis.

SPSS EXERCISES

See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you'll also find a detailed primer on using SPSS.

Online Study Resources

SociologyNow: Research Methods

1. Before you do your final review of the chapter, take the SociologyNow: Research Methods diagnostic quiz to help identify the areas on which you should concentrate. You'll find information on this online tool, as well as instructions on how to access all of its great resources, in the front of the book.

2. As you review, take advantage of the SociologyNow: Research Methods customized study plan, based on your quiz results. Use this study plan with its interactive exercises and other resources to master the material.

3. When you're finished with your review, take the posttest to confirm that you're ready to move on to the next chapter.

WEBSITE FOR THE PRACTICE OF SOCIAL RESEARCH 11TH EDITION

Go to your book's website at http://sociology.wadsworth.com/babbie_practice11e for tools to aid you in studying for your exams. You'll find Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and Chapter Tutorials, as well as Extended Projects, InfoTrac College Edition search terms, Social Research in Cyberspace, GSS Data, Web Links, and primers for using various data-analysis software such as SPSS and NVivo.

WEB LINKS FOR THIS CHAPTER

Please realize that the Internet is an evolving entity, subject to change. Nevertheless, these few websites should be fairly stable. Also, check your book's website for even more Web Links. These websites, current at the time of this book's publication, provide opportunities to learn about the elaboration model.

Martin Tusler, The Elaboration Model
http://www.uoregon.edu/~mtusler/Show2/index.html
This excellent PowerPoint slide show walks you through the logic of the elaboration model, with useful examples.

North Carolina State University, The Elaboration Model
http://www2.chass.ncsu.edu/mlvasu/ps471/D12.htm

Idee Winfield, The Elaboration Model
http://www.cofc.edu/~winfield/socy371/elaboration-model.html

Statistical Analyses

Introduction
Descriptive Statistics
    Data Reduction
    Measures of Association
    Regression Analysis
Inferential Statistics
    Univariate Inferences
    Tests of Statistical Significance
    The Logic of Statistical Significance
    Chi Square
Other Multivariate Techniques
    Path Analysis
    Time-Series Analysis
    Factor Analysis
    Analysis of Variance
    Discriminant Analysis
    Log-Linear Models
    Geographic Information Systems (GIS)

SociologyNow: Research Methods
Use this online tool to help you make the grade on your next exam. After reading this chapter, go to the "Online Study Resources" at the end of the chapter for instructions on how to benefit from SociologyNow: Research Methods.

Chapter 16: Statistical Analyses

Introduction

It has been my experience over the years that many students are intimidated by statistics. Sometimes statistics makes them feel they're

• A few clowns short of a circus
• Dumber than a box of hair
• A few feathers short of a duck
• All foam, no beer
• Missing a few buttons on their remote control
• A few beans short of a burrito
• As screwed up as a football bat
• About as sharp as a bowling ball
• About four cents short of a nickel
• Not running on full thrusters*

* Thanks to the many contributors to humor lists on the Internet.

Many people are intimidated by quantitative research because they feel uncomfortable with mathematics and statistics. And indeed, many research reports are filled with unspecified computations. The role of statistics in social research is often important, but it's equally important to see this role in its proper perspective.

Empirical research is first and foremost a logical rather than a mathematical operation. Mathematics is merely a convenient and efficient language for accomplishing the logical operations inherent in quantitative data analysis. Statistics is the applied branch of mathematics especially appropriate to a variety of research analyses. This textbook is not intended to teach you statistics or torture you with them. Rather, I want to sketch out a logical context within which you might learn and understand statistics.

We'll be looking at two types of statistics: descriptive and inferential. Descriptive statistics is a medium for describing data in manageable forms. Inferential statistics, on the other hand, assists researchers in drawing conclusions from their observations; typically, this involves drawing conclusions about a population from the study of a sample drawn from it. After that, I'll briefly introduce you to some of the analytical techniques you may come across in your reading of the social science literature.

descriptive statistics: Statistical computations describing either the characteristics of a sample or the relationship among variables in a sample. Descriptive statistics merely summarize a set of sample observations, whereas inferential statistics move beyond the description of specific observations to make inferences about the larger population from which the sample observations were drawn.

Descriptive Statistics

As I've already suggested, descriptive statistics present quantitative descriptions in a manageable form. Sometimes we want to describe single variables, and sometimes we want to describe the associations that connect one variable with another. Let's look at some of the ways to do these things.

Data Reduction

Scientific research often involves collecting large masses of data. Suppose we surveyed 2,000 people, asking each of them 100 questions, which is not an unusually large study. We would then have a staggering 200,000 answers! No one could possibly read all those answers and reach any meaningful conclusion about them. Thus, much scientific analysis involves the reduction of data from unmanageable details to manageable summaries.

To begin our discussion, let's look briefly at the raw data matrix created by a quantitative research project. Table 16-1 presents a partial data matrix. Notice that each row in the matrix represents a person (or other unit of analysis), each column represents a variable, and each cell represents the coded attribute or value a given person has on a given variable. The first column in Table 16-1 represents a

person's gender. Let's say a "1" represents male and a "2" represents female. This means that persons 1 and 2 are male, person 3 is female, and so forth.

TABLE 16-1
Partial Raw Data Matrix
Rows: Person 1 through Person 6. Columns: Gender, Age, Education, Income, Occupation, Political Affiliation, Political Orientation, Religious Affiliation, Importance of Religion. (The coded cell values are garbled in this text version.)

In the case of age, person 1's "3" might mean 30-39 years old, person 2's "4" might mean 40-49. However age has been coded (see Chapter 14), the code numbers shown in Table 16-1 describe each of the people represented there.

Notice that the data have already been reduced somewhat by the time a data matrix like this one has been created. If age has been coded as suggested previously, the specific answer "33 years old" has already been assigned to the category "30-39." The people responding to our survey may have given us 60 or 70 different ages, but we've now reduced them to 6 or 7 categories.

Chapter 14 discussed some of the ways of further summarizing univariate data: averages such as the mode, median, and mean and measures of dispersion such as the range, the standard deviation, and so forth. It's also possible to summarize the associations among variables.

Measures of Association

The association between any two variables can also be represented by a data matrix, this time produced by the joint frequency distributions of the two variables. Table 16-2 presents such a matrix. It provides all the information needed to determine the nature and extent of the relationship between education and prejudice.

TABLE 16-2
Hypothetical Raw Data on Education and Prejudice

                              Educational Level
Prejudice    None    Grade School    High School    College    Graduate Degree
High          23          34             156           67            16
Medium        11          21             123          102            23
Low            6          12              95          164            77

Notice, for example, that 23 people (1) have no education and (2) scored high on prejudice; 77 people (1) had graduate degrees and (2) scored low on prejudice.

Like the raw-data matrix in Table 16-1, this matrix provides more information than can easily be comprehended. A careful study of the table shows that as education increases from "None" to "Graduate Degree," there is a general tendency for prejudice to decrease, but no more than a general impression is possible. For a more precise summary of the data matrix, we need one of several types of descriptive statistics. Selecting the appropriate

measure depends initially on the nature of the two variables.

We'll turn now to some of the options available for summarizing the association between two variables. Each of these measures of association is based on the same model: proportionate reduction of error (PRE).

To see how this model works, let's assume that I asked you to guess respondents' attributes on a given variable: for example, whether they answered yes or no to a given questionnaire item. To assist you, let's first assume you know the overall distribution of responses in the total sample, say, 60 percent said yes and 40 percent said no. You would make the fewest errors in this process if you always guessed the modal (most frequent) response: yes.

Second, let's assume you also know the empirical relationship between the first variable and some other variable: say, gender. Now, each time I ask you to guess whether a respondent said yes or no, I'll tell you whether the respondent is a man or a woman. If the two variables are related, you should make fewer errors the second time. It's possible, therefore, to compute the PRE by knowing the relationship between the two variables: the greater the relationship, the greater the reduction of error.

This basic PRE model is modified slightly to take account of different levels of measurement: nominal, ordinal, or interval. The following sections will consider each level of measurement and present one measure of association appropriate to each. Bear in mind that the three measures discussed are only an arbitrary selection from among many appropriate measures.

proportionate reduction of error (PRE): A logical model for assessing the strength of a relationship by asking how much knowing values on one variable would reduce our errors in guessing values on the other. For example, if we know how much education people have, we can improve our ability to estimate how much they earn, thus indicating there is a relationship between the two variables.

Nominal Variables

If the two variables consist of nominal data (for example, gender, religious affiliation, race), lambda (λ) would be one appropriate measure. (Lambda is a letter in the Greek alphabet corresponding to l in our alphabet. Greek letters are used for many concepts in statistics, which perhaps helps to account for the number of people who say of statistics, "It's all Greek to me.") Lambda is based on your ability to guess values on one of the variables: the PRE achieved through knowledge of values on the other variable.

Imagine this situation. I tell you that a room contains 100 people and I would like you to guess the gender of each person, one at a time. If half are men and half women, you'll probably be right half the time and wrong half the time.

But suppose I tell you each person's occupation before you guess that person's gender. What gender would you guess if I said the person was a truck driver? You would probably be wise to guess "male"; although there are now plenty of women truck drivers, most are still men. If I said the next person was a nurse, you'd probably be wisest to guess "female," following the same logic. Although you would still make errors in guessing genders, you would clearly do better than you would if you didn't know their occupations. The extent to which you did better (the proportionate reduction of error) would be an indicator of the association that exists between gender and occupation.

TABLE 16-3
Hypothetical Data Relating Gender to Employment Status

              Men      Women    Total
Employed      900        200    1,100
Unemployed    100        800      900
Total       1,000      1,000    2,000

Here's another simple hypothetical example that illustrates the logic and method of lambda. Table 16-3 presents hypothetical data relating gender to employment status. Overall, we note that 1,100 people are employed, and 900 are not employed. If you were to predict whether people were

employed, and if you knew only the overall distribution on that variable, you would always predict "employed," because that would result in fewer errors than always predicting "not employed." Nevertheless, this strategy would result in 900 errors out of 2,000 predictions.

Let's suppose that you had access to the data in Table 16-3 and that you were told each person's gender before making your prediction of employment status. Your strategy would change in that case. For every man you would predict "employed," and for every woman you would predict "not employed." In this instance, you would make 300 errors (the 100 men who were not employed and the 200 employed women), or 600 fewer errors than you would make without knowing the person's gender.

Lambda, then, represents the reduction in errors as a proportion of the errors that would have been made on the basis of the overall distribution. In this hypothetical example, lambda would equal 0.67; that is, 600 fewer errors divided by the 900 total errors based on employment status alone. In this fashion, lambda measures the statistical association between gender and employment status.

If gender and employment status were statistically independent, we would find the same distribution of employment status for men and women. In this case, knowing each person's gender would not affect the number of errors made in predicting employment status, and the resulting lambda would be zero. If, on the other hand, all men were employed and none of the women were employed, by knowing gender you would avoid all errors in predicting employment status. You would make 900 fewer errors (out of 900), so lambda would be 1.0, representing a perfect statistical association.

Lambda is only one of several measures of association appropriate to the analysis of two nominal variables. You could look at any statistics textbook for a discussion of other appropriate measures.

Ordinal Variables

If the variables being related are ordinal (for example, social class, religiosity, alienation), gamma (γ) is one appropriate measure of association. Like lambda, gamma is based on our ability to guess values on one variable by knowing values on another. However, whereas lambda is based on guessing exact values, gamma is based on guessing the ordinal arrangement of values. For any given pair of cases, we guess that their ordinal ranking on one variable will correspond (positively or negatively) to their ordinal ranking on the other.

Let's say we have a group of elementary students. It's reasonable to assume that there is a relationship between their ages and their heights. We can test this by comparing every pair of students: Sam and Mary, Sam and Fred, Mary and Fred, and so forth. Then we ignore all the pairs in which the students are the same age and/or the same height. We then classify each of the remaining pairs (those who differ in both age and height) into one of two categories: those in which the older child is also the taller ("same" pairs) and those in which the older child is the shorter ("opposite" pairs). So, if Sam is older and taller than Mary, the Sam-Mary pair is counted as a "same." If Sam is older but shorter than Mary, then that pair is an "opposite."

To determine whether age and height are related to each other, we compare the number of same and opposite pairs. If the same pairs outnumber the opposite pairs, we can conclude that there is a positive association between the two variables: as one increases, the other increases. If there are more opposites than sames, we can conclude that the relationship is negative. If there are about as many sames as opposites, we can conclude that age and height are not related to each other, that they're independent of each other.

Here's a social science example to illustrate the simple calculations involved in gamma. Let's say you suspect that religiosity is positively related to political conservatism, and if Person A is more religious than Person B, you guess that A is also more conservative than B. Gamma is based on the proportion of paired comparisons that fit this pattern.

Table 16-4 presents hypothetical data relating social class to prejudice. The general nature of the relationship between these two variables is that as social class increases, prejudice decreases. There is a negative association between social class and prejudice.
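The lambda calculation for the gender and employment example (Table 16-3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name lambda_coefficient and the dictionary layout are hypothetical, and the function treats the row variable as the one being guessed and the column variable as the one known in advance.

```python
def lambda_coefficient(table):
    """Return (errors without knowledge, errors with knowledge, lambda).

    `table` maps each category of the variable being guessed to a list of
    counts, one per category of the variable that is known in advance.
    """
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    total = sum(row_totals)
    # Always guessing the modal category misses everyone outside it.
    errors_without = total - max(row_totals)
    # Guessing the modal category within each column misses the rest of it.
    errors_with = sum(sum(col) - max(col) for col in zip(*rows))
    return errors_without, errors_with, (errors_without - errors_with) / errors_without


# Hypothetical counts from Table 16-3; columns are men, women.
table_16_3 = {"employed": [900, 200], "not employed": [100, 800]}
errors_without, errors_with, lam = lambda_coefficient(table_16_3)
print(errors_without, errors_with, round(lam, 2))  # 900 300 0.67
```

The sketch divides by the number of errors made without knowledge, so it assumes at least one such error exists; a table in which everyone falls into a single category of the guessed variable would need separate handling.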

Chapter 16: Statistical Analyses

Gamma is computed from two quantities: (1) the number of pairs having the same ranking on the two variables and (2) the number of pairs having the opposite ranking on the two variables. The pairs having the same ranking are computed as follows. The frequency of each cell in the table is multiplied by the sum of all cells appearing below and to the right of it, with all these products being summed. In Table 16-4, the number of pairs with the same ranking would be 200(900 + 300 + 400 + 100) + 500(300 + 100) + 400(400 + 100) + 900(100), or 340,000 + 200,000 + 200,000 + 90,000 = 830,000.

The pairs having the opposite ranking on the two variables are computed as follows: The frequency of each cell in the table is multiplied by the sum of all cells appearing below and to the left of it, with all these products being summed. In Table 16-4, the number of pairs with opposite rankings would be 700(500 + 800 + 900 + 300) + 400(800 + 300) + 400(500 + 800) + 900(800), or 1,750,000 + 440,000 + 520,000 + 720,000 = 3,430,000.

Gamma is computed from the numbers of same-ranked pairs and opposite-ranked pairs as follows:

$$\gamma = \frac{\text{same} - \text{opposite}}{\text{same} + \text{opposite}}$$

In our example, gamma equals (830,000 - 3,430,000) divided by (830,000 + 3,430,000), or -0.61. The negative sign in this answer indicates the negative association suggested by the initial inspection of the table. Social class and prejudice, in this hypothetical example, are negatively associated with each other. The numerical figure for gamma indicates that 61 percent more of the pairs examined had the opposite ranking than the same ranking.

Note that whereas values of lambda vary from 0 to 1, values of gamma vary from -1 to +1, representing the direction as well as the magnitude of the association. Because nominal variables have no ordinal structure, it makes no sense to speak of the direction of the relationship.
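The cell-by-cell bookkeeping above is easy to get wrong by hand, so here is a minimal Python sketch that repeats it using the Table 16-4 frequencies quoted in the calculation. The function name `count_pairs` and the row/column layout of the list are assumptions made for the illustration (rows run from low to high prejudice, columns from lower to upper class).

```python
def count_pairs(table):
    """Return (same, opposite) pair counts for an ordinal-by-ordinal table."""
    same = opposite = 0
    for i, row in enumerate(table):
        rows_below = table[i + 1:]
        for j, frequency in enumerate(row):
            # Cells below and to the right share the same ranking.
            same += frequency * sum(sum(r[j + 1:]) for r in rows_below)
            # Cells below and to the left have the opposite ranking.
            opposite += frequency * sum(sum(r[:j]) for r in rows_below)
    return same, opposite

# Table 16-4 (hypothetical data): columns are lower, middle, upper class.
table_16_4 = [
    [200, 400, 700],  # low prejudice
    [500, 900, 400],  # medium prejudice
    [800, 300, 100],  # high prejudice
]

same, opposite = count_pairs(table_16_4)
gamma = (same - opposite) / (same + opposite)
print(same, opposite, round(gamma, 2))
```

Run on these frequencies, the sketch reproduces the 830,000 same-ranked pairs, the 3,430,000 opposite-ranked pairs, and a gamma of about -0.61.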
(A negative lambda would indicate that you made more errors in predicting values on one variable while knowing values on the second than you made in ignorance of the second, and that's not logically possible.)

TABLE 16-4
Hypothetical Data Relating Social Class to Prejudice

Prejudice   Lower Class   Middle Class   Upper Class
Low             200           400            700
Medium          500           900            400
High            800           300            100

TABLE 16-5
Gamma Associations among the Semantic Differentiation Items of the Sanctification Scale

            Useful   Honest   Superior   Kind   Friendly   Warm
Good         0.79     0.88      0.80     0.90     0.79     0.83
Useful                0.84      0.71     0.77     0.68     0.72
Honest                          0.83     0.89     0.79     0.82
Superior                                 0.78     0.60     [illegible]
Kind                                              0.88     0.90
Friendly                                                   0.90

Source: Helena Znaniecki Lopata, "Widowhood and Husband Sanctification," Journal of Marriage and the Family (May 1981): 439-50.

Table 16-5 is an example of the use of gamma in social research. To study the extent to which widows sanctified their deceased husbands, Helena Lopata (1981) administered a questionnaire to a probability sample of 301 widows. In part, the questionnaire asked the respondents to

characterize their deceased husbands in terms of the following semantic differentiation scale:

Characteristic
Positive Extreme                       Negative Extreme
Good        1  2  3  4  5  6  7   Bad
Useful      1  2  3  4  5  6  7   Useless
Honest      1  2  3  4  5  6  7   Dishonest
Superior    1  2  3  4  5  6  7   Inferior
Kind        1  2  3  4  5  6  7   Cruel
Friendly    1  2  3  4  5  6  7   Unfriendly
Warm        1  2  3  4  5  6  7   Cold

Respondents were asked to describe their deceased spouses by circling a number for each pair of opposing characteristics. Notice that the series of numbers connecting each pair of characteristics is an ordinal measure.

Next, Lopata wanted to discover the extent to which the several measures were related to one another. Appropriately, she chose gamma as the measure of association. Table 16-5 shows how she presented the results of her investigation.

The format presented in Table 16-5 is called a correlation matrix. For each pair of measures, Lopata has calculated the gamma. Good and Useful, for example, are related to each other by a gamma equal to 0.79. The matrix is a convenient way of presenting the intercorrelations among several variables, and you'll find it frequently in the research literature. In this case, we see that all the variables are quite strongly related to one another, though some pairs are more strongly related than others.

Gamma is only one of several measures of association appropriate to ordinal variables. Again, any introductory statistics textbook will give you a more comprehensive treatment of this subject.

Interval or Ratio Variables

If interval or ratio variables (for example, age, income, grade point average, and so forth) are being associated, one appropriate measure of association is Pearson's product-moment correlation (r). The derivation and computation of this measure of association are complex enough to lie outside the scope of this book, so I'll make only a few general comments here.

Like both gamma and lambda, r is based on guessing the value of one variable by knowing another. For continuous interval or ratio variables, however, it's unlikely that you could predict the precise value of the variable. On the other hand, predicting only the ordinal arrangement of values on the two variables would not take advantage of the greater amount of information conveyed by an interval or ratio variable. In a sense, r reflects how closely you can guess the value of one variable through your knowledge of the value of another.

To understand the logic of r, consider the way you might hypothetically guess values that particular cases have on a given variable. With nominal variables, we've seen that you might always guess the modal value. But for interval or ratio data, you would minimize your errors by always guessing the mean value of the variable. Although this practice produces few if any perfect guesses, the extent of your errors will be minimized. Imagine the task of guessing people's incomes and how much better you would do if you knew how many years of education they had as well as the mean incomes for people with 0, 1, 2 (and so forth) years of education.

In the computation of lambda, we noted the number of errors produced by always guessing the modal value. In the case of r, errors are measured in terms of the sum of the squared differences between the actual value and the mean. This sum is called the total variation.

To understand this concept, we must expand the scope of our examination. Let's look at the logic of regression analysis and discuss correlation within that context.

Regression Analysis

The general formula for describing the association between two variables is Y = f(X). This formula is read "Y is a function of X," meaning that values of Y can be explained in terms of variations in the values of X. Stated more strongly, we might say that X causes Y, so the value of X determines the value of

Y. Regression analysis is a method of determining the specific function relating Y to X. There are several forms of regression analysis, depending on the complexity of the relationships being studied. Let's begin with the simplest.

regression analysis  A method of data analysis in which the relationships among variables are represented in the form of an equation, called a regression equation.

linear regression analysis  A form of statistical analysis that seeks the equation for the straight line that best describes the relationship between two ratio variables.

Linear Regression

The regression model can be seen most clearly in the case of a linear regression analysis, in which a perfect linear association between two variables exists or is approximated. Figure 16-1 is a scattergram presenting in graphic form the values of X and Y as produced by a hypothetical study. It shows that for the four cases in our study, the values of X and Y are identical in each instance. The case with a value of 1 on X also has a value of 1 on Y, and so forth. The relationship between the two variables in this instance is described by the equation Y = X; this is called the regression equation. Because all four points lie on a straight line, we could superimpose that line over the points; this is the regression line.

[FIGURE 16-1. Simple Scattergram of Values of X and Y]

The linear regression model has important descriptive uses. The regression line offers a graphic picture of the association between X and Y, and the regression equation is an efficient form for summarizing that association. The regression model has inferential value as well. To the extent that the regression equation correctly describes the general association between the two variables, it may be used to predict other sets of values. If, for example, we know that a new case has a value of 3.5 on X, we can predict the value of 3.5 on Y as well.

In practice, of course, studies are seldom limited to four cases, and the associations between variables are seldom as clear as the one presented in Figure 16-1.

A somewhat more realistic example is presented in Figure 16-2, representing a hypothetical relationship between population and crime rate in small- to medium-size cities. Each dot in the scattergram is a city, and its placement reflects that city's population and its crime rate. As was the case in our previous example, the values of Y (crime rates) generally correspond to those of X (populations), and as values of X increase, so do values of Y. However, the association is not nearly as clear as it is in Figure 16-1.

In Figure 16-2 we can't superimpose a straight line that will pass through all the points in the scattergram. But we can draw an approximate line showing the best possible linear representation of the several points. I've drawn that line on the graph.

You may (or may not) recall from algebra that any straight line on a graph can be represented by an equation of the form Y = a + bX, where X and Y are values of the two variables. In this equation, a equals the value of Y when X is 0, and b represents the slope of the line. If we know the values of a and b, we can calculate an estimate of Y for every value of X.

We can now say more formally that regression analysis is a technique for establishing the regression equation representing the geometric line that comes closest to the distribution of points on a graph. The regression equation provides a

mathematical description of the relationship between the variables, and it allows us to infer values of Y when we have values of X. Recalling Figure 16-2, we could estimate crime rates of cities if we knew their populations.

[FIGURE 16-2. A Scattergram of the Values of Two Variables with Regression Line Added (Hypothetical). Horizontal axis: Population.]

To improve your guessing, you construct a regression line, stated in the form of a regression equation that permits the estimation of values on one variable from values on the other. The general format for this equation is Y' = a + b(X), where a and b are computed values, X is a given value on one variable, and Y' is the estimated value on the other. The values of a and b are computed to minimize the differences between actual values of Y and the corresponding estimates (Y') based on the known value of X. The sum of squared differences between actual and estimated values of Y is called the unexplained variation because it represents errors that still exist even when estimates are based on known values of X.

The explained variation is the difference between the total variation and the unexplained variation. Dividing the explained variation by the total variation produces a measure of the proportionate reduction of error corresponding to the similar quantity in the computation of lambda. In the present case, this quantity is the correlation squared: $r^2$. Thus, if r = 0.7, then $r^2$ = 0.49, meaning that about half the variation has been explained. In practice, we compute r rather than $r^2$, because the product-moment correlation can take either a positive or a negative sign, depending on the direction of the relationship between the two variables. (Computing $r^2$ and taking a square root would always produce a positive quantity.) You can consult any standard statistics textbook for the method of computing r, although I anticipate that most readers using this measure will have access to computer programs designed for this function.

Unfortunately, or perhaps fortunately, social life is so complex that the simple linear regression model often does not sufficiently represent the state of affairs. As we saw in Chapter 14, it's possible, using percentage tables, to analyze more than two variables. As the number of variables increases, such tables become increasingly complicated and hard to read. The regression model offers a useful alternative in such cases.
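The relationship among total, unexplained, and explained variation can be traced with a small Python sketch. It assumes an ordinary least-squares fit for a and b (the text says only that they are computed to minimize the differences between actual and estimated values), and the five data points are invented for the illustration; the function names are likewise hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares estimates of a and b in Y' = a + b(X)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def variation_summary(xs, ys):
    """Return total, unexplained, and explained variation plus r squared and r."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Errors made by always guessing the mean of Y.
    total = sum((y - mean_y) ** 2 for y in ys)
    # Errors that remain when guessing Y' = a + b(X).
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    explained = total - unexplained
    r_squared = max(explained / total, 0.0)
    # r takes the sign of the slope, which the squared value loses.
    r = math.copysign(math.sqrt(r_squared), b)
    return total, unexplained, explained, r_squared, r

# Invented observations, not data from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
total, unexplained, explained, r_squared, r = variation_summary(xs, ys)
print(f"total={total:.2f} unexplained={unexplained:.2f} "
      f"explained={explained:.2f} r^2={r_squared:.2f} r={r:.2f}")
```

Reversing the order of `ys` would leave $r^2$ unchanged but flip the sign of r, which is the reason the text gives for reporting r rather than its square.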

Multiple Regression

Very often, social researchers find that a given dependent variable is affected simultaneously by several independent variables. Multiple regression analysis provides a means of analyzing such situations. This was the case when Beverly Yerg (1981) set about studying teacher effectiveness in physical education. She stated her expectations in the form of a multiple regression equation:

$$F = b_0 + b_1 I + b_2 X_1 + b_3 X_2 + b_4 X_3 + b_5 X_4 + e$$

where
F = Final pupil-performance score
I = Initial pupil-performance score
X1 = Composite of guiding and supporting practice
X2 = Composite of teacher mastery of content
X3 = Composite of providing specific, task-related feedback
X4 = Composite of clear, concise task presentation
b = Regression weight
e = Residual

(Adapted from Yerg 1981: 42)

Notice that in place of the single X variable in a linear regression, there are several X's, and there are also several b's instead of just one. Also, Yerg has chosen to represent a as $b_0$ in this equation but with the same meaning as discussed previously. Finally, the equation ends with a residual factor (e), which represents the variance in Y that is not accounted for by the X variables analyzed.

Beginning with this equation, Yerg calculated the values of the several b's to show the relative contributions of the several independent variables in determining final student-performance scores. She also calculated the multiple-correlation coefficient as an indicator of the extent to which all six variables predict the final scores. This follows the same logic as the simple bivariate correlation discussed earlier, and it's traditionally reported as a capital R. In this case, R = 0.877, meaning that 77 percent of the variance ($0.877^2 = 0.77$) in final scores is explained by the six variables acting in concert.

multiple regression analysis  A form of statistical analysis that seeks the equation representing the impact of two or more independent variables on a single dependent variable.

partial regression analysis  A form of regression analysis in which the effects of one or more variables are held constant, similar to the logic of the elaboration model.

Partial Regression

In exploring the elaboration model in Chapter 15, we paid special attention to the relationship between two variables when a third test variable was held constant. Thus, we might examine the effect of education on prejudice with age held constant, testing the independent effect of education. To do so, we would compute the tabular relationship between education and prejudice separately for each age group.

Partial regression analysis is based on this same logical model. The equation summarizing the relationship between variables is computed on the basis of the test variables remaining constant. As in the case of the elaboration model, the result may then be compared with the uncontrolled relationship between the two variables to clarify further the overall relationship.

Curvilinear Regression

Up to now, we've been discussing the association among variables as represented by a straight line. The regression model is even more general than our discussion thus far has implied.

You may already know that curvilinear functions, as well as linear ones, can be represented by equations. For example, the equation $X^2 + Y^2 = 25$ describes a circle with a radius of 5. Raising variables to powers greater than 1 has the effect of producing curves rather than straight lines. And in the real world there is no reason to assume that the relationship among every set of variables will be

linear. In some cases, then, curvilinear regression analysis can provide a better understanding of empirical relationships than any linear model can.

curvilinear regression analysis  A form of regression analysis that allows relationships among variables to be expressed with curved geometric lines instead of straight ones.

Recall, however, that a regression line serves two functions. It describes a set of empirical observations, and it provides a general model for making inferences about the relationship between two variables in the general population that the observations represent. A very complex equation might produce an erratic line that would indeed pass through every individual point. In this sense, it would perfectly describe the empirical observations. There would be no guarantee, however, that such a line could adequately predict new observations or that it in any meaningful way represented the relationship between the two variables in general. Thus, it would have little or no inferential value.

Earlier in this book, we discussed the need for balancing detail and utility in data reduction. Ultimately, researchers attempt to provide the most faithful, yet also the simplest, representation of their data. This practice also applies to regression analysis. Data should be presented in the simplest fashion that best describes the actual data; as such, linear regressions are the ones most frequently used. Curvilinear regression analysis adds a new option to the researcher in this regard, but it does not solve the problems altogether. Nothing does that.

Cautions in Regression Analysis

The use of regression analysis for statistical inferences is based on the same assumptions made for correlational analysis: simple random sampling, the absence of nonsampling errors, and continuous interval data. Because social scientific research seldom completely satisfies these assumptions, you should use caution in assessing the results in regression analyses.

Also, regression lines, linear or curvilinear, can be useful for interpolation (estimating cases lying between those observed), but they are less trustworthy when used for extrapolation (estimating cases that lie beyond the range of observations). This limitation on extrapolations is important in two ways. First, you're likely to come across regression equations that seem to make illogical predictions. An equation linking population and crimes, for example, might seem to suggest that small towns with, say, a population of 1,000 should produce -123 crimes a year. This failure in predictive ability does not disqualify the equation but dramatizes that its applicability is limited to a particular range of population sizes. Second, researchers sometimes overstep this limitation, drawing inferences that lie outside their range of observation, and you'd be right in criticizing them for that.

The preceding sections have introduced some of the techniques for measuring associations among variables at different levels of measurement. Matters become slightly more complex when the two variables represent different levels of measurement. Though we aren't going to pursue this issue in this textbook, "Measures of Association and Levels of Measurement," by Peter Nardi, may be a useful resource if you ever have to address such situations.

Inferential Statistics

Many, if not most, social scientific research projects involve the examination of data collected from a sample drawn from a larger population. A sample of people may be interviewed in a survey; a sample of divorce records may be coded and analyzed; a sample of newspapers may be examined through content analysis. Researchers seldom if ever study samples just to describe the samples per se; in most instances, their ultimate purpose is to make assertions about the larger population from which the sample has been selected. Frequently, then, you'll wish to interpret your univariate and multivariate sample findings as the basis for inferences about some population.

Measures of Association and Levels of Measurement
by Peter Nardi
Pitzer College

The table itself is set up with the dependent variables in rows and the independent variable in the columns, as tables are commonly organized. Also, notice that the levels of measurement are themselves an ordinal scale. If you want to use an interval/ratio-level variable in a crosstab, you must first recode it into an ordinal-level variable.

                                   Independent Variable
Dependent Variable   Nominal        Ordinal           Interval/Ratio
Nominal              Crosstabs      Crosstabs
                     Chi-square     Chi-square
                     Lambda         Lambda
Ordinal              Crosstabs      Crosstabs
                     Chi-square     Chi-square
                     Lambda         Lambda
                                    Gamma
                                    Kendall's tau
                                    Somers' d
Interval/Ratio       Means          Means             Correlate
                     t-test         t-test            Pearson r
                     ANOVA          ANOVA             Regression (R)

This section examines inferential statistics: the statistical measures used for making inferences from findings based on sample observations to a larger population. We'll begin with univariate data and move to multivariate.

inferential statistics  The body of statistical computations relevant to making inferences from findings based on sample observations to some larger population.

Univariate Inferences

Chapter 14 dealt with methods of presenting univariate data. Each summary measure was intended as a method of describing the sample studied. Now we'll use such measures to make broader assertions about a population. This section addresses two univariate measures: percentages and means.

If 50 percent of a sample of people say they had colds during the past year, 50 percent is also our best estimate of the proportion of colds in the total population from which the sample was drawn. (This estimate assumes a simple random sample, of course.) It's rather unlikely, however, that precisely 50 percent of the population had colds during the year. If a rigorous sampling design for random selection has been followed, however, we'll be able to estimate the expected range of error when the sample finding is applied to the population.
Chapter 7, on sampling theory, covered the procedures for making such estimates, so I'll only

review them here. In the case of a percentage, the quantity

$$\sqrt{\frac{p \times q}{n}}$$

where p is a proportion, q equals (1 - p), and n is the sample size, is called the standard error. As noted in Chapter 7, this quantity is very important in the estimation of sampling error. We may be 68 percent confident that the population figure falls within plus or minus one standard error of the sample figure; we may be 95 percent confident that it falls within plus or minus two standard errors; and we may be 99.9 percent confident that it falls within plus or minus three standard errors.

Any statement of sampling error, then, must contain two essential components: the confidence level (for example, 95 percent) and the confidence interval (for example, ±2.5 percent). If 50 percent of a sample of 1,600 people say they had colds during the year, we might say we're 95 percent confident that the population figure is between 47.5 percent and 52.5 percent.

In this example we've moved beyond simply describing the sample into the realm of making estimates (inferences) about the larger population. In doing so, we must take care in several ways.

First, the sample must be drawn from the population about which inferences are being made. A sample taken from a telephone directory cannot legitimately be the basis for statistical inferences about the population of a city, but only about the population of telephone subscribers with listed numbers.

Second, the inferential statistics assume several things. To begin with, they assume simple random sampling, which is virtually never the case in sample surveys. The statistics also assume sampling with replacement, which is almost never done, but this is probably not a serious problem. Although systematic sampling is used more frequently than random sampling, it, too, probably presents no serious problem if done correctly. Stratified sampling, because it improves representativeness, clearly presents no problem. Cluster sampling does present a problem, however, because the estimates of sampling error may be too small. Quite clearly, street-corner sampling does not warrant the use of inferential statistics. Finally, the calculation of standard error in sampling assumes a 100 percent completion rate, that is, that everyone in the sample completed the survey. This problem increases in seriousness as the completion rate decreases.

Third, inferential statistics are addressed to sampling error only, not nonsampling error such as coding errors or misunderstandings of questions by respondents. Thus, although we might state correctly that between 47.5 and 52.5 percent of the population (95 percent confidence) would report having colds during the previous year, we couldn't so confidently guess the percentage who had actually had them. Because nonsampling errors are probably larger than sampling errors in a respectable sample design, we need to be especially cautious in generalizing from our sample findings to the population.

nonsampling error  Those imperfections of data quality that are a result of factors other than sampling error. Examples include misunderstandings of questions by respondents, erroneous recordings by interviewers and coders, and keypunch errors.

statistical significance  A general term referring to the likelihood that relationships observed in a sample could be attributed to sampling error alone.

Tests of Statistical Significance

There is no scientific answer to the question of whether a given association between two variables is significant, strong, important, interesting, or worth reporting. Perhaps the ultimate test of significance rests in your ability to persuade your audience (present and future) of the association's significance. At the same time, there is a body of inferential statistics to assist you in this regard called parametric tests of significance. As the name suggests, parametric statistics are those that make certain assumptions about the parameters describing the population from which the sample is selected. They allow us to determine the statistical significance of associations. "Statistical significance" does not imply "importance" or "significance" in any general sense. It refers simply to the likelihood that relationships

observed in a sample could be attributed to sampling error alone. Researchers often distinguish between statistical significance and substantive significance in this regard, with the latter referring to whether the relationship between variables is big enough to make a meaningful difference. Whereas statistical significance can be calculated, substantive significance is always a judgment call.

Although tests of statistical significance are widely reported in social scientific literature, the logic underlying them is rather subtle and often misunderstood. Tests of significance are based on the same sampling logic discussed elsewhere in this book. To understand that logic, let's return for a moment to the concept of sampling error in regard to univariate data.

Recall that a sample statistic normally provides the best single estimate of the corresponding population parameter, but the statistic and the parameter seldom correspond precisely. Thus, we report the probability that the parameter falls within a certain range (confidence interval). The degree of uncertainty within that range is due to normal sampling error. The corollary of such a statement is, of course, that it is improbable that the parameter would fall outside the specified range only as a result of sampling error. Thus, if we estimate that a parameter (99.9 percent confidence) lies between 45 percent and 55 percent, we say by implication that it is extremely improbable that the parameter is actually, say, 90 percent if our only error of estimation is due to normal sampling. This is the basic logic behind tests of statistical significance.

tests of statistical significance  A class of statistical computations that indicate the likelihood that the relationship observed between variables in a sample can be attributed to sampling error only.

The Logic of Statistical Significance

I think I can illustrate the logic of statistical significance best in a series of diagrams representing the selection of samples from a population. Here are the elements in the logic:

1. Assumptions regarding the independence of two variables in the population study

2. Assumptions regarding the representativeness of samples selected through conventional probability-sampling procedures

3. The observed joint distribution of sample elements in terms of the two variables

Figure 16-3 represents a hypothetical population of 256 people; half are women, half are men. The diagram also indicates how each person feels about seeing women as equal to men. In the diagram, those favoring equality have open circles, those opposing it have their circles filled in.

The question we'll be investigating is whether there is any relationship between gender and feelings about equality for men and women. More specifically, we'll see if women are more likely to favor equality than men are, because women would presumably benefit more from it. Take a moment to look at Figure 16-3 and see what the answer to this question is.

The illustration in the figure indicates no relationship between gender and attitudes about equality. Exactly half of each group favors equality and half opposes it. Recall the earlier discussion of proportionate reduction of error. In this instance, knowing a person's gender would not reduce the "errors" we'd make in guessing his or her attitude toward equality. The table at the bottom of Figure 16-3 provides a tabular view of what you can observe in the graphic diagram.

Figure 16-4 represents the selection of a one-fourth sample from the hypothetical population. In terms of the graphic illustration, a "square" selection from the center of the population provides a representative sample. Notice that our sample contains 16 of each type of person: Half are men and half are women; half of each gender favors equality, and the other half opposes it.

The sample selected in Figure 16-4 would allow us to draw accurate conclusions about the relationship between gender and equality in the larger population. Following the sampling logic we saw in Chapter 7, we'd note there was no relationship between gender and equality in the sample; thus, we'd conclude there was similarly no relationship

in the larger population, because we've presumably selected a sample in accord with the conventional rules of sampling.

[FIGURE 16-3. A Hypothetical Population of Men and Women Who Either Favor or Oppose Sexual Equality. Legend: woman who favors equality; man who favors equality; woman who opposes equality; man who opposes equality.]

Of course, real-life samples are seldom such perfect reflections of the populations from which they are drawn. It would not be unusual for us to have selected, say, one or two extra men who opposed equality and a couple of extra women who favored it, even if there was no relationship between the two variables in the population. Such minor variations are part and parcel of probability sampling, as we saw in Chapter 7.

Figure 16-5, however, represents a sample that falls far short of the mark in reflecting the larger population. Notice it includes far too many supportive women and opposing men. As the table shows, three-fourths of the women in the sample support equality, but only one-fourth of the men do so. If we had selected this sample from a population in which the two variables were unrelated to each other, we'd be sorely misled by our sample.

As you'll recall, it's unlikely that a properly drawn probability sample would ever be as inaccurate as the one shown in Figure 16-5. In fact, if we actually selected a sample that gave us the results this one does, we'd look for a different explanation. Figure 16-6 illustrates the more likely situation.

Notice that the sample selected in Figure 16-6 also shows a strong relationship between gender and equality. The reason is quite different this time. We've selected a perfectly representative sample, but we see that there is actually a strong relationship between the two variables in the population at large. In this latest figure, women are more likely to support equality than men are: That's the case in the population, and the sample reflects it.

In practice, of course, we never know what's so for the total population; that's why we select

samples. So if we selected a sample and found the strong relationship presented in Figures 16-5 and 16-6, we'd need to decide whether that finding accurately reflected the population or was simply a product of sampling error.

[FIGURE 16-4: A Representative Sample]

[FIGURE 16-5: An Unrepresentative Sample]

The fundamental logic of tests of statistical significance, then, is this: Faced with any discrepancy between the assumed independence of variables in a population and the observed distribution of sample elements, we may explain that discrepancy in either of two ways: (1) we may attribute it to an unrepresentative sample, or (2) we may reject the assumption of independence. The logic and statistics associated with probability sampling methods offer guidance about the varying probabilities of varying degrees of unrepresentativeness (expressed as sampling error). Most simply put, there is a high probability of a small degree of unrepresentativeness and a low probability of a large degree of unrepresentativeness.

The statistical significance of a relationship observed in a set of sample data, then, is always expressed in terms of probabilities. "Significant at the .05 level (p ≤ .05)" simply means that the probability that a relationship as strong as the observed one can be attributed to sampling error alone is no more than 5 in 100. Put somewhat differently, if two variables are independent of each other in the population, and if 100 probability samples are selected from that population, no more than 5 of those samples should provide a relationship as strong as the one that has been observed.

There is, then, a corollary to confidence intervals in tests of significance, which represents the probability of the measured associations being due only to sampling error. This is called the level of significance. Like confidence intervals, levels of significance are derived from a logical model in which several samples are drawn from a given population. In the present case, we assume that there is no association between the variables in the population, and then we ask what proportion of the samples drawn from that population would produce associations at least as great as those measured in the empirical data. Three levels of significance are frequently used in research reports: .05, .01, and .001. These mean, respectively, that the chances of obtaining the measured association as a result of sampling error are 5/100, 1/100, and 1/1,000.

level of significance: In the context of tests of statistical significance, the degree of likelihood that an observed, empirical relationship could be attributable to sampling error. A relationship is significant at the .05 level if the likelihood of its being only a function of sampling error is no greater than 5 out of 100.

Researchers who use tests of significance normally follow one of two patterns. Some specify in advance the level of significance they'll regard as sufficient. If any measured association is statistically significant at that level, they'll regard it as representing a genuine association between the two variables. In other words, they're willing to discount the possibility of its resulting from sampling error only.
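The logical model behind levels of significance can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from the text: the group size of 16 women and 16 men, the 50 percent rate of support, the number of simulated samples, and the function name are all hypothetical choices. We assume a population in which gender and attitude are unrelated, draw many samples from it, and ask what share of those samples show a gap between women and men at least as large as a given one.

```python
import random


def share_of_samples_at_least(observed_gap, n_per_group=16, p_favor=0.5,
                              n_samples=10_000, seed=0):
    """Share of simulated samples whose gender gap is >= observed_gap.

    Assumption: in the population there is NO relationship between gender
    and attitude, so women and men each favor equality with the same
    probability p_favor. Any gap in a sample is sampling error.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_samples):
        women_favor = sum(rng.random() < p_favor for _ in range(n_per_group))
        men_favor = sum(rng.random() < p_favor for _ in range(n_per_group))
        gap = abs(women_favor - men_favor) / n_per_group
        if gap >= observed_gap:
            hits += 1
    return hits / n_samples


# A small gap (2 cases out of 16) versus a large one (three-fourths
# of the women and one-fourth of the men, a gap of .50).
small = share_of_samples_at_least(0.125)
large = share_of_samples_at_least(0.50)
print(f"gap of .125 or more: {small:.3f} of samples")
print(f"gap of .50 or more:  {large:.3f} of samples")
```

Under these assumptions, small gaps turn up in a large share of the samples, whereas a gap of .50 turns up in far fewer than 5 of every 100. That is the sense in which a large degree of unrepresentativeness has a low probability.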

[FIGURE 16-6: A Representative Sample from a Population in Which the Variables Are Related]

Other researchers prefer to report the specific level of significance for each association, disregarding the conventions of .05, .01, and .001. Rather than reporting that a given association is significant at the .05 level, they might report significance at the .023 level, indicating the chances of its having resulted from sampling error as 23 out of 1,000.

Chi Square

Chi square (χ²) is a frequently used test of significance in social science. It's based on the null hypothesis: the assumption that there is no relationship between two variables in the total population (as you may recall from Chapter 2). Given the observed distribution of values on the two separate variables, we compute the conjoint distribution that would be expected if there were no relationship between the two variables. The result of this operation is a set of expected frequencies for all the cells in the contingency table. We then compare this expected distribution with the distribution of cases actually found in the sample data, and we determine the probability that the discovered discrepancy could have resulted from sampling error alone. An example will illustrate this procedure.

Let's assume we're interested in the possible relationship between church attendance and gender for the members of a particular church. To test this relationship, we select a sample of 100 church members at random. We find that our sample is made up of 40 men and 60 women and that 70 percent of our sample say they attended church during the preceding week, whereas the remaining 30 percent say they did not.

If there is no relationship between gender and church attendance, then 70 percent of the men in the sample should have attended church during

the preceding week, and 30 percent should have stayed away. Moreover, women should have attended in the same proportion. Table 16-6 (part I) shows that, based on this model, 28 men and 42 women would have attended church, with 12 men and 18 women not attending.

TABLE 16-6
A Hypothetical Illustration of Chi Square

I. Expected Cell Frequencies          Men    Women    Total
   Attended church                     28       42       70
   Did not attend church               12       18       30
   Total                               40       60      100

II. Observed Cell Frequencies         Men    Women    Total
   Attended church                     20       50       70
   Did not attend church               20       10       30
   Total                               40       60      100

III. (Observed - Expected)² ÷ Expected    Men    Women
   Attended church                       2.29     1.52
   Did not attend church                 5.33     3.56

   χ² = 2.29 + 1.52 + 5.33 + 3.56 = 12.70,  p < .001

Part II of Table 16-6 presents the observed attendance for the hypothetical sample of 100 church members. Note that 20 of the men report having attended church during the preceding week, and the remaining 20 say they did not. Among the women in the sample, 50 attended church and 10 did not. Comparing the expected and observed frequencies (parts I and II), we note that somewhat fewer men attended church than expected, whereas somewhat more women attended than expected.

Chi square is computed as follows. For each cell in the tables, the researcher (1) subtracts the expected frequency for that cell from the observed frequency, (2) squares this quantity, and (3) divides the squared difference by the expected frequency. This procedure is carried out for each cell in the tables; Part III of Table 16-6 presents the cell-by-cell computations. The several results are then added together to find the value of chi square: 12.70 in the example.

This value is the overall discrepancy between the observed conjoint distribution in the sample and the distribution we would expect if the two variables were unrelated to each other. Of course, the mere discovery of a discrepancy does not prove that the two variables are related, because normal sampling error might produce discrepancies even when there is no relationship in the total population. The magnitude of the value of chi square, however, permits us to estimate the probability of that having happened.

Degrees of Freedom

To determine the statistical significance of the observed relationship, we must use a standard set of chi square values. This will require the computation of the degrees of freedom, which refer to the possibilities for variation within a statistical model. Suppose I challenge you to find three numbers whose mean is 11. There is an infinite number of solutions to this problem: (11, 11, 11), (10, 11, 12), (-11, 11, 33), and so on. Now, suppose I require that one of the numbers be 7. There would still be an infinite number of possibilities for the other two numbers.

If I told you one number had to be 7 and another 10, however, there would be only one

possible value for the third. If the average of three numbers is 11, their sum must be 33. If two of the numbers total 17, the third must be 16. In this situation, we say there are two degrees of freedom. Two of the numbers could have any values we choose, but once they are specified, the third number is determined.

More generally, whenever we're examining the mean of N values, we can see that the degrees of freedom equal N - 1. Thus, in the case of the mean of 23 values, we could make 22 of them anything we liked, but the 23rd would then be determined.

A similar logic applies to bivariate tables, such as those analyzed by chi square. Consider a table reporting the relationship between two dichotomous variables: gender (men/women) and abortion attitude (approve/disapprove). Notice that the table provides the marginal frequencies of both variables.

Abortion Attitude      Men    Women    Total
Approve                                  500
Disapprove                               500
Total                  500      500    1,000

Despite the conveniently round numbers in this hypothetical example, notice that there are numerous possibilities for the cell frequencies. For example, it could be the case that all 500 men approve and all 500 women disapprove, or it could be just the reverse. Or there could be 250 cases in each cell. Notice there are numerous other possibilities.

Now the question is, How many cells could we fill in pretty much as we choose before the remainder are determined by the marginal frequencies? The answer is only one. If we know that 300 men approved, for example, then 200 men would have had to disapprove, and the distribution would need to be just the opposite for the women.

In this instance, then, we say the table has one degree of freedom. Now, take a few minutes to construct a three-by-three table. Assume you know the marginal frequencies for each variable, and see if you can determine how many degrees of freedom it has.

For chi square, the degrees of freedom are computed as follows: the number of rows in the table of observed frequencies, minus 1, is multiplied by the number of columns, minus 1. This may be written as (r - 1)(c - 1). For a three-by-three table, then, there are four degrees of freedom: (3 - 1)(3 - 1) = (2)(2) = 4.

In the example of gender and church attendance, we have two rows and two columns (discounting the totals), so there is one degree of freedom. Turning to a table of chi square values (see Appendix D), we find that for one degree of freedom and random sampling from a population in which there is no relationship between two variables, 10 percent of the time we should expect a chi square of at least 2.7. Thus, if we selected 100 samples from such a population, we should expect about 10 of those samples to produce chi squares equal to or greater than 2.7. Moreover, we should expect chi square values of at least 6.6 in only 1 percent of the samples and chi square values of 10.8 in only one tenth of a percent (.001) of the samples. The higher the chi square value, the less probable it is that the value could be attributed to sampling error alone.

In our example, the computed value of chi square is 12.70. If there were no relationship between gender and church attendance in the church member population and a large number of samples had been selected and studied, then we would expect a chi square of this magnitude in fewer than 1/10 of 1 percent (.001) of those samples. Thus, the probability of obtaining a chi square of this magnitude is less than .001, if random sampling has been used and there is no relationship in the population. We report this finding by saying the relationship is statistically significant at the .001 level. Because it is so improbable that the observed relationship could have resulted from sampling error alone, we're likely to reject the null hypothesis and assume that there is a relationship between the two variables in the population of church members.

Most measures of association can be tested for statistical significance in a similar manner. Standard tables of values permit us to determine whether a given association is statistically significant and at

what level. Any standard statistics textbook provides instructions on the use of such tables.

Some Words of Caution

Tests of significance provide an objective yardstick that we can use to estimate the statistical significance of associations between variables. They help us rule out associations that may not represent genuine relationships in the population under study. However, the researcher who uses or reads reports of significance tests should remain wary of several dangers in their interpretation.

First, we've been discussing tests of statistical significance; there are no objective tests of substantive significance. Thus, we may be legitimately convinced that a given association is not due to sampling error, but we may be in the position of asserting without fear of contradiction that two variables are only slightly related to each other. Recall that sampling error is an inverse function of sample size: the larger the sample, the smaller the expected error. Thus, a correlation of, say, .1 might very well be significant (at a given level) if discovered in a large sample, whereas the same correlation between the same two variables would not be significant if found in a smaller sample. This makes perfectly good sense given the basic logic of tests of significance: In the larger sample, there is less chance that the correlation could be simply the product of sampling error. In both samples, however, it might represent an essentially zero correlation.

The distinction between statistical and substantive significance is perhaps best illustrated by those cases where there is absolute certainty that observed differences cannot be a result of sampling error. This would be the case when we observe an entire population. Suppose we were able to learn the ages of every public official in the United States and of every public official in Russia. For argument's sake, let's assume further that the average age of U.S. officials was 45 years old compared with, say, 46 for the Russian officials. Because we would have the ages of all officials, there would be no question of sampling error. We would know with certainty that the Russian officials were older than their U.S. counterparts. At the same time, we would say that the difference was of no substantive significance. We'd conclude, in fact, that they were essentially the same age.

Second, lest you be misled by this hypothetical example, realize that statistical significance should not be calculated on relationships observed in data collected from whole populations. Remember, tests of statistical significance measure the likelihood of relationships between variables being only a product of sampling error; if there's no sampling, there's no sampling error.

Third, tests of significance are based on the same sampling assumptions we used in computing confidence intervals. To the extent that these assumptions are not met by the actual sampling design, the tests of significance are not strictly legitimate.

We've examined statistical significance here in the form of chi square, but social scientists commonly use several other measures as well. Analysis of variance and t-tests are two examples you may run across in your studies.

As is the case for most matters covered in this book, I have a personal prejudice. In this instance, it's against tests of significance. I don't object to the statistical logic of those tests, because the logic is sound. Rather, I'm concerned that such tests seem to mislead more than they enlighten. My principal reservations are the following:

1. Tests of significance make sampling assumptions that are virtually never satisfied by actual sampling designs.

2. They depend on the absence of nonsampling errors, a questionable assumption in most actual empirical measurements.

3. In practice, they are too often applied to measures of association that have been computed in violation of the assumptions made by those measures (for example, product-moment correlations computed from ordinal data).

4. Statistical significance is too easily misinterpreted as "strength of association," or substantive significance.

These concerns are underscored by a study (Sterling, Rosenbaum, and Weinkam 1995) examining the publication policies of nine psychology and three medical journals. As the researchers discovered, the journals were quite unlikely to publish articles that did not report statistically significant correlations among variables. They quote the following from a rejection letter:

Unfortunately, we are not able to publish this manuscript. The manuscript is very well written and the study was well documented. Unfortunately, the negative results translate into a minimal contribution to the field. We encourage you to continue your work in this area and we will be glad to consider additional manuscripts that you may prepare in the future.

(Sterling et al. 1995: 109)

Let's suppose a researcher conducts a scientifically excellent study to determine whether X causes Y. The results indicate no statistically significant correlation. That's good to know. If we're interested in what causes cancer, war, or juvenile delinquency, it's good to know that a possible cause actually does not cause it. That knowledge would free researchers to look elsewhere for causes.

As we've seen, however, journals might very well reject such a study. Other researchers would likely continue testing whether X causes Y, not knowing that previous studies found no causal relationship. This would produce many wasted studies, none of which would see publication and draw a close to the analysis of X as a cause of Y.

From what you've learned about probabilities, however, you can understand that if enough studies are conducted, one will eventually measure a statistically significant correlation between X and Y. If there is absolutely no relationship between the two variables, we would expect a correlation significant at the .05 level five times out of a hundred, because that's what the .05 level of significance means. If a hundred studies were conducted, therefore, we could expect five to suggest a causal relationship where there was actually none, and those five studies would be published!

There are, then, serious problems inherent in too much reliance on tests of statistical significance. At the same time (perhaps paradoxically) I would suggest that tests of significance can be a valuable asset to the researcher: useful tools for understanding data. Although many of my comments suggest an extremely conservative approach to tests of significance (that you should use them only when all assumptions are met), my general perspective is just the reverse.

I encourage you to use any statistical technique, any measure of association or test of significance, if it will help you understand your data. If the computation of product-moment correlations among nominal variables and the testing of statistical significance in the context of uncontrolled sampling will meet this criterion, then I encourage such activities. I say this in the spirit of what Hanan Selvin, another pioneer in developing the elaboration model, referred to as "data-dredging techniques." Anything goes, if it leads ultimately to the understanding of data and of the social world under study.

The price of this radical freedom, however, is the giving up of strict, statistical interpretations. You will not be able to base the ultimate importance of your finding solely on a significant correlation at the .05 level. Whatever the avenue of discovery, empirical data must ultimately be presented in a legitimate manner, and their importance must be argued logically.

Other Multivariate Techniques

For the most part, this book has focused on rather rudimentary forms of data manipulation, such as the use of contingency tables and percentages. The elaboration model of analysis was presented in this form, as well as many of the examples of data analysis throughout the book.

This section of the chapter presents a cook's tour of several other multivariate techniques from the logical perspective of elaborating the relationships among social variables. This discussion is intended not to teach you how to use these techniques but rather to present sufficient information so that you can understand them if you run across them in a research report. The methods of analysis that we'll examine (path analysis, time-series analysis, factor analysis, analysis of variance, discriminant analysis, log-linear models, and Geographic Information Systems) are only a few of

the many multivariate techniques used by social scientists.

Path Analysis

Path analysis is a causal model for understanding relationships between variables. Though based on regression analysis, it can provide a more useful graphic picture of relationships among several variables than other means can. Path analysis assumes that the values of one variable are caused by the values of another, so distinguishing independent and dependent variables is essential. This requirement is not unique to path analysis, of course, but path analysis provides a unique way of displaying explanatory results for interpretation.

Recall for a moment one of the ways I represented the elaboration model in Chapter 15 (Figure 15-1). Here's how we might diagram the logic of interpretation:

Independent variable → Intervening variable → Dependent variable

The logic of this presentation is that an independent variable has an impact on an intervening variable, which in turn has an impact on a dependent variable. The path analyst constructs similar patterns of relationships among variables, but the typical path diagram contains many more variables than shown in this diagram.

Besides diagramming a network of relationships among variables, path analysis also shows the strengths of those several relationships. The strengths of relationships are calculated from a regression analysis that produces numbers analogous to the partial relationships in the elaboration model. These path coefficients, as they're called, represent the strengths of the relationships between pairs of variables, with the effects of all other variables in the model held constant.

The analysis in Figure 16-7, for example, focuses on the religious causes of anti-Semitism among Christian church members. The variables in the diagram are, from left to right, (1) orthodoxy, or the extent to which the subjects accept conventional beliefs about God, Jesus, biblical miracles, and so forth; (2) particularism, the belief that one's religion is the "only true faith"; (3) acceptance of the view that the Jews crucified Jesus; (4) religious hostility toward contemporary Jews, such as believing that God is punishing them or that they will suffer damnation unless they convert to Christianity; and (5) secular anti-Semitism, such as believing that Jews cheat in business, are disloyal to their country, and so forth.

To start with, the researchers who conducted this analysis proposed that secular anti-Semitism was produced by moving through the five variables: Orthodoxy caused particularism, which caused the view of the historical Jews as crucifiers, which caused religious hostility toward contemporary Jews, which resulted, finally, in secular anti-Semitism.

The path diagram tells a different story. The researchers found, for example, that belief in the historical role of Jews as the crucifiers of Jesus doesn't seem to matter in the process that generates anti-Semitism. And, although particularism is a part of one process resulting in secular anti-Semitism, the diagram also shows that anti-Semitism is created more directly by orthodoxy and religious hostility. Orthodoxy produces religious hostility even without particularism, and religious hostility generates secular hostility in any event.

One last comment on path analysis is in order. Although it's an excellent way of handling complex causal chains and networks of variables, path analysis itself does not tell the causal order of the variables. Nor was the path diagram generated by computer. The researcher decided the structure of relationships among the variables and used computer analysis merely to calculate the path coefficients that apply to such a structure.

path analysis: A form of multivariate analysis in which the causal relationships among variables are presented in a graphical format.

Time-Series Analysis

The various forms of regression analysis are often used to examine time-series data, representing changes in one or more variables over time. As I'm

sure you know, U.S. crime rates have generally increased over the years. A time-series analysis of crime rates could express the long-term trend in a regression format and provide a way of testing explanations for the trend (such as population growth or economic fluctuations) and could permit forecasting of future crime rates.

[FIGURE 16-7: Diagramming the Religious Sources of Anti-Semitism. The path diagram links Orthodoxy, Historical Jews as "Crucifiers," Religious Hostility, and Secular Anti-Semitism; the path coefficients are not recoverable here. Source: Rodney Stark, Bruce D. Foster, Charles Y. Glock, and Harold E. Quinley, Wayward Shepherds: Prejudice and the Protestant Clergy. Copyright © 1971 by Anti-Defamation League of B'nai Brith. Reprinted by permission of Harper & Row, Publishers, Inc.]

time-series analysis: An analysis of changes in a variable (e.g., crime rates) over time.

In a simple illustration, Figure 16-8 graphs the larceny rates of a hypothetical city over time. Each dot on the graph represents the number of larcenies reported to police during the year indicated.

Suppose we feel that larceny is partly a function of overpopulation. You might reason that crowding would lead to psychological stress and frustration, resulting in increased crimes of many sorts. Recalling the discussion of regression analysis, we could create a regression equation representing the relationship between larceny and population density, using the actual figures for each variable, with years as the units of analysis. Having created the best-fitting regression equation, we could then calculate a larceny rate for each year, based on that

year's population density rate. For the sake of simplicity, let's assume that the city's population size (and hence density) has been steadily increasing. This would lead us to predict a steadily increasing larceny rate as well. These regression estimates are represented by the dashed regression line in Figure 16-8.

[FIGURE 16-8: The Larceny Rates over Time in a Hypothetical City. The graph plots the actual larceny rate for each year from 1950 to 1990, together with a dashed regression line based on population density.]

Time-series relationships are often more complex than this simple illustration suggests. For one thing, there can be more than one causal variable. For example, we might find that unemployment rates also had a powerful impact on larceny. We might develop an equation to predict larceny on the basis of both of these causal variables. As a result, the predictions might not fall along a simple, straight line. Whereas population density was increasing steadily in the first model, unemployment rates rise and fall. As a consequence, our predictions of the larceny rate would similarly go up and down.

Pursuing the relationship between larceny and unemployment rates, we might reason that people do not begin stealing as soon as they become unemployed. Typically, they might first exhaust their savings, borrow from friends, and keep hoping for work. Larceny would be a last resort.

Time-lagged regression analysis could be used to address this more complex case. Thus, we might create a regression equation that predicted a given year's larceny rate based, in part, on the previous year's unemployment rate or perhaps on an average of the two years' unemployment rates. The possibilities are endless.

If you think about it, a great many causal relationships are likely to involve a time lag. Historically, many of the world's poor countries have maintained their populations by matching high death rates with equally high birthrates. It has been observed repeatedly, moreover, that when a society's death rate is drastically reduced (through improved medical care, public sanitation, and improved agriculture, for example), that society's birthrate drops sometime later on, but with an intervening period of rapid population growth. Or, to take a very different example, a crackdown on speeding on a state's highways would likely reduce the average speed of cars. Again, however, the causal relationship would undoubtedly involve a time lag (days, weeks, or months, perhaps) as motorists began to realize the seriousness of the crackdown.

In all such cases, the regression equations generated might take many forms. In any event, the criterion for judging success or failure is the extent to which the researcher can account for the actual values observed for the dependent variable.

474 Chapter 16: Statistical Analyses

Factor Analysis

Factor analysis is a unique approach to multivariate analysis. Its statistical basis is complex enough and different enough from the foregoing discussions to suggest a general discussion here.

Factor analysis is a complex algebraic method used to discover patterns among the variations in values of several variables. This is done essentially through the generation of artificial dimensions (factors) that correlate highly with several of the real variables and that are independent of one another. A computer must be used to perform this complex operation.

Let's suppose that a data file contains several indicators of subjects' prejudice. Each item should provide some indication of prejudice, but none will give a perfect indication. All of these items, moreover, should be highly intercorrelated empirically. In a factor analysis of the data, the researcher would create an artificial dimension that would be highly correlated with each of the items measuring prejudice. Each subject would essentially receive a value on that artificial dimension, and the value assigned would be a good predictor of the observed attributes on each item.

Suppose now that the same study provided several indicators of subjects' mathematical ability. It's likely that the factor analysis would also generate an artificial dimension highly correlated with each of those items.

The output of a factor analysis program consists of columns representing the several factors (artificial dimensions) generated from the observed relations among variables plus the correlations between each variable and each factor, called the factor loadings.

In the preceding example, it's likely that one factor would more or less represent prejudice, and another would more or less represent mathematical ability. Data items measuring prejudice would have high loadings on (correlations with) the prejudice factor and low loadings on the mathematical ability factor. Data items measuring mathematical ability would have just the opposite pattern.

In practice, factor analysis does not proceed in this fashion. Rather, the variables are input to the program, and the program outputs a series of factors with appropriate factor loadings. The analyst must then determine the meaning of a given factor on the basis of those variables that load highly on it. The program's generation of factors, however, has no reference to the meaning of variables, only to their empirical associations. Two criteria are taken into account: (1) a factor must explain a relatively large portion of the variance found in the study variables, and (2) every factor must be more or less independent of every other factor.

Here's an example of the use of factor analysis. Many social researchers have studied the problem of delinquency. If you look deeply into the problem, however, you'll discover that there are many different types of delinquents. In a survey of high school students in a small Wyoming town, Morris Forslund (1980) set out to create a typology of delinquency. His questionnaire asked students to report whether they had committed a variety of delinquent acts. He then submitted their responses to factor analysis. The results are shown in Table 16-7.

As you can see in this table, the various delinquent acts are listed on the left. The numbers shown in the body of the table are the factor loadings on the four factors constructed in the analysis. You'll notice that after examining the dimensions, or factors, Forslund labeled them. I've bracketed the items on each factor that led to his choice of labels. Forslund summarized the results as follows:

For the total sample four fairly distinct patterns of delinquent acts are apparent. In order of variance explained, they have been labeled: 1) Property Offenses, including both vandalism and theft; 2) Incorrigibility; 3) Drugs/Truancy; and 4) Fighting. It is interesting, and perhaps surprising, to find both vandalism and theft appear together in the same factor. It would seem that those high school students who engage in property offenses tend to be involved in both

factor analysis: A complex algebraic method for determining the general dimensions or factors that exist within a set of concrete observations.
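To make the ideas of factors, factor loadings, and variance explained more concrete, here is a minimal Python sketch. It uses principal-component extraction by power iteration on the correlation matrix, which is only one of several extraction methods and not necessarily what any particular statistics program does. The item names and scores are hypothetical, constructed so that the two prejudice items correlate 0.9 with each other, the two math items correlate 0.8, and the two sets are uncorrelated.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between every pair of item columns."""
    centered = []
    for col in columns:
        mean = sum(col) / len(col)
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(d * d for d in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def top_eigenpair(matrix, iterations=1000):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k  # assumes this start is not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(k) for j in range(k))
    return eigenvalue, v


def extract_factors(columns, n_factors):
    """Return (loadings, share of total variance) for each extracted factor."""
    r = correlation_matrix(columns)
    k = len(r)
    factors = []
    for _ in range(n_factors):
        eigenvalue, v = top_eigenpair(r)
        # Loading = correlation between an item and the factor.
        loadings = [math.sqrt(eigenvalue) * x for x in v]
        factors.append((loadings, eigenvalue / k))
        # Deflate, so the next factor is independent of those already extracted.
        r = [[r[i][j] - eigenvalue * v[i] * v[j] for j in range(k)] for i in range(k)]
    return factors


# Hypothetical scores for eight subjects on four items (not real data).
items = {
    "prejudice_1": [9, 7, 9, 7, 3, 1, 3, 1],
    "prejudice_2": [9, 7, 7, 9, 3, 1, 1, 3],
    "math_1": [8, 8, 2, 2, 6, 6, 4, 4],
    "math_2": [8, 6, 4, 2, 6, 8, 2, 4],
}

factors = extract_factors(list(items.values()), n_factors=2)
for number, (loadings, share) in enumerate(factors, start=1):
    print(f"Factor {number}: explains {share:.1%} of the total variance")
    for name, loading in zip(items, loadings):
        print(f"  {name:12s} {abs(loading):.2f}")
```

In this constructed case the first factor loads at about 0.97 on the two prejudice items and essentially zero on the math items, and the second factor shows the opposite pattern. The program has no notion of "prejudice" or "mathematical ability"; attaching those labels remains the analyst's job. Real analyses typically also rotate the factors, which this sketch omits.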

