
The Practice of Social Research by Earl R. Babbie (z-lib.org)

Published by Mr.Phi's e-Library, 2022-01-25 04:20:04


176 ■ Chapter 6: From Concept to Measurement

The Srole scale illustrates another important point. Letting conceptualization and operationalization be open-ended does not necessarily produce anarchy and chaos, as you might expect. Order often emerges. For one thing, although we could define anomia any way we chose (in terms of, say, shoe size), we’re likely to define it in ways not too different from other people’s mental images. If you were to use a really offbeat definition, people would probably ignore you.

A second source of order is that, as researchers discover the utility of a particular conceptualization and operationalization of a concept, they’re likely to adopt it, which leads to standardized definitions of concepts. Besides the Srole scale, examples include IQ tests and a host of demographic and economic measures developed by the U.S. Census Bureau. Using such established measures has two advantages: They have been extensively pretested and debugged, and studies using the same scales can be compared. If you and I do separate studies of two different groups and use the Srole scale, we can compare our two groups on the basis of anomia.

Social scientists, then, can measure anything that’s real; through conceptualization and operationalization, they can even do a pretty good job of measuring things that aren’t. Granting that such concepts as socioeconomic status, prejudice, compassion, and anomia aren’t ultimately real, social scientists can create order in handling them. It is an order based on utility, however, not on ultimate truth.

Definitions in Descriptive and Explanatory Studies

As you’ll recall from Chapter 4, two general purposes of research are description and explanation. The distinction between them has important implications for definition and measurement. If it seems that description is simpler than explanation, you may be surprised to learn that definitions are more problematic for descriptive research than for explanatory research. Before we turn to other aspects of measurement, you’ll need a basic understanding of why this is so (we’ll discuss this point more fully in Part 4).

It’s easy to see the importance of clear and precise definitions for descriptive research. If we want to describe and report the unemployment rate in a city, our definition of being unemployed is obviously critical. That definition will depend on our definition of another term: the labor force. If it seems patently absurd to regard a three-year-old child as being unemployed, it is because such a child is not considered a member of the labor force. Thus, we might follow the U.S. Census Bureau’s convention and exclude all people under 14 years of age from the labor force.

This convention alone, however, would not give us a satisfactory definition, because it would count as unemployed such people as high school students, the retired, the disabled, and homemakers. We might follow the census convention further by defining the labor force as “all persons 14 years of age and over who are employed, looking for work, or waiting to be called back to a job from which they have been laid off or furloughed.” If a student, homemaker, or retired person is not looking for work, such a person would not be included in the labor force. Unemployed people, then, would be those members of the labor force, as defined, who are not employed.

But what does “looking for work” mean? Must a person register with the state employment service or go from door to door asking for employment? Or would it be sufficient to want a job or be open to an offer of employment? Conventionally, “looking for work” is defined operationally as saying yes in response to an interviewer’s asking “Have you been looking for a job during the past seven days?” (Seven days is the period most often specified, but for some research purposes it might make more sense to shorten or lengthen it.)

As you can see, the conclusion of a descriptive study about the unemployment rate depends directly on how each issue of definition is resolved. Increasing the period during which people are counted as looking for work would add more unemployed people to the labor force as defined, thereby increasing the reported unemployment rate. If we follow another convention and speak of

the civilian labor force and the civilian unemployment rate, we’re excluding military personnel; that, too, increases the reported unemployment rate, because military personnel would be employed, by definition. Thus, the descriptive statement that the unemployment rate in a city is 3 percent, or 9 percent, or whatever it might be, depends directly on the operational definitions used.

This example is relatively clear because there are several accepted conventions relating to the labor force and unemployment. Now, consider how difficult it would be to get agreement about the definitions you would need in order to say, “Forty-five percent of the students at this institution are politically conservative.” Like the unemployment rate, this percentage would depend directly on the definition of what is being measured: in this case, political conservatism. A different definition might result in the conclusion “Five percent of the student body are politically conservative.”

What percentage of the population do you suppose is “disabled”? That’s the question Lars Gronvik asked in Sweden. He analyzed several databases that encompassed four different definitions or measures of disability in Swedish society. One study asked people if they had hearing, seeing, walking, or other functional problems. Two other measures were based on whether people received one of two forms of government disability support. Another study asked people whether they believed they were disabled.

The four measures indicated different population totals for those citizens defined as “disabled,” and each measure produced different demographic profiles that included variables such as sex, age, education, living arrangement, and labor-force participation. As you can see, it is impossible to answer a descriptive question such as this without specifying the meaning of terms.

Ironically, definitions are less problematic in the case of explanatory research. Let’s suppose we’re interested in explaining political conservatism. Why are some people conservative and others not? More specifically, let’s suppose we’re interested in whether conservatism increases with age. What if you and I have 25 different operational definitions of conservative, and we can’t agree on which definition is best? As we saw in the discussion of indicators, this is not necessarily an insurmountable obstacle to our research. Suppose we found old people to be more conservative than young people in terms of all 25 definitions. Clearly, the exact definition wouldn’t matter much. We would conclude that old people are generally more conservative than young people, even though we couldn’t agree about exactly what conservative means.

In practice, explanatory research seldom results in findings quite as unambiguous as this example suggests; nonetheless, the general pattern is quite common in actual research. There are consistent patterns of relationships in human social life that result in consistent research findings. However, such consistency does not appear in a descriptive situation. Changing definitions almost inevitably results in different descriptive conclusions. The Tips and Tools feature, “The Importance of Variable Names,” explores this issue in connection with the variable citizen participation.

Operationalization Choices

In discussing conceptualization, I frequently have referred to operationalization, for the two are intimately linked. To recap: Conceptualization is the refinement and specification of abstract concepts, and operationalization is the development of specific research procedures (operations) that will result in empirical observations representing those concepts in the real world.

As with the methods of data collection, social researchers have a variety of choices when operationalizing a concept. Although the several choices are intimately interconnected, I’ve separated them for the sake of discussion. Realize, though, that operationalization does not proceed through a systematic checklist.

Range of Variation

In operationalizing any concept, researchers must be clear about the range of variation that interests them. The question is, to what extent are they willing to combine attributes in fairly gross categories?
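To make the idea of combining attributes into gross categories concrete, here is a minimal Python sketch that collapses exact annual incomes into three categories with an open-ended bottom and top. The cut points ($5,000 and $100,000), the category labels, the sample incomes, and the function name `income_category` are all illustrative assumptions, not a recommended coding scheme.

```python
from collections import Counter

# Illustrative cut points (assumptions): open-ended bottom and top categories.
BOTTOM_CEILING = 5_000     # everything at or below this is lumped together
TOP_FLOOR = 100_000        # everything at or above this is lumped together


def income_category(income: float) -> str:
    """Collapse an exact annual income into one of three gross categories."""
    if income <= BOTTOM_CEILING:
        return "$5,000 or less"
    if income >= TOP_FLOOR:
        return "$100,000 or more"
    return "between $5,000 and $100,000"


# Hypothetical incomes; the last two differ enormously but share a category.
incomes = [3_200, 5_000, 42_000, 99_999, 100_000, 250_000_000]
counts = Counter(income_category(x) for x in incomes)

for label, n in counts.items():
    print(f"{label}: {n}")
```

Whether lumping the top two incomes together matters depends on the purpose of the study; the sketch only shows that the choice of cut points decides which differences remain visible in the data.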

Tips and Tools

The Importance of Variable Names

Patricia Fisher
Graduate School of Planning, University of Tennessee

Operationalization is one of those things that’s easier said than done. It is quite simple to explain to someone the purpose and importance of operational definitions for variables, and even to describe how operationalization typically takes place. However, until you’ve tried to operationalize a rather complex variable, you may not appreciate some of the subtle difficulties involved. Of considerable importance to the operationalization effort is the particular name that you have chosen for a variable. Let’s consider an example from the field of Urban Planning.

A variable of interest to planners is citizen participation. Planners are convinced that participation in the planning process by citizens is important to the success of plan implementation. Citizen participation is an aid to planners’ understanding of the real and perceived needs of a community, and such involvement by citizens tends to enhance their cooperation with and support for planning efforts. Although many different conceptual definitions might be offered by different planners, there would be little misunderstanding over what is meant by citizen participation. The name of the variable seems adequate.

However, if we ask different planners to provide very simple operational measures for citizen participation, we are likely to find a variety among their responses that does generate confusion. One planner might keep a tally of attendance by private citizens at city commission and other local government meetings; another might maintain a record of the different topics addressed by private citizens at similar meetings; while a third might record the number of local government meeting attendees, letters and phone calls received by the mayor and other public officials, and meetings held by special interest groups during a particular time period. As skilled researchers, we can readily see that each planner would be measuring (in a very simplistic fashion) a different dimension of citizen participation: extent of citizen participation, issues prompting citizen participation, and form of citizen participation. Therefore, the original naming of our variable, citizen participation, which was quite satisfactory from a conceptual point of view, proved inadequate for purposes of operationalization.

The precise and exact naming of variables is important in research. It is both essential to and a result of good operationalization. Variable names quite often evolve from an iterative process of forming a conceptual definition, then an operational definition, then renaming the concept to better match what can or will be measured. This looping process continues (our example illustrates only one iteration), resulting in a gradual refinement of the variable name and its measurement until a reasonable fit is obtained. Sometimes the concept of the variable that you end up with is a bit different from the original one that you started with, but at least you are measuring what you are talking about, if only because you are talking about what you are measuring!

Let’s suppose you want to measure people’s incomes in a study by collecting the information from either records or interviews. The highest annual incomes people receive run into the millions of dollars, but not many people earn that much. Unless you’re studying the very rich, it probably won’t add much to your study to keep track of extremely high categories. Depending on whom you study, you’ll probably want to establish a highest income category with a much lower floor, maybe $100,000 or more. Although this decision will lead you to throw together people who earn a trillion dollars a year with paupers earning a mere $100,000, they’ll survive it, and that mixing probably won’t hurt your research any, either. The same decision faces you at the other end of the income spectrum. In studies of the general U.S. population, a bottom category of $5,000 or less usually works fine.

In studies of attitudes and orientations, the question of range of variation has another dimension. Unless you’re careful, you may end up measuring only half an attitude without really meaning to. Here’s an example of what I mean.

Suppose you’re interested in people’s attitudes toward expanding the use of nuclear power generators. You’d anticipate that some people consider nuclear power the greatest thing since the wheel, whereas other people have absolutely no interest in it. Given that anticipation, it would seem to make sense to ask people how much they favor expanding the use of nuclear energy and to give them answer categories ranging from “Favor it very much” to “Don’t favor it at all.”

This operationalization, however, conceals half the attitudinal spectrum regarding nuclear energy. Many people have feelings that go beyond simply not favoring it: They are, with greater or lesser degrees of intensity, actively opposed to it. In this instance, there is considerable variation on the left side of zero. Some oppose it a little, some quite a bit, and others a great deal. To measure the full range of variation, then, you’d want to operationalize attitudes toward nuclear energy with a range from favoring it very much, through no feelings one way or the other, to opposing it very much.

This consideration applies to many of the variables social scientists study. Virtually any public issue involves both support and opposition, each in varying degrees. In measuring religiosity, people are not just more or less religious; some are positively antireligious. Political orientations range from very liberal to very conservative, and depending on the people you’re studying, you may want to allow for radicals on one or both ends.

The point is not that you must measure the full range of variation in every case. You should, however, consider whether you need to, given your particular research purpose. If the difference between not religious and antireligious isn’t relevant to your research, forget it. Someone has defined pragmatism as “any difference that makes no difference is no difference.” Be pragmatic.

Finally, decisions on the range of variation should be governed by the expected distribution of attributes among the subjects of the study. In a study of college professors’ attitudes toward the value of higher education, you could probably stop at no value and not worry about those who might consider higher education dangerous to students’ health. (If you were studying students, however . . .)

Variations between the Extremes

Degree of precision is a second consideration in operationalizing variables. What it boils down to is how fine you will make distinctions among the various possible attributes composing a given variable. Does it matter for your purposes whether a person is 17 or 18 years old, or could you conduct your inquiry by throwing them together in a group labeled 10 to 19 years old? Don’t answer too quickly. If you wanted to study rates of voter registration and participation, you’d definitely want to know whether the people you studied were old enough to vote. In general, if you’re going to measure age, you must look at the purpose and procedures of your study and decide whether fine or gross differences in age are important to you. In a survey, you’ll need to make these decisions in order to design an appropriate questionnaire. In the case of in-depth interviews, these decisions will condition the extent to which you probe for details.

The same thing applies to other variables. If you measure political affiliation, will it matter to your inquiry whether a person is a conservative Democrat rather than a liberal Democrat, or will it be sufficient to know the party? In measuring religious affiliation, is it enough to know that a person is Protestant, or do you need to know the denomination? Do you simply need to know if a person is married, or will it make a difference to know if he or she has never married or is separated, widowed, or divorced?

There is, of course, no general answer to such questions. The answers come out of the purpose of a given study, or why we are making a particular measurement. I can give you a useful guideline, though. Whenever you’re not sure how much detail to pursue in a measurement, get too much detail rather than too little. When a subject in an in-depth interview volunteers that she is 37 years old, record “37” in your notes, not “in her thirties.” When you’re analyzing the data, you can always combine precise attributes into more general categories, but you can never separate any variations you lumped together during observation and measurement.

A Note on Dimensions

We’ve already discussed dimensions as a characteristic of concepts. When researchers get down to the business of creating operational measures of variables, they often discover (or worse, never notice) that they’re not exactly clear about which dimensions of a variable they’re really interested in. Here’s an example.

Let’s suppose you’re studying people’s attitudes toward government, and you want to include an

examination of how people feel about corruption. Here are just a few of the dimensions you might examine:

• Do people think there is corruption in government?
• How much corruption do they think there is?
• How certain are they in their judgment of how much corruption there is?
• How do they feel about corruption in government as a problem in society?
• What do they think causes it?
• Do they think it’s inevitable?
• What do they feel should be done about it?
• What are they willing to do personally to eliminate corruption in government?
• How certain are they that they would be willing to do what they say they would do?

The list could go on and on: how people feel about corruption in government has many dimensions. It’s essential to be clear about which ones are important in your inquiry; otherwise, you may measure how people feel about corruption when you really wanted to know how much they think there is, or vice versa.

Once you’ve determined how you’re going to collect your data (for example, survey, field research) and have decided on the relevant range of variation, the degree of precision needed between the extremes of variation, and the specific dimensions of the variables that interest you, you may have another choice: a mathematical-logical one. That is, you may need to decide what level of measurement to use. To discuss this point, we need to take another look at attributes and their relationship to variables.

Defining Variables and Attributes

An attribute, you’ll recall, is a characteristic or quality of something. Female is an example. So is old or student. Variables, on the other hand, are logical sets of attributes. Thus, sex is a variable composed of the attributes female and male. What could be simpler?

Although people sometimes use the terms sex and gender interchangeably, they mean different things. “Sex” is the proper name of the variable composed of the physical attributes female and male, while “gender” is a social-identity and behavioral variable composed of the attributes feminine and masculine. Femininity represents those qualities we traditionally associate with women, and masculinity represents those qualities we traditionally associate with men. However, women and men often feel, act on, and are perceived as having qualities associated with the other sex. Although the distinctions between these two concepts are sometimes blurred, even in social research reports, my intention is to stick to their technical meanings in this textbook.

In any case, the conceptualization and operationalization processes can be seen as the specification of variables and the attributes composing them. Thus, in the context of a study of unemployment, employment status is a variable having the attributes employed and unemployed; the list of attributes could also be expanded to include the other possibilities discussed earlier, such as homemaker.

Levels of Measurement

All variables are composed of attributes, but as we are about to see, the attributes of a given variable can have a variety of different relationships to one another. In this section, we’ll examine four levels of measurement: nominal, ordinal, interval, and ratio.

Nominal Measures

Variables whose attributes are simply different from one another are called nominal measures. Examples include gender, religious affiliation, political party affiliation, birthplace, college major, and hair color. Although the attributes composing each of these variables (as female and male compose the variable sex) are distinct from one another, they have no additional structures. Nominal measures merely offer names or labels for characteristics.

Imagine a group of people characterized in terms of one such nominal variable and physically grouped by the applicable attributes. For example, say we’ve asked a large gathering of people to stand

together in groups according to the states in which they were born: all those born in Vermont in one group, those born in California in another, and so forth. The variable is state of birth; the attributes are born in California, born in Vermont, and so on. All the people standing in a given group have at least one thing in common and differ from the people in all other groups in that same regard. Where the individual groups form, how close they are to one another, or how the groups are arranged in the room is irrelevant. What matters is that all the members of a given group share the same state of birth and that each group has a different shared state of birth. All we can say about two people in terms of a nominal variable is that they are either the same or different.

Ordinal Measures

Variables with attributes we can logically rank-order are ordinal measures. The different attributes of ordinal variables represent relatively more or less of the variable. Variables of this type are social class, conservatism, alienation, prejudice, intellectual sophistication, and the like. In addition to saying whether two people are the same or different in terms of an ordinal variable, you can also say one is “more” than the other: that is, more conservative, more religious, older, and so forth.

In the physical sciences, hardness is the most frequently cited example of an ordinal measure. We may say that one material (for example, diamond) is harder than another (say, glass) if the former can scratch the latter and not vice versa. By attempting to scratch various materials with other materials, we might eventually be able to arrange several materials in a row, ranging from the softest to the hardest. We could never say how hard a given material was in absolute terms; we could only say how hard in relative terms: which materials it is harder than and which softer than.

Let’s pursue the earlier example of grouping the people at a social gathering. This time imagine that we ask all the people who have graduated from college to stand in one group, all those with only a high school diploma to stand in another group, and all those who have not graduated from high school to stand in a third group. This manner of grouping people satisfies the nominal-variable quality of being different, as discussed earlier. In addition, however, we might logically arrange the three groups in terms of the relative amount of formal education (the shared attribute) each had. We might arrange the three groups in a row, ranging from most to least formal education. This arrangement would provide a physical representation of an ordinal measure. If we knew which groups two individuals were in, we could determine that one had more, less, or the same formal education as the other.

In this example, it is irrelevant how close or far apart the educational groups are from one another. The college and high school groups might be 5 feet apart, and the less-than-high-school group 500 feet farther down the line. These actual distances don’t have any meaning. The high school group, however, should be between the less-than-high-school group and the college group, or else the rank order will be incorrect.

nominal measure  A nominal variable has attributes that are merely different, as distinguished from ordinal, interval, or ratio measures. Sex is an example of a nominal measure. All a nominal variable can tell us about two people is if they are the same or different.

ordinal measure  A level of measurement describing a variable with attributes we can rank-order along some dimension. An example is socioeconomic status as composed of the attributes high, medium, low.

interval measure  A level of measurement describing a variable whose attributes are rank-ordered and have equal distances between adjacent attributes. The Fahrenheit temperature scale is an example of this, because the distance between 17 and 18 is the same as that between 89 and 90.

Interval Measures

For the attributes composing some variables, the actual distance separating those attributes does have meaning. Such variables are interval measures. For these, the logical distance between attributes can be expressed in meaningful standard intervals.

For example, in the Fahrenheit temperature scale, the difference, or distance, between 80 degrees

and 90 degrees is the same as that between 40 degrees and 50 degrees. However, 80 degrees Fahrenheit is not twice as hot as 40 degrees, because the zero point in the Fahrenheit scale is arbitrary; zero degrees does not really mean lack of heat. Similarly, minus 30 degrees on this scale doesn’t represent 30 degrees less than no heat. (This is true for the Celsius scale as well. In contrast, the Kelvin scale is based on an absolute zero, which does mean a complete lack of heat.)

About the only interval measures commonly used in social science research are constructed measures such as standardized intelligence tests that have been more or less accepted. The interval separating IQ scores of 100 and 110 may be regarded as the same as the interval separating scores of 110 and 120 by virtue of the distribution of observed scores obtained by many thousands of people who have taken the tests over the years. But it would be incorrect to infer that someone with an IQ of 150 is 50 percent more intelligent than someone with an IQ of 100. (A person who received a score of 0 on a standard IQ test could not be regarded, strictly speaking, as having no intelligence, although we might feel he or she was unsuited to be a college professor or even a college student. But perhaps a dean . . . ?)

When comparing two people in terms of an interval variable, we can say they are different from each other (nominal), and that one is more than the other (ordinal). In addition, we can say “how much” more.

Ratio Measures

Most of the social science variables meeting the minimum requirements for interval measures also meet the requirements for ratio measures. In ratio measures, the attributes composing a variable, besides having all the structural characteristics mentioned previously, are based on a true zero point. The Kelvin temperature scale is one such measure. Examples from social science research include age, length of residence in a given place, number of organizations belonged to, number of times attending religious services during a particular period of time, number of times married, and number of Arab friends.

Returning to the illustration of methodological party games, we might ask a gathering of people to group themselves by age. All the one-year-olds would stand (or sit or lie) together, the two-year-olds together, the three-year-olds, and so forth. The fact that members of a single group share the same age and that each different group has a different shared age satisfies the minimum requirements for a nominal measure. Arranging the several groups in a line from youngest to oldest meets the additional requirements of an ordinal measure and lets us determine if one person is older than, younger than, or the same age as another. If we space the groups equally far apart, we satisfy the additional requirements of an interval measure and can say how much older one person is than another. Finally, because one of the attributes included in age represents a true zero (babies carried by women about to give birth), the phalanx of hapless party goers also meets the requirements of a ratio measure, permitting us to say that one person is twice as old as another. (Remember this in case you’re asked about it in a workbook assignment.) Another example of a ratio measure is income, which extends from an absolute zero to approximately infinity, if you happen to be the founder of Microsoft.

Comparing two people in terms of a ratio variable, then, allows us to conclude (1) whether they are different (or the same), (2) whether one is more than the other, (3) how much they differ, and (4) what the ratio of one to another is. Figure 6-1 summarizes this discussion by presenting a graphic illustration of the four levels of measurement.

ratio measure  A level of measurement describing a variable with attributes that have all the qualities of nominal, ordinal, and interval measures and in addition are based on a “true zero” point. Age is an example of a ratio measure.

Implications of Levels of Measurement

Because it’s unlikely that you’ll undertake the physical grouping of people just described (try it once, and you won’t be invited to many parties), I should draw your attention to some of the practical

implications of the differences that have been distinguished. These implications appear primarily in the analysis of data (discussed in Part 4), but you need to anticipate such implications when you’re structuring any research project.

[Figure 6-1 Levels of Measurement. Often you can choose among different levels of measurement (nominal, ordinal, interval, or ratio), carrying progressively more information. First panel: Nominal Measure, Example: Sex.]

Certain quantitative analysis techniques require variables that meet certain minimum levels of measurement. To the extent that the variables to be examined in a research project are limited to a particular level of measurement (say, ordinal), you should plan your analytic techniques accordingly.

More precisely, you should anticipate drawing research conclusions appropriate to the levels of measurement used in your variables. For example, you might reasonably plan to determine and report the mean age of a population under study (add up all the individual ages and divide by the number of people), but you should not plan to report the mean religious affiliation, because that is a nominal variable, and the mean requires at least interval-level data. (You could report the modal, or most common, religious affiliation.)
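As a minimal sketch of matching a summary statistic to a variable's level of measurement, the Python snippet below reports a mean for age but only a mode for religious affiliation. The data values and the helper name `summarize` are hypothetical, and the level of each variable is declared by hand as an assumption; nothing here detects a level automatically. The snippet follows the common convention that a mode is meaningful at every level, a median from the ordinal level upward, and a mean from the interval level upward.

```python
from statistics import mean, median_low, mode

# Levels ordered from lowest to highest, as in this chapter.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]


def summarize(values, level):
    """Return only the summary statistics appropriate to the declared level.

    Assumption: the caller states the level; ordinal values are passed as
    rank codes (for example 1 = low, 2 = medium, 3 = high).
    """
    rank = LEVELS.index(level)
    summary = {"mode": mode(values)}              # same/different is enough
    if rank >= LEVELS.index("ordinal"):
        # median_low returns an observed value, so no two ranks are averaged
        summary["median"] = median_low(values)
    if rank >= LEVELS.index("interval"):
        summary["mean"] = mean(values)            # needs meaningful distances
    return summary


ages = [19, 23, 23, 31, 44]                       # ratio level, hypothetical
affiliations = ["Protestant", "Catholic", "Protestant", "None", "Jewish"]

print(summarize(ages, "ratio"))
print(summarize(affiliations, "nominal"))
```

Declaring the level explicitly is a deliberate design choice in this sketch: a column of numeric codes for religious affiliation would let a mean be computed without complaint, so the safeguard has to come from the researcher rather than from the software.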

At the same time, you can treat some variables as representing different levels of measurement. Ratio measures are the highest level, descending through interval and ordinal to nominal, the lowest level of measurement. A variable representing a higher level of measurement (say, ratio) can also be treated as representing a lower level of measurement (say, ordinal). Recall, for example, that age is a ratio measure. If you wished to examine only the relationship between age and some ordinal-level variable (say, self-perceived religiosity: high, medium, and low), you might choose to treat age as an ordinal-level variable as well. You might characterize the subjects of your study as being young, middle-aged, and old, specifying what age range composed each of these groupings. Finally, age might be used as a nominal-level variable for certain research purposes. People might be grouped as being born during the Depression or not. Another nominal measurement, based on birth date rather than just age, would be the grouping of people by astrological signs.

The level of measurement you’ll seek, then, is determined by the analytic uses you’ve planned for a given variable, keeping in mind that some variables are inherently limited to a certain level. If a variable is to be used in a variety of ways, requiring different levels of measurement, the study should be designed to achieve the highest level required. For example, if the subjects in a study are asked their exact ages, they can later be organized into ordinal or nominal groupings.

Again, you need not necessarily measure variables at their highest level of measurement. If you’re sure to have no need for ages of people at higher than the ordinal level of measurement, you may simply ask people to indicate their age range, such as 20 to 29, 30 to 39, and so forth. In a study of the wealth of corporations, rather than seek more precise information, you may use Dun & Bradstreet ratings to rank corporations. Whenever your research purposes are not altogether clear, however, seek the highest level of measurement possible. As we’ve discussed, although ratio measures can later be reduced to ordinal ones, you cannot convert an ordinal measure to a ratio one. More generally, you cannot convert a lower-level measure to a higher-level one. That is a one-way street worth remembering.

The level of measurement is significant in terms of the arithmetic operations that can be applied to a variable and the statistical techniques using those operations. The accompanying table summarizes some of the implications, including ways of stating the comparison of two incomes.

Level of       Arithmetic    How to Express the Fact That Jan Earns
Measurement    Operations    $80,000 a Year and Andy Earns $40,000

Nominal        = ≠           Jan and Andy earn different amounts.
Ordinal        > <           Jan earns more than Andy.
Interval       + −           Jan earns $40,000 more than Andy.
Ratio          ÷ ×           Jan earns twice as much as Andy.

Typically a research project will tap variables at different levels of measurement. For example, William Bielby and Denise Bielby (1999) set out to examine the world of film and television, using a nomothetic, longitudinal approach (take a moment to remind yourself what that means). In what they referred to as the “culture industry,” the authors found that reputation (an ordinal variable) is the best predictor of screenwriters’ future productivity. More interestingly, they found that screenwriters who were represented by “core” (or elite) agencies were not only far more likely to find jobs (a nominal variable), but also more likely to find jobs that paid more (a ratio variable). In other words, the researchers found that agencies’ reputations (ordinal) were a key independent variable for predicting a screenwriter’s career success. The researchers also found that being older (ratio), female (nominal), an ethnic minority (nominal), and having more years of experience (ratio) were disadvantageous for a writer’s career. On the other hand, higher earnings from previous years (measured in ordinal categories) led to more success in the future. In Bielby and Bielby’s terms, “success breeds success” (1999: 80).

Single or Multiple Indicators

With so many alternatives for operationalizing social science variables, you may find yourself worrying about making the right choices. To counter this feeling, let me add a momentary dash of certainty and stability.

Many social research variables have fairly obvious, straightforward measures. No matter how you cut it, sex usually turns out to be a matter of male or female: a nominal-level variable that can be measured by a single observation, either by looking (well, not always) or by asking a question (usually). In a study involving the size of families, you’ll want to think about adopted and foster children, as well as blended families, but it’s usually pretty easy to find out how many children a family has. For most research purposes, the resident population of a country is the resident population of that country; you can look it up in an almanac and know the answer. A great many variables, then, have obvious single indicators. If you can get one piece of information, you have what you need.

Sometimes, however, there is no single indicator that will give you the measure of a variable you really want. As discussed earlier in this chapter, many concepts are subject to varying interpretations, each with several possible indicators. In these cases, you’ll want to make several observations for a given variable. You can then combine the several pieces of information you’ve collected, creating a composite measurement of the variable in question. Chapter 7 is devoted to ways of doing that, so here let’s just discuss one simple illustration.

Consider the concept “college performance.” All of us have noticed that some students perform well in college courses and others don’t. In studying these differences, we might ask what characteristics and experiences are related to high levels of performance (many researchers have done just that). How should we measure overall performance? Each grade in any single course is a potential indicator of college performance, but it also may not typify the student’s general performance. The solution to this problem is so firmly established that it is, of course, obvious: the grade point average (GPA). We assign numerical scores to each letter grade, total the points earned by a given student, and divide by the number of courses taken, thus obtaining a composite measure. (If the courses vary in number of credits, we adjust the point values accordingly.) Creating such composite measures in social research is often appropriate.

Some Illustrations of Operationalization Choices

To bring together all the operationalization choices available to the social researcher and to show the potential in those possibilities, let’s look at some of the distinct ways you might address various research problems. The alternative ways of operationalizing the variables in each case should demonstrate the opportunities that social research can present to our ingenuity and imaginations. To simplify matters, I have not attempted to describe all the research conditions that would make one alternative superior to the others, though in a given situation they would not all be equally appropriate.

Here are specific research questions, then, and some of the ways you could address them. We’ll begin with an example discussed earlier in the chapter. It has the added advantage that one of the variables is straightforward to operationalize.

1. Are women more compassionate than men?

   a. Select a group of subjects for study, with equal numbers of men and women. Present them with hypothetical situations that involve someone being in trouble. Ask them what they would do if they were confronted with that situation. What would they do, for example, if they came across a small child who was lost and crying for his or her parents? Consider any answer that involves helping or comforting the child as an indicator of compassion. See whether men or women are more likely to indicate they would be compassionate.

   b. Set up an experiment in which you pay a small child to pretend that he or she is lost. Put the child to work on a busy sidewalk and observe whether men or women are more likely to offer assistance. Also be sure to count the total number of men and women who walk by, because there may be more of one than the other. If that’s the case, simply calculate the percentage of men and the percentage of women who help.

   c. Select a sample of people and do a survey in which you ask them what organizations they belong to. Calculate whether women or men are more likely to belong to those that seem to reflect compassionate feelings. To account for the case in which one group belongs to more organizations than the other does, do this: For each person you study, calculate the percentage of his or her organizational memberships that reflect compassion. See if men or women have a higher average percentage.

2. Are sociology students or accounting students better informed about world affairs?

   a. Prepare a short quiz on world affairs and arrange to administer it to the students in a sociology class and in an accounting class at a comparable level. If you want to compare sociology and accounting majors, be sure to ask students what they are majoring in.

   b. Get the instructor of a course in world affairs to give you the average grades of sociology and accounting students in the course.

   c. Take a petition to sociology and accounting classes that urges that “the United Nations headquarters be moved to New York City.” Keep a count of how many in each class sign the petition and how many inform you that the UN headquarters is already located in New York City.

3. Do people consider New York or California the better place to live?

   a. Consulting the Statistical Abstract of the United States or a similar publication, check the migration rates into and out of each state. See if you can find the numbers moving directly from New York to California and vice versa.

   b. The national polling companies (Gallup, Harris, Roper, and so forth) often ask people what they consider the best state to live in. Look up some recent results in the library or through your local newspaper.

   c. Compare suicide rates in the two states.

4. Who are the most popular instructors on your campus, those in the social sciences, the natural sciences, or the humanities?

   a. If your school has a provision for student evaluation of instructors, review some recent results and compute the average rating of each of the three groups.

   b. Begin visiting the introductory courses given in each group of disciplines and measure the attendance rate of each class.

   c. In December, select a group of faculty in each of the three divisions and ask them to keep a record of the numbers of holiday greeting cards and presents they receive from admiring students. See who wins.

The point of these examples is not necessarily to suggest respectable research projects but to illustrate the many ways variables can be operationalized.

The Research in Real Life feature, “Measuring College Satisfaction,” briefly overviews the preceding steps in terms of a concept mentioned at the outset of this chapter.

Operationalization Goes On and On

Although I’ve discussed conceptualization and operationalization as activities that precede data collection and analysis (for example, you must design questionnaire items before you send out a questionnaire), these two processes continue throughout any research project, even if the data have been collected in a structured mass survey. As we’ve seen, in less-structured methods such as field research, the identification and specification of relevant concepts is inseparable from the ongoing process of observation.

Imagine, for example, that you’re doing a qualitative, observational study of members of a new religious cult, and, in part, you want to identify those members who are more religious and those who are less religious. You may begin with a focus on certain kinds of ritual behavior, only to eventually discover that the members of the group place a higher premium on religious experience or steadfast beliefs.

The open-endedness of conceptualization and operationalization is perhaps more obvious in qualitative than in quantitative research, since changes can be made at any point during data collection and analysis. In quantitative methods such as survey research or experiments, you will be required to commit yourself to particular measurement structures. Once a questionnaire has been printed and administered, for example, altering it would be impractical if not impossible, even when the unfolding of the research might suggest changes. Even in the case of a survey questionnaire, however, you may have some flexibility in how you measure variables during the analysis phase, as we’ll see in the following chapter.

As I mentioned, however, the qualitative researcher has greater flexibility in this regard. Things you notice during in-depth interviews, for example, may suggest a different set of questions than you initially planned, allowing you to pursue unanticipated avenues. Then later, as you review and organize your notes for analysis, you may again see unanticipated patterns and redirect your analysis.

Regardless of whether you are using qualitative or quantitative methods, you should always be open to reexamining your concepts and definitions. The ultimate purpose of social research is to clarify the nature of social life. The validity and utility of what you learn in this regard doesn’t depend on when you first figured out how to look at things any more than it matters whether you got the idea from a learned textbook, a dream, or your brother-in-law.

Criteria of Measurement Quality

This chapter has come some distance. It began with the bald assertion that social scientists can measure anything that exists. Then we discovered that most of the things we might want to measure and study don’t really exist. Next we learned that it’s possible to measure them anyway. Now we’ll discuss some of the yardsticks against which we judge our relative success or failure in measuring things, even things that don’t exist.

Research in Real Life

Measuring College Satisfaction

Early in this chapter, we considered “college satisfaction” as an example of a concept people often talk about casually. To study such a concept, however, we need to engage in the processes of conceptualization and operationalization. I’ll sketch out the process briefly, then you might try your hand at expanding on my comments.

What are some of the dimensions of college satisfaction? Here are a few to get you started, but feel free to add your own:

Academic quality: faculty, courses, majors
Physical facilities: classrooms, dorms, cafeteria, grounds
Athletics and extracurricular activities
Costs and availability of financial aid
Sociability of students, faculty, staff
Security, crime on campus

How would you measure each of these dimensions? One method would be to ask a sample of students, “How would you rate your level of satisfaction with each of the following?” while giving them a list of items similar to those listed here and a set of categories to use (such as very satisfied, satisfied, dissatisfied, very dissatisfied).

But suppose you didn’t have the time and/or money to conduct a survey and were interested in comparing overall levels of satisfaction at several schools. What data about schools (the unit of analysis) might give you the answer you were interested in? Retention rates might be one general indicator. Can you think of others?

Notice that you can measure college quality both positively and negatively. Modern classrooms with WiFi access would count positively, whereas the number of crimes on campus would count negatively. But the latter could still be used as a measure of college quality, with low crime rates counting as high quality.
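The rating question in the “Measuring College Satisfaction” feature produces ordinal data, and a rough sketch of how such answers might be summarized follows. The six answers per dimension are invented, and the helper names `tally` and `median_category` are hypothetical. Reporting the middle category rather than an average is my assumption about a sensible summary for ordinal categories, not a procedure laid out in the feature itself.

```python
from collections import Counter

# Response categories from the feature, ordered from lowest to highest satisfaction.
CATEGORIES = ["very dissatisfied", "dissatisfied", "satisfied", "very satisfied"]

# Hypothetical answers from six students for two of the listed dimensions.
responses = {
    "Academic quality": [
        "satisfied", "very satisfied", "satisfied",
        "dissatisfied", "very satisfied", "satisfied",
    ],
    "Physical facilities": [
        "dissatisfied", "very dissatisfied", "satisfied",
        "dissatisfied", "dissatisfied", "satisfied",
    ],
}


def tally(answers):
    """Count how many students chose each category, keeping the category order."""
    counts = Counter(answers)
    return {category: counts.get(category, 0) for category in CATEGORIES}


def median_category(answers):
    """Return the middle answer by rank (the lower of the two middles if the count is even)."""
    ranks = sorted(CATEGORIES.index(answer) for answer in answers)
    return CATEGORIES[ranks[(len(ranks) - 1) // 2]]


for dimension, answers in responses.items():
    print(dimension, tally(answers), "median:", median_category(answers))
```

The sketch deliberately avoids assigning scores such as 1 through 4 and averaging them, because that would treat the gaps between categories as equal, which an ordinal measure does not guarantee.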

Precision and Accuracy

To begin, measurements can be made with varying degrees of precision. As we saw in the discussion of operationalization, precision concerns the fineness of distinctions made between the attributes that compose a variable. The description of a woman as “43 years old” is more precise than “in her forties.” Saying a street-corner gang was formed “in the summer of 1996” is more precise than saying “during the 1990s.”

As a general rule, precise measurements are superior to imprecise ones, as common sense dictates. There are no conditions under which imprecise measurements are intrinsically superior to precise ones. Even so, exact precision is not always necessary or desirable. If knowing that a woman is in her forties satisfies your research requirements, then any additional effort invested in learning her precise age is wasted. The operationalization of concepts, then, must be guided partly by an understanding of the degree of precision required. If your needs are not clear, be more precise rather than less.

Don’t confuse precision or specificity with accuracy, however. Describing someone as “born in New England” is less specific than “born in Stowe, Vermont,” but suppose the person in question was actually born in Boston. The less-specific description, in this instance, is more accurate, a better reflection of the real world.

Precision and accuracy are obviously important qualities in research measurement, and they probably need no further explanation. When social scientists construct and evaluate measurements, however, they pay special attention to two technical considerations: reliability and validity.

Reliability

In the abstract, reliability is a matter of whether a particular technique, applied repeatedly to the same object, yields the same result each time. Let’s say you want to know how much I weigh. (No, I don’t know why.) As one technique, say you ask two different people to estimate my weight. If the first person estimates 150 pounds and the other estimates 300, we have to conclude the technique of having people estimate my weight isn’t very reliable.

reliability  That quality of measurement method that suggests that the same data would have been collected each time in repeated observations of the same phenomenon. In the context of a survey, we would expect that the question “Did you attend religious services last week?” would have higher reliability than the question “About how many times have you attended religious services in your life?” This is not to be confused with validity.

Suppose, as an alternative, that you use a bathroom scale as your measurement technique. I step on the scale twice, and you note the same result each time. The scale has presumably reported the same weight for me both times, indicating that the scale provides a more reliable technique for measuring a person’s weight than asking people to estimate it does.

Reliability, however, does not ensure accuracy any more than precision does. Suppose I’ve set my bathroom scale to shave five pounds off my weight just to make me feel better. Although you would (reliably) report the same weight for me each time, you would always be wrong. This new element, called bias, is discussed in Chapter 9. For now, just be warned that reliability does not ensure accuracy.

Let’s suppose we’re interested in studying morale among factory workers in two different kinds of factories. In one set of factories, workers have specialized jobs, reflecting an extreme division of labor. Each worker contributes a tiny part to the overall process performed on a long assembly line. In the other set of factories, each worker performs many tasks, and small teams of workers complete the whole process.

How should we measure morale? Following one strategy, we could observe the workers in each factory, noticing such things as whether they joke with one another, whether they smile and laugh a lot, and so forth. We could ask them how they like their work and even ask them whether they think they would prefer their current arrangement or the other one being studied. By comparing what we observed in the different factories, we might reach a conclusion about which assembly process produces the higher morale. Notice that I’ve just described a qualitative measurement procedure.

Now let’s look at some reliability problems inherent in this method. First, how you and I are feeling when we do the observing will likely color what we see. We may misinterpret what we see. We may see workers kidding each other but think they’re having an argument. We may catch them on an off day. If we were to observe the same group of workers several days in a row, we might arrive at different evaluations on each day. Further, even if several observers evaluated the same behavior, they might arrive at different conclusions about the workers’ morale.

Here’s another strategy for assessing morale, a quantitative approach. Suppose we check the company records to see how many grievances have been filed with the union during some fixed period. Presumably this would be an indicator of morale: the more grievances, the lower the morale. This measurement strategy would appear to be more reliable: Counting up the grievances over and over, we should keep arriving at the same number.

If you find yourself thinking that the number of grievances doesn’t necessarily measure morale, you’re worrying about validity, not reliability. We’ll discuss validity in a moment. The point for now is that the last method is more like my bathroom scale: it gives consistent results.

In social research, reliability problems crop up in many forms. Reliability is a concern every time a single observer is the source of data, because we have no certain guard against the impact of that observer’s subjectivity. We can’t tell for sure how much of what’s reported originated in the situation observed and how much in the observer.

Subjectivity is not only a problem with single observers, however. Survey researchers have known for a long time that different interviewers, because of their own attitudes and demeanors, get different answers from respondents. Or, if we were to conduct a study of newspapers’ editorial positions on some public issue, we might create a team of coders to take on the job of reading hundreds of editorials and classifying them in terms of their position on the issue. Unfortunately, different coders will code the same editorial differently. Or we might want to classify a few hundred specific occupations in terms of some standard coding scheme, say a set of categories created by the Department of Labor or by the Census Bureau. You and I would not place all those occupations in the same categories.

Each of these examples illustrates problems of reliability. Similar problems arise whenever we ask people to give us information about themselves. Sometimes we ask questions that people don’t know the answers to: “How many times have you been to religious services?” Sometimes we ask people about things they consider totally irrelevant: “Are you satisfied with China’s current relationship with Albania?” In such cases, people will answer differently at different times because they’re making up answers as they go. Sometimes we explore issues so complicated that a person who had a clear opinion in the matter might arrive at a different interpretation of the question when asked a second time.

So how do you create reliable measures? If your research design calls for asking people for information, you can be careful to ask only about things the respondents are likely to know the answer to. Ask about things relevant to them, and be clear in what you’re asking. Of course, these techniques don’t solve every possible reliability problem. Fortunately, social researchers have developed several techniques for cross-checking the reliability of the measures they devise.

Test-Retest Method

Sometimes it’s appropriate to make the same measurement more than once, a technique called the test-retest method. If you don’t expect the sought-after information to change, then you should expect the same response both times. If answers vary, the measurement method may, to the extent of that variation, be unreliable. Here’s an illustration.

In their research on Health Hazard Appraisal (HHA), a part of preventive medicine, Jeffrey Sacks, W. Mark Krushat, and Jeffrey Newman (1980) wanted to determine the risks associated with various background and lifestyle factors, making it possible for physicians to counsel their patients appropriately. By knowing patients’ life situations, physicians could advise them on their potential for survival and on how to improve it. This purpose, of course, depended heavily on the accuracy of the information gathered about each subject in the study.

To test the reliability of their information, Sacks and his colleagues had all 207 subjects complete a baseline questionnaire that asked about their characteristics and behavior. Three months later, a follow-up questionnaire asked the same subjects for the same information, and the results of the two surveys were compared. Overall, only 15 percent of the subjects reported the same information in both studies.

Sacks and his colleagues report the following:

    Almost 10 percent of subjects reported a different height at follow-up examination. Parental age was changed by over one in three subjects. One parent reportedly aged 20 chronologic years in three months. One in five ex-smokers and ex-drinkers have apparent difficulty in reliably recalling their previous consumption pattern.

    (1980: 730)

Some subjects erased all trace of previously reported heart murmur, diabetes, emphysema, arrest record, and thoughts of suicide. One subject’s mother, deceased in the first questionnaire, was apparently alive and well in time for the second. One subject had one ovary missing in the first study but present in the second. In another case, an ovary present in the first study was missing in the second study, and had been for ten years! One subject was reportedly 55 years old in the first study and 50 years old three months later. (You have to wonder whether the physician-counselors could ever have nearly the impact on their patients that their patients’ memories did.) Thus, test-retest revealed that this data-collection method was not especially reliable.

Split-Half Method

As a general rule, it’s always good to make more than one measurement of any subtle or complex social concept, such as prejudice, alienation, or social class. This procedure lays the groundwork for another check on reliability. Let’s say you’ve created a questionnaire that contains ten items you believe measure prejudice against women. Using the split-half technique, you would randomly assign those ten items to two sets of five. Each set should provide a good measure of prejudice against women, and the two sets should classify respondents the same way. If the two sets of items classify people differently, you most likely have a problem of reliability in your measure of the variable.

Using Established Measures

Another way to help ensure reliability in getting information from people is to use measures that have proved their reliability in previous research. If you want to measure anomia, for example, you might want to follow Srole’s lead.

The heavy use of measures, though, does not guarantee their reliability. For example, the Scholastic Assessment Tests (SATs) and the Minnesota Multiphasic Personality Inventory (MMPI) have been accepted as established standards in their respective domains for decades. In recent years, though, they’ve needed fundamental overhauling to reflect changes in society, eliminating outdated topics and gender bias in wording.

Reliability of Research Workers

As we’ve seen, it’s also possible for measurement unreliability to be generated by research workers: interviewers and coders, for example. There are several ways to check on reliability in such cases. To guard against interviewer unreliability in surveys, for example, a supervisor will call a subsample of the respondents on the telephone and verify selected pieces of information.

Replication works in other situations also. If you’re worried that newspaper editorials or occupations may not be classified reliably, you could have each independently coded by several coders. Those cases that are classified inconsistently can then be evaluated more carefully and resolved.

Finally, clarity, specificity, training, and practice can prevent a great deal of unreliability and grief. If you and I spent some time reaching a clear agreement on how to evaluate editorial positions on an issue (discussing various positions and reading through several together), we could probably do a good job of classifying them in the same way independently.

The reliability of measurements is a fundamental issue in social research, and we’ll return to it more than once in the chapters ahead. For now, however, let’s recall that even total reliability doesn’t ensure that our measures actually measure what we think they measure. Now let’s plunge into the question of validity.

Validity

In conventional usage, validity refers to the extent to which an empirical measure adequately reflects the real meaning of the concept under consideration. A measure of social class should measure social class, not political orientations. A measure of political orientations should measure political orientations, not sexual permissiveness. Validity means that we are actually measuring what we say we are measuring.

validity  A term describing a measure that accurately reflects the concept it is intended to measure. For example, your IQ would seem a more valid measure of your intelligence than the number of hours you spend in the library would. Though the ultimate validity of a measure can never be proved, we may agree to its relative validity on the basis of face validity, criterion-related validity, construct validity, content validity, internal validation, and external validation (see Chapter 7). This must not be confused with reliability.

Whoops! I’ve already committed us to the view that concepts don’t have real meanings. How can we ever say whether a particular measure adequately reflects the concept’s meaning, then? Ultimately, of course, we can’t. At the same time, as we’ve already seen, all of social life, including social research, operates on agreements about the terms we use and the concepts they represent. There are several criteria of success in making measurements that are appropriate to these agreed-on meanings of concepts.

First, there’s something called face validity. Particular empirical measures may or may not jibe with our common agreements and our individual mental images concerning a particular concept. For example, you and I might quarrel about whether counting the number of grievances filed with the union will adequately measure morale. Still, we’d surely agree that the number of grievances has something to do with morale. That is, the measure is valid “on its face,” whether or not it’s adequate. If I were to suggest that we measure morale by finding out how many books the workers took out of the library during their off-duty hours, you’d undoubtedly raise a more serious objection: That measure wouldn’t have much face validity.

face validity  That quality of an indicator that makes it seem a reasonable measure of some variable. That the frequency of attendance at religious services is some indication of a person’s religiosity seems to make sense without a lot of explanation. It has face validity.

Second, I’ve already pointed to many of the more formally established agreements that define some concepts. The Census Bureau, for example, has created operational definitions of such concepts as family, household, and employment status that seem to have a workable validity in most studies using these concepts.

Three additional types of validity also specify particular ways of testing the validity of measures. The first, criterion-related validity, sometimes called predictive validity, is based on some external criterion. For example, the validity of College Board exams is shown in their ability to predict students’ success in college. The validity of a written driver’s test is determined, in this sense, by the relationship between the scores people get on the test and their subsequent driving records. In these examples, college success and driving ability are the criteria.

criterion-related validity  The degree to which a measure relates to some external criterion. For example, the validity of College Board tests is shown in their ability to predict the college success of students. Also called predictive validity.

To test your understanding of criterion-related validity, see whether you can think of behaviors

192 ■ Chapter 6: From Concept to Measurement that might be used to validate each of the following Tests of construct validity, then, can offer a attitudes: weight of evidence that your measure either does or doesn’t tap the quality you want it to measure, Is very religious without providing definitive proof. Although I have suggested that tests of construct validity are Supports equality of men and women less compelling than those of criterion validity, there is room for disagreement about which kind Supports far-right militia groups of test a particular comparison variable (driving record, marital fidelity) represents in a given situ- Is concerned about the environment ation. It’s less important to distinguish the two types of validity tests than to understand the logic Some possible validators would be, respectively, of validation that they have in common: If we’ve attends religious services, votes for women can- succeeded in measuring some variable, then our didates, belongs to the NRA, and belongs to the measures should relate in some logical way to Sierra Club. other measures. Sometimes it’s difficult to find behavioral Finally, content validity refers to how much criteria that can be taken to validate measures as a measure covers the range of meanings included directly as in such examples. In those instances, within a concept. For example, a test of math- however, we can often approximate such criteria ematical ability cannot be limited to addition but by applying a different test. We can consider how also needs to cover subtraction, multiplication, divi- the variable in question ought, theoretically, to sion, and so forth. Or, if we’re measuring prejudice, relate to other variables. Construct validity is do our measurements reflect all types of preju- based on the logical relationships among variables. dice, ­including prejudice against racial and ethnic groups, religious minorities, women, the elderly, Suppose, for example, that you want to study and so on? 
the sources and consequences of marital satisfac- tion. As part of your research, you develop a mea- Figure 6-2 presents a graphic portrayal of the sure of marital satisfaction, and you want to assess difference between validity and reliability. If you its validity. think of measurement as analogous to repeatedly shooting at the bull’s-eye on a target, you’ll see In addition to developing your measure, you’ll that reliability looks like a “tight pattern,” regard- have developed certain theoretical expectations less of where the shots hit, because reliability is a about the way the variable marital satisfaction re- function of consistency. Validity, on the other hand, lates to other variables. For example, you might is a function of shots being arranged around the reasonably conclude that satisfied husbands and bull’s-eye. The failure of reliability in the figure is wives will be less likely than dissatisfied ones to randomly distributed around the target; the failure cheat on their spouses. If your measure relates of validity is systematically off the mark. Notice to marital fidelity in the expected fashion, that that neither an unreliable nor an invalid measure is constitutes evidence of your measure’s construct likely to be very useful. validity. If satisfied marriage partners are as likely to cheat on their spouses as the dissatisfied ones are, however, that would challenge the validity of your measure. construct validity  The degree to which a measure Who Decides What’s Valid? relates to other variables as expected within a sys- tem of theoretical relationships. Our discussion of validity began with a reminder that we depend on agreements to determine what’s content validity  The degree to which a mea- real, and we’ve just seen some of the ways social sure covers the range of meanings included within scientists can agree among themselves that they a concept. have made valid measurements. There is yet an- other way of looking at validity.
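The target analogy of Figure 6-2 can be put into numbers with a small simulation. What follows is only a sketch under stated assumptions, not a standard procedure: measurement is reduced to a single number, the "bull's-eye" value of 50 is invented, random error is drawn from a normal distribution, and the function names are hypothetical. Spread among the shots stands in for the failure of reliability, and the average distance from the bull's-eye stands in for the failure of validity.

```python
import random
import statistics


def simulate_measure(true_value, bias, noise_sd, n=1000, seed=0):
    """Return n simulated measurements of true_value.

    bias     -- systematic error, the same on every shot (hurts validity)
    noise_sd -- size of the random error on each shot (hurts reliability)
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


def summarize(shots, true_value):
    """Spread reflects unreliability; average offset reflects invalidity."""
    return {
        "spread": statistics.pstdev(shots),
        "offset": statistics.mean(shots) - true_value,
    }


TRUE_VALUE = 50.0  # hypothetical bull's-eye, chosen only for illustration

# Tight pattern, but systematically off the mark.
reliable_not_valid = summarize(
    simulate_measure(TRUE_VALUE, bias=10.0, noise_sd=1.0), TRUE_VALUE
)
# Centered on the bull's-eye, but scattered widely.
valid_not_reliable = summarize(
    simulate_measure(TRUE_VALUE, bias=0.0, noise_sd=10.0), TRUE_VALUE
)
# Tight pattern centered on the bull's-eye.
valid_and_reliable = summarize(
    simulate_measure(TRUE_VALUE, bias=0.0, noise_sd=1.0), TRUE_VALUE
)

for name, result in [
    ("reliable but not valid", reliable_not_valid),
    ("valid but not reliable", valid_not_reliable),
    ("valid and reliable", valid_and_reliable),
]:
    print(f"{name}: spread={result['spread']:.2f}, offset={result['offset']:.2f}")
```

In a real study we never know the bull's-eye, of course, which is why validity has to be argued through the kinds of evidence described above rather than computed directly.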

Figure 6-2 An Analogy to Validity and Reliability. A good measurement technique should be both valid (measuring what it is intended to measure) and reliable (yielding a given measurement dependably).

Social researchers sometimes criticize themselves and one another for implicitly assuming they are somewhat superior to those they study. For example, researchers often seek to uncover motivations that the social actors themselves are unaware of. You think you bought that new Turbo Tiger because of its high performance and good looks, but we know you’re really trying to achieve a higher social status.

This implicit sense of superiority would fit comfortably with a totally positivistic approach (the biologist feels superior to the frog on the lab table), but it clashes with the more humanistic and typically qualitative approach taken by many social scientists. We’ll explore this issue more deeply in Chapter 11.

In seeking to understand the way ordinary people make sense of their worlds, ethnomethodologists have urged all social scientists to pay more respect to the natural social processes of conceptualization and shared meaning. At the very least, behavior that may seem irrational from the scientist’s paradigm may make logical sense when viewed through the actor’s paradigm.

Clifford Geertz (1973) applies the term thick description in reference to the goal of understanding, as deeply as possible, the meanings that elements of a culture have for those who live within that culture. He recognizes that the outside observer will never grasp those meanings fully, however, and warns, “Cultural analysis is intrinsically incomplete.” He then elaborates:

There are a number of ways to escape this—turning culture into folklore and collecting it, turning it into traits and counting it, turning it into institutions and classifying it, turning it into structures and toying with it. But they are escapes. The fact is that to commit oneself to a semiotic concept of culture and an interpretive approach to the study of it is to commit oneself to a view of ethnographic assertion as, to borrow W. B. Gallie’s by now famous phrase, “essentially contestable.” Anthropology, or at least interpretive anthropology, is a science whose progress is marked less by a perfection of consensus than by a refinement of debate. What gets better is the precision with which we vex each other.

(1973: 29)

Ultimately, social researchers should look both to their colleagues and to their subjects as sources of agreement on the most useful meanings and measurements of the concepts they study. Sometimes one source will be more useful, sometimes the other. But neither one should be dismissed.

Tension between Reliability and Validity

Clearly, we want our measures to be both reliable and valid. However, a tension often arises between the criteria of reliability and validity, forcing a trade-off between the two.

Recall the example of measuring morale in different factories. The strategy of immersing yourself in the day-to-day routine of the assembly line, observing what goes on, and talking to the workers would seem to provide a more valid measure of morale than counting grievances would. It just seems obvious that we’d get a clearer sense of whether the morale was high or low using this first method.

As I pointed out earlier, however, the counting strategy would be more reliable. This situation reflects a more general strain in research measurement. Most of the really interesting concepts we want to study have many subtle nuances, so specifying precisely what we mean by them is hard. Researchers sometimes speak of such concepts as having a “richness of meaning.” Although scores of books and articles have been written on the topic of anomie/anomia, for example, they still haven’t exhausted its meaning.

Very often, then, specifying reliable operational definitions and measurements seems to rob concepts of their richness of meaning. Positive morale is much more than a lack of grievances filed with the union; anomia is much more than what is measured by the five items created by Leo Srole. Yet, the more variation and richness we allow for a concept, the more opportunity there is for disagreement on how it applies to a particular situation, thus reducing reliability.

To some extent, this dilemma explains the persistence of two quite different approaches to social research: quantitative, nomothetic, structured techniques such as surveys and experiments on the one hand, and qualitative, idiographic methods such as field research and historical studies on the other. In the simplest generalization, the former methods tend to be more reliable, the latter more valid.

By being forewarned, you’ll be effectively forearmed against this persistent and inevitable dilemma. If there is no clear agreement on how to measure a concept, measure it several different ways. If the concept has several dimensions, measure them all. Above all, know that the concept does not have any meaning other than what you and I give it. The only justification for giving any concept a particular meaning is utility. Measure concepts in ways that help us understand the world around us.

The Ethics of Measurement

Measurement decisions can sometimes be judged by ethical standards. We have seen that most of the concepts of interest to social researchers are open to varied meanings. Suppose, for example, that you are interested in sampling public opinion on the abortion issue in the United States. Notice the difference it would make if you conceptualized one side of the debate as “pro-choice” or as “pro-abortion.” If your personal bias made you want to minimize support for having an abortion, you might be tempted to frame the concept and the measurements based on it in terms of people being “pro-abortion,” thereby eliminating all those who were not especially fond of abortion per se but felt a woman should have the right to make that choice for herself. To pursue this strategy, however, would violate accepted research ethics.

Consider the choices available to you in conceptualizing attitudes toward the U.S. invasion of Iraq in 2003. Imagine the different levels of support you would “discover” if you framed the position as an unprovoked invasion of a sovereign nation, as a retaliation for the September 11, 2001, attack on the World Trade Towers (many Americans still believe Saddam Hussein masterminded that attack), as a defensive act against a perceived threat, as part of a global war on terrorism, or in any of the other ways this event has been portrayed. There is no one, correct way to conceptualize this issue, but it would be unethical to seek to slant the results through a biased definition of the issue.

Main Points

Introduction

• The interrelated processes of conceptualization, operationalization, and measurement allow researchers to move from a general idea about what they want to study to effective and well-defined measurements in the real world.

Measuring Anything That Exists

• Conceptions are mental images we use as summary devices for bringing together observations and experiences that seem to have something in common. We use terms or labels to reference these conceptions.

• Concepts are constructs; they represent the agreed-on meanings we assign to terms. Our concepts don’t exist in the real world, so they can’t be measured directly, but we can measure the things that our concepts summarize.

Conceptualization

• Conceptualization is the process of specifying observations and measurements that give concepts definite meaning for the purposes of a research study.

• Conceptualization includes specifying the indicators of a concept and describing its dimensions. Operational definitions specify how variables relevant to a concept will be measured.

Definitions in Descriptive and Explanatory Studies

• Precise definitions are even more important in descriptive than in explanatory studies. The degree of precision needed varies with the type and purpose of a study.

Operationalization Choices

• Operationalization is an extension of conceptualization that specifies the exact procedures that will be used to measure the attributes of variables.

• Operationalization involves a series of interrelated choices: specifying the range of variation that is appropriate for the purposes of a study, determining how precisely to measure variables, accounting for relevant dimensions of variables, clearly defining the attributes of variables and their relationships, and deciding on an appropriate level of measurement.

• Researchers must choose from four levels of measurement, which capture increasing amounts of information: nominal, ordinal, interval, and ratio. The most appropriate level depends on the purpose of the measurement.

• A given variable can sometimes be measured at different levels. When in doubt, researchers should use the highest level of measurement appropriate to that variable so they can capture the greatest amount of information.

• Operationalization begins in the design phase of a study and continues through all phases of the research project, including the analysis of data.

Criteria of Measurement Quality

• Criteria of the quality of measures include precision, accuracy, reliability, and validity.

• Whereas reliability means getting consistent results from the same measure, validity refers to getting results that accurately reflect the concept being measured.

• Researchers can test or improve the reliability of measures through the test-retest method, the split-half method, the use of established measures, and the examination of work performed by research workers.

• The yardsticks for assessing a measure’s validity include face validity, criterion-related validity, construct validity, and content validity.

• Creating specific, reliable measures often seems to diminish the richness of meaning our general concepts have. This problem is inevitable. The best solution is to use several different measures, tapping the different aspects of a concept.

The Ethics of Measurement

• Conceptualization and measurement must never be guided by bias or preferences for particular research outcomes.

Key Terms

The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

conceptualization
construct validity
content validity
criterion-related validity
dimension
face validity
indicator
interval measure
nominal measure
ordinal measure
ratio measure
reliability
specification
validity

Proposing Social Research: Measurement

This chapter has taken us deeper into the matter of measurement. In previous exercises, you’ve identified the concepts and variables you want to address in your research project. Now you’ll need to get more specific in terms of conceptualization and operationalization. You should conclude this portion of the proposal with a description of how, precisely, you will make distinctions regarding your variables. If you want to compare liberals and conservatives, for example, how exactly will you identify subjects’ political orientations?

The ease or difficulty of this exercise may vary with the type of data collection you’re planning. It will probably be easier to accomplish in the case of quantitative studies, such as surveys, where you can report the questionnaire items you’ll use for measurements. In qualitative research, however, you’ll have more opportunities to modify the ways variables are measured as the study unfolds, taking advantage of insights gained “in the trenches.” Even so, you’ll still need to begin with some clear ideas about how you’ll begin your measurements.

Criteria such as precision, accuracy, validity, and reliability matter greatly in all kinds of social research projects.

Review Questions and Exercises

1. Pick a social science concept such as liberalism or alienation, then specify that concept so that it could be studied in a research project. Be sure to specify the indicators you’ll use as well as the dimensions you wish to include in and exclude from your conceptualization.

2. What level of measurement—nominal, ordinal, interval, or ratio—describes each of the following variables?
   a. Race (white, African American, Asian, and so on)
   b. Order of finish in a race (first, second, third, and so on)
   c. Number of children in families
   d. Populations of nations
   e. Attitudes toward nuclear energy (strongly approve, approve, disapprove, strongly disapprove)
   f. Region of birth (Northeast, Midwest, and so on)
   g. Political orientation (very liberal, somewhat liberal, somewhat conservative, very conservative)

3. To conceptualize the variable prejudice, use your favorite web browser to search for this term. After reviewing several of the websites resulting from your search, make a list of some different forms of prejudice that might be studied in an omnibus project dealing with that topic.

4. In a good dictionary, look up truth and true, then copy out the definitions. Note the key terms used in those definitions (such as reality), look up the definitions of those terms, and copy out these definitions as well. Continue this process until no new terms appear. Comment on what you’ve learned from this exercise. Did you discover “truth”?

SPSS Exercises

See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you’ll also find a detailed primer on using SPSS.

Online Study Resources

Access the resources your instructor has assigned. For this book, you can access:

CourseMate for The Practice of Social Research

Login to CengageBrain.com to access chapter-specific learning tools including Learning Objectives, Practice Quizzes, Videos, Internet Exercises, Flash Cards, Glossaries, Web Links, and more from your Sociology CourseMate.

If your professor has assigned Aplia homework:

1. Sign into your account.
2. After you complete each page of questions, click “Grade It Now” to see detailed explanations of every answer.
3. Click “Try Another Version” for an opportunity to improve your score.

Visit www.cengagebrain.com to access your account and purchase materials.

CHAPTER 7

Typologies, Indexes, and Scales

Chapter Overview

Researchers often need to employ multiple indicators to measure a variable adequately and validly. Indexes, scales, and typologies are useful composite measures made up of several indicators of variables.

Introduction
Indexes versus Scales
Index Construction
  Item Selection
  Examination of Empirical Relationships
  Index Scoring
  Handling Missing Data
  Index Validation
  The Status of Women: An Illustration of Index Construction
Scale Construction
  Bogardus Social Distance Scale
  Thurstone Scales
  Likert Scaling
  Semantic Differential
  Guttman Scaling
Typologies

Aplia for The Practice of Social Research
After reading, go to “Online Study Resources” at the end of this chapter for

Introduction

As we saw in Chapter 6, many social science concepts have complex and varied meanings. Making measurements that capture such concepts can be a challenge. Recall our discussion of content validity, which concerns whether we have captured all the different dimensions of a concept.

To achieve broad coverage of the various dimensions of a concept, we usually need to make multiple observations pertaining to that concept. Thus, for example, Bruce Berg (1989: 21) advises in-depth interviewers to prepare essential questions, which are “geared toward eliciting specific, desired information.” In addition, the researcher should prepare extra questions: “questions roughly equivalent to certain essential ones, but worded slightly differently.”

Multiple indicators are used with quantitative data as well. Suppose you’re designing a survey. Although you can sometimes construct a single questionnaire item that captures the variable of interest—“Sex: [ ] Male [ ] Female” is a simple example—other variables are less straightforward and may require you to use several questionnaire items to measure them adequately.

Quantitative data analysts have developed specific techniques for combining indicators into a single measure. This chapter discusses the construction of two types of composite measures of variables—indexes and scales. Although these measures can be used in any form of social research, they are most common in survey research and other quantitative methods. A short section at the end of this chapter considers typologies, which are relevant to both qualitative and quantitative research.

Composite measures are frequently used in quantitative research, for several reasons. First, social scientists often wish to study variables that have no clear and unambiguous single indicators. Single indicators do suffice for some variables, such as age. We can determine a survey respondent’s age by simply asking, “How old are you?” Similarly, we can determine a newspaper’s circulation by merely looking at the figure the newspaper reports. In the case of complex concepts, however, researchers can seldom develop single indicators before they actually do the research. This is especially true with regard to attitudes and orientations. Rarely can a survey researcher, for example, devise single questionnaire items that adequately tap respondents’ degrees of prejudice, religiosity, political orientation, alienation, and the like. More likely, the researcher will devise several items, each of which provides some indication of the variables. Taken individually, each of these items is likely to prove invalid or unreliable for many respondents. A composite measure, however, can overcome this problem.

Second, researchers may wish to employ a rather refined ordinal measure of a particular variable (alienation, say), arranging cases in several ordinal categories from very low to very high, for example. A single data item might not have enough categories to provide the desired range of variation. However, an index or scale formed from several items can provide the needed range.

Finally, indexes and scales are efficient devices for data analysis. If considering a single data item gives us only a rough indication of a given variable, considering several data items can give us a more comprehensive and more accurate indication. For example, a single newspaper editorial may give us some indication of the political orientations of that newspaper. Examining several editorials would probably give us a better assessment, but the manipulation of several data items simultaneously could be very complicated. Indexes and scales (especially scales) are efficient data-reduction devices: They allow us to summarize several indicators in a single numerical score, while sometimes nearly maintaining the specific details of all the individual indicators.

Indexes versus Scales

The terms index and scale are typically used imprecisely and interchangeably in social research literature. The two types of measures do have some characteristics in common, but in this book we’ll distinguish between the two. However, you should be warned of a growing tendency in the literature to use the term scale to refer to both indexes and scales, as they are distinguished here.

First, let’s consider what they have in common. Both scales and indexes are ordinal measures of variables. Both rank-order the units of analysis in terms of specific variables such as religiosity, alienation, socioeconomic status, prejudice, or intellectual sophistication. A person’s score on either a scale or an index of religiosity, for example, gives an indication of his or her relative religiosity vis-à-vis other people.

Further, both scales and indexes are composite measures of variables—that is, measurements based on more than one data item. Thus, a survey respondent’s score on an index or scale of religiosity is determined by the responses given to several questionnaire items, each of which provides some indication of religiosity. Similarly, a person’s IQ score is based on answers to a large number of test questions. The political orientation of a newspaper might be represented by an index or scale score reflecting the newspaper’s editorial policy on various political issues.

Despite these shared characteristics, it’s useful to distinguish between indexes and scales. In this book, we’ll distinguish them by the way scores are assigned in each. We construct an index simply by accumulating scores assigned to individual attributes. We might measure prejudice, for example, by adding up the number of prejudiced statements each respondent agreed with. We construct a scale, however, by assigning scores to patterns of responses, recognizing that some items reflect a relatively weak degree of the variable while others reflect something stronger. For example, agreeing that “Women are different from men” is, at best, weak evidence of sexism compared with agreeing that “Women should not be allowed to vote.” A scale takes advantage of differences in intensity among the attributes of the same variable to identify distinct patterns of response.

Let’s consider this simple example of sexism a bit further. Imagine asking people to agree or disagree with the two statements just presented. Some might agree with both, some might disagree with both. But suppose I told you someone agreed with one and disagreed with the other: Could you guess which statement they agreed with and which they did not? I’d guess the person in question agreed that women were different but disagreed that they should be prohibited from voting. On the other hand, I doubt that anyone would want to prohibit women from voting, while asserting that there is no difference between men and women. That would make no sense.

Now consider this. The two responses we wanted from each person would technically yield four response patterns: agree/agree, agree/disagree, disagree/agree, and disagree/disagree. We’ve just seen, however, that only three of the four patterns make any sense or are likely to occur. Where indexes score people based on their responses, scales score people on the basis of response patterns: We determine what the logical response patterns are and score people in terms of the pattern their responses most closely resemble.

Figure 7-1 provides a graphic illustration of the difference between indexes and scales. Let’s assume we want to develop a measure of political activism, distinguishing those people who are very active in political affairs, those who don’t participate much at all, and those who are somewhere in between.

The first part of Figure 7-1 illustrates the logic of indexes. The figure shows six different political actions. Although you and I might disagree on some specifics, I think we could agree that the six actions represent roughly the same degree of political activism.

Using these six items, we could construct an index of political activism by giving each person 1 point for each of the actions he or she has taken.

index  A type of composite measure that summarizes and rank-orders several specific observations and represents some more-general dimension.

scale  A type of composite measure composed of several items that have a logical or empirical structure among them. Examples of scales include Bogardus social distance, Guttman, Likert, and Thurstone scales.
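The difference between accumulating scores and scoring response patterns can be sketched in a few lines of code. This is only an illustration under assumptions of my own: the four item names are hypothetical and are taken to run from the weakest to the strongest indicator, the "logical" patterns are assumed to be cumulative (anyone endorsing a stronger item also endorses the weaker ones), closeness to a pattern is counted as the number of mismatched items, and ties go to the lower score. None of these choices is prescribed by the discussion above.

```python
# Hypothetical items, ordered from weakest to strongest indicator.
ITEMS = ["voted", "gave_money", "worked_on_campaign", "ran_for_office"]


def index_score(responses):
    """Index logic: 1 point per item endorsed, no matter which items."""
    return sum(1 for item in ITEMS if responses.get(item, False))


def ideal_patterns():
    """Scale logic: the logical patterns are assumed to be cumulative.

    Pattern k endorses the k weakest items and none of the stronger ones.
    """
    n = len(ITEMS)
    return [tuple(i < k for i in range(n)) for k in range(n + 1)]


def scale_score(responses):
    """Score a person by the ideal pattern their responses most resemble.

    Resemblance is measured by the number of mismatched items; when two
    patterns are equally close, the lower score is used (an assumption).
    """
    observed = tuple(bool(responses.get(item, False)) for item in ITEMS)
    distances = [
        sum(o != p for o, p in zip(observed, pattern))
        for pattern in ideal_patterns()
    ]
    return distances.index(min(distances))


# Two made-up respondents who each endorse exactly two items.
person_a = {"voted": True, "gave_money": True}
person_b = {"voted": True, "worked_on_campaign": True}

for name, person in [("A", person_a), ("B", person_b)]:
    print(name, "index:", index_score(person), "scale:", scale_score(person))
```

Both respondents earn the same index score because the index only counts. The scale treats them differently because only the first response set matches one of the logical patterns exactly.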

FIGURE 7-1 Indexes versus Scales. Both indexes and scales seek to measure variables such as political activism. Whereas indexes count the number of indicators of the variable, scales take account of the differing intensities of those indicators.

If you wrote to a public official and signed a petition, you’d get a total of 2 points. If I gave money to a candidate and persuaded someone to change her or his vote, I’d get the same score as you. Using this approach, we’d conclude that you and I had the same degree of political activism, even though we had taken different actions.

The second part of Figure 7-1 describes the logic of scale construction. In this case, the actions clearly represent different degrees of political activism, ranging from simply voting to running for office. Moreover, it seems safe to assume a pattern of actions in this case. For example, all those who contributed money probably also voted. Those who worked on a campaign probably also gave some money and voted. This suggests that most people will fall into only one of five idealized action patterns, represented by the illustrations at the bottom of the figure. The discussion of scales, later in this chapter, describes ways of identifying people with the type they most closely represent.

As you might surmise, scales are generally superior to indexes, because scales take into consideration the intensity with which different items reflect the variable being measured. Also, as the example in Figure 7-1 shows, scale scores convey more information than index scores do. Again, be aware that the term scale is commonly misused to refer to measures that are only indexes. Merely calling a measure a scale instead of an index doesn’t make it better.

There are two other misconceptions about scaling that you should know about. First, whether the combination of several data items results in a scale almost always depends on the particular sample of observations under study. Certain items may form a scale within one sample but not within another. For this reason, do not assume that a given set of items is a scale simply because it has turned out that way in an earlier study.

Second, the use of specific scaling techniques—such as Guttman scaling, to be discussed—does not ensure the creation of a scale. Rather, such techniques let us determine whether or not a set of items constitutes a scale.

An examination of actual social science research reports will show that researchers use indexes much more frequently than they do scales. Ironically, however, the methodological literature contains little if any discussion of index construction, whereas discussions of scale construction abound. There appear to be two reasons for this disparity. First, indexes are more frequently used because scales are often difficult or impossible to construct from the data at hand. Second, methods of index construction seem so obvious and straightforward that they aren’t discussed much.

Constructing indexes is not a simple undertaking, however. The general failure to develop index-construction techniques has resulted in many bad indexes in social research. With this in mind, I’ve devoted over half of this chapter to the methods of index construction. With a solid understanding of the logic of this activity, you’ll be better equipped to try constructing both indexes and scales.

Index Construction

Let’s look now at four main steps in the construction of an index: selecting possible items, examining their empirical relationships, scoring the index, and validating it. We’ll conclude this discussion by examining the construction of an index that provided interesting findings about the status of women in different countries.

Item Selection

The first step in creating an index is selecting items for a composite index, which is created to measure some variable.

Face Validity

The first criterion for selecting items to be included in an index is face validity (or logical validity). If you want to measure political conservatism, for example, each of your items should appear on its face to indicate conservatism (or its opposite, liberalism). Political party affiliation would be one such item. Another would be an item asking people to approve or disapprove of the views of a well-known conservative public figure. In constructing an index of religiosity, you might consider items such as attendance at religious services, acceptance of certain religious beliefs, and frequency of prayer; each of these appears to offer some indication of religiosity.

Unidimensionality

The methodological literature on conceptualization and measurement stresses the need for unidimensionality in scale and index construction. That is, a composite measure should represent only one dimension of a concept. Thus, items reflecting religious fundamentalism should not be included in a measure of political conservatism, even though the two variables might be empirically related to each other.

General or Specific

Although measures should tap the same dimension, the general dimension you’re attempting to measure may have many nuances. In the example of religiosity, the indicators mentioned previously—ritual participation, belief, and so on—represent different types of religiosity. If you want to focus on ritual participation in religion, you should choose items specifically indicating this type of religiosity: attendance at religious services and other rituals such as confession, bar mitzvah, bowing toward Mecca, and the like. If you want to measure religiosity in a more general way, you should include a balanced set of items, representing each of the

different types of religiosity. Ultimately, the nature of the items you include will determine how specifically or generally the variable is measured.

Variance

In selecting items for an index, you must also be concerned with the amount of variance they provide. If an item is intended to indicate political conservatism, for example, you should note what proportion of respondents would be identified as conservatives by that item. If a given item identified no one as a conservative or everyone as a conservative (for example, if nobody indicated approval of a radical-right political figure), that item would not be very useful in the construction of an index.

To guarantee variance, you have two options. First, you may select several items the responses to which divide people about equally in terms of the variable, for example, about half conservative and half liberal. Although no single response would justify the characterization of a person as very conservative, a person who responded as a conservative on all items might be so characterized.

The second option is to select items differing in variance. One item might identify about half of the subjects as conservative, while another might identify few of the respondents as conservative. Note that this second option is necessary for scaling, and it is reasonable for index construction as well.

Examination of Empirical Relationships

The second step in index construction is to examine the empirical relationships among the items being considered for inclusion. (See Chapter 14 for more.) An empirical relationship is established when respondents’ answers to one question (in a questionnaire, for example) help us predict how they’ll answer other questions. If two items are empirically related to each other, we can reasonably argue that each reflects the same variable, and we may include them both in the same index. There are two types of possible relationships among items: bivariate and multivariate.

Bivariate Relationships

A bivariate relationship is, simply put, a relationship between two variables. Suppose we want to measure respondents’ support for U.S. participation in the United Nations. One indicator of different levels of support might be the question “Do you feel the U.S. financial support of the UN is [ ] Too high [ ] About right [ ] Too low?”

A second indicator of support for the United Nations might be the question “Should the United States contribute military personnel to UN peacekeeping actions? [ ] Strongly approve [ ] Mostly approve [ ] Mostly disapprove [ ] Strongly disapprove.”

Both of these questions, on their face, seem to reflect different degrees of support for the United Nations. Nonetheless, some people might feel the United States should give more money but not provide troops. Others might favor sending troops but cutting back on financial support.

If the two items both reflect degrees of the same thing, however, we should expect responses to the two items to correspond with each other. Specifically, those who approve of military support should be more likely to favor financial support than those who disapprove of military support would. Conversely, those who favor financial support should be more likely to favor military support than those disapproving of financial support would. If these expectations are met, we say there is a bivariate relationship between the two items.

Here’s another example. Suppose we want to determine the degree to which respondents feel women have the right to an abortion. We might ask (1) “Do you feel a woman should have the right to an abortion when her pregnancy was the result of rape?” and (2) “Do you feel a woman should have the right to an abortion if continuing her pregnancy would seriously threaten her life?”

Granted, some respondents might agree with item (1) and disagree with item (2); others will do just the reverse. However, if both items tap into some general opinion people have about the issue of abortion, then the responses to these two items

should be related to each other. Those who support the right to an abortion in the case of rape should be more likely to support it if the woman’s life is threatened than those who disapproved of abortion in the case of rape would. This would be another example of a bivariate relationship.

You should examine all the possible bivariate relationships among the several items being considered for inclusion in an index, in order to determine the relative strengths of relationships among the several pairs of items. Percentage tables, correlation coefficients (see Chapter 16), or both may be used for this purpose. How we evaluate the strength of the relationships, however, can be rather subtle. The Tips and Tools feature “‘Cause’ and ‘Effect’ Indicators” examines some of these subtleties.

Tips and Tools

“Cause” and “Effect” Indicators

Kenneth Bollen
Department of Sociology, University of North Carolina, Chapel Hill

While it often makes sense to expect indicators of the same variable to be positively related to one another, as discussed in the text, this is not always the case.

Indicators should be related to one another if they are essentially “effects” of a variable. For example, to measure self-esteem, we might ask a person to indicate whether he or she agrees or disagrees with the statements (1) “I am a good person” and (2) “I am happy with myself.” A person with high self-esteem should agree with both statements while one with low self-esteem would probably disagree with both. Since each indicator depends on or “reflects” self-esteem, we expect them to be positively correlated. More generally, indicators that depend on the same variable should be associated with one another if they are valid measures.

But, this is not the case when the indicators are the “cause” rather than the “effect” of a variable. In this situation the indicators may correlate positively, negatively, or not at all. For example, we could use sex and race as indicators of the variable exposure to discrimination. Being nonwhite or female increases the likelihood of experiencing discrimination, so both are good indicators of the variable. But we would not expect the race and sex of individuals to be strongly associated.

Or, we may measure social interaction with three indicators: time spent with friends, time spent with family, and time spent with coworkers. Though each indicator is valid, they need not be positively correlated. Time spent with friends, for instance, may be inversely related to time spent with family. Here, the three indicators “cause” the degree of social interaction.

As a final example, exposure to stress may be measured by whether a person recently experienced divorce, death of a spouse, or loss of a job. Though any of these events may indicate stress, they need not correlate with one another.

In short, we expect an association between indicators that depend on or “reflect” a variable, that is, if they are the “effects” of the variable. But if the variable depends on the indicators (if the indicators are the “causes”), those indicators may be either positively or negatively correlated, or even unrelated. Therefore, we should decide whether indicators are causes or effects of a variable before using their intercorrelations to assess their validity.

Be wary of items that are not related to one another empirically: It’s unlikely that they measure the same variable. You should probably drop any item that is not related to several other items.

At the same time, a very strong relationship between two items presents a different problem. If two items are perfectly related to each other, then only one needs to be included in the index; because it completely conveys the indications provided by the other, nothing more would be added by including the other item. (This problem will become even clearer in the next section.)

Here’s an example to illustrate the testing of bivariate relationships in index construction. I once conducted a survey of medical school faculty members to find out about the consequences of a “scientific perspective” on the quality of patient care provided by physicians. The primary intent was to determine whether scientifically inclined doctors treated patients more impersonally than other doctors did.

The survey questionnaire offered several possible indicators of respondents’ scientific perspectives. Of those, three items appeared to provide

especially clear indications of whether the doctors were scientifically oriented:

1. As a medical school faculty member, in what capacity do you feel you can make your greatest teaching contribution: as a practicing physician or as a medical researcher?

2. As you continue to advance your own medical knowledge, would you say your ultimate medical interests lie primarily in the direction of total patient management or the understanding of basic mechanisms? [The purpose of this item was to distinguish those who were mostly interested in overall patient care from those mostly interested in biological processes.]

3. In the field of therapeutic research, are you generally more interested in articles reporting evaluations of the effectiveness of various treatments or articles exploring the basic rationale underlying the treatments? [Similarly, I wanted to distinguish those more interested in articles dealing with patient care from those more interested in biological processes.]

(Babbie 1970: 27–31)

For each of these items, we might conclude that those respondents who chose the second answer are more scientifically oriented than respondents who chose the first answer. Though this comparative conclusion is reasonable, we should not be misled into thinking that respondents who chose the second answer to a given item are scientists in any absolute sense. They are simply more scientifically oriented than those who chose the first answer to the item.

To see this point more clearly, let’s examine the distribution of responses to each item. From the first item (greatest teaching contribution), only about one-third of the respondents appeared scientifically oriented. That is, approximately one-third said they could make their greatest teaching contribution as medical researchers. In response to the second item (ultimate medical interests), approximately two-thirds chose the scientific answer, saying they were more interested in learning about basic mechanisms than learning about total patient management. In response to the third item (reading preferences), about 80 percent chose the scientific answer.

These three questionnaire items can’t tell us how many “scientists” there are in the sample, for none of them is related to a set of criteria for what constitutes being a scientist in any absolute sense. Using the items for this purpose would present us with the problem of three quite different estimates of how many scientists there were in the sample.

However, these items do provide us with three independent indicators of respondents’ relative inclinations toward science in medicine. Each item separates respondents into the more scientific and the less scientific. But each grouping of more or less scientific respondents will have a somewhat different membership from the others. Respondents who seem scientific in terms of one item will not seem scientific in terms of another. Nevertheless, to the extent that each item measures the same general dimension, we should find some correspondence among the several groupings. Respondents who appear scientific in terms of one item should be more likely to appear scientific in their response to another item than would those who appeared nonscientific in their response to the first. In other words, we should find an association or correlation between the responses given to two items.

Figure 7-2 shows the associations among the responses to the three items. Three bivariate tables are presented, showing the distribution of responses for each possible pairing of items. An examination of the three bivariate relationships presented in the figure supports the suggestion that the three items all measure the same variable: scientific orientation. To see why this is so, let’s begin by looking at the first bivariate relationship in the table. The table shows that faculty who responded that “researcher” was the role in which they could make their greatest teaching contribution were more likely to identify their ultimate medical interests as “basic mechanisms” (87 percent) than were those who answered “physician” (51 percent). The fact that the “physicians” are about evenly split in their ultimate medical interests is irrelevant for our purposes. It is only relevant that they are less

scientific in their medical interests than the “researchers.” The strength of this relationship may be summarized as a 36 percentage point difference.

FIGURE 7-2 Bivariate Relationships among Scientific Orientation Items. If several indicators are measures of the same variable, then they should be empirically correlated with one another, as you can observe in this case. Those who choose the scientific orientation on one item are more likely to choose the scientific orientation on other items.

The same general conclusion applies to the other bivariate relationships. The strength of the relationship between reading preferences and ultimate medical interests may be summarized as a 38 percentage point difference, and the strength of the relationship between reading preferences and greatest teaching contribution as a 21 percentage point difference. In summary, then, each single item produces a different grouping of “scientific” and “nonscientific” respondents. However, the responses given to each of the items correspond, to a greater or lesser degree, to the responses given to each of the other items.

Initially, the three items were selected on the basis of face validity: each appeared to give some indication of faculty members’ orientations to science. By examining the bivariate relationship between the pairs of items, we have found support for the expectation that they all measure basically the same thing. However, that support does not sufficiently justify including the items in a composite index. Before combining them in a single index, we need to examine the multivariate relationships among the several variables.

Multivariate Relationships among Items

Figure 7-3 categorizes the sample respondents into four groups according to (1) their greatest teaching contribution and (2) their reading preferences. The numbers in parentheses indicate the number of respondents in each group. Thus, 66 of the faculty members who said they could best teach as physicians also said they preferred articles dealing with the effectiveness of treatments. For each of the four groups, the figure presents the percentage of those who say they are ultimately more interested in basic mechanisms. So, for example, of the 66 faculty mentioned, 27 percent are primarily interested in basic mechanisms.

The arrangement of the four groups is based on a previously drawn conclusion regarding scientific orientations. The group in the upper left corner of the table is presumably the least scientifically oriented, based on greatest teaching contribution and reading preferences. The group in the lower right corner is presumably the most scientifically oriented in terms of those items.

Recall that expressing a primary interest in basic mechanisms was also taken as an indication of scientific orientation. As we should expect, then, those in the lower right corner are the most likely to give this response (89 percent), and those in the

upper left corner are the least likely (27 percent). The respondents who gave mixed responses in terms of teaching contributions and reading preferences have an intermediate rank in their concern for basic mechanisms (58 percent in both cases).

FIGURE 7-3 Trivariate Relationships among Scientific Orientation Items. Indicators of the same variable should be correlated in a multivariate analysis as well as in bivariate analyses. Those who choose the scientific responses on greatest teaching contribution and reading preferences are the most likely to choose the scientific response on the third item.

This table tells us many things. First, we may note that the original relationships between pairs of items are not significantly affected by the presence of a third item. Recall, for example, that the relationship between teaching contribution and ultimate medical interest was summarized as a 36 percentage point difference. Looking at Figure 7-3, we see that among only those respondents who are most interested in articles dealing with the effectiveness of treatments, the relationship between teaching contribution and ultimate medical interest is 31 percentage points (58 percent minus 27 percent: first row). The same is true among those most interested in articles dealing with the rationale for treatments (89 percent minus 58 percent: second row). The original relationship between teaching contribution and ultimate medical interest is essentially the same as in Figure 7-2, even among those respondents judged as scientific or nonscientific in terms of reading preferences.

We can draw the same conclusion from the columns in Figure 7-3. Recall that the original relationship between reading preferences and ultimate medical interest was summarized as a 38 percentage point difference. Looking only at the “physicians” in Figure 7-3, we see that the relationship between the other two items is now 31 percentage points. The same relationship is found among the “researchers” in the second column.

The importance of these observations becomes clearer when we consider what might have happened. In Figure 7-4, hypothetical data tell a much different story than the actual data in Figure 7-3 do. As you can see, Figure 7-4 shows that the original relationship between teaching role and ultimate medical interest persists, even when reading preferences are introduced into the picture. In each row of the table, the “researchers” are more likely to express an interest in basic mechanisms than the “physicians” are. Looking down the columns, however, we note that there is no relationship between reading preferences and ultimate medical interest. If we know whether a respondent feels he or she can best teach as a physician or as a researcher, knowing the respondent’s reading preference adds nothing to our evaluation of his or her scientific orientation. If something like Figure 7-4 resulted from the actual data, we would conclude that reading preference should not be included in the same index as teaching role, because it contributed nothing to the composite index.

FIGURE 7-4 Hypothetical Trivariate Relationship among Scientific Orientation Items. This hypothetical relationship suggests that not all three indicators would contribute effectively to a composite index.

This example used only three questionnaire items. If more were being considered, then more-complex multivariate tables would be in order, constructed of four, five, or more variables. The purpose of this step in index construction, again,

is to discover the simultaneous interaction of the items in order to determine which should be included in the same index. These kinds of data analyses are easily accomplished using programs such as SPSS and MicroCase. They are usually referred to as cross-tabulations.

Index Scoring

When you’ve chosen the best items for your index, you next assign scores for particular responses, thereby creating a single composite measure out of the several items. There are two basic decisions to be made in this step.

First, you must decide the desirable range of the index scores. A primary advantage of an index over a single item is the range of gradations it offers in the measurement of a variable. As noted earlier, political conservatism might be measured from “very conservative” to “not at all conservative” or “very liberal.” How far to the extremes, then, should the index extend?

In this decision, the question of variance enters once more. Almost always, as the possible extremes of an index are extended, fewer cases are to be found at each end. The researcher who wishes to measure political conservatism to its greatest extreme (somewhere to the right of Attila the Hun, as the saying goes) may find there is almost no one in that category. At some point, additional gradations do not add meaning to the results.

The first decision, then, concerns the conflicting desire for (1) a range of measurement in the index and (2) an adequate number of cases at each point in the index. You’ll be forced to reach some kind of compromise between these conflicting desires.

The second decision concerns the actual assignment of scores for each particular response. Basically you must decide whether to give items in the index equal weight or different weights. Although there are no firm rules, I suggest (and practice tends to support this method) that items be weighted equally unless there are compelling reasons for differential weighting. That is, the burden of proof should be on differential weighting; equal weighting should be the norm.

Of course, this decision must be related to the earlier issue regarding the balance of items chosen. If the index is to represent the composite of slightly different aspects of a given variable, then you should give each aspect the same weight. In some instances, however, you may feel that two items reflect essentially the same aspect, and the third reflects a different aspect. If you want to have both aspects equally represented by the index, you might give the different item a weight equal to the combination of the two similar ones. For instance, you could assign a maximum score of 2 to the different item and a maximum score of 1 to each of the similar ones.

Although the rationale for scoring responses should take such concerns as these into account, typically researchers experiment with different scoring methods, examining the relative weights given to different aspects but at the same time worrying about the range and distribution of cases provided. Ultimately, the scoring method chosen will represent a compromise among these several demands. Of course, as in most research activities, such a decision is open to revision on the basis of later examinations. Validation of the index, to be discussed shortly, may lead the researcher to recycle his or her efforts by constructing a completely different index.

In the example taken from the medical school faculty survey, I decided to weight the items equally, since I’d chosen them, in part, because they represent slightly different aspects of the overall variable scientific orientation. On each of the items, the respondents were given a score of 1 for choosing the “scientific” response to the item and a score of 0 for choosing the “nonscientific” response. Each respondent, then, could receive a score of 0, 1, 2, or 3. This scoring method provided what I considered a useful range of variation (four index categories) and also provided enough cases for analysis in each category.

Here’s a similar example of index scoring, from a study of work satisfaction. One of the key variables was job-related depression, measured by an index composed of the following four items, which asked workers how they felt when thinking about themselves and their jobs:

• “I feel downhearted and blue.”
• “I get tired for no reason.”
• “I find myself restless and can’t keep still.”
• “I am more irritable than usual.”

The researchers, Amy Wharton and James Baron, report, “Each of these items was coded: 4 = often, 3 = sometimes, 2 = rarely, 1 = never.” They go on to explain how they measured another variable, job-related self-esteem:

Job-related self-esteem was based on four items asking respondents how they saw themselves in their work: happy/sad; successful/not successful; important/not important; doing their best/not doing their best. Each item ranged from 1 to 7, where 1 indicates a self-perception of not being happy, successful, important, or doing one’s best.

(1987: 578)

As you look through the social research literature, you’ll find numerous similar examples of cumulative indexes being used to measure variables.

Although it is often appropriate to examine the relationships among indicators of a variable being measured by an index or scale, you should realize that the indicators are sometimes independent of one another. For example, Stacy De Coster notes that the indicators of family stress may be independent of one another, though they contribute to the same variable.

Family Stress is a scale of stressful events within the family. The experience of any one of these events (parent job loss, parent separation, parent illness) is independent of the other events. Indeed, prior research on events utilized in stress scales has demonstrated that the events in these scales typically are independent of one another and reliabilities on the scales low.

(2005: 176)

If the indicators of a variable are logically related to one another, on the other hand, it is important to use that relationship as a criterion for determining which are the better indicators.

Handling Missing Data

Regardless of your data-collection method, you’ll frequently face the problem of missing data. In a content analysis of the political orientations of blogs, for example, you may discover that a particular blog has never taken an editorial position on one of the issues being studied. In an experimental design involving several retests of subjects over time, some subjects may be unable to participate in some of the sessions. In virtually every survey, some respondents fail to answer some questions (or choose a “don’t know” response). Although missing data present problems at all stages of analysis, they’re especially troublesome in index construction. There are, however, several methods of dealing with these problems.

First, if there are relatively few cases with missing data, you may decide to exclude them from the construction of the index and the analysis. (I did this in the medical school faculty example.) The primary concerns in this instance are whether the numbers available for analysis will remain sufficient and whether the exclusion will result in an unrepresentative sample whenever the index, excluding some of the respondents, is used in the analysis. The latter possibility can be examined through a comparison, on other relevant variables, of those who would be included in and excluded from the index.

Second, you may sometimes have grounds for treating missing data as one of the available responses. For example, if a questionnaire has asked respondents to indicate their participation in various activities by checking “yes” or “no” for each, many respondents may have checked some of the activities “yes” and left the remainder blank. In such a case, you might decide that a failure to answer meant “no,” and score missing data in this case as though the respondents had checked the “no” space.

Third, a careful analysis of missing data may yield an interpretation of their meaning. In constructing a measure of political conservatism, for example, you may discover that respondents who failed to answer a given question were generally as conservative on other items as those who gave

Index Construction ■ 209

the conservative answer were. In another example, a recent study measuring religious beliefs found that people who answered “don’t know” about a given belief were almost identical to the “disbelievers” in their answers about other beliefs. (Note: You should take these examples not as empirical guides in your own studies but only as suggestions of general ways to analyze your own data.) Whenever the analysis of missing data yields such interpretations, then, you may decide to score such cases accordingly.

There are many other ways of handling the problem of missing data. If an item has several possible values, you might assign the middle value to cases with missing data; for example, you could assign a 2 if the values are 0, 1, 2, 3, and 4. For a continuous variable such as age, you could similarly assign the mean to cases with missing data (more on this in Chapter 14). Or, missing data can be supplied by assigning values at random. All of these are conservative solutions because they weaken the “purity” of your index and reduce the likelihood that it will relate to other variables in ways you may have hypothesized.

If you’re creating an index out of a large number of items, you can sometimes handle missing data by using proportions based on what is observed. Suppose your index is composed of six indicators, and you only have four observations for a particular subject. If the subject has earned 4 points out of a possible 4, you might assign an index score of 6; if the subject has 2 points (half the possible score on four items), you could assign a score of 3 (half the possible score on six observations).

The choice of a particular method to be used depends so much on the research situation that I can’t reasonably suggest a single “best” method or rank the several I’ve described. Excluding all cases with missing data can bias the representativeness of the findings, but including such cases by assigning scores to missing data can influence the nature of the findings. The safest and best method is to construct the index using more than one of these methods and see whether you reach the same conclusions using each of the indexes. Understanding your data is the final goal of analysis anyway.

The Research in Real Life feature, “How Healthy Is Your State,” illustrates one use of indexing that you might find interesting. In addition to the rank listing, be sure to examine the health measures included in the index.

Index Validation

Up to this point, we’ve discussed all the steps in the selection and scoring of items that result in an index purporting to measure some variable. If each of the preceding steps is carried out carefully, the likelihood of the index actually measuring the variable is enhanced. To demonstrate success, however, we must show that the index is valid. Following the basic logic of validation, we assume that the index provides a measure of some variable; that is, the scores on the index arrange cases in a rank order in terms of that variable. An index of political conservatism rank-orders people in terms of their relative conservatism. If the index does that successfully, then people scored as relatively conservative on the index should appear relatively conservative in all other indications of political orientation, such as their responses to other questionnaire items. There are several methods of validating an index.

Item Analysis

The first step in index validation is an internal validation called item analysis. In item analysis, you examine the extent to which the index is related to (or predicts responses to) the individual items it comprises. Here’s an illustration of this step.

In the index of scientific orientations among medical school faculty, index scores ranged from 0 (most interested in patient care) to 3 (most interested in research). Now let’s consider one of the items in the index: whether respondents wanted to advance their own knowledge more with regard

item analysis  An assessment of whether each of the items included in a composite measure makes an independent contribution or merely duplicates the contribution of other items in the measure.
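The missing-data options described in this section (assigning the middle value, assigning the mean, and scoring in proportion to what was observed) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical, missing answers are represented as None, and each index item is assumed to be worth one point.

```python
from statistics import mean

def midpoint_fill(values, scale_values):
    """Replace missing answers (None) with the middle of the item's possible values."""
    ordered = sorted(scale_values)
    middle = ordered[len(ordered) // 2]  # for 0..4 this is 2
    return [middle if v is None else v for v in values]

def mean_fill(values):
    """Replace missing answers (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    average = mean(observed)
    return [average if v is None else v for v in values]

def proportional_score(item_scores, points_per_item=1):
    """Scale the share of points earned on observed items up to the full index."""
    observed = [s for s in item_scores if s is not None]
    if not observed:
        return None  # nothing observed, so no score can be estimated
    share = sum(observed) / (len(observed) * points_per_item)
    return share * len(item_scores) * points_per_item

# Six-indicator index with only four observations, as in the text
print(proportional_score([1, 1, 1, 1, None, None]))  # 4 of 4 observed points
print(proportional_score([1, 0, 1, 0, None, None]))  # 2 of 4 observed points
```

Running the same analysis with more than one of these functions is one way to follow the advice above and see whether your conclusions survive the choice of method.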

210 ■ Chapter 7: Typologies, Indexes, and Scales

Research in Real Life

How Healthy Is Your State?

Since 1990, United Health Foundation, the American Public Health Association, and Partnership for Prevention have collaborated on an annual evaluation of the health status of each of the 50 states. The following table displays the findings for overall rankings from the 2010 report. The scores indicate where each state stands in comparison to the nation as a whole. The scores are shown as standard deviations from the national average. While you may not have studied this statistical technique, you can still tell whether your state is above or below the national average. The healthiest state in 2010 was Vermont; Mississippi was the least healthy.

You may be interested in seeing how your state ranks.

2010 Overall Rankings
Rank Order

Rank  State            Score*      Rank  State            Score*
 1    Vermont           1.131       26   California        0.230
 2    Massachusetts     0.906       27   Pennsylvania      0.046
 3    New Hampshire     0.892       28   Alaska            0.033
 4    Connecticut       0.873       29   Illinois          0.031
 5    Hawaii            0.852       30   Michigan          0.024
 6    Minnesota         0.844       31   Arizona           0.009
 7    Utah              0.825       32   Delaware         -0.032
 8    Maine             0.627       33   New Mexico       -0.056
 9    Idaho             0.569       34   Ohio             -0.070
10    Rhode Island      0.553       35   North Carolina   -0.181
11    Nebraska          0.550       36   Georgia          -0.207
11    Washington        0.550       37   Florida          -0.210
13    Colorado          0.545       38   Indiana          -0.322
14    Iowa              0.524       39   Missouri         -0.325
15    Oregon            0.516       40   Texas            -0.364
16    North Dakota      0.511       41   South Carolina   -0.397
17    New Jersey        0.487       42   Tennessee        -0.423
18    Wisconsin         0.468       43   West Virginia    -0.449
19    Wyoming           0.419       44   Kentucky         -0.456
20    South Dakota      0.324       45   Alabama          -0.519
21    Maryland          0.274       46   Oklahoma         -0.521
22    Virginia          0.266       47   Nevada           -0.533
23    Kansas            0.258       48   Arkansas         -0.605
24    New York          0.250       49   Louisiana        -0.664
25    Montana           0.243       50   Mississippi      -0.768

*Scores presented in this table indicate the weighted number of standard deviations a state is above or below the national norm.

Since you are, by now, a critical consumer of social research, I can hear you demanding, “Wait a minute, how did they measure healthy?” Good question. The table, “Weight of Individual Measures,” provides a summary of the components included in the report’s definition of what constitutes good or bad health. You’ll see that the indicators encompass a number of categories. Some represent positive indications (e.g., high school graduation rates) and some are negative indicators (e.g., smoking and binge drinking). Moreover, the table shows the weight assigned to each indicator in the construction of a state’s overall score.

Weight of Individual Measures

Name of Measure                      % of Total   Effect on Score

DETERMINANTS
BEHAVIORS
  Prevalence of Smoking                  7.5      Negative
  Prevalence of Binge Drinking           5.0      Negative
  Prevalence of Obesity                  7.5      Negative
  High School Graduation                 5.0      Positive
COMMUNITY AND ENVIRONMENT
  Violent Crime                          5.0      Negative
  Occupational Fatalities                2.5      Negative
  Infectious Disease                     5.0      Negative
  Children in Poverty                    5.0      Negative
  Air Pollution                          5.0      Negative
PUBLIC AND HEALTH POLICIES
  Lack of Health Insurance               5.0      Negative
  Public Health Funding                  2.5      Positive
  Immunization Coverage                  5.0      Positive
CLINICAL CARE
  Early Prenatal Care                    5.0      Positive
  Primary Care Physicians                5.0      Positive
  Preventable Hospitalizations           5.0      Negative
OUTCOMES
  Poor Mental Health Days                2.5      Negative
  Poor Physical Health Days              2.5      Negative
  Geographic Disparity                   5.0      Negative
  Infant Mortality                       5.0      Negative
  Cardiovascular Deaths                  2.5      Negative
  Cancer Deaths                          2.5      Negative
  Premature Death                        5.0      Negative
OVERALL HEALTH RANKING                 100.0      -
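To see how weights and directions like these could combine into a single score, here is a rough sketch. It assumes each measure is first expressed as a number of standard deviations from the national average (a z-score) and that a “Negative” measure simply has its sign flipped before weighting. That is my reading of the table footnote, not the report’s documented procedure, and the z-scores below are invented purely for illustration.

```python
# (weight as a share of the total, effect on score) for four measures in the table;
# -1 marks a "Negative" measure and +1 a "Positive" one
MEASURES = {
    "smoking": (0.075, -1),
    "binge_drinking": (0.050, -1),
    "obesity": (0.075, -1),
    "hs_graduation": (0.050, +1),
}

def composite_score(z_scores, measures=MEASURES):
    """Weighted sum of z-scores, with negative indicators counted against the state."""
    return sum(weight * sign * z_scores[name]
               for name, (weight, sign) in measures.items())

# Hypothetical state: less smoking and obesity than average, more graduates
hypothetical_state = {
    "smoking": -1.0,
    "binge_drinking": 0.0,
    "obesity": -2.0,
    "hs_graduation": 1.0,
}
print(round(composite_score(hypothetical_state), 3))
```

Because lower-than-average smoking and obesity are good news, flipping their signs makes them raise the composite, which is the point of recording each indicator’s direction alongside its weight.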

Research in Real Life (Continued)

It would be a good idea for you to review each indicator and see if you agree that it reflects on how healthy states are. Perhaps you can think of other indicators that might have been used.

The full report provides a wealth of thoughtful discussion on why each of these indicators was chosen, and I’d encourage you to check it out at the URL shown below.

Source: United Health Foundation, Public Health Association, and Partnership for Prevention, “America’s Health Rankings: A Call to Action for Individuals and Their Communities.” ©2010 United Health Foundation. Table 1 taken from page 8, Table 36 from page 41. You may download a copy of the report at: (http://www.americashealthrankings.org/2010/AHR2010Edition-compact.pdf).

to total patient management or more in the area of basic mechanisms. The latter were treated as being more scientifically oriented than the former. The following empty table shows how we would examine the relationship between the index and the individual item.

                                                  Index of Scientific Orientations
                                                     0     1     2     3
Percent who said they were more
interested in basic mechanisms                       ??    ??    ??    ??

If you take a minute to reflect on the table, you may see that we already know the numbers that go in two of the cells. To get a score of 3 on the index, respondents had to say “basic mechanisms” in response to this question and give the “scientific” answers to the other two items as well. Thus, 100 percent of the 3’s on the index said “basic mechanisms.” By the same token, all the 0’s had to answer this item with “total patient management.” Thus, 0 percent of those respondents said “basic mechanisms.” Here’s how the table looks with the information we already know.

                                                  Index of Scientific Orientations
                                                     0     1     2     3
Percent who said they were more
interested in basic mechanisms                       0     ??    ??    100

If the individual item is a good reflection of the overall index, we should expect the 1’s and 2’s to fill in a progression between 0 percent and 100 percent. More of the 2’s should choose “basic mechanisms” than 1’s. This result is not guaranteed by the way the index was constructed, however; it is an empirical question, one we answer in an item analysis. Here’s how this particular item analysis turned out.

                                                  Index of Scientific Orientations
                                                     0     1     2     3
Percent who said they were more
interested in basic mechanisms                       0     16    91    100

As you can see, in accord with our assumption that the 2’s are more scientifically oriented than the 1’s, we find that a higher percentage of the 2’s (91 percent) say “basic mechanisms” than the 1’s (16 percent).

An item analysis of the other two components of the index yields similar results, as shown here.

                                                  Index of Scientific Orientations
                                                     0     1     2     3
Percent who said they could teach
best as medical researchers                          0     4     14    100
Percent who said they preferred
reading about rationales                             0     80    97    100

Each of the items, then, seems an appropriate component in the index. Each seems to reflect the same quality that the index as a whole measures.

In a complex index containing many items, this step provides a convenient test of the independent contribution of each item to the index. If a given item is found to be poorly related to the index, it may be assumed that other items in the index cancel out the contribution of that item, and it should

be excluded from the index. If the item in question contributes nothing to the index’s power, it should be excluded.

Although item analysis is an important first test of an index’s validity, it is not a sufficient test. If the index adequately measures a given variable, it should successfully predict other indications of that variable. To test this, we must turn to items not included in the index.

External Validation

In our example of the scientific orientation index, several questions in the questionnaire offered the possibility of external validation. Table 7-1 presents some of these items, which provide several lessons regarding index validation. First, we note that the index strongly predicts the responses to the validating items in the sense that the rank order of scientific responses among the four groups is the same as the rank order provided by the index itself. That is, the percentages reflect greater scientific orientation as you read across the rows of the table. At the same time, each item gives a different description of scientific orientation overall. For example, the last validating item indicates that the great majority of all faculty were engaged in research during the preceding year. If this were the only indicator of scientific orientation, we would conclude that nearly all faculty were scientific. Nevertheless, those scored as more scientific on the index are more likely to have engaged in research than were those scored as relatively less scientific. The third validating item provides a different descriptive picture: Only a minority of the faculty overall say they would prefer duties limited exclusively to research. Nevertheless, the relative percentages giving this answer correspond to the scores assigned on the index.

Table 7-1
Validation of Scientific Orientation Index

                                                  Index of Scientific Orientation
                                                  Low                 High
                                                   0     1     2     3
Percent interested in attending scientific
lectures at the medical school                     34    42    46    65
Percent who say faculty members should have
experience as medical researchers                  43    60    65    89
Percent who would prefer faculty duties
involving research activities only                  0     8    32    66
Percent who engaged in research during the
preceding academic year                            61    76    94    99

Bad Index versus Bad Validators

Nearly every index constructor at some time must face the apparent failure of external items to validate the index. If the internal item analysis shows inconsistent relationships between the items included in the index and the index itself, something is wrong with the index. But if the index fails to predict strongly the external validation items, the conclusion to be drawn is more ambiguous. In this situation we must choose between two possibilities: (1) the index does not adequately measure the variable in question, or (2) the validation items do not adequately measure the variable and thereby do not provide a sufficient test of the index.

Having worked long and conscientiously on the construction of an index, you’ll likely find the second conclusion compelling. Typically, you’ll feel you have included the best indicators of the variable in the index; the validating items are, therefore, second-rate indicators. Nevertheless, you should recognize that the index is purportedly a very powerful measure of the variable; thus, it should be somewhat related to any item that taps the variable, even if poorly.

When external validation fails, you should reexamine the index before deciding that the

external validation  The process of testing the validity of a measure, such as an index or scale, by examining its relationship to other, presumed indicators of the same variable. If the index really measures prejudice, for example, it should correlate with other indicators of prejudice.
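Both item analysis and external validation come down to the same tabulation: for each index score, what percentage of respondents gave the “scientific” answer to some item? Here is a small sketch of that tabulation. The record layout (a dictionary with an "index" key plus one True/False key per item) and the function names are my own assumptions for illustration; the two rows checked at the end are the percentages reported in this section.

```python
from collections import defaultdict

def percent_by_index_score(respondents, item):
    """Percent answering True on `item`, tabulated separately for each index score."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for person in respondents:
        totals[person["index"]] += 1
        if person[item]:
            hits[person["index"]] += 1
    return {score: 100 * hits[score] / totals[score] for score in sorted(totals)}

def rises_with_index(percentages):
    """True if the percentages never fall as the index score goes up."""
    values = [percentages[score] for score in sorted(percentages)]
    return all(a <= b for a, b in zip(values, values[1:]))

# Rows from the text: the "basic mechanisms" item and the research-only validator
basic_mechanisms = {0: 0, 1: 16, 2: 91, 3: 100}
research_only = {0: 0, 1: 8, 2: 32, 3: 66}
print(rises_with_index(basic_mechanisms), rises_with_index(research_only))
```

A row that fails this check is the signal to look harder, either at the item (in item analysis) or, as discussed above, at both the index and the validator (in external validation).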

validating items are insufficient. One way to do this is to examine the relationships between the validating items and the individual items included in the index. If you discover that some of the index items relate to the validators and others do not, you’ll have improved your understanding of the index as it was initially constituted.

There’s no cookbook solution to this problem; it is an agony serious researchers must learn to survive. Ultimately, the wisdom of your decision to accept an index will be determined by the usefulness of that index in your later analyses. Perhaps you’ll initially decide that the index is a good one and that the validators are defective, but you’ll later find that the variable in question (as measured by the index) is not related to other variables in the ways you expected. You may then have to compose a new index.

The Status of Women: An Illustration of Index Construction

For the most part, our discussion of index construction has focused on the specific context of survey research, but other types of research also lend themselves to this kind of composite measure. For example, when the United Nations (1995) set out to examine the status of women in the world, they chose to create two indexes, reflecting two different dimensions.

The Gender-related Development Index (GDI) compared women to men in terms of three indicators: life expectancy, education, and income. These indicators are commonly used in monitoring the status of women in the world. The Scandinavian countries of Norway, Sweden, Finland, and Denmark ranked highest on this measure.

The second index, the Gender Empowerment Measure (GEM), aimed more at power issues and comprised three different indicators:

• The proportion of parliamentary seats held by women
• The proportion of administrative, managerial, professional, and technical positions held by women
• A measure of access to jobs and wages

Once again, the Scandinavian countries ranked high but were joined by Canada, New Zealand, the Netherlands, the United States, and Austria. Having two different measures of gender equality rather than one allowed the researchers to make more-sophisticated distinctions. For example, in several countries, most notably Greece, France, and Japan, women fared relatively well on the GDI but quite poorly on the GEM. Thus, while women were doing fairly well in terms of income, education, and life expectancy, they were still denied access to power. And whereas the GDI scores were higher in the wealthier nations than in the poorer ones, GEM scores showed that women’s empowerment depended less on national wealth, with many poor, developing countries outpacing some rich, industrial ones in regard to such empowerment.

By examining several different dimensions of the variables involved in their study, the UN researchers also uncovered an aspect of women’s earnings that generally goes unnoticed. Population Communications International (1996: 1) summarizes the finding nicely:

    Every year, women make an invisible contribution of eleven trillion U.S. dollars to the global economy, the UNDP [United Nations Development Programme] report says, counting both unpaid work and the underpayment of women’s work at prevailing market prices. This “underevaluation” of women’s work not only undermines their purchasing power, says the 1995 HDR [Human Development Report], but also reduces their already low social status and affects their ability to own property and use credit. Mahbub ul Haq, the principal author of the report, says that “if women’s work were accurately reflected in national statistics, it would shatter the myth that men are the main breadwinners of the world.” The UNDP report finds that women work longer hours than men in almost every country, including both paid and unpaid duties.

“Research in Real Life: Indexing the World” provides some other examples of indexes that have been created to monitor the state of the world.

Scale Construction ■ 215

Research in Real Life

Indexing the World

If you browse the web in search of indexes, you’ll be handsomely rewarded. Here are just a few examples of the ways in which people have used the logic of social indexes to monitor the state of the world. Go to your Sociology CourseMate at www.cengagebrain.com for links to each of the following examples:

• The well-being of nations is commonly measured in economic terms, such as the Gross Domestic Product per capita, average income, or stock market averages. In 1972, however, the mountainous kingdom of Bhutan drew global attention by proposing an index of “Gross National Happiness,” augmenting economic factors with measures of physical and mental health, freedom, environment, marital stability, and other indicators of noneconomic well-being. The World Data Base of Happiness expands this general idea to 24 countries.
• Columbia University’s Environmental Sustainability Index is one of several measures that seek to monitor the environmental impact of the nations of the planet.
• The well-being of America’s young people is the focus of the Child and Youth Well-Being Index, housed at Duke University.
• Money Magazine has indexed the 100 best places to live in America, using factors such as economics, housing, schools, health, crime, weather, and public facilities.
• The Heritage Foundation offers the Index of Economic Freedom for those planning business ventures around the world.
• For Christians who believe in prophecies of the end of times, the Rapture Index uses 45 indicators (including inflation, famine, floods, liberalism, and Satanism) and offers a gauge of how close or far away the end is.

Can you find other, similar indexes online?

As you can see, indexes can be constructed from many different kinds of data for a variety of purposes. Now we’ll turn our attention from the construction of indexes to an examination of scaling techniques.

Scale Construction

Good indexes provide an ordinal ranking of cases on a given variable. All indexes are based on this kind of assumption: A senator who voted for seven conservative bills is considered to be more conservative than one who voted for only four of them. What an index may fail to take into account, however, is that not all indicators of a variable are equally important or equally strong. The first senator might have voted in favor of seven mildly conservative bills, whereas the second senator might have voted in favor of four extremely conservative bills. (The second senator might have considered the other seven bills too liberal and voted against them.)

Scales offer more assurance of ordinality by tapping the intensity structures among the indicators. The several items going into a composite measure may have different intensities in terms of the variable. Many methods of scaling are available. We’ll look at four scaling procedures to illustrate the variety of techniques available, along with a technique called the semantic differential. Although these examples focus on questionnaires, the logic of scaling, like that of indexing, applies to other research methods as well.

Bogardus Social Distance Scale

Let’s suppose you’re interested in the extent to which U.S. citizens are willing to associate with, say, sex offenders. You might ask the following questions:

1. Are you willing to permit sex offenders to live in your country?
2. Are you willing to permit sex offenders to live in your community?
3. Are you willing to permit sex offenders to live in your neighborhood?
4. Would you be willing to let a sex offender live next door to you?
5. Would you let your child marry a sex offender?
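Because the five questions run from the most distant relationship to the closest, a respondent’s answers can be boiled down to a simple count, provided the answers follow a cumulative pattern (every “yes” comes before every “no”). The sketch below is my own illustration of that idea; the function names are hypothetical, and answers are assumed to be a list of True/False values in the order the questions are asked.

```python
ITEMS = ["country", "community", "neighborhood", "next door", "marriage"]

def social_distance_score(answers):
    """Number of relationships accepted; answers run from easiest to hardest item."""
    return sum(answers)

def fits_cumulative_pattern(answers):
    """True if every accepted relationship precedes every refused one."""
    return list(answers) == sorted(answers, reverse=True)

def reconstruct(score, n_items=len(ITEMS)):
    """Rebuild the full answer pattern from the score alone."""
    return [True] * score + [False] * (n_items - score)

answers = [True, True, True, False, False]  # accepts up to "neighborhood"
score = social_distance_score(answers)
print(score, fits_cumulative_pattern(answers), reconstruct(score) == answers)
```

When the cumulative pattern holds, the single score loses nothing: the original answers can be rebuilt from it. An answer set such as yes, no, yes, no, no breaks the pattern, and the count alone would no longer tell you which relationships were accepted.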

These questions increase in terms of the closeness of contact with sex offenders. Beginning with the original concern to measure willingness to associate with sex offenders, you have thus developed several questions indicating differing degrees of intensity on this variable. The kinds of items presented constitute a Bogardus social distance scale (created by Emory Bogardus). This scale is a measurement technique for determining the willingness of people to participate in social relations, of varying degrees of closeness, with other kinds of people.

The clear differences of intensity suggest a structure among the items. Presumably if a person is willing to accept a given kind of association, he or she would be willing to accept all those preceding it in the list (those with lesser intensities). For example, the person who is willing to permit sex offenders to live in the neighborhood will surely accept them in the community and the nation but may or may not be willing to accept them as next-door neighbors or relatives. This, then, is the logical structure of intensity inherent among the items.

Empirically, one would expect to find the largest number of people accepting co-citizenship and the fewest accepting intermarriage. In this sense, we speak of “easy items” (for example, residence in the United States) and “hard items” (for example, intermarriage). More people agree to the easy items than to the hard ones. With some inevitable exceptions, logic demands that once a person has refused a relationship presented in the scale, he or she will also refuse all the harder ones that follow it.

The Bogardus social distance scale illustrates the important economy of scaling as a data-reduction device. By knowing how many relationships with sex offenders a given respondent will accept, we know which relationships were accepted. Thus, a single number can accurately summarize five or six data items without a loss of information.

Motoko Lee, Stephen Sapp, and Melvin Ray (1996) noticed an implicit element in the Bogardus social distance scale: It looks at social distance from the point of view of the majority group in a society. These researchers decided to turn the tables and create a “reverse social distance” scale: looking at social distance from the perspective of the minority group. Here’s how they framed their questions (1996: 19):

    Considering typical Caucasian Americans you have known, not any specific person nor the worst or the best, circle Y or N to express your opinion.

    Y N  5. Do they mind your being a citizen in this country?
    Y N  4. Do they mind your living in the same neighborhood?
    Y N  3. Would they mind your living next to them?
    Y N  2. Would they mind your becoming a close friend to them?
    Y N  1. Would they mind your becoming their kin by marriage?

As with the original scale, the researchers found that knowing the number of items minority respondents agreed with also told the researchers which ones were agreed with, 98.9 percent of the time in this case.

Bogardus social distance scale  A measurement technique for determining the willingness of people to participate in social relations, of varying degrees of closeness, with other kinds of people. It is an especially efficient technique in that one can summarize several discrete answers without losing any of the original details of the data.

Thurstone Scales

Thurstone scale  A type of composite measure, constructed in accord with the weights assigned by “judges” to various indicators of some variables.

Often, the inherent structure of the Bogardus social distance scale is not appropriate to the variable being measured. Indeed, such a logical structure among several indicators is seldom apparent. A Thurstone scale (created by Louis Thurstone) is an attempt to develop a format for generating groups of indicators of a variable that have at least an empirical structure among them. A group of judges is given perhaps a hundred items that are thought to be indicators of a given variable. Each judge is then asked to estimate

how strong an indicator of a variable each item is by assigning scores of perhaps 1 to 13. If the variable were prejudice, for example, the judges would be asked to assign the score of 1 to the very weakest indicators of prejudice, the score of 13 to the strongest indicators, and intermediate scores to those felt to be somewhere in between.

Once the judges have completed this task, the researcher examines the scores assigned to each item by all the judges, then determines which items produced the greatest agreement among the judges. Those items on which the judges disagreed broadly would be rejected as ambiguous. Among those items producing general agreement in scoring, one or more would be selected to represent each scale score from 1 to 13.

The items selected in this manner might then be included in a survey questionnaire. Respondents who appeared prejudiced on those items representing a strength of 5 would then be expected to appear prejudiced on those having lesser strengths, and if some of those respondents did not appear prejudiced on the items with a strength of 6, it would be expected that they would also not appear prejudiced on those with greater strengths.

If the Thurstone scale items were adequately developed and scored, the economy and effectiveness of data reduction inherent in the Bogardus social distance scale would appear. A single score might be assigned to each respondent (the strength of the hardest item accepted), and that score would adequately represent the responses to several questionnaire items. And as is true of the Bogardus scale, a respondent who scored 6 might be regarded as more prejudiced than one who scored 5 or less.

Thurstone scaling is not often used in research today, primarily because of the tremendous expenditure of energy and time required to have 10 to 15 judges score the items. Because the quality of their judgments would depend on their experience with the variable under consideration, they might need to be professional researchers. Moreover, the meanings conveyed by the several items indicating a given variable tend to change over time. Thus, an item having a given weight at one time might have quite a different weight later on. For a Thurstone scale to be effective, it would have to be updated periodically.

Likert Scaling

I’m sure you are familiar with questionnaire items containing response categories such as “strongly agree,” “agree,” “disagree,” and “strongly disagree.” Rensis Likert (pronounced “LICK-ert”) created this commonly used question format. Likert also created a technique for combining the items into a scale, but while Likert’s scaling technique is rarely used, his answer format is one of the most frequently used in survey research.

The particular value of this format is the unambiguous ordinality of response categories. If respondents were permitted to volunteer or select such answers as “sort of agree,” “pretty much agree,” “really agree,” and so forth, you would find it impossible to judge the relative strength of agreement intended by the various respondents. The Likert format solves this problem.

Though seldom used, Likert’s scaling method is fairly easy to understand, based on the relative intensity of different items. As a simple example, suppose we wish to measure prejudice against women. To do this, we create a set of 20 statements, each of which reflects that prejudice. One of the items might be “Women can’t drive as well as men.” Another might be “Women shouldn’t be allowed to vote.” Likert’s scaling technique would demonstrate the difference in intensity between these items as well as pegging the intensity of the other 18 statements.

Let’s suppose we ask a sample of people to agree or disagree with each of the 20 statements. Simply giving one point for each of the indicators of prejudice against women would yield the possibility of index scores ranging from 0 to 20. A true Likert scale goes one step beyond that

Likert scale  A type of composite measure developed by Rensis Likert, in an attempt to improve the levels of measurement in social research through the use of standardized response categories in survey questionnaires, to determine the relative intensity of different items. Likert items are those using such response categories as strongly agree, agree, disagree, and strongly disagree. Such items may be used in the construction of true Likert scales as well as other types of composite measures.
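The Thurstone procedure described in this section (judges rate each item from 1 to 13, items with poor agreement are dropped, and the survivors stand for particular scale values) can be sketched as follows. The text does not say how agreement should be measured, so the choices here are my own assumptions: agreement is gauged by the standard deviation of the judges’ ratings, the cutoff of 1.5 is arbitrary, and an item’s scale value is taken to be the median rating. The item labels and ratings are invented.

```python
from statistics import median_low, pstdev

def select_items(ratings, max_spread=1.5):
    """Keep items the judges agree on, grouped by scale value.

    ratings maps an item to the list of 1-13 scores given by the judges.
    """
    selected = {}
    for item, scores in ratings.items():
        if pstdev(scores) <= max_spread:  # broad disagreement marks an ambiguous item
            selected.setdefault(median_low(scores), []).append(item)
    return selected

def respondent_score(accepted_strengths):
    """Strength of the hardest item the respondent accepted (0 if none)."""
    return max(accepted_strengths, default=0)

judge_ratings = {
    "item A": [2, 2, 3, 2, 3],       # judges agree: a weak indicator
    "item B": [1, 7, 13, 4, 10],     # judges disagree broadly: rejected
    "item C": [12, 13, 12, 13, 13],  # judges agree: a strong indicator
}
print(select_items(judge_ratings))
print(respondent_score([2, 5, 6]))
```

With a real pool of a hundred items you would still need to check that every scale value from 1 to 13 ends up with at least one surviving item, which this sketch does not enforce.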

and calculates the average index score for those agreeing with each of the individual statements. Let’s say that all those who agreed that women are poorer drivers than men had an average index score of 1.5 (out of a possible 20). Those who agreed that women should be denied the right to vote might have an average index score of, say, 19.5, indicating the greater degree of prejudice reflected in that response.

As a result of this item analysis, respondents could be rescored to form a scale: 1.5 points for agreeing that women are poorer drivers, 19.5 points for saying women shouldn’t vote, and points for other responses reflecting how those items related to the initial, simple index. If those who disagreed with the statement “I might vote for a woman for president” had an average index score of 15, then the scale would give 15 points to people disagreeing with that statement.

As I’ve said earlier, Likert scaling is seldom used today. The item format devised by Likert, however, is one of the most commonly used formats in contemporary questionnaire design. Typically, it is now used in the creation of simple indexes. With, say, five response categories (including “no opinion” or something similar), scores of 0 to 4 or 1 to 5 might be assigned, taking the direction of the items into account (for example, assign a score of 5 to “strongly agree” for positive items and to “strongly disagree” for negative items). Each respondent would then be assigned an overall score representing the summation of the scores he or she received for responses to the individual items.

Semantic Differential

Like the Likert format, the semantic differential asks questionnaire respondents to choose between two opposite positions by using qualifiers to bridge the distance between the two opposites. Here’s how it works.

semantic differential  A questionnaire format in which the respondent is asked to rate something in terms of two, opposite adjectives (e.g., rate textbooks as “boring” or “exciting”), using qualifiers such as “very,” “somewhat,” “neither,” “somewhat,” and “very” to bridge the distance between the two opposites.

Suppose you’re evaluating the effectiveness of a new music-appreciation lecture on subjects’ appreciation of music. As a part of your study, you want to play some musical selections and have the subjects report their feelings about them. A good way to tap those feelings would be to use a semantic differential format.

To begin, you must determine the dimensions along which subjects should judge each selection. Then you need to find two opposite terms, representing the polar extremes along each dimension. Let’s suppose one dimension that interests you is simply whether subjects enjoyed the piece or not. Two opposite terms in this case could be “enjoyable” and “unenjoyable.” Similarly, you might want to know whether they regarded the individual selections as “complex” or “simple,” “harmonic” or “discordant,” and so forth.

Once you have determined the relevant dimensions and have found terms to represent the extremes of each, you might prepare a rating sheet each subject would complete for each piece of music. Figure 7-5 shows what it might look like.

On each line of the rating sheet, the subject would indicate how he or she felt about the piece of music: whether it was enjoyable or unenjoyable, for example, and whether it was “somewhat” that way or “very much” so. To avoid creating a biased pattern of responses to such items, it’s a good idea to vary the placement of terms that are likely to be related to each other. Notice, for example, that “discordant” and “traditional” are on the left side of the sheet, with “harmonic” and “modern” on the right. Most likely, those selections scored as “discordant” would also be scored as “modern” rather than “traditional.”

Both the Likert and semantic differential formats have a greater rigor and structure than other question formats do. As I indicated earlier, these formats produce data suitable to both indexing and scaling.
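The two uses of Likert items described in this section can be sketched side by side: the original scaling technique, which weights each statement by the average simple-index score of the people who gave the prejudiced answer to it, and the more common summated index, which adds up 1-to-5 scores after reversing negatively worded items. Everything below is an illustrative assumption on my part: the item names, the four-person sample, and the coding of a prejudiced answer as 1.

```python
from statistics import mean

def simple_index(responses):
    """One point for each prejudiced answer (coded 1; anything else coded 0)."""
    return sum(responses.values())

def item_weights(sample):
    """Average simple-index score among those giving the prejudiced answer to each item."""
    weights = {}
    for item in sample[0]:
        scores = [simple_index(person) for person in sample if person[item]]
        if scores:
            weights[item] = mean(scores)
    return weights

def likert_sum(answers, negative_items, n_categories=5):
    """Sum 1..n answers, reversing items worded in the negative direction."""
    return sum((n_categories + 1 - value) if item in negative_items else value
               for item, value in answers.items())

sample = [
    {"drive": 1, "vote": 0, "president": 0},
    {"drive": 1, "vote": 1, "president": 1},
    {"drive": 0, "vote": 0, "president": 0},
    {"drive": 1, "vote": 0, "president": 1},
]
print(item_weights(sample))
print(likert_sum({"q1": 5, "q2": 1}, negative_items={"q2"}))
```

In the toy sample, the “vote” statement earns the largest weight because only the most prejudiced respondent endorses it, which mirrors the 1.5 versus 19.5 contrast in the example above.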

Figure 7-5
Semantic Differential: Feelings about Musical Selections. The semantic differential asks respondents to describe something or someone in terms of opposing adjectives.

Guttman Scaling

Researchers today often use the scale developed by Louis Guttman. Like Bogardus, Thurstone, and Likert scaling, Guttman scaling is based on the fact that some items under consideration may prove to be more-extreme indicators of the variable than others. Here’s an example to illustrate this pattern.

Guttman scale  A type of composite measure used to summarize several discrete observations and to represent some more-general variable.

In the earlier example of measuring scientific orientation among medical school faculty members, you’ll recall that a simple index was constructed. As it happens, however, the three items included in the index essentially form a Guttman scale.

The construction of a Guttman scale begins with some of the same steps that initiate index construction. You begin by examining the face validity of items available for analysis. Then, you examine the bivariate and perhaps multivariate relations among those items. In scale construction, however, you also look for relatively “hard” and “easy” indicators of the variable being examined.

Earlier, when we talked about attitudes regarding a woman’s right to have an abortion, we discussed several conditions that can affect people’s opinions: whether the woman is married, whether her health is endangered, and so forth. These differing conditions provide an excellent illustration of Guttman scaling.

Here are the percentages of the people in the 2006 GSS sample who supported a woman’s right to an abortion, under three different conditions:

Woman’s health is seriously endangered    87%
Pregnant as a result of rape              77%
Woman is not married                      38%

The different percentages supporting abortion under the three conditions suggest something about the different levels of support that each item indicates. For example, if someone supported abortion when the mother’s life is seriously endangered, that’s not a very strong indicator of general support for abortion, because almost everyone agreed with that. Supporting abortion for unmarried women seems a much stronger indicator of support for abortion in general: fewer than half the sample took that position.

Guttman scaling is based on the idea that anyone who gives a strong indicator of some variable will also give the weaker indicators. In this case, we would assume that anyone who supported abortion for unmarried women would also support it in the case of rape or of the woman’s health being threatened. Table 7-2 tests this assumption by presenting the number of respondents who gave each of the possible response patterns.

The first four response patterns in the table compose what we would call the scale types: those patterns that form a scalar structure. Following those respondents who supported abortion under all three conditions (line 1) are those with only two pro-choice responses (line 2), who chose the two easier ones; those with only one such response (line 3) chose the easiest of the three (the woman’s health

being endangered). And finally, there are some respondents who opposed abortion in all three circumstances (line 4).

Table 7-2
Scaling Support for Choice of Abortion

               Women’s Health   Result of Rape   Woman Unmarried   Number of Cases
Scale types          +                +                 +                763
                     +                +                 −                633
                     +                −                 −                201
                     −                −                 −                191
                                                                Total = 1,788
Mixed types          −                +                 −                 43
                     +                −                 +                  7
                     −                −                 +                  4
                     −                +                 +                  4
                                                                Total = 58

The second part of the table presents those response patterns that violate the scalar structure of the items. The most radical departures from the scalar structure are the last two response patterns: those who accepted only the hardest item and those who rejected only the easiest one.

The final column in the table indicates the number of survey respondents who gave each of the response patterns. The great majority (1,788, or 97 percent) fit into one of the scale types. The presence of mixed types, however, indicates that the items do not form a perfect Guttman scale. (It would be extremely rare for such data to form a Guttman scale perfectly.)

Recall at this point that one of the chief functions of scaling is efficient data reduction. Scales provide a technique for presenting data in a summary form while maintaining as much of the original information as possible. When the scientific orientation items were formed into an index in our earlier discussion, respondents were given one point for each scientific response they gave. If these same three items were scored as a Guttman scale, some respondents would be assigned scale scores that would permit the most accurate reproduction of their original responses to all three items.

In the present example of attitudes regarding abortion, respondents fitting into the scale types would receive the same scores as would be assigned in the construction of an index. Persons selecting all three pro-choice responses (+ + +) would still be scored 3, those who selected pro-choice responses to the two easier items and were opposed on the hardest item (+ + −) would be scored 2, and so on. For each of the four scale types we could predict accurately all the actual responses given by all the respondents based on their scores.

The mixed types in the table present a problem, however. The first mixed type (− + −) was scored 1 on the index to indicate only one pro-choice response. But, if 1 were assigned as a scale score, we would predict that the 43 respondents in this group had chosen only the easiest item (approving abortion when the woman’s life was endangered), and we would be making two errors for each such respondent: thinking their response pattern was (+ − −) instead of (− + −). Scale scores are assigned, therefore, with the aim of minimizing the errors that would be made in reconstructing the original responses.

Table 7-3 illustrates the index and scale scores that would be assigned to each of the response patterns in our example. Note that one error is made for each respondent in the mixed types. This is the minimum we can hope for in a mixed-type pattern. In the first mixed type, for example, we would erroneously predict a pro-choice response to the easiest item for each of the 43 respondents in this group, making a total of 43 errors.

The extent to which a set of empirical responses forms a Guttman scale is determined by the accuracy with which the original responses can be reconstructed from the scale scores. For each of the 1,846 respondents in this example, we’ll predict three questionnaire responses, for a total of 5,538 predictions. Table 7-3 indicates that we’ll make 58 errors using the scale scores assigned. The percentage of correct predictions is called the coefficient of reproducibility: the percentage of original responses that could be reproduced by knowing the scale scores used to summarize them. In the present example, the coefficient of reproducibility is 99 percent.
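As a rough illustration of the scoring logic described in this section, the following Python sketch gives each response pattern from Table 7-2 the scale score whose ideal pattern requires the fewest corrections, then computes the coefficient of reproducibility. The function names, the True/False coding of responses, and the rule of breaking ties in favor of the higher score are assumptions made for this example; the text says only that scores are assigned so as to minimize errors, and other scoring conventions exist.

```python
# Response patterns from Table 7-2, ordered from easiest item to hardest:
# (woman's health, result of rape, woman unmarried). True = pro-choice.
PATTERN_COUNTS = {
    (True, True, True): 763,
    (True, True, False): 633,
    (True, False, False): 201,
    (False, False, False): 191,
    (False, True, False): 43,
    (True, False, True): 7,
    (False, False, True): 4,
    (False, True, True): 4,
}


def ideal_pattern(score, n_items):
    """Scale-type pattern for a score: the `score` easiest items endorsed."""
    return tuple(i < score for i in range(n_items))


def errors_for(pattern, score):
    """Number of responses predicted wrongly from a given scale score."""
    ideal = ideal_pattern(score, len(pattern))
    return sum(actual != predicted for actual, predicted in zip(pattern, ideal))


def scale_score(pattern):
    """Score with the fewest prediction errors.

    Ties go to the higher score (an assumption of this sketch).
    """
    candidates = range(len(pattern) + 1)
    return max(candidates, key=lambda s: (-errors_for(pattern, s), s))


def total_errors(pattern_counts):
    """Total scale errors across all respondents."""
    return sum(errors_for(p, scale_score(p)) * n for p, n in pattern_counts.items())


def coefficient_of_reproducibility(pattern_counts):
    """1 minus (number of errors / number of guesses)."""
    guesses = sum(len(p) * n for p, n in pattern_counts.items())
    return 1 - total_errors(pattern_counts) / guesses


if __name__ == "__main__":
    for pattern, n in PATTERN_COUNTS.items():
        signs = " ".join("+" if r else "-" for r in pattern)
        # The index score is simply the count of pro-choice responses.
        print(signs, n, "index:", sum(pattern), "scale:", scale_score(pattern))
    print("total errors:", total_errors(PATTERN_COUNTS))
    print("CR:", round(coefficient_of_reproducibility(PATTERN_COUNTS), 4))
```

Under these assumptions the sketch yields 58 errors in 5,538 predictions, which agrees with the figures quoted above. A different tie-breaking rule would change the scores given to some mixed types but not the error count.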

Table 7-3
Index and Scale Scores

               Response Pattern   Number of Cases   Index Scores   Scale Scores   Total Scale Errors
Scale types         + + +               763               3              3                 0
                    + + −               633               2              2                 0
                    + − −               201               1              1                 0
                    − − −               191               0              0                 0
Mixed types         − + −                43               1              2                43
                    + − +                 7               2              3                 7
                    − − +                 4               1              0                 4
                    − + +                 4               2              3                 4
                                                               Total scale errors = 58

$$\text{Coefficient of reproducibility} = 1 - \frac{\text{number of errors}}{\text{number of guesses}} = 1 - \frac{58}{1{,}846 \times 3} = 1 - \frac{58}{5{,}538} = .9895 = 99\%$$

This table presents one common method for scoring mixed types, but you should be advised that other methods are also used.

Except for the case of perfect (100 percent) reproducibility, there is no way of saying that a set of items does or does not form a Guttman scale in any absolute sense. Virtually all sets of such items approximate a scale. As a general guideline, however, coefficients of 90 or 95 percent are the commonly used standards. If the observed reproducibility exceeds the level you’ve set, you’ll probably decide to score and use the items as a scale.

The decision concerning criteria in this regard is, of course, arbitrary. Moreover, a high degree of reproducibility does not ensure that the scale constructed in fact measures the concept under consideration. What it does is increase confidence that all the component items measure the same thing. Also, you should realize that a high coefficient of reproducibility is most likely when few items are involved.

One concluding remark with regard to Guttman scaling: It’s based on the structure observed among the actual data under examination. This is an important point that is often misunderstood. It does not make sense to say that a set of questionnaire items (perhaps developed and used by a previous researcher) constitutes a Guttman scale. Rather, we can say only that they form a scale within a given body of data being analyzed. Scalability, then, is a sample-dependent, empirical matter. Although a set of items may form a Guttman scale among one sample of survey respondents, for example, there is no guarantee that this set will form such a scale among another sample. In this sense, then, a set of questionnaire items in and of itself never forms a scale, but a set of empirical observations may.

This concludes our discussion of indexing and scaling. Like indexes, scales are composite measures of a variable, typically broadening the meaning of the variable beyond what might be captured by a single indicator. Both scales and indexes seek to measure variables at the ordinal level of measurement. Unlike indexes, however, scales take advantage of any intensity structure that may be present among the individual indicators. To the extent that such an intensity structure is found and the data from the people or other units of analysis comply with the logic of that intensity structure, we can have confidence that we have created an ordinal measure.

Typologies

Indexes and scales, then, are constructed to provide ordinal measures of given variables. We attempt to assign index or scale scores to cases in such a way as to indicate a rising degree of prejudice, religiosity, conservatism, and so forth. In such cases, we’re dealing with single dimensions.

typology  The classification (typically nominal) of observations in terms of their attributes on two or more variables. The classification of newspapers as liberal-urban, liberal-rural, conservative-urban, or conservative-rural would be an example.

Often, however, the researcher wishes to summarize the intersection of two or more variables, thereby creating a set of categories or types (a nominal variable) called a typology. You may, for

example, wish to examine the political orientations of newspapers separately in terms of domestic issues and foreign policy. The fourfold presentation in Table 7-4 describes such a typology.

Table 7-4
A Typology of Newspapers

                                      Foreign Policy
                                  Conservative   Liberal
Domestic policy   Conservative         A            B
                  Liberal              C            D

Newspapers in cell A of the table are conservative on both foreign policy and domestic policy; those in cell D are liberal on both. Those in cells B and C are conservative on one and liberal on the other.

As another example, Rodney Coates (2006) created a typology of “racial hegemony” from two dimensions:

1. Political Ideology
   a. Democratic
   b. Non-Democratic
2. Military and Industrial Sophistication
   a. Low
   b. High

He then used the typology to examine modern examples of colonial rule, with specific reference to race relations. The specific cases he examined allowed him to illustrate and refine the typology. He points out that such a device represents Max Weber’s “ideal type”: “As stipulated by Weber, ideal types represent a type of abstraction from reality. These abstractions, constructed from the logical extraction of elements derived from specific examples, provide a theoretical model by which and from which we may examine reality” (2006: 87).

Frequently, you arrive at a typology in the course of an attempt to construct an index or scale. The items that you felt represented a single variable appear to represent two. We might have been attempting to construct a single index of political orientations for newspapers but discovered, empirically, that foreign and domestic politics had to be kept separate.

In any event, you should be warned against a difficulty inherent in typological analysis. Whenever the typology is used as the independent variable, there will probably be no problem. In the preceding example, you might compute the percentages of newspapers in each cell that normally endorse Democratic candidates; you could then easily examine the effects of both foreign and domestic policies on political endorsements.

It’s extremely difficult, however, to analyze a typology as a dependent variable. If you want to discover why newspapers fall into the different cells of the typology, you’re in trouble. That becomes apparent when we consider the ways you might construct and read your tables. Assume, for example, that you want to examine the effects of community size on political policies. With a single dimension, you could easily determine the percentages of rural and urban newspapers that were scored conservative and liberal on your index or scale.

With a typology, however, you would have to present the distribution of the urban newspapers in your sample among types A, B, C, and D. Then you would repeat the procedure for the rural ones in the sample and compare the two distributions. Let’s suppose that 80 percent of the rural newspapers are scored as type A (conservative on both dimensions), compared with 30 percent of the urban ones. Moreover, suppose that only 5 percent of the rural newspapers are scored as type B (conservative only on domestic issues), compared with 40 percent of the urban ones. It would be incorrect to conclude from an examination of type B that urban newspapers are more conservative on domestic issues than rural ones are, because 85 percent of the rural newspapers, compared with 70 percent of the urban ones, have this characteristic. The relative sparsity of rural newspapers in type B is due to their concentration in type A. It should be apparent that an interpretation of such data would be very difficult for anything other than description.

In reality, you’d probably examine two such dimensions separately, especially if the dependent

variable has more categories of responses than the given example does.

Don’t think that typologies should always be avoided in social research; often they provide the most appropriate device for understanding the data. To examine the pro-life orientation in depth, for example, you might create a typology involving both abortion and capital punishment. Libertarianism could be seen in terms of both economic and social permissiveness. You’ve now been warned, however, against the special difficulties involved in using typologies as dependent variables.

Main Points

Introduction

• Single indicators of variables seldom (1) capture all the dimensions of a concept, (2) have sufficiently clear validity to warrant their use, or (3) permit the desired range of variation to allow ordinal rankings. Composite measures, such as scales and indexes, solve these problems by including several indicators of a variable in one summary measure.

Indexes versus Scales

• Although both indexes and scales are intended as ordinal measures of variables, scales typically satisfy this intention better than indexes do.

• Whereas indexes are based on the simple cumulation of indicators of a variable, scales take advantage of any logical or empirical intensity structures that exist among a variable’s indicators.

Index Construction

• The principal steps in constructing an index include selecting possible items, examining their empirical relationships, scoring the index, and validating it.

• Criteria of item selection include face validity, unidimensionality, the degree of specificity with which a dimension is to be measured, and the amount of variance provided by the items.

• If different items are indeed indicators of the same variable, then they should be related empirically to one another. In constructing an index, the researcher needs to examine bivariate and multivariate relationships among the items.

• Index scoring involves deciding the desirable range of scores and determining whether items will have equal or different weights.

• There are various techniques that allow items to be used in an index in spite of missing data.

• Item analysis is a type of internal validation, based on the relationship between individual items in the composite measure and the measure itself. External validation refers to the relationships between the composite measure and other indicators of the variable, that is, indicators not included in the measure.

Scale Construction

• Four types of scaling techniques are represented by the Bogardus social distance scale, a device for measuring the varying degrees to which a person would be willing to associate with a given class of people; Thurstone scaling, a technique that uses judges to determine the intensities of different indicators; Likert scaling, a measurement technique based on the use of standardized response categories; and Guttman scaling, a method of discovering and using the empirical intensity structure among several indicators of a given variable. Guttman scaling is probably the most popular scaling technique in social research today.

• The semantic differential is a question format that asks respondents to make ratings that lie between two extremes, such as “very positive” and “very negative.”

Typologies

• A typology is a nominal composite measure often used in social research. Typologies may be used effectively as independent variables, but interpretation is difficult when they are used as dependent variables.

Key Terms

The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

Bogardus social distance scale
external validation
Guttman scale
index
item analysis
Likert scale
scale
semantic differential
Thurstone scale
typology

Proposing Social Research: Composite Measures

This chapter has extended the issue of measurement to include cases in which variables are measured by more than one indicator. What you have learned here may extend the discussion of measurement in your proposal. As in the case of operationalization, you may find this easier to formulate in the case of quantitative studies, but the logic of multiple indicators may be applied to all research methods.

If your study will involve the use of composite measures, you should identify the type(s), the indicators to be used in their construction, and the methods you’ll use to create and validate them. If the study you are planning in this series of exercises will not include composite measures, you can test your understanding of the chapter by exploring ways in which they could be used, even if you need to temporarily vary the data-collection method and/or variables you have in mind.

Review Questions and Exercises

1. In your own words, describe the difference between an index and a scale.

2. Suppose you wanted to create an index for rating the quality of colleges and universities. Name three data items that might be included in such an index.

3. Make up three questionnaire items that measure attitudes toward nuclear power and that would probably form a Guttman scale.

4. Construct a typology of pro-life attitudes as discussed in the chapter.

5. Economists often use indexes to measure economic variables, such as the cost of living. Go to the Bureau of Labor Statistics link on your Sociology CourseMate at www.cengagebrain.com and find the Consumer Price Index survey. What are some of the dimensions of living costs included in this measure?

SPSS Exercises

See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you’ll also find a detailed primer on using SPSS.

Online Study Resources

Access the resources your instructor has assigned. For this book, you can access:

CourseMate for The Practice of Social Research

Login to CengageBrain.com to access chapter-specific learning tools including Learning Objectives, Practice Quizzes, Videos, Internet Exercises, Flash Cards, Glossaries, Web Links, and more from your Sociology CourseMate.

If your professor has assigned Aplia homework:

1. Sign into your account.
2. After you complete each page of questions, click “Grade It Now” to see detailed explanations of every answer.
3. Click “Try Another Version” for an opportunity to improve your score.

Visit www.cengagebrain.com to access your account and purchase materials.

