
Earl R. Babbie, The Practice of Social Research

Published by dinakan, 2021-08-12 20:16:58

Description: This e-book is for reading purposes only and is not intended for profit.

CHAPTER FIVE
Conceptualization, Operationalization, and Measurement

CHAPTER OVERVIEW
The interrelated steps of conceptualization, operationalization, and measurement allow researchers to turn a general idea for a research topic into useful and valid measurements in the real world. An essential part of this process involves transforming the relatively vague terms of ordinary language into precise objects of study with well-defined and measurable meanings.

Introduction
Measuring Anything That Exists
Conceptions, Concepts, and Reality
Concepts as Constructs
Conceptualization
Indicators and Dimensions
The Interchangeability of Indicators
Real, Nominal, and Operational Definitions
Creating Conceptual Order
An Example of Conceptualization: The Concept of Anomie
Definitions in Descriptive and Explanatory Studies
Operationalization Choices
Range of Variation
Variations between the Extremes
A Note on Dimensions
Defining Variables and Attributes
Levels of Measurement
Single or Multiple Indicators
Some Illustrations of Operationalization Choices
Operationalization Goes On and On
Criteria of Measurement Quality
Precision and Accuracy
Reliability
Validity
Who Decides What’s Valid?
Tension between Reliability and Validity
The Ethics of Measurement

CengageNOW for Sociology
Use this online tool to help you make the grade on your next exam. After reading this chapter, go to “Online Study Resources” at the end of the chapter for instructions on how to benefit from CengageNOW.

Introduction

This chapter and the next deal with how researchers move from a general idea about what they want to study to effective and well-defined measurements in the real world. This chapter discusses the interrelated processes of conceptualization, operationalization, and measurement. Chapter 6 builds on this foundation to discuss types of measurements that are more complex.

Consider a notion such as “satisfaction with college.” I’m sure you know some people who are very satisfied, some who are very dissatisfied, and many who are between those extremes. Moreover, you can probably place yourself somewhere along that satisfaction spectrum. While this probably makes sense to you as a general matter, how would you go about measuring how different students were, so you could place them along that spectrum?

There are some comments students make in conversations (such as “This place sucks”) that would tip you off as to where they stood. Or, in a more active effort, you can probably think of questions you might ask students to learn about their satisfaction (such as “How satisfied are you with . . . ?”). Perhaps there are certain behaviors (class attendance, use of campus facilities, setting the dean’s office on fire) that would suggest different levels of satisfaction. As you think about ways of measuring satisfaction with college, you are engaging in the subject matter of this chapter.

We begin by confronting the hidden concern people sometimes have about whether it’s truly possible to measure the stuff of life: love, hate, prejudice, religiosity, radicalism, alienation. The answer is yes, but it will take a few pages to see how. Once we establish that researchers can measure anything that exists, we’ll turn to the steps involved in doing just that.

Measuring Anything That Exists

Earlier in this book, I said that one of the two pillars of science is observation. Because this word can suggest a casual, passive activity, scientists often use the term measurement instead, meaning careful, deliberate observations of the real world for the purpose of describing objects and events in terms of the attributes composing a variable.

You may have some reservations about the ability of science to measure the really important aspects of human social existence. If you’ve read research reports dealing with something like liberalism or religion or prejudice, you may have been dissatisfied with the way the researchers measured whatever they were studying. You may have felt that they were too superficial, that they missed the aspects that really matter most. Maybe they measured religiosity as the number of times a person went to religious services, or maybe they measured liberalism by how people voted in a single election. Your dissatisfaction would surely have increased if you had found yourself being misclassified by the measurement system.

Your feeling of dissatisfaction reflects an important fact about social research: Most of the variables we want to study don’t actually exist in the way that rocks exist. Indeed, they are made up. Moreover, they seldom have a single, unambiguous meaning.

To see what I mean, suppose we want to study political party affiliation. To measure this variable, we might consult the list of registered voters to note whether the people we were studying were registered as Democrats or Republicans and take that as a measure of their party affiliation. But we could also simply ask someone what party they identify with and take their response as our measure. Notice that these two different measurement possibilities reflect somewhat different definitions

of political party affiliation. They might even produce different results: Someone may have registered as a Democrat years ago but gravitated more and more toward a Republican philosophy over time. Or someone who is registered with neither political party may, when asked, say she is affiliated with the one she feels the most kinship with.

Similar points apply to religious affiliation. Sometimes this variable refers to official membership in a particular church, temple, mosque, and so forth; other times it simply means whatever religion, if any, you identify yourself with. Perhaps to you it means something else, such as attendance at religious services.

The truth is that neither party affiliation nor religious affiliation has any real meaning, if by “real” we mean corresponding to some objective aspect of reality. These variables do not exist in nature. They are merely terms we’ve made up and assigned specific meanings to for some purpose, such as doing social research.

But, you might object, political affiliation and religious affiliation (and a host of other things social researchers are interested in, such as prejudice or compassion) have some reality. After all, researchers make statements about them, such as “In Happytown, 55 percent of the adults affiliate with the Republican Party, and 45 percent of them are Episcopalians. Overall, people in Happytown are low in prejudice and high in compassion.” Even ordinary people, not just social researchers, have been known to make statements like that. If these things don’t exist in reality, what is it that we’re measuring and talking about?

What indeed? Let’s take a closer look by considering a variable of interest to many social researchers (and many other people as well): prejudice.

Conceptions, Concepts, and Reality

As you and I wandered down the road of life, we observed a lot of things and knew they were real through our observations, and we heard reports from other people that seemed real. For example:

• We personally heard people say nasty things about minority groups.
• We heard people say that women were inferior to men.
• We read about African Americans being lynched.
• We read that women and minorities earned less for the same work.
• We learned about “ethnic cleansing” and wars in which one ethnic group tried to eradicate another.

With additional experience, we noticed something more. People who participated in lynching were also quite likely to call African Americans ugly names. A lot of them, moreover, seemed to want women to “stay in their place.” Eventually it dawned on us that these several tendencies often appeared together in the same people and also had something in common. At some point, someone had a bright idea: “Let’s use the word prejudiced as a shorthand notation for people like that. We can use the term even if they don’t do all those things, as long as they’re pretty much like that.”

Being basically agreeable and interested in efficiency, we went along with the system. That’s where “prejudice” came from. We never observed it. We just agreed to use it as a shortcut, a name that represents a collection of apparently related phenomena that we’ve each observed in the course of life. In short, we made it up.

Here’s another clue that prejudice isn’t something that exists apart from our rough agreement to use the term in a certain way. Each of us develops our own mental image of what the set of real phenomena we’ve observed represents in general and what these phenomena have in common. When I say the word prejudice, it evokes a mental image in your mind, just as it evokes one in mine. It’s as though file drawers in our minds contained thousands of sheets of paper, with each sheet of paper labeled in the upper right-hand corner. A sheet of paper in each of our minds has the term prejudice on it. On your sheet are all the things you’ve been told about prejudice and everything you’ve

observed that seems to be an example of it. My sheet has what I’ve been told about it plus all the things I’ve observed that seem examples of it, and mine isn’t the same as yours.

The technical term for those mental images, those sheets of paper in our mental file drawers, is conception. That is, I have a conception of prejudice, and so do you. We can’t communicate these mental images directly, so we use the terms written in the upper right-hand corner of our own mental sheets of paper as a way of communicating about our conceptions and the things we observe that are related to those conceptions. These terms make it possible for us to communicate and eventually agree on what we specifically mean by those terms. In social research, the process of coming to an agreement about what terms mean is conceptualization, and the result is called a concept. See “Keeping Humanity in Focus” for a glimpse at a project that reveals a lot about conceptualization.

Keeping Humanity in Focus

In the early 1970s, Elijah Anderson spent three years observing life in a black, working-class neighborhood in South Chicago, focusing on Jelly’s, a combination bar and liquor store. While some people still believe that impoverished neighborhoods in the inner city are socially chaotic and disorganized, Anderson’s study and others like it have clearly demonstrated a definite social structure there that guides the behavior of its participants. Much of his interest centered on systems of social status and how the 55 or so regulars at Jelly’s worked those systems to establish themselves among their peers.

In the second edition of this classic study of urban life, Elijah Anderson returned to Jelly’s and the surrounding neighborhood. There he found several changes, largely due to the outsourcing of manufacturing jobs overseas that has brought economic and mental depression to many of the residents. These changes, in turn, had also altered the nature of social organization.

For a research methods student, the book offers many insights into the process of establishing rapport with people being observed in their natural surroundings. Further, he offers excellent examples of how concepts are established in qualitative research.

You can read excerpts of the book online and can hear Anderson discuss the book in an interview with BBC’s Laurie Taylor at the links offered on this book’s website: http://www.cengage.com/sociology/babbie.

A Place on the Corner: A Study of Black Street Corner Men by Elijah Anderson. © 2004 University of Chicago Press

conceptualization: The mental process whereby fuzzy and imprecise notions (concepts) are made more specific and precise. So you want to study prejudice. What do you mean by “prejudice”? Are there different kinds of prejudice? What are they?

I’m sure you’ve heard some reference to the many words Eskimos have for snow, as an example of how environment can shape language. Here’s an exercise you might enjoy when you’re ready to take a break from reading. Search the web for “Eskimo words for snow.” You may be surprised by what you find. You’re likely to discover wide disagreement on the number of, say, Inuit words, ranging from 1 to 400. Several sources, moreover, will suggest that if the Inuit have several words for snow, so does English. Cecil Adams, for example, lists “snow, slush, sleet, hail, powder, hard pack, blizzard, flurries, flake, dusting, crust, avalanche,

drift, frost, and iceberg” (Straight Dope 2001). This illustrates the ambiguities in the field with regard to the concepts and words that we use in everyday communications and that also serve as the grounding for social research.

Let’s take another example of a conception. Suppose that I’m going to meet someone named Pat, whom you already know. I ask you what Pat is like. Now suppose that you’ve seen Pat help lost children find their parents and put a tiny bird back in its nest. Pat got you to take turkeys to poor families on Thanksgiving and to visit a children’s hospital on Christmas. You’ve seen Pat weep through a movie about a mother overcoming adversities to save and protect her child. As you search through your mental files, you may find all or most of those phenomena recorded on a single sheet labeled “compassionate.” You look over the other entries on the page, and you find they seem to provide an accurate description of Pat. So you say, “Pat is compassionate.”

Now I leaf through my own mental file drawer until I find a sheet marked “compassionate.” I then look over the things written on my sheet, and I say, “Oh, that’s nice.” I now feel I know what Pat is like, but my expectations reflect the entries on my file sheet, not yours. Later, when I meet Pat, I happen to find that my own experiences correspond to the entries I have on my “compassionate” file sheet, and I say that you sure were right.

But suppose my observations of Pat contradict the things I have on my file sheet. I tell you that I don’t think Pat is very compassionate, and we begin to compare notes.

You say, “I once saw Pat weep through a movie about a mother overcoming adversity to save and protect her child.” I look at my “compassionate” sheet and can’t find anything like that. Looking elsewhere in my file, I locate that sort of phenomenon on a sheet labeled “sentimental.” I retort, “That’s not compassion. That’s just sentimentality.”

To further strengthen my case, I tell you that I saw Pat refuse to give money to an organization dedicated to saving whales from extinction. “That represents a lack of compassion,” I argue. You search through your files and find saving the whales on two sheets, “environmental activism” and “cross-species dating,” and you say so. Eventually, we set about comparing the entries we have on our respective sheets labeled “compassionate.” We then discover that many of our mental images corresponding to that term differ.

In the big picture, language and communication work only to the extent that you and I have considerable overlap in the kinds of entries we have on our corresponding mental file sheets. The similarities we have on those sheets represent the agreements existing in our society. As we grow up, we’re told approximately the same thing when we’re first introduced to a particular term, though our nationality, gender, race, ethnicity, region, language, or other cultural factors may shade our understanding of concepts.

Dictionaries formalize the agreements our society has about such terms. Each of us, then, shapes his or her mental images to correspond with such agreements. But because all of us have different experiences and observations, no two people end up with exactly the same set of entries on any sheet in their file systems. If we want to measure “prejudice” or “compassion,” we must first stipulate what, exactly, counts as prejudice or compassion for our purposes.

Returning to the assertion made at the outset of this chapter, we can measure anything that’s real. We can measure, for example, whether Pat actually puts the little bird back in its nest, visits the hospital on Christmas, weeps at the movie, or refuses to contribute to saving the whales. All of those behaviors exist, so we can measure them. But is Pat really compassionate? We can’t answer that question; we can’t measure compassion in any objective sense, because compassion doesn’t exist in the way that those things I just described exist. Compassion exists only in the form of the agreements we have about how to use the term in communicating about things that are real.

Concepts as Constructs

If you recall the discussions of postmodernism in Chapter 1, you’ll recognize that some people would object to the degree of “reality” I’ve allowed in the preceding comments. Did Pat “really” visit the

hospital on Christmas? Does the hospital “really” exist? Does Christmas? Though we aren’t going to be radically postmodern in this chapter, I think you’ll recognize the importance of an intellectually tough view of what’s real and what’s not. (When the intellectual going gets tough, the tough become social scientists.)

In this context, Abraham Kaplan (1964) distinguishes three classes of things that scientists measure. The first class is direct observables: those things we can observe rather simply and directly, like the color of an apple or the check mark on a questionnaire. The second class, indirect observables, requires “relatively more subtle, complex, or indirect observations” (1964: 55). We note a person’s check mark beside “female” in a questionnaire and have indirectly observed that person’s gender. History books or minutes of corporate board meetings provide indirect observations of past social actions. Finally, the third class of observables consists of constructs: theoretical creations that are based on observations but that cannot be observed directly or indirectly. A good example is intelligence quotient, or IQ. It is constructed mathematically from observations of the answers given to a large number of questions on an IQ test. No one can directly or indirectly observe IQ. It is no more a “real” characteristic of people than is compassion or prejudice. See Table 5-1 for more examples of what social scientists measure.

TABLE 5-1 What Social Scientists Measure
Direct observables. Example: Physical characteristics (sex, height, skin color) of a person being observed and/or interviewed
Indirect observables. Example: Characteristics of a person as indicated by answers given in a self-administered questionnaire
Constructs. Example: Level of alienation, as measured by a scale that is created by combining several direct and/or indirect observables

Kaplan (1964: 49) defines concept as a “family of conceptions.” A concept is, as Kaplan notes, a construct, something we create. Concepts such as compassion and prejudice are constructs created from your conception of them, my conception of them, and the conceptions of all those who have ever used these terms. They cannot be observed directly or indirectly, because they don’t exist. We made them up.

To summarize, concepts are constructs derived by mutual agreement from mental images (conceptions). Our conceptions summarize collections of seemingly related observations and experiences. Although the observations and experiences are real, at least subjectively, conceptions, and the concepts derived from them, are only mental creations. The terms associated with concepts are merely devices created for the purposes of filing and communication. A term such as prejudice is, objectively speaking, only a collection of letters. It has no intrinsic reality beyond that. It has only the meaning we agree to give it. See “A Concept in Search of a Label” for one example of such an agreement.

Usually, however, we fall into the trap of believing that terms for constructs do have intrinsic meaning, that they name real entities in the world. That danger seems to grow stronger when we begin to take terms seriously and attempt to use them precisely. Further, the danger is all the greater in the presence of experts who appear to know more than we do about what the terms really mean: It’s easy to yield to authority in such a situation.

Once we assume that terms like prejudice and compassion have real meanings, we begin the tortured task of discovering what those real meanings are and what constitutes a genuine measurement of them. Regarding constructs as real is called reification. The reification of concepts in day-to-day life is quite common. In science, we want to be quite clear about what it is we are actually

measuring, but this aim brings a pitfall with it. Settling on the “best” way of measuring a variable in a particular study may imply that we’ve discovered the “real” meaning of the concept involved. In fact, concepts have no real, true, or objective meanings, only those we agree are best for a particular purpose.

Does this discussion imply that compassion, prejudice, and similar constructs can’t be measured? Interestingly, the answer is no. (And a good thing, too, or a lot of us social researcher types would be out of work.) I’ve said that we can measure anything that’s real. Constructs aren’t real in the way that trees are real, but they do have another important virtue: They are useful. That is, they help us organize, communicate about, and understand things that are real. They help us make predictions about real things. Some of those predictions even turn out to be true. Constructs can work this way because, although not real or observable in themselves, they have a definite relationship to things that are real and observable. The bridge from direct and indirect observables to useful constructs is the process called conceptualization.

A Concept in Search of a Label

In the late 1950s, in the heat of the Cold War between the United States and the Communist bloc, American foreign policy came under criticism for being sometimes arrogant and thoughtless regarding the cultures and concerns of other countries, especially the recipients of U.S. aid. The image of arrogant and thoughtless Americans abroad became an entrenched and vivid image that required a label. From the late 1950s to this day, a rude and insensitive American abroad has been known as an “ugly American.”

This term was taken from a 1958 political novel of the same name, by William J. Lederer and Eugene Burdick. Ironically, however, the meaning of the term was completely perverted. In the book, Homer Atkins is the hero, a retired engineer, volunteering to help villagers in a country strikingly similar to Vietnam. Rather than pushing the villagers around and superimposing his will, he is a thoughtful and considerate listener. When the local people discuss problems and possible solutions, he looks for ways his engineering training and lifetime of experience can be of use. If they need to get river water up a hill to irrigate fields, he uses his knowledge of hydrology to show them how they can do it. In retrospect, he was an early model for participatory action research, which we’ll examine in Chapter 10 of this book.

So, why was Atkins called “the ugly American”? The authors used the term as a somewhat ironic reference to the main character’s homeliness; that’s it. The public’s misappropriation of the term points to the strong societal need to name a concept that was very real for many people.

Each of the concepts studied by social researchers began as a mental image, ultimately requiring a label to allow communication about it, just as the concept of the boorish, bossy American found “ugly American” as its label.

Source: William Lederer and Eugene Burdick, The Ugly American (New York: Norton, 1958).

Conceptualization

As we’ve seen, day-to-day communication usually occurs through a system of vague and general agreements about the use of terms. Although you and I do not agree completely about the use of the term compassionate, I’m probably safe in assuming that Pat won’t pull the wings off flies. A wide range of misunderstandings and conflict, from the interpersonal to the international, is the price we pay for our imprecision, but somehow we muddle through. Science, however, aims at more than muddling; it cannot operate in a context of such imprecision.

The process through which we specify what we mean when we use particular terms in research is called conceptualization. Suppose we want to find out, for example, whether women are more compassionate than men. I suspect many people assume this is the case, but it might be interesting to find out if it’s really so. We can’t meaningfully study the question, let alone agree on the answer, without some working agreements about the meaning of compassion. They are “working”

agreements in the sense that they allow us to work on the question. We don’t need to agree or even pretend to agree that a particular specification is ultimately the best one.

Conceptualization, then, produces a specific, agreed-on meaning for a concept for the purposes of research. This process of specifying exact meaning involves describing the indicators we’ll be using to measure our concept and the different aspects of the concept, called dimensions.

Indicators and Dimensions

Conceptualization gives definite meaning to a concept by specifying one or more indicators of what we have in mind. An indicator is a sign of the presence or absence of the concept we’re studying. Here’s an example.

We might agree that visiting children’s hospitals during Christmas and Hanukkah is an indicator of compassion. Putting little birds back in their nests might be agreed on as another indicator, and so forth. If the unit of analysis for our study is the individual person, we can then observe the presence or absence of each indicator for each person under study. Going beyond that, we can add up the number of indicators of compassion observed for each individual. We might agree on ten specific indicators, for example, and find six present in our study of Pat, three for John, nine for Mary, and so forth.

Returning to our question about whether men or women are more compassionate, we might calculate that the women we studied displayed an average of 6.5 indicators of compassion, the men an average of 3.2. On the basis of our quantitative analysis of group difference, we might therefore conclude that women are, on the whole, more compassionate than men.

indicator: An observation that we choose to consider as a reflection of a variable we wish to study. Thus, for example, attending religious services might be considered an indicator of religiosity.

Usually, though, it’s not that simple. Imagine you’re interested in understanding a small fundamentalist religious cult, particularly their harsh views on various groups: gays, nonbelievers, feminists, and others. In fact, they suggest that anyone who refuses to join their group and abide by its teachings will “burn in hell.” In the context of your interest in compassion, they don’t seem to have much. And yet, the group’s literature often speaks of their compassion for others. You want to explore this seeming paradox.

To pursue this research interest, you might arrange to interact with cult members, getting to know them and learning more about their views. You could tell them you were a social researcher interested in learning about their group, or perhaps you would just express an interest in learning more, without saying why.

In the course of your conversations with group members and perhaps attendance of religious services, you would put yourself in situations where you could come to understand what the cult members mean by compassion. You might learn, for example, that members of the group were so deeply concerned about sinners burning in hell that they were willing to be aggressive, even violent, to make people change their sinful ways. Within their own paradigm, then, cult members would see beating up gays, prostitutes, and abortion doctors as acts of compassion.

Social researchers focus their attention on the meanings that the people under study give to words and actions. Doing so can often clarify the behaviors observed: At least now you understand how the cult can see violent acts as compassionate. On the other hand, paying attention to what words and actions mean to the people under study almost always complicates the concepts researchers are interested in. (We’ll return to this issue when we discuss the validity of measures, toward the end of this chapter.)

Whenever we take our concepts seriously and set about specifying what we mean by them, we discover disagreements and inconsistencies. Not only do you and I disagree, but each of us is likely to find a good deal of muddiness within our own mental images. If you take a moment to look at what you mean by compassion, you’ll probably find that your image contains several kinds of

compassion. That is, the entries on your mental file sheet can be combined into groups and subgroups, say, compassion toward friends, co-religionists, humans, and birds. You may also find several different strategies for making combinations. For example, you might group the entries into feelings and actions.

The technical term for such groupings is dimension: a specifiable aspect of a concept. For instance, we might speak of the “feeling dimension” of compassion and the “action dimension” of compassion. In a different grouping scheme, we might distinguish “compassion for humans” from “compassion for animals.” Or we might see compassion as helping people have what we want for them versus what they want for themselves. Still differently, we might distinguish compassion as forgiveness from compassion as pity.

dimension: A specifiable aspect of a concept. “Religiosity,” for example, might be specified in terms of a belief dimension, a ritual dimension, a devotional dimension, a knowledge dimension, and so forth.

Thus, we could subdivide compassion into several clearly defined dimensions. A complete conceptualization involves both specifying dimensions and identifying the various indicators for each.

When Jonathan Jackson (2005: 301) set out to measure “fear of crime,” he considered seven different dimensions:

• The frequency of worry about becoming a victim of three personal crimes and two property crimes in the immediate neighbourhood . . .
• Estimates of likelihood of falling victim to each crime locally
• Perceptions of control over the possibility of becoming a victim of each crime locally
• Perceptions of the seriousness of the consequences of each crime
• Beliefs about the incidence of each crime locally
• Perceptions of the extent of social physical incivilities in the neighbourhood
• Perceptions of community cohesion, including informal social control and trust/social capital

Sometimes conceptualization aimed at identifying different dimensions of a variable leads to a different kind of distinction. We may conclude that we’ve been using the same word for meaningfully distinguishable concepts. In the following example, the researchers find (1) that “violence” is not a sufficient description of “genocide” and (2) that the concept “genocide” itself comprises several distinct phenomena. Let’s look at the process they went through to come to this conclusion.

When Daniel Chirot and Jennifer Edwards attempted to define the concept of “genocide,” they found existing assumptions were not precise enough for their purposes:

The United Nations originally defined it as an attempt to destroy “in whole or in part, a national, ethnic, racial, or religious group.” If genocide is distinct from other types of violence, it requires its own unique explanation. (2003: 14)

Notice the final comment in this excerpt, as it provides an important insight into why researchers are so careful in specifying the concepts they study. If genocide, such as the Holocaust, were simply another example of violence, like assaults and homicides, then what we know about violence in general might explain genocide. If it differs from other forms of violence, then we may need a different explanation for it. So, the researchers began by suggesting that “genocide” was a concept distinct from “violence” for their purposes.

Then, as Chirot and Edwards examined historical instances of genocide, they began concluding that the motivations for launching genocidal mayhem differed sufficiently to represent four distinct phenomena that were all called “genocide” (2003: 15–18).

1. Convenience: Sometimes the attempt to eradicate a group of people serves a function for the eradicators, such as Julius Caesar’s attempt to eradicate tribes defeated in battle, fearing they would be difficult to rule. Or when gold was discovered on Cherokee land in the Southeastern United States in the early nineteenth century, the Cherokee were forcibly relocated

to Oklahoma in an event known as the “Trail of Tears,” which ultimately killed as many as half of those forced to leave.

2. Revenge: When the Chinese of Nanking bravely resisted the Japanese invaders in the early years of World War II, the conquerors felt they had been insulted by those they regarded as inferior beings. Tens of thousands were slaughtered in the “Rape of Nanking” in 1937–1938.

3. Fear: The ethnic cleansing that recently occurred in the former Yugoslavia was at least partly motivated by economic competition and worries that the growing Albanian population of Kosovo was gaining political strength through numbers. Similarly, the Hutu attempt to eradicate the Tutsis of Rwanda grew out of a fear that returning Tutsi refugees would seize control of the country. Often intergroup fears such as these grow out of long histories of atrocities, often inflicted in both directions.

4. Purification: The Nazi Holocaust, probably the most publicized case of genocide, was intended as a purification of the “Aryan race.” While Jews were the main target, gypsies, homosexuals, and many other groups were also included. Other examples include the Indonesian witch-hunt against communists in 1965–1966 and the attempt to eradicate all non-Khmer Cambodians under Pol Pot in the 1970s.

No single theory of genocide could explain these various forms of mayhem. Indeed, this act of conceptualization suggests four distinct phenomena, each needing a different set of explanations.

Specifying the different dimensions of a concept often paves the way for a more sophisticated understanding of what we’re studying. We might observe, for example, that women are more compassionate in terms of feelings, and men more so in terms of actions, or vice versa. Whichever turned out to be the case, we would not be able to say whether men or women are really more compassionate. Our research would have shown that there is no single answer to the question. That alone represents an advance in our understanding of reality.

To get a better feel for concepts, variables, and indicators, go to the General Social Survey codebook and explore some of the ways the researchers have measured various concepts (see the link on this book’s website: http://www.cengage.com/sociology/babbie).

The Interchangeability of Indicators

There is another way that the notion of indicators can help us in our attempts to understand reality by means of “unreal” constructs. Suppose, for the moment, that you and I have compiled a list of 100 indicators of compassion and its various dimensions. Suppose further that we disagree widely on which indicators give the clearest evidence of compassion or its absence. If we pretty much agree on some indicators, we could focus our attention on those, and we would probably agree on the answer they provided. We would then be able to say that some people are more compassionate than others in some dimension. But suppose we don’t really agree on any of the possible indicators. Surprisingly, we can still reach an agreement on whether men or women are the more compassionate. How we do that has to do with the interchangeability of indicators.

The logic works like this. If we disagree totally on the value of the indicators, one solution would be to study all of them. Suppose that women turn out to be more compassionate than men on all 100 indicators, on all the indicators you favor and on all of mine. Then we would be able to agree that women are more compassionate than men, even though we still disagree on exactly what compassion means in general.

The interchangeability of indicators means that if several different indicators all represent, to some degree, the same concept, then all of them will behave the same way that the concept would behave if it were real and could be observed. Thus, given a basic agreement about what “compassion” is, if women are generally more compassionate than men, we should be able to observe that difference by using any reasonable measure of compassion. If, on the other hand, women are more compassionate than men on some indicators but not on others,

we should see if the two sets of indicators represent different dimensions of compassion.

You have now seen the fundamental logic of conceptualization and measurement. The discussions that follow are mainly refinements and extensions of what you’ve just read. Before turning to a technical elaboration of measurement, however, we need to fill out the picture of conceptualization by looking at some of the ways social researchers provide standards, consistency, and commonality for the meanings of terms.

Real, Nominal, and Operational Definitions

As we have seen, the design and execution of social research requires us to clear away the confusion over concepts and reality. To this end, logicians and scientists have found it useful to distinguish three kinds of definitions: real, nominal, and operational.

The first of these reflects the reification of terms. As Carl Hempel cautions,

A “real” definition, according to traditional logic, is not a stipulation determining the meaning of some expression but a statement of the “essential nature” or the “essential attributes” of some entity. The notion of essential nature, however, is so vague as to render this characterization useless for the purposes of rigorous inquiry. (1952: 6)

In other words, trying to specify the “real” meaning of concepts only leads to a quagmire: It mistakes a construct for a real entity.

The specification of concepts in scientific inquiry depends instead on nominal and operational definitions. A nominal definition is one that is simply assigned to a term without any claim that the definition represents a “real” entity. Nominal definitions are arbitrary (I could define compassion as “plucking feathers off helpless birds” if I wanted to), but they can be more or less useful. For most purposes, especially communication, that last definition of compassion would be pretty useless. Most nominal definitions represent some consensus, or convention, about how a particular term is to be used.

specification: The process through which concepts are made more specific.

An operational definition, as you may remember from Chapter 4, specifies precisely how a concept will be measured, that is, the operations we’ll perform. An operational definition is nominal rather than real, but it has the advantage of achieving maximum clarity about what a concept means in the context of a given study. In the midst of disagreement and confusion over what a term “really” means, we can specify a working definition for the purposes of an inquiry. Wishing to examine socioeconomic status (SES) in a study, for example, we may simply specify that we are going to treat SES as a combination of income and educational attainment. In this decision, we rule out other possible aspects of SES: occupational status, money in the bank, property, lineage, lifestyle, and so forth. Our findings will then be interesting to the extent that our definition of SES is useful for our purpose.

Creating Conceptual Order

The clarification of concepts is a continuing process in social research. Catherine Marshall and Gretchen Rossman (1995: 18) speak of a “conceptual funnel” through which a researcher’s interest becomes increasingly focused. Thus, a general interest in social activism could narrow to “individuals who are committed to empowerment and social change” and further focus on discovering “what experiences shaped the development of fully committed social activists.” This focusing process is inescapably linked to the language we use.

In some forms of qualitative research, the clarification of concepts is a key element in the collection of data. Suppose you were conducting interviews and observations in a radical political group devoted to combating oppression in U.S. society. Imagine how the meaning of oppression would shift as you delved more and more deeply into the members’ experiences and worldviews.

For example, you might start out thinking of oppression in physical and perhaps economic terms. The more you learned about the group, however, the more you might appreciate the possibility of psychological oppression.

The same point applies even to contexts where meanings might seem more fixed. In the analysis of textual materials, for example, social researchers sometimes speak of the “hermeneutic circle,” a cyclical process of ever-deeper understanding.

The understanding of a text takes place through a process in which the meaning of the separate parts is determined by the global meaning of the text as it is anticipated. The closer determination of the meaning of the separate parts may eventually change the originally anticipated meaning of the totality, which again influences the meaning of the separate parts, and so on. (Kvale 1996: 47)

Consider the concept “prejudice.” Suppose you needed to write a definition of the term. You might start out thinking about racial/ethnic prejudice. At some point you would realize you should probably allow for gender prejudice, religious prejudice, antigay prejudice, and the like in your definition. Examining each of these specific types of prejudice would affect your overall understanding of the general concept. As your general understanding changed, however, you would likely see each of the individual forms somewhat differently.

The continual refinement of concepts occurs in all social research methods. Often you will find yourself refining the meaning of important concepts even as you write up your final report.

Although conceptualization is a continuing process, it is vital to address it specifically at the beginning of any study design, especially rigorously structured research designs such as surveys and experiments. In a survey, for example, operationalization results in a commitment to a specific set of questionnaire items that will represent the concepts under study. Without that commitment, the study could not proceed.

Even in less-structured research methods, however, it’s important to begin with an initial set of anticipated meanings that can be refined during data collection and interpretation. No one seriously believes we can observe life with no preconceptions; for this reason, scientific observers must be conscious of and explicit about these conceptual starting points.

Let’s explore initial conceptualization the way it applies to structured inquiries such as surveys and experiments. Though specifying nominal definitions focuses our observational strategy, it does not allow us to observe. As a next step we must specify exactly what we are going to observe, how we will do it, and what interpretations we are going to place on various possible observations. All these further specifications make up the operational definition of the concept.

In the example of socioeconomic status, we might decide to ask survey respondents two questions, corresponding to the decision to measure SES in terms of income and educational attainment:

1. What was your total family income during the past 12 months?
2. What is the highest level of school you completed?

To organize our data, we’d probably want to specify a system for categorizing the answers people give us. For income, we might use categories such as “under $5,000,” “$5,000 to $10,000,” and so on. Educational attainment might be similarly grouped in categories: less than high school, high school, college, graduate degree. Finally, we would specify the way a person’s responses to these two questions would be combined in creating a measure of SES.

In this way we would create a working and workable definition of SES. Although others might disagree with our conceptualization and operationalization, the definition would have one essential scientific virtue: It would be absolutely specific and unambiguous. Even if someone disagreed with our definition, that person would have a good idea how to interpret our research results, because what we meant by SES (reflected in our analyses and conclusions) would be precise and clear.

Table 5-2 shows the progression of measurement steps from our vague sense of what a term

means to specific measurements in a fully structured scientific study.

TABLE 5-2 Progression of Measurement

Measurement Step | Example: Social Class
Conceptualization | What are the different meanings and dimensions of the concept “social class”?
Nominal definition | For our study, we will define “social class” as representing economic differences: specifically, income.
Operational definition | We will measure economic differences via responses to the survey question “What was your annual income, before taxes, last year?”
Measurements in the real world | The interviewer will ask, “What was your annual income, before taxes, last year?”

An Example of Conceptualization: The Concept of Anomie

To bring this discussion of conceptualization in research together, let’s look briefly at the history of a specific social science concept. Researchers studying urban riots are often interested in the part played by feelings of powerlessness. Social scientists sometimes use the word anomie in this context. This term was first introduced into social science by Emile Durkheim, the great French sociologist, in his classic 1897 study, Suicide.

Using only government publications on suicide rates in different regions and countries, Durkheim produced a work of analytic genius. To determine the effects of religion on suicide, he compared the suicide rates of predominantly Protestant countries with those of predominantly Catholic ones, Protestant regions of Catholic countries with Catholic regions of Protestant countries, and so forth. To determine the possible effects of the weather, he compared suicide rates in northern and southern countries and regions, and he examined the different suicide rates across the months and seasons of the year. Thus, he could draw conclusions about a supremely individualistic and personal act without having any data about the individuals engaging in it.

At a more general level, Durkheim suggested that suicide also reflects the extent to which a society’s agreements are clear and stable. Noting that times of social upheaval and change often present individuals with grave uncertainties about what is expected of them, Durkheim suggested that such uncertainties cause confusion, anxiety, and even self-destruction. To describe this societal condition of normlessness, Durkheim chose the term anomie. Durkheim did not make this word up. Used in both German and French, it literally meant “without law.” The English term anomy had been used for at least three centuries before Durkheim to mean disregard for divine law. However, Durkheim created the social science concept of anomie.

In the years that have followed the publication of Suicide, social scientists have found anomie a useful concept, and many have expanded on Durkheim’s use. Robert Merton, in a classic article entitled “Social Structure and Anomie” (1938), concluded that anomie results from a disparity between the goals and means prescribed by a society. Monetary success, for example, is a widely shared goal in our society, yet not all individuals have the resources to achieve it through acceptable means. An emphasis on the goal itself, Merton suggested, produces normlessness, because those denied the traditional avenues to wealth go about getting it through illegitimate means. Merton’s discussion, then, could be considered a further conceptualization of the concept of anomie.

Although Durkheim originally used the concept of anomie as a characteristic of societies, as did Merton after him, other social scientists have used it to describe individuals. To clarify this distinction, some scholars have chosen to use

anomie in reference to its original, societal meaning and to use the term anomia in reference to the individual characteristic. In a given society, then, some individuals experience anomia, and others do not. Elwin Powell, writing 20 years after Merton, provided the following conceptualization of anomia (though using the term anomie) as a characteristic of individuals:

When the ends of action become contradictory, inaccessible or insignificant, a condition of anomie arises. Characterized by a general loss of orientation and accompanied by feelings of “emptiness” and apathy, anomie can be simply conceived as meaninglessness. (1958: 132)

Powell went on to suggest there were two distinct kinds of anomia and to examine how the two rose out of different occupational experiences to result at times in suicide. In his study, however, Powell did not measure anomia per se; he studied the relationship between suicide and occupation, making inferences about the two kinds of anomia. Thus, the study did not provide an operational definition of anomia, only a further conceptualization.

Although many researchers have offered operational definitions of anomia, one name stands out over all. Two years before Powell’s article appeared, Leo Srole (1956) published a set of questionnaire items that he said provided a good measure of anomia as experienced by individuals. It consists of five statements that subjects were asked to agree or disagree with:

1. In spite of what some people say, the lot of the average man is getting worse.
2. It’s hardly fair to bring children into the world with the way things look for the future.
3. Nowadays a person has to live pretty much for today and let tomorrow take care of itself.
4. These days a person doesn’t really know who he can count on.
5. There’s little use writing to public officials because they aren’t really interested in the problems of the average man.

(1956: 713)

In the half-century following its publication, the Srole scale has become a research staple for social scientists. You’ll likely find this particular operationalization of anomia used in many of the research projects reported in academic journals. Srole touches on this in the accompanying box, “The Origins of Anomia,” which he prepared for this book before his death.

This abbreviated history of anomie and anomia as social science concepts illustrates several points. First, it’s a good example of the process through which general concepts become operationalized measurements. This is not to say that the issue of how to operationalize anomie/anomia has been resolved once and for all. Scholars will surely continue to reconceptualize and reoperationalize these concepts for years to come, continually seeking more-useful measures.

The Srole scale illustrates another important point. Letting conceptualization and operationalization be open-ended does not necessarily produce anarchy and chaos, as you might expect. Order often emerges. For one thing, although we could define anomia any way we chose (in terms of, say, shoe size), we’re likely to define it in ways not too different from other people’s mental images. If you were to use a really offbeat definition, people would probably ignore you.

A second source of order is that, as researchers discover the utility of a particular conceptualization and operationalization of a concept, they’re likely to adopt it, which leads to standardized definitions of concepts. Besides the Srole scale, examples include IQ tests and a host of demographic and economic measures developed by the U.S. Census Bureau. Using such established measures has two advantages: They have been extensively pretested and debugged, and studies using the same scales can be compared. If you and I do separate studies of two different groups and use the Srole scale, we can compare our two groups on the basis of anomia.

Social scientists, then, can measure anything that’s real; through conceptualization and operationalization, they can even do a pretty good job of measuring things that aren’t. Granting that such concepts as socioeconomic status, prejudice, compassion, and anomia aren’t ultimately real, social

scientists can create order in handling them. It is an order based on utility, however, not on ultimate truth.

[Text not available due to copyright restrictions]

Definitions in Descriptive and Explanatory Studies

As you’ll recall from Chapter 4, two general purposes of research are description and explanation. The distinction between them has important implications for definition and measurement. If it seems that description is simpler than explanation, you may be surprised to learn that definitions are more problematic for descriptive research than for explanatory research. Before we turn to other aspects of measurement, you’ll need a basic understanding of why this is so (we’ll discuss this point more fully in Part 4).

It’s easy to see the importance of clear and precise definitions for descriptive research. If we want to describe and report the unemployment rate in a city, our definition of being unemployed is obviously critical. That definition will depend on our definition of another term: the labor force. If it seems patently absurd to regard a three-year-old child as being unemployed, it is because such a child is not considered a member of the labor force. Thus, we might follow the U.S. Census Bureau’s

convention and exclude all people under 14 years of age from the labor force.

This convention alone, however, would not give us a satisfactory definition, because it would count as unemployed such people as high school students, the retired, the disabled, and homemakers. We might follow the census convention further by defining the labor force as “all persons 14 years of age and over who are employed, looking for work, or waiting to be called back to a job from which they have been laid off or furloughed.” If a student, homemaker, or retired person is not looking for work, such a person would not be included in the labor force. Unemployed people, then, would be those members of the labor force, as defined, who are not employed.

But what does “looking for work” mean? Must a person register with the state employment service or go from door to door asking for employment? Or would it be sufficient to want a job or be open to an offer of employment? Conventionally, “looking for work” is defined operationally as saying yes in response to an interviewer’s asking “Have you been looking for a job during the past seven days?” (Seven days is the period most often specified, but for some research purposes it might make more sense to shorten or lengthen it.)

As you can see, the conclusion of a descriptive study about the unemployment rate depends directly on how each issue of definition is resolved. Increasing the period during which people are counted as looking for work would add more unemployed people to the labor force as defined, thereby increasing the reported unemployment rate. If we follow another convention and speak of the civilian labor force and the civilian unemployment rate, we’re excluding military personnel; that, too, increases the reported unemployment rate, because military personnel would be employed, by definition. Thus, the descriptive statement that the unemployment rate in a city is 3 percent, or 9 percent, or whatever it might be, depends directly on the operational definitions used.

This example is relatively clear because there are several accepted conventions relating to the labor force and unemployment. Now, consider how difficult it would be to get agreement about the definitions you would need in order to say, “Forty-five percent of the students at this institution are politically conservative.” Like the unemployment rate, this percentage would depend directly on the definition of what is being measured: in this case, political conservatism. A different definition might result in the conclusion “Five percent of the student body are politically conservative.”

Ironically, definitions are less problematic in the case of explanatory research. Let’s suppose we’re interested in explaining political conservatism. Why are some people conservative and others not? More specifically, let’s suppose we’re interested in whether conservatism increases with age. What if you and I have 25 different operational definitions of conservative, and we can’t agree on which definition is best? As we saw in the discussion of indicators, this is not necessarily an insurmountable obstacle to our research. Suppose we found old people to be more conservative than young people in terms of all 25 definitions. Clearly, the exact definition wouldn’t matter much. We would conclude that old people are generally more conservative than young people, even though we couldn’t agree about exactly what conservative means.

In practice, explanatory research seldom results in findings quite as unambiguous as this example suggests; nonetheless, the general pattern is quite common in actual research. There are consistent patterns of relationships in human social life that result in consistent research findings. However, such consistency does not appear in a descriptive situation. Changing definitions almost inevitably results in different descriptive conclusions. “The Importance of Variable Names” explores this issue in connection with the variable citizen participation.

Operationalization Choices

In discussing conceptualization, I frequently have referred to operationalization, for the two are intimately linked. To recap: Conceptualization is the refinement and specification of abstract concepts, and operationalization is the development of

specific research procedures (operations) that will result in empirical observations representing those concepts in the real world.

The Importance of Variable Names
Patricia Fisher
Graduate School of Planning, University of Tennessee

Operationalization is one of those things that’s easier said than done. It is quite simple to explain to someone the purpose and importance of operational definitions for variables, and even to describe how operationalization typically takes place. However, until you’ve tried to operationalize a rather complex variable, you may not appreciate some of the subtle difficulties involved. Of considerable importance to the operationalization effort is the particular name that you have chosen for a variable. Let’s consider an example from the field of Urban Planning.

A variable of interest to planners is citizen participation. Planners are convinced that participation in the planning process by citizens is important to the success of plan implementation. Citizen participation is an aid to planners’ understanding of the real and perceived needs of a community, and such involvement by citizens tends to enhance their cooperation with and support for planning efforts. Although many different conceptual definitions might be offered by different planners, there would be little misunderstanding over what is meant by citizen participation. The name of the variable seems adequate.

However, if we ask different planners to provide very simple operational measures for citizen participation, we are likely to find a variety among their responses that does generate confusion. One planner might keep a tally of attendance by private citizens at city commission and other local government meetings; another might maintain a record of the different topics addressed by private citizens at similar meetings; while a third might record the number of local government meeting attendees, letters and phone calls received by the mayor and other public officials, and meetings held by special interest groups during a particular time period. As skilled researchers, we can readily see that each planner would be measuring (in a very simplistic fashion) a different dimension of citizen participation: extent of citizen participation, issues prompting citizen participation, and form of citizen participation. Therefore, the original naming of our variable, citizen participation, which was quite satisfactory from a conceptual point of view, proved inadequate for purposes of operationalization.

The precise and exact naming of variables is important in research. It is both essential to and a result of good operationalization. Variable names quite often evolve from an iterative process of forming a conceptual definition, then an operational definition, then renaming the concept to better match what can or will be measured. This looping process continues (our example illustrates only one iteration), resulting in a gradual refinement of the variable name and its measurement until a reasonable fit is obtained. Sometimes the concept of the variable that you end up with is a bit different from the original one that you started with, but at least you are measuring what you are talking about, if only because you are talking about what you are measuring!

Source: From Patricia Fisher, The Importance of Variable Names, Copyright © Patricia Fisher. Reprinted by permission.

As with the methods of data collection, social researchers have a variety of choices when operationalizing a concept. Although the several choices are intimately interconnected, I’ve separated them for the sake of discussion. Realize, though, that operationalization does not proceed through a systematic checklist.

Range of Variation

In operationalizing any concept, researchers must be clear about the range of variation that interests them. The question is, to what extent are they willing to combine attributes in fairly gross categories?

Let’s suppose you want to measure people’s incomes in a study by collecting the information from either records or interviews. The highest annual incomes people receive run into the millions of dollars, but not many people get that much. Unless you’re studying the very rich, it probably won’t add much to your study to keep track of extremely high categories. Depending on whom you study, you’ll probably want to establish a highest income category with a much lower floor, maybe $100,000 or more. Although this decision will lead you to throw together people who earn a trillion dollars a year with paupers earning a mere $100,000, they’ll survive it, and that mixing probably won’t hurt your research any, either. The same decision faces you at the other end of the income spectrum. In studies of the general U.S. population, a bottom category of $5,000 or less usually works fine.
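If it helps to see the income decision spelled out, here is a minimal sketch in Python of how answers might be sorted into categories with an open-ended top and bottom. Only the two end categories (“$5,000 or less” and “$100,000 or more”) come from the discussion above; the cut points in between, the function name income_category, and the sample incomes are assumptions made up for illustration, not a recommended coding scheme.

```python
from collections import Counter

def income_category(income: int) -> str:
    """Assign a whole-dollar annual income to one of five categories.

    The bottom and top categories follow the text; the three interior
    categories are hypothetical cut points chosen for this sketch.
    """
    if income <= 5_000:
        return "$5,000 or less"       # bottom category: everything at or below the ceiling
    if income < 25_000:
        return "$5,001 to $24,999"    # assumed interior category
    if income < 50_000:
        return "$25,000 to $49,999"   # assumed interior category
    if income < 100_000:
        return "$50,000 to $99,999"   # assumed interior category
    return "$100,000 or more"         # top category: no upper limit

# Hypothetical respondents, including one very rich outlier.
sample_incomes = [3_000, 5_000, 18_500, 60_000, 100_000, 1_000_000_000_000]

# Tally how many respondents fall into each category.
tally = Counter(income_category(x) for x in sample_incomes)
for label, count in tally.items():
    print(f"{label}: {count}")
```

Notice that the person earning a trillion dollars and the one earning exactly $100,000 land in the same category, which is the mixing the text says is usually harmless. If your study were about the very rich, you would raise the floor of the top category or add categories above it.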

In studies of attitudes and orientations, the question of range of variation has another dimension. Unless you’re careful, you may end up measuring only half an attitude without really meaning to. Here’s an example of what I mean.

Suppose you’re interested in people’s attitudes toward expanding the use of nuclear power generators. You’d anticipate that some people consider nuclear power the greatest thing since the wheel, whereas other people have absolutely no interest in it. Given that anticipation, it would seem to make sense to ask people how much they favor expanding the use of nuclear energy and to give them answer categories ranging from “Favor it very much” to “Don’t favor it at all.”

This operationalization, however, conceals half the attitudinal spectrum regarding nuclear energy. Many people have feelings that go beyond simply not favoring it: They are, with greater or lesser degrees of intensity, actively opposed to it. In this instance, there is considerable variation on the left side of zero. Some oppose it a little, some quite a bit, and others a great deal. To measure the full range of variation, then, you’d want to operationalize attitudes toward nuclear energy with a range from favoring it very much, through no feelings one way or the other, to opposing it very much.

This consideration applies to many of the variables social scientists study. Virtually any public issue involves both support and opposition, each in varying degrees. In measuring religiosity, people are not just more or less religious; some are positively antireligious. Political orientations range from very liberal to very conservative, and depending on the people you’re studying, you may want to allow for radicals on one or both ends.

The point is not that you must measure the full range of variation in every case. You should, however, consider whether you need to, given your particular research purpose. If the difference between not religious and antireligious isn’t relevant to your research, forget it. Someone has defined pragmatism as “any difference that makes no difference is no difference.” Be pragmatic.

Finally, decisions on the range of variation should be governed by the expected distribution of attributes among the subjects of the study. In a study of college professors’ attitudes toward the value of higher education, you could probably stop at no value and not worry about those who might consider higher education dangerous to students’ health. (If you were studying students, however . . .)

Variations between the Extremes

Degree of precision is a second consideration in operationalizing variables. What it boils down to is how fine you will make distinctions among the various possible attributes composing a given variable. Does it matter for your purposes whether a person is 17 or 18 years old, or could you conduct your inquiry by throwing them together in a group labeled 10 to 19 years old? Don’t answer too quickly. If you wanted to study rates of voter registration and participation, you’d definitely want to know whether the people you studied were old enough to vote. In general, if you’re going to measure age, you must look at the purpose and procedures of your study and decide whether fine or gross differences in age are important to you. In a survey, you’ll need to make these decisions in order to design an appropriate questionnaire. In the case of in-depth interviews, these decisions will condition the extent to which you probe for details.

The same thing applies to other variables. If you measure political affiliation, will it matter to your inquiry whether a person is a conservative Democrat rather than a liberal Democrat, or will it be sufficient to know the party? In measuring religious affiliation, is it enough to know that a person is Protestant, or do you need to know the denomination? Do you simply need to know whether or not a person is married, or will it make a difference to know if he or she has never married or is separated, widowed, or divorced?

There is, of course, no general answer to such questions. The answers come out of the purpose of a given study, or why we are making a particular measurement. I can give you a useful guideline, though. Whenever you’re not sure how much detail to pursue in a measurement, get too much rather than too little. When a subject in an in-depth interview volunteers that she is 37 years old, record “37” in your notes, not “in her thirties.” When you’re analyzing the data, you can always combine precise attributes into more general

categories, but you can never separate any variations you lumped together during observation and measurement.

A Note on Dimensions

We’ve already discussed dimensions as a characteristic of concepts. When researchers get down to the business of creating operational measures of variables, they often discover (or worse, never notice) that they’re not exactly clear about which dimensions of a variable they’re really interested in. Here’s an example.

Let’s suppose you’re studying people’s attitudes toward government, and you want to include an examination of how people feel about corruption. Here are just a few of the dimensions you might examine:

• Do people think there is corruption in government?
• How much corruption do they think there is?
• How certain are they in their judgment of how much corruption there is?
• How do they feel about corruption in government as a problem in society?
• What do they think causes it?
• Do they think it’s inevitable?
• What do they feel should be done about it?
• What are they willing to do personally to eliminate corruption in government?
• How certain are they that they would be willing to do what they say they would do?

The list could go on and on: how people feel about corruption in government has many dimensions. It’s essential to be clear about which ones are important in your inquiry; otherwise, you may measure how people feel about corruption when you really wanted to know how much they think there is, or vice versa.

Once you’ve determined how you’re going to collect your data (for example, survey, field research) and have decided on the relevant range of variation, the degree of precision needed between the extremes of variation, and the specific dimensions of the variables that interest you, you may have another choice: a mathematical-logical one. That is, you may need to decide what level of measurement to use. To discuss this point, we need to take another look at attributes and their relationship to variables.

Defining Variables and Attributes

An attribute, you’ll recall, is a characteristic or quality of something. Female is an example. So is old or student. Variables, on the other hand, are logical sets of attributes. Thus, gender is a variable composed of the attributes female and male. What could be simpler?

Actually, some would insist that sex is the proper name of the variable composed of the physical attributes female and male, while gender is a social-identity and behavioral variable composed of the attributes feminine and masculine. In most social science research, biological differences are less important than how people treat those differences in terms of their own behavior as well as their expectations and treatment of others. Despite this distinction, the two terms are commonly used interchangeably, both in everyday language and by social scientists. As long as the terms are defined for the purposes of research, there should be little confusion.

In any case, the conceptualization and operationalization processes can be seen as the specification of variables and the attributes composing them. Thus, in the context of a study of unemployment, employment status is a variable having the attributes employed and unemployed; the list of attributes could also be expanded to include the other possibilities discussed earlier, such as homemaker.

Every variable must have two important qualities. First, the attributes composing it should be exhaustive. For the variable to have any utility in research, we must be able to classify every observation in terms of one of the attributes composing the variable. We’ll run into trouble if we conceptualize the variable political party affiliation in terms of the attributes Republican and Democrat, because some of the people we set out to study will identify with

the Green Party, the Reform Party, or some other organization, and some (often a large percentage) will tell us they have no party affiliation. We could make the list of attributes exhaustive by adding other and no affiliation. Whatever we do, we must be able to classify every observation.

At the same time, attributes composing a variable must be mutually exclusive. Every observation must be able to be classified in terms of one and only one attribute. For example, we need to define employed and unemployed in such a way that nobody can be both at the same time. That means being able to classify the person who is working at a job but is also looking for work. (We might run across a fully employed mud wrestler who is looking for the glamour and excitement of being a social researcher.) In this case, we might define the attributes so that employed takes precedence over unemployed, and anyone working at a job is employed regardless of whether he or she is looking for something better.

Levels of Measurement

Attributes operationalized as mutually exclusive and exhaustive may be related in other ways as well. For example, the attributes composing variables may represent different levels of measurement. In this section, we’ll examine four levels of measurement: nominal, ordinal, interval, and ratio.

Nominal Measures

Variables whose attributes have only the characteristics of exhaustiveness and mutual exclusiveness are nominal measures. Examples include gender, religious affiliation, political party affiliation, birthplace, college major, and hair color. Although the attributes composing each of these variables (as male and female compose the variable gender) are distinct from one another (and exhaust the possibilities of gender among people), they have no additional structures. Nominal measures merely offer names or labels for characteristics.

nominal measure A variable whose attributes have only the characteristics of exhaustiveness and mutual exclusiveness. In other words, a level of measurement describing a variable that has attributes that are merely different, as distinguished from ordinal, interval, or ratio measures. Gender is an example of a nominal measure.

Imagine a group of people characterized in terms of one such variable and physically grouped by the applicable attributes. For example, say we’ve asked a large gathering of people to stand together in groups according to the states in which they were born: all those born in Vermont in one group, those born in California in another, and so forth. The variable is place of birth; the attributes are born in California, born in Vermont, and so on. All the people standing in a given group have at least one thing in common and differ from the people in all other groups in that same regard. Where the individual groups form, how close they are to one another, or how the groups are arranged in the room is irrelevant. All that matters is that all the members of a given group share the same state of birth and that each group has a different shared state of birth. All we can say about two people in terms of a nominal variable is that they are either the same or different.

Ordinal Measures

Variables with attributes we can logically rank-order are ordinal measures. The different attributes of ordinal variables represent relatively more or less of the variable. Variables of this type are social class, conservatism, alienation, prejudice, intellectual sophistication, and the like. In addition to saying whether two people are the same or different in terms of an ordinal variable, you can also say one is “more” than the other (that is, more conservative, more religious, older, and so forth).

In the physical sciences, hardness is the most frequently cited example of an ordinal measure. We may say that one material (for example, diamond) is harder than another (say, glass) if the former can scratch the latter and not vice versa. By attempting to scratch various materials with other materials, we might eventually be able to arrange several materials in a row, ranging from

the softest to the hardest. We could never say how hard a given material was in absolute terms; we could only say how hard in relative terms: which materials it is harder than and which softer than.

Let’s pursue the earlier example of grouping the people at a social gathering. This time imagine that we ask all the people who have graduated from college to stand in one group, all those with only a high school diploma to stand in another group, and all those who have not graduated from high school to stand in a third group. This manner of grouping people satisfies the requirements for exhaustiveness and mutual exclusiveness discussed earlier. In addition, however, we might logically arrange the three groups in terms of the relative amount of formal education (the shared attribute) each had. We might arrange the three groups in a row, ranging from most to least formal education. This arrangement would provide a physical representation of an ordinal measure. If we knew which groups two individuals were in, we could determine that one had more, less, or the same formal education as the other.

Notice in this example that it is irrelevant how close or far apart the educational groups are from one another. The college and high school groups might be 5 feet apart, and the less-than-high-school group 500 feet farther down the line. These actual distances don’t have any meaning. The high school group, however, should be between the less-than-high-school group and the college group, or else the rank order will be incorrect.

ordinal measure A level of measurement describing a variable with attributes we can rank-order along some dimension. An example is socioeconomic status as composed of the attributes high, medium, low.

Interval Measures

For the attributes composing some variables, the actual distance separating those attributes does have meaning. Such variables are interval measures. For these, the logical distance between attributes can be expressed in meaningful standard intervals.

For example, in the Fahrenheit temperature scale, the difference, or distance, between 80 degrees and 90 degrees is the same as that between 40 degrees and 50 degrees. However, 80 degrees Fahrenheit is not twice as hot as 40 degrees, because the zero point in the Fahrenheit scale is arbitrary; zero degrees does not really mean lack of heat. Similarly, minus 30 degrees on this scale doesn’t represent 30 degrees less than no heat. (This is true for the Celsius scale as well. In contrast, the Kelvin scale is based on an absolute zero, which does mean a complete lack of heat.)

About the only interval measures commonly used in social science research are constructed measures such as standardized intelligence tests that have been more or less accepted. The interval separating IQ scores of 100 and 110 may be regarded as the same as the interval separating scores of 110 and 120 by virtue of the distribution of observed scores obtained by many thousands of people who have taken the tests over the years. But it would be incorrect to infer that someone with an IQ of 150 is 50 percent more intelligent than someone with an IQ of 100. (A person who received a score of 0 on a standard IQ test could not be regarded, strictly speaking, as having no intelligence, although we might feel he or she was unsuited to be a college professor or even a college student. But perhaps a dean . . . ?)

When comparing two people in terms of an interval variable, we can say they are different from each other (nominal), and that one is more than the other (ordinal). In addition, we can say “how much” more.

interval measure A level of measurement describing a variable whose attributes are rank-ordered and have equal distances between adjacent attributes. The Fahrenheit temperature scale is an example of this, because the distance between 17 and 18 is the same as that between 89 and 90.

Ratio Measures

Most of the social science variables meeting the minimum requirements for interval measures also meet the requirements for ratio measures. In ratio measures, the attributes composing a variable, besides having all the structural characteristics mentioned previously, are based on a true zero point.
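The temperature example lends itself to a quick numerical check. The following is a minimal sketch, not part of the original discussion: the function name `fahrenheit_to_kelvin` is introduced here for illustration, and the conversion uses the standard formula K = (°F − 32) × 5/9 + 273.15. It shows that differences on the Fahrenheit scale are comparable, while a ratio computed on that scale does not survive a change to a scale with a true zero.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Interval property: equal differences mean the same thing anywhere on the scale.
upper_gap = 90.0 - 80.0
lower_gap = 50.0 - 40.0

# Ratio computed directly on Fahrenheit readings. The arithmetic works,
# but the result depends on an arbitrary zero point.
naive_ratio = 80.0 / 40.0

# The same two temperatures compared on the Kelvin scale.
kelvin_ratio = fahrenheit_to_kelvin(80.0) / fahrenheit_to_kelvin(40.0)

if __name__ == "__main__":
    print("gaps equal:", upper_gap == lower_gap)
    print("ratio on Fahrenheit readings:", naive_ratio)
    print("ratio on Kelvin readings:", round(kelvin_ratio, 3))
```

The Fahrenheit readings suggest a ratio of 2, whereas the Kelvin readings give a ratio only slightly above 1. That gap is what it means to say an interval scale supports statements about "how much more" but not about "how many times as much."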

The Kelvin temperature scale is one such measure. Examples from social science research include age, length of residence in a given place, number of organizations belonged to, number of times attending religious services during a particular period of time, number of times married, and number of Arab friends.

Returning to the illustration of methodological party games, we might ask a gathering of people to group themselves by age. All the one-year-olds would stand (or sit or lie) together, the two-year-olds together, the three-year-olds, and so forth. The fact that members of a single group share the same age and that each different group has a different shared age satisfies the minimum requirements for a nominal measure. Arranging the several groups in a line from youngest to oldest meets the additional requirements of an ordinal measure and lets us determine if one person is older than, younger than, or the same age as another. If we space the groups equally far apart, we satisfy the additional requirements of an interval measure and can say how much older one person is than another. Finally, because one of the attributes included in age represents a true zero (babies carried by women about to give birth), the phalanx of hapless party goers also meets the requirements of a ratio measure, permitting us to say that one person is twice as old as another. (Remember this in case you’re asked about it in a workbook assignment.) Another example of a ratio measure is income, which extends from an absolute zero to approximately infinity, if you happen to be the founder of Microsoft.

Comparing two people in terms of a ratio variable, then, allows us to conclude (1) whether they are different (or the same), (2) whether one is more than the other, (3) how much they differ, and (4) what the ratio of one to another is. Figure 5-1 summarizes this discussion by presenting a graphic illustration of the four levels of measurement.

ratio measure A level of measurement describing a variable with attributes that have all the qualities of nominal, ordinal, and interval measures and in addition are based on a “true zero” point. Age is an example of a ratio measure.

Implications of Levels of Measurement

Because it’s unlikely that you’ll undertake the physical grouping of people just described (try it once, and you won’t be invited to many parties), I should draw your attention to some of the practical implications of the differences that have been distinguished. These implications appear primarily in the analysis of data (discussed in Part 4), but you need to anticipate such implications when you’re structuring any research project.

Certain quantitative analysis techniques require variables that meet certain minimum levels of measurement. To the extent that the variables to be examined in a research project are limited to a particular level of measurement (say, ordinal), you should plan your analytic techniques accordingly. More precisely, you should anticipate drawing research conclusions appropriate to the levels of measurement used in your variables. For example, you might reasonably plan to determine and report the mean age of a population under study (add up all the individual ages and divide by the number of people), but you should not plan to report the mean religious affiliation, because that is a nominal variable, and the mean requires at least interval-level data. (You could report the modal, or most common, religious affiliation.)

At the same time, you can treat some variables as representing different levels of measurement. Ratio measures are the highest level, descending through interval and ordinal to nominal, the lowest level of measurement. A variable representing a higher level of measurement (say, ratio) can also be treated as representing a lower level of measurement (say, ordinal). Recall, for example, that age is a ratio measure. If you wished to examine only the relationship between age and some ordinal-level variable (say, self-perceived religiosity: high, medium, and low), you might choose to treat age as an ordinal-level variable as well. You might characterize the subjects of your study as being young, middle-aged, and old, specifying what age range composed each of these groupings. Finally,

age might be used as a nominal-level variable for certain research purposes. People might be grouped as being born during the Depression or not. Another nominal measurement, based on birth date rather than just age, would be the grouping of people by astrological signs.

FIGURE 5-1 Levels of Measurement. Often you can choose among different levels of measurement (nominal, ordinal, interval, or ratio) carrying progressively more amounts of information.

The level of measurement you’ll seek, then, is determined by the analytic uses you’ve planned for a given variable, keeping in mind that some variables are inherently limited to a certain level. If a variable is to be used in a variety of ways, requiring different levels of measurement, the study should be designed to achieve the highest level required. For example, if the subjects in a study are asked their exact ages, they can later be organized into ordinal or nominal groupings.

Again, you need not necessarily measure variables at their highest level of measurement. If you’re sure to have no need for ages of people at higher than the ordinal level of measurement, you may simply ask people to indicate their age range, such as 20 to 29, 30 to 39, and so forth. In

a study of the wealth of corporations, rather than seek more precise information, you may use Dun & Bradstreet ratings to rank corporations. Whenever your research purposes are not altogether clear, however, seek the highest level of measurement possible. As we’ve discussed, although ratio measures can later be reduced to ordinal ones, you cannot convert an ordinal measure to a ratio one. More generally, you cannot convert a lower-level measure to a higher-level one. That is a one-way street worth remembering.

Typically a research project will tap variables at different levels of measurement. For example, William Bielby and Denise Bielby (1999) set out to examine the world of film and television, using a nomothetic, longitudinal approach (take a moment to remind yourself what that means). In what they referred to as the “culture industry,” the authors found that reputation (an ordinal variable) is the best predictor of screenwriters’ future productivity. More interestingly, they found that screenwriters who were represented by “core” (or elite) agencies were not only far more likely to find jobs (a nominal variable) but also more likely to find jobs that paid more (a ratio variable). In other words, the researchers found that an agency’s reputation (ordinal) was a key independent variable for predicting a screenwriter’s career success. The researchers also found that being older (ratio), female (nominal), an ethnic minority (nominal), and having more years of experience (ratio) were disadvantageous for a writer’s career. On the other hand, higher earnings from previous years (measured in ordinal categories) led to more success in the future. In Bielby and Bielby’s terms, “success breeds success” (1999: 80).

Single or Multiple Indicators

With so many alternatives for operationalizing social science variables, you may find yourself worrying about making the right choices. To counter this feeling, let me add a momentary dash of certainty and stability.

Many social research variables have fairly obvious, straightforward measures. No matter how you cut it, gender usually turns out to be a matter of male or female: a nominal-level variable that can be measured by a single observation, either by looking (well, not always) or by asking a question (usually). In a study involving the size of families, you’ll want to think about adopted and foster children, as well as blended families, but it’s usually pretty easy to find out how many children a family has. For most research purposes, the resident population of a country is the resident population of that country; you can look it up in an almanac and know the answer. A great many variables, then, have obvious single indicators. If you can get one piece of information, you have what you need.

Sometimes, however, there is no single indicator that will give you the measure of a variable you really want. As discussed earlier in this chapter, many concepts are subject to varying interpretations, each with several possible indicators. In these cases, you’ll want to make several observations for a given variable. You can then combine the several pieces of information you’ve collected, creating a composite measurement of the variable in question. Chapter 6 is devoted to ways of doing that, so here let’s just discuss one simple illustration.

Consider the concept “college performance.” All of us have noticed that some students perform well in college courses and others don’t. In studying these differences, we might ask what characteristics and experiences are related to high levels of performance (many researchers have done just that). How should we measure overall performance? Each grade in any single course is a potential indicator of college performance, but it also may not typify the student’s general performance. The solution to this problem is so firmly established that it is, of course, obvious: the grade point average (GPA). We assign numerical scores to each letter grade, total the points earned by a given student, and divide by the number of courses taken, thus obtaining a composite measure. (If the courses vary in number of credits, we adjust the point values accordingly.) Creating such composite measures in social research is often appropriate.

Some Illustrations of Operationalization Choices

To bring together all the operationalization choices available to the social researcher and to show the potential in those possibilities, let’s look at some of the distinct ways you might address various research problems. The alternative ways of operationalizing the variables in each case should demonstrate the opportunities that social research can present to our ingenuity and imaginations. To simplify matters, I have not attempted to describe all the research conditions that would make one alternative superior to the others, though in a given situation they would not all be equally appropriate.

Here are specific research questions, then, and some of the ways you could address them. We’ll begin with an example discussed earlier in the chapter. It has the added advantage that one of the variables is straightforward to operationalize.

1. Are women more compassionate than men?

a. Select a group of subjects for study, with equal numbers of men and women. Present them with hypothetical situations that involve someone’s being in trouble. Ask them what they would do if they were confronted with that situation. What would they do, for example, if they came across a small child who was lost and crying for his or her parents? Consider any answer that involves helping or comforting the child as an indicator of compassion. See whether men or women are more likely to indicate they would be compassionate.

b. Set up an experiment in which you pay a small child to pretend that he or she is lost. Put the child to work on a busy sidewalk and observe whether men or women are more likely to offer assistance. Also be sure to count the total number of men and women who walk by, because there may be more of one than the other. If that’s the case, simply calculate the percentage of men and the percentage of women who help.

c. Select a sample of people and do a survey in which you ask them what organizations they belong to. Calculate whether women or men are more likely to belong to those that seem to reflect compassionate feelings. To account for the case in which one group belongs to more organizations than the other does, do this: For each person you study, calculate the percentage of his or her organizational memberships that reflect compassion. See if men or women have a higher average percentage.

2. Are sociology students or accounting students better informed about world affairs?

a. Prepare a short quiz on world affairs and arrange to administer it to the students in a sociology class and in an accounting class at a comparable level. If you want to compare sociology and accounting majors, be sure to ask students what they are majoring in.

b. Get the instructor of a course in world affairs to give you the average grades of sociology and accounting students in the course.

c. Take a petition to sociology and accounting classes that urges that “the United Nations headquarters be moved to New York City.” Keep a count of how many in each class sign the petition and how many inform you that the UN headquarters is already located in New York City.

3. Do people consider New York or California the better place to live?

a. Consulting the Statistical Abstract of the United States or a similar publication, check the migration rates into and out of each state. See if you can find the numbers moving directly from New York to California and vice versa.

b. The national polling companies (Gallup, Harris, Roper, and so forth) often ask people what they consider the best state to live in. Look up some recent results in the library or through your local newspaper.

c. Compare suicide rates in the two states.

4. Who are the most popular instructors on your campus, those in the social sciences, the natural sciences, or the humanities?

a. If your school has a provision for student evaluation of instructors, review some recent results and compute the average rating of each of the three groups.

b. Begin visiting the introductory courses given in each group of disciplines and measure the attendance rate of each class.

c. In December, select a group of faculty in each of the three divisions and ask them to keep a record of the numbers of holiday greeting cards and presents they receive from admiring students. See who wins.

The point of these examples is not necessarily to suggest respectable research projects but to illustrate the many ways variables can be operationalized.

“Measuring College Satisfaction” briefly overviews the preceding steps in terms of a concept mentioned at the outset of this chapter.

Measuring College Satisfaction

Early in this chapter, we considered “college satisfaction” as an example of a concept people often talk about casually. To study such a concept, however, we need to engage in the processes of conceptualization and operationalization. I’ll sketch out the process briefly, then you might try your hand at expanding on my comments.

What are some of the dimensions of college satisfaction? Here are a few to get you started, but feel free to add your own:

• Academic quality: faculty, courses, majors
• Physical facilities: classrooms, dorms, cafeteria, grounds
• Athletics and extracurricular activities
• Costs and availability of financial aid
• Sociability of students, faculty, staff
• Security, crime on campus

How would you measure each of these dimensions? One method would be to ask a sample of students, “How would you rate your level of satisfaction with each of the following?” giving them a list of items similar to those listed here and providing a set of categories for them to use (such as very satisfied, satisfied, dissatisfied, very dissatisfied).

But suppose you didn’t have the time and/or money to conduct a survey and were interested in comparing overall levels of satisfaction at several schools. What data about schools (the unit of analysis) might give you the answer you were interested in? Retention rates might be one general indicator. Can you think of others?

Operationalization Goes On and On

Although I’ve discussed conceptualization and operationalization as activities that precede data collection and analysis (for example, you must design questionnaire items before you send out a questionnaire), these two processes continue throughout any research project, even if the data have been collected in a structured mass survey. As we’ve seen, in less-structured methods such as field research, the identification and specification of relevant concepts is inseparable from the ongoing process of observation.

Imagine, for example, that you’re doing a qualitative, observational study of members of a new religious cult, and, in part, you want to identify those members who are more religious and those who are less religious. You may begin with a focus on certain kinds of ritual behavior, only to eventually discover that the members of the group place a higher premium on religious experience or steadfast beliefs.

The open-endedness of conceptualization and operationalization is perhaps more obvious in qualitative than in quantitative research, since changes can be made at any point during data collection and analysis. In quantitative methods such as survey research or experiments, you will be required to commit yourself to particular measurement structures. Once a questionnaire has been printed and administered, for example, altering it would be impractical if not impossible, even when the unfolding of the research might suggest changes. Even in the case of a survey questionnaire, however,

you may have some flexibility in how you measure variables during the analysis phase, as we’ll see in the following chapter.

As I mentioned, however, the qualitative researcher has a greater flexibility in this regard. Things you notice during in-depth interviews, for example, may suggest a different set of questions than you initially planned, allowing you to pursue unanticipated avenues. Then later, as you review and organize your notes for analysis, you may again see unanticipated patterns and redirect your analysis.

Regardless of whether you are using qualitative or quantitative methods, you should always be open to reexamining your concepts and definitions. The ultimate purpose of social research is to clarify the nature of social life. The validity and utility of what you learn in this regard doesn’t depend on when you first figured out how to look at things any more than it matters whether you got the idea from a learned textbook, a dream, or your brother-in-law.

Criteria of Measurement Quality

This chapter has come some distance. It began with the bald assertion that social scientists can measure anything that exists. Then we discovered that most of the things we might want to measure and study don’t really exist. Next we learned that it’s possible to measure them anyway. Now we’ll discuss some of the yardsticks against which we judge our relative success or failure in measuring things, even things that don’t exist.

Precision and Accuracy

To begin, measurements can be made with varying degrees of precision. As we saw in the discussion of operationalization, precision concerns the fineness of distinctions made between the attributes that compose a variable. The description of a woman as “43 years old” is more precise than “in her forties.” Saying a street-corner gang was formed “in the summer of 1996” is more precise than saying “during the 1990s.”

As a general rule, precise measurements are superior to imprecise ones, as common sense dictates. There are no conditions under which imprecise measurements are intrinsically superior to precise ones. Even so, exact precision is not always necessary or desirable. If knowing that a woman is in her forties satisfies your research requirements, then any additional effort invested in learning her precise age is wasted. The operationalization of concepts, then, must be guided partly by an understanding of the degree of precision required. If your needs are not clear, be more precise rather than less.

Don’t confuse precision with accuracy, however. Describing someone as “born in New England” is less precise than “born in Stowe, Vermont,” but suppose the person in question was actually born in Boston. The less-precise description, in this instance, is more accurate, a better reflection of the real world.

Precision and accuracy are obviously important qualities in research measurement, and they probably need no further explanation. When social scientists construct and evaluate measurements, however, they pay special attention to two technical considerations: reliability and validity.

reliability: That quality of measurement method that suggests that the same data would have been collected each time in repeated observations of the same phenomenon. In the context of a survey, we would expect that the question “Did you attend religious services last week?” would have higher reliability than the question “About how many times have you attended religious services in your life?” This is not to be confused with validity.

Reliability

In the abstract, reliability is a matter of whether a particular technique, applied repeatedly to the same object, yields the same result each time. Let’s say you want to know how much I weigh. (No, I don’t know why.) As one technique, say you ask two different people to estimate my weight. If the first person estimates 150 pounds and the other
estimates 300, we have to conclude that the technique of having people estimate my weight isn’t very reliable.

Suppose, as an alternative, that you use a bathroom scale as your measurement technique. I step on the scale twice, and you note the same result each time. The scale has presumably reported the same weight for me both times, indicating that the scale provides a more reliable technique for measuring a person’s weight than asking people to estimate it does.

Reliability, however, does not ensure accuracy any more than precision does. Suppose I’ve set my bathroom scale to shave five pounds off my weight just to make me feel better. Although you would (reliably) report the same weight for me each time, you would always be wrong. This new element, called bias, is discussed in Chapter 8. For now, just be warned that reliability does not ensure accuracy.

Let’s suppose we’re interested in studying morale among factory workers in two different kinds of factories. In one set of factories, workers have specialized jobs, reflecting an extreme division of labor. Each worker contributes a tiny part to the overall process performed on a long assembly line. In the other set of factories, each worker performs many tasks, and small teams of workers complete the whole process.

How should we measure morale? Following one strategy, we could observe the workers in each factory, noticing such things as whether they joke with one another, whether they smile and laugh a lot, and so forth. We could ask them how they like their work and even ask them whether they think they would prefer their current arrangement or the other one being studied. By comparing what we observed in the different factories, we might reach a conclusion about which assembly process produces the higher morale. Notice that I’ve just described a qualitative measurement procedure.

Now let’s look at some reliability problems inherent in this method. First, how you and I are feeling when we do the observing will likely color what we see. We may misinterpret what we see. We may see workers kidding each other but think they’re having an argument. We may catch them on an off day. If we were to observe the same group of workers several days in a row, we might arrive at different evaluations on each day. Further, even if several observers evaluated the same behavior, they might arrive at different conclusions about the workers’ morale.

Here’s another strategy for assessing morale, a quantitative approach. Suppose we check the company records to see how many grievances have been filed with the union during some fixed period. Presumably this would be an indicator of morale: the more grievances, the lower the morale. This measurement strategy would appear to be more reliable: Counting up the grievances over and over, we should keep arriving at the same number.

If you find yourself thinking that the number of grievances doesn’t necessarily measure morale, you’re worrying about validity, not reliability. We’ll discuss validity in a moment. The point for now is that the last method is more like my bathroom scale: it gives consistent results.

In social research, reliability problems crop up in many forms. Reliability is a concern every time a single observer is the source of data, because we have no certain guard against the impact of that observer’s subjectivity. We can’t tell for sure how much of what’s reported originated in the situation observed and how much in the observer.

Subjectivity is not only a problem with single observers, however. Survey researchers have known for a long time that different interviewers, because of their own attitudes and demeanors, get different answers from respondents. Or, if we were to conduct a study of newspapers’ editorial positions on some public issue, we might create a team of coders to take on the job of reading hundreds of editorials and classifying them in terms of their position on the issue. Unfortunately, different coders will code the same editorial differently. Or we might want to classify a few hundred specific occupations in terms of some standard coding scheme, say a set of categories created by the Department of Labor or by the Census Bureau. You and I would not place all those occupations in the same categories.

Each of these examples illustrates problems of reliability. Similar problems arise whenever we ask people to give us information about themselves.
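To put a number on the kind of coder disagreement just described, one common starting point is the share of items that two coders classify identically. The sketch below is only an illustration, not a procedure from this chapter: the function names, the three category labels, and the ten codings are all made up. It also computes Cohen's kappa, a standard statistic (not discussed in the text) that discounts the agreement two coders would reach by chance.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items that both coders placed in the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("codings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same category
    # if each simply followed their own overall category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codings of ten editorials by two coders (invented data).
coder_1 = ["pro", "pro", "con", "neutral", "con",
           "pro", "con", "neutral", "pro", "con"]
coder_2 = ["pro", "con", "con", "neutral", "con",
           "pro", "pro", "neutral", "pro", "con"]

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

With these invented codings the two coders agree on 8 of the 10 editorials. Kappa comes out lower than the raw agreement because part of that agreement would be expected by chance alone.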

Sometimes we ask questions that people don’t know the answers to: How many times have you been to religious services? Sometimes we ask people about things they consider totally irrelevant: Are you satisfied with China’s current relationship with Albania? In such cases, people will answer differently at different times because they’re making up answers as they go. Sometimes we explore issues so complicated that a person who had a clear opinion in the matter might arrive at a different interpretation of the question when asked a second time.

So how do you create reliable measures? If your research design calls for asking people for information, you can be careful to ask only about things the respondents are likely to know the answer to. Ask about things relevant to them, and be clear in what you’re asking. Of course, these techniques don’t solve every possible reliability problem. Fortunately, social researchers have developed several techniques for cross-checking the reliability of the measures they devise.

Test-Retest Method

Sometimes it’s appropriate to make the same measurement more than once, a technique called the test-retest method. If you don’t expect the sought-after information to change, then you should expect the same response both times. If answers vary, the measurement method may, to the extent of that variation, be unreliable. Here’s an illustration.

In their research on Health Hazard Appraisal (HHA), a part of preventive medicine, Jeffrey Sacks, W. Mark Krushat, and Jeffrey Newman (1980) wanted to determine the risks associated with various background and lifestyle factors, making it possible for physicians to counsel their patients appropriately. By knowing patients’ life situations, physicians could advise them on their potential for survival and on how to improve it. This purpose, of course, depended heavily on the accuracy of the information gathered about each subject in the study.

To test the reliability of their information, Sacks and his colleagues had all 207 subjects complete a baseline questionnaire that asked about their characteristics and behavior. Three months later, a follow-up questionnaire asked the same subjects for the same information, and the results of the two surveys were compared. Overall, only 15 percent of the subjects reported the same information in both studies.

Sacks and his colleagues report the following:

    Almost 10 percent of subjects reported a different height at follow-up examination. Parental age was changed by over one in three subjects. One parent reportedly aged 20 chronologic years in three months. One in five ex-smokers and ex-drinkers have apparent difficulty in reliably recalling their previous consumption pattern.
    (1980: 730)

Some subjects erased all trace of previously reported heart murmur, diabetes, emphysema, arrest record, and thoughts of suicide. One subject’s mother, deceased in the first questionnaire, was apparently alive and well in time for the second. One subject had one ovary missing in the first study but present in the second. In another case, an ovary present in the first study was missing in the second study, and had been for ten years! One subject was reportedly 55 years old in the first study and 50 years old three months later. (You have to wonder whether the physician-counselors could ever have nearly the impact on their patients that their patients’ memories did.) Thus, test-retest revealed that this data-collection method was not especially reliable.

Split-Half Method

As a general rule, it’s always good to make more than one measurement of any subtle or complex social concept, such as prejudice, alienation, or social class. This procedure lays the groundwork for another check on reliability. Let’s say you’ve created a questionnaire that contains ten items you believe measure prejudice against women. Using the split-half technique, you would randomly assign those ten items to two sets of five. Each set should provide a good measure of prejudice against women, and the two sets should classify respondents the
same way. If the two sets of items classify people differently, you most likely have a problem of reliability in your measure of the variable.

Using Established Measures

Another way to help ensure reliability in getting information from people is to use measures that have proved their reliability in previous research. If you want to measure anomia, for example, you might want to follow Srole’s lead.

The heavy use of measures, though, does not guarantee their reliability. For example, the Scholastic Assessment Tests (SATs) and the Minnesota Multiphasic Personality Inventory (MMPI) have been accepted as established standards in their respective domains for decades. In recent years, though, they’ve needed fundamental overhauling to reflect changes in society, eliminating outdated topics and gender bias in wording.

Reliability of Research Workers

As we’ve seen, it’s also possible for measurement unreliability to be generated by research workers: interviewers and coders, for example. There are several ways to check on reliability in such cases. To guard against interviewer unreliability in surveys, for example, a supervisor will call a subsample of the respondents on the telephone and verify selected pieces of information.

Replication works in other situations also. If you’re worried that newspaper editorials or occupations may not be classified reliably, you could have each independently coded by several coders. Those cases that are classified inconsistently can then be evaluated more carefully and resolved.

Finally, clarity, specificity, training, and practice can prevent a great deal of unreliability and grief. If you and I spent some time reaching a clear agreement on how to evaluate editorial positions on an issue (discussing various positions and reading through several together), we could probably do a good job of classifying them in the same way independently.

The reliability of measurements is a fundamental issue in social research, and we’ll return to it more than once in the chapters ahead. For now, however, let’s recall that even total reliability doesn’t ensure that our measures actually measure what we think they measure. Now let’s plunge into the question of validity.

Validity

In conventional usage, validity refers to the extent to which an empirical measure adequately reflects the real meaning of the concept under consideration. A measure of social class should measure social class, not political orientations. A measure of political orientations should measure political orientations, not sexual permissiveness. Validity means that we are actually measuring what we say we are measuring.

validity: A term describing a measure that accurately reflects the concept it is intended to measure. For example, your IQ would seem a more valid measure of your intelligence than the number of hours you spend in the library would. Though the ultimate validity of a measure can never be proved, we may agree to its relative validity on the basis of face validity, criterion-related validity, construct validity, content validity, internal validation, and external validation (see Chapter 6). This must not be confused with reliability.

Whoops! I’ve already committed us to the view that concepts don’t have real meanings. How can we ever say whether a particular measure adequately reflects the concept’s meaning, then? Ultimately, of course, we can’t. At the same time, as we’ve already seen, all of social life, including social research, operates on agreements about the terms we use and the concepts they represent. There are several criteria of success in making measurements that are appropriate to these agreed-on meanings of concepts.

face validity: That quality of an indicator that makes it seem a reasonable measure of some variable. That the frequency of attendance at religious services is some indication of a person’s religiosity seems to make sense without a lot of explanation. It has face validity.

First, there’s something called face validity. Particular empirical measures may or may not jibe with our common agreements and our individual
mental images concerning a particular concept. For example, you and I might quarrel about whether counting the number of grievances filed with the union will adequately measure morale. Still, we’d surely agree that the number of grievances has something to do with morale. That is, the measure is valid “on its face,” whether or not it’s adequate. If I were to suggest that we measure morale by finding out how many books the workers took out of the library during their off-duty hours, you’d undoubtedly raise a more serious objection: That measure wouldn’t have much face validity.

Second, I’ve already pointed to many of the more formally established agreements that define some concepts. The Census Bureau, for example, has created operational definitions of such concepts as family, household, and employment status that seem to have a workable validity in most studies using these concepts.

Three additional types of validity also specify particular ways of testing the validity of measures. The first, criterion-related validity, sometimes called predictive validity, is based on some external criterion. For example, the validity of College Board exams is shown in their ability to predict students’ success in college. The validity of a written driver’s test is determined, in this sense, by the relationship between the scores people get on the test and their subsequent driving records. In these examples, college success and driving ability are the criteria.

To test your understanding of criterion-related validity, see whether you can think of behaviors that might be used to validate each of the following attitudes:

Is very religious
Supports equality of men and women
Supports far-right militia groups
Is concerned about the environment

Some possible validators would be, respectively, attends religious services, votes for women candidates, belongs to the NRA, and belongs to the Sierra Club.

criterion-related validity: The degree to which a measure relates to some external criterion. For example, the validity of College Board tests is shown in their ability to predict the college success of students. Also called predictive validity.

Sometimes it’s difficult to find behavioral criteria that can be taken to validate measures as directly as in such examples. In those instances, however, we can often approximate such criteria by applying a different test. We can consider how the variable in question ought, theoretically, to relate to other variables. Construct validity is based on the logical relationships among variables.

construct validity: The degree to which a measure relates to other variables as expected within a system of theoretical relationships.

Suppose, for example, that you want to study the sources and consequences of marital satisfaction. As part of your research, you develop a measure of marital satisfaction, and you want to assess its validity.

In addition to developing your measure, you’ll have developed certain theoretical expectations about the way the variable marital satisfaction relates to other variables. For example, you might reasonably conclude that satisfied husbands and wives will be less likely than dissatisfied ones to cheat on their spouses. If your measure relates to marital fidelity in the expected fashion, that constitutes evidence of your measure’s construct validity. If satisfied marriage partners are as likely to cheat on their spouses as the dissatisfied ones are, however, that would challenge the validity of your measure.

Tests of construct validity, then, can offer a weight of evidence that your measure either does or doesn’t tap the quality you want it to measure, without providing definitive proof. Although I have suggested that tests of construct validity are less compelling than those of criterion validity, there is room for disagreement about which kind of test a particular comparison variable (driving record, marital fidelity) represents in a given situation. It’s less important to distinguish the two types of validity tests than to understand the logic of validation that they have in common: If we’ve succeeded in measuring some variable, then our measures should relate in some logical way to other measures.

Finally, content validity refers to how much a measure covers the range of meanings included within a concept. For example, a test of mathematical ability cannot be limited to addition but also needs to cover subtraction, multiplication, division, and so forth. Or, if we’re measuring prejudice, do our measurements reflect all types of prejudice, including prejudice against racial and ethnic groups, religious minorities, women, the elderly, and so on?

content validity: The degree to which a measure covers the range of meanings included within a concept.

Figure 5-2 presents a graphic portrayal of the difference between validity and reliability. If you think of measurement as analogous to repeatedly shooting at the bull’s-eye on a target, you’ll see that reliability looks like a “tight pattern,” regardless of where the shots hit, because reliability is a function of consistency. Validity, on the other hand, is a function of shots being arranged around the bull’s-eye. The failure of reliability in the figure is randomly distributed around the target; the failure of validity is systematically off the mark. Notice that neither an unreliable nor an invalid measure is likely to be very useful.

FIGURE 5-2 An Analogy to Validity and Reliability. A good measurement technique should be both valid (measuring what it is intended to measure) and reliable (yielding a given measurement dependably).

Who Decides What’s Valid?

Our discussion of validity began with a reminder that we depend on agreements to determine what’s real, and we’ve just seen some of the ways social scientists can agree among themselves that they have made valid measurements. There is yet another way of looking at validity.

Social researchers sometimes criticize themselves and one another for implicitly assuming they are somewhat superior to those they study. For example, researchers often seek to uncover motivations that the social actors themselves are unaware of. You think you bought that new Burpo-Blasto because of its high performance and good looks, but we know you’re really trying to achieve a higher social status.

This implicit sense of superiority would fit comfortably with a totally positivistic approach (the biologist feels superior to the frog on the lab table), but it clashes with the more humanistic and typically qualitative approach taken by many social scientists. We’ll explore this issue more deeply in Chapter 10.

In seeking to understand the way ordinary people make sense of their worlds, ethnomethodologists have urged all social scientists to pay more respect to the natural social processes of conceptualization and shared meaning. At the very least, behavior that may seem irrational from the scientist’s paradigm may make logical sense when viewed through the actor’s paradigm.

Clifford Geertz (1973) applies the term thick description in reference to the goal of understanding, as deeply as possible, the meanings that elements
of a culture have for those who live within that culture. He recognizes that the outside observer will never grasp those meanings fully, however, and warns, “Cultural analysis is intrinsically incomplete.” He then elaborates:

    There are a number of ways to escape this—turning culture into folklore and collecting it, turning it into traits and counting it, turning it into institutions and classifying it, turning it into structures and toying with it. But they are escapes. The fact is that to commit oneself to a semiotic concept of culture and an interpretive approach to the study of it is to commit oneself to a view of ethnographic assertion as, to borrow W. B. Gallie’s by now famous phrase, “essentially contestable.” Anthropology, or at least interpretive anthropology, is a science whose progress is marked less by a perfection of consensus than by a refinement of debate. What gets better is the precision with which we vex each other.
    (1973: 29)

Ultimately, social researchers should look both to their colleagues and to their subjects as sources of agreement on the most useful meanings and measurements of the concepts they study. Sometimes one source will be more useful, sometimes the other. But neither one should be dismissed.

Tension between Reliability and Validity

Clearly, we want our measures to be both reliable and valid. However, a tension often arises between the criteria of reliability and validity, forcing a trade-off between the two.

Recall the example of measuring morale in different factories. The strategy of immersing yourself in the day-to-day routine of the assembly line, observing what goes on, and talking to the workers would seem to provide a more valid measure of morale than counting grievances would. It just seems obvious that we’d get a clearer sense of whether the morale was high or low using this first method.

As I pointed out earlier, however, the counting strategy would be more reliable. This situation reflects a more general strain in research measurement. Most of the really interesting concepts we want to study have many subtle nuances, so specifying precisely what we mean by them is hard. Researchers sometimes speak of such concepts as having a “richness of meaning.” Although scores of books and articles have been written on the topic of anomie/anomia, for example, they still haven’t exhausted its meaning.

Very often, then, specifying reliable operational definitions and measurements seems to rob concepts of their richness of meaning. Positive morale is much more than a lack of grievances filed with the union; anomia is much more than what is measured by the five items created by Leo Srole. Yet, the more variation and richness we allow for a concept, the more opportunity there is for disagreement on how it applies to a particular situation, thus reducing reliability.

To some extent, this dilemma explains the persistence of two quite different approaches to social research: quantitative, nomothetic, structured techniques such as surveys and experiments on the one hand, and qualitative, idiographic methods such as field research and historical studies on the other. In the simplest generalization, the former methods tend to be more reliable, the latter more valid.

By being forewarned, you’ll be effectively forearmed against this persistent and inevitable dilemma. If there is no clear agreement on how to measure a concept, measure it several different ways. If the concept has several dimensions, measure them all. Above all, know that the concept does not have any meaning other than what you and I give it. The only justification for giving any concept a particular meaning is utility. Measure concepts in ways that help us understand the world around us.

The Ethics of Measurement

Measurement decisions can sometimes be judged by ethical standards. We have seen that most of the concepts of interest to social researchers are
open to varied meanings. Suppose, for example, that you are interested in sampling public opinion on the abortion issue in the United States. Notice the difference it would make if you conceptualized one side of the debate as “pro-choice” or as “pro-abortion.” If your personal bias made you want to minimize support for having an abortion, you might be tempted to frame the concept and the measurements based on it in terms of people being “pro-abortion,” thereby eliminating all those who were not especially fond of abortion per se but felt a woman should have the right to make that choice for herself. To pursue this strategy, however, would violate accepted research ethics.

Consider the choices available to you in conceptualizing attitudes toward the U.S. invasion of Iraq in 2003. Imagine the different levels of support you would “discover” if you framed the position as an unprovoked invasion of a sovereign nation, as a retaliation for the September 11, 2001, attack on the World Trade Center (many Americans still believe Saddam Hussein masterminded that attack), as a defensive act against a perceived threat, as part of a global war on terrorism, or in any of the other ways this event has been portrayed. There is no one correct way to conceptualize this issue, but it would be unethical to seek to slant the results through a biased definition of the issue.

MAIN POINTS

Introduction

• The interrelated processes of conceptualization, operationalization, and measurement allow researchers to move from a general idea about what they want to study to effective and well-defined measurements in the real world.

Measuring Anything That Exists

• Concepts are mental images we use as summary devices for bringing together observations and experiences that seem to have something in common. We use terms or labels to reference these concepts.

• Concepts are constructs; they represent the agreed-on meanings we assign to terms. Our concepts don’t exist in the real world, so they can’t be measured directly, but we can measure the things that our concepts summarize.

Conceptualization

• Conceptualization is the process of specifying observations and measurements that give concepts definite meaning for the purposes of a research study.

• Conceptualization includes specifying the indicators of a concept and describing its dimensions. Operational definitions specify how variables relevant to a concept will be measured.

Definitions in Descriptive and Explanatory Studies

• Precise definitions are even more important in descriptive than in explanatory studies. The degree of precision needed varies with the type and purpose of a study.

Operationalization Choices

• Operationalization is an extension of conceptualization that specifies the exact procedures that will be used to measure the attributes of variables.

• Operationalization involves a series of interrelated choices: specifying the range of variation that is appropriate for the purposes of a study, determining how precisely to measure variables, accounting for relevant dimensions of variables, clearly defining the attributes of variables and their relationships, and deciding on an appropriate level of measurement.

• Researchers must choose from four levels of measurement, which capture increasing amounts of information: nominal, ordinal, interval, and ratio. The most appropriate level depends on the purpose of the measurement.

• A given variable can sometimes be measured at different levels. When in doubt, researchers should use the highest level of measurement appropriate to that variable so they can capture the greatest amount of information.

• Operationalization begins in the design phase of a study and continues through all phases of the research project, including the analysis of data.

Criteria of Measurement Quality

• Criteria of the quality of measures include precision, accuracy, reliability, and validity.

• Whereas reliability means getting consistent results from the same measure, validity refers to getting results that accurately reflect the concept being measured.

• Researchers can test or improve the reliability of measures through the test-retest method, the split-half method, the use of established measures, and the examination of work performed by research workers.

• The yardsticks for assessing a measure’s validity include face validity, criterion-related validity, construct validity, and content validity.

• Creating specific, reliable measures often seems to diminish the richness of meaning our general concepts have. This problem is inevitable. The best solution is to use several different measures, tapping the different aspects of a concept.

The Ethics of Measurement

• Conceptualization and measurement must never be guided by bias or preferences for particular research outcomes.

KEY TERMS

The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

conceptualization
construct validity
content validity
criterion-related validity
dimension
face validity
indicator
interval measure
nominal measure
ordinal measure
ratio measure
reliability
specification
validity

PROPOSING SOCIAL RESEARCH: MEASUREMENT

This chapter has taken us deeper into the matter of measurement. In previous exercises, you’ve identified the concepts and variables you want to address in your research project. Now you’ll need to get more specific in terms of conceptualization and operationalization. You should conclude this portion of the proposal with a description of how, precisely, you will make distinctions regarding your variables. If you want to compare liberals and conservatives, for example, how exactly will you identify subjects’ political orientations?

The ease or difficulty of this exercise may vary with the type of data collection you’re planning. It will probably be easier to accomplish in the case of quantitative studies, such as surveys, where you can report the questionnaire items you’ll use for measurements. In qualitative research, however, you’ll have more opportunities to modify the ways variables are measured as the study unfolds, taking advantage of insights gained “in the trenches.” Even so, you’ll still need to begin with some clear ideas about how you’ll begin your measurements.

Criteria such as precision, accuracy, validity, and reliability matter greatly in all kinds of social research projects.

REVIEW QUESTIONS AND EXERCISES

1. Pick a social science concept such as liberalism or alienation, then specify that concept so that it could be studied in a research project. Be sure to specify the indicators you’ll use as well as the dimensions you wish to include in and exclude from your conceptualization.

2. What level of measurement (nominal, ordinal, interval, or ratio) describes each of the following variables?
   a. Race (white, African American, Asian, and so on)
   b. Order of finish in a race (first, second, third, and so on)
   c. Number of children in families
   d. Populations of nations
   e. Attitudes toward nuclear energy (strongly approve, approve, disapprove, strongly disapprove)
   f. Region of birth (Northeast, Midwest, and so on)
   g. Political orientation (very liberal, somewhat liberal, somewhat conservative, very conservative)

3. To conceptualize the variable prejudice, use your favorite web browser to search for this term. After reviewing several of the websites resulting from your search, make a list of some different forms of prejudice that might be studied in an omnibus project dealing with that topic.

4. In a good dictionary, look up truth and true, then copy out the definitions. Note the key terms used in those definitions (such as reality), look up the definitions of those terms, and copy out these

definitions as well. Continue this process until no new terms appear. Comment on what you’ve learned from this exercise. Did you discover “truth”?

SPSS EXERCISES

See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you’ll also find a detailed primer on using SPSS.

Online Study Resources

If your book came with an access code card, visit www.cengage.com/login to register. To purchase access, please visit www.ichapters.com.

1. Before you do your final review of the chapter, take the CengageNOW pretest to help identify the areas on which you should concentrate. You’ll find information on this online tool, as well as instructions on how to access all of its great resources, in the front of the book.

2. As you review, take advantage of the CengageNOW personalized study plan, based on your quiz results. Use this study plan with its interactive exercises and other resources to master the material.

3. When you’re finished with your review, take the posttest to confirm that you’re ready to move on to the next chapter.

WEBSITE FOR THE PRACTICE OF SOCIAL RESEARCH 12TH EDITION

Go to your book’s website at www.cengage.com/sociology/babbie for tools to aid you in studying for your exams. You’ll find Tutorial Quizzes with feedback, Internet Exercises, Flash Cards, Glossaries, and Essay Quizzes, as well as InfoTrac College Edition search terms, suggestions for additional reading, Web Links, and primers for using data-analysis software such as SPSS.

CHAPTER SIX

Indexes, Scales, and Typologies

CHAPTER OVERVIEW

Researchers often need to employ multiple indicators to measure a variable adequately and validly. Indexes, scales, and typologies are useful composite measures made up of several indicators of variables.

Introduction
Indexes versus Scales
Index Construction
   Item Selection
   Examination of Empirical Relationships
   Index Scoring
   Handling Missing Data
   Index Validation
   The Status of Women: An Illustration of Index Construction
Scale Construction
   Bogardus Social Distance Scale
   Thurstone Scales
   Likert Scaling
   Semantic Differential
   Guttman Scaling
Typologies

Introduction

As we saw in Chapter 5, many social science concepts have complex and varied meanings. Making measurements that capture such concepts can be a challenge. Recall our discussion of content validity, which concerns whether we have captured all the different dimensions of a concept.

To achieve broad coverage of the various dimensions of a concept, we usually need to make multiple observations pertaining to that concept. Thus, for example, Bruce Berg (1989: 21) advises in-depth interviewers to prepare essential questions, which are “geared toward eliciting specific, desired information.” In addition, the researcher should prepare extra questions: “questions roughly equivalent to certain essential ones, but worded slightly differently.”

Multiple indicators are used with quantitative data as well. Suppose you’re designing a survey. Although you can sometimes construct a single questionnaire item that captures the variable of interest (“Gender: Male / Female” is a simple example), other variables are less straightforward and may require you to use several questionnaire items to measure them adequately.

Quantitative data analysts have developed specific techniques for combining indicators into a single measure. This chapter discusses the construction of two types of composite measures of variables: indexes and scales. Although these measures can be used in any form of social research, they are most common in survey research and other quantitative methods. A short section at the end of this chapter considers typologies, which are relevant to both qualitative and quantitative research.

Composite measures are frequently used in quantitative research, for several reasons. First, social scientists often wish to study variables that have no clear and unambiguous single indicators. Single indicators do suffice for some variables, such as age. We can determine a survey respondent’s age by simply asking, “How old are you?” Similarly, we can determine a newspaper’s circulation by merely looking at the figure the newspaper reports. In the case of complex concepts, however, researchers can seldom develop single indicators before they actually do the research. This is especially true with regard to attitudes and orientations. Rarely can a survey researcher, for example, devise single questionnaire items that adequately tap respondents’ degrees of prejudice, religiosity, political orientation, alienation, and the like. More likely, the researcher will devise several items, each of which provides some indication of the variables. Taken individually, each of these items is likely to prove invalid or unreliable for many respondents. A composite measure, however, can overcome this problem.

Second, researchers may wish to employ a rather refined ordinal measure of a particular variable (alienation, say), arranging cases in several ordinal categories from very low to very high, for example. A single data item might not have enough categories to provide the desired range of variation. However, an index or scale formed from several items can provide the needed range.

Finally, indexes and scales are efficient devices for data analysis. If considering a single data item gives us only a rough indication of a given variable, considering several data items can give us a more comprehensive and more accurate indication. For example, a single newspaper editorial may give us some indication of the political orientations of that newspaper. Examining several editorials would probably give us a better assessment, but the manipulation of several data items simultaneously could be very complicated. Indexes and scales (especially scales) are efficient data-reduction devices: They allow us to summarize several indicators in a single numerical score, while sometimes nearly maintaining the specific details of all the individual indicators.

Indexes versus Scales

The terms index and scale are typically used imprecisely and interchangeably in social research literature. The two types of measures do have some

characteristics in common, but in this book we’ll distinguish between the two. However, you should be warned of a growing tendency in the literature to use the term scale to refer to both indexes and scales, as they are distinguished here.

First, let’s consider what they have in common. Both scales and indexes are ordinal measures of variables. Both rank-order the units of analysis in terms of specific variables such as religiosity, alienation, socioeconomic status, prejudice, or intellectual sophistication. A person’s score on either a scale or an index of religiosity, for example, gives an indication of his or her relative religiosity vis-à-vis other people.

Further, both scales and indexes are composite measures of variables, that is, measurements based on more than one data item. Thus, a survey respondent’s score on an index or scale of religiosity is determined by the responses given to several questionnaire items, each of which provides some indication of religiosity. Similarly, a person’s IQ score is based on answers to a large number of test questions. The political orientation of a newspaper might be represented by an index or scale score reflecting the newspaper’s editorial policy on various political issues.

Despite these shared characteristics, it’s useful to distinguish between indexes and scales. In this book, we’ll distinguish them by the way scores are assigned in each. We construct an index simply by accumulating scores assigned to individual attributes. We might measure prejudice, for example, by adding up the number of prejudiced statements each respondent agreed with. We construct a scale, however, by assigning scores to patterns of responses, recognizing that some items reflect a relatively weak degree of the variable while others reflect something stronger. For example, agreeing that “Women are different from men” is, at best, weak evidence of sexism compared with agreeing that “Women should not be allowed to vote.” A scale takes advantage of differences in intensity among the attributes of the same variable to identify distinct patterns of response.

index A type of composite measure that summarizes and rank-orders several specific observations and represents some more general dimension.

scale A type of composite measure composed of several items that have a logical or empirical structure among them. Examples of scales include Bogardus social distance, Guttman, Likert, and Thurstone scales.

Let’s consider this simple example of sexism a bit further. Imagine asking people to agree or disagree with the two statements just presented. Some might agree with both, some might disagree with both. But suppose I told you someone agreed with one and disagreed with the other: Could you guess which statement they agreed with and which they did not? I’d guess the person in question agreed that women were different but disagreed that they should be prohibited from voting. On the other hand, I doubt that anyone would want to prohibit women from voting, while asserting that there is no difference between men and women. That would make no sense.

Now consider this. The two responses we wanted from each person would technically yield four possible response patterns: agree/agree, agree/disagree, disagree/agree, and disagree/disagree. We’ve just seen, however, that only three of the four patterns make any sense or are likely to occur. Where indexes score people based on their responses, scales score people on the basis of response patterns: We determine what the logical response patterns are and score people in terms of the pattern their responses most closely resemble.

Figure 6-1 provides a graphic illustration of the difference between indexes and scales. Let’s assume we want to develop a measure of political activism, distinguishing those people who are very active in political affairs, those who don’t participate much at all, and those who are somewhere in between.

The first part of Figure 6-1 illustrates the logic of indexes. The figure shows six different political actions. Although you and I might disagree on some specifics, I think we could agree that the six actions represent roughly the same degree of political activism.

Using these six items, we could construct an index of political activism by giving each person 1 point for each of the actions he or she has taken.
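To make the contrast concrete, here is a minimal Python sketch of the two scoring logics. It is only an illustration under stated assumptions: the action labels are hypothetical stand-ins rather than the six actions actually pictured in Figure 6-1, and the scale scores (0, 1, 2) assigned to the three sensible response patterns on the two sexism statements are my own choice, not values given in the text.

```python
from typing import Optional

# Index logic: one point per action taken. Which actions they were is ignored.
# ASSUMPTION: these labels are illustrative, not the figure's actual list.
ACTIONS = ("wrote_official", "signed_petition", "gave_money", "persuaded_voter")

def index_score(actions_taken: set) -> int:
    """Count how many of the listed actions a person has taken."""
    return sum(1 for action in ACTIONS if action in actions_taken)

# Scale logic: score the *pattern* of answers to the two sexism statements.
# Key = (agrees "women are different", agrees "women should not vote").
# ASSUMPTION: the 0/1/2 scores are hypothetical values for the three
# patterns the text calls sensible.
SCALE_PATTERNS = {
    (False, False): 0,  # disagrees with both
    (True, False): 1,   # agrees only with the weaker statement
    (True, True): 2,    # agrees with both
}

def scale_score(agrees_different: bool, agrees_no_vote: bool) -> Optional[int]:
    """Return the score for a logical pattern, or None for the illogical one."""
    return SCALE_PATTERNS.get((agrees_different, agrees_no_vote))

you = {"wrote_official", "signed_petition"}
me = {"gave_money", "persuaded_voter"}
print(index_score(you), index_score(me))  # different actions, same index score
print(scale_score(True, False))           # a logical pattern
print(scale_score(False, True))           # the pattern that "makes no sense"
```

The sketch returns None for the fourth pattern instead of forcing it onto the nearest logical pattern; how such mixed cases should be assigned is a design decision this example deliberately leaves open.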

FIGURE 6-1 Indexes versus Scales. Both indexes and scales seek to measure variables such as political activism. Whereas indexes count the number of indicators of the variable, scales take account of the differing intensities of those indicators.

If you wrote to a public official and signed a petition, you’d get a total of 2 points. If I gave money to a candidate and persuaded someone to change her or his vote, I’d get the same score as you. Using this approach, we’d conclude that you and I had the same degree of political activism, even though we had taken different actions.

The second part of Figure 6-1 describes the logic of scale construction. In this case, the actions clearly represent different degrees of political activism, ranging from simply voting to running for office. Moreover, it seems safe to assume a pattern of actions in this case. For example, all those who contributed money probably also voted. Those who worked on a campaign probably also gave some money and voted. This suggests that most people will fall into only one of five idealized action patterns, represented by the illustrations at the bottom of the figure. The discussion of scales, later in this chapter, describes ways of identifying people with the type they most closely represent.

As you might surmise, scales are generally superior to indexes, because scales take into consideration the intensity with which different items reflect the variable being measured. Also, as the example in Figure 6-1 shows, scale scores convey

more information than index scores do. Again, be aware that the term scale is commonly misused to refer to measures that are only indexes. Merely calling a measure a scale instead of an index doesn’t make it better.

There are two other misconceptions about scaling that you should know about. First, whether the combination of several data items results in a scale almost always depends on the particular sample of observations under study. Certain items may form a scale within one sample but not within another. For this reason, do not assume that a given set of items is a scale simply because it has turned out that way in an earlier study.

Second, the use of specific scaling techniques (such as Guttman scaling, to be discussed) does not ensure the creation of a scale. Rather, such techniques let us determine whether or not a set of items constitutes a scale.

An examination of actual social science research reports will show that researchers use indexes much more frequently than they do scales. Ironically, however, the methodological literature contains little if any discussion of index construction, whereas discussions of scale construction abound. There appear to be two reasons for this disparity. First, indexes are more frequently used because scales are often difficult or impossible to construct from the data at hand. Second, methods of index construction seem so obvious and straightforward that they aren’t discussed much.

Constructing indexes is not a simple undertaking, however. The general failure to develop index-construction techniques has resulted in many bad indexes in social research. With this in mind, I’ve devoted over half of this chapter to the methods of index construction. With a solid understanding of the logic of this activity, you’ll be better equipped to try constructing both indexes and scales.

Index Construction

Let’s look now at four main steps in the construction of an index: selecting possible items, examining their empirical relationships, scoring the index, and validating it. We’ll conclude this discussion by examining the construction of an index that provided interesting findings about the status of women in different countries.

Item Selection

The first step in creating an index is selecting items for a composite index, which is created to measure some variable.

Face Validity

The first criterion for selecting items to be included in an index is face validity (or logical validity). If you want to measure political conservatism, for example, each of your items should appear on its face to indicate conservatism (or its opposite, liberalism). Political party affiliation would be one such item. Another would be an item asking people to approve or disapprove of the views of a well-known conservative public figure. In constructing an index of religiosity, you might consider items such as attendance at religious services, acceptance of certain religious beliefs, and frequency of prayer; each of these appears to offer some indication of religiosity.

Unidimensionality

The methodological literature on conceptualization and measurement stresses the need for unidimensionality in scale and index construction. That is, a composite measure should represent only one dimension of a concept. Thus, items reflecting religious fundamentalism should not be included in a measure of political conservatism, even though the two variables might be empirically related to each other.

General or Specific

Although measures should tap the same dimension, the general dimension you’re attempting to measure may have many nuances. In the example of religiosity, the indicators mentioned previously (ritual participation, belief, and so on) represent different types of religiosity. If you want to focus on ritual participation in religion,

you should choose items specifically indicating this type of religiosity: attendance at religious services and other rituals such as confession, bar mitzvah, bowing toward Mecca, and the like. If you want to measure religiosity in a more general way, you should include a balanced set of items, representing each of the different types of religiosity. Ultimately, the nature of the items you include will determine how specifically or generally the variable is measured.

Variance

In selecting items for an index, you must also be concerned with the amount of variance they provide. If an item is intended to indicate political conservatism, for example, you should note what proportion of respondents would be identified as conservatives by that item. If a given item identified no one as a conservative or everyone as a conservative (for example, if nobody indicated approval of a radical-right political figure), that item would not be very useful in the construction of an index.

To guarantee variance, you have two options. First, you may select several items the responses to which divide people about equally in terms of the variable, for example, about half conservative and half liberal. Although no single response would justify the characterization of a person as very conservative, a person who responded as a conservative on all items might be so characterized.

The second option is to select items differing in variance. One item might identify about half of the subjects as conservative, while another might identify few of the respondents as conservatives. Note that this second option is necessary for scaling, and it is reasonable for index construction as well.

Examination of Empirical Relationships

The second step in index construction is to examine the empirical relationships among the items being considered for inclusion. (See Chapter 14 for more.) An empirical relationship is established when respondents’ answers to one question (in a questionnaire, for example) help us predict how they’ll answer other questions. If two items are empirically related to each other, we can reasonably argue that each reflects the same variable, and we may include them both in the same index. There are two types of possible relationships among items: bivariate and multivariate.

Bivariate Relationships

A bivariate relationship is, simply put, a relationship between two variables. Suppose we want to measure respondents’ support for U.S. participation in the United Nations. One indicator of different levels of support might be the question “Do you feel the U.S. financial support of the UN is Too high / About right / Too low?”

A second indicator of support for the United Nations might be the question “Should the United States contribute military personnel to UN peacekeeping actions? Strongly approve / Mostly approve / Mostly disapprove / Strongly disapprove.”

Both of these questions, on their face, seem to reflect different degrees of support for the United Nations. Nonetheless, some people might feel the United States should give more money but not provide troops. Others might favor sending troops but cutting back on financial support.

If the two items both reflect degrees of the same thing, however, we should expect responses to the two items to correspond with each other. Specifically, those who approve of military support should be more likely to favor financial support than those who disapprove of military support would. Conversely, those who favor financial support should be more likely to favor military support than those disapproving of financial support would. If these expectations are met, we say there is a bivariate relationship between the two items.

Here’s another example. Suppose we want to determine the degree to which respondents feel women have the right to an abortion. We might ask (1) “Do you feel a woman should have the right to an abortion when her pregnancy was the result of rape?” and (2) “Do you feel a woman should have the right to an abortion if continuing her pregnancy would seriously threaten her life?”
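One way to check whether two yes/no items such as these are empirically related is to compute a correlation coefficient for the paired answers. The sketch below uses the phi coefficient, a standard correlation measure for a 2 × 2 table; the choice of phi and the ten respondents' answers are assumptions made purely for illustration, not a method or data taken from this chapter.

```python
import math

def phi_coefficient(pairs):
    """Phi coefficient for paired yes/no answers.

    pairs: iterable of (item1_yes, item2_yes) booleans, one per respondent.
    Returns None when either item has no variance (everyone answered alike).
    """
    a = b = c = d = 0
    for item1, item2 in pairs:
        if item1 and item2:
            a += 1          # yes / yes
        elif item1:
            b += 1          # yes / no
        elif item2:
            c += 1          # no / yes
        else:
            d += 1          # no / no
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return None
    return (a * d - b * c) / denom

# HYPOTHETICAL answers from ten respondents: (item 1, item 2).
answers = [(True, True)] * 6 + [(False, True)] * 2 + [(False, False)] * 2
print(phi_coefficient(answers))
```

A value near 0 would suggest the two items are unrelated, a clearly positive value that they tend to move together, and a value of exactly 1 that they are perfectly related. Returning None when an item shows no variance reflects the point made under "Variance": an item that everyone answers the same way cannot help distinguish respondents.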

Granted, some respondents might agree with item (1) and disagree with item (2); others will do just the reverse. However, if both items tap into some general opinion people have about the issue of abortion, then the responses to these two items should be related to each other. Those who support the right to an abortion in the case of rape should be more likely to support it if the woman’s life is threatened than those who disapproved of abortion in the case of rape would. This would be another example of a bivariate relationship.

You should examine all the possible bivariate relationships among the several items being considered for inclusion in an index, in order to determine the relative strengths of relationships among the several pairs of items. Percentage tables, correlation coefficients (see Chapter 16), or both may be used for this purpose. How we evaluate the strength of the relationships, however, can be rather subtle. “‘Cause’ and ‘Effect’ Indicators” examines some of these subtleties.

“Cause” and “Effect” Indicators

Kenneth Bollen
Department of Sociology, University of North Carolina, Chapel Hill

While it often makes sense to expect indicators of the same variable to be positively related to one another, as discussed in the text, this is not always the case.

Indicators should be related to one another if they are essentially “effects” of a variable. For example, to measure self-esteem, we might ask a person to indicate whether he or she agrees or disagrees with the statements (1) “I am a good person” and (2) “I am happy with myself.” A person with high self-esteem should agree with both statements while one with low self-esteem would probably disagree with both. Since each indicator depends on or “reflects” self-esteem, we expect them to be positively correlated. More generally, indicators that depend on the same variable should be associated with one another if they are valid measures.

But, this is not the case when the indicators are the “cause” rather than the “effect” of a variable. In this situation the indicators may correlate positively, negatively, or not at all. For example, we could use gender and race as indicators of the variable exposure to discrimination. Being nonwhite or female increases the likelihood of experiencing discrimination, so both are good indicators of the variable. But we would not expect the race and gender of individuals to be strongly associated.

Or, we may measure social interaction with three indicators: time spent with friends, time spent with family, and time spent with coworkers. Though each indicator is valid, they need not be positively correlated. Time spent with friends, for instance, may be inversely related to time spent with family. Here, the three indicators “cause” the degree of social interaction.

As a final example, exposure to stress may be measured by whether a person recently experienced divorce, death of a spouse, or loss of a job. Though any of these events may indicate stress, they need not correlate with one another.

In short, we expect an association between indicators that depend on or “reflect” a variable, that is, if they are the “effects” of the variable. But if the variable depends on the indicators (if the indicators are the “causes”), those indicators may be either positively or negatively correlated, or even unrelated. Therefore, we should decide whether indicators are causes or effects of a variable before using their intercorrelations to assess their validity.

Source: From Kenneth Bollen, Cause and Effect Indicators, Copyright © Kenneth Bollen. Reprinted by permission.

Be wary of items that are not related to one another empirically: It’s unlikely that they measure the same variable. You should probably drop any item that is not related to several other items.

At the same time, a very strong relationship between two items presents a different problem. If two items are perfectly related to each other, then only one needs to be included in the index; because it completely conveys the indications provided by the other, nothing more would be added by including the other item. (This problem will become even clearer in the next section.)

Here’s an example to illustrate the testing of bivariate relationships in index construction. I once conducted a survey of medical school faculty members to find out about the consequences of a “scientific perspective” on the quality of patient

care provided by physicians. The primary intent was to determine whether scientifically inclined doctors treated patients more impersonally than other doctors did.

The survey questionnaire offered several possible indicators of respondents’ scientific perspectives. Of those, three items appeared to provide especially clear indications of whether the doctors were scientifically oriented:

1. As a medical school faculty member, in what capacity do you feel you can make your greatest teaching contribution: as a practicing physician or as a medical researcher?

2. As you continue to advance your own medical knowledge, would you say your ultimate medical interests lie primarily in the direction of total patient management or the understanding of basic mechanisms? [The purpose of this item was to distinguish those who were mostly interested in overall patient care from those mostly interested in biological processes.]

3. In the field of therapeutic research, are you generally more interested in articles reporting evaluations of the effectiveness of various treatments or articles exploring the basic rationale underlying the treatments? [Similarly, I wanted to distinguish those more interested in articles dealing with patient care from those more interested in biological processes.]

(Babbie 1970: 27–31)

For each of these items, we might conclude that those respondents who chose the second answer are more scientifically oriented than respondents who chose the first answer. Though this comparative conclusion is reasonable, we should not be misled into thinking that respondents who chose the second answer to a given item are scientists in any absolute sense. They are simply more scientifically oriented than those who chose the first answer to the item.

To see this point more clearly, let’s examine the distribution of responses to each item. From the first item (greatest teaching contribution), only about one-third of the respondents appeared scientifically oriented. That is, approximately one-third said they could make their greatest teaching contribution as medical researchers. In response to the second item (ultimate medical interests), approximately two-thirds chose the scientific answer, saying they were more interested in learning about basic mechanisms than learning about total patient management. In response to the third item (reading preferences), about 80 percent chose the scientific answer.

These three questionnaire items can’t tell us how many “scientists” there are in the sample, for none of them is related to a set of criteria for what constitutes being a scientist in any absolute sense. Using the items for this purpose would present us with the problem of three quite different estimates of how many scientists there were in the sample.

However, these items do provide us with three independent indicators of respondents’ relative inclinations toward science in medicine. Each item separates respondents into the more scientific and the less scientific. But each grouping of more or less scientific respondents will have a somewhat different membership from the others. Respondents who seem scientific in terms of one item will not necessarily seem scientific in terms of another. Nevertheless, to the extent that each item measures the same general dimension, we should find some correspondence among the several groupings. Respondents who appear scientific in terms of one item should be more likely to appear scientific in their response to another item than would those who appeared nonscientific in their response to the first. In other words, we should find an association or correlation between the responses given to two items.

Figure 6-2 shows the associations among the responses to the three items. Three bivariate tables are presented, showing the distribution of responses for each possible pairing of items. An examination of the three bivariate relationships presented in the figure supports the suggestion that the three items all measure the same variable: scientific orientation. To see why this is so, let’s begin by looking at the first bivariate relationship in the table. The table shows that faculty who responded that “researcher” was the role in which they could make their greatest teaching contribution were more likely to identify their ultimate medical interests as “basic mechanisms” (87 percent) than were

those who answered “physician” (51 percent). The fact that the “physicians” are about evenly split in their ultimate medical interests is irrelevant for our purposes. It is only relevant that they are less scientific in their medical interests than the “researchers.” The strength of this relationship may be summarized as a 36 percentage point difference.

The same general conclusion applies to the other bivariate relationships. The strength of the relationship between reading preferences and ultimate medical interests may be summarized as a 38 percentage point difference, and the strength of the relationship between reading preferences and greatest teaching contribution as a 21 percentage point difference. In summary, then, each single item produces a different grouping of “scientific” and “nonscientific” respondents. However, the responses given to each of the items correspond, to a greater or lesser degree, to the responses given to each of the other items.

FIGURE 6-2 Bivariate Relationships among Scientific Orientation Items. If several indicators are measures of the same variable, then they should be empirically correlated with one another, as you can observe in this case. Those who choose the scientific orientation on one item are more likely to choose the scientific orientation on other items.

Initially, the three items were selected on the basis of face validity—each appeared to give some indication of faculty members’ orientations to science. By examining the bivariate relationship between the pairs of items, we have found support for the expectation that they all measure basically the same thing. However, that support does not sufficiently justify including the items in a composite index. Before combining them in a single index, we need to examine the multivariate relationships among the several variables.

Multivariate Relationships among Items

Figure 6-3 categorizes the sample respondents into four groups according to (1) their greatest teaching contribution and (2) their reading preferences. The numbers in parentheses indicate the number of respondents in each group. Thus, 66 of the faculty members who said they could best teach as physicians also said they preferred articles dealing with the effectiveness of treatments. For each of the four groups, the figure presents the percentage of those who say they are ultimately more interested in basic mechanisms. So, for example, of the 66 faculty mentioned, 27 percent are primarily interested in basic mechanisms.

The arrangement of the four groups is based on a previously drawn conclusion regarding scientific orientations. The group in the upper left corner of the table is presumably the least scientifically oriented, based on greatest teaching contribution

and reading preferences. The group in the lower right corner is presumably the most scientifically oriented in terms of those items.

Recall that expressing a primary interest in basic mechanisms was also taken as an indication of scientific orientation. As we should expect, then, those in the lower right corner are the most likely to give this response (89 percent), and those in the upper left corner are the least likely (27 percent). The respondents who gave mixed responses in terms of teaching contributions and reading preferences have an intermediate rank in their concern for basic mechanisms (58 percent in both cases).

FIGURE 6-3 Trivariate Relationships among Scientific Orientation Items. Indicators of the same variable should be correlated in a multivariate analysis as well as in bivariate analyses. Those who choose the scientific responses on greatest teaching contribution and reading preferences are the most likely to choose the scientific response on the third item.

FIGURE 6-4 Hypothetical Trivariate Relationship among Scientific Orientation Items. This hypothetical relationship suggests that not all three indicators would contribute effectively to a composite index.

This table tells us many things. First, we may note that the original relationships between pairs of items are not significantly affected by the presence of a third item. Recall, for example, that the relationship between teaching contribution and ultimate medical interest was summarized as a 36 percentage point difference. Looking at Figure 6-3, we see that among only those respondents who are most interested in articles dealing with the effectiveness of treatments, the relationship between teaching contribution and ultimate medical interest is 31 percentage points (58 percent minus 27 percent: first row). The same is true among those most interested in articles dealing with the rationale for treatments (89 percent minus 58 percent: second row). The original relationship between teaching contribution and ultimate medical interest is essentially the same as in Figure 6-2, even among those respondents judged as scientific or nonscientific in terms of reading preferences.

We can draw the same conclusion from the columns in Figure 6-3. Recall that the original relationship between reading preferences and ultimate medical interest was summarized as a 38 percentage point difference. Looking only at the “physicians” in Figure 6-3, we see that the relationship between the other two items is now 31 percentage points. The same relationship is found among the “researchers” in the second column.

The importance of these observations becomes clearer when we consider what might have happened. In Figure 6-4, hypothetical data tell a much different story than the actual data in Figure 6-3 do. As you can see, Figure 6-4 shows that the original relationship between teaching role and ultimate medical interest persists, even when reading preferences are introduced into the picture. In each row of the table, the “researchers” are more likely to express an interest in basic mechanisms than the “physicians” are. Looking down the columns, however, we note that there is no relationship between reading preferences and ultimate medical interest. If we know whether a respondent feels he or she can best teach as a physician or as a researcher, knowing the respondent’s reading preference adds nothing to our evaluation of his or her

scientific orientation. If something like Figure 6-4 resulted from the actual data, we would conclude that reading preference should not be included in the same index as teaching role, because it contributed nothing to the composite index.

This example used only three questionnaire items. If more were being considered, then more-complex multivariate tables would be in order, constructed of four, five, or more variables. The purpose of this step in index construction, again, is to discover the simultaneous interaction of the items in order to determine which should be included in the same index. These kinds of data analyses are easily accomplished using programs such as SPSS and MicroCase. They are usually referred to as cross-tabulations.

Index Scoring

When you’ve chosen the best items for your index, you next assign scores for particular responses, thereby creating a single composite measure out of the several items. There are two basic decisions to be made in this step.

First, you must decide the desirable range of the index scores. A primary advantage of an index over a single item is the range of gradations it offers in the measurement of a variable. As noted earlier, political conservatism might be measured from “very conservative” to “not at all conservative” or “very liberal.” How far to the extremes, then, should the index extend?

In this decision, the question of variance enters once more. Almost always, as the possible extremes of an index are extended, fewer cases are to be found at each end. The researcher who wishes to measure political conservatism to its greatest extreme (somewhere to the right of Attila the Hun, as the saying goes) may find there is almost no one in that category. At some point, additional gradations do not add meaning to the results.

The first decision, then, concerns the conflicting desire for (1) a range of measurement in the index and (2) an adequate number of cases at each point in the index. You’ll be forced to reach some kind of compromise between these conflicting desires.

The second decision concerns the actual assignment of scores for each particular response. Basically you must decide whether to give items in the index equal weight or different weights. Although there are no firm rules, I suggest—and practice tends to support this method—that items be weighted equally unless there are compelling reasons for differential weighting. That is, the burden of proof should be on differential weighting; equal weighting should be the norm.

Of course, this decision must be related to the earlier issue regarding the balance of items chosen. If the index is to represent the composite of slightly different aspects of a given variable, then you should give each aspect the same weight. In some instances, however, you may feel that two items reflect essentially the same aspect, and the third reflects a different aspect. If you want to have both aspects equally represented by the index, you might give the different item a weight equal to the combination of the two similar ones. For instance, you could assign a maximum score of 2 to the different item and a maximum score of 1 to each of the similar ones.

Although the rationale for scoring responses should take such concerns as these into account, typically researchers experiment with different scoring methods, examining the relative weights given to different aspects but at the same time worrying about the range and distribution of cases provided. Ultimately, the scoring method chosen will represent a compromise among these several demands. Of course, as in most research activities, such a decision is open to revision on the basis of later examinations. Validation of the index, to be discussed shortly, may lead the researcher to recycle his or her efforts by constructing a completely different index.

In the example taken from the medical school faculty survey, I decided to weight the items equally, since I’d chosen them, in part, because they represent slightly different aspects of the overall variable scientific orientation. On each of the items, the respondents were given a score of 1 for

choosing the “scientific” response to the item and a score of 0 for choosing the “nonscientific” response. Each respondent, then, could receive a score of 0, 1, 2, or 3. This scoring method provided what I considered a useful range of variation—four index categories—and also provided enough cases for analysis in each category.

Here’s a similar example of index scoring, from a study of work satisfaction. One of the key variables was job-related depression, measured by an index composed of the following four items, which asked workers how they felt when thinking about themselves and their jobs:

• “I feel downhearted and blue.”
• “I get tired for no reason.”
• “I find myself restless and can’t keep still.”
• “I am more irritable than usual.”

The researchers, Amy Wharton and James Baron, report, “Each of these items was coded: 4 = often, 3 = sometimes, 2 = rarely, 1 = never.” They go on to explain how they measured another variable, job-related self-esteem:

Job-related self-esteem was based on four items asking respondents how they saw themselves in their work: happy/sad; successful/not successful; important/not important; doing their best/not doing their best. Each item ranged from 1 to 7, where 1 indicates a self-perception of not being happy, successful, important, or doing one’s best.
(1987: 578)

As you look through the social research literature, you’ll find numerous similar examples of cumulative indexes being used to measure variables. Sometimes the indexing procedures are controversial, as evidenced in “What Is the Best College in the United States?”

Although it is often appropriate to examine the relationships among indicators of a variable being measured by an index or scale, you should realize that the indicators are sometimes independent of one another. For example, Stacy De Coster notes that the indicators of family stress may be independent of one another, though they contribute to the same variable.

Family Stress is a scale of stressful events within the family. The experience of any one of these events—parent job loss, parent separation, parent illness—is independent of the other events. Indeed, prior research on events utilized in stress scales has demonstrated that the events in these scales typically are independent of one another and reliabilities on the scales low.
(2005: 176)

If the indicators of a variable are logically related to one another, on the other hand, it is important to use that relationship as a criterion for determining which are the better indicators.

Handling Missing Data

Regardless of your data-collection method, you’ll frequently face the problem of missing data. In a content analysis of the political orientations of newspapers, for example, you may discover that a particular newspaper has never taken an editorial position on one of the issues being studied. In an experimental design involving several retests of subjects over time, some subjects may be unable to participate in some of the sessions. In virtually every survey, some respondents fail to answer some questions (or choose a “don’t know” response). Although missing data present problems at all stages of analysis, they’re especially troublesome in index construction. There are, however, several methods of dealing with these problems.

First, if there are relatively few cases with missing data, you may decide to exclude them from the construction of the index and the analysis. (I did this in the medical school faculty example.) The primary concerns in this instance are whether the numbers available for analysis will remain sufficient and whether the exclusion will result in an unrepresentative sample whenever the index, excluding some of the respondents, is used in the analysis. The latter possibility can be examined through a comparison—on other relevant

variables—of those who would be included in and excluded from the index.

Second, you may sometimes have grounds for treating missing data as one of the available responses. For example, if a questionnaire has asked respondents to indicate their participation in various activities by checking “yes” or “no” for each, many respondents may have checked some of the activities “yes” and left the remainder blank. In such a case, you might decide that a failure to answer meant “no,” and score missing data in this case as though the respondents had checked the “no” space.

Third, a careful analysis of missing data may yield an interpretation of their meaning. In constructing a measure of political conservatism, for

What Is the Best College in the United States?

Each year the newsmagazine U.S. News and World Report issues a special report ranking the nation’s colleges and universities. Their rankings reflect an index, created from several items: educational expenditures per student, graduation rates, selectivity (percentage accepted of those applying), average SAT scores of first-year students, and similar indicators of quality.

Typically, Harvard is ranked the number one school in the nation, followed by Yale and Princeton. However, the 1999 “America’s Best Colleges” issue shocked educators, prospective college students, and their parents. The California Institute of Technology had leaped from ninth place in 1998 to first place a year later. While Harvard, Yale, and Princeton still did well, they had been supplanted. What had happened at Caltech to produce such a remarkable surge in quality?

The answer was to be found at U.S. News and World Report, not at Caltech. The newsmagazine changed the structure of the ranking index in 1999, which made a big difference in how schools fared.

Bruce Gottlieb (1999) gives this example of how the altered scoring made a difference.

So, how did Caltech come out on top? Well, one variable in a school’s ranking has long been educational expenditures per student, and Caltech has traditionally been tops in this category. But until this year, U.S. News considered only a school’s ranking in this category—first, second, etc.—rather than how much it spent relative to other schools. It didn’t matter whether Caltech beat Harvard by $1 or by $100,000. Two other schools that rose in their rankings this year were MIT (from fourth to third) and Johns Hopkins (from 14th to seventh). All three have high per-student expenditures and all three are especially strong in the hard sciences. Universities are allowed to count their research budgets in their per-student expenditures, though students get no direct benefit from costly research their professors are doing outside of class.

In its “best colleges” issue two years ago, U.S. News made precisely this point, saying it considered only the rank ordering of per-student expenditures, rather than the actual amounts, on the grounds that expenditures at institutions with large research programs and medical schools are substantially higher than those at the rest of the schools in the category. In other words, just two years ago, the magazine felt it unfair to give Caltech, MIT, and Johns Hopkins credit for having lots of fancy laboratories that don’t actually improve undergraduate education.

Gottlieb reviewed each of the changes in the index and then asked how 1998’s ninth-ranked Caltech would have done had the revised indexing formula been in place a year earlier. His conclusion: Caltech would have been first in 1998 as well. In other words, the apparent improvement was solely a function of how the index was scored.

Clearly, composite measures such as scales and indexes are valuable tools for understanding society. However, it’s important that we know how those measures are constructed and what that construction implies.

For a very different ranking of colleges and universities, you might be interested in the “Webometrics Ranking,” which can be found at the link on this book’s website: http://www.cengage.com/sociology/babbie. This link details the items included in the index, as well as how they are combined to produce an overall ranking of the world’s institutions of higher education. As of January 2008, MIT was the top-ranked American university, but you’ll have to examine the methodological description to know what that means.

So, what’s really the best college in the United States? It depends on how you define “best.” There is no “really best,” only the various social constructions we can create.

Sources: “America’s Best Colleges,” U.S. News and World Report, August 30, 1999; Bruce Gottlieb, “Cooking the School Books: How U.S. News Cheats in Picking Its ‘Best American Colleges,’” Slate, August 31, 1999 (http://slate.msn.com/default.aspx?id=34027).

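The scoring issue raised in the college-ranking box can be illustrated with a minimal sketch. The school names, the spending figures, and the two scoring functions below are hypothetical and chosen only for illustration; they are not the magazine's actual formula. The sketch contrasts scoring an item by rank order with scoring it by amount.

```python
# Hypothetical per-student expenditures (illustrative numbers only, not real data).
spending = {"School A": 190_000, "School B": 90_000, "School C": 85_000}


def rank_scores(values):
    """Score by rank order only: the top spender gets n points, the next n - 1, and so on."""
    ordered = sorted(values, key=values.get, reverse=True)
    n = len(ordered)
    return {name: n - position for position, name in enumerate(ordered)}


def amount_scores(values):
    """Score by amount: each school's spending as a percentage of the top spender's."""
    top = max(values.values())
    return {name: round(100 * amount / top, 1) for name, amount in values.items()}


by_rank = rank_scores(spending)
by_amount = amount_scores(spending)

# Shrink School A's lead to a single dollar: the rank-based scores do not change,
# while the amount-based gap between A and B nearly disappears.
narrow = {**spending, "School A": 90_001}
same_ranks = rank_scores(narrow) == by_rank

print(by_rank)
print(by_amount)
print(same_ranks)
```

Under rank scoring, School A leads School B by one point whether it outspends it by $1 or by $100,000; under amount scoring, the size of the lead carries into the index. Which rule is appropriate is a judgment about what the item is meant to indicate.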
example, you may discover that respondents who failed to answer a given question were generally as conservative on other items as those who gave the conservative answer were. In another example, a recent study measuring religious beliefs found that people who answered “don’t know” about a given belief were almost identical to the “disbelievers” in their answers about other beliefs. (Note: You should take these examples not as empirical guides in your own studies but only as suggestions of general ways to analyze your own data.) Whenever the analysis of missing data yields such interpretations, then, you may decide to score such cases accordingly.

There are many other ways of handling the problem of missing data. If an item has several possible values, you might assign the middle value to cases with missing data; for example, you could assign a 2 if the values are 0, 1, 2, 3, and 4. For a continuous variable such as age, you could similarly assign the mean to cases with missing data (more on this in Chapter 14). Or, missing data can be supplied by assigning values at random. All of these are conservative solutions because they weaken the “purity” of your index and reduce the likelihood that it will relate to other variables in ways you may have hypothesized.

If you’re creating an index out of a large number of items, you can sometimes handle missing data by using proportions based on what is observed. Suppose your index is composed of six indicators, and you only have four observations for a particular subject. If the subject has earned 4 points out of a possible 4, you might assign an index score of 6; if the subject has 2 points (half the possible score on four items), you could assign a score of 3 (half the possible score on six observations).

The choice of a particular method to be used depends so much on the research situation that I can’t reasonably suggest a single “best” method or rank the several I’ve described. Excluding all cases with missing data can bias the representativeness of the findings, but including such cases by assigning scores to missing data can influence the nature of the findings. The safest and best method is to construct the index using more than one of these methods and see whether you reach the same conclusions using each of the indexes. Understanding your data is the final goal of analysis anyway.

Index Validation

Up to this point, we’ve discussed all the steps in the selection and scoring of items that result in an index purporting to measure some variable. If each of the preceding steps is carried out carefully, the likelihood of the index actually measuring the variable is enhanced. To demonstrate success, however, we must show that the index is valid. Following the basic logic of validation, we assume that the index provides a measure of some variable; that is, the scores on the index arrange cases in a rank order in terms of that variable. An index of political conservatism rank-orders people in terms of their relative conservatism. If the index does that successfully, then people scored as relatively conservative on the index should appear relatively conservative in all other indications of political orientation, such as their responses to other questionnaire items. There are several methods of validating an index.

Item Analysis

The first step in index validation is an internal validation called item analysis. In item analysis, you examine the extent to which the index is related to (or predicts responses to) the individual items it comprises. Here’s an illustration of this step.

In the index of scientific orientations among medical school faculty, index scores ranged from 0 (most interested in patient care) to 3 (most interested in research). Now let’s consider one of the items in the index: whether respondents wanted to advance their own knowledge more with regard to total patient management or more in the area of basic mechanisms. The latter were treated as being more scientifically oriented than the former. The following empty table shows how we would

item analysis An assessment of whether each of the items included in a composite measure makes an independent contribution or merely duplicates the contribution of other items in the measure.

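The equal-weight scoring used for the medical school faculty index, and the proportional treatment of missing data described above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `index_score` and `fill_middle` are hypothetical, `None` is assumed to mark a missing answer, and every item is assumed to have the same maximum score.

```python
from typing import List, Optional, Sequence


def index_score(items: Sequence[Optional[int]], max_per_item: int = 1) -> Optional[float]:
    """Additive, equally weighted index; None marks a missing answer (an assumption).

    Missing items are handled proportionally: the points earned on the answered
    items are rescaled to the maximum possible on the full set of items.
    """
    answered = [x for x in items if x is not None]
    if not answered:
        return None  # nothing observed, so no score can be assigned
    possible_observed = max_per_item * len(answered)
    possible_full = max_per_item * len(items)
    return sum(answered) * possible_full / possible_observed


def fill_middle(items: Sequence[Optional[int]], low: int, high: int) -> List[float]:
    """Alternative treatment: replace each missing answer with the middle possible value."""
    middle = (low + high) / 2
    return [middle if x is None else x for x in items]


# Three dichotomous items scored 1 for the "scientific" answer, 0 otherwise.
complete_case = index_score([1, 0, 1])

# Six indicators, only four observed, as in the proportional example in the text.
all_four_points = index_score([1, 1, 1, 1, None, None])
two_of_four = index_score([1, 0, 1, 0, None, None])

# Middle-value assignment for an item whose values run from 0 to 4.
filled = fill_middle([4, None, 0], low=0, high=4)

print(complete_case, all_four_points, two_of_four, filled)
```

Because the treatments can lead to different scores for the same respondent, building the index more than one way and comparing the conclusions, as recommended above, is straightforward to do with helpers like these.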

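The logic of item analysis can be sketched with a small cross-tabulation of one item against the index score. The respondents below are hypothetical and invented only for illustration; they are not the medical school faculty data, and `item_analysis` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical respondents: (teaching, interests, reading); 1 = "scientific" answer.
respondents = [
    (0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 1),
    (0, 1, 0), (1, 1, 0), (0, 1, 1), (1, 1, 1),
]


def item_analysis(rows, item):
    """Percentage giving the scientific answer on one item, by index score."""
    counts = defaultdict(lambda: [0, 0])  # index score -> [scientific answers, cases]
    for row in rows:
        score = sum(row)  # equal-weight additive index, ranging from 0 to 3
        counts[score][0] += row[item]
        counts[score][1] += 1
    return {score: 100 * hits / n for score, (hits, n) in sorted(counts.items())}


# Item 1 is the "ultimate medical interests" question in this hypothetical layout.
percent_by_score = item_analysis(respondents, item=1)
print(percent_by_score)
```

With dichotomous items, respondents at the lowest and highest index scores necessarily gave the same answer on every item, so the informative comparison is among the middle scores: the percentage giving the scientific answer should rise as the index score rises.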