
Earl Babbie, The Practice of Social Research

Published by dinakan, 2021-08-12 20:20:06

Description: This e-book is for reading purposes only and is not intended for any profit.


Conceptualization

find out for example, whether women are more compassionate than men. I suspect many people assume this is the case, but it might be interesting to find out if it's really so. We can't meaningfully study the question, let alone agree on the answer, without some working agreements about the meaning of compassion. They are "working" agreements in the sense that they allow us to work on the question. We don't need to agree or even pretend to agree that a particular specification is ultimately the best one.

Conceptualization, then, produces a specific, agreed-on meaning for a concept for the purposes of research. This process of specifying exact meaning involves describing the indicators we'll be using to measure our concept and the different aspects of the concept, called dimensions.

Indicators and Dimensions

Conceptualization gives definite meaning to a concept by specifying one or more indicators of what we have in mind. An indicator is a sign of the presence or absence of the concept we're studying. Here's an example.

indicator: An observation that we choose to consider as a reflection of a variable we wish to study. Thus, for example, attending religious services might be considered an indicator of religiosity.

We might agree that visiting children's hospitals during Christmas and Hanukkah is an indicator of compassion. Putting little birds back in their nests might be agreed on as another indicator, and so forth. If the unit of analysis for our study is the individual person, we can then observe the presence or absence of each indicator for each person under study. Going beyond that, we can add up the number of indicators of compassion observed for each individual. We might agree on ten specific indicators, for example, and find six present in our study of Pat, three for John, nine for Mary, and so forth.

Returning to our question about whether men or women are more compassionate, we might calculate that the women we studied displayed an average of 6.5 indicators of compassion, the men an average of 3.2. On the basis of our quantitative analysis of group difference, we might therefore conclude that women are, on the whole, more compassionate than men.

Usually, though, it's not that simple. Imagine you're interested in understanding a small fundamentalist religious cult, particularly their harsh views on various groups: gays, nonbelievers, feminists, and others. In fact, they suggest that anyone who refuses to join their group and abide by its teachings will "burn in hell." In the context of your interest in compassion, they don't seem to have much. And yet, the group's literature often speaks of their compassion for others. You want to explore this seeming paradox.

To pursue this research interest, you might arrange to interact with cult members, getting to know them and learning more about their views. You could tell them you were a social researcher interested in learning about their group, or perhaps you would just express an interest in learning more, without saying why.

In the course of your conversations with group members and perhaps attendance of religious services, you would put yourself in situations where you could come to understand what the cult members mean by compassion. You might learn, for example, that members of the group were so deeply concerned about sinners burning in hell that they were willing to be aggressive, even violent, to make people change their sinful ways. Within their own paradigm, then, cult members would see beating up gays, prostitutes, and abortion doctors as acts of compassion.

Social researchers focus their attention on the meanings that the people under study give to words and actions. Doing so can often clarify the behaviors observed: At least now you understand how the cult can see violent acts as compassionate. On the other hand, paying attention to what words and actions mean to the people under study almost always complicates the concepts researchers are interested in. (We'll return to this issue when we discuss the validity of measures, toward the end of this chapter.)

Whenever we take our concepts seriously and set about specifying what we mean by them, we

Chapter 5: Conceptualization, Operationalization, and Measurement

discover disagreements and inconsistencies. Not only do you and I disagree, but each of us is likely to find a good deal of muddiness within our own mental images. If you take a moment to look at what you mean by compassion, you'll probably find that your image contains several kinds of compassion. That is, the entries on your mental file sheet can be combined into groups and subgroups, say, compassion toward friends, co-religionists, humans, and birds. You may also find several different strategies for making combinations. For example, you might group the entries into feelings and actions.

The technical term for such groupings is dimension: a specifiable aspect of a concept. For instance, we might speak of the "feeling dimension" of compassion and the "action dimension" of compassion. In a different grouping scheme, we might distinguish "compassion for humans" from "compassion for animals." Or we might see compassion as helping people have what we want for them versus what they want for themselves. Still differently, we might distinguish compassion as forgiveness from compassion as pity.

dimension: A specifiable aspect of a concept. "Religiosity," for example, might be specified in terms of a belief dimension, a ritual dimension, a devotional dimension, a knowledge dimension, and so forth.

Thus, we could subdivide compassion into several clearly defined dimensions. A complete conceptualization involves both specifying dimensions and identifying the various indicators for each.

Sometimes conceptualization aimed at identifying different dimensions of a variable leads to a different kind of distinction. We may conclude that we've been using the same word for meaningfully distinguishable concepts. In the following example, the researchers find (1) that "violence" is not a sufficient description of "genocide" and (2) that the concept "genocide" itself comprises several distinct phenomena. Let's look at the process they went through to come to this conclusion.

When Daniel Chirot and Jennifer Edwards attempted to define the concept of "genocide," they found existing assumptions were not precise enough for their purposes:

The United Nations originally defined it as an attempt to destroy "in whole or in part, a national, ethnic, racial, or religious group." If genocide is distinct from other types of violence, it requires its own unique explanation.
(2003: 14)

Notice the final comment in this excerpt, as it provides an important insight into why researchers are so careful in specifying the concepts they study. If genocide, such as the Holocaust, were simply another example of violence, like assaults and homicides, then what we know about violence in general might explain genocide. If it differs from other forms of violence, then we may need a different explanation for it. So, the researchers began by suggesting that "genocide" was a concept distinct from "violence" for their purposes.

Then, as Chirot and Edwards examined historical instances of genocide, they began concluding that the motivations for launching genocidal mayhem differed sufficiently to represent four distinct phenomena that were all called "genocide" (2003: 15-18).

1. Convenience: Sometimes the attempt to eradicate a group of people serves a function for the eradicators, such as Julius Caesar's attempt to eradicate tribes defeated in battle, fearing they would be difficult to rule. Or when gold was discovered on Cherokee land in the Southeastern United States in the early nineteenth century, the Cherokee were forcibly relocated to Oklahoma in an event known as the "Trail of Tears," which ultimately killed as many as half of those forced to leave.

2. Revenge: When the Chinese of Nanking bravely resisted the Japanese invaders in the early years of World War II, the conquerors felt they had been insulted by those they regarded as inferior beings. Tens of thousands were slaughtered in the "Rape of Nanking" in 1937-1938.

3. Fear: The ethnic cleansing that recently occurred in the former Yugoslavia was at least

partly motivated by economic competition and worries that the growing Albanian population of Kosovo was gaining political strength through numbers. Similarly, the Hutu attempt to eradicate the Tutsis of Rwanda grew out of a fear that returning Tutsi refugees would seize control of the country. Often intergroup fears such as these grow out of long histories of atrocities, often inflicted in both directions.

4. Purification: The Nazi Holocaust, probably the most publicized case of genocide, was intended as a purification of the "Aryan race." While Jews were the main target, gypsies, homosexuals, and many other groups were also included. Other examples include the Indonesian witch-hunt against communists in 1965-1966 and the attempt to eradicate all non-Khmer Cambodians under Pol Pot in the 1970s.

No single theory of genocide could explain these various forms of mayhem. Indeed, this act of conceptualization suggests four distinct phenomena, each needing a different set of explanations.

Specifying the different dimensions of a concept often paves the way for a more sophisticated understanding of what we're studying. We might observe, for example, that women are more compassionate in terms of feelings, and men more so in terms of actions, or vice versa. Whichever turned out to be the case, we would not be able to say whether men or women are really more compassionate. Our research would have shown that there is no single answer to the question. That alone represents an advance in our understanding of reality. To get a better feel for concepts, variables, and indicators, go to the General Social Survey codebook and explore some of the ways the researchers have measured various concepts: http://www.icpsr.umich.edu/GSS99/subject/s-index.htm.

The Interchangeability of Indicators

There is another way that the notion of indicators can help us in our attempts to understand reality by means of "unreal" constructs. Suppose, for the moment, that you and I have compiled a list of 100 indicators of compassion and its various dimensions. Suppose further that we disagree widely on which indicators give the clearest evidence of compassion or its absence. If we pretty much agree on some indicators, we could focus our attention on those, and we would probably agree on the answer they provided. We would then be able to say that some people are more compassionate than others in some dimension. But suppose we don't really agree on any of the possible indicators. Surprisingly, we can still reach an agreement on whether men or women are the more compassionate. How we do that has to do with the interchangeability of indicators.

The logic works like this. If we disagree totally on the value of the indicators, one solution would be to study all of them. Suppose that women turn out to be more compassionate than men on all 100 indicators: on all the indicators you favor and on all of mine. Then we would be able to agree that women are more compassionate than men, even though we still disagree on exactly what compassion means in general.

The interchangeability of indicators means that if several different indicators all represent, to some degree, the same concept, then all of them will behave the same way that the concept would behave if it were real and could be observed. Thus, given a basic agreement about what "compassion" is, if women are generally more compassionate than men, we should be able to observe that difference by using any reasonable measure of compassion. If, on the other hand, women are more compassionate than men on some indicators but not on others, we should see if the two sets of indicators represent different dimensions of compassion.

You have now seen the fundamental logic of conceptualization and measurement. The discussions that follow are mainly refinements and extensions of what you've just read. Before turning to a technical elaboration of measurement, however, we need to fill out the picture of conceptualization by looking at some of the ways social researchers provide standards, consistency, and commonality for the meanings of terms.
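The logic of interchangeable indicators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four people, the two groups, and the three indicator names are invented, and "more compassionate on an indicator" is taken to mean that a larger share of one group shows that indicator. The sketch counts indicators per person and then sorts the indicators into those on which the groups differ in the expected direction and those on which they do not.

```python
from statistics import mean

# Hypothetical observations: 1 = indicator present, 0 = absent.
# The people, groups, and indicator names are invented for illustration.
people = [
    {"group": "women", "visits_hospital": 1, "rescues_birds": 1, "donates": 1},
    {"group": "women", "visits_hospital": 1, "rescues_birds": 0, "donates": 1},
    {"group": "men", "visits_hospital": 0, "rescues_birds": 1, "donates": 0},
    {"group": "men", "visits_hospital": 1, "rescues_birds": 0, "donates": 0},
]
INDICATORS = ["visits_hospital", "rescues_birds", "donates"]


def indicator_count(person):
    """Number of indicators observed for one person."""
    return sum(person[ind] for ind in INDICATORS)


def group_rates(rows, indicator):
    """Share of each group for whom the indicator was observed."""
    groups = {r["group"] for r in rows}
    return {g: mean(r[indicator] for r in rows if r["group"] == g) for g in groups}


def compare(rows, indicators, a="women", b="men"):
    """Split indicators by whether group a scores strictly higher than group b."""
    higher, not_higher = [], []
    for ind in indicators:
        rates = group_rates(rows, ind)
        (higher if rates[a] > rates[b] else not_higher).append(ind)
    return higher, not_higher


higher, not_higher = compare(people, INDICATORS)
print("women higher on:", higher)
print("women not higher on:", not_higher)
```

If the second list came back empty, we could agree on the group difference without agreeing on which indicator is best. A non-empty second list is the cue to ask whether the two sets of indicators reflect different dimensions.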

Real, Nominal, and Operational Definitions

As we have seen, the design and execution of social research requires us to clear away the confusion over concepts and reality. To this end, logicians and scientists have found it useful to distinguish three kinds of definitions: real, nominal, and operational.

The first of these reflects the reification of terms. As Carl Hempel cautions,

A "real" definition, according to traditional logic, is not a stipulation determining the meaning of some expression but a statement of the "essential nature" or the "essential attributes" of some entity. The notion of essential nature, however, is so vague as to render this characterization useless for the purposes of rigorous inquiry.
(1952: 6)

In other words, trying to specify the "real" meaning of concepts only leads to a quagmire: It mistakes a construct for a real entity.

The specification of concepts in scientific inquiry depends instead on nominal and operational definitions. A nominal definition is one that is simply assigned to a term without any claim that the definition represents a "real" entity. Nominal definitions are arbitrary (I could define compassion as "plucking feathers off helpless birds" if I wanted to), but they can be more or less useful. For most purposes, especially communication, that last definition of compassion would be pretty useless. Most nominal definitions represent some consensus, or convention, about how a particular term is to be used.

specification: The process through which concepts are made more specific.

An operational definition, as you may remember from an earlier chapter, specifies precisely how a concept will be measured, that is, the operations we'll perform. An operational definition is nominal rather than real, but it has the advantage of achieving maximum clarity about what a concept means in the context of a given study. In the midst of disagreement and confusion over what a term "really" means, we can specify a working definition for the purposes of an inquiry. Wishing to examine socioeconomic status (SES) in a study, for example, we may simply specify that we are going to treat SES as a combination of income and educational attainment. In this decision, we rule out other possible aspects of SES: occupational status, money in the bank, property, lineage, lifestyle, and so forth. Our findings will then be interesting to the extent that our definition of SES is useful for our purpose.

Creating Conceptual Order

The clarification of concepts is a continuing process in social research. Catherine Marshall and Gretchen Rossman (1995: 18) speak of a "conceptual funnel" through which a researcher's interest becomes increasingly focused. Thus, a general interest in social activism could narrow to "individuals who are committed to empowerment and social change" and further focus on discovering "what experiences shaped the development of fully committed social activists." This focusing process is inescapably linked to the language we use.

In some forms of qualitative research, the clarification of concepts is a key element in the collection of data. Suppose you were conducting interviews and observations in a radical political group devoted to combating oppression in U.S. society. Imagine how the meaning of oppression would shift as you delved more and more deeply into the members' experiences and worldviews. For example, you might start out thinking of oppression in physical and perhaps economic terms. The more you learned about the group, however, the more you might appreciate the possibility of psychological oppression.

The same point applies even to contexts where meanings might seem more fixed. In the analysis of textual materials, for example, social researchers sometimes speak of the "hermeneutic circle," a cyclical process of ever-deeper understanding.

The understanding of a text takes place through a process in which the meaning of the separate parts is determined by the global meaning of

the text as it is anticipated. The closer determination of the meaning of the separate parts may eventually change the originally anticipated meaning of the totality, which again influences the meaning of the separate parts, and so on.
(Kvale 1996: 47)

Consider the concept "prejudice." Suppose you needed to write a definition of the term. You might start out thinking about racial/ethnic prejudice. At some point you would realize you should probably allow for gender prejudice, religious prejudice, antigay prejudice, and the like in your definition. Examining each of these specific types of prejudice would affect your overall understanding of the general concept. As your general understanding changed, however, you would likely see each of the individual forms somewhat differently.

The continual refinement of concepts occurs in all social research methods. Often you will find yourself refining the meaning of important concepts even as you write up your final report.

Although conceptualization is a continuing process, it is vital to address it specifically at the beginning of any study design, especially rigorously structured research designs such as surveys and experiments. In a survey, for example, operationalization results in a commitment to a specific set of questionnaire items that will represent the concepts under study. Without that commitment, the study could not proceed.

Even in less-structured research methods, however, it's important to begin with an initial set of anticipated meanings that can be refined during data collection and interpretation. No one seriously believes we can observe life with no preconceptions; for this reason, scientific observers must be conscious of and explicit about these conceptual starting points.

Let's explore initial conceptualization the way it applies to structured inquiries such as surveys and experiments. Though specifying nominal definitions focuses our observational strategy, it does not allow us to observe. As a next step we must specify exactly what we are going to observe, how we will do it, and what interpretations we are going to place on various possible observations. All these further specifications make up the operational definition of the concept.

In the example of socioeconomic status, we might decide to ask survey respondents two questions, corresponding to the decision to measure SES in terms of income and educational attainment:

1. What was your total family income during the past 12 months?

2. What is the highest level of school you completed?

To organize our data, we'd probably want to specify a system for categorizing the answers people give us. For income, we might use categories such as "under $5,000," "$5,000 to $10,000," and so on. Educational attainment might be similarly grouped in categories: less than high school, high school, college, graduate degree. Finally, we would specify the way a person's responses to these two questions would be combined in creating a measure of SES.

In this way we would create a working and workable definition of SES. Although others might disagree with our conceptualization and operationalization, the definition would have one essential scientific virtue: It would be absolutely specific and unambiguous. Even if someone disagreed with our definition, that person would have a good idea how to interpret our research results, because what we meant by SES, as reflected in our analyses and conclusions, would be precise and clear.

Here is a diagram showing the progression of measurement steps from our vague sense of what a term means to specific measurements in a fully structured scientific study:

Conceptualization
↓
Nominal Definition
↓
Operational Definition
↓
Measurements in the Real World
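To make the SES example concrete, here is a minimal Python sketch of one possible operational definition. Only the first two income brackets and the four education categories come from the discussion above. The higher income cut-offs, the handling of boundary values, and the rule for combining the two answers (adding the two category ranks) are assumptions made purely for illustration; another researcher could reasonably choose differently.

```python
import bisect

# Lower edges of the income brackets above the first one. The text names
# only "under $5,000" and "$5,000 to $10,000"; the higher cut-offs are assumed.
INCOME_CUTOFFS = [5_000, 10_000, 25_000, 50_000]

# Education categories as listed in the text, ordered from lowest to highest.
EDUCATION_LEVELS = [
    "less than high school",
    "high school",
    "college",
    "graduate degree",
]


def income_category(income):
    """Return 0 for under $5,000, 1 for $5,000 up to $10,000, and so on.

    Assumption: an income exactly on a cut-off falls in the higher bracket.
    """
    return bisect.bisect_right(INCOME_CUTOFFS, income)


def education_category(level):
    """Return the rank of the reported education level (0 = lowest)."""
    return EDUCATION_LEVELS.index(level)


def ses_score(income, level):
    """Assumed combination rule: add the income rank and the education rank."""
    return income_category(income) + education_category(level)


print(ses_score(4_000, "high school"))
print(ses_score(60_000, "graduate degree"))
```

Whatever one thinks of these choices, they are explicit: anyone reading a report based on this score can see exactly what "SES" meant in the study.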

An Example of Conceptualization: The Concept of Anomie

To bring this discussion of conceptualization in research together, let's look briefly at the history of a specific social scientific concept. Researchers studying urban riots are often interested in the part played by feelings of powerlessness. Social scientists sometimes use the word anomie in this context. This term was first introduced into social science by Emile Durkheim, the great French sociologist, in his classic 1897 study, Suicide.

Using only government publications on suicide rates in different regions and countries, Durkheim produced a work of analytic genius. To determine the effects of religion on suicide, he compared the suicide rates of predominantly Protestant countries with those of predominantly Catholic ones, Protestant regions of Catholic countries with Catholic regions of Protestant countries, and so forth. To determine the possible effects of the weather, he compared suicide rates in northern and southern countries and regions, and he examined the different suicide rates across the months and seasons of the year. Thus, he could draw conclusions about a supremely individualistic and personal act without having any data about the individuals engaging in it.

At a more general level, Durkheim suggested that suicide also reflects the extent to which a society's agreements are clear and stable. Noting that times of social upheaval and change often present individuals with grave uncertainties about what is expected of them, Durkheim suggested that such uncertainties cause confusion, anxiety, and even self-destruction. To describe this societal condition of normlessness, Durkheim chose the term anomie. Durkheim did not make this word up. Used in both German and French, it literally meant "without law." The English term anomy had been used for at least three centuries before Durkheim to mean disregard for divine law. However, Durkheim created the social scientific concept of anomie.

In the years that have followed the publication of Suicide, social scientists have found anomie a useful concept, and many have expanded on Durkheim's use. Robert Merton, in a classic article entitled "Social Structure and Anomie" (1938), concluded that anomie results from a disparity between the goals and means prescribed by a society. Monetary success, for example, is a widely shared goal in our society, yet not all individuals have the resources to achieve it through acceptable means. An emphasis on the goal itself, Merton suggested, produces normlessness, because those denied the traditional avenues to wealth go about getting it through illegitimate means. Merton's discussion, then, could be considered a further conceptualization of the concept of anomie.

Although Durkheim originally used the concept of anomie as a characteristic of societies, as did Merton after him, other social scientists have used it to describe individuals. To clarify this distinction, some scholars have chosen to use anomie in reference to its original, societal meaning and to use the term anomia in reference to the individual characteristic. In a given society, then, some individuals experience anomia, and others do not. Elwin Powell, writing 20 years after Merton, provided the following conceptualization of anomia (though using the term anomie) as a characteristic of individuals:

When the ends of action become contradictory, inaccessible or insignificant, a condition of anomie arises. Characterized by a general loss of orientation and accompanied by feelings of "emptiness" and apathy, anomie can be simply conceived as meaninglessness.
(1958: 132)

Powell went on to suggest there were two distinct kinds of anomia and to examine how the two rose out of different occupational experiences to result at times in suicide. In his study, however, Powell did not measure anomia per se; he studied the relationship between suicide and occupation, making inferences about the two kinds of anomia. Thus, the study did not provide an operational definition of anomia, only a further conceptualization.

Although many researchers have offered operational definitions of anomia, one name stands out

over all. Two years before Powell's article appeared, Leo Srole (1956) published a set of questionnaire items that he said provided a good measure of anomia as experienced by individuals. It consists of five statements that subjects were asked to agree or disagree with:

1. In spite of what some people say, the lot of the average man is getting worse.

2. It's hardly fair to bring children into the world with the way things look for the future.

3. Nowadays a person has to live pretty much for today and let tomorrow take care of itself.

4. These days a person doesn't really know who he can count on.

5. There's little use writing to public officials because they aren't really interested in the problems of the average man.

(1956: 713)

In the half-century following its publication, the Srole scale has become a research staple for social scientists. You'll likely find this particular operationalization of anomia used in many of the research projects reported in academic journals. Srole touches on this in the accompanying box, "The Origins of Anomia," which he prepared for this book before his death.

The Origins of Anomia
by Leo Srole

My career-long fixation on anomie began with reading Durkheim's Le Suicide as a Harvard undergraduate. Later, as a graduate student at Chicago, I studied under two Durkheimian anthropologists: William Lloyd Warner and Alfred Radcliffe-Brown. Radcliffe-Brown had carried on a lively correspondence with Durkheim, making me a collateral "descendant" of the great French sociologist.

For me, the early impact of Durkheim's work on suicide was mixed but permanent. On the one hand, I had serious reservations about his strenuous, ingenious, and often awkward efforts to force the crude, bureaucratic records on suicide rates to fit with his unidirectional sociological determinism. On the other hand, I was moved by Durkheim's unswerving preoccupation with the moral force of the interpersonal ties that bind us to our time, place, and past, and also his insights about the lethal consequences that can follow from shrinkage and decay in those ties.

My interest in anomie received an eyewitness jolt at the finale of World War II, when I served with the United Nations Relief and Rehabilitation Administration, helping to rebuild a war-torn Europe. At the Nazi concentration camp of Dachau, I saw firsthand the depths of dehumanization that macrosocial forces, such as those that engaged Durkheim, could produce in individuals like Hitler, Eichmann, and the others serving their dictates at all levels in the Nazi death factories.

Returning from my UNRRA post, I felt most urgently that the time was long overdue to come to an understanding of the dynamics underlying disintegrated social bonds. We needed to work expeditiously, deemphasizing proliferation of macro-level theory in favor of a direct exploratory encounter with individuals, using state-of-the-art survey research methodology. Such research, I also felt, should focus on a broader spectrum of behavioral pathologies than suicide.

My initial investigations were a diverse effort. In 1950, for example, I was able to interview a sample of 401 bus riders in Springfield, Massachusetts. Four years later, the Midtown Manhattan Mental Health Study provided a much larger population reach. These and other field projects gave me scope to expand and refine my measurements of that quality in individuals which reflected the macro-social quality Durkheim had called anomie.

While I began by using Durkheim's term in my own work, I soon decided that it was necessary to limit the use of that concept to its macro-social meaning and to sharply segregate it from its individual manifestations. For the latter purpose, the cognate but hitherto obsolete Greek term, anomia, readily suggested itself.

I first published the anomia construct in a 1956 article in the American Sociological Review, describing ways of operationalizing it, and presenting the results of its initial field application research. By 1982, the Science Citation Index and Social Science Citation Index had listed some 400 publications in political science, psychology, social work, and sociology journals here and abroad that had cited use of that article's instruments or findings, warranting the American Institute for Scientific Information to designate it a "citation classic."

This abbreviated history of anomie and anomia as social scientific concepts illustrates several points.
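As a rough illustration of how the five agree/disagree items listed earlier might be turned into a number, the following Python sketch counts the statements a respondent agrees with. The scoring rule (one point per agreement, giving a score from 0 to 5) is an assumption made for this sketch, not something stated in the passage, and the function name is hypothetical.

```python
SROLE_ITEM_COUNT = 5  # the scale consists of five statements


def anomia_score(responses):
    """Count agreements across the five items.

    responses: a sequence of five booleans, True meaning "agree".
    Assumed scoring rule: one point per agreement, so scores run 0 to 5.
    """
    if len(responses) != SROLE_ITEM_COUNT:
        raise ValueError("expected exactly five responses")
    return sum(1 for answer in responses if answer)


# A hypothetical respondent who agrees with items 1, 3, and 4.
print(anomia_score([True, False, True, True, False]))
```

Under this rule, a higher score would be read as more anomia, and the scores of two separately studied groups could be averaged and compared.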

Chapter 5: Conceptualization, Operationalization, and Measurement

First, it's a good example of the process through which general concepts become operationalized measurements. This is not to say that the issue of how to operationalize anomie/anomia has been resolved once and for all. Scholars will surely continue to reconceptualize and reoperationalize these concepts for years to come, continually seeking more-useful measures.

The Srole scale illustrates another important point. Letting conceptualization and operationalization be open-ended does not necessarily produce anarchy and chaos, as you might expect. Order often emerges. For one thing, although we could define anomia any way we chose (in terms of, say, shoe size), we're likely to define it in ways not too different from other people's mental images. If you were to use a really offbeat definition, people would probably ignore you.

A second source of order is that, as researchers discover the utility of a particular conceptualization and operationalization of a concept, they're likely to adopt it, which leads to standardized definitions of concepts. Besides the Srole scale, examples include IQ tests and a host of demographic and economic measures developed by the U.S. Census Bureau. Using such established measures has two advantages: They have been extensively pretested and debugged, and studies using the same scales can be compared. If you and I do separate studies of two different groups and use the Srole scale, we can compare our two groups on the basis of anomia.

Social scientists, then, can measure anything that's real; through conceptualization and operationalization, they can even do a pretty good job of measuring things that aren't. Granting that such concepts as socioeconomic status, prejudice, compassion, and anomia aren't ultimately real, social scientists can create order in handling them. It is an order based on utility, however, not on ultimate truth.

Definitions in Descriptive and Explanatory Studies

As you'll recall from Chapter 4, two general purposes of research are description and explanation. The distinction between them has important implications for definition and measurement. If it seems that description is simpler than explanation, you may be surprised to learn that definitions are more problematic for descriptive research than for explanatory research. Before we turn to other aspects of measurement, you'll need a basic understanding of why this is so (we'll discuss this point more fully in Part 4).

It's easy to see the importance of clear and precise definitions for descriptive research. If we want to describe and report the unemployment rate in a city, our definition of being unemployed is obviously critical. That definition will depend on our definition of another term: the labor force. If it seems patently absurd to regard a three-year-old child as being unemployed, it is because such a child is not considered a member of the labor force. Thus, we might follow the U.S. Census Bureau's convention and exclude all people under 14 years of age from the labor force.

This convention alone, however, would not give us a satisfactory definition, because it would count as unemployed such people as high school students, the retired, the disabled, and homemakers. We might follow the census convention further by defining the labor force as "all persons 14 years of age and over who are employed, looking for work, or waiting to be called back to a job from which they have been laid off or furloughed." If a student, homemaker, or retired person is not looking for work, such a person would not be included in the labor force. Unemployed people, then, would be those members of the labor force, as defined, who are not employed.

But what does "looking for work" mean? Must a person register with the state employment service or go from door to door asking for employment? Or would it be sufficient to want a job or be open to an offer of employment? Conventionally, "looking for work" is defined operationally as saying yes in response to an interviewer's asking "Have you been looking for a job during the past seven days?" (Seven days is the period most often specified, but for some research purposes it might make more sense to shorten or lengthen it.)

As you can see, the conclusion of a descriptive study about the unemployment rate depends directly on how each issue of definition is resolved.
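To see how much rides on these definitions, here is a minimal Python sketch. It is not the Census Bureau's actual procedure: the Person record, its field names, the eight sample people, and the search_window_weeks parameter are all hypothetical, introduced only for illustration. The sketch applies the labor-force definition quoted above (14 or older, and employed, looking for work, or waiting to be called back) and shows the reported rate moving when nothing changes except the look-back period used to define "looking for work."

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical survey record; field names are assumptions for this sketch.
    age: int
    employed: bool
    weeks_since_last_search: Optional[float] = None  # None: has not looked for work
    awaiting_recall: bool = False  # laid off or furloughed, waiting to be called back


def unemployment_rate(people, min_age=14, search_window_weeks=1.0):
    """Unemployed members of the labor force divided by the whole labor force."""
    labor_force = 0
    unemployed = 0
    for p in people:
        if min_age > p.age:
            continue  # too young to count as part of the labor force
        looking = (
            p.weeks_since_last_search is not None
            and search_window_weeks >= p.weeks_since_last_search
        )
        if p.employed:
            labor_force += 1  # employed takes precedence over looking for work
        elif looking or p.awaiting_recall:
            labor_force += 1
            unemployed += 1
        # everyone else (students, retirees, homemakers not looking) is left out
    return unemployed / labor_force if labor_force else 0.0


sample = [
    Person(age=3, employed=False),                               # child
    Person(age=40, employed=True),
    Person(age=35, employed=True),
    Person(age=50, employed=True),
    Person(age=28, employed=False, weeks_since_last_search=0.5),
    Person(age=45, employed=False, weeks_since_last_search=3.0),
    Person(age=67, employed=False),                              # retired, not looking
    Person(age=17, employed=False),                              # student, not looking
]

print(f"{unemployment_rate(sample, search_window_weeks=1):.0%}")  # 25%
print(f"{unemployment_rate(sample, search_window_weeks=4):.0%}")  # 40%
```

With these made-up people, a one-week window yields 25 percent and a four-week window yields 40 percent. The people did not change; only the operational definition did.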

Increasing the period during which people are counted as looking for work would add more unemployed people to the labor force as defined, thereby increasing the reported unemployment rate. If we follow another convention and speak of the civilian labor force and the civilian unemployment rate, we're excluding military personnel; that, too, increases the reported unemployment rate, because military personnel would be employed, by definition. Thus, the descriptive statement that the unemployment rate in a city is 3 percent, or 9 percent, or whatever it might be, depends directly on the operational definitions used.

This example is relatively clear because there are several accepted conventions relating to the labor force and unemployment. Now, consider how difficult it would be to get agreement about the definitions you would need in order to say, "Forty-five percent of the students at this institution are politically conservative." Like the unemployment rate, this percentage would depend directly on the definition of what is being measured (in this case, political conservatism). A different definition might result in the conclusion "Five percent of the student body are politically conservative."

Ironically, definitions are less problematic in the case of explanatory research. Let's suppose we're interested in explaining political conservatism. Why are some people conservative and others not? More specifically, let's suppose we're interested in whether conservatism increases with age. What if you and I have 25 different operational definitions of conservative, and we can't agree on which definition is best? As we saw in the discussion of indicators, this is not necessarily an insurmountable obstacle to our research. Suppose we found old people to be more conservative than young people in terms of all 25 definitions. Clearly, the exact definition wouldn't matter much. We would conclude that old people are generally more conservative than young people, even though we couldn't agree about exactly what conservative means.

In practice, explanatory research seldom results in findings quite as unambiguous as this example suggests; nonetheless, the general pattern is quite common in actual research. There are consistent patterns of relationships in human social life that result in consistent research findings. However, such consistency does not appear in a descriptive situation. Changing definitions almost inevitably results in different descriptive conclusions. "The Importance of Variable Names" explores this issue in connection with the variable citizen participation.

Operationalization Choices

In discussing conceptualization, I frequently have referred to operationalization, for the two are intimately linked. To recap: Conceptualization is the refinement and specification of abstract concepts, and operationalization is the development of specific research procedures (operations) that will result in empirical observations representing those concepts in the real world.

As with the methods of data collection, social researchers have a variety of choices when operationalizing a concept. Although the several choices are intimately interconnected, I've separated them for the sake of discussion. Realize, though, that operationalization does not proceed through a systematic checklist.

Range of Variation

In operationalizing any concept, researchers must be clear about the range of variation that interests them. The question is, to what extent are they willing to combine attributes in fairly gross categories?

Let's suppose you want to measure people's incomes in a study by collecting the information from either records or interviews. The highest annual incomes people receive run into the millions of dollars, but not many people get that much. Unless you're studying the very rich, it probably won't add much to your study to keep track of extremely high categories. Depending on whom you study, you'll probably want to establish a highest income category with a much lower floor, maybe $100,000 or more. Although this decision will lead you to throw together people who earn a trillion dollars a year with paupers earning a mere $100,000, they'll survive it, and that mixing probably won't hurt your research any, either. The same decision faces you at

the other end of the income spectrum. In studies of the general U.S. population, a bottom category of $5,000 or less usually works fine.

The Importance of Variable Names
by Patricia Fisher, Graduate School of Planning, University of Tennessee

Operationalization is one of those things that's easier said than done. It is quite simple to explain to someone the purpose and importance of operational definitions for variables, and even to describe how operationalization typically takes place. However, until you've tried to operationalize a rather complex variable, you may not appreciate some of the subtle difficulties involved. Of considerable importance to the operationalization effort is the particular name that you have chosen for a variable. Let's consider an example from the field of Urban Planning.

A variable of interest to planners is citizen participation. Planners are convinced that participation in the planning process by citizens is important to the success of plan implementation. Citizen participation is an aid to planners' understanding of the real and perceived needs of a community, and such involvement by citizens tends to enhance their cooperation with and support for planning efforts. Although many different conceptual definitions might be offered by different planners, there would be little misunderstanding over what is meant by citizen participation. The name of the variable seems adequate.

However, if we ask different planners to provide very simple operational measures for citizen participation, we are likely to find a variety among their responses that does generate confusion. One planner might keep a tally of attendance by private citizens at city commission and other local government meetings; another might maintain a record of the different topics addressed by private citizens at similar meetings; while a third might record the number of local government meeting attendees, letters, and phone calls received by the mayor and other public officials, and meetings held by special interest groups during a particular time period. As skilled researchers, we can readily see that each planner would be measuring (in a very simplistic fashion) a different dimension of citizen participation: extent of citizen participation, issues prompting citizen participation, and form of citizen participation. Therefore, the original naming of our variable, citizen participation, which was quite satisfactory from a conceptual point of view, proved inadequate for purposes of operationalization.

The precise and exact naming of variables is important in research. It is both essential to and a result of good operationalization. Variable names quite often evolve from an iterative process of forming a conceptual definition, then an operational definition, then renaming the concept to better match what can or will be measured. This looping process continues (our example illustrates only one iteration), resulting in a gradual refinement of the variable name and its measurement until a reasonable fit is obtained. Sometimes the concept of the variable that you end up with is a bit different from the original one that you started with, but at least you are measuring what you are talking about, if only because you are talking about what you are measuring!

In studies of attitudes and orientations, the question of range of variation has another dimension. Unless you're careful, you may end up measuring only half an attitude without really meaning to. Here's an example of what I mean.

Suppose you're interested in people's attitudes toward expanding the use of nuclear power generators. You'd anticipate that some people consider nuclear power the greatest thing since the wheel, whereas other people have absolutely no interest in it. Given that anticipation, it would seem to make sense to ask people how much they favor expanding the use of nuclear energy and to give them answer categories ranging from "Favor it very much" to "Don't favor it at all."

This operationalization, however, conceals half the attitudinal spectrum regarding nuclear energy. Many people have feelings that go beyond simply not favoring it: They are, with greater or lesser degrees of intensity, actively opposed to it. In this instance, there is considerable variation on the left side of zero. Some oppose it a little, some quite a bit, and others a great deal. To measure the full range of variation, then, you'd want to operationalize attitudes toward nuclear energy with a range from favoring it very much, through no feelings one way or the other, to opposing it very much.

This consideration applies to many of the variables social scientists study. Virtually any public

issue involves both support and opposition, each in varying degrees. Political orientations range from very liberal to very conservative, and depending on the people you're studying, you may want to allow for radicals on one or both ends. Similarly, people are not just more or less religious; some are positively antireligious.

The point is not that you must measure the full range of variation in every case. You should, however, consider whether you need to, given your particular research purpose. If the difference between not religious and antireligious isn't relevant to your research, forget it. Someone has defined pragmatism as "any difference that makes no difference is no difference." Be pragmatic.

Finally, decisions on the range of variation should be governed by the expected distribution of attributes among the subjects of the study. In a study of college professors' attitudes toward the value of higher education, you could probably stop at no value and not worry about those who might consider higher education dangerous to students' health. (If you were studying students, however ...)

Variations between the Extremes

Degree of precision is a second consideration in operationalizing variables. What it boils down to is how fine you will make distinctions among the various possible attributes composing a given variable. Does it matter for your purposes whether a person is 17 or 18 years old, or could you conduct your inquiry by throwing them together in a group labeled 10 to 19 years old? Don't answer too quickly. If you wanted to study rates of voter registration and participation, you'd definitely want to know whether the people you studied were old enough to vote. In general, if you're going to measure age, you must look at the purpose and procedures of your study and decide whether fine or gross differences in age are important to you. In a survey, you'll need to make these decisions in order to design an appropriate questionnaire. In the case of in-depth interviews, these decisions will condition the extent to which you probe for details.

The same thing applies to other variables. If you measure political affiliation, will it matter to your inquiry whether a person is a conservative Democrat rather than a liberal Democrat, or will it be sufficient to know the party? In measuring religious affiliation, is it enough to know that a person is Protestant, or do you need to know the denomination? Do you simply need to know whether or not a person is married, or will it make a difference to know if he or she has never married or is separated, widowed, or divorced?

There is, of course, no general answer to such questions. The answers come out of the purpose of a given study, or why we are making a particular measurement. I can give you a useful guideline, though. Whenever you're not sure how much detail to pursue in a measurement, get too much rather than too little. When a subject in an in-depth interview volunteers that she is 37 years old, record "37" in your notes, not "in her thirties." When you're analyzing the data, you can always combine precise attributes into more general categories, but you can never separate any variations you lumped together during observation and measurement.

A Note on Dimensions

We've already discussed dimensions as a characteristic of concepts. When researchers get down to the business of creating operational measures of variables, they often discover (or worse, never notice) that they're not exactly clear about which dimensions of a variable they're really interested in. Here's an example.

Let's suppose you're studying people's attitudes toward government, and you want to include an examination of how people feel about corruption. Here are just a few of the dimensions you might examine:

• Do people think there is corruption in government?
• How much corruption do they think there is?
• How certain are they in their judgment of how much corruption there is?
• How do they feel about corruption in government as a problem in society?
• What do they think causes it?

• Do they think it's inevitable?
• What do they feel should be done about it?
• What are they willing to do personally to eliminate corruption in government?
• How certain are they that they would be willing to do what they say they would do?

The list could go on and on: how people feel about corruption in government has many dimensions. It's essential to be clear about which ones are important in our inquiry; otherwise, you may measure how people feel about corruption when you really wanted to know how much they think there is, or vice versa.

Once you have determined how you're going to collect your data (for example, survey, field research) and have decided on the relevant range of variation, the degree of precision needed between the extremes of variation, and the specific dimensions of the variables that interest you, you may have another choice: a mathematical-logical one. That is, you may need to decide what level of measurement to use. To discuss this point, we need to take another look at attributes and their relationship to variables.

Defining Variables and Attributes

An attribute, you'll recall, is a characteristic or quality of something. Female is an example. So is old or student. Variables, on the other hand, are logical sets of attributes. Thus, gender is a variable composed of the attributes female and male.

The conceptualization and operationalization processes can be seen as the specification of variables and the attributes composing them. Thus, in the context of a study of unemployment, employment status is a variable having the attributes employed and unemployed; the list of attributes could also be expanded to include the other possibilities discussed earlier, such as homemaker.

Every variable must have two important qualities. First, the attributes composing it should be exhaustive. For the variable to have any utility in research, we must be able to classify every observation in terms of one of the attributes composing the variable. We'll run into trouble if we conceptualize the variable political party affiliation in terms of the attributes Republican and Democrat, because some of the people we set out to study will identify with the Green Party, the Reform Party, or some other organization, and some (often a large percentage) will tell us they have no party affiliation. We could make the list of attributes exhaustive by adding other and no affiliation. Whatever we do, we must be able to classify every observation.

At the same time, attributes composing a variable must be mutually exclusive. Every observation must be able to be classified in terms of one and only one attribute. For example, we need to define employed and unemployed in such a way that nobody can be both at the same time. That means being able to classify the person who is working at a job but is also looking for work. (We might run across a fully employed mud wrestler who is looking for the glamour and excitement of being a social researcher.) In this case, we might define the attributes so that employed takes precedence over unemployed, and anyone working at a job is employed regardless of whether he or she is looking for something better.

Levels of Measurement

Attributes operationalized as mutually exclusive and exhaustive may be related in other ways as well. For example, the attributes composing variables may represent different levels of measurement. In this section, we'll examine four levels of measurement: nominal, ordinal, interval, and ratio.

nominal measure: A variable whose attributes have only the characteristics of exhaustiveness and mutual exclusiveness. In other words, a level of measurement describing a variable that has attributes that are merely different, as distinguished from ordinal, interval, or ratio measures. Gender is an example of a nominal measure.

Nominal Measures

Variables whose attributes have only the characteristics of exhaustiveness and mutual exclusiveness are nominal measures. Examples include gender,

religious affiliation, political party affiliation, birthplace, college major, and hair color. Although the attributes composing each of these variables (as male and female compose the variable gender) are distinct from one another (and exhaust the possibilities of gender among people), they have no additional structures. Nominal measures merely offer names or labels for characteristics.

Imagine a group of people characterized in terms of one such variable and physically grouped by the applicable attributes. For example, say we've asked a large gathering of people to stand together in groups according to the states in which they were born: all those born in Vermont in one group, those born in California in another, and so forth. The variable is place of birth; the attributes are born in California, born in Vermont, and so on. All the people standing in a given group have at least one thing in common and differ from the people in all other groups in that same regard. Where the individual groups form, how close they are to one another, or how the groups are arranged in the room is irrelevant. All that matters is that all the members of a given group share the same state of birth and that each group has a different shared state of birth. All we can say about two people in terms of a nominal variable is that they are either the same or different.

Ordinal Measures

Variables with attributes we can logically rank-order are ordinal measures. The different attributes of ordinal variables represent relatively more or less of the variable. Variables of this type are social class, conservatism, alienation, prejudice, intellectual sophistication, and the like. In addition to saying whether two people are the same or different in terms of an ordinal variable, you can also say one is "more" than the other, that is, more conservative, more religious, older, and so forth.

ordinal measure: A level of measurement describing a variable with attributes we can rank-order along some dimension. An example is socioeconomic status as composed of the attributes high, medium, low.

In the physical sciences, hardness is the most frequently cited example of an ordinal measure. We may say that one material (for example, diamond) is harder than another (say, glass) if the former can scratch the latter and not vice versa. By attempting to scratch various materials with other materials, we might eventually be able to arrange several materials in a row, ranging from the softest to the hardest. We could never say how hard a given material was in absolute terms; we could only say how hard in relative terms: which materials it is harder than and which softer than.

Let's pursue the earlier example of grouping the people at a social gathering. This time imagine that we ask all the people who have graduated from college to stand in one group, all those with only a high school diploma to stand in another group, and all those who have not graduated from high school to stand in a third group. This manner of grouping people satisfies the requirements for exhaustiveness and mutual exclusiveness discussed earlier. In addition, however, we might logically arrange the three groups in terms of the relative amount of formal education (the shared attribute) each had. We might arrange the three groups in a row, ranging from most to least formal education. This arrangement would provide a physical representation of an ordinal measure. If we knew which groups two individuals were in, we could determine that one had more, less, or the same formal education as the other.

Notice in this example that it is irrelevant how close or far apart the educational groups are from one another. The college and high school groups might be 5 feet apart, and the less-than-high-school group 500 feet farther down the line. These actual distances don't have any meaning. The high school group, however, should be between the less-than-high-school group and the college group, or else the rank order will be incorrect.

Interval Measures

For the attributes composing some variables, the actual distance separating those attributes does have meaning. Such variables are interval measures. For these, the logical distance between attributes can be expressed in meaningful standard intervals.

For example, in the Fahrenheit temperature scale, the difference, or distance, between 80 degrees and 90 degrees is the same as that between 40 degrees and 50 degrees. However, 80 degrees Fahrenheit is not twice as hot as 40 degrees, because the zero point in the Fahrenheit scale is arbitrary; zero degrees does not really mean lack of heat. Similarly, minus 30 degrees on this scale doesn't represent 30 degrees less than no heat. (This is true for the Celsius scale as well. In contrast, the Kelvin scale is based on an absolute zero, which does mean a complete lack of heat.)

interval measure: A level of measurement describing a variable whose attributes are rank-ordered and have equal distances between adjacent attributes. The Fahrenheit temperature scale is an example of this, because the distance between 17 and 18 is the same as that between 89 and 90.

About the only interval measures commonly used in social scientific research are constructed measures such as standardized intelligence tests that have been more or less accepted. The interval separating IQ scores of 100 and 110 may be regarded as the same as the interval separating scores of 110 and 120 by virtue of the distribution of observed scores obtained by many thousands of people who have taken the tests over the years. But it would be incorrect to infer that someone with an IQ of 150 is 50 percent more intelligent than someone with an IQ of 100. (A person who received a score of 0 on a standard IQ test could not be regarded, strictly speaking, as having no intelligence, although we might feel he or she was unsuited to be a college professor or even a college student. But perhaps a dean ... ?)

When comparing two people in terms of an interval variable, we can say they are different from one another (nominal), and that one is more than another (ordinal). In addition, we can say "how much" more.

Ratio Measures

Most of the social scientific variables meeting the minimum requirements for interval measures also meet the requirements for ratio measures. In ratio measures, the attributes composing a variable, besides having all the structural characteristics mentioned previously, are based on a true zero point. The Kelvin temperature scale is one such measure. Examples from social scientific research include age, length of residence in a given place, number of organizations belonged to, number of times attending religious services during a particular period of time, number of times married, and number of Arab friends.

ratio measure: A level of measurement describing a variable with attributes that have all the qualities of nominal, ordinal, and interval measures and in addition are based on a "true zero" point. Age is an example of a ratio measure.

Returning to the illustration of methodological party games, we might ask a gathering of people to group themselves by age. All the one-year-olds would stand (or sit or lie) together, the two-year-olds together, the three-year-olds, and so forth. The fact that members of a single group share the same age and that each different group has a different shared age satisfies the minimum requirements for a nominal measure. Arranging the several groups in a line from youngest to oldest meets the additional requirements of an ordinal measure and lets us determine if one person is older than, younger than, or the same age as another. If we space the groups equally far apart, we satisfy the additional requirements of an interval measure and can say how much older one person is than another. Finally, because one of the attributes included in age represents a true zero (babies carried by women about to give birth), the phalanx of hapless party goers also meets the requirements of a ratio measure, permitting us to say that one person is twice as old as another. (Remember this in case you're asked about it in a workbook assignment.) Another example of a ratio measure is income, which extends from an absolute zero to approximately infinity, if you happen to be the founder of Microsoft.

Comparing two people in terms of a ratio variable, then, allows us to conclude (1) whether they

are different (or the same), (2) whether one is more than the other, (3) how much they differ, and (4) what the ratio of one to another is. Figure 5-1 summarizes this discussion by presenting a graphic illustration of the four levels of measurement.

[FIGURE 5-1. Levels of Measurement. Often you can choose among different levels of measurement (nominal, ordinal, interval, or ratio), carrying progressively more information. Attribute labels recoverable from the figure: Female, Male; Not very important, Fairly important, Very important, Most important thing in my life; $0, $10,000, $20,000, $30,000, $40,000, $50,000.]

Implications of Levels of Measurement

Because it's unlikely that you'll undertake the physical grouping of people just described (try it once, and you won't be invited to many parties), I should draw your attention to some of the practical implications of the differences that have been distinguished. These implications appear primarily in the analysis of data (discussed in Part 4), but you need to anticipate such implications when you're structuring any research project.

Certain quantitative analysis techniques require variables that meet certain minimum levels of

measurement. To the extent that the variables to be examined in a research project are limited to a particular level of measurement (say, ordinal), you should plan your analytical techniques accordingly. More precisely, you should anticipate drawing research conclusions appropriate to the levels of measurement used in your variables. For example, you might reasonably plan to determine and report the mean age of a population under study (add up all the individual ages and divide by the number of people), but you should not plan to report the mean religious affiliation, because that is a nominal variable, and the mean requires at least interval-level data. (You could report the modal, that is, the most common, religious affiliation.)

At the same time, you can treat some variables as representing different levels of measurement. Ratio measures are the highest level, descending through interval and ordinal to nominal, the lowest level of measurement. A variable representing a higher level of measurement (say, ratio) can also be treated as representing a lower level of measurement (say, ordinal). Recall, for example, that age is a ratio measure. If you wished to examine only the relationship between age and some ordinal-level variable (say, self-perceived religiosity: high, medium, and low), you might choose to treat age as an ordinal-level variable as well. You might characterize the subjects of your study as being young, middle-aged, and old, specifying what age range composed each of these groupings. Finally, age might be used as a nominal-level variable for certain research purposes. People might be grouped as being born during the Depression or not. Another nominal measurement, based on birth date rather than just age, would be the grouping of people by astrological signs.

The level of measurement you'll seek, then, is determined by the analytical uses you've planned for a given variable, keeping in mind that some variables are inherently limited to a certain level. If a variable is to be used in a variety of ways, requiring different levels of measurement, the study should be designed to achieve the highest level required. For example, if the subjects in a study are asked their exact ages, they can later be organized into ordinal or nominal groupings.

Again, you need not necessarily measure variables at their highest level of measurement. If you're sure to have no need for ages of people at higher than the ordinal level of measurement, you may simply ask people to indicate their age range, such as 20 to 29, 30 to 39, and so forth. In a study of the wealth of corporations, rather than seek more precise information, you may use Dun & Bradstreet ratings to rank corporations. Whenever your research purposes are not altogether clear, however, seek the highest level of measurement possible. As we've discussed, although ratio measures can later be reduced to ordinal ones, you cannot convert an ordinal measure to a ratio one. More generally, you cannot convert a lower-level measure to a higher-level one. That is a one-way street worth remembering.

Typically a research project will tap variables at different levels of measurement. For example, William Bielby and Denise Bielby (1999) set out to examine the world of film and television, using a nomothetic, longitudinal approach (take a moment to remind yourself what that means). In what they referred to as the "culture industry," the authors found that reputation (an ordinal variable) is the best predictor of screenwriters' future productivity. More interestingly, they found that screenwriters who were represented by "core" (or elite) agencies were not only far more likely to find jobs (a nominal variable), but also jobs that paid more (a ratio variable). In other words, the researchers found that an agency's reputation (ordinal) was a key independent variable for predicting a screenwriter's career success. The researchers also found that being older (ratio), female (nominal), an ethnic minority (nominal), and having more years of experience (ratio) were disadvantageous for a writer's career. On the other hand, higher earnings from previous years (measured in ordinal categories) led to more success in the future. In Bielby and Bielby's terms, "success breeds success" (1999: 80).

Single or Multiple Indicators

With so many alternatives for operationalizing social scientific variables, you may find yourself worrying about making the right choices. To

counter this feeling, let me add a momentary dash of certainty and stability.

Many social research variables have fairly obvious, straightforward measures. No matter how you cut it, gender usually turns out to be a matter of male or female: a nominal-level variable that can be measured by a single observation - either by looking (well, not always) or by asking a question (usually). In a study involving the size of families, you'll want to think about adopted and foster children, as well as blended families, but it's usually pretty easy to find out how many children a family has. For most research purposes, the resident population of a country is the resident population of that country - you can look it up in an almanac and know the answer. A great many variables, then, have obvious single indicators. If you can get one piece of information, you have what you need.

Sometimes, however, there is no single indicator that will give you the measure of a variable you really want. As discussed earlier in this chapter, many concepts are subject to varying interpretations - each with several possible indicators. In these cases, you'll want to make several observations for a given variable. You can then combine the several pieces of information you've collected, creating a composite measurement of the variable in question. Chapter 6 is devoted to ways of doing that, so here let's just discuss one simple illustration.

Consider the concept "college performance." All of us have noticed that some students perform well in college courses and others don't. In studying these differences, we might ask what characteristics and experiences are related to high levels of performance (many researchers have done just that). How should we measure overall performance? Each grade in any single course is a potential indicator of college performance, but it also may not typify the student's general performance. The solution to this problem is so firmly established that it is, of course, obvious: the grade point average (GPA). We assign numerical scores to each letter grade, total the points earned by a given student, and divide by the number of courses taken, thus obtaining a composite measure. (If the courses vary in number of credits, we adjust the point values accordingly.) Creating such composite measures in social research is often appropriate.

Some Illustrations of Operationalization Choices

To bring together all the operationalization choices available to the social researcher and to show the potential in those possibilities, let's look at some of the distinct ways you might address various research problems. The alternative ways of operationalizing the variables in each case should demonstrate the opportunities that social research can present to our ingenuity and imaginations. To simplify matters, I have not attempted to describe all the research conditions that would make one alternative superior to the others, though in a given situation they would not all be equally appropriate.

Here are specific research questions, then, and some of the ways you could address them. We'll begin with an example discussed earlier in the chapter. It has the added advantage that one of the variables is straightforward to operationalize.

1. Are women more compassionate than men?

a. Select a group of subjects for study, with equal numbers of men and women. Present them with hypothetical situations that involve someone's being in trouble. Ask them what they would do if they were confronted with that situation. What would they do, for example, if they came across a small child who was lost and crying for his or her parents? Consider any answer that involves helping or comforting the child as an indicator of compassion. See whether men or women are more likely to indicate they would be compassionate.

b. Set up an experiment in which you pay a small child to pretend that he or she is lost. Put the child to work on a busy sidewalk and observe whether men or women are more likely to offer assistance. Also be sure to count the total number of men and

women who walk by, because there may be more of one than the other. If that's the case, simply calculate the percentage of men and the percentage of women who help.

c. Select a sample of people and do a survey in which you ask them what organizations they belong to. Calculate whether women or men are more likely to belong to those that seem to reflect compassionate feelings. To take account of men who belong to more organizations than do women in general - or vice versa - do this: For each person you study, calculate the percentage of his or her organizational memberships that reflect compassion. See if men or women have a higher average percentage.

2. Are sociology students or accounting students better informed about world affairs?

a. Prepare a short quiz on world affairs and arrange to administer it to the students in a sociology class and in an accounting class at a comparable level. If you want to compare sociology and accounting majors, be sure to ask students what they are majoring in.

b. Get the instructor of a course in world affairs to give you the average grades of sociology and accounting students in the course.

c. Take a petition to sociology and accounting classes that urges that "the United Nations headquarters be moved to New York City." Keep a count of how many in each class sign the petition and how many inform you that the UN headquarters is already located in New York City.

3. Do people consider New York or California the better place to live?

a. Consulting the Statistical Abstract of the United States or a similar publication, check the migration rates into and out of each state. See if you can find the numbers moving directly from New York to California and vice versa.

b. The national polling companies - Gallup, Harris, Roper, and so forth - often ask people what they consider the best state to live in. Look up some recent results in the library or through your local newspaper.

c. Compare suicide rates in the two states.

4. Who are the most popular instructors on your campus, those in the social sciences, the natural sciences, or the humanities?

a. If your school has a provision for student evaluation of instructors, review some recent results and compute the average rating of each of the three groups.

b. Begin visiting the introductory courses given in each group of disciplines and measure the attendance rate of each class.

c. In December, select a group of faculty in each of the three divisions and ask them to keep a record of the numbers of holiday greeting cards and presents they receive from admiring students. See who wins.

The point of these examples is not necessarily to suggest respectable research projects but to illustrate the many ways variables can be operationalized.

Operationalization Goes On and On

Although I've discussed conceptualization and operationalization as activities that precede data collection and analysis - for example, you must design questionnaire items before you send out a questionnaire - these two processes continue throughout any research project, even if the data have been collected in a structured mass survey. As we've seen, in less-structured methods such as field research, the identification and specification of relevant concepts is inseparable from the ongoing process of observation.

As a researcher, always be open to reexamining your concepts and definitions. The ultimate purpose of social research is to clarify the nature of social life. The validity and utility of what you learn in this regard doesn't depend on when you first figured out how to look at things any more than it matters whether you got the idea from a learned textbook, a dream, or your brother-in-law.
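Looking back at the first illustration (are women more compassionate than men?), the two percentage comparisons in options b and c can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names `helping_rates` and `average_membership_share` are hypothetical, every tally is invented, and skipping people who belong to no organizations is one possible choice among several.

```python
from collections import Counter


def helping_rates(observations):
    """Return the percentage of passersby in each group who helped.

    observations: iterable of (group, helped) pairs, one pair per passerby.
    """
    passed = Counter()   # everyone who walked by, per group
    helped = Counter()   # those who stopped to help, per group
    for group, did_help in observations:
        passed[group] += 1
        if did_help:
            helped[group] += 1
    return {group: 100.0 * helped[group] / passed[group] for group in passed}


def average_membership_share(people):
    """Average, per group, of each person's percentage of memberships
    coded as reflecting compassion.

    people: iterable of (group, compassionate_memberships, total_memberships).
    """
    shares = {}
    for group, compassionate, total in people:
        if total == 0:
            continue  # assumption: skip people who belong to no organizations
        shares.setdefault(group, []).append(100.0 * compassionate / total)
    return {group: sum(values) / len(values) for group, values in shares.items()}


# Invented tallies for option b: 40 men pass and 10 help; 20 women pass and 8 help.
observations = (
    [("men", True)] * 10 + [("men", False)] * 30
    + [("women", True)] * 8 + [("women", False)] * 12
)
print(helping_rates(observations))  # {'men': 25.0, 'women': 40.0}

# Invented survey rows for option c: (group, compassionate, total memberships).
people = [("men", 1, 4), ("men", 2, 4), ("women", 1, 2), ("women", 3, 4)]
print(average_membership_share(people))  # {'men': 37.5, 'women': 62.5}
```

In the invented data more men than women help in absolute terms (10 versus 8), yet the percentage is higher for women, which is why the text asks you to count everyone who walks by.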

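The grade point average described earlier under Single or Multiple Indicators is perhaps the simplest composite measure to write down. The sketch below assumes a conventional 4-point scale and a hypothetical function name `gpa`; neither comes from the text, and the transcript is invented. Weighting each grade by its credits is one way to read the remark that point values are adjusted when courses vary in number of credits.

```python
# Assumed 4-point scale; actual point values differ between institutions.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def gpa(courses):
    """Composite measure of college performance.

    courses: list of (letter_grade, credits) pairs.
    Each grade is weighted by its credits; with equal credits this
    reduces to total points divided by the number of courses.
    """
    total_credits = sum(credits for _, credits in courses)
    if total_credits == 0:
        raise ValueError("at least one course with credits is required")
    total_points = sum(GRADE_POINTS[grade] * credits for grade, credits in courses)
    return total_points / total_credits


# Invented transcript: (4*3 + 3*3 + 2*4) / 10 credits = 29 / 10
transcript = [("A", 3), ("B", 3), ("C", 4)]
print(round(gpa(transcript), 2))  # 2.9
```

No single grade in the transcript equals the composite, which is the point of combining several indicators.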
Criteria of Measurement Quality

This chapter has come some distance. It began with the bald assertion that social scientists can measure anything that exists. Then we discovered that most of the things we might want to measure and study don't really exist. Next we learned that it's possible to measure them anyway. Now we conclude the chapter with a discussion of some of the yardsticks against which we judge our relative success or failure in measuring things - even things that don't exist.

Precision and Accuracy

To begin, measurements can be made with varying degrees of precision. As we saw in the discussion of operationalization, precision concerns the fineness of distinctions made between the attributes that compose a variable. The description of a woman as "43 years old" is more precise than "in her forties." Saying a street-corner gang was formed "in the summer of 1996" is more precise than saying "during the 1990s."

As a general rule, precise measurements are superior to imprecise ones, as common sense dictates. There are no conditions under which imprecise measurements are intrinsically superior to precise ones. Even so, exact precision is not always necessary or desirable. If knowing that a woman is in her forties satisfies your research requirements, then any additional effort invested in learning her precise age is wasted. The operationalization of concepts, then, must be guided partly by an understanding of the degree of precision required. If your needs are not clear, be more precise rather than less.

Don't confuse precision with accuracy, however. Describing someone as "born in New England" is less precise than "born in Stowe, Vermont" - but suppose the person in question was actually born in Boston. The less-precise description, in this instance, is more accurate, a better reflection of the real world.

Precision and accuracy are obviously important qualities in research measurement, and they probably need no further explanation. When social scientists construct and evaluate measurements, however, they pay special attention to two technical considerations: reliability and validity.

Reliability

In the abstract, reliability is a matter of whether a particular technique, applied repeatedly to the same object, yields the same result each time. Let's say you want to know how much I weigh. (No, I don't know why.) As one technique, say you ask two different people to estimate my weight. If the first person estimates 150 pounds and the other estimates 300, we have to conclude the technique of having people estimate my weight isn't very reliable.

Suppose, as an alternative, that you use a bathroom scale as your measurement technique. I step on the scale twice, and you note the same result each time. The scale has presumably reported the same weight for me both times, indicating that the scale provides a more reliable technique for measuring a person's weight than asking people to estimate it does.

Reliability, however, does not ensure accuracy any more than precision does. Suppose I've set my bathroom scale to shave five pounds off my weight just to make me feel better. Although you would (reliably) report the same weight for me each time, you would always be wrong. This new element, called bias, is discussed in Chapter 8. For now, just be warned that reliability does not ensure accuracy.

reliability: That quality of measurement method that suggests that the same data would have been collected each time in repeated observations of the same phenomenon. In the context of a survey, we would expect that the question "Did you attend religious services last week?" would have higher reliability than the question "About how many times have you attended religious services in your life?" This is not to be confused with validity.

Let's suppose we're interested in studying morale among factory workers in two different kinds

of factories. In one set of factories, workers have specialized jobs, reflecting an extreme division of labor. Each worker contributes a tiny part to the overall process performed on a long assembly line. In the other set of factories, each worker performs many tasks, and small teams of workers complete the whole process.

How should we measure morale? Following one strategy, we could observe the workers in each factory, noticing such things as whether they joke with one another, whether they smile and laugh a lot, and so forth. We could ask them how they like their work and even ask them whether they think they would prefer their current arrangement or the other one being studied. By comparing what we observed in the different factories, we might reach a conclusion about which assembly process produces the higher morale. Notice that I've just described a qualitative measurement procedure.

Now let's look at some reliability problems inherent in this method. First, how you and I are feeling when we do the observing will likely color what we see. We may misinterpret what we see. We may see workers kidding each other but think they're having an argument. We may catch them on an off day. If we were to observe the same group of workers several days in a row, we might arrive at different evaluations on each day. Further, even if several observers evaluated the same behavior, they might arrive at different conclusions about the workers' morale.

Here's another strategy for assessing morale, a quantitative approach. Suppose we check the company records to see how many grievances have been filed with the union during some fixed period. Presumably this would be an indicator of morale: the more grievances, the lower the morale. This measurement strategy would appear to be more reliable: Counting up the grievances over and over, we should keep arriving at the same number.

If you find yourself thinking that the number of grievances doesn't necessarily measure morale, you're worrying about validity, not reliability. We'll discuss validity in a moment. The point for now is that the last method is more like my bathroom scale - it gives consistent results.

In social research, reliability problems crop up in many forms. Reliability is a concern every time a single observer is the source of data, because we have no certain guard against the impact of that observer's subjectivity. We can't tell for sure how much of what's reported originated in the situation observed and how much in the observer.

Subjectivity is not only a problem with single observers, however. Survey researchers have known for a long time that different interviewers, because of their own attitudes and demeanors, get different answers from respondents. Or, if we were to conduct a study of newspapers' editorial positions on some public issue, we might create a team of coders to take on the job of reading hundreds of editorials and classifying them in terms of their position on the issue. Unfortunately, different coders will code the same editorial differently. Or we might want to classify a few hundred specific occupations in terms of some standard coding scheme, say a set of categories created by the Department of Labor or by the Census Bureau. You and I would not place all those occupations in the same categories.

Each of these examples illustrates problems of reliability. Similar problems arise whenever we ask people to give us information about themselves. Sometimes we ask questions that people don't know the answers to: How many times have you been to religious services? Sometimes we ask people about things they consider totally irrelevant: Are you satisfied with China's current relationship with Albania? In such cases, people will answer differently at different times because they're making up answers as they go. Sometimes we explore issues so complicated that a person who had a clear opinion in the matter might arrive at a different interpretation of the question when asked a second time.

So how do you create reliable measures? If your research design calls for asking people for information, you can be careful to ask only about things the respondents are likely to know the answer to. Ask about things relevant to them, and be clear in what you're asking. Of course, these techniques don't solve every possible reliability problem. Fortunately, social researchers have developed

several techniques for cross-checking the reliability of the measures they devise.

Test-Retest Method

Sometimes it's appropriate to make the same measurement more than once, a technique called the test-retest method. If you don't expect the sought-after information to change, then you should expect the same response both times. If answers vary, the measurement method may, to the extent of that variation, be unreliable. Here's an illustration.

In their research on Health Hazard Appraisal (HHA), a part of preventive medicine, Jeffrey Sacks, W. Mark Krushat, and Jeffrey Newman (1980) wanted to determine the risks associated with various background and lifestyle factors, making it possible for physicians to counsel their patients appropriately. By knowing patients' life situations, physicians could advise them on their potential for survival and on how to improve it. This purpose, of course, depended heavily on the accuracy of the information gathered about each subject in the study.

To test the reliability of their information, Sacks and his colleagues had all 207 subjects complete a baseline questionnaire that asked about their characteristics and behavior. Three months later, a follow-up questionnaire asked the same subjects for the same information, and the results of the two surveys were compared. Overall, only 15 percent of the subjects reported the same information in both studies.

Sacks and his colleagues report the following:

    Almost 10 percent of subjects reported a different height at follow-up examination. Parental age was changed by over one in three subjects. One parent reportedly aged 20 chronologic years in three months. One in five ex-smokers and ex-drinkers have apparent difficulty in reliably recalling their previous consumption pattern.

    (1980: 730)

Some subjects erased all trace of previously reported heart murmur, diabetes, emphysema, arrest record, and thoughts of suicide. One subject's mother, deceased in the first questionnaire, was apparently alive and well in time for the second. One subject had one ovary missing in the first study but present in the second. In another case, an ovary present in the first study was missing in the second study - and had been for ten years! One subject was reportedly 55 years old in the first study and 50 years old three months later. (You have to wonder whether the physician-counselors could ever have nearly the impact on their patients that their patients' memories did.) Thus, test-retest revealed that this data-collection method was not especially reliable.

Split-Half Method

As a general rule, it's always good to make more than one measurement of any subtle or complex social concept, such as prejudice, alienation, or social class. This procedure lays the groundwork for another check on reliability. Let's say you've created a questionnaire that contains ten items you believe measure prejudice against women. Using the split-half technique, you would randomly assign those ten items to two sets of five. Each set should provide a good measure of prejudice against women, and the two sets should classify respondents the same way. If the two sets of items classify people differently, you most likely have a problem of reliability in your measure of the variable.

Using Established Measures

Another way to help ensure reliability in getting information from people is to use measures that have proved their reliability in previous research. If you want to measure anomia, for example, you might want to follow Srole's lead.

The heavy use of measures, though, does not guarantee their reliability. For example, the Scholastic Assessment Tests (SATs) and the Minnesota Multiphasic Personality Inventory (MMPI) have been accepted as established standards in their respective domains for decades. In recent years, though, they've needed fundamental overhauling to reflect changes in society, eliminating outdated topics and gender bias in wording.
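The test-retest and split-half checks described above both come down to counting how often two measurements agree. The following is a minimal sketch, not the procedure used in any of the studies cited: the names `retest_agreement` and `split_half_agreement` are hypothetical, all answers are invented, and classifying a respondent as "high" when a half-score reaches a cutoff is an assumption made only to give "classify respondents the same way" a concrete form.

```python
import random


def retest_agreement(baseline, followup):
    """Share of subjects who gave identical answers in both waves.

    baseline, followup: dicts mapping subject id -> tuple of answers.
    """
    same = sum(1 for subject in baseline if baseline[subject] == followup[subject])
    return same / len(baseline)


def split_half_agreement(responses, cutoff, seed=0):
    """Share of respondents classified the same way by two random item halves.

    responses: list of per-respondent lists of item scores (1 = agrees
               with the prejudiced statement, 0 = does not).
    cutoff:    assumed rule: a half-score >= cutoff counts as "high".
    """
    items = list(range(len(responses[0])))
    random.Random(seed).shuffle(items)          # random assignment of items
    half = len(items) // 2
    first, second = items[:half], items[half:]
    agree = 0
    for answers in responses:
        high_on_first = sum(answers[i] for i in first) >= cutoff
        high_on_second = sum(answers[i] for i in second) >= cutoff
        agree += high_on_first == high_on_second
    return agree / len(responses)


# Invented two-wave answers: (height in cm, smoking status).
baseline = {"s1": (170, "never"), "s2": (165, "ex-smoker"),
            "s3": (180, "smoker"), "s4": (158, "never")}
followup = dict(baseline, s2=(168, "ex-smoker"))   # s2 reports a new height
print(retest_agreement(baseline, followup))        # 0.75

# Invented ten-item answers: two consistent respondents and one who
# endorses exactly five items, so the two halves must disagree at cutoff 3.
responses = [[1] * 10, [0] * 10, [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]]
print(round(split_half_agreement(responses, cutoff=3), 2))  # 0.67
```

Low agreement in either function signals a possible reliability problem; it says nothing about whether the items measure what they are meant to measure.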

Reliability of Research Workers

As we've seen, it's also possible for measurement unreliability to be generated by research workers: interviewers and coders, for example. There are several ways to check on reliability in such cases. To guard against interviewer unreliability in surveys, for example, a supervisor will call a subsample of the respondents on the telephone and verify selected pieces of information.

Replication works in other situations also. If you're worried that newspaper editorials or occupations may not be classified reliably, you could have each independently coded by several coders. Those cases that are classified inconsistently can then be evaluated more carefully and resolved.

Finally, clarity, specificity, training, and practice can prevent a great deal of unreliability and grief. If you and I spent some time reaching a clear agreement on how to evaluate editorial positions on an issue - discussing various positions and reading through several together - we could probably do a good job of classifying them in the same way independently.

The reliability of measurements is a fundamental issue in social research, and we'll return to it more than once in the chapters ahead. For now, however, let's recall that even total reliability doesn't ensure that our measures actually measure what we think they measure. Now let's plunge into the question of validity.

Validity

In conventional usage, validity refers to the extent to which an empirical measure adequately reflects the real meaning of the concept under consideration. Whoops! I've already committed us to the view that concepts don't have real meanings. How can we ever say whether a particular measure adequately reflects the concept's meaning, then? Ultimately, of course, we can't. At the same time, as we've already seen, all of social life, including social research, operates on agreements about the terms we use and the concepts they represent. There are several criteria of success in making measurements that are appropriate to these agreed-on meanings of concepts.

validity: A term describing a measure that accurately reflects the concept it is intended to measure. For example, your IQ would seem a more valid measure of your intelligence than the number of hours you spend in the library would. Though the ultimate validity of a measure can never be proved, we may agree to its relative validity on the basis of face validity, criterion validity, content validity, construct validity, internal validation, and external validation. This must not be confused with reliability.

First, there's something called face validity. Particular empirical measures may or may not jibe with our common agreements and our individual mental images concerning a particular concept. For example, you and I might quarrel about whether counting the number of grievances filed with the union will adequately measure morale. Still, we'd surely agree that the number of grievances has something to do with morale. That is, the measure is valid "on its face," whether or not it's adequate. If I were to suggest that we measure morale by finding out how many books the workers took out of the library during their off-duty hours, you'd undoubtedly raise a more serious objection: That measure wouldn't have much face validity.

face validity: That quality of an indicator that makes it seem a reasonable measure of some variable. That the frequency of attendance at religious services is some indication of a person's religiosity seems to make sense without a lot of explanation. It has face validity.

Second, I've already pointed to many of the more formally established agreements that define some concepts. The Census Bureau, for example, has created operational definitions of such concepts as family, household, and employment status that seem to have a workable validity in most studies using these concepts.

Three additional types of validity also specify particular ways of testing the validity of measures. The first, criterion-related validity, sometimes called predictive validity, is based on some external criterion. For example, the validity of College Board exams is shown in their ability to predict students'

success in college. The validity of a written driver's test is determined, in this sense, by the relationship between the scores people get on the test and their subsequent driving records. In these examples, college success and driving ability are the criteria.

To test your understanding of criterion-related validity, see whether you can think of behaviors that might be used to validate each of the following attitudes:

Is very religious
Supports equality of men and women
Supports far-right militia groups
Is concerned about the environment

Some possible validators would be, respectively, attends religious services, votes for women candidates, belongs to the NRA, and belongs to the Sierra Club.

Sometimes it's difficult to find behavioral criteria that can be taken to validate measures as directly as in such examples. In those instances, however, we can often approximate such criteria by applying a different test. We can consider how the variable in question ought, theoretically, to relate to other variables. Construct validity is based on the logical relationships among variables.

Suppose, for example, that you want to study the sources and consequences of marital satisfaction. As part of your research, you develop a measure of marital satisfaction, and you want to assess its validity.

In addition to developing your measure, you'll have developed certain theoretical expectations about the way the variable marital satisfaction relates to other variables. For example, you might reasonably conclude that satisfied husbands and wives will be less likely than dissatisfied ones to cheat on their spouses. If your measure relates to marital fidelity in the expected fashion, that constitutes evidence of your measure's construct validity. If satisfied marriage partners are as likely to cheat on their spouses as are the dissatisfied ones, however, that would challenge the validity of your measure.

Tests of construct validity, then, can offer a weight of evidence that your measure either does or doesn't tap the quality you want it to measure, without providing definitive proof. Although I have suggested that tests of construct validity are less compelling than those of criterion validity, there is room for disagreement about which kind of test a particular comparison variable (driving record, marital fidelity) represents in a given situation. It's less important to distinguish the two types of validity tests than to understand the logic of validation that they have in common: If we've succeeded in measuring some variable, then our measures should relate in some logical way to other measures.

Finally, content validity refers to how much a measure covers the range of meanings included within a concept. For example, a test of mathematical ability cannot be limited to addition but also needs to cover subtraction, multiplication, division, and so forth. Or, if we're measuring prejudice, do our measurements reflect all types of prejudice, including prejudice against racial and ethnic groups, religious minorities, women, the elderly, and so on?

Figure 5-2 presents a graphic portrayal of the difference between validity and reliability. If you think of measurement as analogous to repeatedly shooting at the bull's-eye on a target, you'll see that reliability looks like a "tight pattern," regardless of where the shots hit, because reliability is a function of consistency. Validity, on the other hand, is a function of shots being arranged around the bull's-eye. The failure of reliability in the figure is randomly distributed around the target; the failure of validity is systematically off the mark. Notice that neither an unreliable nor an invalid measure is likely to be very useful.

criterion-related validity: The degree to which a measure relates to some external criterion. For example, the validity of College Board tests is shown in their ability to predict the college success of students. Also called predictive validity.

construct validity: The degree to which a measure relates to other variables as expected within a system of theoretical relationships.

content validity: The degree to which a measure covers the range of meanings included within a concept.
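The target analogy can be put into numbers if we pretend, for the sake of illustration, that the true value is known. In the sketch below the systematic miss (the average reading minus the true value) stands in for a failure of validity, and the scatter of repeated readings stands in for a failure of reliability. The function name `describe`, the true weight of 150 pounds, and all readings are invented; only the five-pound shave echoes the bathroom scale example. In real social research the true value is usually unavailable, so this is an aid to intuition and not a test you could actually run.

```python
from statistics import mean, pstdev


def describe(measurements, true_value):
    """Summarize repeated measurements of one object.

    bias:   average reading minus the true value (systematically off the mark)
    spread: standard deviation of the readings (how loose the pattern is)
    """
    return {
        "bias": mean(measurements) - true_value,
        "spread": pstdev(measurements),
    }


true_weight = 150  # invented true value, in pounds

# Scale set to shave five pounds: tight pattern, off the bull's-eye.
reliable_not_valid = [145, 145, 145, 145]
# People guessing: centered on the bull's-eye, but widely scattered.
valid_not_reliable = [120, 180, 135, 165]
# An honest scale: tight pattern on the bull's-eye.
valid_and_reliable = [150, 150, 149, 151]

for label, readings in [
    ("reliable but not valid", reliable_not_valid),
    ("valid but not reliable", valid_not_reliable),
    ("valid and reliable", valid_and_reliable),
]:
    print(label, describe(readings, true_weight))
```

The first set has zero spread and a bias of minus five; the second has zero bias and a large spread; only the third is small on both.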

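Criterion-related validity, as described above for College Board exams and driver's tests, is usually examined by seeing how strongly scores on the measure relate to the external criterion. One common way to express that relationship is a correlation coefficient. The sketch below computes Pearson's r by hand; the function name `pearson_r` and the data are invented, and nothing here reflects actual exam results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between a measure (xs) and a criterion (ys)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never varies")
    return covariation / (spread_x * spread_y)


# Invented data: entrance exam scores and the same students' later college GPAs.
exam_scores = [450, 520, 610, 680, 700]
college_gpa = [2.1, 2.6, 2.9, 3.4, 3.3]

r = pearson_r(exam_scores, college_gpa)
print(round(r, 2))  # close to +1: the exam tracks the criterion in this toy data
```

A coefficient near zero would suggest the measure tells us little about the criterion; the same calculation could serve a construct validity check, with the comparison variable chosen on theoretical grounds.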
FIGURE 5-2 An Analogy to Validity and Reliability. A good measurement technique should be both valid (measuring what it is intended to measure) and reliable (yielding a given measurement dependably). [Three target panels: Reliable but not valid; Valid but not reliable; Valid and reliable.]

Who Decides What's Valid?

Our discussion of validity began with a reminder that we depend on agreements to determine what's real, and we've just seen some of the ways social scientists can agree among themselves that they have made valid measurements. There is yet another way of looking at validity.

Social researchers sometimes criticize themselves and one another for implicitly assuming they are somewhat superior to those they study. For example, researchers often seek to uncover motivations that the social actors themselves are unaware of. You think you bought that new Burpo-Blasto because of its high performance and good looks, but we know you're really trying to achieve a higher social status.

This implicit sense of superiority would fit comfortably with a totally positivistic approach (the biologist feels superior to the frog on the lab table), but it clashes with the more humanistic and typically qualitative approach taken by many social scientists. We'll explore this issue more deeply in Chapter 10.

In seeking to understand the way ordinary people make sense of their worlds, ethnomethodologists have urged all social scientists to pay more respect to these natural social processes of conceptualization and shared meaning. At the very least, behavior that may seem irrational from the scientist's paradigm may make logical sense when viewed through the actor's paradigm.

Ultimately, social researchers should look both to their colleagues and to their subjects as sources of agreement on the most useful meanings and measurements of the concepts they study. Sometimes one source will be more useful, sometimes the other. But neither one should be dismissed.

Tension between Reliability and Validity

Clearly, we want our measures to be both reliable and valid. However, a tension often arises between the criteria of reliability and validity, forcing a trade-off between the two.

Recall the example of measuring morale in different factories. The strategy of immersing yourself in the day-to-day routine of the assembly line, observing what goes on, and talking to the workers would seem to provide a more valid measure of morale than counting grievances would. It just seems obvious that we'd get a clearer sense of whether the morale was high or low using this first method.

As I pointed out earlier, however, the counting strategy would be more reliable. This situation reflects a more general strain in research measurement. Most of the really interesting concepts we want to study have many subtle nuances, so specifying precisely what we mean by them is hard. Researchers sometimes speak of such concepts as having a "richness of meaning." Although scores of books and articles have been written on the topic

of anomie/anomia, for example, they still haven't exhausted its meaning.

Very often, then, specifying reliable operational definitions and measurements seems to rob concepts of their richness of meaning. Positive morale is much more than a lack of grievances filed with the union; anomia is much more than what is measured by the five items created by Leo Srole. Yet the more variation and richness we allow for a concept, the more opportunity there is for disagreement on how it applies to a particular situation, thus reducing reliability.

To some extent, this dilemma explains the persistence of two quite different approaches to social research: quantitative, nomothetic, structured techniques such as surveys and experiments on the one hand, and qualitative, idiographic methods such as field research and historical studies on the other. In the simplest generalization, the former methods tend to be more reliable, the latter more valid.

By being forewarned, you'll be effectively forearmed against this persistent and inevitable dilemma. If there is no clear agreement on how to measure a concept, measure it several different ways. If the concept has several dimensions, measure them all. Above all, know that the concept does not have any meaning other than what you and I give it. The only justification for giving any concept a particular meaning is utility. Measure concepts in ways that help us understand the world around us.

MAIN POINTS

Introduction

• The interrelated processes of conceptualization, operationalization, and measurement allow researchers to move from a general idea about what they want to study to effective and well-defined measurements in the real world.

Measuring Anything That Exists

• Conceptions are mental images we use as summary devices for bringing together observations and experiences that seem to have something in common. We use terms or labels to reference these conceptions.

• Concepts are constructs; they represent the agreed-on meanings we assign to terms. Our concepts don't exist in the real world, so they can't be measured directly, but we can measure the things that our concepts summarize.

Conceptualization

• Conceptualization is the process of specifying observations and measurements that give concepts definite meaning for the purposes of a research study.

• Conceptualization includes specifying the indicators of a concept and describing its dimensions. Operational definitions specify how variables relevant to a concept will be measured.

Definitions in Descriptive and Explanatory Studies

• Precise definitions are even more important in descriptive than in explanatory studies. The degree of precision needed varies with the type and purpose of a study.

Operationalization Choices

• Operationalization is an extension of conceptualization that specifies the exact procedures that will be used to measure the attributes of variables.

• Operationalization involves a series of interrelated choices: specifying the range of variation that is appropriate for the purposes of a study, determining how precisely to measure variables, accounting for relevant dimensions of variables, clearly defining the attributes of variables and their relationships, and deciding on an appropriate level of measurement.

• Researchers must choose from four levels of measurement, which capture increasing amounts of information: nominal, ordinal, interval, and ratio. The most appropriate level depends on the purpose of the measurement.

• A given variable can sometimes be measured at different levels. When in doubt, researchers should use the highest level of measurement appropriate to that variable so they can capture the greatest amount of information.

• Operationalization begins in the design phase of a study and continues through all phases of the research project, including the analysis of data.

Criteria of Measurement Quality

• Criteria of the quality of measures include precision, accuracy, reliability, and validity.

• Whereas reliability means getting consistent results from the same measure, validity refers to getting results that accurately reflect the concept being measured.

• Researchers can test or improve the reliability of measures through the test-retest method, the split-half method, the use of established measures, and the examination of work performed by research workers.

• The yardsticks for assessing a measure's validity include face validity, criterion-related validity, construct validity, and content validity.

• Creating specific, reliable measures often seems to diminish the richness of meaning our general concepts have. This problem is inevitable. The best solution is to use several different measures, tapping the different aspects of a concept.

KEY TERMS

The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

conceptualization
construct validity
content validity
criterion-related validity
dimension
face validity
indicator
interval measure
nominal measure
ordinal measure
ratio measure
reliability
specification
validity

REVIEW QUESTIONS AND EXERCISES

1. Pick a social science concept such as liberalism or alienation, then specify that concept so that it could be studied in a research project. Be sure to specify the indicators you'll use as well as the dimensions you wish to include in and exclude from your conceptualization.

2. What level of measurement (nominal, ordinal, interval, or ratio) describes each of the following variables?
   a. Race (white, African American, Asian, and so on)
   b. Order of finish in a race (first, second, third, and so on)
   c. Number of children in families
   d. Populations of nations
   e. Attitudes toward nuclear energy (strongly approve, approve, disapprove, strongly disapprove)
   f. Region of birth (Northeast, Midwest, and so on)
   g. Political orientation (very liberal, somewhat liberal, somewhat conservative, very conservative)

3. Let's conceptualize the variable: prejudice. Using your favorite web browser, search for the term prejudice. After reviewing several of the websites resulting from your search, make a list of some different forms of prejudice that might be studied in an omnibus project dealing with that topic.

4. Let's discover truth. In a good dictionary, look up truth and true, then copy out the definitions. Note the key terms used in those definitions (e.g., reality), look up the definitions of those terms, and copy out these definitions as well. Continue this process until no new terms appear. Comment on what you've learned from this exercise.


Chapter 6: Indexes, Scales, and Typologies

Introduction
Indexes versus Scales
Index Construction
   Item Selection
   Examination of Empirical Relationships
   Index Scoring
   Handling Missing Data
   Index Validation
   The Status of Women: An Illustration of Index Construction
Scale Construction
   Bogardus Social Distance Scale
   Thurstone Scales
   Likert Scaling
   Semantic Differential
   Guttman Scaling
Typologies

Introduction

As we saw in Chapter 5, many social scientific concepts have complex and varied meanings. Making measurements that capture such concepts can be a challenge. Recall our discussion of content validity, which concerns whether we have captured all the different dimensions of a concept.

To achieve broad coverage of the various dimensions of a concept, we usually need to make multiple observations pertaining to that concept. Thus, for example, Bruce Berg (1989: 21) advises in-depth interviewers to prepare essential questions, which are "geared toward eliciting specific, desired information." In addition, the researcher should prepare extra questions: "questions roughly equivalent to certain essential ones, but worded slightly differently."

Multiple indicators are used with quantitative data as well. Suppose you're designing a survey. Although you can sometimes construct a single questionnaire item that captures the variable of interest ("Gender: [ ] Male [ ] Female" is a simple example), other variables are less straightforward and may require you to use several questionnaire items to measure them adequately.

Quantitative data analysts have developed specific techniques for combining indicators into a single measure. This chapter discusses the construction of two types of composite measures of variables: indexes and scales. Although these measures can be used in any form of social research, they are most common in survey research and other quantitative methods. A short section at the end of this chapter considers typologies, which are relevant to both qualitative and quantitative research.

Composite measures are frequently used in quantitative research, for several reasons. First, social scientists often wish to study variables that have no clear and unambiguous single indicators. Single indicators do suffice for some variables, such as age. We can determine a survey respondent's age by simply asking, "How old are you?" Similarly, we can determine a newspaper's circulation by merely looking at the figure the newspaper reports. In the case of complex concepts, however, researchers can seldom develop single indicators before they actually do the research. This is especially true with regard to attitudes and orientations. Rarely can a survey researcher, for example, devise single questionnaire items that adequately tap respondents' degrees of prejudice, religiosity, political orientations, alienation, and the like. More likely, the researcher will devise several items, each of which provides some indication of the variables. Taken individually, each of these items is likely to prove invalid or unreliable for many respondents. A composite measure, however, can overcome this problem.

Second, researchers may wish to employ a rather refined ordinal measure of a particular variable (alienation, say), arranging cases in several ordinal categories from very low to very high, for example. A single data item might not have enough categories to provide the desired range of variation. However, an index or scale formed from several items can provide the needed range.

Finally, indexes and scales are efficient devices for data analysis. If considering a single data item gives us only a rough indication of a given variable, considering several data items can give us a more comprehensive and more accurate indication. For example, a single newspaper editorial may give us some indication of the political orientations of that newspaper. Examining several editorials would probably give us a better assessment, but the manipulation of several data items simultaneously could be very complicated. Indexes and scales (especially scales) are efficient data-reduction devices: They allow us to summarize several indicators in a single numerical score, while sometimes nearly maintaining the specific details of all the individual indicators.

Indexes versus Scales

The terms index and scale are typically used imprecisely and interchangeably in social research literature. The two types of measures do have some characteristics in common, but in this book we'll distinguish between the two. However, you should

be warned of a growing tendency in the literature to use the term scale to refer to both indexes and scales, as they are distinguished here.

First, let's consider what they have in common. Both scales and indexes are ordinal measures of variables. Both rank-order the units of analysis in terms of specific variables such as religiosity, alienation, socioeconomic status, prejudice, or intellectual sophistication. A person's score on either a scale or an index of religiosity, for example, gives an indication of his or her relative religiosity vis-à-vis other people.

Further, both scales and indexes are composite measures of variables, that is, measurements based on more than one data item. Thus, a survey respondent's score on an index or scale of religiosity is determined by the responses given to several questionnaire items, each of which provides some indication of religiosity. Similarly, a person's IQ score is based on answers to a large number of test questions. The political orientation of a newspaper might be represented by an index or scale score reflecting the newspaper's editorial policy on various political issues.

Despite these shared characteristics, it's useful to distinguish between indexes and scales. In this book, we'll distinguish them by the way scores are assigned in each. We construct an index simply by accumulating scores assigned to individual attributes. We might measure prejudice, for example, by adding up the number of prejudiced statements each respondent agreed with. We construct a scale, however, by assigning scores to patterns of responses, recognizing that some items reflect a relatively weak degree of the variable while others reflect something stronger. For example, agreeing that "Women are different from men" is, at best, weak evidence of sexism compared with agreeing that "Women should not be allowed to vote." A scale takes advantage of differences in intensity among the attributes of the same variable to identify distinct patterns of response.

index: A type of composite measure that summarizes and rank-orders several specific observations and represents some more general dimension.

scale: A type of composite measure composed of several items that have a logical or empirical structure among them. Examples of scales include Bogardus social distance, Guttman, Likert, and Thurstone scales.

Let's consider this simple example of sexism a bit further. Imagine asking people to agree or disagree with the two statements just presented. Some might agree with both, some might disagree with both. But suppose I told you someone agreed with one and disagreed with the other: Could you guess which statement they agreed with and which they did not? I'd guess the person in question agreed that women were different but disagreed that they should be prohibited from voting. On the other hand, I doubt that anyone would want to prohibit women from voting, while asserting that there is no difference between men and women. That would make no sense.

Now consider this. The two responses we wanted from each person would technically yield four response patterns: agree/agree, agree/disagree, disagree/agree, and disagree/disagree. We've just seen, however, that only three of the four patterns make any sense or are likely to occur. Where indexes score people based on their responses, scales score people on the basis of response patterns: We determine what the logical response patterns are and score people in terms of the pattern their responses most closely resemble.

Figure 6-1 provides a graphic illustration of the difference between indexes and scales. Let's assume we want to develop a measure of political activism, distinguishing those people who are very active in political affairs, those who don't participate much at all, and those who are somewhere in between.

The first part of Figure 6-1 illustrates the logic of indexes. The figure shows six different political actions. Although you and I might disagree on some specifics, I think we could agree that the six actions represent roughly the same degree of political activism.

Using these six items, we could construct an index of political activism by giving each person 1 point for each of the actions he or she has taken. If you wrote to a public official and signed a petition,

you'd get a total of 2 points. If I gave money to a candidate and persuaded someone to change her or his vote, I'd get the same score as you. Using this approach, we'd conclude that you and I had the same degree of political activism, even though we had taken different actions.

The second part of Figure 6-1 describes the logic of scale construction. In this case, the actions clearly represent different degrees of political activism, ranging from simply voting to running for office. Moreover, it seems safe to assume a pattern of actions in this case. For example, all those who contributed money probably also voted. Those who worked on a campaign probably also gave some money and voted. This suggests that most people will fall into only one of five idealized action patterns, represented by the illustrations at the bottom of the figure. The discussion of scales, later in this chapter, describes ways of identifying people with the type they most closely represent.

[FIGURE 6-1. Indexes versus Scales. Both indexes and scales seek to measure variables such as political activism. Whereas indexes count the number of indicators of the variable, scales take account of the differing intensities of those indicators.

Index-Construction Logic. Here are several types of political actions people may have taken. By and large, the different actions represent similar degrees of political activism. To create an index of overall political activism, we might give people 1 point for each of the actions they've taken.
   Wrote a letter to a public official
   Signed a political petition
   Gave money to a political cause
   Gave money to a political candidate
   Wrote a political letter to the editor
   Persuaded someone to change her or his voting plans

Scale-Construction Logic. Here are some political actions that represent very different degrees of activism: e.g., running for office represents a higher degree of activism than simply voting does. It seems likely, moreover, that anyone who has taken one of the more demanding actions would have taken all the easier ones as well. To construct a scale of political activism, we might score people according to which of the following "ideal" patterns comes closest to describing them.

   Ran for office                               No   No   No   No   Yes
   Worked on a political campaign               No   No   No   Yes  Yes
   Contributed money to a political campaign    No   No   Yes  Yes  Yes
   Voted                                        No   Yes  Yes  Yes  Yes
   Scale score                                  0    1    2    3    4]

As you might surmise, scales are generally superior to indexes, because scales take into consideration the intensity with which different items reflect the variable being measured. Also, as the example in Figure 6-1 shows, scale scores convey more information than index scores do. Again,

be aware that the term scale is commonly misused to refer to measures that are only indexes. Merely calling a measure a scale instead of an index doesn't make it better.

There are two other misconceptions about scaling that you should know about. First, whether the combination of several data items results in a scale almost always depends on the particular sample of observations under study. Certain items may form a scale within one sample but not within another. For this reason, do not assume that a given set of items is a scale simply because it has turned out that way in an earlier study.

Second, the use of specific scaling techniques (such as Guttman scaling, to be discussed) does not ensure the creation of a scale. Rather, such techniques let us determine whether or not a set of items constitutes a scale.

An examination of actual social science research reports will show that researchers use indexes much more frequently than they do scales. Ironically, however, the methodological literature contains little if any discussion of index construction, whereas discussions of scale construction abound. There appear to be two reasons for this disparity. First, indexes are more frequently used because scales are often difficult or impossible to construct from the data at hand. Second, methods of index construction seem so obvious and straightforward that they aren't discussed much.

Constructing indexes is not a simple undertaking, however. The general failure to develop index construction techniques has resulted in many bad indexes in social research. With this in mind, I've devoted over half of this chapter to the methods of index construction. With a solid understanding of the logic of this activity, you'll be better equipped to try constructing scales. Indeed, a carefully constructed index may turn out to be a scale.

Index Construction

Let's look now at four main steps in the construction of an index: selecting possible items, examining their empirical relationships, scoring the index, and validating it. We'll conclude this discussion by examining the construction of an index that provided interesting findings about the status of women in different countries.

Item Selection

The first step in creating an index is selecting items for a composite index, which is created to measure some variable.

Face Validity

The first criterion for selecting items to be included in an index is face validity (or logical validity). If you want to measure political conservatism, for example, each of your items should appear on its face to indicate conservatism (or its opposite, liberalism). Political party affiliation would be one such item. Another would be an item asking people to approve or disapprove of the views of a well-known conservative public figure. In constructing an index of religiosity, you might consider items such as attendance at religious services, acceptance of certain religious beliefs, and frequency of prayer; each of these appears to offer some indication of religiosity.

Unidimensionality

The methodological literature on conceptualization and measurement stresses the need for unidimensionality in scale and index construction. That is, a composite measure should represent only one dimension of a concept. Thus, items reflecting religious fundamentalism should not be included in a measure of political conservatism, even though the two variables might be empirically related to each other.

General or Specific

Although measures should tap the same dimension, the general dimension you're attempting to measure may have many nuances. In the example of religiosity, the indicators mentioned previously (ritual participation, belief, and so on) represent different types of religiosity. If you wished to focus

on ritual participation in religion, you should choose items specifically indicating this type of religiosity: attendance at religious services and other rituals such as confession, bar mitzvah, bowing toward Mecca, and the like. If you wished to measure religiosity in a more general way, you would include a balanced set of items, representing each of the different types of religiosity. Ultimately, the nature of the items you include will determine how specifically or generally the variable is measured.

Variance

In selecting items for an index, you must also be concerned with the amount of variance they provide. If an item is intended to indicate political conservatism, for example, you should note what proportion of respondents would be identified as conservatives by that item. If a given item identified no one as a conservative or everyone as a conservative (for example, if nobody indicated approval of a radical-right political figure), that item would not be very useful in the construction of an index.

To guarantee variance, you have two options. First, you may select several items the responses to which divide people about equally in terms of the variable, for example, about half conservative and half liberal. Although no single response would justify the characterization of a person as very conservative, a person who responded as a conservative on all items might be so characterized.

The second option is to select items differing in variance. One item might identify about half the subjects as conservative, while another might identify few of the respondents as conservatives. Note that this second option is necessary for scaling, and it is reasonable for index construction as well.

Examination of Empirical Relationships

The second step in index construction is to examine the empirical relationships among the items being considered for inclusion. (See Chapter 14 for more.) An empirical relationship is established when respondents' answers to one question (in a questionnaire, for example) help us predict how they'll answer other questions. If two items are empirically related to each other, we can reasonably argue that each reflects the same variable, and we may include them both in the same index. There are two types of possible relationships among items: bivariate and multivariate.

Bivariate Relationships

A bivariate relationship is, simply put, a relationship between two variables. Suppose we want to measure respondents' support for U.S. participation in the United Nations. One indicator of different levels of support might be the question "Do you feel the U.S. financial support of the UN is [ ] Too high [ ] About right [ ] Too low?"

A second indicator of support for the United Nations might be the question "Should the United States contribute military personnel to UN peacekeeping actions? [ ] Strongly approve [ ] Mostly approve [ ] Mostly disapprove [ ] Strongly disapprove."

Both of these questions, on their face, seem to reflect different degrees of support for the United Nations. Nonetheless, some people might feel the United States should give more money but not provide troops. Others might favor sending troops but cutting back on financial support.

If the two items both reflect degrees of the same thing, however, we should expect responses to the two items to generally correspond with each other. Specifically, those who approve of military support should be more likely to favor financial support than those who disapprove of military support would. Conversely, those who favor financial support should be more likely to favor military support than those disapproving of financial support would. If these expectations are met, we say there is a bivariate relationship between the two items.

Here's another example. Suppose we want to determine the degree to which respondents feel women have the right to an abortion. We might ask (1) "Do you feel a woman should have the right to an abortion when her pregnancy was the result of rape?" and (2) "Do you feel a woman should have the right to an abortion if continuing her pregnancy would seriously threaten her life?"
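
As a minimal sketch of how a check for a bivariate relationship might look with two yes/no items like these, the following Python compares the percentage agreeing with item (2) among those who agreed and those who disagreed with item (1). The response counts are invented purely for illustration, and the function name is hypothetical; nothing here comes from an actual survey.

```python
# Hypothetical responses, invented for illustration (not survey data).
# Each pair is (agrees_with_item_1, agrees_with_item_2).
responses = (
    [(True, True)] * 30      # agree with both items
    + [(True, False)] * 10   # agree with item 1 only
    + [(False, True)] * 12   # agree with item 2 only
    + [(False, False)] * 28  # disagree with both items
)


def percent_agreeing_item2(pairs, item1_answer):
    """Percent agreeing with item 2 among those who gave item1_answer to item 1."""
    group = [item2 for item1, item2 in pairs if item1 == item1_answer]
    return 100.0 * sum(group) / len(group)


pct_among_agree = percent_agreeing_item2(responses, True)      # 30 of 40
pct_among_disagree = percent_agreeing_item2(responses, False)  # 12 of 40

print(f"Agree with item 1:    {pct_among_agree:.0f}% agree with item 2")
print(f"Disagree with item 1: {pct_among_disagree:.0f}% agree with item 2")

# The items are related in the expected direction if the first
# percentage is higher than the second.
items_related = pct_among_agree > pct_among_disagree
```

With these made-up counts, 75 percent of those who agree with item (1) also agree with item (2), compared with 30 percent of those who disagree with item (1). A gap in that direction is the kind of pattern described above as a bivariate relationship; real data would of course need to be examined on their own terms.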

Granted, some respondents might agree with item (1) and disagree with item (2); others will do just the reverse. However, if both items tap into some general opinion people have about the issue of abortion, then the responses to these two items should be related to each other. Those who support the right to an abortion in the case of rape should be more likely to support it if the woman's life is threatened than those who disapproved of abortion in the case of rape would. This would be another example of a bivariate relationship.

You should examine all the possible bivariate relationships among the several items being considered for inclusion in an index, in order to determine the relative strengths of relationships among the several pairs of items. Percentage tables, correlation coefficients (see Chapter 16), or both may be used for this purpose. How we evaluate the strength of the relationships, however, can be rather subtle. "'Cause' and 'Effect' Indicators" examines some of these subtleties.

[Box: "Cause" and "Effect" Indicators
by Kenneth Bollen, Department of Sociology, University of North Carolina, Chapel Hill

While it often makes sense to expect indicators of the same variable to be positively related to one another, as discussed in the text, this is not always the case.

Indicators should be related to one another if they are essentially "effects" of a variable. For example, to measure self-esteem, we might ask a person to indicate whether he or she agrees or disagrees with the statements (1) "I am a good person" and (2) "I am happy with myself." A person with high self-esteem should agree with both statements while one with low self-esteem would probably disagree with both. Since each indicator depends on or "reflects" self-esteem, we expect them to be positively correlated. More generally, indicators that depend on the same variable should be associated with one another if they are valid measures.

But this is not the case when the indicators are the "cause" rather than the "effect" of a variable. In this situation the indicators may correlate positively, negatively, or not at all. For example, we could use gender and race as indicators of the variable exposure to discrimination. Being nonwhite or female increases the likelihood of experiencing discrimination, so both are good indicators of the variable. But we would not expect the race and gender of individuals to be strongly associated.

Or, we may measure social interaction with three indicators: time spent with friends, time spent with family, and time spent with coworkers. Though each indicator is valid, they need not be positively correlated. Time spent with friends, for instance, may be inversely related to time spent with family. Here, the three indicators "cause" the degree of social interaction.

As a final example, exposure to stress may be measured by whether a person recently experienced divorce, death of a spouse, or loss of a job. Though any of these events may indicate stress, they need not correlate with one another.

In short, we expect an association between indicators that depend on or "reflect" a variable, that is, if they are the "effects" of the variable. But if the variable depends on the indicators (if the indicators are the "causes"), those indicators may be either positively or negatively correlated, or even unrelated. Therefore, we should decide whether indicators are causes or effects of a variable before using their intercorrelations to assess their validity.]

Be wary of items that are not related to one another empirically: It's unlikely that they measure the same variable. You should probably drop any item that is not related to several other items.

At the same time, a very strong relationship between two items presents a different problem. If two items are perfectly related to each other, then only one needs to be included in the index; because it completely conveys the indications provided by the other, nothing more would be added by including the other item. (This problem will become even clearer in the next section.)

Here's an example to illustrate the testing of bivariate relationships in index construction. I once conducted a survey of medical school faculty members to find out about the consequences of a "scientific perspective" on the quality of patient care

provided by physicians. The primary intent was to determine whether scientifically inclined doctors treated patients more impersonally than other doctors did.

The survey questionnaire offered several possible indicators of respondents' scientific perspectives. Of those, three items appeared to provide especially clear indications of whether the doctors were scientifically oriented:

1. As a medical school faculty member, in what capacity do you feel you can make your greatest teaching contribution: as a practicing physician or as a medical researcher?

2. As you continue to advance your own medical knowledge, would you say your ultimate medical interests lie primarily in the direction of total patient management or the understanding of basic mechanisms? [The purpose of this item was to distinguish those who were mostly interested in overall patient care from those mostly interested in biological processes.]

3. In the field of therapeutic research, are you generally more interested in articles reporting evaluations of the effectiveness of various treatments or articles exploring the basic rationale underlying the treatments? [Similarly, I wanted to distinguish those more interested in articles dealing with patient care from those more interested in biological processes.]

(Babbie 1970: 27-31)

For each of these items, we might conclude that those respondents who chose the second answer are more scientifically oriented than respondents who chose the first answer. Though this comparative conclusion is reasonable, we should not be misled into thinking that respondents who chose the second answer to a given item are scientists in any absolute sense. They are simply more scientifically oriented than those who chose the first answer to the item.

To see this point more clearly, let's examine the distribution of responses to each item. From the first item (greatest teaching contribution), only about one-third of the respondents appeared scientifically oriented. That is, approximately one-third said they could make their greatest teaching contribution as medical researchers. In response to the second item (ultimate medical interests), approximately two-thirds chose the scientific answer, saying they were more interested in learning about basic mechanisms than learning about total patient management. In response to the third item (reading preferences), about 80 percent chose the scientific answer.

These three questionnaire items can't tell us how many "scientists" there are in the sample, for none of them is related to a set of criteria for what constitutes being a scientist in any absolute sense. Using the items for this purpose would present us with the problem of three quite different estimates of how many scientists there were in the sample.

However, these items do provide us with three independent indicators of respondents' relative inclinations toward science in medicine. Each item separates respondents into the more scientific and the less scientific. But each grouping of more or less scientific respondents will have a somewhat different membership from the others. Some respondents who seem scientific in terms of one item will not seem scientific in terms of another. Nevertheless, to the extent that each item measures the same general dimension, we should find some correspondence among the several groupings. Respondents who appear scientific in terms of one item should be more likely to appear scientific in their response to another item than would those who appeared nonscientific in their response to the first. In other words, we should find an association or correlation between the responses given to two items.

Figure 6-2 shows the associations among the responses to the three items. Three bivariate tables are presented, showing the distribution of responses for each possible pairing of items. An examination of the three bivariate relationships presented in the figure supports the suggestion that the three items all measure the same variable: scientific orientation. To see why this is so, let's begin by looking at the first bivariate relationship in the table. The table shows that faculty who responded that "researcher" was the role in which they could make their greatest teaching contribution were more likely to identify their ultimate medical interests as "basic mechanisms" (87 percent) than

were those who answered "physician" (51 percent). The fact that the "physicians" are about evenly split in their ultimate medical interests is irrelevant for our purposes. It is only relevant that they are less scientific in their medical interests than the "researchers." The strength of this relationship may be summarized as a 36 percentage point difference.

[Figure 6-2: three bivariate percentage tables. (a) Ultimate medical interests (total patient management vs. basic mechanisms) by greatest teaching contribution (physician vs. researcher). (b) Ultimate medical interests by reading preferences (effectiveness vs. rationale). (c) Greatest teaching contribution by reading preferences. Each column is percentaged to 100%; the cell values did not survive extraction.]

FIGURE 6-2 Bivariate Relationships among Scientific Orientation Items. If several indicators are measures of the same variable, then they should be empirically correlated with one another.

The same general conclusion applies to the other bivariate relationships. The strength of the relationship between reading preferences and ultimate medical interests may be summarized as a 38 percentage point difference, and the strength of the relationship between reading preferences and the two teaching roles as a 21 percentage point difference. In summary, then, each single item produces a different grouping of "scientific" and "nonscientific" respondents. However, the responses given to each of the items correspond, to a greater or lesser degree, to the responses given to each of the other items.

Initially, the three items were selected on the basis of face validity: each appeared to give some indication of faculty members' orientations to science. By examining the bivariate relationship between the pairs of items, we have found support for the expectation that they all measure basically the same thing. However, that support does not sufficiently justify including the items in a composite index. Before combining them in a single index, we need to examine the multivariate relationships among the several variables.

Multivariate Relationships among Items

Figure 6-3 categorizes the sample respondents into four groups according to (1) their greatest teaching contribution and (2) their reading preferences. The numbers in parentheses indicate the number of respondents in each group. Thus, 66 of the faculty members who said they could best teach as physicians also said they preferred articles dealing with the effectiveness of treatments. For each of the four groups, the figure presents the percentage of those who say they are ultimately more interested in basic mechanisms. So, for example, of the 66 faculty mentioned, 27 percent are primarily interested in basic mechanisms.

The arrangement of the four groups is based on a previously drawn conclusion regarding scientific orientations. The group in the upper left corner of the table is presumably the least scientifically oriented, based on greatest teaching contribution and reading preference. The group in the lower right corner is presumably the most scientifically oriented in terms of those items.
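The percentage point differences used to summarize the bivariate relationships can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the counts assume 100 respondents per column purely so that they reproduce the 51 and 87 percent reported in the text. They are not the survey's actual cell counts.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]


def column_percentages(counts: Table) -> Table:
    """Convert counts[column][row] into percentages summing to 100 within each column."""
    result: Table = {}
    for column, rows in counts.items():
        total = sum(rows.values())
        result[column] = {row: 100.0 * n / total for row, n in rows.items()}
    return result


def percentage_point_difference(pcts: Table, row: str, col_a: str, col_b: str) -> float:
    """Difference between two column percentages for the same row category."""
    return pcts[col_a][row] - pcts[col_b][row]


# Hypothetical counts (100 per column), chosen only to match the reported percentages.
counts = {
    "physician": {"basic mechanisms": 51, "total patient management": 49},
    "researcher": {"basic mechanisms": 87, "total patient management": 13},
}

pcts = column_percentages(counts)
diff = percentage_point_difference(pcts, "basic mechanisms", "researcher", "physician")
print(f"Teaching role vs. ultimate medical interest: {diff:.0f} percentage points")
```

Percentaging within columns before comparing is what lets groups of different sizes be compared directly.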

[Figure 6-3: percent interested in basic mechanisms, by greatest teaching contribution (columns: physician, researcher) and reading preferences (rows: effectiveness of treatments, rationale behind treatments).]

FIGURE 6-3 Trivariate Relationships among Scientific Orientation Items. Indicators of the same variable should be correlated in a multivariate analysis as well as in bivariate analyses.

[Figure 6-4: percent interested in basic mechanisms, same layout as Figure 6-3, with hypothetical data.]

FIGURE 6-4 Hypothetical Trivariate Relationship among Scientific Orientation Items. This hypothetical relationship would suggest that not all three indicators would contribute effectively to a composite index.

Recall that expressing a primary interest in basic mechanisms was also taken as an indication of scientific orientation. As we should expect then, those in the lower right corner are the most likely to give this response (89 percent), and those in the upper left corner are the least likely (27 percent). The respondents who gave mixed responses in terms of teaching contributions and reading preferences have an intermediate rank in their concern for basic mechanisms (58 percent in both cases).

This table tells us many things. First we may note that the original relationships between pairs of items are not significantly affected by the presence of a third item. Recall, for example, that the relationship between teaching contribution and ultimate medical interest was summarized as a 36 percentage point difference. Looking at Figure 6-3, we see that among only those respondents who are most interested in articles dealing with the effectiveness of treatments, the relationship between teaching contribution and ultimate medical interest is 31 percentage points (58 percent minus 27 percent: first row). The same is true among those most interested in articles dealing with the rationale for treatments (89 percent minus 58 percent: second row). The original relationship between teaching contribution and ultimate medical interest is essentially the same as in Figure 6-2, even among those respondents judged as scientific or nonscientific in terms of reading preferences.

We can draw the same conclusion from the columns in Figure 6-3. Recall that the original relationship between reading preferences and ultimate medical interests was summarized as a 38 percentage point difference. Looking only at the "physicians" in Figure 6-3, we see that the relationship between the other two items is now 31 percentage points. The same relationship is found among the "researchers" in the second column.

The importance of these observations becomes clearer when we consider what might have happened. In Figure 6-4, hypothetical data tell a much different story than the actual data in Figure 6-3 do. As you can see, Figure 6-4 shows that the original relationship between teaching role and ultimate medical interest persists, even when reading preferences are introduced into the picture. In each row of the table, the "researchers" are more likely to express an interest in basic mechanisms than the "physicians" are. Looking down the columns, however, we note that there is no relationship between reading preferences and ultimate medical interest. If we know whether a respondent feels he or she can best teach as a physician or as a researcher, knowing the respondent's reading preference adds nothing to our evaluation of his or her scientific orientation. If something like Figure 6-4 resulted from the actual data, we would conclude that reading preference should not be included in the same index as teaching role, because it contributed nothing to the composite index.

This example used only three questionnaire items. If more were being considered, then more-complex multivariate tables would be in order, constructed of four, five, or more variables. The purpose of this step in index construction, again, is to discover the simultaneous interaction of the items in order to determine which should be included in the same index. These kinds of data analyses are easily accomplished using programs such as SPSS and MicroCase. They are usually referred to as cross-tabulations.

Index Scoring

When you've chosen the best items for your index, you next assign scores for particular responses, thereby creating a single composite measure out of the several items. There are two basic decisions to be made in this step.

First, you must decide the desirable range of the index scores. A primary advantage of an index over a single item is the range of gradations it offers in the measurement of a variable. As noted earlier, political conservatism might be measured from "very conservative" to "not at all conservative" or "very liberal." How far to the extremes, then, should the index extend?

In this decision, the question of variance enters once more. Almost always, as the possible extremes of an index are extended, fewer cases are to be found at each end. The researcher who wishes to measure political conservatism to its greatest extreme (somewhere to the right of Attila the Hun, as the saying goes) may find there is almost no one in that category. At some point additional gradations do not add meaning to the results.

The first decision, then, concerns the conflicting desire for (1) a range of measurement in the index and (2) an adequate number of cases at each point in the index. You'll be forced to reach some kind of compromise between these conflicting desires.

The second decision concerns the actual assignment of scores for each particular response. Basically you must decide whether to give items in the index equal weight or different weights. Although there are no firm rules, I suggest (and practice tends to support this method) that items be weighted equally unless there are compelling reasons for differential weighting. That is, the burden of proof should be on differential weighting; equal weighting should be the norm.

Of course, this decision must be related to the earlier issue regarding the balance of items chosen. If the index is to represent the composite of slightly different aspects of a given variable, then you should give each aspect the same weight. In some instances, however, you may feel that two items reflect essentially the same aspect, and the third reflects a different aspect. If you want to have both aspects equally represented by the index, you might give the different item a weight equal to the combination of the two similar ones. For instance, you could assign a maximum score of 2 to the different item and a maximum score of 1 to each of the similar ones.

Although the rationale for scoring responses should take such concerns as these into account, typically researchers experiment with different scoring methods, examining the relative weights given to different aspects but at the same time worrying about the range and distribution of cases provided. Ultimately, the scoring method chosen will represent a compromise among these several demands. Of course, as in most research activities, such a decision is open to revision on the basis of later examinations. Validation of the index, to be discussed shortly, may lead the researcher to recycle his or her efforts toward constructing a completely different index.

In the example taken from the medical school faculty survey, I decided to weight the items equally, because I'd chosen them, in part, because they represent slightly different aspects of the overall variable scientific orientation. On each of the items, the respondents were given a score of 1 for choosing the "scientific" response to the item and a score of 0 for choosing the "nonscientific" response. Each respondent, then, could receive a score of 0, 1, 2, or 3. This scoring method provided what I considered a useful range of variation (four index categories) and also provided enough cases for analysis in each category.

Here's a similar example of index scoring, from a study of work satisfaction. One of the key

variables was job-related depression, measured by an index composed of the following four items, which asked workers how they felt when thinking about themselves and their jobs:

• "I feel downhearted and blue."
• "I get tired for no reason."
• "I find myself restless and can't keep still."
• "I am more irritable than usual."

The researchers, Amy Wharton and James Baron, report, "Each of these items was coded: 4 = often, 3 = sometimes, 2 = rarely, 1 = never." They go on to explain how they measured another variable, job-related self-esteem:

    Job-related self-esteem was based on four items asking respondents how they saw themselves in their work: happy/sad; successful/not successful; important/not important; doing their best/not doing their best. Each item ranged from 1 to 7, where 1 indicates a self-perception of not being happy, successful, important, or doing one's best.

    (1987: 578)

As you look through the social research literature, you'll find numerous similar examples of cumulative indexes being used to measure variables. Sometimes the indexing procedures are controversial, as evidenced in "What Is the Best College in the United States?"

Handling Missing Data

Regardless of your data-collection method, you'll frequently face the problem of missing data. In a content analysis of the political orientations of newspapers, for example, you may discover that a particular newspaper has never taken an editorial position on one of the issues being studied. In an experimental design involving several retests of subjects over time, some subjects may be unable to participate in some of the sessions. In virtually every survey, some respondents fail to answer some questions (or choose a "don't know" response). Although missing data present problems at all stages of analysis, they're especially troublesome in index construction. There are, however, several methods of dealing with these problems.

First, if there are relatively few cases with missing data, you may decide to exclude them from the construction of the index and the analysis. (I did this in the medical school faculty example.) The primary concerns in this instance are whether the numbers available for analysis will remain sufficient and whether the exclusion will result in an unrepresentative sample whenever the index, excluding some of the respondents, is used in the analysis. The latter possibility can be examined through a comparison, on other relevant variables, of those who would be included and excluded from the index.

Second, you may sometimes have grounds for treating missing data as one of the available responses. For example, if a questionnaire has asked respondents to indicate their participation in various activities by checking "yes" or "no" for each, many respondents may have checked some of the activities "yes" and left the remainder blank. In such a case, you might decide that a failure to answer meant "no," and score missing data in this case as though the respondents had checked the "no" space.

Third, a careful analysis of missing data may yield an interpretation of their meaning. In constructing a measure of political conservatism, for example, you may discover that respondents who failed to answer a given question were generally as conservative on other items as those who gave the conservative answer. In another example, a recent study measuring religious beliefs found that people who answered "don't know" about a given belief were almost identical to the "disbelievers" in their answers about other beliefs. (Note: You should take these examples not as empirical guides in your own studies but only as suggestions of general ways to analyze your own data.) Whenever the analysis of missing data yields such interpretations, then, you may decide to score such cases accordingly.

There are many other ways of handling the problem of missing data. If an item has several possible values, you might assign the middle value to cases with missing data; for example, you could assign a 2 if the values are 0, 1, 2, 3, and 4. For a

continuous variable such as age, you could similarly assign the mean to cases with missing data (more on this in Chapter 14). Or, missing data can be supplied by assigning values at random. All of these are conservative solutions because they weaken the "purity" of your index and reduce the likelihood that it will relate to other variables in ways you may have hypothesized.

[Box] What Is the Best College in the United States?

Each year the newsmagazine U.S. News and World Report issues a special report ranking the nation's colleges and universities. Their rankings reflect an index, created from several items: educational expenditures per student, graduation rates, selectivity (percentage accepted of those applying), average SAT scores of first-year students, and similar indicators of quality.

Typically, Harvard is ranked the number one school in the nation, followed by Yale and Princeton. However, the 1999 "America's Best Colleges" issue shocked educators, prospective college students, and their parents. The California Institute of Technology had leaped from ninth place in 1998 to first place a year later. While Harvard, Yale, and Princeton still did well, they had been supplanted. What had happened at Caltech to produce such a remarkable surge in quality?

The answer was to be found at U.S. News and World Report, not at Caltech. The newsmagazine changed the structure of the ranking index in 1999, which made a big difference in how schools fared.

Bruce Gottlieb (1999) gives this example of how the altered scoring made a difference.

    So, how did Caltech come out on top? Well, one variable in a school's ranking has long been educational expenditures per student, and Caltech has traditionally been tops in this category. But until this year, U.S. News considered only a school's ranking in this category (first, second, etc.) rather than how much it spent relative to other schools. It didn't matter whether Caltech beat Harvard by $1 or by $100,000. Two other schools that rose in their rankings this year were MIT (from fourth to third) and Johns Hopkins (from 14th to seventh). All three have high per-student expenditures and three are in the hard sciences. Universities are allowed to count their research budgets in their per-student expenditures, though students get no direct benefit from costly research their professors are doing outside of class.

    In its "best colleges" issue two years ago, U.S. News made precisely this point, saying it considered only the rank ordering of per-student expenditures, rather than the actual amounts, on the grounds that expenditures at institutions with large research programs and medical schools are substantially higher than those at the rest of the schools in the category. In other words, just two years ago, the magazine felt it unfair to give Caltech, MIT, and Johns Hopkins credit for having lots of fancy laboratories that don't actually improve undergraduate education.

Gottlieb reviewed each of the changes in the index and then asked how 1998's ninth-ranked Caltech would have done had the revised indexing formula been in place a year earlier. His conclusion: Caltech would have been first in 1998 as well. In other words, the apparent improvement was solely a function of how the index was scored.

Composite measures such as scales and indexes are valuable tools for understanding society. However, it's important that we know how those measures are constructed and what that construction implies.

So, what's really the best college in the United States? It depends on how you define "best." There is no "really best," only the various social constructions we can create.

Sources: U.S. News and World Report, "America's Best Colleges," August 30, 1999; Bruce Gottlieb, "... Books: How U.S. News Cheats in Picking Its 'Best American ...'" (http://slate.msn.com/default.aspx?id=34027).
[End of box]

If you're creating an index out of a large number of items, you can sometimes handle missing data by using proportions based on what is observed. Suppose your index is composed of six indicators, and you only have four observations for a particular subject. If the subject has earned 4 points out of a possible 4, you might assign an index score of 6; if the subject has 2 points (half the possible score on four items), you could assign a score of 3 (half the possible score on six observations).

The choice of a particular method to be used depends so much on the research situation that I can't reasonably suggest a single "best" method or rank the several I've described. Excluding all cases with missing data can bias the representativeness of the findings, but including such cases by assigning

scores to missing data can influence the nature of the findings. The safest and best method is to construct the index using more than one of these methods and see whether you reach the same conclusions using each of the indexes. Understanding your data is the final goal of analysis anyway.

Index Validation

Up to this point, we've discussed all the steps in the selection and scoring of items that result in an index purporting to measure some variable. If each of the preceding steps is carried out carefully, the likelihood of the index actually measuring the variable is enhanced. To demonstrate success, however, we must show that the index is valid. Following the basic logic of validation, we assume that the index provides a measure of some variable; that is, the scores on the index arrange cases in a rank order in terms of that variable. An index of political conservatism rank-orders people in terms of their relative conservatism. If the index does that successfully, then people scored as relatively conservative on the index should appear relatively conservative in all other indications of political orientation, such as their responses to other questionnaire items. There are several methods of validating an index.

Item Analysis

The first step in index validation is an internal validation called item analysis. In item analysis, you examine the extent to which the index is related to (or predicts responses to) the individual items it comprises. Here's an illustration of this step.

In the index of scientific orientations among medical school faculty, index scores ranged from 0 (most interested in patient care) to 3 (most interested in research). Now let's consider one of the items in the index: whether respondents wanted to advance their own knowledge more with regard to total patient management or more in the area of basic mechanisms. The latter were treated as being more scientifically oriented than the former. The following empty table shows how we would examine the relationship between the index and the individual item.

                                             Index of Scientific Orientations
                                                0      1      2      3
Percentage who said they were more
interested in basic mechanisms                  ??     ??     ??     ??

If you take a minute to reflect on the table, you may see that we already know the numbers that go in two of the cells. To get a score of 3 on the index, respondents had to say "basic mechanisms" in response to this question and give the "scientific" answers to the other two items as well. Thus, 100 percent of the 3's on the index said "basic mechanisms." By the same token, all the 0's had to answer this item with "total patient management." Thus, 0 percent of those respondents said "basic mechanisms." Here's how the table looks with the information we already know.

                                             Index of Scientific Orientations
                                                0      1      2      3
Percentage who said they were more
interested in basic mechanisms                  0      ??     ??     100

If the individual item is a good reflection of the overall index, we should expect the 1's and 2's to fill in a progression between 0 percent and 100 percent. More of the 2's should choose "basic mechanisms" than 1's. This result is not guaranteed by the way the index was constructed, however; it is an empirical question, one we answer in an item analysis. Here's how this particular item analysis turned out.

                                             Index of Scientific Orientations
                                                0      1      2      3
Percentage who said they were more
interested in basic mechanisms                  0      16     91     100

item analysis  An assessment of whether each of the items included in a composite measure makes an independent contribution or merely duplicates the contribution of other items in the measure.
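The equal-weight scoring and the item analysis just tabulated can be sketched as follows. This is a minimal illustration under stated assumptions: the eight respondents are invented (they are not the survey data), and the item keys and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical item keys; 1 = "scientific" answer, 0 = "nonscientific" answer.
ITEMS = ("teaching", "interest", "reading")


def index_score(respondent):
    """Equal weighting: one point per scientific answer, so scores run from 0 to 3."""
    return sum(respondent[item] for item in ITEMS)


def item_analysis(respondents, item):
    """Percent giving the scientific answer to `item` at each index score."""
    totals = defaultdict(int)
    scientific = defaultdict(int)
    for r in respondents:
        score = index_score(r)
        totals[score] += 1
        scientific[score] += r[item]
    return {s: 100.0 * scientific[s] / totals[s] for s in sorted(totals)}


# Invented respondents, for illustration only.
respondents = [
    {"teaching": 0, "interest": 0, "reading": 0},
    {"teaching": 0, "interest": 0, "reading": 1},
    {"teaching": 0, "interest": 0, "reading": 1},
    {"teaching": 0, "interest": 1, "reading": 0},
    {"teaching": 0, "interest": 1, "reading": 1},
    {"teaching": 0, "interest": 1, "reading": 1},
    {"teaching": 1, "interest": 0, "reading": 1},
    {"teaching": 1, "interest": 1, "reading": 1},
]

result = item_analysis(respondents, "interest")
for score, pct in result.items():
    print(f"index score {score}: {pct:.0f}% chose the scientific answer")
```

The 0 and 100 percent at the two ends follow from how the index is built; only the ordering of the middle scores is an empirical result.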

As you can see, in accord with our assumption that the 2's are more scientifically oriented than the 1's, we find that a higher percentage of the 2's (91 percent) say "basic mechanisms" than the 1's (16 percent).

An item analysis of the other two components of the index yields similar results, as shown here.

                                             Index of Scientific Orientations
                                                0      1      2      3
Percentage who said they could teach
best as medical researchers                     0      4      14     100
Percentage who said they preferred
reading about rationales                        0      80     97     100

Each of the items, then, seems an appropriate component in the index. Each seems to reflect the same quality that the index as a whole measures.

In a complex index containing many items, this step provides a convenient test of the independent contribution of each item to the index. If a given item is found to be poorly related to the index, it may be assumed that other items in the index cancel out the contribution of that item, and it should be excluded from the index. If the item in question contributes nothing to the index's power, it should be excluded.

Although item analysis is an important first test of an index's validity, it is not a sufficient test. If the index adequately measures a given variable, it should successfully predict other indications of that variable. To test this, we must turn to items not included in the index.

External Validation

People scored as politically conservative on an index should appear conservative by other measures as well, such as their responses to other items in a questionnaire. Of course, we're talking about relative conservatism, because we can't define conservatism in any absolute way. However, those respondents scored as the most conservative on the index should score as the most conservative in answering other questions. Those scored as the least conservative on the index should score as the least conservative on other items. Indeed, the ranking of groups of respondents on the index should predict the ranking of those groups in answering other questions dealing with political orientations.

external validation  The process of testing the validity of a measure, such as an index or scale, by examining its relationship to other, presumed indicators of the same variable. If the index really measures prejudice, for example, it should correlate with other indicators of prejudice.

TABLE 6-1 Validation of Scientific Orientation Index

                                             Index of Scientific Orientation
                                             Low                         High
                                                0      1      2      3
Percent interested in attending scientific
lectures at the medical school                  34     42     46     65
Percent who say faculty members should
have experience as medical researchers          43     60     65     89
Percent who would prefer faculty duties
involving research activities only              0      8      32     66
Percent who engaged in research during
the preceding academic year                     61     76     94     99

In our example of the scientific orientation index, several questions in the questionnaire offered the possibility of such external validation. Table 6-1 presents some of these items, which provide several lessons regarding index validation. First, we note that the index strongly predicts the responses to the validating items in the sense that the rank order of scientific responses among the four groups is the same as the rank order provided by the index itself. That is, the percentages reflect greater scientific orientation as you read across the rows of the table. At the same time, each item gives a different description of scientific orientations overall. For example, the last validating item indicates that the great majority of all faculty were engaged in

research during the preceding year. If this were the only indicator of scientific orientation, we would conclude that nearly all faculty were scientific. Nevertheless, those scored as more scientific on the index are more likely to have engaged in research than were those who were scored as relatively less scientific. The third validating item provides a different descriptive picture: Only a minority of the faculty overall say they would prefer duties limited exclusively to research. Nevertheless, the relative percentages giving this answer correspond to the scores assigned on the index.

Bad Index versus Bad Validators

Nearly every index constructor at some time must face the apparent failure of external items to validate the index. If the internal item analysis shows inconsistent relationships between the items included in the index and the index itself, something is wrong with the index. But if the index fails to predict strongly the external validation items, the conclusion to be drawn is more ambiguous. In this situation we must choose between two possibilities: (1) the index does not adequately measure the variable in question, or (2) the validation items do not adequately measure the variable and thereby do not provide a sufficient test of the index.

Having worked long and conscientiously on the construction of an index, you'll likely find the second conclusion compelling. Typically, you'll feel you have included the best indicators of the variable in the index; the validating items are, therefore, second-rate indicators. Nevertheless, you should recognize that the index is purportedly a very powerful measure of the variable; thus, it should be somewhat related to any item that taps the variable even poorly.

When external validation fails, you should reexamine the index before deciding that the validating items are insufficient. One way to do this is to examine the relationships between the validating items and the individual items included in the index. If you discover that some of the index items relate to the validators and others do not, you'll have improved your understanding of the index as it was initially constituted.

There's no cookbook solution to this problem; it is an agony serious researchers must learn to survive. Ultimately, the wisdom of your decision to accept an index will be determined by the usefulness of that index in your later analyses. Perhaps you'll initially decide that the index is a good one and that the validators are defective, but you'll later find that the variable in question (as measured by the index) is not related to other variables in the ways you expected. You may then have to compose a new index.

The Status of Women: An Illustration of Index Construction

For the most part, our discussion of index construction has focused on the specific context of survey research, but other types of research also lend themselves to this kind of composite measure. For example, when the United Nations (1995) set out to examine the status of women in the world, they chose to create two indexes, reflecting two different dimensions.

The Gender-related Development Index (GDI) compared women to men in terms of three indicators: life expectancy, education, and income. These indicators are commonly used in monitoring the status of women in the world. The Scandinavian countries of Norway, Sweden, Finland, and Denmark ranked highest on this measure.

The second index, the Gender Empowerment Measure (GEM), aimed more at power issues and comprised three different indicators:

• The proportion of parliamentary seats held by women
• The proportion of administrative, managerial, professional, and technical positions held by women
• A measure of access to jobs and wages

Once again, the Scandinavian countries ranked high but were joined by Canada, New Zealand, the Netherlands, the United States, and Austria. Having two different measures of gender equality rather than one allowed the researchers to make more sophisticated distinctions. For example, in several

countries, most notably Greece, France, and Japan, women fared relatively well on the GDI but quite poorly on the GEM. Thus, while women were doing fairly well in terms of income, education, and life expectancy, they were still denied access to power. And whereas the GDI scores were higher in the wealthier nations than in the poorer ones, GEM scores showed that women's empowerment depended less on national wealth, with many poor, developing countries outpacing some rich, industrial ones in regard to such empowerment.

By examining several different dimensions of the variables involved in their study, the UN researchers also uncovered an aspect of women's earnings that generally goes unnoticed. Population Communications International (1996: 1) has summarized the finding nicely:

    Every year, women make an invisible contribution of eleven trillion U.S. dollars to the global economy, the UNDP [United Nations Development Programme] report says, counting both unpaid work and the underpayment of women's work at prevailing market prices. This "underevaluation" of women's work not only undermines their purchasing power, says the 1995 HDR [Human Development Report], but also reduces their already low social status and affects their ability to own property and use credit. Mahbub ul Haq, the principal author of the report, says that "if women's work were accurately reflected in national statistics, it would shatter the myth that men are the main breadwinner of the world." The UNDP report finds that women work longer hours than men in almost every country, including both paid and unpaid duties. In developing countries, women do approximately 53% of all work and spend two-thirds of their work time on unremunerated activities. In industrialized countries, women do an average of 51% of the total work, and, like their counterparts in the developing world, perform about two-thirds of their total labor without pay. Men in industrialized countries are compensated for two-thirds of their work.

As you can see, indexes can be constructed from many different kinds of data for a variety of purposes. Now we'll turn our attention from the construction of indexes to an examination of scaling techniques.

Scale Construction

Good indexes provide an ordinal ranking of cases on a given variable. All indexes are based on this kind of assumption: A senator who voted for seven conservative bills is considered to be more conservative than one who only voted for four of them. What an index may fail to take into account, however, is that not all indicators of a variable are equally important or equally strong. The first senator might have voted in favor of seven mildly conservative bills, whereas the second senator might have voted in favor of four extremely conservative bills. (The second senator might have considered the other seven bills too liberal and voted against them.)

Scales offer more assurance of ordinality by tapping the intensity structures among the indicators. The several items going into a composite measure may have different intensities in terms of the variable. Many methods of scaling are available. We'll look at four scaling procedures to illustrate the variety of techniques available, along with a technique called the semantic differential. Although these examples focus on questionnaires, the logic of scaling, like that of indexing, applies to other research methods as well.

Bogardus Social Distance Scale

Let's suppose you're interested in the extent to which U.S. citizens are willing to associate with, say, sex offenders. You might ask the following questions:

1. Are you willing to permit sex offenders to live in your country?

2. Are you willing to permit sex offenders to live in your community?

3. Are you willing to permit sex offenders to live in your neighborhood?

4. Would you be willing to let a sex offender live next door to you?

5. Would you let your child marry a sex offender?

These questions increase in terms of the closeness of contact with sex offenders. Beginning with the original concern to measure willingness to associate with sex offenders, you have thus developed several questions indicating differing degrees of intensity on this variable. The kinds of items presented constitute a Bogardus social distance scale (created by Emory Bogardus). This scale is a measurement technique for determining the willingness of people to participate in social relations, of varying degrees of closeness, with other kinds of people.

Bogardus social distance scale: A measurement technique for determining the willingness of people to participate in social relations, of varying degrees of closeness, with other kinds of people. It is an especially efficient technique in that one can summarize several discrete answers without losing any of the original details of the data.

The clear differences of intensity suggest a structure among the items. Presumably if a person is willing to accept a given kind of association, he or she would be willing to accept all those preceding it in the list, that is, those with lesser intensities. For example, the person who is willing to permit sex offenders to live in the neighborhood will surely accept them in the community and the nation but may or may not be willing to accept them as next-door neighbors or relatives. This, then, is the logical structure of intensity inherent among the items.

Empirically, one would expect to find the largest number of people accepting co-citizenship and the fewest accepting intermarriage. In this sense, we speak of "easy items" (for example, residence in the United States) and "hard items" (for example, intermarriage). More people agree to the easy items than to the hard ones. With some inevitable exceptions, logic demands that once a person has refused a relationship presented in the scale, he or she will also refuse all the harder ones that follow it.

The Bogardus social distance scale illustrates the important economy of scaling as a data-reduction device. By knowing how many relationships with sex offenders a given respondent will accept, we know which relationships were accepted. Thus, a single number can accurately summarize five or six data items without a loss of information.

Motoko Lee, Stephen Sapp, and Melvin Ray (1996) noticed an implicit element in the Bogardus social distance scale: It looks at social distance from the point of view of the majority group in a society. These researchers decided to turn the tables and create a "reverse social distance" scale: looking at social distance from the perspective of the minority group. Here's how they framed their questions (1996: 19):

    Considering typical Caucasian Americans you have known, not any specific person nor the worst or the best, circle Y or N to express your opinion.

    Y  N  5. Do they mind your being a citizen in this country?
    Y  N  4. Do they mind your living in the same neighborhood?
    Y  N  3. Would they mind your living next to them?
    Y  N  2. Would they mind your becoming a close friend to them?
    Y  N  1. Would they mind your becoming their kin by marriage?

As with the original scale, the researchers found that knowing the number of items minority respondents agreed with also told the researchers which ones were agreed with: 98.9 percent of the time in this case.

Thurstone Scales

Often, the inherent structure of the Bogardus social distance scale is not appropriate to the variable being measured. Indeed, such a logical structure among several indicators is seldom apparent.
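Before moving on, the data-reduction logic of the Bogardus social distance scale can be made concrete with a minimal Python sketch. Everything named here (`ITEMS`, `reconstruct`, `fits_scale`) is hypothetical and introduced only for illustration. The sketch assumes the five items are ordered from easiest to hardest and that answers are recorded as True (accept) or False (refuse).

```python
# The five relationships, ordered from easiest to hardest to accept.
ITEMS = ["country", "community", "neighborhood", "next door", "marriage"]


def reconstruct(score):
    """Rebuild the five answers from the number of relationships accepted.

    Assumes a perfect scale pattern: the `score` easiest items are
    accepted and every harder item is refused.
    """
    return [position < score for position in range(len(ITEMS))]


def fits_scale(answers):
    """True if the answers match the pattern implied by their own count."""
    return answers == reconstruct(sum(answers))


print(reconstruct(3))  # [True, True, True, False, False]
print(fits_scale([True, True, False, False, False]))  # True
print(fits_scale([True, False, True, False, False]))  # False
```

When `fits_scale` holds, the single number `sum(answers)` carries the same information as the full list of answers, which is the economy described above. A respondent who accepts a harder item while refusing an easier one is one of the "inevitable exceptions," and for that respondent the count alone loses information.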

A Thurstone scale (created by Louis Thurstone) is an attempt to develop a format for generating groups of indicators of a variable that have at least an empirical structure among them. A group of judges is given perhaps a hundred items that are thought to be indicators of a given variable. Each judge is then asked to estimate how strong an indicator of a variable each item is by assigning scores of perhaps 1 to 13. If the variable were prejudice, for example, the judges would be asked to assign the score of 1 to the very weakest indicators of prejudice, the score of 13 to the strongest indicators, and intermediate scores to those felt to be somewhere in between.

Thurstone scale: A type of composite measure, constructed in accord with the weights assigned by "judges" to various indicators of some variables.

Once the judges have completed this task, the researcher examines the scores assigned to each item by all the judges, then determines which items produced the greatest agreement among the judges. Those items on which the judges disagreed broadly would be rejected as ambiguous. Among those items producing general agreement in scoring, one or more would be selected to represent each scale score from 1 to 13.

The items selected in this manner might then be included in a survey questionnaire. Respondents who appeared prejudiced on those items representing a strength of 5 would then be expected to appear prejudiced on those having lesser strengths, and if some of those respondents did not appear prejudiced on the items with a strength of 6, it would be expected that they would also not appear prejudiced on those with greater strengths.

If the Thurstone scale items were adequately developed and scored, the economy and effectiveness of data reduction inherent in the Bogardus social distance scale would appear. A single score might be assigned to each respondent (the strength of the hardest item accepted), and that score would adequately represent the responses to several questionnaire items. And as is true of the Bogardus scale, a respondent who scored 6 might be regarded as more prejudiced than one who scored 5 or less.

Thurstone scaling is not often used in research today, primarily because of the tremendous expenditure of energy and time required to have 10 to 15 judges score the items. Because the quality of their judgments would depend on their experience with the variable under consideration, they might need to be professional researchers. Moreover, the meanings conveyed by the several items indicating a given variable tend to change over time. Thus, an item having a given weight at one time might have quite a different weight later on. For a Thurstone scale to be effective, it would have to be periodically updated.

Likert Scaling

You may sometimes hear people refer to a questionnaire item containing response categories such as "strongly agree," "agree," "disagree," and "strongly disagree" as a Likert scale. This is technically a misnomer, although Rensis Likert (pronounced "LICK-ert") did create this commonly used question format.

The particular value of this format is the unambiguous ordinality of response categories. If respondents were permitted to volunteer or select such answers as "sort of agree," "pretty much agree," "really agree," and so forth, the researcher would find it impossible to judge the relative strength of agreement intended by the various respondents. The Likert format solves this problem.

Likert had something more in mind, however. He created a method by which this question format could be used to determine the relative intensity of different items. As a simple example, suppose we wish to measure prejudice against women. To do this, we create a set of 20 statements, each of which reflects that prejudice. One of the items might be "Women can't drive as well as men." Another might be "Women shouldn't be allowed to vote." Likert's scaling technique would demonstrate the difference in intensity between these items as well as pegging the intensity of the other 18 statements.

Let's suppose we ask a sample of people to agree or disagree with each of the 20 statements.

Simply giving one point for each of the indicators of prejudice against women would yield the possibility of index scores ranging from 0 to 20. A Likert scale goes one step beyond that and calculates the average index score for those agreeing with each of the individual statements. Let's say that all those who agreed that women are poorer drivers than men had an average index score of 1.5 (out of a possible 20). Those who agreed that women should be denied the right to vote might have an average index score of, say, 19.5, indicating the greater degree of prejudice reflected in that response.

As a result of this item analysis, respondents could be rescored to form a scale: 1.5 points for agreeing that women are poorer drivers, 19.5 points for saying women shouldn't vote, and points for other responses reflecting how those items related to the initial, simple index. If those who disagreed with the statement "I might vote for a woman for president" had an average index score of 15, then the scale would give 15 points to people disagreeing with that statement.

In practice, Likert scaling is seldom used today. I don't know why; maybe it seems too complex. The item format devised by Likert, however, is one of the most commonly used formats in contemporary questionnaire design. Typically, it is now used in the creation of simple indexes. With, say, five response categories, scores of 0 to 4 or 1 to 5 might be assigned, taking the direction of the items into account (for example, assign a score of 5 to "strongly agree" for positive items and to "strongly disagree" for negative items). Each respondent would then be assigned an overall score representing the summation of the scores he or she received for responses to the individual items.

Likert scale: A type of composite measure developed by Rensis Likert in an attempt to improve the levels of measurement in social research through the use of standardized response categories in survey questionnaires to determine the relative intensity of different items. Likert items are those using such response categories as strongly agree, agree, disagree, and strongly disagree. Such items may be used in the construction of true Likert scales as well as other types of composite measures.

Semantic Differential

Like the Likert format, the semantic differential asks respondents to a questionnaire to choose between two opposite positions by using qualifiers to bridge the distance between the two opposites. Here's how it works.

semantic differential: A questionnaire format in which the respondent is asked to rate something in terms of two opposite adjectives (e.g., rate textbooks as "boring" or "exciting"), using qualifiers such as "very," "somewhat," "neither," "somewhat," and "very" to bridge the distance between the two opposites.

Suppose you're evaluating the effectiveness of a new music-appreciation lecture on subjects' appreciation of music. As a part of your study, you want to play some musical selections and have the subjects report their feelings about them. A good way to tap those feelings would be to use a semantic differential format.

To begin, you must determine the dimensions along which subjects should judge each selection. Then you need to find two opposite terms, representing the polar extremes along each dimension. Let's suppose one dimension that interests you is simply whether subjects enjoyed the piece or not. Two opposite terms in this case could be "enjoyable" and "unenjoyable." Similarly, you might want to know whether they regarded the individual selections as "complex" or "simple," "harmonic" or "discordant," and so forth.

Once you have determined the relevant dimensions and have found terms to represent the extremes of each, you might prepare a rating sheet each subject would complete for each piece of music. Figure 6-5 shows what it might look like.

On each line of the rating sheet, the subject would indicate how he or she felt about the piece of music: whether it was enjoyable or unenjoyable, for example, and whether it was "somewhat" that way or "very much" so. To avoid creating a biased

pattern of responses to such items, it's a good idea to vary the placement of terms that are likely to be related to each other. Notice, for example, that "discordant" and "traditional" are on the left side of the sheet, with "harmonic" and "modern" on the right. Most likely, those selections scored as "discordant" would also be scored as "modern" as opposed to "traditional."

Enjoyable     [ ]  [ ]  [ ]  [ ]  [ ]   Unenjoyable
Simple        [ ]  [ ]  [ ]  [ ]  [ ]   Complex
Discordant    [ ]  [ ]  [ ]  [ ]  [ ]   Harmonic
Traditional   [ ]  [ ]  [ ]  [ ]  [ ]   Modern

FIGURE 6-5
Semantic Differential: Feelings about Musical Selections. The semantic differential asks respondents to describe something or someone in terms of opposing adjectives.

Both the Likert and semantic differential formats have a greater rigor and structure than other question formats do. As I indicated earlier, these formats produce data suitable to both indexing and scaling.

Guttman Scaling

Researchers today often use the scale developed by Louis Guttman. Like Bogardus, Thurstone, and Likert scaling, Guttman scaling is based on the fact that some items under consideration may prove to be more-extreme indicators of the variable than others. Here's an example to illustrate this pattern.

Guttman scale: A type of composite measure used to summarize several discrete observations and to represent some more-general variable.

In the earlier example of measuring scientific orientation among medical school faculty members, you'll recall that a simple index was constructed. As it happens, however, the three items included in the index essentially form a Guttman scale.

The construction of a Guttman scale begins with some of the same steps that initiate index construction. You begin by examining the face validity of items available for analysis. Then, you examine the bivariate and perhaps multivariate relations among those items. In scale construction, however, you also look for relatively "hard" and "easy" indicators of the variable being examined.

Earlier, when we talked about attitudes regarding a woman's right to have an abortion, we discussed several conditions that can affect people's opinions: whether the woman is married, whether her health is endangered, and so forth. These differing conditions provide an excellent illustration of Guttman scaling.

Here are the percentages of the people in the 2000 GSS sample who supported a woman's right to an abortion, under three different conditions:

Woman's health is seriously endangered    89%
Pregnant as a result of rape              81%
Woman is not married                      39%

The different percentages supporting abortion under the three conditions suggest something about the different levels of support that each item indicates. For example, if someone supports abortion when the mother's life is seriously endangered, that's not a very strong indicator of general support for abortion, because almost everyone agreed with that. Supporting abortion for unmarried women seems a much stronger indicator of support for abortion in general: fewer than half the sample took that position.

Guttman scaling is based on the idea that anyone who gives a strong indicator of some variable will also give the weaker indicators. In this case, we

would assume that anyone who supported abortion for unmarried women would also support it in the case of rape or of the woman's health being threatened. Table 6-2 tests this assumption by presenting the number of respondents who gave each of the possible response patterns.

TABLE 6-2
Scaling Support for Choice of Abortion

               Women's    Result     Woman        Number
               Health     of Rape    Unmarried    of Cases
Scale Types       +          +          +            677
                  +          +          -            607
                  +          -          -            165
                  -          -          -            147
                                            Total = 1,596
Mixed Types       -          +          -             42
                  +          -          +              5
                  -          -          +              2
                  -          +          +              4
                                            Total =    53

+ = favors woman's right to choose; - = opposes woman's right to choose

The first four response patterns in the table compose what we would call the scale types: those patterns that form a scalar structure. Following those respondents who supported abortion under all three conditions (line 1), we see (line 2) that those with only two pro-choice responses have chosen the two easier ones; those with only one such response (line 3) chose the easiest of the three (the woman's health being endangered). And finally, there are some respondents who opposed abortion in all three circumstances (line 4).

The second part of the table presents those response patterns that violate the scalar structure of the items. The most radical departures from the scalar structure are the last two response patterns: those who accepted only the hardest item and those who rejected only the easiest one.

The final column in the table indicates the number of survey respondents who gave each of the response patterns. The great majority (1,596 of the 1,649 respondents, or about 97 percent) fit into one of the scale types. The presence of mixed types, however, indicates that the items do not form a perfect Guttman scale. (It would be extremely rare for such data to form a Guttman scale perfectly.)

Recall at this point that one of the chief functions of scaling is efficient data reduction. Scales provide a technique for presenting data in a summary form while maintaining as much of the original information as possible. When the scientific orientation items were formed into an index in our earlier discussion, respondents were given one point for each scientific response they gave. If these same three items were scored as a Guttman scale, some respondents would be assigned scale scores that would permit the most accurate reproduction of their original responses to all three items.

In the present example of attitudes regarding abortion, respondents fitting into the scale types would receive the same scores as would be assigned in the construction of an index. Persons selecting all three pro-choice responses (+ + +) would still be scored 3, those who selected pro-choice responses to the two easier items and were opposed on the hardest item (+ + -) would be scored 2, and so on. For each of the four scale types we could predict accurately all the actual responses given by all the respondents based on their scores.

The mixed types in the table present a problem, however. The first mixed type (- + -) was scored 1 on the index to indicate only one pro-choice response. But if 1 were assigned as a scale score, we would predict that the 42 respondents in this group had chosen only the easiest item (approving abortion when the woman's life was endangered), and we would be making two errors for each such respondent: thinking their response pattern was (+ - -) instead of (- + -). Scale scores are assigned, therefore, with the aim of minimizing the errors that would be made in reconstructing the original responses.

Table 6-3 illustrates the index and scale scores that would be assigned to each of the response patterns in our example. Note that one error is made for each respondent in the mixed types. This is the

minimum we can hope for in a mixed-type pattern. In the first mixed type, for example, we would erroneously predict a pro-choice response to the easiest item for each of the 42 respondents in this group, making a total of 42 errors.

The extent to which a set of empirical responses form a Guttman scale is determined by the accuracy with which the original responses can be reconstructed from the scale scores. For each of the 1,649 respondents in this example, we'll predict three questionnaire responses, for a total of 4,947 predictions. Table 6-3 indicates that we'll make 53 errors using the scale scores assigned. The percentage of correct predictions is called the coefficient of reproducibility: the percentage of original responses that could be reproduced by knowing the scale scores used to summarize them. In the present example, the coefficient of reproducibility is 4,894/4,947, or 98.9 percent.

TABLE 6-3
Index and Scale Scores

               Response    Number      Index     Scale     Scale
               Pattern     of Cases    Scores    Scores    Errors
Scale Types     + + +        677         3         3          0
                + + -        607         2         2          0
                + - -        165         1         1          0
                - - -        147         0         0          0
Mixed Types     - + -         42         1         2         42
                + - +          5         2         3          5
                - - +          2         1         0          2
                - + +          4         2         3          4
                                         Total scale errors = 53

Coefficient of reproducibility = 1 - (number of errors / number of guesses)
                               = 1 - 53 / (1,649 × 3)
                               = 1 - 53 / 4,947
                               = 0.989 = 98.9%

This table presents one common method for scoring mixed types, but you should be advised that other methods are also used.

Except for the case of perfect (100 percent) reproducibility, there is no way of saying that a set of items does or does not form a Guttman scale in any absolute sense. Virtually all sets of such items approximate a scale. As a general guideline, however, coefficients of 90 or 95 percent are the commonly used standards. If the observed reproducibility exceeds the level you've set, you'll probably decide to score and use the items as a scale.

The decision concerning criteria in this regard is, of course, arbitrary. Moreover, a high degree of reproducibility does not ensure that the scale constructed in fact measures the concept under consideration. What it does is increase confidence that all the component items measure the same thing. Also, you should realize that a high coefficient of reproducibility is most likely when few items are involved.

One concluding remark with regard to Guttman scaling: It's based on the structure observed among the actual data under examination. This is an important point that is often misunderstood. It does not make sense to say that a set of questionnaire items (perhaps developed and used by a previous researcher) constitutes a Guttman scale. Rather, we can say only that they form a scale within a given body of data being analyzed. Scalability, then, is a sample-dependent, empirical matter. Although a set of items may form a Guttman scale among one sample of survey respondents, for example, there is no guarantee that this set will form such a scale among another sample. In this sense, then, a set of questionnaire items in and of itself never forms a scale, but a set of empirical observations may.

This concludes our discussion of indexing and scaling. Like indexes, scales are composite measures of a variable, typically broadening the meaning of the variable beyond what might be captured by a single indicator. Both scales and indexes seek to measure variables at the ordinal level of measurement. Unlike indexes, however, scales take advantage of any intensity structure that may be present among the individual indicators. To the extent that such an intensity structure is found and the data from the people or other units of analysis comply with the logic of that intensity structure, we can have confidence that we have created an ordinal measure.
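The scoring of the abortion items and the coefficient of reproducibility can be checked with a short Python sketch. Treat it as one possible illustration under stated assumptions, not as the definitive scoring procedure: the names `COUNTS`, `predicted_pattern`, `scale_score`, and `coefficient_of_reproducibility` are hypothetical, and the tie-breaking rule (prefer the higher score when two scores give equally few errors) is an assumption chosen here because it happens to reproduce the scale scores in Table 6-3. As the table note says, other methods are also used.

```python
# Response patterns from Table 6-2, written easiest item first
# (health endangered, rape, unmarried), with the number of cases.
COUNTS = {
    "+++": 677, "++-": 607, "+--": 165, "---": 147,  # scale types
    "-+-": 42, "+-+": 5, "--+": 2, "-++": 4,         # mixed types
}


def predicted_pattern(score, n_items=3):
    """Pattern a perfect scale implies: the `score` easiest items are '+'."""
    return "+" * score + "-" * (n_items - score)


def scale_score(pattern):
    """Return (scale score, errors) that minimizes reconstruction errors.

    Assumption: ties go to the higher score, which matches Table 6-3.
    """
    n_items = len(pattern)
    best_score, best_errors = 0, n_items + 1
    for score in range(n_items + 1):
        guess = predicted_pattern(score, n_items)
        errors = sum(a != b for a, b in zip(pattern, guess))
        if errors <= best_errors:
            best_score, best_errors = score, errors
    return best_score, best_errors


def coefficient_of_reproducibility(counts):
    """1 - (number of errors / number of guesses)."""
    n_items = len(next(iter(counts)))
    total_errors = sum(scale_score(p)[1] * n for p, n in counts.items())
    total_guesses = sum(counts.values()) * n_items
    return 1 - total_errors / total_guesses


print(scale_score("-+-"))  # (2, 1)
print(round(coefficient_of_reproducibility(COUNTS), 3))  # 0.989
```

Each mixed-type respondent contributes exactly one error, so the total is 42 + 5 + 2 + 4 = 53 errors out of 1,649 × 3 = 4,947 guesses, the same figures used in the text.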

