Language and Cognition in Bilinguals and Multilinguals_ An Introduction

Published by fauliamuthmainah, 2022-04-15 14:30:55

136 LANGUAGE AND COGNITION IN BILINGUALS AND MULTILINGUALS

1992b, 1993, 1995) I elaborated this idea, focusing on the logical consequence of such a view: that within a bilingual person’s memory different types of memory representations co-exist. Frequently used words and words that, for various reasons, are relatively easy to learn will reach L1 independence earlier than infrequently used and difficult words. Similarly, and making a connection with the above developmental model, fluent bilinguals (who have experienced the L2 words relatively frequently) will already have reached this state of L1-independent processing for relatively many L2 words, whereas less-fluent bilinguals will still exploit the L1 lexical representations relatively often. In terms of the concept mediation and word association memory structures hypothesized earlier, the memory of fluent bilinguals will contain relatively many concept mediation structures whereas the memory of less-fluent bilinguals will contain more word association structures. Dufour and Kroll put these same ideas the following way: “. . . individuals do not wake up one morning suddenly able to mediate their second language conceptually [. . .] direct concept mediation of L2 must be acquired gradually, occurring earlier for more familiar words and concepts” (Dufour & Kroll, 1995, p. 175).

This account can be extended to apply to the results of the Dutch–English–French trilingual study referred to above (De Groot & Hoeks, 1995): For English, the stronger foreign language of our native Dutch participants, L1-independent processing holds for relatively many words, whereas word processing in weaker French still exploits the corresponding L1 word forms relatively often. So, depending on the various stages of development of each of the languages of a multilingual, different proportions of word association and concept mediation structures are likely to exist for any of these languages in relation to L1 (and to any of the other languages present).

Supporting evidence

Kroll and her colleagues collected various sources of evidence consistent with the revised hierarchical model. One of these concerned the translation recognition study by Talamas and colleagues (1999; Table 3.2) with which I started this section. It was shown there that less-fluent bilinguals are more slowed down by a distracter related in form to a word’s actual translation (man–hambre, “hunger,” instead of man–hombre, “man”) than by a meaning-related distracter (man–mujer, “woman”, instead of man–hombre), whereas the opposite held for more fluent bilinguals. This suggests a relatively large reliance on (or sensitivity to) form in the less-fluent bilinguals and a relatively large reliance on meaning in the more-fluent bilinguals (and was argued to reflect a development similar to developing L1 proficiency; Bach & Underwood, 1970). A second piece of supporting evidence is the finding that less-fluent bilinguals benefit more from a cognate relation between L1 words and their translations in L2 than do more-fluent bilinguals, both when L2 words have to be named and when L1 words are translated into L2 and vice versa (Kroll et al., 1998; Kroll, Michael, Tokowicz, & Dufour, 2002). Both these sources of evidence support the predictions of the model regarding the development of L2 fluency. The remaining evidence concerns the model’s predictions that ensue from the assumed asymmetries in the strengths of the connections between the lexical and conceptual nodes within a single developmental stage. It is to this evidence that we will now turn.

Directional effects on translation latency

The data that first led to postulating the model were obtained in an experiment in which fluent Dutch–English bilinguals translated words from L1 Dutch to L2 English and vice versa (Kroll & Stewart, 1994). Due to the hypothesized strength differences between the various connections in the bilingual memory structures (see Figure 3.10), the authors assumed that L2 to L1 (“backward”) translation primarily employs the strong direct connections from the L2 to the L1 word form (or “lexical”) representations, whereas L1 to L2 (“forward”) translation primarily exploits the indirect connections through the conceptual representation shared by L1 and L2. If true, L2-to-L1 translation follows a shorter translation route than L1-to-L2 translation. For this reason, the

3. LATE FOREIGN VOCABULARY LEARNING 137

authors predicted shorter response times for L2-to-L1 translation than for translation in the reverse direction, a finding that had been reported before in the literature (e.g., Sánchez-Casas, Davis, & García-Albea, 1992; see Kroll, 1993, and Snodgrass, 1993, for reviews). The response time data confirmed this prediction (see also Miller & Kroll, 2002, and Tokowicz & Kroll, 2007; and see Francis & Gallard, 2005, for a trilingual demonstration of this asymmetry).

However, in a number of other translation studies the two translation directions produced equally long response times (De Groot, Dannenburg, & Van Hell, 1994, Experiment 1; the most fluent bilinguals in De Groot & Poot, 1997; La Heij, Hooglander, Kerling, & Van der Velden, 1996, Experiment 4; Van Hell & De Groot, 1998b) or even shorter response times in L1-to-L2 translation (De Groot et al., 1994, Experiment 2; Duyck & Brysbaert, 2004, Experiment 1; La Heij et al., 1996, Experiment 3; the least fluent bilinguals in De Groot & Poot, 1997). If indeed direction-dependent differences in translation time constitute an unequivocal signature of direction-dependent concept mediation versus word association translation, these latter findings especially would provide a challenge to the model. The former (equally long response times in both translation directions) could easily be accounted for in at least all those cases where rather fluent bilinguals were tested, because these may be expected to map not only L1 word forms but also L2 word forms straight onto meaning.

But, plausibly, translation time data do not provide the unambiguous evidence one would need to falsify or verify the revised hierarchical model, because there are more reasons why the latency difference between forward and backward translation predicted by Kroll and Stewart might occur. Snodgrass mentions three: “. . . faster access to a well-known than to a less well-known language, the difference between recognition and recall, and the well-known asymmetry between being able to understand a language and being able to speak it” (Snodgrass, 1993, p. 101). (Note that the latter two distinctions may in fact be two different ways to phrase one and the same distinction: Comprehension involves recognition of the stimulus, whereas production involves its recall.) In agreement with the second of these possible accounts, the vocabulary-learning studies discussed before (see also Figure 3.2) consistently showed higher recall scores and faster recall for receptive cued recall than for productive cued recall. These test formats are essentially the same as L2-to-L1 translation and L1-to-L2 translation, respectively. The only difference between the format of the cued recall studies discussed earlier and of the present translation studies is that the knowledge tapped in the former had just been acquired in the learning episode immediately preceding the test episode, whereas in the translation studies it was acquired in the past.

The point to make here is that the relatively long response times when bilinguals translate from their L1 to their L2 rather than vice versa may result from the relatively demanding nature of producing instead of recognizing an L2 word (see also p. 112). Furthermore, the fact that it is harder to articulate words from the weaker L2 than from the stronger L1 after they have been retrieved successfully from memory (see De Groot, Borgwaldt, Bos, & Van den Eijnden, 2002, for evidence to support this claim) may also contribute to the longer translation latencies for L1-to-L2 translation, if this effect occurs at all. In other words, assuming qualitatively different translation routes for forward and backward translation is just one of multiple ways to account for directional differences in translation latency, and the model is clearly in need of truly univocal support.

Directional effects of meaning variables

More compelling support for the revised hierarchical model comes from findings that suggest that meaning-related variables affect translation from L1 to L2 but not from L2 to L1, or the latter to a lesser extent than the former. Kroll and Stewart (1994) provided such evidence by demonstrating that either clustering the words to be translated into semantic categories (e.g., clothing, body parts, musical instruments) or presenting them in random order instead (e.g., a sequence coat, suit, hand, flute, ear, piano, trousers . . .),

affected response times when L1 words were translated into L2 but not when translation was from L2 to L1: In forward translation, latencies were longer for words in the clustered lists than for words in the mixed lists; in backward translation no effect of clustering occurred (but see Salamoura & Williams, 1999, who in a Greek–English study obtained a clustering effect in backward translation, as manifested by faster translation of the clustered words). As the name indicates, semantic clustering involves a semantic manipulation. Therefore, the effect of this manipulation in L1-to-L2 translation but not in L2-to-L1 translation suggests the activation and exploitation of meaning representations in the former but not in the latter translation direction.

Sholl, Sankaranarayanan, and Kroll (1995) collected converging evidence using a transfer paradigm in which the effect of earlier picture naming on subsequent translation in English–Spanish bilinguals was determined. A number of the words to be translated had been presented as pictures and named just before the translation session started. Picture naming requires the retrieval of the concept associated with the depicted entity (see Chapter 5). Therefore, if word-translation times were affected by prior naming of the corresponding pictures, this would suggest that conceptual access had occurred during the translation process: The activation of a conceptual representation ensuing from concept retrieval on the critical picture-naming trial has not yet decayed completely the moment the corresponding word is presented for translation and the residual activation affects translation time (see, e.g., Durgunoğlu & Roediger, 1987, for a theoretical underpinning of this conclusion). As compared to words not presented as pictures before, words that had occurred as pictures in the picture-naming task were translated reliably faster in L1-to-L2 translation. In contrast, earlier picture naming had no effect on L2-to-L1 translation. This direction-dependent transfer from earlier picture naming or, more specifically, from earlier concept retrieval, to word translation, suggests that L1-to-L2 translation, but not L2-to-L1 translation, involves the activation of conceptual representations.

Finally, similar support for the model but in a strongly mitigated form emerged from a correlation study performed in our laboratory (De Groot et al., 1994). In this study a number of different semantic variables such as word imageability and word concreteness affected translation in both directions and often to the same degree. However, in a small subset of the analyses the semantic effects were slightly smaller in backward translation than in forward translation. We regarded these semantic effects as a signature of conceptual processing and, thus, of translation via conceptual memory. Accordingly, their presence in both translation directions led us to conclude that conceptual memory is involved in both forward and backward translation. In other words, the data refuted a strong version of the revised hierarchical model, which would claim that conceptual memory is never implicated in backward translation. But the fact that in a subset of the analyses the semantic effects were somewhat smaller in backward translation suggested that under some circumstances conceptual memory is involved less in backward than in forward translation. These findings thus support a weaker version of the model.

Directional effects of semantic priming and translation priming

Additional support for (a weaker version of) the model comes from the cross-language semantic priming studies introduced before (pp. 92–93). In priming studies a target stimulus to be responded to overtly is preceded by another stimulus, the “prime” that shares or does not share some relation with the target stimulus, and the effect of the earlier prime on target processing is determined. In semantic priming studies prime and target on the critical trials share a semantic relation to one another (e.g., prime: ice; target: snow) and target processing on these trials is compared to the processing of targets preceded by unrelated words (prime: ace; target: snow) or by some meaningless sequence of symbols (prime: ####; target: snow). In monolingual studies of this type the primes and targets are words from one and the same language. In bilingual semantic priming

studies, bilinguals serve as participants and the primes and targets may be taken from one and the same language (in the “within-language” condition) or the language of primes and targets differs (in the “between-language” condition).

The critical finding in these studies is the semantic priming effect, that is, the difference in response times (and error rates) to targets preceded by semantically related primes on the one hand and unrelated primes on the other hand. Assuming meaning representations of the localist type (see p. 132), these priming effects are often attributed to a process of activation spreading along the connections that exist in memory between the representations of related primes and targets. This process facilitates the access of the target words’ representations through their pre-activation by the primes. Importantly, the locus within the memory system where these effects come about is conceptual memory; that is, the place where word meanings, not their forms, are stored. By implication, if semantic priming occurs, one may conclude that semantic access of the presented materials has taken place.

Figure 3.11. Two accounts of semantic priming from German Apfel to English banana: (a) in terms of localist conceptual memory nodes shared between German and English translation pairs and a connection between the two shared nodes Apfel/apple and Banane/banana; (b) in terms of distributed meaning representations in which German Apfel and English banana share a set of conceptual nodes. Adapted from De Groot (1992a).

Within-language semantic priming reliably occurs, but between-language semantic priming has also been demonstrated many times (see Altarriba & Basnight-Brown, 2007, and Basnight-Brown & Altarriba, 2007, for recent reviews). These (as well as the within-language effects) can be explained by assuming shared meaning representations for translation pairs, a state of affairs that is illustrated in Figure 3.11a for a German–English bilingual: Given the German prime Apfel and the English target banana, the effect results from spreading activation between the shared conceptual node for German Apfel and English apple on the one hand and the shared conceptual node for German Banane and English banana on the other hand. An alternative account of within- and between-language semantic priming effects (De Groot, 1992a), shown in Figure 3.11b, assumes distributed meaning representations (see p. 133). According to this account, a semantic priming effect results from the fact that the prime’s representation in lexical memory directly activates the elementary meaning nodes it shares with the

target’s representation in lexical memory. In this set-up, no connections between nodes within conceptual memory are required. In other words, the very moment the prime gains access to its (distributed) meaning representation in conceptual memory, the meaning of the target is also partially activated, causing the target to be processed relatively quickly when it is subsequently presented. As illustrated in Figure 3.11b, this account can also explain both the within- and between-language priming effects (the prime–target pair Apfel and Banane and the prime–target pair Apfel and banana share the same three nodes in conceptual memory).

As observed by Kroll and Sholl (1992), the semantic priming effects that occur in cross-language studies of this type are often asymmetrical: They are generally larger when the primes are in (the stronger) L1 and the targets in (the weaker) L2 than vice versa, especially when the duration between the onset of prime and target (the “stimulus onset asynchrony” or SOA) is relatively short. This result has been obtained both when the participants’ bilingualism involved languages that employ the same script (e.g., English–Spanish; Schwanenflugel & Rey, 1986) and when the languages concerned employ different scripts (Chinese–English; Chen & Ng, 1989). In contrast, with relatively long SOAs the effect has at least twice been shown to be equally large with primes in L2 and targets in L1 as with primes in L1 and targets in L2 (Frenck & Pynte, 1987; Kirsner, Smith, Lockhart, King, & Jain, 1984). Recently, Basnight-Brown and Altarriba (2007) nuanced these findings by showing that the mentioned asymmetries hold when the L1 and L2 are the bilingual’s dominant and non-dominant languages, respectively, but that they reverse when the L2 has become the stronger language, as often occurs over time (= L2 experience) in immigration settings.

As mentioned, the revised hierarchical model assumes that the connections between L1 word form memory and conceptual memory are stronger than those between L2 word form memory and conceptual memory (Figure 3.10). As a consequence, accessing conceptual representations from L1 form representations is easier than from L2 form representations and conceptual access from an L2 representation may often fail. Kroll and Sholl (1992) argued that the relatively large effects of semantic priming when the primes are in L1 and the targets in L2 result from this privileged access to conceptual memory from L1 form representations: Because conceptual memory is the locus of the effect, priming will only occur if access to conceptual memory has indeed taken place.

A strong version of the model would predict null effects of semantic priming when the primes are in L2 and the targets in L1. Yet such priming effects do occur, although they are generally smaller than from L1 to L2. The combined results obtained with short SOAs thus support a weaker version of the model, which holds that L2 words may also access conceptual memory directly but do so less often (or less often quickly enough; see below) than L1 words. What has yet to be explained is why with longer SOAs symmetrical priming effects have sometimes been obtained. Kroll and Sholl (1992) suggested that a long interval between prime and target allows an L2 prime sufficient time to access conceptual memory indirectly, via the L1 word form representation. An alternative account of the combined data is to assume that L2 words also contact conceptual memory directly in all cases but that, due to the relatively weak links, the access process is too slow so that by the time the target is presented it has not yet been successfully completed.

Directional effects have also been observed for a second type of prime–target relation; namely, when the target is the translation of the prime word. As compared to targets that follow an unrelated prime, translation targets are often responded to more quickly. In fact, these translation priming effects are generally larger than semantic priming effects (see, e.g., Basnight-Brown & Altarriba, 1997; De Groot & Nas, 1991). Again, the effects are more robust when the primes are in (stronger) L1 and the targets in (weaker) L2 than vice versa, and especially when the primes are masked in such a way that they cannot be consciously perceived, weak effects or null effects have been reported from L2 primes

to L1 targets (e.g., Jiang, 1999; Jiang & Forster, 2001).

A version of the revised hierarchical model which would stress the role of the direct connections between the translation pairs’ lexical representations in the translation process must be rejected on the basis of these data. The reason is that it would predict especially strong translation priming effects from L2 to L1 as a consequence of the relatively strong lexical connections in this direction (see Figure 3.10). However, a version of the model that focuses on the differential strength of the links between the lexical and conceptual representations can account for these asymmetries. It can do so in the same way as it accounts for the asymmetrical semantic priming effects: They emerge from the privileged access to conceptual memory from L1 primes. Finkbeiner et al. (2004) suggested an alternative explanation of asymmetrical (semantic and translation) priming effects in terms of their sense-model of conceptual representation (see pp. 133–134 and Figure 3.9). They hypothesized that the magnitude of the priming effect depends on the proportion of the target word’s senses that are activated by a prime. If, for instance, a bilingual has six senses for an L1 word but only one for its translation in L2 (see Figure 3.9b), an L1 prime will activate all senses (one) of its L2 equivalent. If instead its L2 translation serves as the prime it will only activate one sixth of the semantic representation of the L1 word. This asymmetry, the authors argued, underlies the asymmetry in the priming effects.

Counterevidence

The previous section presented evidence in support of the revised hierarchical model, as well as some apparent counterevidence that could be reconciled with the model by making some additional assumptions and accepting a weaker version of the model. However, the results of a series of further translation studies truly challenge the model and, particularly, the model’s claim that L1 and L2 processing employ qualitatively different processing.

An early study that challenged the model, at any rate its strong version, employed a variant of the Stroop task (see p. 223). La Heij et al. (1990) examined the effect of L1 (Dutch) distracter words on L2 (English) to L1 translation. Shortly after its presentation, the English word to be translated (e.g., spoon) was replaced by a Dutch word (the distracter) semantically related (vork, “fork”) or unrelated (geit, “goat”) to the correct response word (lepel, “spoon”). Translation times turned out to be longer in the related condition than in the unrelated condition, a finding that would not have materialized had translation exclusively exploited the lexical route to the response. Because the study did not include an L1-to-L2 translation condition it is impossible to tell whether a translation asymmetry would have materialized; that is, a larger relatedness effect in L1-to-L2 translation. Such a finding would have supported a weaker version of the model. A follow-up study using a similar rationale but with non-verbal semantically related or unrelated stimuli as contextual distracters (e.g., pictures depicting a concept related or unrelated to the word to be translated) tested both directions of translation (La Heij et al., 1996). The results contradicted the predictions of the model in both its strong and weak form: Across four experiments, the relatedness effect was either equally large in both translation directions, or it was larger when translating from L2 to L1. The model predicts the opposite, smaller relatedness effects in L2-to-L1 translation.

In a word translation study by a former student and myself (De Groot & Poot, 1997) we obtained conceptually similar results, as did Duyck and Brysbaert (2004, 2008) in two number translation studies. Rik Poot and I looked at word translation performance of Dutch native speakers with different levels of proficiency in English L2. Concreteness effects on response times, error scores, and omission scores were obtained for both translation directions: Concrete words were translated faster, more often, and more often correctly, than abstract words. These effects suggest semantic processing and, therefore, the involvement of conceptual memory, in both translation directions. As with the analogous effects in La Heij et al.’s study (1996), these effects were either equally large in both translation

directions or larger in backward translation. Perhaps most challenging for the revised hierarchical model was the fact that the concreteness effects tended to be largest in backward translation by participants at the earliest stage of L2 development, the group predicted by the model to rely on word-association translation most.

In a Dutch–French study, Duyck and Brysbaert (2004) wondered whether the so-called “number magnitude” effect might occur in the translation task, just as it had been shown to occur in a number of other tasks. The number magnitude effect is the phenomenon that magnitude information is activated more rapidly for small numbers (e.g., two) than for larger numbers (e.g., eight). Because a number’s magnitude can be considered the core component of its meaning, whenever such an effect occurs it points at the involvement of conceptual memory during task performance. In the present context the critical questions thus are whether this effect occurs for both translation directions (Dutch to French and French to Dutch), and whether or not translation direction affects the size of the effect, if it occurs at all. All four experiments showed a number magnitude effect in both L1 (Dutch) to L2 (French) translation and in L2 to L1 translation: In both translation directions it took longer to translate number words representing large quantities (acht or huit, “eight”) than number words representing small quantities (twee or deux, “two”). Furthermore, generally the size of the effect was not influenced by translation direction, suggesting that conceptual mediation is implicated to the same extent in both translation directions. A more recent Dutch–English–German trilingual extension of this study (Duyck & Brysbaert, 2008) did show an effect of translation direction, but contrary to the predictions of the revised hierarchical model, number magnitude effects were observed in backward translation, from both the L2 and the L3 into L1, but not in forward translation. These results are reminiscent of those of La Heij et al. (1996) and De Groot and Poot (1997; see above), who also showed larger semantic effects in backward translation.

A final set of studies that produced results that are difficult to reconcile with the predictions of the revised hierarchical model are the foreign vocabulary acquisition studies discussed earlier (pp. 108–110). These studies showed that from the earliest stages of learning, attaching a new label to a concrete concept is easier than attaching one to an abstract concept (see Figure 3.4). This was demonstrated by the learners’ performance on productive and receptive cued recall tasks, which, as we have seen, are identical in format to the L1-to-L2 and L2-to-L1 translation tasks discussed here. This finding thus suggests that already during the very initial stages of foreign vocabulary acquisition, meaning is activated and exploited in the process. A similar result was obtained in Duyck and Brysbaert’s (2004) study just discussed: These researchers also obtained the number magnitude effect, in both translation directions, when participants did not translate between two languages they already knew prior to the experiment, but between their L1 and a made-up language in which they had learned the number names only just before the translation task was administered.

An alternative view

In conclusion, there is a quite substantial body of evidence to suggest that both translation directions involve conceptual processing and that this holds true for learners at all stages of L2 development. As an alternative to the revised hierarchical model, La Heij et al. (1996; see also La Heij et al., 1990) therefore proposed a rather parsimonious view of how translation comes about, one that does not assume qualitatively different translation routes for forward and backward translation, nor for L2 learners at different proficiency levels (see Snodgrass, 1993, for a similar view). It is important to stress that this view embraces two of the central tenets of the revised hierarchical model; namely, (1) that concept activation is easier when the stimulus is an L1 word than when an L2 word serves as stimulus (assuming L1 is the stronger language), and (2) that an activated concept has easier access to the corresponding L1 word than to the corresponding L2 word. Both these assumptions derive from the differential strengths of the connections

between L1 word forms and conceptual memory on the one hand and L2 word forms and conceptual memory on the other hand (see Figure 3.10), a difference that reflects differential past use of L1 and L2. The authors made no assumptions regarding the strength of the links between the word form representations of a translation pair, nor is it relevant for their argument whether or not such links exist at all.

La Heij and colleagues decomposed the translation process into two main components, one in which the meaning of the presented word is determined (“concept activation”) and a second in which the response word is retrieved on the basis of the activated conceptual information (“word retrieval”). If a difference in dominance between L1 and L2 exists, L1 being the stronger language, the relative ease of these two component processes is likely to differ in the two translation directions: In L1 to L2 translation, Step 1, concept activation, is easy because it exploits a frequently trodden memory path (in L1 comprehension). The potentially problematic part of the translation process is Step 2, word retrieval, because it exploits a path that has been taken less often (in L2 production). The situation is reversed in L2 to L1 translation, where concept activation has been practiced less (in L2 comprehension) than has word retrieval (in L1 production). In other words, depending on the direction of translation, either concept activation or word retrieval is the time-consuming and vulnerable translation component. Of course, with balanced bilingualism, equally strong links are likely to exist between L1 and L2 word form representations and conceptual memory, and no direction-dependent asymmetries in the data are expected to occur.

Both La Heij and his colleagues (1996) and De Groot and Poot (1997) argued that their data could be accounted for in terms of this two-step view of the translation process. More recently, Francis and Gallard (2005) showed that their trilingual translation data could also be explained this way. These authors had English–Spanish–French participants translate in all six of the translation directions enabled by their trilingualism and showed that differences in comprehension fluency (concept activation) and production fluency (word retrieval) between the three languages could explain all direction effects observed in this study. The point that all these authors made is that there do not appear to be qualitative differences between translating from the strong native language to a weaker second or third language and translating in the opposite direction. Instead they concluded that all translation is likely to be conceptually mediated. This conclusion, in fact, echoes the one that Potter et al. (1984) advanced in the very first study to test the word association and concept mediation views of bilingual memory that underlie much of the more recent work.

Conclusions

As mentioned, the view advanced above that translating words always involves the access of conceptual representations is consistent with two tenets of the revised hierarchical model, despite the fact that the model holds a crucially different view on word translation. The shared assumptions are that, in comprehension, concept activation is easier for words in a strong language than for words in a weaker language and that, in production, word retrieval is relatively easy in the stronger language. In addition, the translation recognition study by Talamas and colleagues (1999) with which I started this section had provided evidence that during foreign language learning there is a transition from reliance on form to a focus on meaning, a conclusion that tied in nicely with a similar developmental path hypothesized by researchers working in other research areas (pp. 126–128). The fact that learners at lower levels of L2 proficiency rely more on cognate relations between L1 and L2 than do learners at more advanced levels (Kroll et al., 2002) provided additional support for this contention. Apparently, a number of the model’s main assumptions go unchallenged.

The model’s aspect that obviously does not stand scrutiny is the assumption that backward translation occurs through tracing the link between the form representations of the words in a translation pair (the “word association”

connection), bypassing conceptual memory. As a consequence, it also does not seem to be opportune to somehow attribute the relatively strong form reliance in the earliest stage of foreign language learning to the use of such links, as seems to be done. But why not simply regard this early form reliance as a phenomenon of interest in its own right, without trying to see it as evidence for the use of links directly connecting the new foreign word to its L1 translation? Given the fact that early on in the learning process (of late learners) it is the new word’s form, not its meaning, that is the unknown element to be acquired, it is obvious that the learner at this stage is especially attentive to form. A corollary of paying an inordinate amount of attention to the unknown form is that meaning analysis is likely to be neglected. This is plausibly the reason why during initial stages of learning the learner is relatively insensitive to word meaning. In other words, it may be because the initial L2 learner has insufficient mental resources to attend to both the new word’s form and its meaning that form analysis is privileged.

VOCABULARY ACQUISITION IN CONTEXT

Introduction

The previous sections all dealt with the very initial stages of foreign vocabulary acquisition and with the structure of the emerging memory representations. Second language words learned by means of the paired associate and keyword methods are just new labels assigned to concepts that already existed in memory prior to the learning episode; namely, the concepts associated with the new forms’ L1 translations. Similarly, the bilingual memory structures discussed so far all assume that the form of a new L2 word inherits the meaning of its L1 translation. When this stage is successfully completed and, furthermore, the learner has managed to distinguish this specific form, associated with this specific meaning, from similar forms with a different meaning (the “synforms” discussed earlier; p. 123), there is an awful lot more to do before he or she can enjoy the pleasures associated with the fluent use of a foreign vocabulary. For one thing, the size of the vocabulary (its “breadth”) acquired by means of one or more of the direct methods presented so far, is likely to be too small for the learner not to run into the occasional deadlock when getting immersed in natural communication settings, in reading and speech. As pointed out earlier (p. 90), instruction time in the foreign language classroom is simply too limited to train more than just a basic foreign language vocabulary. Furthermore, the newly acquired vocabulary has no depth to speak of yet. As stated earlier, the new words’ meanings are those of the corresponding words in the native language. Yet translation “equivalents” seldom share all aspects of their meaning so that adopting the L1 word’s meaning as it is inevitably leads to a strong semantic “accent”: The meaning aspects specific to the L1 word would be implied when using its L2 equivalent and, conversely, the meaning aspects specific to the L2 member of the translation pair would be missed out altogether. To become a proficient L2 user, the learner’s L2 vocabulary has to become independent of the L1 vocabulary (it must become “autonomous”) and the learner has to acquire the L2-specific meaning components; we have labeled these two processes “freeing” and “fine tuning” before (De Groot & Van Hell, 2005). Gaining the required level of depth involves the learning of the word’s intensional and extensional meanings (Henriksen, 1999), processes that other authors have referred to as “network building” or “word web” formation and “packaging”, respectively (Aitchison, 1987). (Recall that an L2 word’s intensional meaning concerns the sense relations between this word and other words in the L2 vocabulary, such as its antonyms, synonyms, and hyponyms. Its extensional meaning involves its referential meaning; that is, knowledge concerning the entities or events in the external world to which it refers.)

A further thing to do is to strengthen the links between the words within the word web so that upon accessing one of them, those that it is connected with rapidly become available. Similarly,
the lexical access process must speed up so that ultimately it can come about rapidly, effortlessly, and automatically. The moment this state is reached, the spared mental resources can be dedicated to the components of comprehension or production that will always be slow and effortful, that cannot be automated. One source of evidence that these developments take place comes from a number of studies that employed the semantic priming methodology (see pp. 92–93 and 138–140). For instance, Frenck-Mestre and Prince (1997) found that the within-L2 priming patterns that emerged for proficient non-native speakers were highly similar to those of native speakers, showing priming for various types of lexical relations. In a group of intermediate non-native speakers smaller priming effects were observed, suggesting weaker links between associated words in their L2 lexicon. A further experiment demonstrated that in the native and proficient non-native speakers both meanings of a lexically ambiguous prime word were activated, whereas in the intermediate non-native speakers only the dominant meaning was activated.

Similarly, Favreau and Segalowitz (1983) found that bilinguals who read in their L1 and L2 equally quickly (suggesting balanced bilingualism) exhibited the same pattern of semantic priming effects in both languages. The details of the observed patterns suggested that processing was highly automated in both languages. In contrast, bilinguals who read more slowly in their L2 than in their L1 (suggesting unbalanced bilingualism) showed different patterns of priming in their two languages, with the pattern emerging for the L2 suggesting a lower level of automaticity. As argued by Segalowitz and Segalowitz (1993) and Segalowitz, Segalowitz, and Wood (1998), developing automaticity of processing is not merely a matter of speeding up the various subcomponents of a task, in their case reading. Instead the development involves qualitative changes such as the elimination of slow task components that require conscious control.

So all of this additional learning and restructuring, as well as gaining fluency in exploiting the acquired knowledge, still has to take place after the foreign language learner has acquired a basic vocabulary in the classroom (or outside of it) through direct vocabulary-focused activities. It has therefore been concluded that most vocabulary is learned from context, especially from extensive reading in the target language (e.g., Krashen, 1989, 1993). How else can it be that foreign language learners, upon entering the university, already possess a receptive vocabulary of about 11,000 words (as assessed by Hazenberg & Hulstijn, 1996)? This same view has been advanced regarding vocabulary acquisition in L1. Educated native speakers of L1 may ultimately master, according to conservative estimates, between about 20,000 and 30,000 lexical items (Goulden, Nation, & Read, 1990; Nation, 1990), and even estimates of 100,000 lexical items occur (Sternberg, 1987). These numbers cannot possibly be covered with direct vocabulary instruction (including the “instruction” provided by parents, other caregivers, and peers).

This view of massive vocabulary acquisition through immersion, especially through reading, is often thought to imply the idea that vocabulary acquisition comes about incidentally, without the reader deliberately trying to commit individual words to memory. The idea is that even though a reader might read merely for recreation, specific word knowledge is acquired as a byproduct. Furthermore, some authors seem to assume these views imply that the learning of specific vocabulary in context is more effective than through direct methods that focus explicitly on vocabulary. This inference may however be unwarranted, as is suggested by the following quote (Sternberg, 1987, p. 89; italics added):

Most vocabulary is learned from context [. . .]. What the claim does imply is that teaching people to learn better from context can be a highly effective way of enhancing vocabulary development. What the claim does not imply is that teaching specific vocabulary using context is the most effective, or even a relatively effective, way of teaching that vocabulary. Unfortunately, many believers in learning from context, as well as their detractors, have drawn the
second inference rather than the first. As a result, they are on the verge of throwing out a perfectly clean and healthy baby with its, admittedly, less than sparkling bath water.

In the next section I will discuss a number of studies that together have compared the efficacy of foreign vocabulary learning through reading texts with the efficacy of more direct vocabulary-learning methods and that cover both intentional and incidental learning instructions. To anticipate, their joint results indicate that intentional learning is more effective than incidental vocabulary learning and that, as already suggested by Sternberg, specific vocabulary is more effectively learned by means of direct methods than through reading texts. Still, the latter has an important role to play.

Evidence

Several studies suggest that reading texts simply for pleasure is not a very efficient way to learn specific foreign vocabulary. Hulstijn et al. (1996) had advanced learners of French read a French story consisting of 1306 words. The participants were told in advance they would have to answer comprehension questions after reading the story. This was done to promote incidental (and discourage intentional) vocabulary learning: “Students’ attention was turned away from particular unknown words and directed towards an understanding of the text as a whole” (Hulstijn et al., 1996, p. 331, the authors’ italics). The text contained eight target words unknown to the participants, printed in bold face. It was the learning of this specific vocabulary the researchers set out to test, in three conditions: In a control condition the participants were instructed to read the text and prepare to answer comprehension questions. In a “marginal glosses” condition, the L1 (Dutch) translations of the targeted unfamiliar words were given in the text’s margin. Finally, in a “dictionary” condition the participants were free to use a dictionary whenever they felt like it. Two versions of the story were presented, to different participant groups. They differed from one another in the number of times the target words were presented: just once or three times. Note that the control condition imitates a real-life incidental-learning situation most faithfully (although the bold face in which the target words were printed is rather unnatural, explicitly directing participants’ attention to the target words and plausibly giving rise to intentional learning contrary to the authors’ purpose).

Retention was tested in several ways, including a receptive test in which the participants were shown the target words among a set of words that had not appeared in the story, and had to indicate for each of these words whether it had appeared in the text and if so, what its meaning was (by writing down its Dutch translation equivalent). The left-hand part of Figure 3.12 shows the results of this test following the story version that included the critical words three times. The right-hand part shows the analogous results of a second test. In this test the target words were not presented in isolation but as part of a text fragment consisting of a few lines of the original text. In both cases the maximum score per condition would have been eight.

As shown, the condition that copied incidental learning under natural circumstances most faithfully (the control condition) produced rather poor results. In the out-of-context test condition, even after three occurrences of each of the critical words (printed in bold face) the meaning of less than one could be provided on average. Surprisingly, when context was added as an additional retrieval cue at test, performance remained low at slightly over one correct. The participants in the dictionary condition fared only slightly better, a finding the authors explained in terms of their observation that the participants had consulted the dictionary only occasionally (basically behaving as the participants in the control condition). The provision of glosses improved performance noticeably.

All in all, these data suggest that incidental learning in context does lead to some learning of words repeatedly presented in the text, but the learning outcome can by no means be called impressive. However, a related study by Rott (1999) showed that increasing the presentation frequency of the target vocabulary to six
presentations per word improved the learning scores considerably (to about 8 and 5 out of 12 in receptive and productive testing, respectively), although much of this gain was lost again 4 weeks after training. The results of Hulstijn and his colleagues furthermore suggest that a combination of reading and techniques that draw the learners’ attention to specific vocabulary is more effective than mere reading.

[Figure 3.12. Retention scores as a function of reading condition and testing method. Data from Hulstijn et al. (1996).]

A Hebrew–English/Arabic–English study by Laufer (2003b) strengthens the conclusion that just reading a text is a relatively ineffective way to learn specific vocabulary. In three experiments she compared recall in reading conditions similar to those of Hulstijn et al. (1996) with recall following one of three word-focused tasks that did not involve text reading. In the reading condition of each experiment, Hebrew or Arabic learners of English read an English short story containing 10 target words. The stories were either glossed in Hebrew (the two Hebrew–English studies), or bilingual dictionaries could be consulted during reading (the one Arabic–English study). The participants in both these reading conditions were led to believe that they would be tested for comprehension afterwards. It is likely that under these circumstances the participants would focus their attention on the meaning of the text as a whole, not on individual words. The text contained 10 target words that, following the reading phase, were unexpectedly tested for recall in a cued recall task.

In the word-focused conditions the same 10 English words served as targets. In one of these conditions the participants were told the meanings of the words and were asked to create a sentence around each of them. In the second they were presented with the words and their meanings and asked to write a composition incorporating all 10 of them. In the third the researcher presented the participants with incomplete sentences and asked them to complete these partial sentences with the target words after looking up their meaning. Importantly, no reading context was provided in any of these three conditions and the participants’ attention would thus be focused on the meanings of the target words. The results were straightforward: In all three experiments recall performance was considerably better following a word-focused task than following text reading. However, the reading condition also led to some learning, suggesting (as did the study by Hulstijn and his colleagues) that incidental learning through reading does occur (see Rott,
1999, for a review of studies providing additional evidence).

The combined results of these studies suggest that merely reading for comprehension leads to some growth in vocabulary, but that combining reading with a vocabulary-focused activity is more effective (Hulstijn et al., 1996). A similar finding but with the texts presented in auditory form in a multimedia environment was more recently obtained by Jones (2004). Furthermore, Laufer’s study suggests that even replacing reading by a vocabulary-focused activity is more effective than merely reading for comprehension.

According to a definition of incidental vocabulary learning as involving all situations in which the participants are not explicitly asked to commit vocabulary to memory, all conditions in both Hulstijn and collaborators’ study and in the Laufer study (including Laufer’s word-focused activities) concerned incidental learning conditions. In a Dutch–French study, in which isolated sentences instead of complete texts served as context for the target vocabulary, Mondria (2003) posed the question of what the contribution of an intentional learning instruction to learning words in context might be. A further goal was to find out whether a “meaning-given” method might be equally effective as a “meaning-inferred” method.

In a meaning-given condition the learners were simply provided with the meanings of a set of unknown French words in the form of their Dutch translations and were instructed to memorize them. In one of three meaning-inferred conditions the learners first had to infer the words’ meaning from context, to subsequently verify whether the inferred meaning was the targeted meaning, and, finally, to commit the correctly inferred meaning to memory. In the remaining two meaning-inferred conditions one or two steps from this procedure were skipped: In the “inferring” condition the participants were simply asked to infer the meaning of the target word from context. In the “inferring + verifying” condition they inferred the meaning and then checked whether their inference had been correct. The inferring and inferring + verifying conditions both concerned incidental learning because in these conditions the participants were not asked to memorize the words. In contrast, the meaning-given and inferring + verifying + memorizing conditions both involved intentional learning. If the latter two methods were to turn out to be equally effective, it would be tempting to conclude that the meaning-given method is to be preferred because it is the less time-consuming method of the two. Note that not only the meaning-given method, but also the two context conditions that involve a verifying step are, in a way, meaning-given methods. After all, the meaning is provided in the verification step. Retention in all conditions was tested in a receptive cued recall task. Figure 3.13 shows the results of this study.

A first thing to note is that the instruction to simply infer the targeted French word’s meaning led to poorer recall than the condition in which the inferred vocabulary was checked for correctness. This finding confirms the more general observation that language learners often make the wrong guesses on the basis of context (see Frantzen, 2003, and Huckin & Coady, 1999, for an overview of factors that determine guessing accuracy). The instruction to memorize the inferred and verified word improved performance substantially, suggesting a large effect of an intentional learning set on contextual learning. But most strikingly, the simple meaning-given method produced equally good results as the much more complex inferring + verifying + memorizing method. Accordingly, after establishing that, as expected, the former method indeed took relatively little time to apply, the author concluded the meaning-given method to be the preferred method. A further conclusion was that an explicit instruction to memorize target vocabulary boosts word learning in context dramatically.

Evaluation and conclusions

The studies discussed above focused on the learning of a specific set of words and all of them showed that incidental learning from context in its purest form (no glossing; no dictionary look-up) led to disappointingly low learning scores. There is, however, a further gain of incidental
vocabulary learning through reading that is ignored in many of the pertinent studies, including those discussed above: If at least some learning of the vocabulary selected for testing has taken place, it is likely that some learning has also occurred for all the other words in the text. Imagine the situation where a reader reads a 500-word text that includes 10 unknown target words, each occurring once, and is subsequently tested on these 10 words in a receptive cued recall test. If the meaning of one of them can successfully be delivered during testing (e.g., in the form of its L1 translation), on average 1 out of every 10 of the remaining previously unknown (non-target) words in the text may be expected to also have been recalled successfully had they actually been tested. Furthermore, it is plausible that some learning of the selected 10 target words that were not successfully recalled at test has nevertheless occurred in the form of, for instance, some aspect of the word’s form or a hunch of its meaning and perhaps some strengthening of the connection between form and meaning. However, the accrual that has taken place might have been too small to be detected in the cued recall test format. Similarly, all remaining unknown words in the text, those not selected for testing, may have left some trace of new knowledge in memory. This process of gradual acquisition was called “incremental” learning before.

[Figure 3.13. Retention scores as a function of learning condition. Data from Mondria (2003).]

In addition to the increments in knowledge regarding the previously unknown words, the memory representations of all previously known words may have undergone some change as a result of reading the text: They may have become more strongly established in memory so that on subsequent encounters they can be retrieved more quickly. They may also have become enriched with some new content; for instance, a previously unknown aspect of meaning or some new information on the linguistic environment in which the word can occur. In other words, some learning may have occurred for all of the words in the 500-word text, at least for all of those attended to (see, e.g., Huckin & Coady, 1999, for a discussion of the role of attention to individual words in learning vocabulary).

To summarize, many studies that have tried to establish the efficacy of vocabulary acquisition through reading seem to have ignored the fact that vocabulary acquisition proceeds incrementally, piecemeal, rather than instantaneously and completely from a single exposure (see also Bogaards, 2001, for an experiment that substantiates this claim). Furthermore, these studies have ignored the fact that not all tests are sufficiently sensitive to detect small gains in knowledge. As a consequence, the amount of vocabulary knowledge foreign language learners pick up from mere reading has arguably been highly underestimated.
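
The extrapolation in the 500-word example above can be made concrete with a small calculation. The sketch below is only an illustration and is not taken from any of the studies discussed: the function name `expected_untested_gains` and the figure of 40 untested unknown words are assumptions chosen for the example, since the text does not say how many non-target unknown words such a passage would contain. It simply applies the recall rate observed on the tested targets (1 out of 10) to the unknown words that were never tested.

```python
def expected_untested_gains(targets_tested, targets_recalled, untested_unknown):
    """Estimate how many untested unknown words would have been recalled.

    Assumes, as the passage does, that untested unknown words are learned
    at the same rate as the tested target words.
    """
    if targets_tested <= 0:
        raise ValueError("targets_tested must be positive")
    if not 0 <= targets_recalled <= targets_tested:
        raise ValueError("targets_recalled must lie between 0 and targets_tested")
    if untested_unknown < 0:
        raise ValueError("untested_unknown must not be negative")
    # Multiply before dividing so that whole-number cases stay exact.
    return targets_recalled * untested_unknown / targets_tested


# From the text: 10 target words were tested and 1 was recalled.
# Hypothetical: 40 further unknown words in the same text were never tested.
estimate = expected_untested_gains(
    targets_tested=10, targets_recalled=1, untested_unknown=40
)
print(f"Expected additional words learned but never tested: {estimate:.1f}")
```

Under these assumed numbers the cued recall test registers one learned word while about four more go unmeasured, which is the sense in which test scores may understate what reading contributes. Partial, sub-threshold gains of the kind described above are not captured by this simple rate at all.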

However, this qualification of the efficacy of reading text to enhance foreign vocabulary knowledge should not be taken as a plea for the dismissal of the direct methods, such as paired-associate learning, the keyword method, or any other vocabulary-focused activity such as those discussed above. Even Sternberg, the fervent advocate of the view that most vocabulary is learned from context I quoted at the beginning of this section, acknowledges their crucial role in foreign language vocabulary acquisition. In so doing he stressed that direct methods exploit the fact that foreign language learners who are fluent in their native language already master the (larger part of the) meanings of many foreign words to be acquired (namely those of all words they know in their native language). When the learner is provided with an unknown foreign word directly paired with its L1 translation (as is done in paired-associate learning and the keyword method), the meaning of the former will become available instantaneously through its translation. This enables the learner to focus attention on the unknown parts of the foreign words to be learned: their forms and the knowledge of what meaning goes with each of them (that is, the link between form and meaning). In contrast, during vocabulary learning from texts the learners must figure out the meanings of the unknown forms themselves, a process that detracts attention from learning the forms and that presumably takes a number of encounters with each unknown form, spread out across a number of texts and over an extended period of time. Obviously, this meaning-gathering process cannot be skipped because for a word to serve its referential function its form and meaning must be joined. And all the while this laborious meaning discovery process is taking place, the relevant meaning is just sitting there in memory! Hence Sternberg’s conclusion: “If one has definitions available, then learning from context is not so efficient. Why waste a good definition?” (Sternberg, 1987, p. 95). Needless to say, the use of direct methods does not work for vocabulary not known in the L1. For adult native speakers this may hold for foreign concepts that are not lexicalized in L1 and for words that exist in L1 but are very infrequent. For child learners it will be more common, and the more so the younger the child. In these cases the provision of contextual information will obviously be conducive to learning.

But of course, even when the targeted meanings are known in advance, in many situations this knowledge cannot be exploited for the simple reason that the learner does not usually carry a bilingual dictionary around, nor is there a teacher around to help make the connection with the stored knowledge. This truism, combined with the fact that in the foreign language classroom the time available for direct vocabulary acquisition is limited, led Sternberg (1987) to his claim that most vocabulary is learned from context, through reading. A final point stressed by Sternberg (one that has often been ignored by others) is that context learning in the classroom should not amount to having the students read texts extensively in class, but should consist of teaching them appropriate methods for learning as much as possible from context themselves; that is, of teaching vocabulary-building skills.

To conclude, the little time available for vocabulary teaching in the foreign language classroom can best be spent on a combination of direct teaching of a base vocabulary that covers as large a percentage as possible of the words to be encountered in naturally occurring foreign language texts and discourse, augmented by the teaching of effective skills of how to build vocabulary from context. As mentioned in the introduction to this chapter, a basic vocabulary of the 3000 most frequent word families, covering about 5000 lexical items, suffices for a learner to comprehend the essence of many foreign texts because it covers about 95% of the texts’ words (Laufer, 1992; Nation, 1993). When learners have reached this state and are furthermore equipped with a set of effective vocabulary-building skills, they can venture out successfully on their own on their path to advanced bilingual literacy.

Finally, what has been ignored fully in this section is how vocabulary is acquired from immersion in foreign language speech. Unlike in printed text, words are not contained as neat, discrete packages of information in the speech signal: Speech breaks often fall within words and
are often lacking between words. Yet isolating the words’ forms from the speech signal is a prerequisite for aural vocabulary acquisition. In addition, unlike printed text the speech signal is transient and spoken words therefore cannot be lingered on to figure out what they might mean (except when they are rehearsed internally, but this happens at the cost of inattention to further input). Because of these facts, vocabulary learning from a speech context may be more effortful than from written text. But fortunately, though lacking reliable physical boundaries between words, the speech signal contains other cues to word boundaries; namely statistical information about the transitional probability of syllables and speech segments and prosodic cues such as the words’ stress patterns. Both infants (Saffran, 2001; Saffran et al., 1996a) and adults (Saffran, Newport, & Aslin, 1996b; Schön et al., 2008) use these sources of information successfully in isolating words from continuous speech. Possibly even, the prosodic information contained in speech but absent in print compensates for the lack of physical boundaries in speech, especially when prosody correlates with syllabic cues to segmentation (Schön et al.). In all then, the above conclusions regarding the best mix of direct vocabulary teaching and context learning (adapted to the aural modality, for instance by a focus on the aural forms of the words to teach directly) might also apply to the listener’s perspective.

SUMMARY

• Learners reach a sufficient level of comprehension in a foreign language when their vocabulary covers 95% of the words in a text or discourse. A basic vocabulary of the 3000 most frequent word families, equaling 5000 lexical items, suffices to reach this state.
• Because instruction time in the foreign language classroom is too limited to teach more than a basic vocabulary through direct means, most vocabulary must be learned from context. Yet to acquire specific vocabulary, direct word-focused methods are more effective than context learning.
• Despite being a rather complex procedure for learning foreign vocabulary, the keyword method is effective across many different types of learners, languages, and learning environments. The method is also applicable to other learning materials than foreign vocabulary.
• The keyword method appears more effective with receptive testing of foreign vocabulary than with productive testing and it appears more effective for inexperienced foreign language learners than for experienced learners.
• After only a few learning trials per item the vocabulary learned by means of the keyword method is more prone to forgetting than the vocabulary learned by means of other methods (rote rehearsal; context learning). With more practice the various methods result in equal amounts of forgetting.
• Keywords provided by the experimenter or by the learners’ peers seem more effective than self-generated keywords and pictorial support increases the efficacy of the keyword, especially in young learners. These findings limit the keyword method’s efficacy outside the classroom and laboratory.
• Unlike the common imagery version of the keyword method, its verbal version is arguably as suitable for experienced foreign language learners as are rote rehearsal and uninstructed learning.
• Foreign language learning gets easier the more experienced the learner is in learning foreign languages. The likely reason is that the learner exploits knowledge already stored in long-term memory.
• Word–word paired-associate learning is applicable to all types of words: concrete as well as abstract, cognates as well as non-cognates. The keyword method is unsuitable for learning abstract foreign words and for learning words that share a cognate relation with their L1 equivalent.
• Background music seems to affect foreign vocabulary learning in complicated ways, sometimes boosting learning and at other times impeding it, depending on the type of background music (e.g., vocal or instrumental) and learner characteristics (e.g., their level of baseline brain arousal).
• Foreign language equivalents of concrete L1 words are learned faster and remembered better than the foreign names of abstract words. Some evidence exists that learning the foreign names of words that occur frequently in L1 is easier than learning the foreign names of infrequent L1 words. The likely cause of the concreteness effect is that the representations of concrete L1 words in memory contain more information than those of abstract L1 words. Consequently it is relatively easy to attach the new foreign names to the representations of concrete L1 words. Arguably the effect of word frequency can be accounted for in a similar way.
• Foreign language words with typical phonotactical forms are acquired faster and retained better than foreign words with atypical forms. The cause of this effect is that learning the sounds of new words involves the operation of phonological short-term memory (the "phonological loop") and the exploitation of phonological information in long-term memory. The phonological loop operates smoothly on typical forms but is impeded when atypical forms are presented for learning. In addition, only typical sound forms can benefit from relevant phonological information in long-term memory.
• Foreign language words that share a cognate relation with the corresponding L1 words are easier to learn and are retained better than non-cognates. A reason is that the presentation of a word automatically activates similarly formed words in long-term memory, thus facilitating recall. An infelicitous effect of this process is that a word that shares form but not meaning with the input word can be mistaken for the latter's translation.
• Receptive cued recall leads to larger recall scores than productive cued recall. Reasons may be that the, previously known, L1 words are more available than the newly learned L2 words, that comprehension is easier than production, and/or that the L1 words are embedded in a large network of lexical connections whereas the new L2 word is only connected to its L1 translation.
• Encoding strategies during both first and second language learning shift from a predominant focus on the form aspects of the learning materials to a predominant focus on the meaning aspects. First and second language learning thus seem to involve a similar developmental route.
• The compound, coordinate, and subordinative models of bilingual memory organization differ from one another along two dimensions: the number of underlying conceptual systems that the bilingual possesses (one: compound and subordinative; two: coordinate) and, in the case of a single conceptual system, the way in which this system is accessed when an L2 word is input: directly (compound), or indirectly, via the corresponding L1 word (subordinative).
• It was once thought that compound, coordinate, and subordinative bilingualism result from different acquisition contexts and that any individual bilingual had memory representations of only one of these types. The evidence for these assumptions is weak and each bilingual may have memory structures of different types.
• The vast majority of translation "equivalent" word pairs consist of words that have language-specific meaning nuances and senses in addition to their shared meaning components. Also, word meaning changes over time and differs between individuals. These facts are better accounted for in terms of distributed models of bilingual memory than in terms of localist models.
• The revised hierarchical model assumes two direct links, of different strengths, between the two word form representations of a pair of translations, one from the L1 word to the L2 word and one in the reverse direction. In addition, it assumes a single conceptual representation shared by a pair of translations. This shared representation is connected with the L1 form representation by means of a strong link and with the L2 form representation along a weaker link.

• The revised hierarchical model was developed (1) to account for a gradual change from primary reliance on form to primary reliance on meaning with increasing L2 proficiency and (2) to explain differential amounts of meaning activation during processing L1 and L2.
• The revised hierarchical model assumes qualitatively different processes for translating words from L1 to L2 and from L2 to L1. A simpler view is that word translation always involves meaning access ("concept activation"), in addition to a second processing component, word retrieval, and that differential results obtained with L1-to-L2 and L2-to-L1 translation are due to differences in the relative ease with which these two processing components can be executed in each of these translation conditions.
• To reach a high level of proficiency in an L2 the learners' L2 vocabulary must become independent of their L1 vocabulary: L2-specific meaning nuances must be learned, L1-specific nuances must be lost, and knowledge regarding each L2 word's relations with other words in the L2 lexicon must be established. In addition, the access and retrieval of L2 lexical representations must be automated. These goals can never be met by classroom instruction alone but require extensive subsequent reading and/or oral communication in naturalistic L2 environments.
• An explicit instruction to memorize target vocabulary embedded in a larger linguistic context leads to the learning of far more foreign vocabulary than when no such instruction is given, but it is not more effective than simply presenting the foreign words to learn with their native language glosses with the instruction to memorize them. In general, contextual learning by reading texts is a less-effective way to learn specific vocabulary than out-of-context activities that focus explicitly on this vocabulary.
• The sparse time available for vocabulary teaching in the foreign language classroom can best be spent on a mix of direct teaching of a base vocabulary that covers as large a percentage as possible of the words to be encountered in naturally occurring foreign language texts and discourse, augmented by the teaching of effective skills for building vocabulary from context.
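
The coverage criterion mentioned in the summary (a vocabulary that covers 95% of the running words in a text) can be made concrete with a small calculation. The following is only a minimal sketch under simplifying assumptions: it counts word forms rather than word families, uses a crude tokenizer, and the sample sentence and known-word set are invented for illustration. The function names `coverage` and `meets_threshold` are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a crude, illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(text, known_words):
    """Return the proportion of running words in `text` that the learner knows."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


def meets_threshold(text, known_words, threshold=0.95):
    """Check coverage against the 95% criterion cited in the summary."""
    return coverage(text, known_words) >= threshold


# Invented example: ten running words, one of which ("harbour") is unknown.
sample = "The cat sat on the mat near the old harbour"
known = {"the", "cat", "sat", "on", "mat", "near", "old"}

print(coverage(sample, known))         # proportion of known running words
print(meets_threshold(sample, known))  # False here: one word in ten is unknown
```

Note that coverage is computed over running words (tokens), so a frequent word such as "the" counts every time it occurs. This is why a relatively small set of high-frequency items can cover a large share of a text.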



4 Comprehension Processes: Word Recognition and Sentence Processing

INTRODUCTION AND PREVIEW

Whereas Chapter 3 dealt with the acquisition of vocabulary, the major part of this chapter discusses how words, once learned, are recognized. When we hear or see a word, how does it make contact with its representation in the mental lexicon that contains the information which enables us to understand what it means? As mentioned before, word recognition is beyond doubt the most important constituent process of language comprehension and, therefore, to understand how language users can make sense of print and speech requires a detailed understanding of how word recognition comes about. One of the puzzles to solve is how it can be that it only takes fluent language users a quarter of a second or so to recognize a word despite the fact that they, even when they only master one language, have stored tens of thousands of words in their mental lexicon. A second mental process that plays an important role in language comprehension is parsing, the process of unraveling the grammatical structure of a sentence. Together with word recognition it enables the listener or reader to figure out the literal meaning of sentences. Parsing is dealt with in the final part of this chapter.

The term "word recognition" is used in both a narrow and a broad sense. When used in the narrow sense it refers to the moment a match occurs between a printed word and one (and just one) of the orthographic word-forms stored in the lexicon or between a spoken word and a single phonological word-form. Only after this match has taken place does all the information stored with this form, including the syntactical and morphological specifications of the word and, most importantly, its meaning, become available for further processing. The second stage in this two-step view of word processing is often called "lexical access". Used in a broader sense the term word recognition includes both these processing steps, thus covering all mental activity from the perception of the word until all the knowledge

stored with its lexical representation is available. To complicate matters further, the term "lexical access" is also used to refer to this complete process. In the ensuing discussion I will use the terms word recognition and lexical access interchangeably, referring to the complete process in both cases. In the next chapter, on speech production, in agreement with common practice I will use the term "lexical access" to refer to all the processing that occurs between the intention to produce a word (the "conceptualization" of a word) and the moment its lexical element is selected for production.

The present chapter and Chapter 5 are companion chapters that largely deal with one specific question regarding lexical access in bilinguals, addressing it from the perspective of word recognition (this chapter) or word production (Chapter 5). This chapter addresses the question of whether a spoken or written word encountered by a bilingual causes activation in both of the linguistic subsystems stored in bilingual memory or whether activation is restricted to the contextually appropriate subsystem, the one that contains the representation of the input word. Co-activation of information in the other subsystem is known as language-nonselective lexical access. Exclusive activation of information in the contextually appropriate system is known as language-selective lexical access. The analogous question regarding word production is whether or not during the process of generating a word output, from the moment its content is conceptualized to actually articulating it, co-activation occurs in the contextually inappropriate linguistic subsystem. If word production were to turn out to be language-nonselective, a next question presents itself: How then do bilinguals manage to separate their languages in production; that is, to produce relatively pure, monolingual, output whenever they intend to do so? This latter question will be touched upon here and there in Chapters 4 and 5 but will be more thoroughly covered in Chapter 6.

A sizable number of studies suggest that the presentation of a word to a bilingual often gives rise to parallel activation in both linguistic subsystems. In reviewing these studies either one of two organizations could be chosen. One is to organize them around the types of memory units that are thought to be activated simultaneously in both subsystems upon the presentation of a word (for instance, units that represent phonology or units that store meaning). Another way is to organize the discussion around the types of stimuli that researchers have used to tackle the present question (for instance, cognates or interlexical homographs). From a theoretical point of view the former approach is the most attractive, because it is the pattern of activation in the bilingual's memory system that these studies attempt to discover. Unfortunately, in cases where the data clearly suggest parallel activation in both subsystems, the identity of the co-activated linguistic units in the contextually inappropriate subsystem (henceforth also referred to as the non-target language) is not always clear. To illustrate, under many circumstances it takes bilinguals longer to recognize interlexical homographs (words that are ambiguous across languages, e.g., coin, meaning "corner" in French) than to recognize matched non-ambiguous control words. Because the only difference between the homographs and their controls is that only the former have two different meanings, it is tempting to conclude that the homograph effect indexes co-activation of the homograph's meaning in the other language. Yet, as we shall see, the effect has (among others) been explained in terms of a model of bilingual memory that does not even represent meaning. So either the assumption that the effect is caused by co-activated meaning is wrong or the model is flawed. Because of this interpretative indeterminacy I have chosen a mixed organization of the experimental evidence, opting for the theoretically more interesting "memory-units" organization where it seems safe to do so (pp. 183–197) and adopting the theoretically more neutral "type-of-stimulus" organization in other sections (pp. 165–176 and 199–203).

All but the final section of this chapter deal with word recognition in bilinguals, visual word recognition as well as spoken word recognition, the stimulus words presented in word lists (often intermingled with meaningless strings of letters) and as part of complete sentences. The chapter

concludes with a discussion of grammatical processing in bilinguals and specifically addresses the question of how bilinguals parse sentences in their two languages.

METHODS AND TASKS

Word recognition

The vast majority of studies on bilingual word recognition have used the lexical decision and word naming tasks, reflecting the immense popularity of these two tasks in monolingual studies on word processing. In the word naming task the participants simply read printed words aloud and response latencies and/or reading accuracy are registered. In (visual) lexical decision tasks the participants are presented with written letter sequences and have to decide for each of them whether it is a word or not. If it is, they must press a "yes" button; if it is not, they press a "no" button (occasionally an oral response is asked for). Again, response times and/or accuracy are registered. The nonword stimuli (those that invite a "no" response) are usually pseudowords; that is, letter strings that obey the orthography (and phonology) of the test language and that only differ from words in that they lack meaning. Using well-formed letter strings as nonwords is important because it increases the chance that the lexical decision response will be based on the outcome of the process of interest, lexical access, and not on the basis of a more shallow perceptual process that assesses whether the stimulus looks normal. The latter would be feasible for nonwords that violate the target language's orthography and phonology. Rubenstein, Lewis, and Rubenstein (1971) were the first to demonstrate such a "nonword legality" effect in the very same study that introduced the lexical decision task as a new research tool in the study of word recognition. Although the presence of nonwords is demanded by the task and it is imperative that they are constructed carefully, in most lexical decision studies they merely serve as fillers and the responses they invite are ignored in the analyses. But there is also the occasional experiment in which nonwords are the focus of the researcher's attention or at least share this prominent role with words. A clear drawback of the lexical decision task (one that does not apply to word naming) is that it is rather unnatural because language users do not usually go about deciding whether the letter sequences they encounter in print are words or not.

An additional problem with both tasks is that neither of them provides a pure measure of lexical access. Lexical decision is essentially a discrimination task in which words and nonwords have to be distinguished from one another. The discrimination process is influenced by the extant experimental circumstances, the composition of the stimulus set, and specific characteristics of the stimuli. The response criteria set by the participants are not fixed but vary with these variables so that under different sets of circumstances different sources of lexical and non-lexical information (such as orthographic, phonologic, and semantic memory codes and the familiarity of the presented letter patterns) are exploited during response generation. A "word" decision may not even require complete word identification in all cases, not even when all nonwords are pseudowords, because a mere feeling of familiarity or of meaningfulness may suffice to tell actual words from nonwords (e.g., Balota & Chumbley, 1984; Grainger & Jacobs, 1996). Furthermore, the moment sufficient information has been assembled to conclude the stimulus is or is not a word, this assessment has to be translated into the correct response, "yes" or "no", and this response has to be executed. It is well known that the duration of this "post-lexical" response stage is not fixed but responds to the prevailing circumstances. In conclusion, whenever a particular effect is obtained in lexical decision, it is not always obvious that it is a marker of actual word recognition.

The word naming task has its own shortcomings. A clear disadvantage of this task is that in languages written in an alphabetic (or syllabic) script, and especially in alphabetic languages with regular grapheme–phoneme relations, responses can be assembled by merely applying the

script-to-sound conversion rules, thus bypassing actual recognition (in the same way as pseudowords can be read aloud despite the fact that they have no representation in the mental lexicon). There is evidence to suggest that this indeed sometimes happens: An important signature of lexical access is the occurrence of a frequency effect, the shorter response time that is generally obtained for frequent than for infrequent words. These frequency effects are often considerably smaller in naming than in lexical decision, and smaller in naming words in a language with a relatively regular orthography than in naming words in an irregular orthography (De Groot et al., 2002).

A second drawback of the naming task relates to the fact that performance in this task not only requires recognizing the word but also pronouncing it. As a consequence, any effect to be obtained may have its locus not in the recognition stage but in the production stage of the task; that is, at some point in time between recognition and the onset of vocalization. In an ingenious experiment Balota and Chumbley (1985) demonstrated that this is not an imaginary danger. They compared performance in a standard naming task (where the participants read aloud the words as quickly as possible from the moment they appear on the screen) with performance in a delayed naming task. In the latter task the words are not to be read aloud immediately upon their presentation, but only when a particular signal (in this particular study a pair of brackets surrounding each word) is presented. Interestingly, a frequency effect materialized despite the fact that the participants clearly had sufficient time to recognize the words, also the infrequent ones, before the response signal appeared. This finding led the authors to conclude that word frequency also affects the output stage of word naming. Frequency effects in delayed naming have since been obtained more often, including once in a study examining word naming in L2 English (De Groot et al., 2002). The conclusion to be drawn from this is that, as in the lexical-decision task, the locus of an effect in the naming task may be uncertain.

In conclusion, the two tasks that researchers use most often to examine lexical access both tap task-specific processes that have little to do with word recognition per se and that may conceal the task-independent word recognition processes of interest. Grainger and Jacobs (1996) have visualized this state of the art with the Venn diagram presented in Figure 4.1.

[Figure 4.1: Venn diagram illustrating the concept of functional overlap. From Grainger and Jacobs (1996). Copyright © 1996 American Psychological Association.]

The process of interest, word recognition, concerns the area where the circles representing lexical decision and naming overlap. Grainger and Jacobs call this the "functional overlap" between these two tasks. In addition to word naming and lexical decision, Figure 4.1 refers to a third category of tasks popular in studies on lexical access: perceptual identification. This is a class of tasks in which the stimulus word to be identified is presented in what is called a "data-limited" way, or "masked" or "degraded": It is presented too briefly or too vaguely to be clearly seen and the participants are asked to figure out what the stimulus might be. For instance, the word stimulus might be presented for, say, 40 milliseconds and preceded and followed by a pattern mask (called a "forward" and "backward" mask, respectively) consisting of a sequence of hash marks that impede a clear

view of the target stimulus. Certain word characteristics, such as word frequency, may then be manipulated and the researcher tries to find out which words are relatively easy to identify under these data-limited circumstances and which ones are hard to recognize. A related technique is to start out presenting the stimulus very briefly at the onset of experimentation and to subsequently increase the presentation duration incrementally up until the moment the participant can identify it.

In most of the bilingual studies to be discussed in the sections to follow the researchers had their participants perform one (version of one) of these three tasks, sometimes in combination with the word priming technique introduced before (pp. 92–93): A word target is preceded by an earlier stimulus, the prime, and the effect of the earlier prime on accessing the target's lexical representation is assessed. Across these priming studies, primes and targets can both be clearly visible, the targets may be clearly visible but the primes masked, or both primes and targets may be masked. Also, a cross-modal priming technique has been employed, in which the prime is presented aurally and the target visually.

Two versions of the lexical decision task have been developed specifically to study bilingual word recognition. These are generalized lexical decision (also called language-neutral lexical decision) and language-specific lexical decision. In generalized lexical decision a "yes" response is required if the presented letter sequence is a word in either of the participant's two languages; if not, a "no" response must be given. In language-specific lexical decision a "yes" response must only be given to letter strings that are words in the language specified by the experimenter prior to the onset of the experiment. Whenever a word in the other language is presented, a "no" response must be given, just as to real nonwords.

One version of the perceptual identification methodology applied in bilingual studies is progressive demasking, a task first used in a monolingual study by Grainger and Segui (1990). In this task the visual presentation of a target word alternates with that of a mask. During this alternation process, the presentation durations of target and mask gradually increase and decrease, respectively. The participant presses a button the moment target identification occurs and subsequently reveals the word's identity. In a second type of perceptual identification study the targets are presented for a fixed duration but too short to be easily identified. Because perceptual thresholds vary between individuals, these experiments often involve a preliminary session in which the presentation duration for that particular participant is determined. A perceptual identification technique used in the study of bilingual speech recognition is gating. In gating, increasingly larger fragments of spoken words ("gates") are presented to the participants, who are asked to guess the words from which the fragments are derived. For instance, they first hear the first 40 ms of English pick, then the first 80 ms, and so on, until the fragment is correctly recognized as the word pick. The task may thus be regarded as an auditory variant of the progressive demasking task. The primary dependent variable is the "isolation point", the gate duration at which the target word is first guessed correctly and beyond which, at later gates, participants hold on to this identification. Other measures are how confident the participants are about the correctness of their guess, whether they might already have an idea about the language of the fragment before the corresponding word can actually be identified, and what the set of incorrect guesses prior to identification looks like.

In addition to the above three main categories of tasks, other tasks have been used but only occasionally. Like lexical decision, a number of them are binary classification tasks in which on each trial one of two possible responses is required. But instead of categorizing letter strings as words or nonwords, a categorization on some other stimulus feature is required. The details of these tasks will be given along the way. For now it suffices to say that in evaluating the results obtained with these tasks the researcher should always be aware of the possibility that a particular effect might have its locus not in the process of interest but in the stage in which the outcome of the relevant process is translated

into one of the two possible responses (the "post-lexical" stage referred to earlier). A similar warning is in order regarding a class of tasks, the so-called go/no-go tasks, that impel the participants to translate the outcome of the cognitive process of interest into an overt response on some trials ("go") but to withhold a response on other trials ("no-go"). An example is the language go/no-go task, in which bilingual participants have to produce an overt response whenever a stimulus is a word in their one language but to refrain from responding when it is a word in their other language.

Yet other studies infer the nature of cognitive processing involved during task performance from the participants' eye movements. Eye-movement recording is most often employed in reading studies, both monolingual and bilingual, in which participants read complete sentences or text, but the technique has also been exploited in studies on spoken word recognition: On each trial in a common application of the eye-movement tracking paradigm the participant hears the name of an object and has to identify the corresponding object among a larger set of objects on a visual display. The details of the pertinent studies and their rationale will be explained in due course and some more general features of the eye-movement recording methodology will be explicated in the next section, which deals with research methods used to study the processing of linguistic units larger than the word.

Sentence processing

Research that examines how language users process complete sentences makes use of a number of off-line tasks such as a grammaticality judgment task or a task that asks the participant to assign thematic (or "semantic") roles (e.g., agent or patient) to the noun phrases in a sentence. The goal of the latter type of studies is to find out what aspects of the surface form of sentences (e.g., word order or subject–verb agreement) play a role in determining the semantic relations between the words in a sentence. In addition to these off-line techniques, comprehension research exploits a number of rather sophisticated on-line measuring techniques that are gaining popularity because they detect cognitive activity the very moment it takes place or only slightly after. One of these is self-paced reading combined with a moving-window technique: A text appears on the screen in successive segments (the "windows"). The participant summons each subsequent segment by pressing a key. This segment then appears in the position next to where the previous segment was. A trial starts with the presentation of groups of dashes separated by spaces, each group serving as a placeholder for a word in the text to be presented on that trial and each dash representing one letter. When the participant subsequently presses the key, the first segment appears, say the first two words, replacing the corresponding placeholders while the placeholders for the remaining words remain on the screen. Upon pressing the key next, the first two words are replaced by their placeholders again and the next segment appears, taking the position of the corresponding placeholders. This continues until the whole text has been read. The segments' size is determined by the researcher and depends on the specific question posed. The interval between two successive key presses is measured and is regarded as the reading time for the current window.

In another version of the technique the window remains fixed in the same location on the screen and the successive segments all appear in this same place. In a third version the window does move, but the segments revealed in previous windows remain on the screen, a procedure that enables the reader to go back to earlier parts of the text. Finally, the rapid serial visual presentation (RSVP) technique presents segments (usually words) one by one at a fixed rate in the same location on the screen. The crucial difference between this technique and the three others just mentioned is that reading is not self-paced. Instead, the experimenter determines the presentation speed.

Self-paced reading techniques assume that the speed with which the participant progresses through the text reflects the speed of the comprehension processes involved and the mental processing load at every moment in time. A further

assumption is that the moment a new segment appears, the participant immediately tries to integrate it in the representation of the previous text (this is the so called “immediacy” hypothesis). Furthermore it is assumed that at every moment in time the mind processes the word on which the eyes are currently fixating (this is called the eye–mind hypothesis; see Haberlandt, 1994, for a discussion).

Self-paced reading as studied by means of the moving-window (and fixed-window) technique provides a single measure of what might be going on mentally: the total, first-pass reading time for the current window. As cogently argued by a number of researchers (e.g., Frenck-Mestre, 2005a, 2005b; Rayner & Sereno, 1994), this performance measure is rather coarse-grained because of the many-faceted processing operations that may have taken place in between two successive button presses. The single total reading time measure collected for every single segment of text does not allow the researcher to determine which one(s) of these processing operations are reflected in the observed latency. The technique is also somewhat unnatural because under normal circumstances readers do not get to see a text in a piecemeal fashion and the reader, not the experimenter, determines the unit of reading.

A more natural and more sensitive on-line technique is “eye-movement recording” or “eye tracking”, which registers the participants’ eye movements and eye fixations while they read a text presented on a computer screen, documenting what the participants are looking at and for how long. Such recordings have shown that the eyes do not move smoothly through the text but jump from one region to the next with ballistic movements (called saccades), fixating for some time on the region where they land (“fixations”). Once in a while the eyes jump back to an earlier part of the text, a movement called a “regression”. With this method, several measures are available. Consider the word catastrophe in the sentence The islanders were struck by an unprecedented catastrophe that completely paralyzed them. Now imagine that (after processing the earlier parts of the sentence) the eyes land on catastrophe for the first time and specifically on its first syllable ca and then jump to the third syllable stro after a while. “First fixation duration” is the time between the moment the eyes first land in a particular region of interest (a word or larger fragment), here ca, and subsequently move elsewhere, to stro in the example. “Gaze duration” is the sum of first and all consecutive fixations within a region before the eyes move to another region (on either side of the critical region). This measure is the equivalent of the reading times obtained with the self-paced reading technique described above. In our example gaze duration is the sum of the fixations on ca and stro. If only one fixation is made to a particular word or fragment, first fixation duration and gaze duration are equivalent. “Total fixation duration” or “total gaze duration” on a particular region is the sum of the gaze duration on a region and the fixation time involved in later fixations on this same region (Rayner & Sereno, 1994). So after first having fixated on ca and stro, the eyes may regress to an earlier region, say to den in the adjective unprecedented that precedes catastrophe, and subsequently land somewhere on catastrophe again, say on phe this time. If after this moment the eyes never get back to catastrophe again, the total fixation duration for this word is the sum of the fixations on ca and stro and phe. Gaze duration (ca plus stro) is also referred to as “first-pass” reading time, whereas all later fixations in the same area (phe) are called “second-pass” reading time (the latter thus equaling total fixation time minus first-pass reading time; Frenck-Mestre, 2005b). A final measure is the “regression path duration”: the time from first fixating a word until moving the eyes beyond that word, including the regression time. For example, if the eyes land on ca, stro, den, and phe, in that order, and then move on beyond catastrophe never to regress to this word again, the regression path duration equals the sum of the fixation durations of ca, stro, den, and phe. First fixation duration and gaze duration are assumed to reflect initial word recognition whereas regression path duration is thought to be a marker of higher-order reading processes such as semantic integration.
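The fixation-time measures defined above can be made concrete with a small computation. The following Python sketch is only an illustration and not a procedure taken from the cited studies: the `Fixation` type, the `reading_measures` function, the word indices, and all durations are hypothetical. It also assumes, for simplicity, that the target word is first fixated before any word to its right.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fixation:
    region: int        # index of the fixated word in left-to-right reading order
    duration_ms: int   # fixation duration in milliseconds


def reading_measures(fixations: List[Fixation], target: int) -> Optional[Dict[str, int]]:
    """Compute the fixation-time measures for one region of interest.

    Simplifying assumption: the target is first fixated before any region
    to its right, so the first run of fixations on it is the first pass.
    """
    first = next((i for i, f in enumerate(fixations) if f.region == target), None)
    if first is None:
        return None  # the region was never fixated

    # Gaze duration: the first fixation plus all directly consecutive
    # fixations on the target, before the eyes leave it in either direction.
    gaze = 0
    for f in fixations[first:]:
        if f.region != target:
            break
        gaze += f.duration_ms

    # Total fixation duration: every fixation on the target, whenever it occurs.
    total = sum(f.duration_ms for f in fixations if f.region == target)

    # Regression path duration: all fixations from the first one on the
    # target until the eyes move beyond it (to a later region).
    path = 0
    for f in fixations[first:]:
        if f.region > target:
            break
        path += f.duration_ms

    return {
        "first_fixation": fixations[first].duration_ms,
        "gaze": gaze,
        "total": total,
        "second_pass": total - gaze,
        "regression_path": path,
    }


# Hypothetical scanpath for the example sentence: ca and stro (catastrophe,
# word 7), den (unprecedented, word 6), phe (word 7 again), then word 8.
scanpath = [
    Fixation(7, 220), Fixation(7, 180), Fixation(6, 150),
    Fixation(7, 200), Fixation(8, 210),
]
measures = reading_measures(scanpath, target=7)
print(measures)
```

With these invented durations, first fixation duration is 220 ms (ca), gaze duration is 400 ms (ca plus stro), total fixation duration is 600 ms (ca, stro, and phe), second-pass time is 200 ms (phe), and regression path duration is 750 ms (ca, stro, den, and phe).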

In addition to these measures of fixation time, measuring the length of the saccades and the pattern of regressions provides valuable further information on what is going on mentally during reading (see Frenck-Mestre, 2005b; Haberlandt, 1994; and Rayner & Sereno, 1994, for details). Contrary to the multidimensional nature of the data yielded by the eye-movement recording technique, the self-paced moving-window technique only provides the unidimensional gaze duration measure. To illustrate the resulting differential sensitivity of the two techniques, the recording of eye movements during reading can discriminate between initial parsing and later backtracking when the initial solution turns out to be wrong, whereas the self-paced moving-window technique cannot distinguish between these two processing components but merges them into a single measure.

A final technique for studying on-line comprehension processes concerns the registration of “event-related potentials” (ERPs). ERPs are small voltage changes in the electroencephalogram (EEG), measured with electrodes placed on the scalp, and induced by a particular stimulus, the eponymous “event”. Such an event could, for instance, be one of the words in a visually or aurally presented sentence. One and the same stimulus may give rise to a number of different ERPs in the EEG, called “components”, that may vary in polarity: They can be positive (indicated by P) or negative (indicated by N). It is common practice to plot negative components upwards and positive signals downwards. Components that differ in polarity are assumed to be generated by different groups of neurons. In addition to varying in polarity, the components vary in “latency”, “amplitude”, and scalp distribution or “topography”. A component’s latency is the time interval, expressed in milliseconds, between the onset of the critical event and the moment the voltage change is maximal. For instance, P600 refers to a positive-polarity ERP that is maximally strong 600 milliseconds after the onset of the critical event, and N400 refers to a negative-polarity ERP that is maximally strong 400 milliseconds after the event’s onset. A component’s amplitude is the degree of the voltage change relative to some baseline. A component’s topography refers to where over the scalp’s surface the electrical activity aroused by the stimulus is detected. Components are often named after their polarity and latency (as in N400 and P600) or after their topography and polarity (e.g., LAN, left anterior negativity). Alternatively, they are named after the assumed underlying functional process in combination with their polarity (e.g., MMN, for mismatch negativity, and SPS, for syntactic positive shift). This latter practice involves the danger that ultimately a different functional process than the one suggested by its name may turn out to underlie the component.

The component’s qualitative features (that is, its topography and polarity) are assumed to reveal information on the neural structure and functional process involved (without identifying the exact locus of the neural structure in the brain; see pp. 407–411 for a more detailed description of the ERP technique). The quantitative features of a component, its amplitude and latency, are thought to reflect to what extent the underlying neural structure is involved and the time course of the functional process, respectively (see Hagoort & Ramsey, 2004, and Hahne & Friederici, 2001, for more details). The ERP methodology is especially well suited to provide information on temporal aspects of the various mental processes that go on during task performance, but provides a relatively imprecise measure of where exactly in the brain this processing takes place (because brain activity is detected at the scalp, not directly in the brain). In other words, the method is known to have a high temporal but low spatial resolution. As such, the ERP methodology complements two increasingly popular techniques of measuring brain activity during cognitive operations: positron emission tomography (PET) and functional magnetic resonance imaging (fMRI). Both measure metabolic changes correlated with neural activity and reveal where in the brain this activity takes place but not exactly when. In other words, they have a high spatial but low temporal resolution. These techniques will be explained elsewhere (pp. 411–413). A particularly attractive feature of all three of these

brain-imaging methods (ERP, PET, and fMRI) is that no overt behavioral response is required. By implication, no mental process that translates the process of interest (e.g., word recognition) into a behavioral response (e.g., a lexical decision and the associated manual or verbal motor response) is involved. Because such “nuisance” processes leave their own mark on the brain response, thus complicating its interpretation, doing without a behavioral response enables a (relatively) straightforward interpretation of the brain response in terms of the processes of interest. Still, many studies combine brain imaging with the registration of some overt behavioral response.

The ERP methodology is applied in many subdomains of study within the broad area of cognitive neuroscience and each of the associated cognitive functions studied (for instance, language, memory, visual perception, or attention) is characterized by a unique collection of relevant ERP components. Language comprehension has been shown to give rise to at least three ERP components, the presumably best known of which is the N400 already alluded to above. This component is elicited by any content (or “open-class”) word and its amplitude is inversely related to the ease with which the word can be semantically related to its context. It is therefore thought to reflect semantic integration processes (Brown & Hagoort, 1993; Kutas & Van Petten, 1994). Kutas and Hillyard (1980) were the first to report the effect in a study that employed visual presentation of the materials but it has since been shown to occur with aural presentation as well. The two other ERP components that have been shown to occur during language comprehension appear to reflect the structural analysis of sentences. The first is named after its topography and polarity: LAN (left anterior negativity). It is manifest early on in the signal, mostly 300 to 500 milliseconds after the onset of the critical event, but sometimes even earlier, between 100 and 250 milliseconds after stimulus onset; in this case it is called ELAN (early left anterior negativity; e.g., Hahne & Friederici, 1999, 2001). The second is the P600 mentioned above, which occurs over centro-parietal electrodes (e.g., Neville, Nicol, Barss, Forster, & Garrett, 1991; Osterhout & Holcomb, 1992). Friederici and her colleagues (Friederici & Kotz, 2003; Hahne & Friederici, 1999; Kotz & Friederici, 2003) have suggested that the ELAN and P600 are markers of two serial stages of syntactic analysis, an automatic structure-building stage occurring early, followed by one that is under the participant’s attention control and that reflects syntactic integration processes.

THE PROCESSING OF INTERLEXICAL HOMOGRAPHS AND HOMOPHONES

Introduction

A much-debated question in the study of monolingual language comprehension is whether the linguistic context of a word exerts an influence on the way this word is recognized. Much of the pertinent research was inspired by Fodor’s (1983) highly influential modularity of mind theory. Central in this theory is the concept of mental modules, information-processing devices that perform basic cognitive functions on incoming information, such as recognizing faces or words, and that are characterized by a number of characteristic features (not defining features, as they have often been taken to be; see Fodor, 1985, and Coltheart, 1999, for clarifications of the concept of a module). Some of these features are that modules tend to operate in a “domain-specific” way (which means that they only respond to input of a particular type, say faces or words), that they operate fast and mandatorily, and, most importantly in the theory, that they are “informationally encapsulated”. This latter concept means that modules are impenetrable by information delivered by “higher” cognitive processes such as thinking, problem solving, or making inferences, and cannot exploit the background knowledge these higher cognitive processes make use of.

Given these features of mental modules, scientists have wondered whether the word recognition system of fluent language users might be

considered a module as well: Word recognition in fluent readers and listeners is fast, mandatory (as is, for instance, shown by the occurrence of “Stroop effects”; see p. 255), and dedicated to the processing of one particular type of input. They have put this hypothesis to a test, focusing on the assumed information-encapsulation feature of modules. If the word recognition system is a module in Fodor’s sense, neither the linguistic context of a word nor extra-linguistic contextual information should affect the way it is recognized.

A sizable number of studies have tackled this issue by looking at the way lexical ambiguity is resolved in sentence context. Lexically ambiguous words have two (or more) meanings that are unrelated. They are also called “intra-lexical homographs”. An example is the English word bug, which can either refer to a type of insect or to a carefully concealed little microphone. The question posed in the ambiguity-resolution studies is whether all meanings of a word are initially activated or whether activation is restricted to the meaning that fits the context. Evidence of multiple, parallel activation, irrespective of the nature of the contextual information, is regarded as support for autonomy of word processing and, thus, for the notion that the mental lexicon operates as an informationally encapsulated module. In contrast, evidence that only the meaning compatible with the context is activated is seen as support for “interactive” word recognition; that is, the idea that the conceptual representation built from the linguistic context preceding the word permeates the lexicon with the effect that only the contextually appropriate meaning gets activated.

TABLE 4.1 The cross-modal priming methodology

Auditory context sentence:
Rumor had it that for years the government building had been plagued with problems. The man was not surprised when he found several spiders, roaches, and other bugs _ in the corner _ of his room.

Target word:
Contextually appropriate    Contextually inappropriate    Contextually unrelated
ANT                         SPY                           SEW

The context sentence, including the critical ambiguous prime word (bugs), is presented aurally. The target word (ANT, SPY, or SEW, depending on the condition) is presented visually. The target is presented at prime offset (immediately following bugs) or a few syllables later (following corner). Example materials used by Swinney (1979).

The majority of studies that attempted to resolve this issue employed a cross-modal sentence context version of the semantic priming methodology: The ambiguous word (e.g., bug) was presented in a sentence context and served as prime for a subsequently presented target. This target was either related to the contextually appropriate reading of the homograph (e.g., ant), to its contextually inappropriate reading (spy), or was unrelated to both readings (sew). Table 4.1 exemplifies the various conditions with materials employed by Swinney (1979) in one of the very first studies to use this methodology. In this specific study the cross-modal priming methodology was used: The context fragments, including the ambiguous word, were presented aurally and the targets were presented visually. A further important manipulation was the inter-stimulus interval (ISI) between prime and target; that is, the time interval between prime offset and target onset: The target was presented either immediately at the offset of the prime or a few syllables later. The results of this study supported the idea that initially multiple activation occurs and that only later is the contextually appropriate meaning selected on the basis of the contextual information: In the short ISI condition, targets related to both meanings of the ambiguous prime (both ant and spy) were processed faster than targets unrelated to either meaning (sew). In the long ISI condition, only the responses to targets related to the contextually appropriate meaning of the prime (ant) were facilitated as compared to the unrelated targets. These data suggest context-independent lexical activation and are thus in accordance with the view that

word recognition is a modular process in Fodor’s (1983) sense.

This same pattern of results has since been obtained more often (e.g., Onifer & Swinney, 1981; Tanenhaus, Leiman, & Seidenberg, 1979), providing additional support for what is called “multiple access”. However, much evidence to support other views on ambiguity resolution has also been collected, including the opposite view that context constrains lexical activation at the very initial stage of lexical access so that only the contextually appropriate meaning becomes activated (e.g., Simpson, 1981). Yet other views are that not context but the relative frequencies of the homograph’s meanings determine the order of access (Hogaboam & Perfetti, 1975), or that context and the relative frequency of the homograph’s meanings interact to resolve the ambiguity (e.g., Tabossi, Colombo, & Job, 1987). In a comprehensive review of the literature, Simpson (1994) concluded that “the range of results obtained in ambiguity studies suggests clearly that the extreme views of the lexicon as either fully autonomous or promiscuously interactive are not tenable” and that “the truth must almost surely lie somewhere in between and must be highly dependent on characteristics of the context and on characteristics of the tasks required of the subject” (p. 372).

More recent studies have addressed the question of whether or not the word recognition system is a mental module in the sense defined above using newer and extremely sensitive paradigms. The evidence collected in these studies favors the conclusion that word recognition is not immune to contextual information. For instance, in an ingenious ERP study Nieuwland and Van Berkum (2006) showed that the N400 effect that is usually elicited by words that violate local lexical-semantic constraints (such as clock in the sentence The girl comforted the clock) disappears when the anomaly is in fact supported by the discourse context (e.g., a girl talking to a clock about his depression). More strikingly even, locally correct predicates (e.g., salted in The peanut was salted) showed an N400 effect if the prior discourse had set up a mental state in the participants in which peanuts were considered animate beings (that, typically, are not salted). The prior context in question was a story about a peanut falling in love with an enchanting little almond. When the participants had been put in this state, the locally incorrect but globally correct in love in the sentence The peanut was in love failed to elicit an N400 effect. These and similar N400 findings (Van Berkum, Hagoort, & Brown, 1999; Van Berkum, Zwitserlood, Hagoort, & Brown, 2003) strongly suggest that the meaning representation of prior discourse has an immediate effect on word recognition (immediate, because the effect is already manifest 400 ms after the onset of the critical word). Dahan and Tanenhaus (2004) obtained converging support that context interacts with word recognition using the eye-tracking paradigm.

Bilingual studies

Bilingual studies on word processing have, in a modified form, adopted the rationale of the above monolingual studies on lexical ambiguity resolution. The general purpose of these bilingual studies is to find out whether lexical activation is encapsulated within a language (language-selective) or is not constrained by language (language-nonselective). In other words, if one and the same word means something different in a bilingual’s two languages, are both meanings activated when it is encountered in the input or is only the word’s meaning in the contextually appropriate language activated? In most cases, the cross-language ambiguous words tested were the interlexical homographs introduced earlier. As mentioned, these are words with the same orthographic form but different meanings in a bilingual’s two languages. Examples are provided in Table 4.2. Occasionally interlexical homophones have served as the critical stimuli. These have an identical phonological form but different meanings in the two languages. (Note that, if the two languages are written in the same alphabetic script and largely share the same set of grapheme-to-phoneme correspondence rules, cross-language orthographic and phonological homonymy will be correlated.)

TABLE 4.2 Examples of English–German and English–Dutch interlexical homographs

English–German        English–Dutch
BAD (“bath”)          BEER (“bear”)
BALD (“soon”)         BOOT (“boat”)
BRAND (“fire”)        BRIEF (“letter”)
FAST (“almost”)       DOOR (“through”)
GIFT (“poison”)       FEE (“fairy”)
GRAB (“grave”)        GLAD (“slippery”)
GUT (“good”)          KIND (“child”)
NUN (“now”)           ROOF (“robbery”)
RAT (“advice”)        STRAND (“beach”)
STERN (“star”)        WORST (“sausage”)

The German and Dutch meanings of the homographs are in parentheses.

A noticeable difference between the monolingual and bilingual lexical ambiguity studies is that the large majority of the former have presented the critical words (the within-language homonyms) in a sentence context, whereas most of the latter have presented the analogous stimuli (the interlexical homonyms) in isolation. The presentation of sentence contexts in the monolingual studies directly follows from the main question motivating these studies: whether or not lexical access is encapsulated in the sense that it is not affected by a linguistic context. However, the specific question regarding encapsulation posed in the bilingual studies is a different one: Is lexical access in the one language encapsulated in the sense that the other language is not involved in the process? Given this different focus, it makes sense to start out looking for evidence of language encapsulation in out-of-context studies. If evidence of language-nonselective processing were to be found under those circumstances, the next logical step would be to look at the more constraining case, where sentence context points towards one of the (interlexical) homonym’s meanings in particular. This is the research strategy that, without formulating it explicitly in advance, seems to have been taken, as may be concluded from the fact that all early studies have presented the homographs in isolation. Only just recently have studies on interlexical homonym processing (and in particular homograph processing) in sentence context begun to emerge.

A second noticeable difference between the monolingual and bilingual studies is that the former explicitly focus on the processing of an ambiguous word’s meaning because, as compared to unambiguous words, this appears to be the defining characteristic of an ambiguous word: that it has more than one meaning. Yet, as we will see, many bilingual studies on the processing of interlexical homographs seem not to have had this focus. This is most obvious from the fact that the model of bilingual word recognition that many researchers have turned to in explaining their effects, the bilingual interactive activation model (Dijkstra & Van Heuven, 1998), does not even include representations of word meaning (but can nevertheless account for much of the data). In the remainder of this section I will first summarize the data from out-of-context studies, explaining the methodological details along the way, and present a few of the relevant studies in more detail. Next, I will discuss the studies in which interlexical homonyms were presented in a sentence context, including some that have looked at how the brain responds to these stimuli (pp. 172–176). In a further section (pp. 177–181) I will present the models that have been developed to account for the assembled results.

Processing interlexical homographs and homophones out of context

Beauvillain and Grainger (1987) were the first to adopt the rationale of the monolingual ambiguity resolution studies and apply it to the study of bilingual lexical access by exploiting the dual-meaning characteristic of interlexical homographs (but studying the processing of such homographs out of context). The task they used was cross-language primed lexical decision: English–French bilinguals were presented with a set of stimulus pairs each consisting of a French prime word and an English target word (or nonword), the prime and target presented sequentially. They were instructed to read each prime and to then perform a lexical decision on

the subsequent target. The vast majority of the primes were words in French only, but a number of them were French–English interlexical homographs such as coin. The question of interest was whether the interlexical homographs would facilitate the processing of subsequent English targets that were related to the homographs’ English meaning (e.g., would money be responded to faster when it followed the “French” prime coin than when following an unrelated prime word). This turned out to be the case when the interval between prime and target was relatively short (150 ms). With a longer interval (750 ms) no such priming effect occurred. These findings suggest that, even though the majority of the primes were French only, both meanings of the interlexical-homograph primes were initially activated and that only at a later moment the contextually inappropriate meaning (the English meaning) was deactivated.

In retrospect, this evidence of language-nonselective processing seems hardly surprising because Beauvillain and Grainger’s study involved the presentation of both French and English words. It has been suggested that in this situation both lexicons become automatically activated and the participants were thus performing the task in a “bilingual processing mode” (e.g., Grosjean, 1998; see pp. 288–291 for a detailed presentation of language-mode theory). A stronger test of language-nonselective word recognition would involve the presentation of words in one language exclusively to see whether also under those circumstances the non-target language permeates processing. This is the research strategy that some of the later studies have pursued by not employing the cross-language priming methodology (in which all primes are in the bilingual’s one language and all targets are in the other language) and by not in any other way presenting language-mixed stimulus materials. Instead, they presented words in one language only (although some of them were interlexical homographs) and looked for evidence that, nevertheless, the experimentally absent language affected performance. However, as we will see, a number of the later studies, though not employing the cross-language priming methodology, have each in their own way and for their own reasons also included elements from the non-target language in the stimulus set, thus creating suboptimal conditions for testing the theoretical issue under study.

Given the fact that the most salient characteristic of an interlexical homograph is that it has different meanings in the bilingual’s two languages, a noteworthy aspect of many of these later studies (De Groot, Delmaar, & Lupker, 2000; Dijkstra, De Bruijn, Schriefers, & Ten Brinke, 2000a; Dijkstra et al., 1999; Dijkstra, Timmermans, & Schriefers, 2000b; Dijkstra, Van Jaarsveld, & Ten Brinke, 1998; French & Ohnesorge, 1995; Gerard & Scarborough, 1989; Jared & Szucs, 2002; Kerkhofs, Dijkstra, Chwilla, & De Bruijn, 2006; Von Studnitz & Green, 2002a) is that they did not explicitly look for evidence that both of the homograph’s meaning representations are temporarily activated upon its presentation. Instead, theoretically more neutral, they looked for evidence of co-activation in the non-target lexicon without making the a priori assumption that dual meaning activation would underlie such evidence. One of these studies (Kerkhofs et al., 2006) employed the semantic priming methodology, but now with primes and targets in the same language, and looking at both behavioral and brain responses (ERPs) to the critical stimuli. None of the other studies used the priming methodology. Instead, responses to (unprimed) interlexical homographs were compared with responses to (unprimed) unilingual control words; that is, words that exist in the target language only. The homographs and controls were matched on a number of variables that are known to affect processing difficulty, especially word frequency. The only difference between the two categories of words thus was the fact that only the homographs occur in both of the bilingual participants’ two languages. Therefore, any difference in the responses to homographs and controls to be obtained is likely to result from this one difference and will, one way or the other, have to be explained accordingly. Co-activation of (yet to be specified) representation units in the non-target language is a plausible source of this effect.
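The logic of the homograph-versus-control comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis used in any of the cited studies: the response times are invented, the pairing of each homograph with a frequency-matched control is simply assumed, and the names `items`, `homograph_minus_control`, and `describe` are hypothetical. A real analysis would also test the difference statistically.

```python
from statistics import mean

# Hypothetical item means (ms) from an English lexical decision task.
# Each homograph is paired with a control word that exists in the target
# language only; the pairing is assumed to be matched on word frequency.
items = [
    {"homograph": "brief", "homograph_rt": 640, "control_rt": 610},
    {"homograph": "glad",  "homograph_rt": 655, "control_rt": 620},
    {"homograph": "roof",  "homograph_rt": 600, "control_rt": 605},
    {"homograph": "kind",  "homograph_rt": 630, "control_rt": 610},
]


def homograph_minus_control(items):
    """Mean paired difference in ms; positive means homographs were slower."""
    return mean(it["homograph_rt"] - it["control_rt"] for it in items)


def describe(difference_ms):
    """Label the direction of the difference between homographs and controls."""
    if difference_ms > 0:
        return "homographs slower than controls"
    if difference_ms < 0:
        return "homographs faster than controls"
    return "no difference"


difference = homograph_minus_control(items)
print(difference, describe(difference))
```

Because the two word categories are assumed to differ only in whether the letter string also exists in the non-target language, a nonzero mean difference in either direction is what would point to involvement of that language.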

In most of these studies the visual lexical decision task was used, either its language-generalized (language-neutral) or its language-specific form. These lexical decision studies have shown differences in response time and number of errors between homographs and controls in the majority of cases. Depending on the exact demands of the task (language neutral or language specific) and the composition of the stimulus set (the presence or absence of words from the non-target language among the “nonword” letter strings), response times to homographs were either longer or shorter than to their controls. This difference (in either direction) is called the homograph effect. The size of the effect depended on a number of variables, most notably on the relative frequency of the homograph in the two languages. It was generally especially large when the homograph was more frequent in the non-target language than in the target language and when, at the same time, the participants performed the language-specific version of the task. If language nonselectivity occurs under at least some circumstances, as these data suggest to be the case, these frequency effects are to be expected: A highly activated node in the lexicon of the non-target language (a node that represents a high-frequency word) will interfere more with responding than a less activated node in the non-target system (a unit representing a low-frequency word). In some cases, however, no homograph effect was obtained, suggesting the possibility of language-selective processing under at least some sets of circumstances.

That specific constellations of task demands and stimulus set characteristics can affect the direction of the homograph effects may be readily understood if we assume that the participants have at least some control over the way they execute the task and adapt task performance to the task’s exact requirements. For instance, when the participants are instructed to perform language-specific lexical decision but no cost is incurred if, despite these instructions, they reconfigure the task and perform it in its language-neutral version (respond “yes” to any word, irrespective of language), they may in fact opt for this latter strategy. Such will be an option if no “nonwords” are included that happen to be words in the non-target language. Under these circumstances, a lexical decision may be based on the homograph making contact with either one of two lexical representations, one for each language, and there is no need to check the language membership of the contacted representation. On average, this process will come to a conclusion at an earlier moment in time than when only one lexical representation matches the stimulus word, as is the case with non-homographic control words. This difference in completion time is reflected in relatively fast responses to homographs. In contrast, under circumstances where such task reconfiguration (performing the language-specific version of the task as if language-neutral decisions had been asked for) would punish the participants with a high error score (many false positives to words of the non-target language), they are likely to adopt the requested language-specific processing mode. Under these circumstances homographs will be responded to more slowly than their controls, especially when a homograph’s meaning in the non-target language is more frequent than its meaning in the target language. The reason is that the representation of the more frequent non-target meaning will be accessed first, checked for language membership, and rejected, all of this causing a delay in processing the target meaning. But, importantly, in both cases, the fact that there is a difference between processing time for homographs and controls points to the activation of the non-target lexicon.

Analogous to the homograph effects, interlexical homophone effects have been obtained, for words that sound (approximately) the same in a bilingual’s two languages but are orthographically different and have a different meaning in these languages. Dijkstra et al. (1999) separated the contributions of cross-language orthographic and phonological overlap in a study where Dutch–English bilinguals performed an English-specific lexical decision task to visually presented letter strings. Interestingly, the two types of cross-language similarity turned out to have opposite effects: Orthographic overlap produced a facilitating effect (as compared to only English control words), whereas phonological overlap

produced an inhibitory effect. Doctor and Klein (1992) obtained similar results in an English–Afrikaans study, but in a more recent study (Haigh & Jared, 2007), French–English bilinguals responded faster to interlexical homophones than to control stimuli in an English lexical decision task.

Although it is not immediately clear why orthographic and phonological overlap occasionally produce opposite effects, both effects provide evidence of language-nonselective processing. Dijkstra and his colleagues hypothesized that these opposite effects may explain why null effects of interlexical homography have occasionally been observed (e.g., De Groot et al., 2000, Experiment 2; Dijkstra et al., 1998, Experiment 1; Gerard & Scarborough, 1989): A facilitating effect of homography and an inhibitory effect of homophony might have cancelled one another. Alternatively, the experimental design may have been too insensitive to detect a real but small effect of homography. However, because of their ad hoc character these explanations of the occasional null effect are not really satisfactory.

A more satisfactory, because less ad hoc, approach would be to take these null effects at face value and assume that under specific circumstances bilingual word recognition is in fact language selective. An interlexical homograph study by Jared and Szucs (2002) provides support for this hypothesis. These authors tested French–English and English–French bilinguals in a word naming task in which visually presented English words had to be read aloud. In one condition the English target words were preceded by a block of French words, to be read aloud in French; in a second condition, they were not. The French–English bilinguals (with the target language English as their weaker L2) named the English homograph targets more slowly than non-homographic control words in both conditions, evidencing language-nonselective processing in both conditions. In contrast, the English–French bilinguals (for whom target English was the dominant language) named the English homographs more slowly than the control words when they were preceded by a French naming block but not when they were not preceded by a block of French trials. These results suggest that under certain circumstances language-selective processing may occur. More precisely, they suggest that the stronger language can be immune to an influence from weaker L2. Furthermore, the pattern of results for the English–French participants demonstrates that under circumstances in which the activation of the non-target language is boosted somehow (here as a result of prior processing of a set of French words) the dominant language does not enjoy immunity. Converging evidence that language-selective processing may occur under some circumstances was obtained by Haigh and Jared (2007), who found support for language-nonselective processing when French–English bilinguals processed interlexical homophones in an English lexical decision task, but not when English–French bilinguals served as participants.

There is one more noteworthy outcome of Jared and Szucs’ (2002) study that deserves mention here, namely that these authors obtained an inhibitory effect of interlexical homographs, whereas Dijkstra et al. (1999) obtained faster responses for homographs than controls. Apparently, the exact task that is used, word naming or lexical decision, is a factor that determines the direction of the homograph effect. The inhibition in naming is plausibly due to the requirement of the naming task (but not the lexical decision task) that participants pronounce the target words. Homographs are typically pronounced differently in the two languages, and therefore the pronunciation process is likely to be frustrated by the co-activation of a different phonological form.

So far we have encountered two results that suggest that a participant adapts flexibly to the requirements of the task: Language-specific and language-neutral lexical decision appear to incite different processing strategies in the participants and the different direction of the homograph effects in lexical decision and naming suggests the same. To account for such flexible behavior, a number of theories on monolingual (e.g., Grainger & Jacobs, 1996) and bilingual word recognition (e.g., Dijkstra & Van Heuven, 2002; Green, 1998) have added some sort of

task-setting component onto the word recognition system proper (see p. 180 for details). Performance in any given experiment is explained in terms of the joint operation of both of these components. Given the flexibility of task performance, in trying to resolve the dispute between the language-selective and language-nonselective processing views it is advisable to look for converging evidence from a number of different tasks that are all assumed to tap word recognition to at least some extent. If the effect of interest turns up in all of these tasks, this would strengthen the conclusion that the effect is a real one, attributable to some feature of the actual word recognition process and not to some component specific to a particular task (see the notion of “functional overlap” introduced on p. 158; Grainger & Jacobs, 1996; Jacobs & Grainger, 1994).

Dijkstra and his colleagues (1999, 2000a, 2000b) have adopted this approach by looking at interlexical homograph processing in three other tasks: language decision, language go/no-go, and progressive demasking. Like lexical decision, language decision is a binary classification task, where on each trial one of two possible responses is required: The bilingual participants are presented with words of both languages and have to decide for each word to which language it belongs (see also Grainger & Dijkstra, 1992). In general, in go/no-go tasks the participants only respond to one type of stimuli and are asked to let a stimulus pass without responding if it is of a different type. In the language go/no-go paradigm used by Dijkstra et al. (2000b) the bilingual participants only had to respond when a presented word belonged to one of their languages, specified by the experimenter. As in the lexical decision studies (and in Jared & Szucs’, 2002, word naming study), in all cases, performance on interlexical homographs was compared to that on unilingual frequency-matched control words.

Figure 4.2 shows the data of the English (left) and Dutch (right) go/no-go conditions as obtained by Dijkstra et al. (2000b) in a study that tested Dutch–English bilinguals. The upper part presents the mean response times for homographs and controls; the lower part the corresponding percentages correct (hit rates). All data are split up over three groups of interlexical homographs: homographs with a high frequency of occurrence in English but a low frequency of occurrence in Dutch (HFE-LFD); homographs that occur frequently in Dutch but infrequently in English (LFE-HFD); and homographs with a low frequency of occurrence in both languages (LFE-LFD).

As shown, substantial differences between the homographs and controls occurred: Responding was faster and hit rates were higher for unilingual controls than for homographs. Furthermore, as in the lexical decision studies, the size of the homograph effects depended on the relative frequency of the homographs in the two languages: The effects were especially large when the task required a “go” response to a homograph that was more frequent in the non-target language than in the target language (LFE-HFD in the left panel; HFE-LFD in the right panel). This finding suggests that such a homograph activated its representation in the non-target language first, upon which the participants concluded they had to refrain from responding (an incorrect “no go”). Alternatively, this false start delayed correct responding, thus resulting in a long response time. Both language versions of the task thus suggest that the participants could not deactivate the non-target language, even though this would have improved performance dramatically. Not shown are the data of the progressive demasking and language decision tasks, but these also produced differences between homographs and controls.

Critics might object that under the particular circumstances of this study it would have been completely impossible not to process the “pass” stimuli (the words in the non-target language). Given that word recognition in fluent readers is an automatic process, how then could the participants have prevented these stimuli from accessing and activating the word recognition system? Should they have shut their eyes at the appropriate moments? But given that “go” and “pass” stimuli appeared at unpredictable moments, how would they have known what the appropriate moments were? The alternative assumption that the task set might work like some miraculous

FIGURE 4.2 Mean response times (in ms) and percentages correct (top and bottom, respectively) for interlexical homographs and controls in the English (left) and Dutch (right) go/no-go conditions. Three groups of homographs were included: frequent in English but infrequent in Dutch (HFE-LFD); infrequent in English but frequent in Dutch (LFE-HFD); infrequent in both English and Dutch (LFE-LFD). Data from Dijkstra et al. (2000b).

drug that temporarily freezes the non-target language system (but not the target language system) into lethargy is equally implausible. The researchers might riposte that such considerations would miss the exact point made by this study: That, apparently, the non-target language cannot be deactivated or switched off at will. And yet, these critics have a point: The experiment is not unlike the hypothetical event of getting the instruction in the morning that this day you should only recognize your left-hand neighbors, with the instructor trying to find out whether, if later that day you were to bump into Dolan Glowearth, your right-hand neighbor, he would indeed be a complete stranger to you. It thus seems that in this particular experiment, which used the language go/no-go task, the question of whether or not co-activation of the non-target language can be prevented at all has been stretched ad absurdum. It is one thing for a language not to be activated mentally when it is not around in the environment; it is another thing for it not to be when it is actually there, imposing itself upon the viewer at unpredictable moments.

At the beginning of this section I raised doubts about the suitability of Beauvillain and Grainger’s (1987) cross-language semantic priming study to investigate the present question, and it was for similar reasons: There too non-target language materials occurred among the stimulus materials, in that specific case in the form of primes to be read silently. Similarly, the above language-specific lexical decision experiments, in which words from the non-target language were presented to be treated as nonwords, may be criticized for the same reason. The general point to make here is that the question of whether bilingual word recognition can or cannot be language selective can best be addressed in truly unilingual experiments because, given the automatic nature of word recognition (in literate language users fluent in that language), stimuli from the non-target language will always activate the associated language subsystem. Therefore experiments in which the non-target language is physically absent provide a more conclusive test of the issue at stake.

Processing interlexical homographs in context

Two further studies that examined the processing of interlexical homographs in bilinguals provided relevant information on the way context may modulate word recognition in bilinguals (Elston-Güttler, Gunter, & Kotz, 2005; Paulmann, Elston-Güttler, Gunter, & Kotz, 2006). In these studies behavioral responses as well as event-related potentials (ERPs) to the critical stimuli were collected. Both studies employed a version of the semantic priming paradigm introduced earlier and the task to be performed by the participants was lexical decision (in the ensuing discussion of this work, responses to the nonword stimuli will be ignored). In both studies, German–English bilinguals were shown interlexical homographs (e.g., the words gift or bald, meaning “poison” and “soon”, respectively, in German) that were immediately followed by an L2 English target word expressing the German meaning of the homograph (e.g., the words poison or soon). An unrelated control condition was included in which English targets were preceded by unrelated English primes, matched on frequency with the homographs. The participants were instructed to perform L2 English lexical decisions on the targets. In one of these studies (Paulmann et al., 2006) the homograph primes were presented as isolated words, whereas in the other (Elston-Güttler et al., 2005) they were the final words of a complete L2 English sentence (see Table 4.3 for example materials).

In both studies, prior to the actual data collection phase of the study the participants watched one and the same originally silent film, now supplied with either a German or an English narrative, each spoken by a native speaker of the language in question. This manipulation was intended to create what the researchers called a “global language context”, which might bias the participants towards the language of the film fragment when performing the subsequent lexical decision task. We have encountered a similar context manipulation on p. 169, where a block of trials in the non-target language was or was not provided prior to

data collection on the target materials (Jared & Szucs, 2002).

TABLE 4.3
Example materials used by Elston-Güttler et al. (2005)

Condition    Sentence ending in the prime                  Target
Related      The woman gave her friend a pretty GIFT       POISON
Unrelated    The woman gave her friend a pretty SHELL      POISON
Related      Joan used scissors to remove the TAG          DAY
Unrelated    Joan used scissors to remove the LABEL        DAY
Related      His father’s head was turning BALD            SOON
Unrelated    His father’s head was turning TAN             SOON
Related      Jim had some problems with his GUT            GOOD
Unrelated    Jim had some problems with his AXE            GOOD

The sentence-final words (in capitals) serve as primes for the adjacent words, the targets (italicized). In the related condition the targets are the English translations of the German meaning of the corresponding primes (thus, Gift, Tag, Bald, and gut mean “poison”, “day”, “soon”, and “good”, respectively, in German). In the unrelated condition the targets are unrelated to the corresponding primes.

These two manipulations (the global language context manipulation and the comparison of a sentence context condition and an isolated word condition) are of particular interest because they have the potential to answer a question that may have bothered the reader up to this point: In the prototypical experiment discussed so far, the critical stimulus words, the interlexical homographs or homophones, were presented in isolation. But wouldn’t it have been more natural to embed them in context, which is how we encounter these words in natural speech and print? This context generally supports only a single reading of the homograph. In contrast, when a homograph is presented in isolation, there is no context to constrain its meaning, a state of affairs that might foster language-nonselective activation. The more recent literature on within-language ambiguity resolution (p. 164) indeed suggests that context may constrain lexical activation to the contextually appropriate word. A further variable that the investigators included was “block”, comparing the priming effects in the first half (block) of the experiment with those in the second half. This was done to find out whether a language expectancy created by the film fragment might change over the course of the experiment: The participants in the conditions where the language of the film fragment and of the lexical decision part of the experiment differ may gradually become aware of this mismatch and this might affect their performance.

Of the experimental conditions included in these experiments, the one (in Elston-Güttler et al., 2005) in which the homographic primes were embedded in all-English sentences (see Table 4.3) that, in turn, were preceded by the film spoken in English, provided the strongest test of language-(non)selectivity: This all-English condition involves a strong bias towards the English meaning of the homograph and implements language use under natural circumstances rather faithfully, indeed much more faithfully than the isolated word experiments do. If, nevertheless, a homograph effect occurs, showing that the homograph’s L1 (German) meaning was processed as well, this would provide a strong case that word recognition in naturalistic reading settings is language-nonselective.

As mentioned, in addition to behavioral responses, in these experiments ERPs were measured. The researchers focused on the N400 component in the ERP signal, time-locked to the target. The N400 to targets preceded by semantically related primes has been shown to be less negative than the N400 to targets following unrelated primes (e.g., Chwilla, Brown, & Hagoort, 1995). In agreement with the general interpretation of the N400 as reflecting semantic integration processes, it is thought that this difference in negativity (that is, the “N400 effect”) reflects the difference in the ease with which the meanings of pairs of related words on the one hand and pairs of unrelated words on the other hand can be integrated. In the single word prime study (Paulmann et al., 2006), targets preceded by related homographic primes (prime: gift; target: poison) were responded to faster than targets unrelated to their primes (prime: shell; target:

poison). An N400 effect was also obtained: The N400 to related targets was significantly less negative than the N400 to unrelated targets. The language of the film fragment that preceded the lexical decision part of the experiment did not modulate these effects. As mentioned, Paulmann and colleagues also compared performance in the first and second halves of their experiment. The semantic priming effects, both on latency and the N400, turned out to be equally large in both parts of the study. These data suggest an influence of the non-target language (L1 German) on L2 English and, thus, language-nonselective processing.

However, when the primes constituted the final word in sentences (Elston-Güttler et al., 2005; Table 4.3), the results were crucially different from those reported above. In this study, only the German film context condition showed a pattern consistent with language-nonselectivity, and this only during the first half of the experiment. In the three remaining conditions (second half, German film; first half, English film; second half, English film), neither response latency nor the negativity of the N400 differed between related and unrelated targets. Figure 4.3 shows the ERPs for the German film condition, both for the first half of the experiment (Block 1; upper panel) and the second half (Block 2; lower panel).

The N400 priming effect is the area between the ERPs for related targets (solid lines) and unrelated targets (dashed lines) around 400 ms after target onset (that is, around position 0.4 on the x-axis). As can be seen, this effect only occurred in the first half of the experiment. In the second half the N400 amplitudes for related and unrelated targets did not differ from one another (the brain signals overlap). The data for the English film conditions (not shown) are similar to those of the German/second half condition, suggesting the absence of a priming effect by the homograph’s German meaning and, therefore, indicating language-selective processing. In other words, in the experimental condition that mimicked natural language processing most faithfully (the condition that provided global language and sentence context information in the language of the target), only the contextually appropriate meaning of the homograph prime (the English meaning) was activated. It thus appears that language-nonselective processing is not a ubiquitous phenomenon.

Two further investigations provided additional evidence to support this conclusion (Conklin & Mauner, 2003; Schwartz & Kroll, 2006). Whereas Elston-Güttler and her colleagues demonstrated that language context can restrict activation to the target language, these two studies both show that relative strength of the two languages is a further factor to constrain nonselective access. Conklin and Mauner’s study closely resembled Elston-Güttler et al.’s (2005) investigation, the major difference being that it was run in both the participants’ dominant language (English) and their weaker language (French). When the sentences and targets were in L2 French, a relatedness effect materialized, suggesting that the homographic prime’s contextually inappropriate (L1) meaning was also activated. However, when the materials were presented in L1 English, the participants’ dominant language, no relatedness effect occurred, which pointed to language-selective processing.

Schwartz and Kroll (2006) employed a different design and a different task, the read-aloud (naming) task. In this study, testing Spanish–English bilinguals, the homographs appeared somewhere in the middle of all-English sentences (rather than occurring in sentence-final position; see Table 4.4 for examples), and the homographs (and their controls) themselves were the targets to which the participants responded. The words in a sentence were presented word-by-word by means of the rapid serial visual presentation (RSVP) technique. The target word (homograph or control) occurred in red (bold in Table 4.4), and this was the signal for the participant to read it aloud. The words preceding and following the target were presented for 250 milliseconds each and did not require an overt response. The target words were embedded in two types of sentences. In “high constraint” sentences, the target’s prior context strongly biased the homograph’s intended meaning (its meaning in the language of the experiment, L2 English). In “low constraint” sentences there was no strong bias towards the homograph’s appropriate meaning. (Note that

FIGURE 4.3 ERPs elicited by the critical targets. Solid and dashed lines represent the average voltage for related and unrelated targets, respectively, from 200 ms prior to target onset to 800 ms after target onset. The upper and lower panels show ERPs from the first and second half, respectively, of the experiment after viewing the German film version. Reprinted from Elston-Güttler et al. (2005). Copyright © 2005, with permission from Elsevier.

the experiment did not involve a bias towards one or the other meaning of the homograph.) Both balanced bilinguals and bilinguals with L1 Spanish clearly stronger than L2 English took part in the experiment.

TABLE 4.4
Example materials used by Schwartz and Kroll (2006)

From the beach we could see the shark’s fin pass through the water (H-HC)
We were a little nervous as we watched the fin of the shark go through the water (H-LC)
At the pond we could see a green frog jumping in and out of the water (C-HC)
The school children watched the frog jump from one rock to another (C-LC)

We vacuumed the rug and mopped the floor to help our parents (H-HC)
When we went inside we could see the floor was covered with dirt (H-LC)
We talked about the cows and chickens we saw when we visited the farm in New York state (C-HC)
During dinner our guest told us about the farm he was going to buy (C-LC)

I leave the bacon frying for a while in the pan to make it crisp (H-HC)
We went to the store to buy a pan for our kitchen (H-LC)
He gave me his number and I looked in my purse for a pen to write it down with (C-HC)
She left the house to buy a pen for her son (C-LC)

I sliced apples because I was going to bake a pie for my dinner guests (H-HC)
I rushed around because I was worried that the pie would not be ready in time (H-LC)
They did not tell the child that the delicious ham came from a pig because it was her favorite animal (C-HC)
They did not walk down the path because there was a pig blocking the way (C-LC)

Homographs (H) and control (C) words are printed in bold; HC = high-constraint sentence; LC = low-constraint sentence. Fin, floor, pan, and pie mean “end”, “flower”, “bread”, and “foot”, respectively, in Spanish.

If, while processing an interlexical homograph in context, its contextually inappropriate meaning (here, its Spanish meaning) is also temporarily active, this should result in slower processing of a homograph than of its control stimulus (the homograph effect). Such an inhibitory effect may be expected to be especially large in the low-constraint condition, in which the previous context does not strongly point at the targeted meaning. For balanced bilinguals, in neither condition was a homograph effect obtained: They responded equally fast, and made equally many errors, to homographs and controls in both the high- and low-constraint conditions. This same pattern held for the unbalanced bilinguals in the analyses with response time as the dependent variable. So far, the data suggest that context eliminates the homograph effect, even in the low-constraint condition, suggesting that the system processes homographs as if they were unambiguous words. However, the error analyses revealed a clear homograph effect in unbalanced bilinguals, which interacted with the sentence constraint manipulation: Considerably more errors were made to homographs than to control words, and this effect was especially large in the low-constraint condition. It thus seems that in these bilinguals the stronger language, Spanish, was co-activated with weaker English, especially when the context did not strongly point at the homograph’s targeted meaning. But what these results, just as those of the other studies discussed in this section, demonstrate more convincingly is that under certain circumstances language-selective processing occurs.

Conclusions

The studies discussed in the two previous sections show a mixed pattern of results from which, nevertheless, some general conclusions can be drawn. Many of them suggest language-nonselective processing of homographs (and, occasionally, homophones), and particularly many of those in which the homographs were presented in word lists. To draw from this the conclusion that bilinguals generally fail to block out the non-target language would, however, be premature because a number of studies have revealed factors that dampen the involvement of this language. One of them is the relative dominance of the two languages: If the non-target language

is the weaker of a bilingual’s two languages it is less likely to influence processing the target language than when it is the stronger of the two. Furthermore, immersing the participants in the non-target language prior to presenting the critical homograph stimuli (thereby plausibly increasing the activation level of the non-target language) increases the chance that this language permeates processing the target language. But most noteworthy, when the homographs were presented in circumstances that resembled natural language processing most faithfully (by embedding them in a sentence context and adding a global context in the same language as the sentence), the contextually inappropriate meaning of the homograph did not exert any effect whatsoever on processing its appropriate meaning. In later sections I will present converging evidence to suggest that there are limits to language-nonselective lexical processing. But first I will present two models of bilingual word recognition that were developed to account for the present interlexical homograph effects and for a number of other effects that suggest language-nonselective lexical access.

MODELS OF LANGUAGE-NONSELECTIVE LEXICAL ACCESS

The bilingual interactive activation model and its successors

The interlexical homograph effects obtained in a number of the isolated word studies presented above have been simulated successfully with the bilingual interactive activation (BIA) model developed by Dijkstra, Grainger, and Van Heuven (Dijkstra & Van Heuven, 1998; Grainger, 1993; Grainger & Dijkstra, 1992; Van Heuven, Dijkstra, & Grainger, 1998). It is a connectionist computational model of visual word recognition in bilinguals and is an extended version of McClelland and Rumelhart’s (1981) interactive activation model (IA) of monolingual visual word recognition. The model can simulate the above homograph effects in certain conditions despite the fact that in its original form it does not represent word meaning. This is surprising given the fact that the only obvious difference between interlexical homographs and unilingual control words is that the former but not the latter mean two different things across the bilingual’s two languages. Apparently, the effect cannot be attributed exclusively to the processing of meaning. A number of other cross-language effects obtained in studies on visual lexical access in bilinguals (to be presented further on pp. 181–183) have also been successfully simulated with BIA. Furthermore, the model has simulated the monolingual behavioral data that McClelland and Rumelhart modeled in their IA model, suggesting that BIA can be regarded as a true extension of IA. The left part of Figure 4.4 illustrates BIA’s structure and processing assumptions.

The model contains four levels of representation units or “nodes”, which represent visual letter features, letters, the orthographic forms of whole words, and language information, respectively. The bilingual’s two languages share the feature and letter nodes, whereas the word nodes are organized in language subsets, which are fully connected between the languages (see below). The layer of language nodes contains just two nodes, one for each language. The model is “interactive” in the sense that representations at one particular level can activate and inhibit representations at adjacent higher and lower levels. Activation comes about via excitatory connections (visualized by means of an arrowhead); inhibition via inhibitory connections (visualized by means of a bullet head). In addition to the excitatory and inhibitory connections between representation levels, the model assumes inhibitory connections between all orthographic word form nodes. As a consequence of this interconnectedness of nodes within the word level, the word nodes mutually inhibit each other’s activation. This is called “lateral inhibition”. Importantly, inhibitory connections also exist between word nodes from different languages (e.g., Thomas & Van Heuven, 2005). A visual word presented to the system first activates the feature nodes that correspond to the input. These, in turn, feed activation into the layer

of letter nodes, exciting the nodes for letters that contain the activated features and inhibiting the nodes for letters that do not contain these features. Similarly, activated letter nodes activate or inhibit word nodes, depending on the presence or absence of the corresponding letters (in the corresponding positions) in the words represented by the word nodes. For instance, when the English word sand is presented to the system, it will activate, via the feature level, the letter nodes s, a, n, and d, which in their turn will activate the word node corresponding to target sand, but also the nodes for similar words like hand, sane, and sank, which share most of their letters (in the same positions) with the actual stimulus. The word nodes for less similar words like salt, wind, and sin will also be excited by the activated letter nodes they share with the input word sand, but this activation may be nullified by the inhibitory effect of the mismatching remaining letters.

FIGURE 4.4 The bilingual interactive activation (BIA) model and the semantic, orthographic, and phonological interactive activation (SOPHIA) model of visual word recognition in bilinguals. Arrowheads represent excitatory connections. Bullet heads represent inhibitory connections. Adapted from Dijkstra and Van Heuven (1998) and Van Heuven and Dijkstra (2001).

Importantly, activated letter nodes will activate word nodes corresponding to words in both languages. So in a Dutch–English bilingual, the letter nodes that are activated following the presentation of sand will also excite, among others, the word nodes for Dutch words like zand (“sand”) and mand (“basket”), which also contain the majority of the activated letters. In their turn, activated word nodes transmit activation to the language node of the corresponding language, at which moment the latter starts to inhibit word nodes of the other language. All activated word nodes compete with one another in the recognition process, inhibiting each other through

lateral inhibition, until the activation level in one of them exceeds a so-called "recognition threshold". In the case of the example, this will most likely be the node for sand. It is at that moment that the input sand will be recognized as the word sand.

Whether and when the activation level of a word node reaches the recognition threshold is determined not only by the match between stimulus and word node in terms of shared letters but also by a number of other variables. One of them is the number of activated word nodes that compete with one another during the recognition process. A second is the level of activation in the word node when it is in its resting state. This "baseline" level of activation is not the same for all word nodes but differs as a function of when the corresponding word was last used ("recency") and, particularly important in the present context, the frequency with which it is used in natural language: Word nodes that represent frequent words are assumed to have a higher baseline level of activation in their resting state than the word nodes for infrequent words. As a consequence, when a frequent word is presented to the system, the recognition threshold of the corresponding word node will be reached at a relatively early moment in time. This account explains why we occasionally misread or mishear (and mispronounce) a less common word for a more common, similar word. But more to the point, it can also account for the fact that the size of the homograph effect depends on the relative frequency of a homograph's two readings, as reported above. How then are homograph effects explained in terms of BIA?

Dijkstra and Van Heuven (1998) assume two orthographic word node representations for interlexical homographs, one for each language. If a homograph is presented to the system, because of the perfect match of both its word nodes with the visual input, both of them will become highly activated. In contrast, when a non-homographic control word is presented there generally will be just one word node that reaches a high level of activation, namely the node that represents this control stimulus. The homograph effects obtained in the out-of-context studies are attributed to this difference in the activation state of the recognition system upon the presentation of a homograph on the one hand and a control word on the other hand. The consistent effects of the relative frequency of a homograph's readings in the two languages are attributed to differences in the resting-level activation of the homograph's two word form nodes: Because of its higher resting level of activation, the word node associated with the homograph's higher-frequency reading has a head start in the recognition process and, therefore, its activation will reach the recognition threshold relatively early, thus determining the response.

Dijkstra et al. (1999) considered a second interpretation of the homograph effect, now assuming that interlexical homographs are not represented in two separate word nodes but share one and the same word form node between the two languages, this node being connected differently to each of the two language nodes. They, however, regarded this set-up as implausible because simulations of this model produced results that deviated from the corresponding behavioral data: Whereas, as we have seen, the behavioral data often show slower processing of interlexical homographs than controls, these simulations always resulted in faster processing of the former (Thomas & Van Heuven, 2005).

It thus appears that the model can deal with the out-of-context homograph effects. But what about the fact that sound (dis)similarity of a homograph's two readings (Dijkstra et al., 1999) has also been shown to affect processing? The BIA model does not contain phonological (sound) representations and is therefore not equipped to explain these effects, nor can it explain the more substantial additional evidence (to be discussed on pp. 183–191) that during visual word recognition phonological memory nodes are activated as well. Similarly, what about the large likelihood that in more natural language use meaning plays a dominant role in the ambiguity resolution process? Yet, despite the truism that meaning assignment is the ultimate goal of any comprehension process, the BIA model does not include nodes that represent meaning. Finally, what about the fact that several experiments have
shown that sentence context can nullify the homograph effect, suggesting that lexical access is not language-nonselective under all circumstances? To account for the effects of phonology and acknowledging the central role of meaning in language processing, the builders of the BIA model have subsequently proposed an extended model that, in addition to the various types of nodes that represent orthography and language, also includes nodes that represent phonology and semantics (Van Heuven & Dijkstra, 2001). The model is elegantly named SOPHIA, to stress that semantics, orthography, and phonology are all included (the semantic, orthographic, and phonological interactive activation model). The right part of Figure 4.4 shows the model's components and general structure and the excitatory and inhibitory connections between the various types of representations.

Although not shown, the model represents orthography at a more detailed level than BIA does. Two additional layers are installed in between the original letter and word levels: a level of orthographic clusters and a level of orthographic syllables (the component "sublexical orthography" in Figure 4.4 thus summarizes three levels of units: one representing letters, a second representing letter clusters smaller than the syllable unit, and a third representing syllables). Phonology is represented in four analogous levels of nodes that represent phonological units of different sizes. The processing assumptions are to a large extent the same as those in BIA. Nodes at one particular level (e.g., orthographic syllables) can activate (via excitatory connections, indicated by arrowheads) and inhibit (via inhibitory connections, indicated by bullet heads) representations at adjacent levels (e.g., orthographic words and orthographic clusters). Representations within a particular orthographic or phonological component mutually inhibit one another via lateral inhibition. In contrast, orthographic units activate the corresponding phonological units and vice versa. For instance, if the written word bird has been presented to the system and has activated the orthographic word node for bird, the corresponding phonological form /burd/ will become activated as well.

SOPHIA is not merely an extended version of BIA, but differs crucially from BIA in one important respect: Whereas BIA contained both excitatory connections from each word node to the corresponding language node and inhibitory connections from a language node to all the word nodes of the other language, the latter connections have been removed in SOPHIA. Because these inhibitory connections served several important functions in the original model (see, e.g., Dijkstra, 2005; Thomas & Van Heuven, 2005), their removal from the model demands alternative solutions to explain a number of the effects that have been simulated with BIA. An example is the language-switching effect; that is, the finding that words preceded by words of the same language are responded to faster than words preceded by words from the other language. This effect (and especially the analogous switching effect in speech production) will receive more attention in Chapter 6. For now it suffices to say that the solution that is presently explored, in a model called BIA+, is to add a task/decision system (a control system) to SOPHIA's word identification system (Dijkstra & Van Heuven, 2002). Importantly in the present context, this new system is also thought to be responsible for the fact that, as we have seen above, the interlexical homograph effect varies with the specific demands of the task (e.g., language-neutral versus language-specific lexical decision; lexical decision versus word naming) and with the composition of the stimulus set and changes therein during the course of the experiment. In general, Dijkstra and Van Heuven (2002) propose that the task/decision system is sensitive to extra-linguistic influences (such as participant expectancies) whereas the word-identification system is only affected by linguistic sources of information such as lexical, syntactic, and semantic information. The above finding that homograph effects may disappear in sentence context is attributed to processes operating within the word-identification system.

These proposed changes in the system's architecture may at first sight seem relatively minor ones but in fact involve no less than a landslide shift in the authors' views on what causes
the homograph effects and their magnitude. Given that the original BIA model only encompasses a word identification component (a mental lexicon), the effects of interlexical homography were all attributed to processes of activation and inhibition in this lexicon and to its structural characteristics (threshold settings; facilitatory versus inhibitory connections between nodes; dual nodes for interlexical homographs). This is what is sometimes referred to as a "lexicon-internal" locus of control (e.g., Von Studnitz & Green, 1997, 2002a, 2002b). In BIA+, the effects are partly attributed to what Von Studnitz and Green have called "external control", that is, the ability of a lexicon-external task/decision system to respond flexibly to a number of variables such as the specific requirements of the task. A similar external control system is a central component in a further influential model of how bilinguals' lexical performance is modified by the task context, namely, Green's (1998) inhibitory control model (see pp. 307–308 for details). This is no coincidence because in developing BIA+ its builders have been strongly influenced by Green's model, so much so that the two models now strongly resemble one another.

So far the evidence that lexical access is language-nonselective, at least in out-of-context studies, has come primarily from studies that examined how visually presented interlexical homographs are processed. The skeptical reader might therefore not be convinced. After all, interlexical homographs only exist for pairs of languages that employ the same alphabetic script and even within such language pairs the total number of interlexical homographs one can come up with is usually quite small. Could it be that the BIA model has been developed to account for a rather idiosyncratic phenomenon, one that has little to do with how we recognize less exotic words? The homograph data are, however, not the only evidence to support the model. A second cornerstone of the model concerns the phenomenon that the memory representations of so-called "neighbors" of the stimulus word are excited by the stimulus, irrespective of the language to which they belong. This is the main topic to be addressed in the next section. In a further section I will present evidence that makes it clear that the ultimate model of bilingual visual word recognition cannot do without a phonological component, a conclusion that has led to the development of SOPHIA.

Further support for the bilingual interactive activation model

According to the above description of BIA's functioning, complete form overlap between the presented input and the information specified in orthographic word nodes is not required for a word node to become activated upon the presentation of a word input: The stimulus sand not only activates the node <sand> but also the (same-language) nodes <hand> and <sank> as well as the (other-language) nodes <zand> and <mand>, among others. The relevant evidence has been gathered in studies that investigated the effect of a word's neighborhood characteristics on visual word recognition.

A word's neighborhood is defined as the set of words that share a substantial part of their (orthographic and/or phonological) form with the target word (where "substantial" in most studies has been taken to mean three letters out of four or four letters out of five). Monolingual neighborhood studies have shown that the time to recognize a visually presented word is influenced by the number and frequency of orthographically similar words, its "neighbors" (Andrews, 1989; Grainger, 1990; Grainger, O'Regan, Jacobs, & Segui, 1989; Grainger & Segui, 1990). This finding implies that word recognition does not take place independently from the rest of the lexical system but that a written word activates a whole set of orthographic word nodes in memory and not just its own representation. This conclusion and the associated methodology provided bilingual researchers with an additional means to address the question of whether word recognition is language-nonselective. Specifically, they posed the question of whether neighborhood effects extend beyond the language of input: Does a visually presented word activate orthographically similar words in both of a bilingual's two languages? As we have seen, the BIA model assumes
this to be the case. The results from a set of cross-language neighborhood studies provided the ground for this assumption. I will illustrate the logic of these studies in discussing two of them. The results of both have been successfully simulated with the BIA model (Dijkstra & Van Heuven, 1998), suggesting that the model's architecture and processing assumptions are correct.

Grainger and Dijkstra (1992) had French–English bilinguals perform an English lexical decision task on three types of English target words (and on pseudowords), presented visually. The words in one of these groups, called "patriots", had many more neighbors in the target language English than in non-target French. A second group consisted of English words with many more French than English neighbors ("traitors"). The third group contained English words with approximately the same number of neighbors in both languages ("neutral"). Because word frequency is known to have a large effect on lexical decision time (see above for an explanation of why this is so), the three groups of English target words were matched on word frequency. The data, shown in Figure 4.5, demonstrated an influence of the relative number of neighbors in the two languages: Lexical decision times were longest in the "traitor" condition, intermediate in the "neutral" condition, and shortest in the "patriot" condition. Van Heuven et al. (1998) obtained similar results and showed that the effect extends to other tasks, in their case, progressive demasking.

A further study made the important new point that a visually presented word does not have to be consciously perceived to trigger its word node and the nodes representing its within- and cross-language neighbors into activity. In a French–English study, Bijeljac-Babic, Biardeau, and Grainger (1997) used the masked priming methodology: Each word (or pseudoword) target was preceded by a word prime that was presented too briefly to be identified by the participants, and the participants' task was to make lexical decisions to the targets, which were clearly visible. The targets were all French words (or French-like pseudowords). The primes and targets were orthographically similar or dissimilar and the primes belonged to the same language as the targets or to the other language (e.g., French prime, French target: soin–soif vs. huit–soif; English prime, French target: soil–soif vs. gray–soif). Two groups of French–English bilinguals were tested, one group of balanced bilinguals and a second with L1 French stronger than

Figure 4.5. Mean lexical decision times (in ms) as a function of relative number of neighbors in the bilingual's two languages. Patriots are words that have more neighbors in the target language than in the non-target language. Neutral words have about equally many neighbors in both languages. Traitors have more neighbors in the non-target language. From Grainger and Dijkstra (1992). Copyright © 1992, with permission from Elsevier.
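
The stimulus categories in Figure 4.5 rest on a simple counting procedure, which can be sketched in a few lines of Python. The sketch is illustrative only. It assumes the operationalization mentioned earlier (a neighbor is a word of the same length that differs from the target in exactly one letter position), it uses toy lexicons made up of the Dutch and English example words from this chapter rather than the French–English materials of Grainger and Dijkstra (1992), and the function names and the `margin` parameter are hypothetical.

```python
from typing import List


def is_neighbor(a: str, b: str) -> bool:
    """True if a and b have equal length and differ in exactly one letter position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def neighbors(target: str, lexicon: List[str]) -> List[str]:
    """All orthographic neighbors of target in lexicon, sorted alphabetically."""
    return sorted(w for w in lexicon if is_neighbor(target, w))


def classify(n_target: int, n_nontarget: int, margin: int = 1) -> str:
    """Label a word by the relative size of its two neighborhoods.

    margin is a hypothetical cut-off: the studies speak of "many more"
    neighbors in one language, without the sketch knowing the exact criterion.
    """
    if n_target - n_nontarget >= margin:
        return "patriot"
    if n_nontarget - n_target >= margin:
        return "traitor"
    return "neutral"


# Toy lexicons containing only example words used in this chapter.
ENGLISH = ["sand", "hand", "sane", "sank", "salt", "wind"]
DUTCH = ["zand", "mand"]

english_n = neighbors("sand", ENGLISH)  # within-language neighbors
dutch_n = neighbors("sand", DUTCH)      # cross-language neighbors
print(english_n, dutch_n, classify(len(english_n), len(dutch_n)))
```

With these toy lexicons, salt and wind are not counted as neighbors of sand because they mismatch in two letter positions, in line with the description of "less similar words" above. Real stimulus selection would of course use full lexical databases for both languages.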

L2 English. In addition, a monolingual French control group was tested. Inclusion of this group enabled testing of the hypothesis that any effect that emerged for a prime in the non-target language would be due to co-activation of lexical representations in the non-target language and not merely to the orthographic similarity of prime and target supporting target processing in a more peripheral, non-lexical, stage of processing: If the effects were non-lexical, the between-language effects should also occur for the French monolinguals. However, if competition between lexical elements is the source of the effects, the monolingual French group should process targets following orthographically similar and orthographically dissimilar English primes equally fast. After all, these participants' mental lexicon does not contain any representations of English words to cause competition.

The results were clear-cut: In all three groups, when prime and target were words from the same language, French, targets preceded by orthographically similar masked primes were responded to more slowly than targets preceded by dissimilar primes. In contrast, when prime and target were from different languages, this inhibitory effect of orthographically similar primes only showed up in the bilinguals. This combination of results suggests that the source of the masked orthographic priming effect was indeed lexical. At the same time it suggests that, in bilinguals, lexical co-activation of neighbors extends to orthographically similar words in the non-target language. A further finding of interest was that the between-language effect was larger for the balanced bilinguals than for the group less proficient in L2 English (both groups performing the task in L1 French). This finding again suggests that the level of co-activation in a non-target language depends on the bilingual's degree of command over this language (see also pp. 174–176). Finally, these results demonstrate that even a masked prime can trigger the word identification system into activity, because this is the only way it can be explained that target processing is affected at all by the prior presentation of a prime. The effects can then be understood by assuming that in the case of orthographically similar primes and targets there is especially fierce lexical competition that needs to be resolved, delaying the response.

PARALLEL PHONOLOGICAL ACTIVATION IN TWO LANGUAGES

Evidence from same-alphabet bilingualism

Many studies have provided evidence of simultaneous activation of phonology in a bilingual's two languages. They have done so in various ways and, between them, they show that parallel phonological activation not only occurs in same-alphabet bilingualism but also in forms of bilingualism involving languages that use different alphabets. I will first discuss the evidence from same-alphabet bilingualism (this section) and proceed with a discussion of the cross-alphabet studies. Many of these studies were inspired by a number of monolingual studies that have shown that the moment a printed word hits the visual word recognition system, its phonological form is assembled automatically by applying the language's spelling-to-sound (or grapheme-to-phoneme) conversion rules (e.g., Frost, 1998; Jared, Levy, & Rayner, 1999; Van Orden, 1987; Van Orden, Johnston, & Hale, 1988). Interestingly, even in the reading of non-alphabetic scripts such as syllabic Japanese and ideographic Chinese, the written words automatically activate a sound code (Erickson, Mattingly, & Turvey, 1977; Perfetti, Liu, & Tan, 2005; Perfetti & Zhang, 1995; Tzeng, Hung, & Wang, 1977) despite the fact that these scripts are not based on grapheme–phoneme associations. As with alphabetic scripts, syllabic scripts reflect the associated languages' sound system: Their written symbols correspond to syllables, which are units of speech. The fact that these scripts also manifest automatic activation of phonology is therefore not so surprising. However, the relation between the printed characters of ideographic Chinese and phonology is, albeit not totally absent, much more opaque. The finding that in that case the printed symbols also activate
phonological codes thus suggests that phonological activation plays a central role in written language processing in general (see Frost & Katz, 1992, for a complete volume dedicated to the relation between orthography and phonology in various scripts). But more to the point in the present context is that, in bilinguals, one and the same visually presented word may lead to parallel spelling-to-sound coding in both languages, as the ensuing discussion will illustrate.

Employing the lexical decision task, Nas (1983) obtained early evidence that during visual recognition of L2 words bilinguals assemble these words' phonological forms just as native speakers of this language do. Although he was primarily interested in how bilinguals process words in their L2, part of the critical evidence was based on an analysis of how they process nonwords. It is these data that I will present here. The participants were Dutch–English bilinguals who were presented with L2 English stimulus materials. The nonwords were all letter sequences that obeyed the phonological rules of English and looked like common English words. Half of them were so-called "cross-language pseudohomophones": When pronounced according to the grapheme–phoneme conversion rules of L2 English, they sounded like real L1 Dutch words (e.g., the pseudohomophones snay and roak sound like the Dutch words snee and rook, meaning "incision" and "smoke", respectively). The remaining nonwords were non-homophonic controls (e.g., prusk or floon). Correct "no" decisions to cross-language pseudohomophones took longer than to non-homophonic nonwords, and more errors were made to the former. These findings suggest that the participants generated the phonological forms of the presented nonwords, applying the L2 English grapheme–phoneme conversion rules, and that the phonological forms of the pseudohomophones thus generated contacted those of the similar-sounding Dutch words. This apparently created a tendency to respond "yes" (mistaking the nonword for a word), which was either suppressed, slowing down the response, or was given in to, resulting in an error. Of course, if the nonwords are phonologically coded, the words will be as well, because up until the moment of lexical access there is nothing that distinguishes the type of nonwords used (obeying the phonological rules of English) from words. In other words, these nonword data inform L2 word processing and show that non-native speakers of English use the English spelling-to-sound rules to generate phonological codes for English written words.

Jared and Kroll (2001) took this research an important step further by posing the question of whether bilinguals apply spelling-to-sound conversion rules in both of their languages in parallel or whether, instead, only the set of spelling–sound correspondences of the target language is activated upon stimulus presentation. They examined this question using the word naming task, testing English–French and French–English bilinguals. Three types of English stimulus words were presented visually, the types differing from one another with respect to their neighborhood characteristics. One type of word contained a "word body" (that is, the medial vowel plus final consonants) that is always pronounced in the same way in English. Examples are the words drip, gulp, and gosh, containing the bodies -ip, -ulp, and -osh. These words are said to have English "friends" only. The words of the second type had inconsistently pronounced bodies: bodies that, in English, are pronounced in more than one way. These words are said to have English "enemies". Examples are steak and bead, where -eak and -ead can also be pronounced differently in English, as in beak and head. Studies on English word naming have generally obtained longer naming latencies for words with spelling patterns that are inconsistently pronounced in English than for consistent words (Glushko, 1979; Jared, McRae, & Seidenberg, 1990), suggesting that the different pronunciations compete during the naming process. The third type of word had French enemies, containing bodies pronounced differently (from the English pronunciations) in French (-ait as in the English bait versus -ait as in the French fait). This type was included to find out whether longer naming latencies would also be obtained for English words with French enemies. Such a result would suggest that also
words from the non-target language take part in the naming competition, thus revealing that parallel application of both the English and French spelling-to-sound conversion rules takes place.

Jared and Kroll (2001) included a couple of variables that might modulate the involvement of the French competitors. For instance, they presented a block of French filler words, to be named in French, in between two blocks of English naming trials. The question addressed with this manipulation was whether French must have been recently activated for competition by French enemies to occur in English word naming. Jared and Szucs' (2002) homograph study discussed before (see p. 169) shows that such "warming up" of the non-target language can indeed increase the overall competition in the bilingual language system. Another variable they manipulated was the participants' relative fluency in the two languages.

In agreement with the earlier monolingual neighborhood studies (e.g., Jared et al., 1999), longer latencies and more errors occurred for English words with English enemies than for English words containing spelling bodies that are always pronounced the same way in English, but the cross-language effects were more mixed. The data showed especially strong interference effects of French enemies when English naming followed the French naming session. The effects were generally modest, and, in fact, most of the time non-existent, when English naming preceded French naming. Figure 4.6 illustrates this effect of French and English block order, collapsing over a group of participants more fluent in French and a group more fluent in English (only response time data are shown). The corresponding data of the English enemies condition are shown for comparison.

As shown, when the English words with French enemies were named before the block of French words, their response times were as long as those for English words that do not have French enemies. However, the naming of English words with French enemies was slowed down considerably when it followed a session of French naming. This suggests that recent activation of the other language system is a prerequisite for language-nonselective, parallel spelling-to-sound coding to occur. The relative fluency of English and French also modulated the effect of French enemies: The non-target language especially interfered with naming (demonstrating parallel spelling-to-sound coding) if it was the stronger language of the two. This finding converges with similar results presented above. Jared and Kroll (2001) concluded that spelling-to-sound conversion rules of a bilingual's two languages can

Figure 4.6. Mean naming times (in ms) for English words as a function of word type and presentation order: before or after a block of French trials (see text for details). Data from Jared and Kroll (2001).
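
The word types compared in Figure 4.6 follow from a simple rule over spelling bodies, which the Python sketch below makes explicit. It is a toy illustration under stated assumptions, not the stimulus-selection procedure of Jared and Kroll (2001): the mini-lexicon contains only the example words mentioned in the text, the body pronunciations are rough transcriptions added for illustration, the body is approximated as everything from the first vowel letter onward, and all names (`LEXICON`, `spelling_body`, `word_type`) are hypothetical.

```python
VOWELS = "aeiou"


def spelling_body(word: str) -> str:
    """Approximate the word body as everything from the first vowel letter onward."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[i:]
    return word


# (language, word) -> rough pronunciation of the word body (illustrative only).
LEXICON = {
    ("en", "drip"): "ɪp",
    ("en", "gulp"): "ʌlp",
    ("en", "gosh"): "ɒʃ",
    ("en", "steak"): "eɪk",
    ("en", "beak"): "iːk",
    ("en", "bead"): "iːd",
    ("en", "head"): "ɛd",
    ("en", "bait"): "eɪt",
    ("fr", "fait"): "ɛ",
}


def word_type(word: str, target_lang: str = "en") -> str:
    """Classify a target-language word by the enemies of its spelling body."""
    body = spelling_body(word)
    own_pron = LEXICON[(target_lang, word)]
    within = cross = False
    for (lang, other), pron in LEXICON.items():
        if (lang, other) == (target_lang, word):
            continue  # skip the target word itself
        if spelling_body(other) != body or pron == own_pron:
            continue  # different body, or same body pronounced alike (a friend)
        if lang == target_lang:
            within = True   # same spelling body, different sound, same language
        else:
            cross = True    # same spelling body, different sound, other language
    if within:
        return "within_language_enemies"
    if cross:
        return "cross_language_enemies"
    return "consistent"


for w in ("drip", "steak", "bead", "bait"):
    print(w, spelling_body(w), word_type(w))
```

In this toy lexicon drip has no enemies at all, steak and bead have English enemies (beak, head), and bait has only a French enemy (fait), mirroring the three stimulus types of the study.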