

(auditory and visual) activate neighboring regions of the map, and Hebbian learning ensures that these regions form homogeneous crossmodal connections. This explanation can thus be regarded as a form of architectural/computational innateness (Elman et al., 1996). The TC is the result of associative learning within specified microcircuitry of the brain (probably somewhere in the infero‐temporal cortex) and suggests that no cognitive preprogramming is needed to explain this word‐learning constraint.

Mutual Exclusivity

New words from old

An efficient strategy available to young word learners is to use their existing vocabulary to help them decipher the meaning of new words. An infant might see two objects, say, a shoe and a key, while knowing only the name for the shoe. Upon hearing the word "key," she might decide that it refers to the key, ruling out the shoe as a potential referent, because she already knows what the shoe is called. This strategy is commonly known as mutual exclusivity (ME).3 Although the inference is not fool‐proof – objects always have at least two names whether you are monolingual (Fido/dog/mammal/animal) or multilingual (Fido/dog/chien/hund) – researchers generally agree that young children exploit ME to acquire new words (Halberda, 2003; Markman, 1989; Merriman & Bowman, 1989). Yet, the age when ME is first used, and the nature of the underlying mechanism that drives ME, remains a matter of dispute (cf. Mather & Plunkett, 2010, 2012).

Markman, Wasow, and Hansen (2003) have shown that 15‐ to 17‐month‐old infants, upon hearing a novel word, will search for an alternative object if the only object they can see is name‐known. On the basis of these findings, they argue that ME is operative at 15 months and may contribute to the spurt in productive vocabulary often observed during the second half of the second year of life (e.g., Benedict, 1979; Goldfield & Reznick, 1992; Mervis & Bertrand, 1994). Similarly, Halberda (2003) has shown that 17‐month‐olds will look significantly longer at a name‐unknown object image than a name‐known one upon hearing a novel word. In contrast, Mervis and Bertrand (1994) have argued that the ability to select a name‐unknown object in response to a novel word in 16‐ to 20‐month‐olds only appears at the onset or after the vocabulary spurt has begun. And Merriman and Bowman (1989) argued that ME is not available to word learners until they are over 2 years, as illustrated by a series of object selection experiments. A review by Merriman, Marazita, and Jarvis (1995) suggested that only toddlers aged over 2.5 years will reliably demonstrate ME.

Differing task demands may be responsible for these reported age differences in the use of ME. The failure of Mervis and Bertrand (1994) to find ME in prevocabulary spurt infants may be due to the processing demands of the task, as infants were presented with several name‐known objects. In Markman et al. (2003), infants were presented with only a single name‐known object. Halberda (2003) used looking time, argued to be a more sensitive measure of processing than the object selection measures used in Merriman and Bowman (1989) and Mervis and Bertrand (1994).

Associative Approaches to Lexical Development 547 In considering this earlier research, it is important to keep in mind that an ME response to a novel word, that is, preferential attention to the novel object, is not itself evidence that an association has been formed with the name‐unknown object. The mechanism underlying ME might guide attention toward the name‐unknown object, but only subsequently might this lead to learning. Studies such as Halberda (2003), Markman et al. (2003), and Merriman and Bowman (1989) demonstrate the ME response, but do not test for formation or retention of any association. Other studies, for example, Liitschwager and Markman (1994) and Mervis and Bertrand (1994) test comprehension, but their testing procedures are problematic. For example, the test objects might differ in familiarity, or the test might compare a trained word with a novel control word. In either case, confounds created by differences in stimulus nov- elty could influence responding. To determine whether ME makes a direct contribu- tion to vocabulary development, both a test of ME and a carefully controlled test of comprehension are required. Mather and Plunkett (2011) provided such a test by exposing 16‐month‐olds to two novel objects, first independently in two ME sce- narios, each involving a familiar object, one of the novel objects, and a novel label (Figure 21.4A), and subsequently in a test with only the two novel objects and either of the novel labels (Figure 21.4B). Infants looked systematically longer at the appro- priate novel object upon hearing one of the novel labels, indicating that an association had been formed between that object and the novel label during the initial ME sce- nario. Interestingly, if the novel label sounded similar to an existing word (e.g., “pok’ – similar to “clock”), infants failed to demonstrate a systematic looking preference for the appropriate novel object. These results indicate that prevocabulary spurt infants can acquire new word‐referent associations through ME and that label novelty is not all or nothing but graded. In addition to disagreement about the timing of the onset of ME, a second point of dispute is the nature and ontogenesis of the mechanism underlying ME. ME is commonly argued to involve some form of reasoning. Some theorists (Golinkoff, Mervis, & Hirsh‐Pasek, 1994; Mervis & Bertrand, 1994) argued that infants use a Novel‐Name‐Nameless‐Category principle that operates specifically on linguistic input and states that “novel terms map to previously unnamed objects.” Alternatively, ME could be the outcome of a more general cognitive process not specific to lan- guage. Markman’s (1989, 1992) ME principle leads infants to reject second names for already‐name‐known objects, as part of a general preference for one‐to‐one map- ping regularities. More recently, Halberda (2003) proposed that ME is driven by syllogistic reasoning of the form “A or B, not A, therefore B,” where A is the name‐ known object, and B is the name‐unknown object. What all these explanations have in common, aside from supposing an inferential process, is the assumption that the mechanism underlying ME operates on the basis of the lexical status of objects, that is, that they have a name. However, in most exper- iments on ME, the lexical status of the objects is confounded with their novelty. That is, the name‐known object is familiar, whereas the name‐unknown object is typically novel (e.g., Halberda, 2003; Mervis & Bertrand, 1994). 
This leaves open the possi- bility that infants displaying ME could be responding on the basis of object novelty, rather than lexical status. In order to evaluate these alternatives, Mather and Plunkett (2012) presented 22‐ month‐olds with a choice of one name‐known object and two name‐unknown objects,

of which one was novel at test, whereas the other was previously familiarized to the infants. Upon hearing a novel label, the infants increased their attention to the novel object, but not the preexposed object, despite the fact that both the novel and preexposed objects were unfamiliar kinds for which the infants did not have names. This finding is compelling evidence that the ME response is sensitive to object novelty, and that nameability alone cannot account for infants' behavior. However, it was not clear that the novel label directly guided attention to the novel object. The novel label may have only prompted infants to reject the nameable object as a referent, with the novel object subsequently favored as an outcome of habituation to the preexposed object. In a follow‐up experiment identical to the first, apart from the omission of the nameable object, infants looked longer at the novel object than the preexposed object upon hearing the novel label. This provided confirmation that a novel label can directly guide attention toward a novel object even when the competing object is name‐unknown. Mather and Plunkett (2012) concluded that object novelty was both necessary and sufficient for the ME response.

[Figure 21.4 Training and test used by Mather and Plunkett (2011) for 16‐month‐old infants. During training (A), the 16‐month‐old infants were presented with two ME scenarios, and the novel labels meb and pok were introduced. At test (B), infants were shown both novel objects and either of the novel labels. Reproduced with permission from Cambridge University Press.]

A novelty‐based mechanism

Given the contribution of novelty to ME, it is not unreasonable to conjecture that "attention to novelty" might play a role in the development of the response. Attention to the novelty of words and objects could lead to the acquisition of ME: As an infant

Associative Approaches to Lexical Development 549 becomes familiar with the words of her language, an appreciation of the correlation between the familiarity of words and the familiarity of the objects to which they refer might emerge. If a word is familiar, then it probably refers to a familiar object, but if a word is novel to an infant, it will probably refer to a novel object. If infants can detect this correlation, they could learn ME. This learning process would require infants to abstract a general correlation between a property of words and a property of objects, namely their novelty. This is potentially a difficult task, as any given word will be heard in the presence of many objects, so the infant will need to attend to the correct referent. One possibility is that speakers draw attention to the object to which a word refers, for example by looking or pointing at the object. However, this involves explicit teaching, and this information might not always be available. Alternatively, infants might be able to detect the correlation without explicit teaching. If infants already have some vocabulary, they will attend to familiar objects when hearing familiar words, because they know the referents of the words. Conversely, infants might attend to novel objects in the presence of novel words because they have a general tendency to attend to novelty; thus, in the absence of a comprehended word that directs attention elsewhere, the infant may persist in attending to a novel object. If the infant associates the novel word with the novel object based on their temporal contiguity,4 this information could be used eventually to abstract an ME principle. Evidence of behavior similar to the ME response at the earliest stages of vocabulary development provides additional support for a novelty‐based mechanism. Mather and Plunkett (2010) presented 10‐month‐olds with pairs of familiar and novel objects and different labeling phrases. Prior to naming, the infants preferred to look at the novel objects; yet their looking behavior diverged upon hearing different phrases. When the infants heard novel labels, their interest in the novel object was maintained and enhanced; yet when they heard familiar labels or a control phrase (e.g., “look”), they lost interest in the novel object. The authors concluded that as young as 10 months of age, novel labels have a specific role in supporting attention to novel objects. Importantly, a further experiment suggested that the 10‐month‐olds did not compre- hend the names of the familiar objects. Hence, their responses appeared to be guided by novelty, rather than object nameability. Associative learning is readily applied to ME through the process known as blocking (Kamin, 1969). Blocking involves “the disruption in conditioning with one element of a compound when it is accompanied by another element that has already been paired with the unconditioned stimulus” (Pearce, 2008, p.53). Note that infants implicitly name the objects with which they are familiar (Mani & Plunkett, 2010). Hence, the implicit name can block the formation of an associative link between the familiar object and a novel label, whereas no such blocking occurs for novel objects. Blocking itself is readily explained by the Rescorla and Wagner (1972) theory of learning which is itself a theory for measuring an animal’s degree of surprise on encountering a stimulus in a given context. 
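To make the logic of the blocking account concrete, the following minimal simulation sketches a Rescorla–Wagner update for the word‐learning case just described. The cue names, trial counts, and parameter values are illustrative assumptions rather than quantities taken from the chapter or from infant data.

```python
# Minimal Rescorla-Wagner sketch of the blocking account of ME described above.
# Cue names, trial counts, and parameter values are illustrative assumptions.

def rw_trial(V, cues, lam=1.0, alpha=0.3, beta=1.0):
    """One trial: each present cue changes by alpha*beta*(lambda - summed prediction)."""
    error = lam - sum(V[c] for c in cues)   # prediction error = "surprise" at the outcome
    for c in cues:
        V[c] += alpha * beta * error
    return V

V = {"implicit_name": 0.0, "novel_label": 0.0}

# Pretraining: the familiar object's implicit name already predicts its referent.
for _ in range(50):
    rw_trial(V, ["implicit_name"])

# ME scenario: implicit name and novel label occur together with the same referent.
for _ in range(50):
    rw_trial(V, ["implicit_name", "novel_label"])

print(V)   # the novel label's strength stays near zero: the implicit name blocks it
# Latent inhibition could be sketched analogously by lowering alpha for a preexposed cue.
```

Omitting the pretraining phase lets the novel label acquire substantial associative strength, and it is this contrast that makes blocking a candidate mechanism for the ME response.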
Alternatively, latent inhibition (Lubow, 1973) could account for the ME response: Latent inhibition is the “reduction in effectiveness of pairing a conditioned stimulus with an unconditioned stimulus, as a result of prior exposure to the conditioned stimulus” (Pearce, 2008, p. 76). In the context of ME, the familiarity of the name‐known object reduces the associability of the object as a result of latent inhibition. A basic learning mechanism of this kind could guide the infant toward selectively associating a novel label with a novel object

even without the need to retrieve the names of familiar objects to exclude them as potential referents.

There are several implications of these associative accounts of ME. First, familiar words should be better blockers than recently learned words. This prediction follows directly from Rescorla and Wagner's (1972) theory in which the strength of association between a familiar label and familiar object (an index of surprise) attenuates any change in strength of association formed between the novel label and familiar object. Second, any implicit label generated in the ME situation ought to be available, together with the novel label, to form associations with the novel object. There is no direct evidence, for or against, in the infant literature to evaluate these predictions. However, Mather and Plunkett's (2012) finding that the relative familiarity of objects influences the strength of an ME response points to the role of surprise – a well‐established associative construct.

Conclusions

A compelling strategy in evaluating associative approaches to language development is to compare the human potential for language acquisition with that of other great apes, in particular bonobos and chimpanzees. It is commonly agreed that our closest relatives are not well prepared for language acquisition. The obvious and probably inescapable conclusion is that humans have some special genetic endowment that supports the construction of specialized microcircuitry in the brain without which language acquisition is difficult, if not impossible. From an associative perspective, this raises a perplexing paradox: Given the powerful associative processes at work in the brains of the great apes (or corvids, or any number of species, for that matter), why should they be inept at language if this capacity is based on associative learning processes? Only two solutions seem valid:

1 Language must rely on processes of acquisition that are nonassociative and lacking in other species.
2 There are associative processes at work in humans that we do not find in other species.

Since virtually the whole of contemporary associative learning theory is based on work with animals, the likelihood of finding an answer based on associative learning skills that are uniquely human seems remote.5 The demonstration in this chapter of word‐learning constraints entirely reliant on learning processes that are exploited in nonhuman brains leaves us with the perplexing problem as to why nonhuman brains cannot acquire a human‐like lexical system. One solution to this problem is that "the ability of associative processes to implement cognition… could arise from the constraints imposed by a particular processing architecture" (Dickinson, 2012, p. 2739). General associative learning processes can then operate within the confines of a dedicated architecture to produce a specialized processing mechanism. A similar solution is offered by Elman et al. (1996) in which architectural/computational considerations, rather than innate representations,

underlie the acquisition of knowledge. On this account, general associative learning processes implemented in connectionist networks with prespecified architectures construct mental representations for language processing. The uniquely human capacity to construct a lexical system so rapidly in early childhood need appeal not to built‐in cognitive constraints but rather the unique configuration of initially innocent neural systems guided in their growth by general learning processes in a highly structured environment. As I suggested at the start of this chapter, cognitivists might object that this type of account of word‐learning constraints is merely implementational. If this objection turns out to be correct, then at least we have seen how an associative approach can provide a closer view of the mechanisms at work, rather than just giving them a name.

Notes

1 At the risk of overlaboring the point, these featural representations of objects and words are theoretical entities in search of a mechanism: Further machinery will be needed in order to account for their emergence.
2 "Stimulus generalization: Responding to a test stimulus as a result of training with another stimulus" (Pearce, 2008, p. 37).
3 There is some confusion in the literature in the use of the terms mutual exclusivity and fast mapping. In this chapter, I use the term fast mapping in a neutral manner to indicate that older infants can quickly form an association between a label and an object (or category of objects).
4 This mechanism might also constitute the basis of recent reports of word learning via cross‐situational statistics, for example, Smith and Yu (2008).
5 Of course, one might also contemplate the possibility that animal minds are nonassociative, as do many contemporary scholars of comparative cognition (see Heyes, 2012, for further discussion).

References

Barrett, M., Harris, M., & Chasin, J. (1991). Early lexical development and maternal speech: A comparison of children's initial and subsequent uses of words. Journal of Child Language, 18, 21–40. Benedict, H. (1979). Early lexical development: Comprehension and production. Journal of Child Language, 6, 183–200. Bergelson, E., & Swingley, D. (2012). At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences of the United States of America, 109, 3253–3258. Chomsky, N. (1959). Review of Skinner's Verbal Behaviour. Language, 35, 26–58. Dickinson, A. (1980). Contemporary animal learning theory. Cambridge, UK: Cambridge University Press. Dickinson, A. (2012). Associative learning and animal cognition. Philosophical Transactions of the Royal Society of London B Biological Sciences, 367, 2733–2742. Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff‐Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.

552 Kim Plunkett Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3–71. Goldfield, B., & Reznick, J. S. (1992). Rapid change in lexical development in comprehension and production. Developmental Psychology, 28, 406–413. Golinkoff, R. M., Mervis, C. B., & Hirsh‐Pasek, K. (1994). Early object labels: The case for a developmental lexical principles framework. Journal of Child Language, 21, 125–155. Halberda, J. (2003). The development of a word‐learning strategy. Cognition, 87, B23–B24. Hebb, D. (1949). The organization of behavior: A neuropsychological theory. New York, NY: John Wiley & Sons. Heyes, C. (2012). Simple minds: a qualified defence of associative learning. Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 2695–2703. Holland, P. C. (1990). Event representation in Pavlovian conditioning: image and action. Cognition, 37, 105–131. Horst, J. S., & Samuelson, L. K. (2008). Fast mapping but poor retention by 24‐month‐old infants. Infancy, 13, 128–157. Jusczyk, P., & Aslin, R. N. (1995). Infant’s detection of sound patterns of words in fluent speech. Cognitive Psychology, 29, 1–23. Kamin, L. J. (1969). Selective association and conditioning. In N. J. Mackintosh & W. K. Honig (Eds.), Fundamental issues in associative learning (pp. 42–64). Halifax, Canada: Dalhousie University Press. Kellman, P., Spelke, E., & Short, K. (1986). Infant perception of object unity from translatory motion in depth and vertical translation. Child Development, 57, 72–86. Kohonen, T. (1984). Self‐organization and associative memory. Berlin: Springer. Kruschke, J. K. (1992). Alcove: An exemplar‐based connectionist model of category learning. Psychological Review, 99, 22–44. Kuczaj, S. I., & Barrett, M. (1986). The development of word meaning: Progress in cognitive development research. New York, NY: Springer. Lachter, J., & Bever, T. G. (1988). The relation between linguistic structure and associative theories of language learning–a constructive critique of some connectionist learning models. Cognition, 28, 195–247. Landau, B., Smith, L. B., & Jones, S. (1988). The importance of shape in early lexical learning. Cognitive Development, 3, 299–321. Liitschwager, J. C., & Markman, E. M. (1994). Sixteen‐ and 24 month‐olds’ use of mutual exclusivity as a default assumption in second‐label learning. Developmental Psychology, 30, 955–968. Lubow, R. E. (1973). Latent inhibition. Psychological Bulletin, 79, 398–407. Mani, N., & Plunkett, K. (2010). In the infant’s mind’s ear: Evidence for implicit naming in 18‐month‐olds. Psychological Science, 21, 908–913. Markman, E. M. (1989). Categorization and naming in children: Problems of induction. Cambridge, MA: MIT Press. Markman, E. M. (1990). Constraints children place on word meanings. Cognitive Science, 14, 57–77. Markman, E. M. (1992). Constraints on word learning: speculations about their nature, origins and domain specificity. In M. R. Gunnar & M. P. Maratsos (Eds.), Modularity and c­onstraints in language and cognition: The Minnesota symposium on child psychology (pp. 59–101). Hillsdale, NJ: Erlbaum. Markman, E. M., & Hutchinson, J. (1984). Children’s sensitivity to constraints on word meaning: Taxonomic versus thematic relations. Cognitive Psychology, 16, 1–27. Markman, E. M., Wasow, J. L., & Hansen, M. B. (2003). Use of the mutual exclusivity assump- tion by young word learners. Cognitive Psychology, 47, 241–275.

Associative Approaches to Lexical Development 553 Mather, E., & Plunkett, K. (2010). Novel labels support 10 month‐olds’ attention to novel objects. Journal of Experimental Child Psychology, 105, 232–242. Mather, E., & Plunkett, K. (2011). Mutual exclusivity and phonological novelty constrain word learning at 16 months. Journal of Child Language, 38, 933–950. Mather, E., & Plunkett, K. (2012). The role of novelty in early word learning. Cognitive Science, 36, 1157–1177. Mayor, J., & Plunkett, K. (2010). A neuro‐computational model of taxonomic responding and fast mapping in early word learning. Psychological Review, 117, 1–31. Meints, K., Plunkett, K., & Harris, P. L. (1999). When does an ostrich become a bird: The role of prototypes in early word comprehension. Developmental Psychology, 35, 1072–1078. Meints, K., Plunkett, K., & Harris, P. L. (2002). What is “on” and “under” for 15‐, 18‐ and 24‐month‐olds? Typicality effects in early comprehension of spatial prepositions. British Journal of Developmental Psychology, 20, 113–130. Meints, K., Plunkett, K., & Harris, P. L. (2008). Eating apples and houseplants: Typicality constraints on thematic roles in early verb learning. Language and Cognitive Processes, 23, 434–463. Meints, K., Plunkett, K., Harris, P. L., & Dimmock, D. (2004). The cow on the high: Effects of background context on early naming. Cognitive Development, 19, 275–290. Merriman, W. E., & Bowman, L. L. (1989). The mutual exclusivity bias in children’s word learning. Monographs of the Society for Research in Child Development, 54, 1–132. Merriman, W. E., Marazita, J., & Jarvis, L. (1995). Children’s disposition to map new words onto new referents. In M. Tomasello & W. E. Merriman (Eds.), Beyond names for things: Young children’s acquisition of verbs (pp. 147–183). Hillsdale, NJ: Erlbaum. Mervis, C. B., & Bertrand, J. (1994). Acquisition of the novel name nameless category (N3C) principle. Child Development, 65, 1646–1662. Moore, J. W. (1972). Stimulus control: studies of auditory generalization in the rabbit. In  A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 206–320). New York, NY: Appleton‐Century‐Crofts. Pearce, J. M. (2008). Animal learning & cognition (3rd ed.). Hove, UK: Psychology Press. Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distrib- uted processing model of language acquisition. Cognition, 29, 73–193. Plunkett, K., Sinha, C., Møller, M. F., & Strandsby, O. (1992). Symbol grounding or the emer- gence of symbols? Vocabulary growth in children and a connectionist net. Connection Science, 4, 293–312. Posner, M., & Keele, S. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77, 353–363. Quine, W. V. O. (1960). Word and object. Cambridge, MA: MIT Press. Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non‐reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning (Vol. II, pp. 64–99). New York, NY: Appleton‐Century‐Crofts. Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: explorations in the microstructure of cognition. Cambridge, MA: MIT Press. Skinner, B. F. (1957). Verbal behavior. New York, NY: AppletonCentury‐Crofts. Smith, L., & Yu, C. (2008). Infants rapidly learn word‐referent mappings via cross‐situational statistics. 
Cognition, 106, 1558–1568. Tincoff, R., & Jusczyk, P. (1999). Some beginnings of word comprehension in 6‐month‐olds. Psychological Science, 10, 172. Treves, A., & Rolls, E. (1994). Computational analysis of the role of the hippocampus in memory. Hippocampus, 4, 374–391. Waxman, S., & Markow, D. B. (1995). Words as invitations to form categories: Evidence from 12 to 13‐month‐old infants. Cognitive Psychology, 29, 257–302.

22 Neuroscience of Value‐Guided Choice

Gerhard Jocham, Erie Boorman, and Tim Behrens

Introduction

When studying the neural mechanisms of choice, one of the first obvious questions that comes to mind is why one should make a decision at all. Decisions can be quite effortful, so there needs to be some value in making a choice. In other words, different courses of action ought to have different values; otherwise choosing one over the other would have no obvious advantage. Accordingly, frameworks of decision‐making often start off by assuming that we need to have some representation of the available set of options, assign value to them, and then choose between them on the basis of these values. Finally, after observing the outcome of our choice, we can use this result to update our estimate of this option's value: We can learn from the outcome (Rangel, Camerer, & Montague, 2008). In the following chapter, we will first describe some of the various representations of value that have been found in the brain. It will become evident that value correlates are very widespread in the brain; however, as we discuss, not all of them bear a direct relationship to choice. Next, we will discuss which value representations might constitute signatures of a decision process, and what such a decision mechanism might look like. We will then highlight that different brain regions come to the fore depending on a number of factors. Particular attention will be given to different frames of reference, such as deciding between stimuli as opposed to deciding between motor actions. Finally, in a second section, we will consider behavioral adaptation from a currently preferred default position and strategic decision‐making.

Ubiquity of Value Representations

From functional magnetic resonance imaging (fMRI) and single‐unit recording studies in animals, we know today that representations of value can be found in many regions throughout the brain. They have been found in frontal and parietal association cortices, the basal ganglia, but even in early sensory and motor cortical areas. However,

Neuroscience of Value‐Guided Choice 555 finding a value correlate in one brain region does not necessarily imply that this area is also involved in choice. It is important to consider exactly what kind of value r­epresentation is found. A correlation with the overall value of available options is not of much use for deciding between options, but can rather serve motivational and/or attentional purposes. Action value representations, a correlation with the value of specific motor actions, are more likely an input to a decision process, or alternatively may reflect motor preparation. By contrast, a correlation with the value of the chosen option, a chosen value signal, is more intimately linked to choice. If neural activity correlates with the value of a selected option, independent of whether the subject chose left or right, and independent of the trial’s overall value, this pro­ vides a hint that neural activity in this area relates to a choice between options, rather than b­ etween preparing a particular motor command. We will come back to these issues in the next section. Of the areas studied so far, the frontal lobe, in particular the orbitofrontal cortex (OFC), arguably is the part of the brain that first attracted scientific interest. It has been known for decades that primates with lesions to the OFC are severely compro­ mised at adjusting their behavior when the value of options is suddenly changed. Behavioral flexibility is often probed using reversal learning, reinforcer devaluation, or extinction (Chapter  16). In reversal learning, stimulus–outcome contingencies are suddenly changed, such that a rewarded option becomes incorrect, and a previously nonrewarded option becomes the correct option. Reinforcer devaluation tests degrade the value of a reward (usually food or liquid) either by feeding to satiety or by pairing the reward with malaise. Extinction measures the reduction in instrumental respond­ ing when a previously rewarded response is no longer reinforced. Primates with lesions to the OFC are impaired at all of these tests: Extinction of instrumental responding is slowed (Butter, Mishkin, & Rosvold, 1963), animals keep responding for a devalued food (Baxter, Parker, Lindner, Izquierdo, & Murray, 2000; Pickens, Saddoris, Gallagher, & Holland, 2005; Pickens et al., 2003), and they take longer to relearn stimulus–outcome contingencies following reversals (Dias, Robbins, & Roberts, 1996; Iversen & Mishkin, 1970; Izquierdo, Suda, & Murray, 2004; Jones & Mishkin, 1972; Mishkin, 1964). Similar deficits in reversal learning, have been found in humans with lesions to the OFC and adjacent ventromedial prefrontal cortex. When stimulus– outcome contingencies are reversed, those patients make more errors (selecting the previously correct option) than controls or patients with dorsolateral prefrontal lesions (Fellows & Farah, 2003; Hornak et al., 2004). Patients with OFC lesions seem to have general difficulty in using option values to make beneficial choices, despite ­otherwise entirely intact cognitive abilities (Bechara, Damasio, Damasio, & Anderson, 1994; Bechara, Tranel, & Damasio, 2000; Tsuchida, Doll, & Fellows, 2010). These effects of lesions to OFC correspond well with what is known about its responses to reward and reward‐predicting stimuli. Neural activity in the OFC appears to reflect the reward value of stimuli across diverse modalities. 
Human neuroimaging studies have found OFC activity to correlate with the pleasantness of music (Blood, Zatorre, Bermudez, & Evans, 1999) or odors (Anderson et al., 2003), monetary or erotic rewards (Sescousse, Redoute, & Dreher, 2010), and the subjective desirability of food (Plassmann, O’Doherty, & Rangel, 2007). When a food reward is no longer valued (by feeding to satiety), OFC responses to this food, or stimuli that predict it, are diminished (Kringelbach, O’Doherty, Rolls, & Andrews, 2003; O’Doherty et al.,

2000; Rolls, Sienkiewicz, & Yaxley, 1989; Figure 22.1A). Furthermore, OFC neurons respond not only to reward itself, but also to stimuli that predict it, and their responses rapidly adjust when cue–outcome associations are changed (Roesch & Olson, 2004; Schoenbaum, Chiba, & Gallagher, 1999; Thorpe, Rolls, & Maddison, 1983; Tremblay & Schultz, 1999). Two features about the reward‐predictive properties of OFC neurons are particularly important. First, they encode the expected value of stimuli, irrespective of their physical or spatial properties, or motor responses to the stimuli (Padoa‐Schioppa & Assad, 2006; Tremblay & Schultz, 1999). In other words, an OFC neuron might respond similarly to two visual stimuli that look very different and are presented in opposite spatial positions, but predict the same outcome. Second, and perhaps more importantly, their reward‐predictive responses are relative, or context‐dependent. Imagine a monkey that prefers raisins over banana, but banana over apples. An OFC neuron might only display little responding to a cue predicting banana, when the monkey is in a situation where the rewards are the best‐liked raisins and bananas. The same neuron might display a pronounced response to the same banana‐predicting cue when the alternative reward is the least‐preferred apple. Thus, the neuron reflects the primate's relative reward preferences (Tremblay & Schultz, 1999). It has to be noted, however, that whether OFC value representations follow an absolute or relative code may depend on the specific features of the task at hand, in particular whether trials of a given type are presented in blocks or in an interleaved fashion (Padoa‐Schioppa & Assad, 2006). Second, OFC neurons adjust the range of their firing to the range of available rewards. An OFC neuron will respond with a strong increase in firing rate to a stimulus that predicts two units of reward when this is the highest reward currently available, but show only a modest increase to the same reward when the highest reward amount is 10 units of reward (Padoa‐Schioppa, 2009; Figure 22.1B).

[Figure 22.1 (A) Activity in OFC correlates with the subjective pleasantness of a liquid food (left). The BOLD response (right) is diminished with decreasing pleasantness ratings that occurred when the food was devalued by feeding to satiety. (B) Orbitofrontal neuronal responses to reward cues adapt to the range of available rewards. The figure shows neuronal responses (spikes/s) for a range of offer values (in ascending order on the x‐axis) for different value ranges (ΔV). Reproduced with permission from Kringelbach et al. (2003), Cerebral Cortex (A) and Padoa‐Schioppa (2009).]

Such flexible, context‐dependent representations are extremely important for everyday life situations. A price difference of 5 euros makes no difference when choosing between two cars that may cost tens of thousands of euros. However, the same 5 euros may be a crucial determinant when deciding between two dishes in a restaurant. Firing rates of cortical neurons typically do not exceed 60 Hz, so there is only a limited dynamic range. If neurons could not adjust to the current context, they would have to represent the entire range of possible values from zero to, say, one million euros within 1–60 Hz. Making a food choice in a restaurant would become impossible!

A further interesting feature about OFC neurons is that they appear to code subjective, rather than objective, values. In a now famous experiment, Padoa‐Schioppa and Assad (2006) offered monkeys choices between different types of juice rewards. They determined the subjective value of each juice by making the animals select between two juices of varying quantities and determining an indifference point. For example, an animal might display a strong preference for apple juice over water, when offered one drop of each. However, it might be equally likely to select either of the two when offered a choice between four drops of water and one drop of apple juice. The authors found that OFC neurons coded the subjective value of the options, rather than the reward quantity. In other words, an OFC neuron would show the same response to one drop of apple juice as to four drops of water. It has to be noted that the value‐coding properties of these cells reflected diverse features; in particular, the authors found subsets of neurons that represented "offer value" (the sum of the subjective value of available options), "chosen value" (the subjective value of the option the monkey would end up choosing), or simply the identity of the chosen taste. The latter finding also highlights the fact that not only does OFC represent abstract values independent of stimulus properties, but it is a highly polymodal association cortex that receives inputs from all five senses (Carmichael & Price, 1995). Accordingly, OFC neurons signal sensory properties of both rewards but also of stimuli independent of their association with reward (Critchley & Rolls, 1996; Rolls & Baylis, 1994). Taken together, OFC value representations display a number of properties that make them ideally suited for guiding choices based on specific expected outcomes, and lesions in this region have a profound impact on these kinds of value‐guided choices.

An important distinction that needs to be highlighted is that between medial and lateral sectors of OFC. Anatomical studies provide evidence for two distinct networks, a lateral orbitofrontal network (LOFC) and a medial orbital/ventromedial prefrontal cortex (mOFC/vmPFC) network. Regions within both the LOFC and the mOFC/vmPFC network are heavily interconnected, but connections between the two networks are relatively sparse (Öngür & Price, 2000). These different connectivity patterns are mirrored by differences in functional specialization. While lesions to mOFC/vmPFC impair reward‐guided choice, LOFC seems to be critical for learning from the outcomes of these choices (Noonan et al., 2010; Rushworth, Noonan, Boorman, Walton, & Behrens, 2011). LOFC is particularly important for a certain kind of learning called contingent learning, in which an outcome is associated with the precise choice that caused it.
Primates with LOFC lesions still do learn, but they are no longer able to assign credit for a reward to the causative choice and instead distribute credit to the average recent choice history (Walton, Behrens, Buckley, Rudebeck, & Rushworth, 2010). It is important to note that most human fMRI studies have reported value correlates in vmPFC, whereas primate neurophysiological studies have typically recorded from more lateral OFC areas, likely because (among

558 Gerhard Jocham, Erie Boorman, and Tim Behrens other reasons) the vmPFC is difficult to access for recording. Given these functional differences, it would be highly interesting to explore what the behavior of single ­neurons in primate vmPFC looks like. Motivated by the surge of evidence from human functional imaging, researchers have only begun recording from primate vmPFC during value‐guided choice (Bouret & Richmond, 2010; Monosov & Hikosaka, 2012; Rich & Wallis, 2014; Strait, Blanchard, & Hayden, 2014). Value representations have been found in other frontal cortical areas, in particular in the lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC). ACC and LPFC value correlates share many properties with those found in OFC, but there are also some clear distinctions. OFC neurons show barely any coding of motor responses (Wallis & Miller, 2003) and, as we have noted above, they code the value of stimuli independent of movement parameters or stimulus characteristics (Kennerley, Dahmubed, Lara, & Wallis, 2009a; Padoa‐Schioppa & Assad, 2006, 2008). In contrast, neuronal activity in ACC seem to reflect more the value of actions, rather than stimuli. In a task that required monkeys to make either a go‐ or nogo‐response to one of two cues, only a few cells coded for the visual cue. In con­ trast, many ACC neurons represented the upcoming motor response, the expected reward, or the interaction of action and reward (Matsumoto, Suzuki, & Tanaka, 2003). Likewise, firing rates of ACC neurons are correlated with the probability that an action will be rewarded (Amiez, Joseph, & Procyk, 2006). Studies examining both ACC and OFC have, however, reported that cells representing the value of stimuli and actions exist in both areas (Luk & Wallis, 2013), which might also explain why medial OFC and the adjacent ventromedial prefrontal cortex (vmPFC) have been found to correlate with the value of both stimuli and actions in human fMRI studies (Glascher, Hampton, & O’Doherty, 2009). Nevertheless, the relative abun­ dance differs, such that cells representing stimulus values are more prevalent in OFC than in ACC, and vice versa for cells correlating with the value of actions (Luk & Wallis, 2013). These differences between ACC and OFC in representing stimulus versus action values reflect the connectional anatomy of those two regions. While the ACC has very direct access to the motor systems, with the cingulate motor area directly targeting the premotor and primary motor cortex and even motor neurons in the ventral horn of the spinal cord, the OFC is several synapses away from the motor system. In c­ ontrast, OFC receives direct input from all five senses, in particular highly processed visual input about object identity, information to which the ACC has far less direct access (Carmichael & Price, 1995; Cavada, Company, Tejedor, Cruz‐Rizzolo and Reinoso‐Suarez, 2000; Dum & Strick, 1991; He, Dum, & Strick, 1995). Accordingly, lesions to the OFC impair stimulus–reward learning without affecting action–reward learning, while ACC lesions interfere with action–reward learning, but not with stimulus reward learning in both macaques (Kennerley, Walton, Behrens, Buckley, & Rushworth, 2006; Rudebeck et al., 2008) and humans (Camille, Tsuchida, & Fellows, 2011). 
ACC has strong connections with LPFC, which, like ACC, also is strongly connected with the motor system (Petrides & Pandya, 1999), and the two areas are often found coactive in various cognitive tasks in human functional imaging (Duncan & Owen, 2000). Very much like ACC neurons, the firing of LPFC cells reflects motor response, o­ utcome, and the interaction of the two (Matsumoto et al., 2003; Watanabe, 1996). It has been shown that LPFC neuron firing is modulated by actions, outcomes, and

Neuroscience of Value‐Guided Choice 559 action–outcome combinations not only of the current trial, but also of previous trials (Barraclough, Conroy, & Lee, 2004; Seo, Barraclough, & Lee, 2007). Such coding of previous choices and outcomes might be a potential mechanism for linking actions with delayed outcomes. While these characteristics of LPFC neurons are remarkably similar to those of ACC neurons, it has also been shown that responses reflecting action–outcome associations emerge only late in the trial in LPFC, whereas they were evident almost immediately after cue onset in ACC (Matsumoto et al., 2003). A notable feature of ACC neurons is that they are able to “multiplex” several decision variables. Kennerley and colleagues (2009a) simultaneously recorded from macaque OFC, LPFC, and ACC while the animals chose between two options that varied on each trial along the expected reward magnitude, reward probability, or cost (lever presses required to obtain reward). Neurons that encoded one of the three value parameters were found in all three areas in roughly equal proportions. However, neu­ rons whose activity was modulated by two or even three value parameters were far more abundant in ACC than in OFC, or even LPFC, where only a few neurons showed such multiplexing. Another striking feature of value representations by individual neurons and the BOLD signal in the ACC is that they encode not only the reward associated with the action actually selected, but also the counterfactual reward that would have resulted from an alternative course of action (Boorman, Behrens, & Rushworth, 2011; Hayden, Pearson, & Platt, 2009). Together with the monitoring of extended action–outcome histories that has been described in both primate and human ACC (Behrens, Woolrich, Walton, & Rushworth, 2007; Jocham, Neumann, Klein, Danielmeier, & Ullsperger, 2009; Kennerley et al., 2006; Seo & Lee, 2007), these “counterfactual” value signals may play an important role when deciding to switch away from a current behavior, which we will discuss below. The dorsal striatum receives dense projections from the ACC (Kunishio & Haber, 1994), and movement‐related activity of cells in the primate striatum is modulated by expected reward (Cromwell & Schultz, 2003; Kawagoe, Takikawa, & Hikosaka, 1998; Shidara, Aigner, & Richmond, 1998). There is some heterogeneity in the exact value parameter that is found to be represented by striatal neurons. Some neurons in the caudate and putamen code action values, for example, the value of a left‐ or right­ ward movement (Samejima, Ueda, Doya, & Kimura, 2005), but a large fraction of cells in both dorsal and ventral striatum represent the overall value of options (Cai, Kim, & Lee, 2011; Wang, Miura, & Uchida, 2013). Representations of overall value are important for the response‐invigorating effects of high‐value options: An organism should be motivated to expend more effort when much reward is at stake, regardless of what option it ends up choosing. Thus, signaling of overall value in the striatum is consistent with its role in response invigoration (McGinty, Lardeux, Taha, Kim, & Nicola, 2013; Salamone, Correa, Farrar, & Mingote, 2007), rather than choice (Wang et al., 2013). In contrast, correlations with chosen value, which is, by definition, more tightly linked to the outcome of a decision process, have only seldom been reported in the striatum (Lau & Glimcher, 2008). 
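The distinctions drawn here between action values, overall value, and chosen value can be made concrete with a toy two‐option trial. The sketch below is purely illustrative; the particular numbers, the softmax temperature, and the variable names are arbitrary assumptions, not quantities from any study discussed in this chapter.

```python
# Toy illustration of the value quantities the chapter distinguishes, for one
# two-option trial. Numbers, temperature, and names are arbitrary assumptions.
import numpy as np

q = {"left": 0.8, "right": 0.3}                  # candidate "action values"

overall_value = q["left"] + q["right"]            # value sum: motivational signal, silent on which action to take
p_left = 1.0 / (1.0 + np.exp(-(q["left"] - q["right"]) / 0.2))  # softmax choice based on the value difference
choice = "left" if np.random.default_rng(0).random() < p_left else "right"

chosen_value = q[choice]                          # defined only once a decision has been made
unchosen_value = q["right" if choice == "left" else "left"]
value_difference = chosen_value - unchosen_value  # chosen-minus-unchosen signal often reported in vmPFC

print(choice, overall_value, chosen_value, value_difference)
```

The value sum is high whenever much reward is at stake, regardless of which option is better, whereas the chosen and chosen‐minus‐unchosen quantities only exist downstream of a decision; this is the logic behind treating the former as a motivational signal and the latter as candidate signatures of choice.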
In addition to these prefrontal and subcortical regions, value correlates have been found in a number of further areas. Largely separate from the research that focused on frontal cortical areas and the basal ganglia, another research community investi­ gated an area in the primate parietal cortex, the lateral intraparietal area (LIP). This area contains neurons that are involved in the generation of eye movements, and they

560 Gerhard Jocham, Erie Boorman, and Tim Behrens usually show selectivity for gazes toward a particular direction in space, that is, they have a preferred direction. LIP had been intensively studied in the domain of percep­ tual choice (Gold & Shadlen, 2007; Shadlen & Newsome, 1996). While a discussion of this extremely influential research is outside the scope of this chapter, it was these studies that laid the foundation for investigations on decision variables related to value rather than perceptual evidence. In a pivotal study, Platt and Glimcher (1999) were able to demonstrate that LIP neurons were sensitive to the reward value associated with a saccade to a particular direction. Importantly, value‐related activity in these neurons was independent of movement‐related parameters and also emerged early in the trial, prior to movement onset. Later, it was shown that LIP neurons track the local relative reward rate in a dynamically changing environment (Sugrue, Corrado, & Newsome, 2004). Further studies corroborated these findings, but also showed that LIP neurons carry diverse value representations. For instance, they were shown to display modulation by the value difference of two options, the value sum, but also the animal’s upcoming and previous choices (Seo, Barraclough, & Lee, 2009). Some of these characteristics bear some resemblance to what has been described above for LPFC, with which LIP has strong connections (Blatt, Andersen, & Stoner, 1990). However, a debate has recently arisen as to whether LIP responses do indeed reflect value, rather than motivational salience (Leathers & Olson, 2012). In addition to these parietal cortices, value correlates have even been found as early as in visual cortex (Serences, 2008; Shuler & Bear, 2006) and throughout premotor and supplementary motor areas (Pastor‐Bernier & Cisek, 2011; Roesch & Olson, 2003). Again, value‐related activity in the motor system appears to pertain more to action values, rather than chosen values. Finally, value signals have also been observed in pri­ mate posterior cingulate (PCC; McCoy, Crowley, Haghighian, Dean, & Platt, 2003; McCoy & Platt, 2005), but the role of PCC in cognition is still fairly mysterious (Pearson, Heilbronner, Barack, Hayden, & Platt, 2011). From Value to Choice We have stated that signals related to economic value are widespread across the brain and are even observed in sensory and motor cortical areas. Two obvious questions arise: First, are all of these value representations used in the service of decision‐­making? Second, if a neural signal related to value is indeed used for a decision, then exactly how is this value representation transformed into a choice? Correlates of value could serve a number of functions, choice only being one of them. In fact, what appears to be a value correlate may in many cases reflect other aspects, such as motivational factors, motor preparation, attention, or modulation of sensory processing, as has been discussed in detail recently (O’Doherty, 2014). Sometimes, the exact nature of the value representation can already give some clues. For instance, one of the studies described above found cells in the ventral striatum whose firing rate correlated with the value sum of the two available options (Cai et al., 2011). 
Such a representation is unlikely to be used for a choice, since the value sum can be high either whenever there is a high value of one option and a low value of the other option (regardless of which option is the high‐value one) or when both options have intermediate value. Therefore,

Neuroscience of Value‐Guided Choice 561 it does not inform about what option to choose. Instead, it is useful for motivational purposes, such as invigoration of responding. If an organism is in a situation in which a large amount of reward is at stake, it should be willing to exert more effort to obtain that goal. By contrast, value representations indicating a difference between two options’ values are more informative, as they directly reflect how good one option is relative to an alternative. Nevertheless, in action‐based tasks, some of the authors have interpreted this relative value signal as an indication of motor preparation, rather than of a decision (Pastor‐Bernier & Cisek, 2011). In human fMRI studies, subjects are often asked to make choices between two options that are not prelearned, but instead vary from trial to trial, for instance by drawing randomly from a distribution and explicitly presenting reward magnitudes and probabilities on the screen. In a number of these studies, a correlate of the chosen option’s value was found in the vmPFC (Boorman, Behrens, Woolrich, & Rushworth, 2009; Jocham, Hunt, Near, & Behrens, 2012a; Wunderlich, Rangel, & O’Doherty, 2009). In some of these studies, the fMRI signal in vmPFC correlated not only positively with the value of the chosen option, but also negatively with the value of the unchosen option (Boorman et al., 2009; Jocham et al., 2012a; Kolling, Behrens, Mars, & Rushworth, 2012). Such a represen­ tation of value difference between chosen and unchosen option would appear to reflect the outcome of a decision process, rather than motor preparation. Because, in these studies, values for the left and right options were generated afresh on each trial, the chosen and unchosen values are not tied to a particular response side. Wunderlich and colleagues further dissected this in two very elegant studies. In the first, they made subjects decide on each trial whether to perform a saccade to a particular location or to press a button. Each of these two motor responses was asso­ ciated with a probability of being rewarded that drifted slowly over time. By coupling the choice to two effectors that are represented in separable regions of the brain, they were able to test whether there were any separable correlates of the value of the particular motor actions, and where in the brain activity would correlate with the value of the option chosen, regardless of the effector required to execute the choice. It was found indeed that the value of the exact motor action (“action value”) was cor­ related with activity in the brain areas responsible for that movement. Thus, the value of the hand movement correlated with activity in the supplementary motor area, while the value of making a saccade correlated with activity in the presupplementary eye field on each trial, regardless of which movement was ultimately performed. In contrast, activity in vmPFC was related neither to the value of the eye nor to hand movement, but instead correlated with the value of the movement chosen by the par­ ticipant (Wunderlich et al., 2009). In the next study, the authors went on to show that representations of chosen value in the vmPFC were even evident without subjects knowing the exact motor output required to obtain an option. Subjects were first shown two options on each trial that were again associated with time‐varying reward probabilities. 
However, only several seconds later, it was revealed to participants which motor response (again, saccade or button press) was required for which of the two options. It was found that the correlation of vmPFC activity with the value of the chosen option emerged before the stimulus–action pairing was revealed (Wunderlich, Rangel, & O’Doherty, 2010). Thus, representations of choice in the vmPFC could be found in an abstract “goods space,” independent of the action needed to obtain that good. Intriguingly, a value correlated in vmPFC is observed even when people are not

562 Gerhard Jocham, Erie Boorman, and Tim Behrens actively making choices. When subjects were asked to perform a cover task during fMRI, and only later were asked about their preferences between options, activity in vmPFC nevertheless covaried with the subjective value of the options (Lebreton, Jorge, Michel, Thirion, & Pessiglione, 2009). It therefore appears as if the brain auto­ matically makes choices, even when they are not expressed behaviorally. Taken together, value representations in the vmPFC appear to fulfill the required criteria for a neural signal reflecting choice. This does not imply that vmPFC alone is important for making decisions. After all, patients and primates with vmPFC lesions still are able to make reward‐guided choices, albeit showing suboptimal decisions and altered behavioral strategies (Camille, Griffiths, Vo, Fellows, & Kable, 2011; Fellows, 2006; Noonan et al., 2010). In fact, it appears likely that several brain areas may be capable of transforming value representations into choice, depending on the kind of decision to be made or on contextual factors, as we will discuss below. Mechanisms of Choice Because a correlate of chosen value, or value difference between chosen and unchosen options, reflects the outcome of a decision process (by definition, those signals are related to choice), brain areas carrying such representations are likely candidate regions for transforming value into choice. However, this also implies that we only observe the end‐point of a decision process, or the neural representation we can mea­ sure after a neural network has made a choice. It therefore does not inform us how a population of neurons could have made this decision. A crucial impetus for research on the neural mechanisms of value‐guided choice again came from the field of percep­ tual decision‐making. The drift diffusion model is a very successful mathemical ­formulation of continuous evidence accumulation that has been able to capture behavior and neural dynamics during continuous evidence accumulation such as dur­ ing the random dot motion task (Bogacz, 2007; Smith & Ratcliff, 2004). In this kind of task, a subject is observing a noisy sensory stimulus, in this particular case a cloud of dots moving around randomly on the screen. A certain fraction of these dots is moving toward either the left or right side, and the subject is asked to perform a sac­ cade to the direction of net motion. Decision difficulty is manipulated by varying the percentage of dots moving coherently into one direction (motion coherence). This class of models assumes that a decision for a left or right motion is made whenever a decision variable reaches a predetermined threshold. This decision variable evolves according to a differential equation by sampling at each timepoint the momentary evidence in favor of a left or right decision. The drift rate, the steepness at which the decision variable ramps up to the threshold, is determined by the strength of e­ vidence, that is, by the motion coherence. These models successfully capture both the longer reaction times (slower drift rates) and decreased accuracy (stronger influence of noise on the decision variable more often leads to passing the incorrect decision threshold) on trials with low motion coherence. They are purely mathematical descriptions in the sense that they do not care about how this process would be realized neurally, and in fact, some of their features are not realistic from a biophysical perspective.
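A minimal simulation of the accumulation‐to‐bound process just described might look as follows. The drift scaling, threshold, and noise level are arbitrary illustrative choices rather than values taken from the literature.

```python
# Minimal drift diffusion sketch: accumulate noisy momentary evidence to a bound.
# Parameter values are arbitrary illustrative choices.
import numpy as np

def ddm_trial(drift, threshold=1.0, noise_sd=1.0, dt=0.001, rng=None):
    """Run one trial; return (choice: +1 upper / -1 lower bound, reaction time in s)."""
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise_sd * np.sqrt(dt) * rng.normal()  # momentary evidence sample
        t += dt
    return (1 if x > 0 else -1), t

rng = np.random.default_rng(0)
for coherence in (0.05, 0.2, 0.5):               # stronger motion coherence -> higher drift rate
    trials = [ddm_trial(drift=2.0 * coherence, rng=rng) for _ in range(500)]
    accuracy = np.mean([choice == 1 for choice, _ in trials])
    mean_rt = np.mean([rt for _, rt in trials])
    print(f"coherence {coherence:.2f}: accuracy {accuracy:.2f}, mean RT {mean_rt:.2f} s")
```

Lower coherence yields slower and less accurate simulated decisions, which is the behavioral pattern the drift diffusion model is credited with capturing.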

Some researchers have therefore devised biophysically realistic neural network models capable of performing evidence accumulation similar to that in drift diffusion models (Lo & Wang, 2006; Wang, 2002; Wong & Wang, 2006). In this class of models, a decision circuit in area LIP is simulated. The model contains two pools of neurons, L and R, that are sensitive to left and right motion direction, respectively. They receive inputs from motion‐sensitive cells in area MT in the temporal lobe that are known to increase firing in their preferred direction with increasing motion coherence. Therefore, the inputs to both L and R are proportional to the evidence in favor of left or right, respectively. The connections of these two pools display two key features. First, each pool of neurons has recurrent excitatory connections endowed with NMDA and AMPA receptors. Second, both pools of neurons excite a pool of GABAergic interneurons that provides feedback inhibition to both pools (Figure 22.2A). This architecture leads to so‐called attractor dynamics: While, initially, both pools of neurons fire in proportion to their inputs, at the end of the dynamics, only one pool of neurons ends up in a persistent, high‐firing state (Figure 22.2B). The recurrent excitation at NMDA receptors is crucial for these dynamics, as it allows slow evidence integration over a time span of several hundred milliseconds, comparable with behavioral reaction times. When recurrent excitation is governed only by AMPA receptors, their short time constant (about 5 ms, compared with ~100 ms for NMDA receptors) causes the network almost immediately to latch onto one of the two attractor states. This results in very fast but also inaccurate decisions. The second key feature is GABAergic inhibition. With more GABA, the attractor dynamics are slowed down (corresponding to lower drift rates in the diffusion models), allowing more time for evidence integration and making the decision less susceptible to noise. The noise in these models arises from two sources: the sensory stimulus, and from within the nervous system itself. These attractor models governed by recurrent excitation and mutual inhibition have not only captured behavioral data but also very accurately reproduced LIP firing rates during the random dot motion task. Recently, these models have been adapted to value‐guided decision‐making. Now, the two pools of neurons represent two options, rather than left or right motion, and they receive input proportional to the options’ values. Furthermore, the noise arises exclusively from within the neural circuit, not from the sensory stimulus, but everything else about the model is the same. In a recent study, this adapted model was used to simulate synaptic currents (rather than spikes) in order to predict MEG data during decision‐making (Hunt et al., 2012). The motivation was to generate a bottom‐up prediction of what neural activity would look like if a brain area used a mechanism like that in the model for transforming value into choice. The model simulation revealed that overall network activity first represented overall value and then transitioned to represent the value difference between the chosen and unchosen option. This occurred in a frequency range of 2–10 Hz (Figure 22.2C,D). That is, within the same brain area, two different representations would be observed in rapid temporal succession.
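The essence of these attractor dynamics can be caricatured in a reduced two‐pool rate model. The sketch below is a deliberate simplification with made‐up parameters, not the spiking model of Wang (2002) or the reduced model of Wong and Wang (2006): each pool excites itself with a slow (NMDA‐like) time constant, the two pools suppress each other through a shared inhibitory term, and noise plus the input difference decide which pool ends up in the high‐firing state.

```python
import numpy as np

def two_pool_attractor(input_a, input_b, w_self=2.2, w_inh=2.0,
                       tau=0.1, noise_sd=0.02, dt=0.001, t_max=1.5, seed=0):
    """Reduced rate model of two self-exciting, mutually inhibiting pools.

    input_a, input_b : external drive to each pool (e.g., proportional to option values)
    w_self           : recurrent self-excitation (integrated with the slow, NMDA-like tau)
    w_inh            : cross-inhibition, standing in for the GABAergic interneuron pool
    Returns the firing-rate trajectories of pools A and B (arbitrary units, 0-1).
    """
    rng = np.random.default_rng(seed)

    def f(x):
        # sigmoidal transfer function keeps rates bounded between 0 and 1
        return 1.0 / (1.0 + np.exp(-(x - 1.0) / 0.3))

    n_steps = int(t_max / dt)
    rates = np.zeros((n_steps, 2))
    for i in range(1, n_steps):
        a, b = rates[i - 1]
        drive_a = input_a + w_self * a - w_inh * b
        drive_b = input_b + w_self * b - w_inh * a
        noise = noise_sd * np.sqrt(dt) * rng.standard_normal(2)
        rates[i, 0] = a + dt / tau * (-a + f(drive_a)) + noise[0]
        rates[i, 1] = b + dt / tau * (-b + f(drive_b)) + noise[1]
    return rates

# With nearly equal inputs, both pools first rise together (tracking the input sum)
# before mutual inhibition resolves the competition and one pool wins.
rates = two_pool_attractor(input_a=0.52, input_b=0.48)
print("final rates (A, B):", np.round(rates[-1], 2))
```

In this toy run, lengthening tau (the NMDA‐like time constant) or strengthening the cross‐inhibition slows the transition into the winning attractor state; in the full model, these transitions from a sum‐like to a difference‐like regime play out within a few hundred milliseconds.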
Such rapid dynamics would be invisible to fMRI, so the investigators used MEG to test these predictions. It was found that activity in two brain areas, vmPFC and posterior parietal cortex, exhibited the very dynamics predicted by the model (Figure 22.2E). Therefore, it appears likely that activity in these regions reflects the

Figure 22.2 (A) Recurrent network model for decision‐making. Pools 1 and 2 correspond to pools representing leftward or rightward choice and receive inputs I1 and I2 from motion‐sensitive cells in area MT. In the model variant adapted for value‐guided choice, the two pools correspond to pools representing the left and right option, and they receive input proportional to the value of that option. Recurrent excitation is dominated by NMDA receptors (indicated by the coupling parameter w+). Both pools inhibit each other indirectly by exciting a pool of GABA interneurons that provides feedback inhibition. (B) Attractor dynamics. Initially, firing is high in both pools A and B, but as the competition is resolved (a decision is made), only one pool ends up in a high‐firing attractor state while activity in the other pool is suppressed. (C) Biophysical model predictions of MEG data. Network activity in the range of 2–10 Hz represents the overall value (top panel) and the value difference between the chosen and unchosen option (bottom). (D) Z‐scored effect of the overall value and value difference. Solid lines are correct trials; dashed lines are incorrect trials. This reveals that the effect of the value sum occurs first, before the network activity transitions to represent the value difference. Furthermore, value difference representations are only observed on correct trials. (E) MEG data from two key brain areas for decision‐making, the posterior superior parietal lobule (pSPL) and the ventromedial prefrontal cortex (vmPFC), matched with model predictions. The top panels show the effect of the value sum, and the bottom panels the effect of the value difference. Colors indicate the z‐scores. Reproduced with permission from Wong and Wang (2006) (A), Wang (2002) (B), and Hunt et al. (2012) (C–E).

Figure 22.3 (A) Biophysical model predictions on the effect of increasing the degree of recurrent excitation (w+) in the network. The model predicts that the value difference correlate (top), a neural signature of a decision process, ramps up more steeply, followed by a faster decline, with increased levels of recurrent excitation. Behaviorally, the model’s choice accuracy on difficult trials (as measured by the softmax inverse temperature) is reduced with a higher w+. (B) Experimental results. Subjects performed a simple binary choice paradigm. Participants tried to maximize their payoffs by making repeated selections between two options that differed in terms of reward magnitude and probability. GABA and glutamate concentrations were measured with MR spectroscopy in the vmPFC (white rectangle indicating the voxel position) and a control region in the parietal cortex (not shown). The slope of the value difference correlate (middle panel in the top row) depended on both GABA and glutamate (right). With high basal vmPFC concentrations of glutamate and low concentrations of GABA, the value difference correlate emerged very quickly but also decayed very rapidly. Behaviorally, performance (softmax inverse temperature) was best in subjects with high levels of GABA and low levels of glutamate. Reproduced with permission from Jocham et al. (2012a).

fact that they are involved in making a decision by using a mechanism as specified in the model. The model makes further testable predictions. Because the key components in the network are recurrent glutamatergic excitation and GABAergic inhibition, the network will vary predictably, depending on the level of excitation and inhibition. We simulated how interindividual differences in the concentrations of GABA and glutamate would translate into differences in neural dynamics and choice behavior. The simulations predicted that choices would become more accurate with higher levels of GABA, and less accurate with higher levels of glutamate. Neurally, the evolution of the value difference representation would be slower with high levels of GABA, and faster with high levels of glutamate (Figure 22.3A). We found that interindividual differences in vmPFC GABA and glutamate concentrations (measured by MR spectroscopy) were related to choice performance and neural dynamics consistent with the model predictions. Subjects with high levels of GABA and low levels of glutamate in vmPFC were most accurate on difficult trials. Furthermore, ramping up of the value difference signal (as measured with fMRI) was positively related to glutamate, and negatively to GABA (Jocham et al., 2012a; Figure 22.3B). In other words, with high levels of GABA relative to glutamate, the decision was implemented more slowly in vmPFC, which led to more accurate choices. Together, these findings strongly suggest that the representations of the chosen value and value difference found in vmPFC reflect the outcome of a choice mechanism based on competition via mutual inhibition. A recent single unit recording study provides direct neuronal evidence for mutual inhibition in vmPFC. When monkeys were presented with two options successively, neural activity reflected the value of the first and second option, respectively, at the time they were presented. Importantly, at the time of the second option presentation, cells were tuned to the value of both option 1 and option 2, but they were tuned in opposite directions. In other words, if a cell was positively modulated by the value of option 1 during presentation of the second option, this same cell was negatively modulated by the value of option 2 during the same interval, despite the values of the two options being uncorrelated. Furthermore, even after the authors regressed all value‐related activity out of firing rates, neural activity was still predictive of the upcoming choice the monkey would make (Strait et al., 2014). Together with studies showing that lesions to primate vmPFC impair value‐guided choice (Noonan

et al., 2010; Rushworth et al., 2011), these findings strongly suggest vmPFC as a brain region that implements a choice, and it appears to do so through a mechanism of competition via mutual inhibition.

Multiple Brain Mechanisms for Choice?

The evidence outlined above supports a role for vmPFC in value‐guided choice, yet a good deal of evidence suggests decisions can be made in different frames of reference using (at least partly) different neural circuitry: Sometimes, a choice is made between two stimuli, whereas in other cases, choices are made between motor actions. In some situations, choices are made between options that are presented simultaneously; in other situations, the options are presented sequentially. How much the role of particular brain regions depends on these different frames of reference is probably best illustrated by the finding that lesions to ACC impair performance when choices are made between actions rather than stimuli, whereas OFC lesions produce the exact opposite deficit (Rudebeck et al., 2008). Another study using MEG found that value representations were found in vmPFC when options were displayed side by side, but were found in the motor cortex and not in vmPFC when options were presented sequentially, separated by a brief delay. In these sequential trials, the first option was always shown on the left and the second option always on the right. At the time of the first option presentation, a correlate of this option’s value was found in beta‐band power in contralateral motor cortex. At the time the second option was presented, beta power represented the value difference between the contra‐ and ipsilateral option (Hunt, Woolrich, Rushworth, & Behrens, 2013). These findings suggest that when choices can be made in the space of motor actions, rather than abstract goods, valuation, choice, and motor preparation may proceed in parallel, rather than serially. This is further supported by a study using transcranial magnetic stimulation to examine the relationship between value and corticospinal excitability as measured by motor‐evoked potentials at the effector muscle. The authors found that corticospinal excitability was greater on trials with a high value difference, and this effect of value gradually evolved over the course of a trial, suggesting that motor preparation is facilitated by a high value difference (Klein‐Flugge & Bestmann, 2012). It is important to note that our discussion does not argue against serial models of decision‐making, such as that suggested by Kable and Glimcher (2009). They propose a two‐stage progression, in which valuation occurs in circuits involving vmPFC and striatum, and circuitry spanning lateral prefrontal and parietal cortex then uses these value signals for choice. Above, we have presented mechanistic evidence on how the progression from value to choice over time can be implemented within a single brain area. However, we do not think that those two proposals are mutually exclusive. Rather, we suggest that choice mechanisms are deployed bespoke to the particular demands of the task at hand. In addition to these different frames of reference, even more subtle details of the particular choice context can matter. In the study by Hunt and colleagues (2012), a value difference correlate was only found in the vmPFC when participants had to compute an abstract value estimate by integrating a reward magnitude and probability, but not when both stimulus dimensions mandated the same choice. In addition,

the value difference correlate was only evident in vmPFC in the first half of the experiment, while in the second half it became more pronounced in posterior parietal cortex. Because reaction times declined steeply during the course of the experiment, it was suggested that choices gradually became more automated and less deliberative; hence, parietal cortex was interpreted as guiding behavior when choices are made fast, nearly automatically, without long deliberation. In agreement with this, we have recently shown that when forcing subjects to make choices very rapidly, a pronounced value difference correlate is found in parietal cortex, but not in vmPFC. The situation exactly reversed when allowing subjects much time to decide – the value difference came to be represented in vmPFC but was absent from parietal cortex (Jocham et al., 2014). Finally, it is important to point out the intricate relationship between valuation, choice, and attention. A recent fMRI study provided evidence that vmPFC value signals are anchored to attention, not choice. The authors manipulated subjects’ visual fixation orthogonally to option values in order to decorrelate attention from choice. Using this procedure, they found that the vmPFC fMRI signal correlated positively with the value of the attended, and negatively with the value of the unattended, option. However, even though attention was deliberately decoupled from choice, they also found that guiding subjects’ attention to one option also made them more likely to select that option (Lim, O’Doherty, & Rangel, 2011). Indeed, moment‐to‐moment fluctuations of a decision variable were closely tracked by a drift‐diffusion model under the control of visual attention (Krajbich, Armel, & Rangel, 2010). However, from that study, the direction of the effect is unclear: Did people value an item more because they fixated on it, or did they fixate longer on it because they already assigned a higher value to it? Indeed, recent evidence has shown that visually salient options are more likely to be chosen than less salient alternatives during consumer choice (Milosavljevic, Navalpakkam, Koch, & Rangel, 2012). Furthermore, a descriptive accumulator model that integrates measures of salience and value in guiding fixations and ultimately value‐based choice has recently been shown to outperform similar models without a salience component (Towal, Mormann, & Koch, 2013). Most of our everyday decisions involve choices between items with multiple attributes, such as when deciding between two pairs of trousers that may vary in price, quality of the material, color, and so forth. In our laboratory experiments, we often mimic these situations by giving subjects two options that each have an amount of reward and a probability with which that reward can be obtained. Economic theory posits that we compute an integrated value estimate, which, in our laboratory example, would be the Pascalian value (probability × magnitude), and in the trousers example a somewhat more abstract estimate of “how good” the item is. However, there is evidence to suggest that we do not always compute an integrated value. A notable study investigated the choice behavior of patients with vmPFC lesions and controls. Rather than looking at which option the subject ended up choosing, the study examined how information about options was gathered.
Patients were asked to choose between three apartments that varied along three attributes (noisiness, neighborhood, and size). There were thus three pieces of information for each flat. Each of the resulting nine fields was covered with a card, and participants were allowed to turn over one card at a time. It was found that patients with vmPFC lesions gathered information across attributes, that is, they first uncovered all information for one flat, before proceeding

to the next. In contrast, healthy individuals and patients with LPFC lesions sampled within attributes, that is, they first uncovered information about a single attribute, for example the price, for all of the flats before proceeding to the next attribute (Fellows, 2006). These data suggest that healthy individuals make choices by comparing items with respect to specific attributes, and then either compare across attributes or select the one that compares best in the attribute(s) most relevant to the individual. By contrast, the vmPFC patients’ behavior appears more consistent with the computation of an integrated value. Taken together, it seems that healthy individuals’ choices are guided not only by how good an item is overall but also by attending to the particular features most important to the individual. In sum, it does not appear implausible that similar or overlapping neural computations subserve attention, valuation, and choice.

Behavioral Adaptation

A useful distinction in value‐guided choice can be drawn between comparative evaluative choices and sequential choices (Boorman, Rushworth, & Behrens, 2013; Freidin, Aw, & Kacelnik, 2009; Kolling et al., 2012; Vasconcelos, Monteiro, Aw, & Kacelnik, 2010). Comparative evaluative choices are made between simultaneously presented, well‐defined choice options, whose attributes, including any uncertainty (also called risk in this context), are known. An example of a comparative evaluative choice is a decision between a Snickers bar and a Mars bar at the canteen. Much of the evidence and modeling discussed in this chapter so far has stemmed from experiments implementing comparative evaluative choices, in part because they simplify the decision problem, facilitating a tractable examination of the decision‐making mechanism. Sequential choices, on the other hand, are made in series or repeatedly, often under unknown uncertainty (also called ambiguity), which may or may not be resolvable with further experience. Examples of sequential choices abound in the real world, ranging from a foraging animal deciding whether to hunt a gazelle or search for prey further afield to a homeowner deciding whether to hold onto or sell their house. In sequential choices, the animal frequently faces a decision about whether to continue selecting an option or to adapt its choice toward a known or unknown set of alternatives. This means it is adaptive to track several decision variables: the rewards, costs, and uncertainties associated with choice options. Tracking options’ rewards and costs is clearly advantageous, since these variables should guide behavior on the basis of current expectations about their future values – the expected reward relative to the cost that pursuing the options would likely entail. Yet in an ever‐changing world, different courses of action are pervaded by uncertainty. Consequently, exploring less‐known options with lower expected values enables the animal to gain potentially valuable information it could exploit in the future to obtain even better rewards. Both value‐guided and information‐guided behavioral adaptation can be described as either undirected or directed – in other words, as a decision about whether to switch away from a known alternative to any alternative, or instead to a specific alternative or portion of the sampling space, guided by expected outcomes based on previous experience.

Default choices

In everyday life, animals face a daunting problem: How do they make good choices given the multitude of potential options to select between at any given moment? Reinforcement learning models often assume that agents perfectly track decision variables associated with each possible choice option and select the one that maximizes the agent’s expected future reward (Sutton & Barto, 1998). Yet finding the optimal solution to this problem in the real world is not only difficult; it is impossible. The brain requires some means of constraining the decision space to reduce the computational demand of such continual effortful comparisons. One appealing heuristic for this problem is to form a default position, or long‐term preferred option or limited set of options, based on their history of predicting favorable outcomes such as reward (Boorman et al., 2013; for a similar problem concerning identification of relevant stimulus dimensions, see Wilson & Niv, 2011). This strategy dramatically simplifies the computational demand of the decision problem, rendering it tractable. A default option can be readily identified in many everyday decisions: shopping for breakfast at the supermarket, choosing an airline for travel to an upcoming conference, or surfing the Internet. These sequential choices can often be reduced to a decision about whether to stick with the default position or switch to something else. Cross‐species lesion and recording evidence from rodents, monkeys, and humans supports a central role for dorsal ACC (dACC) and adjacent pre‐SMA in making such decisions. Both the BOLD response and single unit activity in dorsal ACC increase markedly at response time when subjects switch behavioral responses, especially when these switches are made volitionally based on the history of reinforcement, as opposed to being instructed by an external cue (Procyk, Tanaka, & Joseph, 2000; Shima & Tanji, 1998; Walton, Devlin, & Rushworth, 2004). In the same vein, lesions to dorsal ACC produce deficits in selecting options based on reinforcement history (Chudasama et al., 2013; Hadland, Rushworth, Gaffan, & Passingham, 2003; Rudebeck et al., 2008), particularly following a change in contingencies (Kennerley et al., 2006). More recently, it has been proposed that decision‐related dACC/pre‐SMA activity reflects a decision variable amounting to the accumulated evidence favoring behavioral adaptation away from a long‐term or default option during sequential decisions (Boorman et al., 2011, 2013; Hayden, Pearson, & Platt, 2011; Hunt et al., 2012; Kolling, Wittmann, & Rushworth, 2014; Figure 22.4). Rather than merely increasing activity during switches, in each of these studies the dACC signal at choice scales monotonically with the value‐based evidence for adapting behavior – that is, the difference or ratio between the subjective values associated with adapting away from the default option and continuing to select it. Importantly, the signal is present independently of whether or not the subject does in fact switch, but is notably absent if the default option is transiently removed from the menu of available options (Boorman et al., 2013).
In each of several tasks, dACC is sensitive to whichever task variable is relevant over the longer term, whether it is reward probability (Boorman et al., 2013), average environmental reward size (Kolling et al., 2012), travel time between reward “patches” (Hayden et al., 2011), time pressure (Kolling et al., 2014), or a predictable spatial location (O’Reilly et al., 2013). Moreover, it integrates this long‐term variable with other short‐term variables in the form of subjective value comparisons relevant for the decision at hand (Boorman et al., 2009, 2013; Kolling et al., 2012).
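One minimal reading of “accumulated evidence favoring behavioral adaptation” is a running total of the value advantage of abandoning the default, compared against a bound. The following is a schematic sketch of that idea with invented numbers, not a model fitted in any of the cited studies:

```python
def switch_evidence(alternative_values, default_values, bound=1.0):
    """Accumulate, across sequential decisions, the value-based evidence for
    adapting away from a default option; signal a switch when a bound is reached.

    alternative_values : per-trial value of the best alternative to the default
    default_values     : per-trial value of continuing with the default
    Returns the trial index at which the accumulated evidence crosses the bound,
    or None if it never does.
    """
    evidence = 0.0
    for trial, (v_alt, v_default) in enumerate(zip(alternative_values, default_values)):
        evidence += v_alt - v_default      # evidence for adapting on this trial
        evidence = max(evidence, 0.0)      # one simple choice: net evidence never goes below zero
        if evidence >= bound:
            return trial
    return None

# Toy example: the default gradually becomes less valuable than the alternative,
# so the evidence for switching builds across trials until the bound is crossed.
alternative = [0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]
default = [0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3]
print("switch on trial:", switch_evidence(alternative, default))
```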

Figure 22.4 dACC and default adaptation. I. Monkey dACC neurons integrate switch evidence across trials. (A) Departure times of monkey choices are plotted as a function of travel time between patches and residence time within a patch, color‐coded from the earliest to the latest departure times. (B) Saccade‐locked phasic responses of a single dACC neuron, color‐coded as in (A). (C) Same single neuron’s firing rate plotted as a function of both travel time between patches and residence time in a patch. The gain of the response is inversely proportional to travel time.

Figure 22.4 (Continued) (D) Same as in (C) for a population of 49 dACC neurons. (E) Firing rate for different travel times overlaid for the three trials preceding a switch and on switches, illustrating a rise to similar putative decision thresholds. Reproduced with permission from Hayden et al. (2011). II. dACC activity and average search value in foraging‐style decisions. (A) dACC activity reflected the main effect of search value during foraging decisions (left) and was better related to VD during foraging‐style decisions than to decision VD during comparative evaluative decisions (right). (B) dACC time courses during “engage” decisions (left) and “search” decisions (right). Adapted from Kolling et al. (2012) with permission from the American Association for the Advancement of Science. III. Choice and default value coding during multialternative choice. (A) Reference image for comparison with (B) and (C) showing diffusion‐weighted imaging‐based parcellation of the cingulate cortex based on clustering of probabilistic connectivity profiles. (B, C) Left: sagittal slices through z‐statistic maps relating to the subjective expected value of the chosen option (chosen EV) during decisions. Positive effects are shown in red–yellow (B) and negative effects in blue–light blue (C). Right, top: time course of the effect size of the chosen EV, short‐term next‐best (V2), and short‐term worst (V3) option EV plotted across the decision period in vmPFC. Right, bottom: the same for the long‐term best (default V1), long‐term next‐best (default V2), and long‐term worst (default V3) option EV in dACC. Thick lines: mean; shadows: SEM. Adapted with permission from Boorman et al. (2013).

Although there has been some debate surrounding whether decision‐related dACC activity informs the current choice or instead predicts choice outcomes or monitors decision quality for learning (Alexander & Brown, 2011; Blanchard & Hayden, 2014), several properties of the dACC signal are reminiscent of a mechanism that integrates evidence for behavioral change (Figure 22.4). In one particularly compelling study (Hayden et al., 2011), Hayden and colleagues trained monkeys to perform a “patch foraging” task, which required them either to choose to stay in a “patch,” whose reward depleted with patch residence time, or to leave the “patch” and “travel” to a new one with some variable delay between patches. This is in essence a stay/switch decision, where staying can be seen as the default option because it is chosen again and again until the monkey has accumulated sufficient evidence to motivate a switch. Monkeys’ decisions to leave a patch depended on both travel time between patches and handling time within a patch, and were predicted quantitatively by the marginal value theorem (MVT), an optimal solution to foraging in a “patchy habitat” under certain assumptions (Charnov, 1976). Response‐locked phasic responses in both single dACC neurons and the population integrated patch residence time and travel time over multiple stay decisions (Figure 22.4). Perhaps most convincingly, the gain of dACC firing rate with respect to patch residence time was inversely proportional to the travel time between patches but terminated at a similar threshold across departure times (Figure 22.4). These properties of integration across multiple sequential decisions, adaptive response gain with switch evidence, and rise to a threshold are consistent with an evidence accumulation‐to‐bound process, here guiding behavioral change. This putative mechanism for behavioral change can be contrasted with the neural signatures of decision mechanisms for comparative evaluative decisions discussed earlier in the chapter. Notably, the vmPFC and posterior parietal cortex value comparison signals measured during comparative evaluative choices initially reflect the sum of values and then transition to the difference between chosen and unchosen (or attended and unattended) subjective values (Boorman et al., 2009, 2013; Hunt et al., 2012; Jocham, Hunt, Near, & Behrens, 2012b; Lim et al., 2011). In many paradigms, the sign of this comparison signal is the inverse of the value comparison signals recorded in dACC described here. We propose that the dACC signal is inversely proportional to vmPFC and portions of posterior parietal cortex in many paradigms because it adopts a different reference frame: one of staying versus changing behavior, rather than a decision between well‐defined options or goals (Boorman et al., 2013; Kolling et al., 2012). This distinction supports the view that decisions are governed by multiple controllers, whose recruitment depends on the type of decision required by current environmental demands (Rushworth, Kolling, Sallet, & Mars, 2012).

Undirected behavioral adaptation

MVT has proven powerful in capturing the sequential foraging behavior of many species, including bees, birds, monkeys, and human hunters in their respective ecological contexts (Hayden et al., 2011; Smith & Winterhalder, 1992; Stephens & Krebs, 1986).
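The stay/leave rule at the heart of MVT, which the next paragraph unpacks, is simple enough to sketch directly. The following is a generic illustration with made‐up numbers rather than the formulation fitted in any of the studies discussed here; the delta‐rule update of the habitat average gestures at the learning‐based adaptations of MVT discussed later in this section:

```python
def should_leave(patch_rate, habitat_rate):
    """MVT stay/leave rule: leave once the patch's marginal (instantaneous)
    intake rate drops below the habitat's long-run average intake rate."""
    return patch_rate < habitat_rate

def update_habitat_rate(habitat_rate, observed_rate, learning_rate=0.1):
    """Delta-rule estimate of the habitat's average intake rate; a fuller
    treatment would average over all experience, including travel time."""
    return habitat_rate + learning_rate * (observed_rate - habitat_rate)

# Toy run: intake in a patch depletes with residence time. A longer travel
# time between patches lowers the habitat average, so the forager stays longer.
for habitat_rate, label in [(0.5, "short travel"), (0.3, "long travel")]:
    patch_rate = 1.0                     # intake rate on entering a fresh patch
    steps = 0
    while not should_leave(patch_rate, habitat_rate):
        steps += 1
        patch_rate *= 0.85               # depletion with residence time
    print(f"{label}: leave after {steps} time steps")
```

The inverse relationship between travel time and patch residence time produced by this toy run is the qualitative pattern MVT predicts and that the dACC recordings described above track.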
At the core of the stay/leave decision rule implied by MVT is a comparison between two terms: the marginal energy intake rate in a patch and the average energy intake rate for the habitat. When the latter exceeds the former, animals should leave the patch and search elsewhere. Notably, the intake rate is a function of

rewards and energetic costs, including delay and effort. As mentioned earlier in this chapter, there is evidence that dACC activity reflects the abstract value of choices and of the environment on average. A large proportion of dACC neurons multiplex over several decision variables, including reward probability, reward size, effort, and time, and this coding is significantly more prevalent than in LPFC or lOFC (Hosokawa, Kennerley, Sloan, & Wallis, 2013; Kennerley, Behrens, & Wallis, 2011; Kennerley, Dahmubed, Lara, & Wallis, 2009b). Furthermore, dACC neurons encode pure reward‐prediction errors in these multiplexed values (Kennerley et al., 2011), which may be important for tracking these values. In humans, dACC BOLD activity was shown to reflect the average reward value of the environment in a foraging‐style task when this information guided behavioral change away from a long‐term preferred or default option (Figure 22.4; Hunt et al., 2012). Finally, lesions to the ACC in rats impair decisions that require reward size to be weighed against effort costs (Floresco & Ghods‐Sharifi, 2007; Rudebeck, Walton, Smyth, Bannerman, & Rushworth, 2006; Walton, Bannerman, Alterescu, & Rushworth, 2003; Walton, Bannerman, & Rushworth, 2002), and BOLD activity in human dACC, but not vmPFC, reflects a comparison between options’ values that increase with reward size and decrease with effort cost (Lim, Colas, O’Doherty, & Rangel, 2013; Prevost, Pessiglione, Metereau, Clery‐Melin, & Dreher, 2010). Collectively, these findings suggest that dACC decision‐related activity is ideally suited to comparing marginal and average energy intake rates for sequential decisions. Behavioral adaptation can be described as directed or undirected. The decision rule implemented by MVT is essentially undirected: Animals need only maintain a representation of the marginal value of the current option and of the environment’s average value, without any required representation of the value of the specific alternatives they may choose when they do adapt their behavior (or of the transitions to subsequent states they may visit). Conversely, directed behavioral adaptation requires a representation of the reinforcement or information potential of specific alternatives in the environment. The role of dorsomedial frontal cortex in directed and undirected behavioral adaptation may depend on the structures with which it interacts, contingent upon contextual demands. One candidate neuromodulator that heavily innervates dACC and is well positioned to inform undirected behavioral adaptation is dopamine (DA; Berger, 1992; Lindvall, Bjorklund, & Divac, 1978). A noteworthy theory proposed that reward‐prediction errors are integrated into an average reward rate encoded by tonic DA levels (Niv, Daw, Joel, & Dayan, 2007). Although there has been limited empirical support for this hypothesis to date, there is nevertheless evidence to suggest that DA may perform computations important for guiding undirected sequential decisions. In one tour‐de‐force study (Hamid et al., 2016), optogenetics was combined with fast‐scan cyclic voltammetry to measure changes in DA in NAc, both with and without physiologically titrated ventral tegmental area stimulation, in a decision‐making task between two options with independently varying reward probabilities.
The authors isolated two temporally dissociable DA signals that causally impacted behavior in distinct ways: a phasic burst in response to a tone marking the onset of a trial and a graded reward‐prediction error at the time of a tone indicating a reward would be delivered for the animal’s preceding choice. Optically stimulating DA neurons at the first tone led to vigorous approach behavior but did not impact learning, whereas stimulating

at the second tone led to increased preference for the selected arm. Crucially, the DA reward‐prediction error signal recorded in NAc took the form of the reward obtained minus the expected reward based on the state value of the environment, rather than the chosen or left/right action value. This reward‐prediction error, comprising the obtained reward minus the reward expected on average, is precisely the form of prediction error useful for learning about the average energy‐intake rate. In a second study, Constantino and Daw (2015) developed a foraging task for humans and showed that an adaptation of MVT incorporating a learning rule captured human choices dramatically better than a canonical temporal difference‐learning algorithm in this setting. Comparing Parkinson’s patients on and off DA medication with matched controls, they found that patients harvested longer in a patch before switching when off relative to on medication, and also relative to matched controls. This finding is consistent with a reduced estimate of the average reward rate of the environment when off DA medication, suggesting that DA is critical for tracking the average reward rate. Although untested to our knowledge, these DA signals may modulate dACC activity, facilitating decisions about when to continue or adapt behavior based on the local average reward, thereby guiding undirected behavioral adaptation. Although MVT can capture much of the sequential behavior of diverse species, animals’ behavior is also governed by the drive to gather information, which is not explicitly modeled by MVT. As with value, information‐guided behavioral adaptation can be described as directed or undirected (Johnson, Varberg, Benhardus, Maahs, & Schrater, 2012). Setting aside the important interplay between value and uncertainty for now (discussed in a subsequent section), one formalization of undirected information foraging posits that animals continue sampling surprising locations they encounter, given prior experience at that location (Johnson et al., 2012). Similar to MVT, undirected information‐guided behavioral adaptation can also be conceptualized as guided by a comparison of the information available from a currently observed sample with the expected information in the habitat on average, based on previous experience. In other words, if the animal encounters a sufficiently surprising sample, an information‐seeking animal should continue sampling that location to resolve the high uncertainty, relative to the animal’s experiences in the environment on average (assuming equated expected values). Conversely, if the sample is relatively unsurprising, and the environment is sufficiently information rich, it should sample elsewhere. This simple information‐based comparison can also inform undirected sequential decisions concerning whether to continue sampling an option or adapt behavior to sample elsewhere. While it is challenging to disentangle the contributions of uncertainty and value to behavior, two productive approaches have been to remove reward from the experimental setting or to match expected rewards across options.
In these circumstances, rats, monkeys, and humans all exhibit a preference for novel stimuli (Baillargeon, Spelke, & Wasserman, 1985; Berlyne, Koenig, & Hirota, 1966; Bromberg‐Martin & Hikosaka, 2009, 2011), which can be shown to emerge naturally from formal treatments of information foraging and active Bayesian inference (Johnson et al., 2012; Schwartenbeck, Fitzgerald, Dolan, & Friston, 2013). Intriguingly, the very same DA and lateral habenula neurons that encode reward‐prediction errors also encode information prediction errors, even when the options

are carefully matched for expected reward value (Bromberg‐Martin & Hikosaka, 2009, 2011). This surprising finding suggests that information may be inherently rewarding and points to a potential role for DA in undirected information‐guided, as well as value‐guided, behavioral adaptation. Another candidate neuromodulator likely to be important for undirected information‐guided adaptation is norepinephrine (NE). An influential theoretical framework has proposed that NE tracks unknown uncertainty, or ambiguity (Yu & Dayan, 2005), which can be used both to modulate learning and to motivate exploration. There is indirect evidence to suggest that NE plays a role in tracking uncertainty, which theoretically should and empirically does control the rate of learning in rats (Gallistel, Mark, King, & Latham, 2001), monkeys (Rushworth & Behrens, 2008), and humans (Behrens et al., 2007). Release of NE from the locus coeruleus (LC) correlates with dilation of the pupils (Joshi, Kalwani, & Gold, 2013). This observation enables an indirect but noninvasive putative measure of LC activity. Matthew Nassar and colleagues measured pupil diameter while subjects performed a change‐detection task (Nassar et al., 2012). In this task, there are two computational factors that should make subjects amenable to changes of belief: the long‐term probability that the world might change, and a term known as the relative uncertainty, which captures mathematically the subject’s doubt that their previous belief was correct. As these factors are varied throughout the experiment, they both exhibit strong and separable influences on pupil diameter. Perhaps most impressively, if the experimenter introduces a surprising stimulus (a loud noise) at an unexpected time in the experiment, this not only causes an increase in pupil diameter but also results in a rapid period of belief revision in the subject’s completely unrelated task. Another recent study showed that, during decision‐making, baseline pupil diameter is increased directly preceding exploratory compared with exploitative choices, a difference that predicted an individual’s tendency to explore (Jepma & Nieuwenhuis, 2011). Notably, LC and dACC have strong reciprocal connections (Chandler, Lamperski, & Waterhouse, 2013; Jones & Moore, 1977). These observations suggest that interactions between NE and dACC may regulate both the rate at which new information replaces old during learning (Behrens et al., 2007; Jocham et al., 2009; O’Reilly et al., 2013) and the extent to which overall uncertainty drives changes in exploratory behavior.
Computationally, directed behavioral adaptation requires some representation of the expected values and/or uncertainties of specific alternatives that serve to guide behavioral change toward those options, as opposed to, for example, only the average value. Formally, this can be defined by the Bayesian belief (or probability distribution

over rewards) describing how much reward is likely to be available from selecting each possible alternative outcome or location (Johnson et al., 2012), or only a subset of sufficiently valuable/informative alternatives in the environment (Koechlin & Hyafil, 2007), where the mean and variance of the distribution can be taken to represent the value and the uncertainty of that belief. Recent evidence suggests that interactions between dACC and hippocampus (HIPP) contribute to such directed behavioral change in sequential decision‐making. Although it is not often highlighted, rat dACC projects throughout parahippocampal cortex, including presubiculum, parasubiculum, entorhinal cortex, and postrhinal cortex, and also sparsely to subiculum (Jones & Witter, 2007), and cingulate activity has been shown to be phase‐locked to the well‐described hippocampal theta rhythm (Colom, Christie, & Bland, 1988), supporting the plausibility of coordination between HIPP and dACC neural ensembles during behavior. Evidence for the involvement of these structures in directed adaptation comes from studies with both rats and humans. Using a sequential choice task, Remondes and Wilson (2013) trained rats to learn to perform sequences of four trajectories in a “wagon‐wheel maze” to ultimately obtain a chocolate reward while they recorded multiunit activity in HIPP and ACC. In this task, rats start at the center of the wagon‐wheel maze, enter an outer circle via an exit arm, navigate around the outer circle, and choose whether to enter each of several entry arms (or trajectories) that return to the maze center. They then leave via the exit arm again and return to the outer circle, from where they will select the next arm in the sequence. This task is comparable with a choice adaptation paradigm in that continuing along the outer circle can be conceptualized as a default option; at each choice point (or entry arm), the rat has to decide whether to continue along the outer circle or to adapt its behavior and enter the encountered entry arm. Both HIPP and ACC neural populations decoded choice trajectories in the intervals directly preceding these choice points, with HIPP trajectory content arising earlier than ACC. The authors then tested whether these changes in information content were reflected in distinct patterns of HIPP–ACC coherence of the local field potential. This revealed a progression in HIPP–ACC coherence, from coherence initially dominated by high‐frequency theta to wide‐band theta, as the animals progressed toward choice points. Moreover, they found that this change in coherence was accompanied by increases in the amount of trajectory information encoded by HIPP and ACC, again with HIPP preceding ACC. Finally, they investigated the relative timing and Granger causality, a test for inferring whether one time series is useful in predicting another, between HIPP and ACC spikes and local field potential, and found that HIPP spikes were Granger causal for ACC neural activity. Taken together, these findings suggest that lower‐frequency HIPP–ACC theta coherence coordinates the integration of contextual information from hippocampus to ACC to adapt from current choices to specific trajectories in a directed manner. Complementary evidence has arisen from an active visual exploration experiment in humans (Voss, Gonsalves, Federmeier, Tranel, & Cohen, 2011; Voss, Warren, et al., 2011).
In this paradigm, subjects explore a visual grid to learn about the location of different occluded objects one at a time in an active condition, where they control which item is revealed and for how long, and a passive condition, where they observe the objects as seen by the previous subject, thus enabling precise control of viewing sequence, duration, and information content between pairs of subjects. The sole

difference between conditions is whether the joystick movements are volitional or passive, and hence, the authors contend, whether information acquisition is active or passive. In the active condition, subjects spontaneously revisited recently viewed object locations, which the authors termed spontaneous revisitation. Spontaneous revisitation led to striking memory enhancements in both object identification and spatial memory in the active, but not the passive, condition. Superficially, such spontaneous revisitation behavior may seem to have little in common with the sequential decision‐making paradigms discussed so far, but they do in fact share some key features. Subjects have to volitionally change their behavior from continuing to the next sample and instead direct it toward specific previously viewed objects to gain information and resolve uncertainty. Interestingly, the degree of memory enhancement afforded by such spontaneous revisitation was associated with coactivation of dACC and HIPP, but only in the volitional condition (Voss, Gonsalves, et al., 2011). Moreover, spontaneous revisitation, and its benefits for subsequent memory performance, were only rarely observed in amnesic patients with severe damage to the hippocampus. Another brain region that plays a key role in directed behavioral adaptation in humans is the lateral frontopolar cortex (lFPC). In one line of research, subjects were asked to make sequential choices on the basis of two separate pieces of information that an ideal observer should integrate: reward probabilities that drifted slowly but independently and could be tracked, and independent reward magnitudes that were generated randomly at the onset of a trial and hence could not be tracked (Boorman et al., 2009, 2011). This manipulation meant that future choices should only be dependent upon the options’ reward probabilities, whereas current choices should be dependent upon both the options’ reward probabilities and their reward magnitudes, thus enabling variables important for long‐term strategies and short‐term behavior to be dissociated. In two studies, changes to future behavior could be shown to depend upon the relative unchosen reward probability: the difference (or ratio) between the reward probability associated with the best alternative in the environment and the reward probability associated with the selected option, with no impact of a third, inferior option. Because subjects’ switching behavior was driven by a comparison between the best two alternatives’ reward probabilities, but not by the third, inferior option, it can be described as directed. Neurally, lFPC BOLD activity uniquely encoded the reward probability of the best alternative option relative to the reward probability of the selected option but was not sensitive to the randomly generated magnitudes only relevant for current decisions (Figure 22.5). These findings suggest that lFPC compares the future reward potential of specific valuable counterfactual options with a selected or default option for upcoming choices. Consistent with this interpretation, subjects in whom this evidence was better represented in lFPC switched to the previous next‐best alternative more frequently when advantageous. Notably, this pattern of coding contrasted with that of other brain regions such as vmPFC, dACC, posterior parietal cortex, and ventral striatum, whose signals reflected the integration of reward probabilities and magnitudes into expected values relevant for current choices.
The evidence for directed future behavioral change, reflected in lFPC activity, may also help coordinate decisions about whether to adapt behavior with interconnected dACC (Neubert, Mars, Thomas, Sallet, & Rushworth, 2014), where long‐term variables were integrated with short‐term variables relevant for current choices.


Figure 22.5 lFPC and strategic adaptation to counterfactuals. I. lFPC and adaptation to next‐best alternatives based on relative value. (A) Axial and coronal slices through z‐statistic maps relating to the relative unchosen probability, log(unchosen action probability / chosen action probability). Maps are corrected for multiple comparisons across the whole brain by means of cluster‐based correction at p < 0.05. (B) Top panel: time course for the effect size of the relative unchosen probability in the lFPC shown throughout the duration of the trial. Bottom panel: same time course shown with the signal decomposed into log unchosen and log chosen option probabilities. Thick lines: mean effect sizes. Shadows: standard error of the mean (±SEM). Adapted with permission from Boorman et al. (2009). (B) lFPC signal reflecting reward probabilities, which are relevant for both current and future choices, but not reward magnitudes, which are only relevant for current choices. (C) Second study involving trinary choices (Boorman et al., 2011): time course of the lFPC effect of the reward probability associated with the chosen option, the best unchosen option (option 2), and the worst unchosen option (option 3). The signal reflects a directed comparison between option 2 and the chosen option. II. lFPC and strategic exploration based on relative uncertainty. (A) Left: plot from a representative participant illustrating that changes in the Explore term (blue) partially capture trial‐to‐trial swings in RT (green). Right: correlation between RT swings and relative uncertainty among explorers (left; mean r = 0.36, p < 0.0001) and nonexplorers (mean r = −0.02, p > 0.5). All trials in all participants are plotted in aggregate, with color distinguishing individuals.

Figure 22.5 (Continued) (B) Left: effect of relative uncertainty, controlling for mean uncertainty and restricted to explore participants, revealing activation in dorsal and ventral lFPC regions, rendered at p < 0.05 FWE corrected (cluster level). Right: contrast of the relative uncertainty effect, controlling for mean uncertainty, in explore versus nonexplore participants, revealing a group difference in lFPC, rendered at p < 0.05 FWE corrected (cluster level). Adapted with permission from Badre et al. (2012).

In the real world, value and uncertainty both impact animal choices. Animals must trade off exploiting well‐known options to maximize reward against changing strategy and exploring less well‐known options to gain information that may reveal even better rewards, a classic problem known as the exploration–exploitation dilemma (Sutton & Barto, 1998). Theoretical work has proposed that adaptive exploration can be directed toward options in proportion to the difference in uncertainty between them (Kakade & Dayan, 2002). In practice, however, isolating the influence of uncertainty on exploration has proven challenging (e.g., Daw, O’Doherty, Dayan, Seymour, & Dolan, 2006), partly because it requires very precise modeling of exploitation. In one elegant study, Michael Frank and colleagues (Frank, Doll, Oas‐Terpstra, & Moreno, 2009) accomplished this using a clock‐stopping task. In this task, subjects have to decide when to stop a clock hand that moves clockwise through one full rotation over an interval of 5 s, in different contexts with different reward structures: an increasing expected value (iEV), a decreasing expected value (dEV), and a constant expected value (cEV) condition. After accounting for the influence of incrementally learned Go and No‐Go action values, among other factors, the authors found that the model failed to capture large swings in subjects’ reaction times (RTs; Figure 22.5). These RT swings were accounted for by introducing an explore term that could influence both RTs and choices in proportion to the relative Bayesian uncertainty between “fast” and “slow” responses (those faster or slower than the local average). Inclusion of this term significantly improved model performance by capturing these large swings from fast to slow responses or vice versa. This indicated that subjects explored fast and slow responses at key points in the experiment to learn about the structure of the reward environment, critically doing so in proportion to the relative uncertainty about obtaining positive reward‐prediction errors from categorical fast and slow choices. These directed behavioral changes from an exploitative to an exploratory strategy, driven by relative uncertainty, were associated with variation in the expression of catechol‐O‐methyltransferase, a gene preferentially controlling prefrontal DA expression, but not with genes preferentially controlling striatal DA function. These results add to evidence indicating that the impact of DA on behavior depends upon its target regions and further suggest that its modulatory effects in prefrontal cortex, but not striatum, contribute to directed behavioral change. In a second study, Badre, Doll, Long, and Frank (2012) investigated the neural correlates of such directed exploration in the same task using fMRI. They found that some subjects could be described as “explorers,” while others could not, based on whether they used the relative uncertainty to drive strategic exploratory RT swings as described above. Only explorers showed an effect of the relative uncertainty in a region of lFPC that neighbored those reported by Boorman et al. (2009, 2011; Figure 22.5). By contrast, overall uncertainty, rather than relative uncertainty, was reflected in the activity of more posterior lateral regions of PFC, among other regions, but not in lFPC.
Collectively, these studies suggest that lFPC compares variables, whether values or uncertainties, which may themselves be represented individually elsewhere, to guide directed or strategic changes in upcoming behavior toward counterfactual options. Whether and how lFPC, hippocampus, DA, NE, and other structures and neuromodulators together orchestrate such directed behavioral change, or are selectively recruited depending on current environmental demands, remains an open question that is likely to be addressed in the coming years.
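A minimal sketch of the kind of comparison described above is given below: a "relative" signal computed as the best alternative minus the current option, applied either to values or to uncertainties, with a switch triggered when that signal clears a threshold. The Option fields, function names, and threshold are illustrative assumptions, not quantities taken from the cited studies.

```python
# Hedged sketch: a relative (best-alternative minus current) comparison that can
# operate on option values or on option uncertainties; all names are illustrative.
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    value: float        # expected reward of the option
    uncertainty: float  # e.g., spread of the belief about that value

def relative_signal(current, alternatives, use_uncertainty=False):
    """Best alternative minus current option, on value or on uncertainty."""
    key = (lambda o: o.uncertainty) if use_uncertainty else (lambda o: o.value)
    best_alternative = max(alternatives, key=key)
    return key(best_alternative) - key(current)

def should_switch(current, alternatives, threshold=0.0):
    """Switch to the counterfactual option when its relative value clears a threshold."""
    return relative_signal(current, alternatives) > threshold

current = Option("current", value=0.62, uncertainty=0.05)
alternatives = [Option("alt_A", value=0.55, uncertainty=0.20),
                Option("alt_B", value=0.58, uncertainty=0.30)]
print(should_switch(current, alternatives))                               # False: value favors staying
print(relative_signal(current, alternatives, use_uncertainty=True) > 0)  # True: uncertainty favors exploring
```

The same comparison machinery thus supports either value-guided switching or uncertainty-guided (directed) exploration, depending on which variable feeds it; this is one way to read the summary claim above, not a commitment about how lFPC implements it.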

Neuroscience of Value‐Guided Choice 583 In this chapter, we have described how widespread value representations in the brain can result from distinct mechanisms, and how one might go about generating more mechanistic predictions about signatures of a decision‐making system. Although it may be tempting to think about a single common decision‐making system, evidence reviewed in this chapter suggests that contextual demands might determine the extent to which decisions are in fact implemented by distinct or at least partly distinct neural systems specialized for distinct kinds of decisions. According to this view, parallel neural circuits mediate decisions depending on the type of decision at hand, whether they are made between stimuli or actions, between well‐defined options presented simultaneously or potentially changing options presented sequentially and under uncertainty, or finally, directed or undirected changes to behavior. Despite this apparent diversity in anatomical implementation, it is likely that these different kinds of decisions deploy a conserved computational architecture. References Alexander, W. H., & Brown, J. W. (2011). Medial prefrontal cortex as an action–outcome pre­ dictor. Nature Neuroscience, 14, 1338–1344. Amiez, C., Joseph, J. P., & Procyk, E. (2006). Reward encoding in the monkey anterior cingu­ late cortex. Cerebral Cortex, 16, 1040–1055. Anderson, A. K., Christoff, K., Stappen, I., Panitz, D., Ghahremani, D. G., Glover, G., … Sobel, N. (2003). Dissociated neural representations of intensity and valence in human olfaction. Nature Neuroscience, 6, 196–202. Badre, D., Doll, B. B., Long, N. M., & Frank, M. J. (2012). Rostrolateral prefrontal cortex and individual differences in uncertainty‐driven exploration. Neuron, 73, 595–607. Baillargeon, R., Spelke, E. S., & Wasserman, S. (1985). Object permanence in five‐month‐old infants. Cognition, 20, 191–208. Barraclough, D. J., Conroy, M. L., & Lee, D. (2004). Prefrontal cortex and decision making in a mixed‐strategy game. Nature Neuroscience, 7, 404–410. Baxter, M. G., Parker, A., Lindner, C. C., Izquierdo, A. D., & Murray, E. A. (2000). Control of response selection by reinforcer value requires interaction of amygdala and orbital pre­ frontal cortex. The Journal of Neuroscience, 20, 4311–4319. Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50, 7–15. Bechara, A., Tranel, D., & Damasio, H. (2000). Characterization of the decision‐making def­ icit of patients with ventromedial prefrontal cortex lesions. Brain: A Journal of Neurology, 123, 2189–2202. Behrens, T. E., Woolrich, M. W., Walton, M. E., & Rushworth, M. F. (2007). Learning the value of information in an uncertain world. Nature Neuroscience, 10, 1214–1221. Berger, B. (1992). Dopaminergic innervation of the frontal cerebral cortex. Evolutionary trends and functional implications. Advances in Neurology, 57, 525–544. Berlyne, D. E., Koenig, I. D., & Hirota, T. (1966). Novelty, arousal, and the reinforcement of diver­ sive exploration in the rat. Journal of Comparative and Physiological Psychology, 62, 222–226. Blanchard, T. C., & Hayden, B. Y. (2014). Neurons in dorsal anterior cingulate cortex signal postdecisional variables in a foraging task. The Journal of Neuroscience, 34, 646–655. Blatt, G. J., Andersen, R. A., & Stoner, G. R. (1990). 
Visual receptive field organization and cortico‐cortical connections of the lateral intraparietal area (area LIP) in the macaque. The Journal of Comparative Neurology, 299, 421–445.

584 Gerhard Jocham, Erie Boorman, and Tim Behrens Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience, 2, 382–387. Bogacz, R. (2007). Optimal decision‐making theories: linking neurobiology with behaviour. Trends in Cognitive Sciences, 11, 118–125. Boorman, E. D., Behrens, T. E., & Rushworth, M. F. (2011). Counterfactual choice and learning in a neural network centered on human lateral frontopolar cortex. PLoS Biology, 9, e1001093. Boorman, E. D., Behrens, T. E., Woolrich, M. W., & Rushworth, M. F. (2009). How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron, 62, 733–743. Boorman, E. D., Rushworth, M. F., & Behrens, T. E. (2013). Ventromedial prefrontal and anterior cingulate cortex adopt choice and default reference frames during sequential multi‐alternative choice. The Journal of Neuroscience, 33, 2242–2253. Bouret, S., & Richmond, B. J. (2010). Ventromedial and orbital prefrontal neurons differen­ tially encode internally and externally driven motivational values in monkeys. The Journal of Neuroscience, 30, 8591–8601. Bromberg‐Martin, E. S., & Hikosaka, O. (2009). Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron, 63, 119–126. Bromberg‐Martin, E. S., & Hikosaka, O. (2011). Lateral habenula neurons signal errors in the prediction of reward information. Nature Neuroscience, 14, 1209–1216. Butter, C. M., Mishkin, M., & Rosvold, H. E. (1963). Conditioning and extinction of a food‐ rewarded response after selective ablations of frontal cortex in rhesus monkeys. Experimental Neurology, 7, 65–75. Cai, X., Kim, S., & Lee, D. (2011). Heterogeneous coding of temporally discounted values in the dorsal and ventral striatum during intertemporal choice. Neuron, 69, 170–182. Camille, N., Griffiths, C. A., Vo, K., Fellows, L. K., & Kable, J. W. (2011). Ventromedial frontal lobe damage disrupts value maximization in humans. The Journal of Neuroscience, 31, 7527–7532. Camille, N., Tsuchida, A., & Fellows, L. K. (2011). Double dissociation of stimulus‐value and action‐value learning in humans with orbitofrontal or anterior cingulate cortex damage. The Journal of Neuroscience, 31, 15048–15052. Carmichael, S. T., & Price, J. L. (1995). Sensory and premotor connections of the orbital and medial prefrontal cortex of macaque monkeys. The Journal of Comparative Neurology, 363, 642–664. Cavada, C., Company, T., Tejedor, J., Cruz‐Rizzolo, R. J., & Reinoso‐Suarez, F. (2000). The anatomical connections of the macaque monkey orbitofrontal cortex. A review. Cerebral Cortex, 10, 220–242. Chandler, D. J., Lamperski, C. S., & Waterhouse, B. D. (2013). Identification and distribution of projections from monoaminergic and cholinergic nuclei to functionally differentiated subregions of prefrontal cortex. Brain Research, 1522, 38–58. Charnov, E. L. (1976). Optimal foraging, the marginal value theorem. Theoretical Population Biology, 9, 129–136. Chudasama, Y., Daniels, T. E., Gorrin, D. P., Rhodes, S. E., Rudebeck, P. H., & Murray, E. A. (2013). The role of the anterior cingulate cortex in choices based on reward value and reward contingency. Cerebral Cortex, 23, 2884–2898. Cohen, J. D., McClure, S. M., & Yu, A. J. (2007). Should I stay or should I go? How the human brain manages the trade‐off between exploitation and exploration. 
Philosophical Transactions of the Royal Society of London B Biological Sciences, 362, 933–942. Colom, L. V., Christie, B. R., & Bland, B. H. (1988). Cingulate cell discharge patterns related to hippocampal EEG and their modulation by muscarinic and nicotinic agents. Brain Research, 460, 329–338.

Neuroscience of Value‐Guided Choice 585 Constantino, S., & Daw, N. D. (2015). Learning the opportunity cost of time in a patch‐­ foraging task. Cognitive, Affective, & Behavioral Neuroscience, 1–17. Critchley, H. D., & Rolls, E. T. (1996). Olfactory neuronal responses in the primate orbito­ frontal cortex: analysis in an olfactory discrimination task. Journal of Neurophysiology, 75, 1659–1672. Cromwell, H. C., & Schultz, W. (2003). Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. Journal of Neurophysiology, 89, 2823–2838. Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical sub­ strates for exploratory decisions in humans. Nature, 441, 876–879. Dias, R., Robbins, T. W., & Roberts, A. C. (1996). Dissociation in prefrontal cortex of affective and attentional shifts. Nature, 380, 69–72. Dum, R. P., & Strick, P. L. (1991). The origin of corticospinal projections from the premotor areas in the frontal lobe. The Journal of Neuroscience, 11, 667–689. Duncan, J., & Owen, A. M. (2000). Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends in Neurosciences, 23, 475–483. Fellows, L. K. (2006). Deciding how to decide: ventromedial frontal lobe damage affects information acquisition in multi‐attribute decision making. Brain: A Journal of Neurology, 129, 944–952. Fellows, L. K., & Farah, M. J. (2003). Ventromedial frontal cortex mediates affective shift­ ing in humans: evidence from a reversal learning paradigm. Brain, 126, 1830–1837. Floresco, S. B., & Ghods‐Sharifi, S. (2007). Amygdala–prefrontal cortical circuitry regulates effort‐based decision making. Cerebral Cortex, 17, 251–260. Frank, M. J., Doll, B. B., Oas‐Terpstra, J., & Moreno, F. (2009). Prefrontal and striatal dopa­ minergic genes predict individual differences in exploration and exploitation. Nature Neuroscience, 12, 1062–1068. Freidin, E., Aw, J., & Kacelnik, A. (2009). Sequential and simultaneous choices: testing the diet selection and sequential choice models. Behavioural Processes, 80, 218–223. Gallistel, C. R., Mark, T. A., King, A. P., & Latham, P. E. (2001). The rat approximates an ideal detector of changes in rates of reward:implications for the law of effect. Journal of Experimental Psychology Animal Behavior Processes, 27, 354–372. Glascher, J., Hampton, A. N., & O’Doherty, J. P. (2009). Determining a role for ventromedial prefrontal cortex in encoding action‐based value signals during reward‐related decision making. Cerebral Cortex, 19, 483–495. Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30, 535–574. Hadland, K. A., Rushworth, M. F., Gaffan, D., & Passingham, R. E. (2003). The anterior c­ingulate and reward‐guided selection of actions. Journal of Neurophysiology, 89, 1161–1164. Hamid, A. A., Pettibone, J. R., Mabrouk, O. S., Hetrick, V. L., Schmidt, R., Vander Weele, C. M., … & Berke, J. D. (2016). Mesolimbic dopamine signals the value of work. Nature Neuroscience, 19, 117–126. Hayden, B. Y., Pearson, J. M., & Platt, M. L. (2009). Fictive reward signals in the anterior ­cingulate cortex. Science, 324, 948–950. Hayden, B. Y., Pearson, J. M., & Platt, M. L. (2011). Neuronal basis of sequential foraging decisions in a patchy environment. Nature Neuroscience, 14, 933–939. He, S. Q., Dum, R. P., & Strick, P. L. (1995). Topographic organization of corticospinal projections from the frontal lobe: motor areas on the medial surface of the hemisphere. 
The Journal of Neuroscience, 15, 3284–3306. Hornak, J., O’Doherty, J., Bramham, J., Rolls, E. T., Morris, R. G., Bullock, P. R., & Polkey, C. E. (2004). Reward‐related reversal learning after surgical excisions in orbito‐frontal or dorsolateral prefrontal cortex in humans. Journal of Cognitive Neuroscience, 16, 463–478.

586 Gerhard Jocham, Erie Boorman, and Tim Behrens Hosokawa, T., Kennerley, S. W., Sloan, J., & Wallis, J. D. (2013). Single‐neuron mechanisms underlying cost–benefit analysis in frontal cortex. The Journal of Neuroscience, 33, 17385–17397. Hunt, L. T., Kolling, N., Soltani, A., Woolrich, M. W., Rushworth, M. F., & Behrens, T. E. (2012). Mechanisms underlying cortical activity during value‐guided choice. Nature Neuroscience, 15, 470–476, S1–3. Hunt, L. T., Woolrich, M. W., Rushworth, M. F., & Behrens, T. E. (2013). Trial‐type dependent frames of reference for value comparison. PLoS Computational Biology, 9, e1003225. Iversen, S. D., & Mishkin, M. (1970). Perseverative interference in monkeys following selective  lesions of the inferior prefrontal convexity. Experimental Brain Research, 11, 376–386. Izquierdo, A., Suda, R. K., & Murray, E. A. (2004). Bilateral orbital prefrontal cortex lesions in rhesus monkeys disrupt choices guided by both reward value and reward contingency. The Journal of Neuroscience, 24, 7540–7548. Jepma, M., & Nieuwenhuis, S. (2011). Pupil diameter predicts changes in the exploration– exploitation trade‐off: evidence for the adaptive gain theory. Journal of Cognitive Neuroscience, 23, 1587–1596. Jocham, G., Furlong, P. M., Kröger, I. L., Kahn, M. C., Hunt, L. T., & Behrens, T. E. (2014). Dissociable contributions of ventromedial prefrontal and posterior parietal cortex to value‐ guided choice. NeuroImage, 100, 498–506. Jocham, G., Hunt, L. T., Near, J., & Behrens, T. E. (2012a). A mechanism for value‐guided choice based on the excitation–inhibition balance in prefrontal cortex. Nature Neuroscience, 15, 960–961. Jocham, G., Hunt, L. T., Near, J., & Behrens, T. E. (2012b). A mechanism for value‐guided choice based on the excitation–inhibition balance in prefrontal cortex. Nature Neuroscience, 15, 960–961. Jocham, G., Neumann, J., Klein, T. A., Danielmeier, C., & Ullsperger, M. (2009). Adaptive coding of action values in the human rostral cingulate zone. The Journal of Neuroscience, 29, 7489–7496. Johnson, A., Varberg, Z., Benhardus, J., Maahs, A., & Schrater, P. (2012). The hippocampus and exploration: dynamically evolving behavior and neural representations. Frontiers in Human Neuroscience, 6, 216. Jones, B., & Mishkin, M. (1972). Limbic lesions and the problem of stimulus – reinforcement associations. Experimental Neurology, 36, 362–377. Jones, B. E., & Moore, R. Y. (1977). Ascending projections of the locus coeruleus in the rat. II. Autoradiographic study. Brain Research, 127, 25–53. Jones, B. F., & Witter, M. P. (2007). Cingulate cortex projections to the parahippocampal region and hippocampal formation in the rat. Hippocampus, 17, 957–976. Joshi, S., Kalwani, R. M., & Gold, J. I. (2013). The relationship between locus coeruleus neu­ ronal activity and pupil diameter. SFN Abstracts. Kable, J. W., & Glimcher, P. W. (2009). The neurobiology of decision: consensus and contro­ versy. Neuron, 63, 733–745. Kakade, S., & Dayan, P. (2002). Dopamine: generalization and bonuses. Neural Networks, 15, 549–559. Kawagoe, R., Takikawa, Y., & Hikosaka, O. (1998). Expectation of reward modulates cognitive signals in the basal ganglia. Nature Neuroscience, 1, 411–416. Kennerley, S. W., Behrens, T. E., & Wallis, J. D. (2011). Double dissociation of value compu­ tations in orbitofrontal and anterior cingulate neurons. Nature Neuroscience. Kennerley, S. W., Dahmubed, A. F., Lara, A. H., & Wallis, J. D. (2009a). Neurons in the frontal lobe encode the value of multiple decision variables. 
Journal of Cognitive Neuroscience, 21, 1162–1178.

Neuroscience of Value‐Guided Choice 587 Kennerley, S. W., Dahmubed, A. F., Lara, A. H., & Wallis, J. D. (2009b). Neurons in the frontal lobe encode the value of multiple decision variables. Journal of Cognitive Neuroscience, 21, 1162–1178. Kennerley, S. W., Walton, M. E., Behrens, T. E., Buckley, M. J., & Rushworth, M. F. (2006). Optimal decision making and the anterior cingulate cortex. Nature Neuroscience, 9, 940–947. Klein‐Flugge, M. C., & Bestmann, S. (2012). Time‐dependent changes in human corticospinal excitability reveal value‐based competition for action during decision processing. The Journal of Neuroscience, 32, 8373–8382. Koechlin, E., & Hyafil, A. (2007). Anterior prefrontal function and the limits of human decision‐making. Science, 318, 594–598. Kolling, N., Behrens, T. E., Mars, R. B., & Rushworth, M. F. (2012). Neural mechanisms of foraging. Science, 336, 95–98. Kolling, N., Wittmann, M., & Rushworth, M. F. (2014). Multiple neural mechanisms of decision making and their competition under changing risk pressure. Neuron, 81, 1190–1202. Krajbich, I., Armel, C., & Rangel, A. (2010). Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience, 13, 1292–1298. Kringelbach, M. L., O’Doherty, J., Rolls, E. T., & Andrews, C. (2003). Activation of the human orbitofrontal cortex to a liquid food stimulus is correlated with its subjective pleas­ antness. Cerebral Cortex, 13, 1064–1071. Kunishio, K., & Haber, S. N. (1994). Primate cingulostriatal projection: limbic striatal versus sensorimotor striatal input. The Journal of Comparative Neurology, 350, 337–356. Lau, B., & Glimcher, P. W. (2008). Value representations in the primate striatum during match­ ing behavior. Neuron, 58, 451–463. Leathers, M. L., & Olson, C. R. (2012). In monkeys making value‐based decisions, LIP neu­ rons encode cue salience and not action value. Science, 338, 132–135. Lebreton, M., Jorge, S., Michel, V., Thirion, B., & Pessiglione, M. (2009). An automatic val­ uation system in the human brain: evidence from functional neuroimaging. Neuron, 64, 431–439. Lim, S. L., Colas, J. T., O’Doherty, J. O., & Rangel, A. (2013). Primary motor cortex encodes relative action value signals that integrate stimulus value and effort cost at the time of cost. SFN Abstracts. Lim, S. L., O’Doherty, J. P., & Rangel, A. (2011). The decision value computations in the vmPFC and striatum use a relative value code that is guided by visual attention. The Journal of Neuroscience, 31, 13214–13223. Lindvall, O., Bjorklund, A., & Divac, I. (1978). Organization of catecholamine neurons ­projecting to the frontal cortex in the rat. Brain Research, 142, 1–24. Lo, C. C., & Wang, X. J. (2006). Cortico‐basal ganglia circuit mechanism for a decision threshold in reaction time tasks. Nature Neuroscience, 9, 956–963. Luk, C. H., & Wallis, J. D. (2013). Choice coding in frontal cortex during stimulus‐guided or action‐guided decision‐making. The Journal of Neuroscience, 33, 1864–1871. Matsumoto, K., Suzuki, W., & Tanaka, K. (2003). Neuronal correlates of goal‐based motor selection in the prefrontal cortex. Science, 301, 229–232. McCoy, A. N., Crowley, J. C., Haghighian, G., Dean, H. L., & Platt, M. L. (2003). Saccade reward signals in posterior cingulate cortex. Neuron, 40, 1031–1040. McCoy, A. N., & Platt, M. L. (2005). Risk‐sensitive neurons in macaque posterior cingulate cortex. Nature Neuroscience, 8, 1220–1227. McGinty, V. B., Lardeux, S., Taha, S. A., Kim, J. J., & Nicola, S. M. (2013). 
Invigoration of reward seeking by cue and proximity encoding in the nucleus accumbens. Neuron, 78, 910–922.

588 Gerhard Jocham, Erie Boorman, and Tim Behrens Milosavljevic, M., Navalpakkam, V., Koch, C., & Rangel, A. (2012). Relative visual saliency dif­ ferences induce sizable bias in consumer choice. Journal of Consumer Psychology, 22, 67–74. Mishkin, M. (1964). Perseveration of central sets after frontal lesions in monkeys. In J. M. Warren & K. Akert (Eds.), The frontal granular cortex and behavior (pp. 219–241). New York, NY: McGraw‐Hill. Monosov, I. E., & Hikosaka, O. (2012). Regionally distinct processing of rewards and punish­ ments by the primate ventromedial prefrontal cortex. The Journal of Neuroscience, 32, 10318–10330. Nassar, M. R., Rumsey, K. M., Wilson, R. C., Parikh, K., Heasly, B., & Gold, J. I. (2012). Rational regulation of learning dynamics by pupil‐linked arousal systems. Nature Neuroscience, 15, 1040–1046. Neubert, F. X., Mars, R. B., Thomas, A. G., Sallet, J., & Rushworth, M. F. (2014). Comparison of human ventral frontal cortex areas for cognitive control and language with areas in monkey frontal cortex. Neuron, 81, 700–713. Niv, Y., Daw, N. D., Joel, D., & Dayan, P. (2007). Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology (Berlin), 191, 507–520. Noonan, M. P., Walton, M. E., Behrens, T. E., Sallet, J., Buckley, M. J., & Rushworth, M. F. (2010). Separate value comparison and learning mechanisms in macaque medial and l­ateral orbitofrontal cortex. Proceedings of the National Academy of Sciences of the United States of America, 107, 20547–20552. O’Doherty, J. P. (2014). The problem with value. Neuroscience and Biobehavioral Reviews, 43, 259–268 O’Doherty, J., Rolls, E. T., Francis, S., Bowtell, R., McGlone, F., Kobal, G., … Ahne, G. (2000). Sensory‐specific satiety‐related olfactory activation of the human orbitofrontal cortex. Neuroreport, 11, 893–897. O’Reilly, J. X., Schuffelgen, U., Cuell, S. F., Behrens, T. E., Mars, R. B., & Rushworth, M. F. (2013). Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proceedings of the National Academy of Sciences of the United States of America, 110, E3660–E3669. Öngür, D., & Price, J. L. (2000). The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cerebral Cortex, 10, 206–219. Padoa‐Schioppa, C. (2009). Range‐adapting representation of economic value in the orbito­ frontal cortex. The Journal of Neuroscience, 29, 14004–14014. Padoa‐Schioppa, C., & Assad, J. A. (2006). Neurons in the orbitofrontal cortex encode economic value. Nature, 441, 223–226. Padoa‐Schioppa, C., & Assad, J. A. (2008). The representation of economic value in the orbi­ tofrontal cortex is invariant for changes of menu. Nature Neuroscience, 11, 95–102. Pastor‐Bernier, A., & Cisek, P. (2011). Neural correlates of biased competition in premotor cortex. The Journal of Neuroscience, 31, 7083–7088. Pearson, J. M., Heilbronner, S. R., Barack, D. L., Hayden, B. Y., & Platt, M. L. (2011). Posterior cingulate cortex: adapting behavior to a changing world. Trends Cogn Sci, 15, 143–151. Petrides, M., & Pandya, D. N. (1999). Dorsolateral prefrontal cortex: comparative cytoarchi­ tectonic analysis in the human and the macaque brain and corticocortical connection ­patterns. The European Journal of Neuroscience, 11, 1011–1036. Pickens, C. L., Saddoris, M. P., Gallagher, M., & Holland, P. C. (2005). Orbitofrontal lesions impair use of cue–outcome associations in a devaluation task. Behavioral Neuroscience, 119, 317–322. Pickens, C. L., Saddoris, M. 
P., Setlow, B., Gallagher, M., Holland, P. C., & Schoenbaum, G. (2003). Different roles for orbitofrontal cortex and basolateral amygdala in a reinforcer devaluation task. The Journal of Neuroscience, 23, 11078–11084.

Neuroscience of Value‐Guided Choice 589 Plassmann, H., O’Doherty, J., & Rangel, A. (2007). Orbitofrontal cortex encodes willingness to pay in everyday economic transactions. The Journal of Neuroscience, 27, 9984–9988. Platt, M. L., & Glimcher, P. W. (1999). Neural correlates of decision variables in parietal cortex. Nature, 400, 233–238. Prevost, C., Pessiglione, M., Metereau, E., Clery‐Melin, M. L., & Dreher, J. C. (2010). Separate valuation subsystems for delay and effort decision costs. The Journal of Neuroscience, 30, 14080–14090. Procyk, E., Tanaka, Y. L., & Joseph, J. P. (2000). Anterior cingulate activity during routine and non‐routine sequential behaviors in macaques. Nature Neuroscience, 3, 502–508. Rangel, A., Camerer, C., & Montague, P. R. (2008). A framework for studying the neurobi­ ology of value‐based decision making. Nature Reviews Neuroscience, 9, 545–556. Remondes, M., & Wilson, M. A. (2013). Cingulate‐hippocampus coherence and trajectory coding in a sequential choice task. Neuron, 80, 1277–1289. Rich, E. L., & Wallis, J. D. (2014). Medial‐lateral organization of the orbitofrontal cortex. Journal of cognitive neuroscience, 26, 1347–1362. Roesch, M. R., & Olson, C. R. (2003). Impact of expected reward on neuronal activity in pre­ frontal cortex, frontal and supplementary eye fields and premotor cortex. Journal of Neurophysiology, 90, 1766–1789. Roesch, M. R., & Olson, C. R. (2004). Neuronal activity related to reward value and motivation in primate frontal cortex. Science, 304, 307–310. Rolls, E. T., & Baylis, L. L. (1994). Gustatory, olfactory, and visual convergence within the primate orbitofrontal cortex. The Journal of Neuroscience, 14, 5437–5452. Rolls, E. T., Sienkiewicz, Z. J., & Yaxley, S. (1989). Hunger modulates the responses to gusta­ tory stimuli of single neurons in the caudolateral orbitofrontal cortex of the macaque monkey. The European Journal of Neuroscience, 1, 53–60. Rudebeck, P. H., Behrens, T. E., Kennerley, S. W., Baxter, M. G., Buckley, M. J., Walton, M. E., & Rushworth, M. F. (2008). Frontal cortex subregions play distinct roles in choices between actions and stimuli. The Journal of Neuroscience, 28, 13775–13785. Rudebeck, P. H., Walton, M. E., Smyth, A. N., Bannerman, D. M., & Rushworth, M. F. (2006). Separate neural pathways process different decision costs. Nature Neuroscience, 9, 1161–1168. Rushworth, M. F., & Behrens, T. E. (2008). Choice, uncertainty and value in prefrontal and cingulate cortex. Nature Neuroscience, 11, 389–397. Rushworth, M. F., Kolling, N., Sallet, J., & Mars, R. B. (2012). Valuation and decision‐making in frontal cortex: one or many serial or parallel systems? Current Opinion in Neurobiology, 22, 946–955. Rushworth, M. F., Noonan, M. P., Boorman, E. D., Walton, M. E., & Behrens, T. E. (2011). Frontal cortex and reward‐guided learning and decision‐making. Neuron, 70, 1054–1069. Salamone, J. D., Correa, M., Farrar, A., & Mingote, S. M. (2007). Effort‐related functions of nucleus accumbens dopamine and associated forebrain circuits. Psychopharmacology (Berlin), 191, 461–482. Samejima, K., Ueda, Y., Doya, K., & Kimura, M. (2005). Representation of action‐specific reward values in the striatum. Science, 310, 1337–1340. Schoenbaum, G., Chiba, A. A., & Gallagher, M. (1999). Neural encoding in orbitofrontal cortex and basolateral amygdala during olfactory discrimination learning. The Journal of Neuroscience, 19, 1876–1884. Schwartenbeck, P., Fitzgerald, T., Dolan, R. J., & Friston, K. (2013). 
Exploration, novelty, surprise, and free energy minimization. Frontiers in Psychology, 4, 710. Seo, H., Barraclough, D. J., & Lee, D. (2007). Dynamic signals related to choices and out­ comes in the dorsolateral prefrontal cortex. Cerebral Cortex, 17, i110–i117.

590 Gerhard Jocham, Erie Boorman, and Tim Behrens Seo, H., Barraclough, D. J., & Lee, D. (2009). Lateral intraparietal cortex and reinforcement learning during a mixed‐strategy game. The Journal of Neuroscience, 29, 7278–7289. Seo, H., & Lee, D. (2007). Temporal filtering of reward signals in the dorsal anterior cingulate cortex during a mixed‐strategy game. The Journal of Neuroscience, 27, 8366–8377. Serences, J. T. (2008). Value‐based modulations in human visual cortex. Neuron, 60, 1169–1181. Sescousse, G., Redoute, J., & Dreher, J. C. (2010). The architecture of reward value coding in the human orbitofrontal cortex. The Journal of Neuroscience, 30, 13095–13104. Shadlen, M. N., & Newsome, W. T. (1996). Motion perception: seeing and deciding. Proceedings of the National Academy of Sciences of the United States of America, 93, 628–633. Shidara, M., Aigner, T. G., & Richmond, B. J. (1998). Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials. The Journal of Neuroscience, 18, 2613–2625. Shima, K., & Tanji, J. (1998). Role for cingulate motor area cells in voluntary movement selec­ tion based on reward. Science, 282, 1335–1338. Shuler, M. G., & Bear, M. F. (2006). Reward timing in the primary visual cortex. Science, 311, 1606–1609. Smith, E. A., & Winterhalder, B. (1992). Evolutionary ecology and human behavior. New York, NY: de Gruyer. Smith, P. L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27, 161–168. Stephens, D. W., & Krebs, J. R. (1986). Foraging theory. Princeton, NJ: Princeton University Press. Strait, C. E., Blanchard, T. C., & Hayden, B. Y. (2014). Reward value comparison via mutual inhibition in ventromedial prefrontal cortex. Neuron. Sugrue, L. P., Corrado, G. S., & Newsome, W. T. (2004). Matching behavior and the repre­ sentation of value in the parietal cortex. Science, 304, 1782–1787. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: an introduction. Cambridge, MA: MIT Press. Thorpe, S. J., Rolls, E. T., & Maddison, S. (1983). The orbitofrontal cortex: neuronal activity in the behaving monkey. Experimental Brain Research, 49, 93–115. Towal, R. B., Mormann, M., & Koch, C. (2013). Simultaneous modeling of visual saliency and value computation improves predictions of economic choice. Proceedings of the National Academy of Sciences of the United States of America, 110, E3858–E3867. Tremblay, L., & Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398, 704–708. Tsuchida, A., Doll, B. B., & Fellows, L. K. (2010). Beyond reversal: a critical role for human orbitofrontal cortex in flexible learning from probabilistic feedback. The Journal of Neuroscience, 30, 16868–16875. Vasconcelos, M., Monteiro, T., Aw, J., & Kacelnik, A. (2010). Choice in multi‐alternative envi­ ronments: a trial‐by‐trial implementation of the sequential choice model. Behav Processes, 84, 435–439. Voss, J. L., Gonsalves, B. D., Federmeier, K. D., Tranel, D., & Cohen, N. J. (2011). Hippocampal brain‐network coordination during volitional exploratory behavior enhances learning. Nature Neuroscience, 14, 115–120. Voss, J. L., Warren, D. E., Gonsalves, B. D., Federmeier, K. D., Tranel, D., & Cohen, N. J. (2011). Spontaneous revisitation during visual exploration as a link among strategic behavior, learning, and the hippocampus. Proceedings of the National Academy of Sciences of the United States of America, 108, E402–E409. Wallis, J. D., & Miller, E. K. (2003). 
Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task. The European Journal of Neuroscience, 18, 2069–2081.

Neuroscience of Value‐Guided Choice 591 Walton, M. E., Bannerman, D. M., Alterescu, K., & Rushworth, M. F. (2003). Functional s­pecialization within medial frontal cortex of the anterior cingulate for evaluating effort‐ related decisions. The Journal of Neuroscience, 23, 6475–6479. Walton, M. E., Bannerman, D. M., & Rushworth, M. F. (2002). The role of rat medial frontal  cortex in effort‐based decision making. The Journal of Neuroscience, 22, 10996–11003. Walton, M. E., Behrens, T. E., Buckley, M. J., Rudebeck, P. H., & Rushworth, M. F. (2010). Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron, 65, 927–939. Walton, M. E., Devlin, J. T., & Rushworth, M. F. (2004). Interactions between decision mak­ ing and performance monitoring within prefrontal cortex. Nature Neuroscience, 7, 1259–1265. Wang, A. Y., Miura, K., & Uchida, N. (2013). The dorsomedial striatum encodes net expected return, critical for energizing performance vigor. Nature Neuroscience, 16, 639–647. Wang, X. J. (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron, 36, 955–968. Watanabe, M. (1996). Reward expectancy in primate prefrontal neurons. Nature, 382, 629–632. Wilson, R. C., & Niv, Y. (2011). Inferring relevance in a changing world. Frontiers in Human Neuroscience, 5, 189. Wong, K. F., & Wang, X. J. (2006). A recurrent network mechanism of time integration in perceptual decisions. The Journal of Neuroscience, 26, 1314–1328. Wunderlich, K., Rangel, A., & O’Doherty, J. P. (2009). Neural computations underlying action‐based decision making in the human brain. Proceedings of the National Academy of Sciences of the United States of America, 106, 17199–17204. Wunderlich, K., Rangel, A., & O’Doherty, J. P. (2010). Economic choices can be made using only stimulus values. Proceedings of the National Academy of Sciences of the United States of America, 107, 15005–15010. Yu, A. J., & Dayan, P. (2005). Uncertainty, neuromodulation, and attention. Neuron, 46, 681–692.

Index acquisition processes  10, 11, 12, 15, 16, allergy, food (experiment)  116–17, 69, 76 119, 394 action(s)  411–28, 489 alpha suppression  527 choice between  417–18 Alzheimer’s disease (AD)  271 definition and distinction from response 435 mouse model  155, 275 goal‐directed see goal‐directed processes ambiguity 104–5 habits as sequences of  422–3 habitual see habit cue  292, 293 object‐directed, mirror neurons and  feature  255, 257, 267–9, 270 523, 529 amnesia, medial temporal lobe  272, 274 selection and execution  504–5 AMPA receptors value of  555 recognition memory and subunit GluA1 action–outcome (A–O)  412–18 of 189–92 habit formation and  412–18 value‐guided choice and  563 value‐guided choice and  559 amygdala  98–9, 459–61 basolateral action understanding  529, 530, 531 active avoidance  442, 443, 458, 460, 460 child and adolescent anxiety and  adaptation 473, 477 behavioral 569–83 habit formation and  431, 432 in mirror neuron origin  519–22, 532 prediction error and  104 reduction in attention and  101 distinction from associative accounts  reward and  100–1 522–31 central/central nucleus (CeA)  98–9 avoidance and  459, 460 in perceptual learning  206–7, 209, 212 habit formation and  427–8 addiction, drug  129–30, 287 prediction error and  104 adolescents, anxiety  468–88 reward and  98, 99, 100, 101 adults contextual conditioning and  295, 305 fear conditioning  32–3, 291–2, 295 mirror neurons  520, 522–3, 527, 528 fear learning/conditioning and  473, 474, perceptual learning  226, 228, 229, 232, 475, 477, 478 233, 234, 239 The Wiley Handbook on the Cognitive Neuroscience of Learning, First Edition. Edited by Robin A. Murphy and Robert C. Honey. © 2016 John Wiley & Sons, Ltd. Published 2016 by John Wiley & Sons, Ltd.

Index 593 contextual conditioning and  32–3, avoiding the experience of see avoidance 291–2, 295 flavor/taste  14–15, 71, 72–3, 74, 77, 80, lateral  27, 32, 33 229, 315, 413–14, 417, 419, 427, animal learning 455–6, 492 inhibition and  492, 495, 496–7, 499–500, contemporary 70–2 502, 504, 505, 506, 507 historical studies  69–70 timing and aversive unconditioned anxiety, child and adolescent  468–88 stimulus  357, 359, 362 see also fear avoidance 442–67 Aplysia acquisition 450–1 epigenetics  144, 147, 160, 164, 165 active  442, 443, 458, 460, 460 Hebbian hypothesis  27 associative theories  444–50 appetitive conditioning (incl. appetitive conditions necessary for  450–3 maintenance 451–2 instrumental conditioning)  5–7, mechanisms 458–61 77, 287, 427, 442, 443, 447, 455, passive  442, 443 457, 460 Babbage, Charles  57 inhibitory processes and  495, 496, 497, backward conditioned inhibition  493 500, 505, 506 basal ganglia and response inhibition  501 timing  360–1, 362, 363, 364, 365, 371 beacons in mazes  320, 325, 327, arenas and navigations  317–18, 321–2, 323, 330–1, 332 324, 328, 332, 333, 334, 337 behavioral adaptation  569–83 arousal, general enhancement of  288 behavioral inhibition  497 associative learning  1–176 top‐down 500–1 central problem of standard models  78–9 belongingness (stimulus–reinforcer conditions for  49–57 relevance)  21, 23 theoretical elaboration  78–80 blocked exposure in perceptual learning  205, associative representations  177–407 209, 213–17 associative stream in perceptual learning  blocking effect  19, 24, 31, 32, 33, 201, 202 35, 94–6 attention 86–135 causation and  381–2 brain regions involved see brain lexical development and  549 derived, humans  114–35 spatial learning  331, 332, 333, 337 joint, and lexical development  545 BNDF (brain‐derived neurotrophic factor) perception and  131, 225–9 gene  145, 148, 157, 159, 161, 162, reductions in  100–1 163, 165, 166 reward learning see reward brain (incl. neural mechanisms/correlates) set see set attentional processes  212 spatial learning and changes in  330–1 attentional set  90–6 value‐guided choice and  568 reduction in attention  101–2 weighting 225–9 reward 98–101 auditory stimuli and their processing avoidance behavior and  458–61 lexical development  541, 544, 545 epigenetics and areas involved in  157–65 perceptual learning  226, 230 fear learning and  473–4 auto‐associator habitual behavior and  423–8 causal learning and  383, 387, 390, 393 imaging see neuroimaging lexical development and  540–1, 543 inhibition and  498–9 autocatalytic loops  142 learning and conditioning (in general)  auto‐encoder and lexical development  543 26–33 automatic imitation  519, 529 mediated learning and  74–7 automatization  502, 503, 505, 506 autoshaping, pigeon  16–17, 20, 23, 25 aversion

594 Index brain (incl. neural mechanisms/correlates) perceptual learning  226, 228 (cont’d ) see also infancy; neonates choice and decision‐making  554–91 mirror neuron populations  519, 531 between two actions  417–18 outcome expectations and  393–5 information processing and  351–2 plasticity see plasticity mechanisms 562–7 prediction error and  33, 57–65, 102–3, risky 104–6 value‐guided 554–91 393–5 cholinergic systems timing mechanisms and  356–62 reduction in attention and  101 value‐guided choice and  554, 557, 558, temporal learning and  367 chromatin markings  139–41 561, 562, 563, 566, 568, 582, 591 storage of memory and  152 see also specific parts cingulate cortex, anterior (ACC), value‐ brain‐derived neurotrophic factor (BDNF) guided choice and  558, 559, 567, gene  145, 148, 157, 159, 161, 162, 570, 573–4, 575, 576, 577, 578 163, 165, 166 cognitive expectancy  448–90 cognitive expectancy see expectations CA1 of hippocampus and temporal cognitive impairment, mild (MCI)  cognition  358, 359, 360, 361 276, 277 cognitive inhibition  489, 497 CA3 of hippocampus and temporal cognitive maps 318–25, 326, 327, 329, cognition  358, 361, 362 330, 331, 332, 336, 337, 338 color (walls of arena) and spatial Caenorhabditis elegans  160, 164, 165, 168 learning  318, 333 calcineurin  145, 162 comparison/comparator processes  25–6, 34 calcium/calmodulin‐dependent protein in perceptual learning  205–7 competition (between) kinase II  142 cues see cues Cambridge Neuropsychological Test recognition memory  187–9 short‐term and long‐term Automated Battery (CANTAB) habituation 180–1 ID–ED task  90, 91 short‐term and long‐term memory  cAMP response element‐binding(CREB) 1 184, 189 and CREB2  143, 147 short‐term and long‐term spontaneous cannabinoid (CB) receptors  425, 426 novelty preference behaviour  181–2 canonical neurons  517, 522 complete serial compound temporal capture, value‐driven  122–3, 124 difference simulator  355 catechol‐O‐methyltransferase enzyme gene  compound discrimination  90, 91 399–400 computation(s) 65 categorical perception  236–9 associative 48–9 categorization (category learning)  223–35, computational models of memory  249–82 236, 237 computer simulations see simulation causation 380–407 conceptual knowledge and low‐level inferring larger causal structures from perception 239–40 individual causal links  340–3 concurrent processing of CS–US  22–3, 25, cell 26, 33, 34 epigenetic mechanisms and formation of condition(s) (for learning/Pavlovian cellular associations  150–1 learning) 35–6 heredity 139 associative learning  49–57 memory  136, 137, 139, 143 avoidance learning  450–3 cerebellum and eyeblink conditioning  29–30, 31–2 Chasmagnathus granulatus 159 checkerboards  202, 205, 213, 230, 233 children anxiety 468–88 development see development

Index 595 fear learning  468–74, 476, 477 contextual (context‐specific/dependent) spatial learning  327–36 processes/conditioning 285–312 conditioned excitation see excitatory definition of context  286 conditioning excitatory  294, 295–6, 298, 300–1, conditioned inhibition see inhibitory 301–2 conditioning; inhibitory inhibitory  294–306, 307 control long‐term habituation  184 conditioned response (CR)  349, 353, 357, mirror neurons and  526 360, 362, 363 US associations  286, 287–94, 306–7 acquisition  351, 353 contiguity timing 351 mirror neurons and  522 conditioned stimuli (CS)  8–35 spatial 17–18 contextual learning and  286 temporal  13–15, 20, 24, 26–30, intensity 10 novelty 10 348–50 prediction and  52–4 contingency (learning)  19–20 timing 348–71 conditioned stimuli–unconditioned stimuli action–outcome, manipulating  412–13 (CS–US) degradation  35, 418 avoidance learning and  450–2 instrumental  412, 422, 432, 445, concurrent processing of representations of  22–3, 25, 26, 33, 34 452–3 contingency degradation effects  35 mirror neurons and  522, 522–6, 528 convergence  27, 31 overt attention during  116 pairings  11–13, 22, 24, 27, 28 Pavlovian 340–2 number 10–12 correlation order 12–13 in lexical development  541, 549 similarity 18–21 response rate–reward rate  414–16 training trials  11–12 cortico‐striatal and cortico‐striato‐thalamic conditioning appetitive see appetitive conditioning circuitry context‐specific/dependent see contextual attentional set and  92–4 processes habit formation and  424, 426, 433 fear see fear value‐guided choice and  559 instrumental see instrumental learning CpG (cytosine–guanine) dinucleotides neural principles  26–33 Pavlovian see Pavlovian learning (CpG)  139, 140, 161, 162 serial 97–8 CREB1 and CREB2 (cAMP response timing and see temporal characteristics trace see trace conditioning element‐binding 1 and 2)  143, 147 conflict (competition between response cue(s) options) 499–500 congruency tasks  497, 499 ambiguity  292, 293 connectivity (neural network)  138 blocking  19, 24, 31, 32, 33, 35, 94–6, habit formation and  424, 425 lexical development and  541 381, 382 memory and  138, 151–2, 541 competition between  19 contents of learning  35–6 avoidance and  454 spatial learning and  331–6 context chamber and context testing room temporal difference and  354 conditioning 289–90 contextual  286, 292–3, 298–302 priority 52–3 relative cue validity  20, 23, 24, 25, 33 unblocking  32, 33, 62, 99, 100–1 cue–outcome (A–O) associations and causal learning 381–90 “cycle” to “trial” (C/T) ratio  16–17, 20, 23, 24, 25, 28 cytosine–guanine (CpG) dinucleotides  139, 140, 161, 162

