is to offer computational accountability and epistemic sufficiency for the constructivist rapprochement and epistemic conjectures proposed in Chap. 7.

8.1 The Derivation of a Bayesian Stance

Probabilistic modeling tools have supported significant components of AI research since the 1950s. Researchers at Bell Labs built a speech system that could recognize any of the ten digits spoken by a single speaker with accuracy in the high 90% range (Davis et al. 1952). Shortly after this, Bledsoe and Browning (1959) built a probabilistic letter recognition system that used a large dictionary to serve as the corpus for recognizing hand-written characters, given the likelihood of character sequences and particular characters. Later research addressed authorship attribution by looking at the word patterns in anonymous literature and comparing these to similar patterns of known authors (Mosteller and Wallace 1963).

By the early 1990s, much of computation-based language understanding and generation in AI was accomplished using probabilistic techniques, including parsing, part-of-speech tagging, reference resolution, and discourse processing. These techniques often used tools like greatest likelihood measures (Jurafsky and Martin 2020) that we will describe in detail. Other areas of artificial intelligence, especially machine learning, became more Bayesian-based (Russell and Norvig 2010; Luger 2009a). In many ways, these uses of stochastic technology for pattern recognition and learning were another instantiation of the constructivist tradition, as collected sets of patterns were used to condition the recognition of new patterns.

We begin by asking how an epistemologist might build a model of a constructivist worldview. Historically, an important response to David Hume's skepticism was that of the English cleric, Thomas Bayes (1763). Bayes was challenged to defend the gospel and other believers' accounts of Christ's miracles in the light of Hume's claims that such "accounts" could not attain the credibility of a "proof." Bayes' response, published posthumously in the Transactions of the Royal Society, was a mathematics-based demonstration of how an agent's prior expectations can be related to their current perceptions. Bayes' approach, although not supporting the credibility of miracles, has had an important effect on the design of probabilistic models. We develop Bayes' insight next and then conjecture, using several examples, how Bayes' theorem can support a computational model of epistemological access.

Suppose a medical doctor is examining the symptoms of a patient to determine a possible infecting organism. In this example, there is a single symptom, evidence e, and a single hypothesized infecting agent, h. The doctor wishes to determine how the perception of a patient's bad headache, for example, can indicate the presence of meningitis infection.

In Fig. 8.1, there are two sets: One set, e, contains all the people who have bad headaches, and the second set, h, contains all people that have the meningitis infection.
We want to determine the probability that a person who has a bad headache also has meningitis. We call this p(h|e), or "the probability p that a person has the disease h, given that he/she suffers headaches e, the evidence."

Fig. 8.1 A representation of the numbers of people having a symptom, e, and a disease, h. We want to measure the probability of a person having the disease given that he/she suffers the symptom, i.e., the number of people in both sets e and h, or e ∩ h. We then divide this number by the total number of people in set e

To determine p(h|e), we need to determine the number of people having both the symptom and the disease and divide this number by the total number of people having the symptom. We call this the posterior probability, or the probability that the new information the diagnostician obtains is indicative of the disease. Since each of the sets of people in the example of Fig. 8.1 may be divided by the total number of people considered, we represent each number as a probability. Thus, to represent the probability of the disease h given the symptom e as p(h|e):

p(h|e) = |h ∩ e| / |e| = p(h ∩ e) / p(e),

where "|" surrounding a symbol, e.g., |e|, indicates the number of people in that set.

Similarly, we wish to determine the prior probability, or expectations of the diagnostician, given this situation. This prior information reflects the knowledge the diagnostician has accumulated in medical training and during past diagnostic experiences. In this example, the probability that people with meningitis also have headaches, or the probability of evidence e, given the disease h, is p(e|h). As previously argued:

p(e|h) = |e ∩ h| / |h| = p(e ∩ h) / p(h).

The value of p(e ∩ h) can now be determined by multiplying by p(h):

p(e|h) p(h) = p(e ∩ h).

Finally, we can determine a measure for the probability of the hypothesized disease, h, given the evidence, e, in terms of the probability of the evidence given the hypothesized disease:

p(h|e) = p(e ∩ h) / p(e) = p(e|h) p(h) / p(e).

This last formula is Bayes' law for one piece of evidence and one disease.
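To make the set-counting derivation concrete, here is a minimal sketch in Python. The population counts are invented for illustration only; the point is simply that the set-based ratio and Bayes' law give the same number.

```python
# A toy population with invented counts: 1,000 people,
# 80 with bad headaches (e), 5 with meningitis (h), 4 with both (h and e).
total = 1000
headache = 80          # |e|
meningitis = 5         # |h|
both = 4               # |h intersect e|

# Set-based definition: p(h|e) = |h intersect e| / |e|
p_h_given_e = both / headache

# The same value through Bayes' law: p(h|e) = p(e|h) p(h) / p(e)
p_e = headache / total
p_h = meningitis / total
p_e_given_h = both / meningitis
p_h_given_e_bayes = p_e_given_h * p_h / p_e

print(p_h_given_e, p_h_given_e_bayes)   # both print 0.05
```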
Let's review what Bayes' formula accomplishes. It creates a relationship between the posterior probability of the disease given the symptom, p(h|e), and the prior knowledge of the symptom given the disease, p(e|h). In this example, the medical doctor's experience over time supplies the prior knowledge of what should be expected when a new situation—the patient with a symptom—is encountered. The probability of the new person with symptom e having the hypothesized disease h is represented in terms of the collected knowledge obtained from previous situations, where the diagnosing doctor has seen that a diseased person had a symptom, p(e|h), and how often the disease itself occurs, p(h).

Consider the more general case with the same set-theoretic argument of the probability of a person having a possible disease given two symptoms, say of having meningitis while suffering from both a bad headache and a high fever. The probability of meningitis given these two symptoms is a function of the prior knowledge of having the two symptoms at the same time as the disease and the probability of the disease itself.

We present the general form of Bayes' rule for a particular hypothesis, hi, given a set of possibly multiple pieces of evidence, E:

p(hi|E) = p(E|hi) p(hi) / p(E)

p(hi|E) is the probability a particular hypothesis, hi, is true given evidence E.
p(hi) is the probability that hi can happen.
p(E|hi) is the probability of observing evidence E when hypothesis hi is true.
p(E) is the probability of the evidence being true in the population.

When extending Bayes' rule from one to multiple pieces of evidence, the right-hand side of the rule reflects the situation where a number of pieces of evidence co-occur with each hypothesis hi. We make two assumptions on this evidence. First, for each hypothesis hi, the pieces of evidence are independent. Second, the sum, or set union, of all the individual pieces of the evidence, ei, makes up the full set of evidence, E, as seen in Fig. 8.2.

Fig. 8.2 The set of all possible evidence, E, is assumed to be partitioned by the individual pieces of evidence, ei, for each hypothesis hi

Given these assumptions about the evidence occurring with each hypothesis, it is possible to directly calculate, when required in Bayes' theorem, the probability of that evidence, given an hypothesis, p(E|hi):

p(E|hi) = p(e1, e2, …, en | hi) = p(e1|hi) × p(e2|hi) × … × p(en|hi).
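A short sketch of this independence factorization, again with invented conditional probabilities rather than values taken from any study:

```python
# Naive-Bayes factorization of p(E|h) for one hypothesis.
# The conditional probabilities below are invented for illustration.
p_given_meningitis = {"headache": 0.8, "fever": 0.7, "stiff_neck": 0.6}

def likelihood(evidence, cond_probs):
    """p(E|h) as the product of p(e_i|h) over the observed evidence."""
    result = 1.0
    for e in evidence:
        result *= cond_probs[e]
    return result

print(likelihood(["headache", "fever"], p_given_meningitis))  # 0.8 * 0.7 = 0.56
```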
Under the independence assumption, the denominator of Bayes', p(E), is then:

p(E) = Σi p(E|hi) × p(hi).

In most realistic diagnostic situations, the assumed independence of the evidence, given an hypothesis, is not supported. Some pieces of evidence are often related to others, such as the presence of individual words in a sentence. This violates the independence assumption for calculating p(E|hi). Making this independence of evidence assumption, even when it is not justified, is called naive Bayes.

The general form of Bayes' theorem offers a computational model for the probability of a particular situation happening, given a set of evidence clues. The right-hand side of Bayes' equation represents a schema describing how prior accumulated knowledge of phenomena can relate to the interpretation of a new situation, described by the left-hand side of the equation. This theorem itself can be seen as an example of Piaget's assimilation, where new information is interpreted using the patterns created from prior experiences.

There are limitations to proposing Bayes' theorem, as just presented, as an epistemic description of interpreting new data in the context of prior knowledge. First, the fact is that the epistemic subject is not a calculating machine. We simply do not have all the prior numerical values for calculating the hypotheses/evidence relations. In a complex situation such as medicine, where there can be hundreds of hypothesized diseases and thousands of symptoms, this calculation is impossible. One response to the "requirement of extensive mathematics" criticism is that a Hebbian-like conditioning (Hebb 1949) occurs across time and expectations to the point where new posterior information triggers an already constituted expectation-based interpretation. This would be particularly true for the trained expert, such as a doctor, working within her own area of expertise. Hume's (1748/1975) suggestion that associations are built up over time from accumulated perceptual experience also describes this interpretation.

Further, in many applications, the probability of the occurrence of evidence across all hypotheses, p(E), the right-hand-side denominator of Bayes' equation, is simply a normalizing factor, supporting the calculation of a probability measure in the range of 0 to 1. The same normalizing factor is utilized in determining the actual probability of each of the hi, given the evidence, and thus it can be ignored. When the denominator is simply ignored, the result is described as a determination of the most likely explanation, or greatest likelihood measure, for any hypothesis hi, given the accumulation of evidence. For example, if we wish to determine which of all the hi has the most support at any particular time, we can consider the largest p(E|hi) p(hi) and call this the argmax(hi) for the hypothesis hi. As just noted, this number is not a probability:

argmax(hi) of p(E|hi) p(hi), for each hi.
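The following sketch puts these pieces together: hypotheses are ranked by the unnormalized score p(E|hi) p(hi), and the scores are divided by their sum only if true probabilities are needed. All priors and conditional probabilities are invented for illustration.

```python
# Invented priors and per-symptom conditionals for three hypotheses.
priors = {"meningitis": 0.001, "flu": 0.05, "migraine": 0.02}
cond = {
    "meningitis": {"headache": 0.9, "fever": 0.8},
    "flu":        {"headache": 0.6, "fever": 0.9},
    "migraine":   {"headache": 0.95, "fever": 0.1},
}

evidence = ["headache", "fever"]

# Unnormalized scores p(E|h) p(h), under the naive independence assumption.
scores = {}
for h, p_h in priors.items():
    p_E_given_h = 1.0
    for e in evidence:
        p_E_given_h *= cond[h][e]
    scores[h] = p_E_given_h * p_h

best = max(scores, key=scores.get)        # the argmax hypothesis

# Normalizing by the sum of the scores recovers probabilities when needed.
total = sum(scores.values())
posteriors = {h: s / total for h, s in scores.items()}

print(best, scores[best], posteriors[best])
```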
In a dynamic interpretation, as pieces of evidence change across time, we will call this argmax of hypotheses, given a set of evidence at a particular time, the greatest likelihood of that hypothesis at that time. We show this relationship, an extension of the Bayesian maximum a posteriori (MAP) estimate, as a dynamic measure at time t:

gl(hi|Et) = argmax(hi) of p(Et|hi) p(hi), for each hi.

When there are multiple pieces of evidence at time t, that is, Et = e1t, e2t, …, ent, the naive Bayes independence assumption means that:

p(Et|hi) = p(e1t, e2t, …, ent | hi) = p(e1t|hi) × p(e2t|hi) × … × p(ent|hi).

This model is both intuitive and simple: the most likely interpretation of hi, given evidence E, at time t, is a function of which interpretation is most likely to produce that evidence at the time t and the probability of that interpretation itself occurring.

We can now ask how the argmax specification can produce a computational epistemic model of phenomena. First, we see that the argmax relationship offers a falsifiable approach to explanation. If more data turn up at a particular time, an alternative hypothesis can attain a higher argmax value. Furthermore, when some data suggest an hypothesis, hi, it is usually only a subset of the full set of data that can support that hypothesis. Going back to our medical hypothesis, a bad headache can be suggestive of meningitis, but there is much more evidence gathered over time that is even more suggestive of this hypothesis, for example, fever, nausea, and the results of certain blood tests. As other data become available, they might also decrease the likelihood of a diagnosis of meningitis.

We view the evolving greatest likelihood relationship as a continuing tension between a set of possible hypotheses and the accumulating evidence collected across time. The presence of changing data supports the continuing revision of the greatest likelihood hypothesis, and, because data sets are not always complete, the possibility of a particular hypothesis motivates the search for data that can either support or falsify it. Thus, the greatest likelihood measure represents a dynamic equilibrium, evolving across time, of hypotheses suggesting supporting data as well as the presence of data supporting particular hypotheses. Piaget (1983) refers to this perception/response processing as finding equilibration.

When a data/hypothesis relationship is not "sufficiently strong" over any time period ti, and/or no particular data/hypothesis relationship seems to dominate, as measured by the values of argmax(hi) of p(Et|hi) p(hi), the search for better explanations, based on revised models, becomes important. Two approaches often help in this task. The first is to search for new relationships among the already known data/hypothesis relationships—perhaps some important component of the possibility space has been overlooked. For example, when at extreme accelerations both the energy required to accelerate a particle and the particle's mass increase, Newton's laws needed to be revised. A second approach to model revision is to actively intervene in the data/hypothesis relationships of the model.
"Is there a new diagnosis if the patient's headache goes away?" "What has changed if a fever spikes?" "How should a parent respond when their 2-year-old throws food on the floor?" How evidence changes over time can suggest new hypotheses. These two techniques support model induction, the creation of new models to explain data. Model induction is an important component of current research in machine learning (Tian and Pearl 2001; Sakhanenko et al. 2008; Rammohan 2010). This type of agent-based model revision demonstrates Piaget's notion of accommodation and will be discussed further in Sect. 9.3.

In Sect. 8.2, we present Bayesian belief networks and show how these can, in a form called dynamic Bayesian networks, model changes in hypotheses, given the perception of new data across time. In Sect. 8.4, we discuss model revision and the use of counterfactual, or what if, reasoning as an epistemic tool for understanding complex tasks in a changing world.

8.2 Bayesian Belief Networks, Perception, and Diagnosis

Data collection is often a limiting factor for using full Bayesian inference for diagnoses in complex environments. To use Bayes' theorem to calculate probabilities in medicine, for example, where there can be hundreds of possible diagnoses and thousands of possible symptoms, the data collection problem becomes impossible (Luger 2009a, p. 185). As a result, for complex environments, the ability to perform full Bayesian inference can lose plausibility as an epistemic model.

The Bayesian belief network or BBN (Pearl 1988, 2000) is a graph whose links are conditional probabilities. The graph is acyclic, in that there are no link sequences from a node back to itself. It is also directed, in that links are conditioned probabilities that are intended to represent causal relationships between the nodes. With these assumptions, it can be shown that a BBN's nodes are independent of all their non-descendants, nodes that they are not directly or indirectly linked to, given knowledge of their parents, that is, the nodes linking to them.

Judea Pearl proposed the use of Bayesian belief networks, making the assumption that their links reflected causal relationships. With the demonstrated independence of states from their non-descendants, given knowledge of their parents, the use of Bayesian technology came to entirely new importance. First, the assumption of these networks being directed graphs that disallowed cycles was an improvement to the computational costs of reasoning with traditional Bayes (Luger 2009a, Sect. 9.3). More importantly, the independence assumption that splits, or factors, the reasoning space into independent components makes the BBN a transparent representational model and captures causal relationships in a computationally useful format. We demonstrate, in our next examples, how the BBN supports both transparency and efficient reasoning.

We next illustrate the BBN diagnosing failures in discrete component semiconductors (Stern and Luger 1997; Chakrabarti et al. 2005).
The semiconductor failure model determines the greatest likelihood for hypotheses, given sets of data. Consider the situation of Fig. 8.3 showing two different types of semiconductor failure.

Fig. 8.3 Two examples of discrete component semiconductors, each exhibiting the "open" failure (Luger et al. 2002; Chakrabarti et al. 2005)

The examples of Fig. 8.3 show a failure type called open, or the break in a wire connecting components to other parts of the system. For the diagnostic expert, the perceptual aspects of a break support a number of alternative hypotheses of how the break occurred. The search for the most likely explanation for an open failure broadens the evidence search: How large is the break? Is there any discoloration related to the break? Were there any sounds or odors when it happened? What are the resulting conditions of the other components of the system?

Driven by the data search supporting multiple possible hypotheses that can explain the open, or break, the expert notes the bambooing effect in the disconnected wire in Fig. 8.3a. This suggests the greatest likelihood hypothesis that the open was created by metal crystallization and was likely caused by a sequence of low-frequency, high-current pulses. The greatest likelihood hypothesis for the wire break of Fig. 8.3b, where the wire's end is seen as balled, is melting due to excessive current. Both of these diagnostic scenarios have been implemented by an expert system-like search through an hypothesis space (Stern and Luger 1997) as well as reflected in a Bayesian belief net (Chakrabarti et al. 2005).

Figure 8.4 presents the Bayesian belief net (BBN) capturing these and other related diagnostic situations for discrete component semiconductor failures. The BBN, before new data are presented, represents the a priori state of the experts' knowledge of this application domain, including the likelihood of the failures of the individual components. These networks of causal relationships are usually carefully crafted through many hours of working with human experts' analysis of components and their known failures. As a result, the BBN captures the a priori expert knowledge implicit in a domain of interest. When new data are given to the BBN, e.g., the wire is "bambooed" or the color of the copper wire is normal, the belief network "infers" the most likely explanation for the break using its a priori model and given this new information.
Fig. 8.4 A Bayesian belief network representing the causal relationships and data points implicit in the discrete component semiconductor domain. As data are discovered and presented to the BBN, probabilistic hypotheses change. Figure adapted from (Luger et al. 2002)

There are several BBN reasoning rules available for arriving at this best explanation, including loopy belief propagation (Pearl 1988), discussed later. An important result of using the BBN technology is that as one hypothesis achieves its greatest likelihood, other related hypotheses are "explained away," i.e., their likelihood measures decrease within the BBN. We will see further demonstrations of this phenomenon in our next example.

Finally, the BBN semiconductor example supports both Conjectures 7 and 8 of Sect. 7.4. Conjecture 7 states that a priori knowledge, data known at a particular time, supports purpose-driven diagnoses. Conjecture 8 claims that world knowledge is continuously reformulated through a model revision process. We next demonstrate how, as data changes across time, different hypotheses can offer best explanations. For this, our model must change its probabilities over time as new data is encountered.

A dynamic Bayesian network, or DBN, is a sequence of identical Bayesian networks whose network nodes are linked in the directed dimension of time. This representation extends BBNs into multidimensional environments and preserves the same tractability in reasoning toward best explanations. With the factoring of the search space and the ability to address complexity issues, the dynamic Bayesian network becomes a potential model for exploring diagnostic situations across both changes of data and time. We next demonstrate the BBN and the DBN.

Figure 8.5 shows a BBN model for a typical traveling situation.
Suppose you are driving your car in a familiar area where you are aware of the likelihood of traffic slowdowns, road construction, and accidents. You are also aware that flashing lights often indicate emergency vehicles at an accident site and that orange traffic control barrels indicate construction work on the roadway. (Weighted orange barrels are often used in the United States to control traffic flow for road projects.) We will name these situations T, C, A, L, and B, as seen in Fig. 8.5. The likelihoods, for the purpose of this example, are reflected in the partial probability table of Fig. 8.5, where the top row indicates that the probability of both construction, C, and bad traffic, T, being true, t, is 0.3.

Fig. 8.5 A Bayesian belief network (BBN) for the driving example and a partial table giving sample probability values for Construction, C, and Bad Traffic, T

For full Bayesian inference, this problem would require a 32-row probability table of 5 variables, each either true or false. In the separation, or factoring, that BBN reasoning supports (Pearl 2000), this becomes a 20-row table, where Flashing Lights is independent of Construction, Orange Barrels is independent of Accident, and Construction and Accident are also independent (Luger 2009a, Sect. 9.3). We present part of this table in Fig. 8.5.

Suppose that, as you drive along and without any observable reasons, the traffic begins to slow down; now Bad Traffic, T, becomes true, t. This new fact means that, in the probability table of Fig. 8.5, Bad Traffic is no longer false: the sum of the probabilities for the first and third lines of the table goes from t = 0.4 to t = 1.0. This new higher probability is then distributed proportionately across the probabilities for Construction and Accident and, as a result, both situations become more likely.

Now suppose you drive along farther and notice Orange Barrels, B, along the road that block a lane of traffic. This means that on another probability table, not shown here, B is true, and in making its probabilities sum to 1.0, the probability of Construction, C, gets much higher, approaching 0.95. As the probability of Construction gets higher, with the absence of Flashing Lights, L, the probability of an Accident decreases. The most likely explanation for what you now experience is road Construction. The likelihood of an Accident goes down and is said to be explained away. The calculation of these higher probabilities as new data are encountered is called marginalization and, while not shown here, may be found in Luger and Chakrabarti (2008).
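The following sketch reproduces this behavior by enumeration over the factored joint distribution p(C) p(A) p(T|C,A) p(L|A) p(B|C). Apart from the qualitative story told in the text, all of the numbers below are invented; they were chosen only so that the traffic slowdown raises both Construction and Accident, and the sighting of Orange Barrels then explains Accident away.

```python
from itertools import product

# Invented probabilities for the driving network of Fig. 8.5;
# only the qualitative behavior matters here, not the exact numbers.
p_C = {True: 0.4, False: 0.6}                      # Construction
p_A = {True: 0.15, False: 0.85}                    # Accident
p_T = {(True, True): 0.95, (True, False): 0.8,     # Bad Traffic given (C, A)
       (False, True): 0.9, (False, False): 0.1}
p_L = {True: 0.9, False: 0.05}                     # Flashing Lights given A
p_B = {True: 0.9, False: 0.05}                     # Orange Barrels given C

def joint(c, a, t, l, b):
    """The factored joint: p(C) p(A) p(T|C,A) p(L|A) p(B|C)."""
    return (p_C[c] * p_A[a]
            * (p_T[(c, a)] if t else 1 - p_T[(c, a)])
            * (p_L[a] if l else 1 - p_L[a])
            * (p_B[c] if b else 1 - p_B[c]))

def query(target, **observed):
    """p(target = True | observed), by enumerating the 32-entry joint."""
    match, norm = 0.0, 0.0
    for c, a, t, l, b in product([True, False], repeat=5):
        world = {"C": c, "A": a, "T": t, "L": l, "B": b}
        if any(world[k] != v for k, v in observed.items()):
            continue
        p = joint(c, a, t, l, b)
        norm += p
        if world[target]:
            match += p
    return match / norm

print(query("C"), query("A"))                  # prior beliefs
print(query("C", T=True), query("A", T=True))  # both rise after the slowdown
print(query("C", T=True, B=True),              # Construction rises further...
      query("A", T=True, B=True))              # ...and Accident is "explained away"
```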
Figure 8.6 represents the changing dynamic Bayesian network for the driving example just described. The perceived information changes over the three time periods: driving normally, cars slowing down, and seeing orange traffic control barrels. At each new time with new information, the values reflecting the probabilities for that time and situation change. These probability changes reflect the best explanations for each new piece of information the driver perceives at each time period.

Fig. 8.6 An example of a dynamic Bayesian network where at each time period the driver perceives new information and the DBN's probabilities change, as described in the text, to reflect these changes

Finally, in Fig. 8.6, consider the state of the diagnostic expert at the point where Time = 2. Once traffic has slowed, the diagnostician may begin an active search for Orange Barrels or Flashing Lights, to try to determine, before Time = 3, what might be the most likely explanation for the traffic slowdown. In this situation, the driver's expectations motivate his/her active search for supporting information. These changing situations and their explanations support Conjecture 8 of Sect. 7.4.

In another example of using dynamic Bayesian networks, Chakrabarti et al. (2007) analyze a continuous data stream from a set of distributed sensors. These sensors reflect the running "health" of the transmission of a US Navy helicopter rotor system. These data consist of temperature, vibration, pressure, and other measurements reflecting the state of the various components of the helicopter's transmission. In the top portion of Fig. 8.7, the continuous data stream is broken into discrete and partial time slices. Chakrabarti et al. (2007) used a Fourier transform to translate these signals into the frequency domain, as shown on the left side of the second row of Fig. 8.7. These frequency readings were then compared across time periods to diagnose the health of the rotor system.

The diagnostic model used for this analysis is the auto-regressive hidden Markov model (AR-HMM) of Fig. 8.8. The internal states St of the system are made up of the sequences of the segmented signals in the frequency domain. The observable states, Ot, are the health states of the helicopter rotor system at time t. The "health" recommendations, that the transmission is safe, unsafe, or faulty, are shown in Fig. 8.7, lower right.

The hidden Markov model (HMM) is an important probabilistic technique that can be seen as a variant of the dynamic BBN. In the HMM, we attribute values to states of the network that are themselves not directly observable; in this case, we cannot directly "see" the "health state" of the transmission. There are instruments in the helicopter for directly observing temperature, vibration, oil pressure, and other data of the running system. But there is no instrument that can directly tell the pilot the health of the system. The pilot, given all the other information, must make an estimate of this health value. Producing this health information is the task of the HMM.
Fig. 8.7 Real-time data from the transmission system of a helicopter's rotor system. The top component of the figure presents the original data stream, left, and an enlarged time slice, right. The lower left figure is the result of Fourier analysis of the time slice data transformed into the frequency domain. The lower right figure represents the "hidden" states of the rotor system (Chakrabarti et al. 2005, 2007)

Fig. 8.8 The data of Fig. 8.7 are processed using an auto-regressive hidden Markov model. States Ot represent the observable values at time t; these are {safe, unsafe, faulty}. The St states capture the processed signals from the rotor system at time t (Chakrabarti et al. 2007)

In the helicopter example, the US Navy supplied data for model training purposes. Data from a normally running transmission conditioned the model. Other sets of data containing faults, such as a transmission running after metal fatigue had broken the cogs of a gear assembly, were also used to train the model. After the model was conditioned on these tagged data sets, new untagged data, where the testers did not know whether the data came from a sound or a faulty transmission, were tested. The model was asked to determine the health state of the rotor system and, if the data were from a faulty system, to determine when the unsafe state occurred. The HMM was able to successfully accomplish these tasks.

In Fig. 8.8, the processing states St of the AR-HMM capture the fused data from the transmission system and combine these to produce the most likely hypothesis, Ot, of the state of the system. This output is the observed state Ot at any time t.
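As a deliberately simplified illustration of the underlying idea, the sketch below runs the standard forward (filtering) recursion of a hidden Markov model whose hidden state is the transmission's health and whose observations are discretized vibration readings. It is not the AR-HMM of Fig. 8.8, and every state, symbol, and probability in it is invented.

```python
# A simplified hidden Markov model: the hidden state is the transmission's
# health, the observations are discretized vibration levels.
states = ["safe", "unsafe", "faulty"]

# p(state_t | state_{t-1}): health tends to persist or degrade, not recover.
transition = {
    "safe":   {"safe": 0.97, "unsafe": 0.02, "faulty": 0.01},
    "unsafe": {"safe": 0.00, "unsafe": 0.90, "faulty": 0.10},
    "faulty": {"safe": 0.00, "unsafe": 0.00, "faulty": 1.00},
}

# p(observation | state): higher vibration is more likely as health degrades.
emission = {
    "safe":   {"low": 0.80, "medium": 0.15, "high": 0.05},
    "unsafe": {"low": 0.20, "medium": 0.50, "high": 0.30},
    "faulty": {"low": 0.05, "medium": 0.25, "high": 0.70},
}

def forward_filter(observations, prior):
    """Return p(state_t | o_1..o_t) at each time step (the forward algorithm)."""
    belief, history = dict(prior), []
    for obs in observations:
        # Predict: push the current belief through the transition model.
        predicted = {s: sum(belief[r] * transition[r][s] for r in states)
                     for s in states}
        # Update: weight by the likelihood of the new observation, then normalize.
        unnorm = {s: predicted[s] * emission[s][obs] for s in states}
        z = sum(unnorm.values())
        belief = {s: unnorm[s] / z for s in states}
        history.append(belief)
    return history

prior = {"safe": 0.98, "unsafe": 0.015, "faulty": 0.005}
for step in forward_filter(["low", "medium", "high", "high"], prior):
    print(max(step, key=step.get), step)
```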
Because the HMM is auto-regressive, the value of the output state Ot is also a probabilistic function of the output state at the previous time, Ot-1.

The helicopter transmission model is not a "knowledge-based" program in the traditional sense of Chap. 4. In the knowledge-based system, specific rules relate the parameters of the model to each other, for example, "if a cog is broken in a gear then the vibration of the transmission increases" or "if the oil temperature increases then the oil circulation component has failed." There are no such rules in this model, but rather the fusion of multiple perceptual sensor readings whose changes over time are indicative of the "health" of the overall system. Thus, the program can conclude "there is a problem here" without having any idea of exactly what the problem is. This is an important example of a complex perceptual system interpreting "danger" or "time to take action" without specific knowledge of why that action is necessary.

It bears repeating that the families of Bayesian network models we have demonstrated in this section are data-driven, in the sense that the a priori knowledge of the network's designers is reflected in the model itself. When training data are presented, the combination of data and model adjusts to produce the most likely interpretation for each new piece of data. In many situations, probabilistic models can also ask for further information, given several possible explanations. Finally, when critical information is missing, perhaps from a sensor that is damaged, the network can also suggest that sensor's most likely value, given the current state of the model and the perceived data (Pless and Luger 2003). We see further examples of this in Sect. 8.4. This movement of Bayesian belief networks toward a steady state where sets of data are linked to their most likely explanations is similar to Piaget's notion of equilibration seen in Sect. 7.2.

For further details on algorithms that implement Bayesian belief networks, see Pearl (2000) and Chakrabarti et al. (2007). A Bayesian belief net interpreter, based on Pearl's loopy belief net propagation algorithm, is available at url 8.2.

8.3 Bayesian-Based Speech and Conversation Modeling

An area where the HMM technique is widely used is the computational analysis of human speech. To determine the most likely word that is spoken, given a stream of acoustic signals, the computer compares these signals to a collection, called a corpus, where signal patterns and their associated words are stored. We humans have our own version of the HMM. We interpret other people's words according to the sound patterns we hear, the gestures we see, and our current social context. Since we have no direct access to what is going on within another person's head, we must make our best interpretation of what they are intending, given the observed expressions. This is an ideal task for a hidden Markov model and is precisely what Piaget's theory of interpretation suggests is happening.
One intuition we often have is that speech recognition software systems must translate sound patterns into words for the speaker's intentions to be understood. Paul De Palma (2010) and De Palma et al. (2012) made a different assumption and produced a program that, given human speech, converted these sound waves to patterns of syllables. De Palma then determined, using the greatest likelihood measure, how syllable patterns can indicate the most likely concept intended by the speaker. In De Palma's research, there was no transduction of sounds into words but rather the program interpreted voice patterns directly to the most likely concepts of the speaker, given the patterns of sounds produced.

Earlier we saw the maximum likelihood equation used to determine which hypothesis hi was most likely, given evidence at a particular time, Et:

gl(hi|Et) = argmax(hi), for each hi, as the maximum value of p(Et|hi) p(hi).

This equation can be modified to determine which concept, coni, of a set of concepts is most likely, given a particular syllable, sylt1, or set of syllables, sylt1, sylt2, …, syltn, at time t. We assume the naive Bayes independence of evidence:

gl(coni | sylt1, sylt2, …, syltn) = argmax(coni), for each coni, as the maximum value of p(sylt1|coni) × p(sylt2|coni) × … × p(syltn|coni) p(coni).

The De Palma (2010) and De Palma et al. (2012) research used standard language software available from the SONIC group at the University of Colorado and a syllabifier from the National Institute of Standards and Technology to transduce voiced speech into a stream of syllables. Then, a corpus of acoustic data, where known syllable patterns were linked to concepts, was used to train the syllable language model. This corpus was taken from human users talking to human agents from a major airline call center. Most users were attempting to purchase tickets for air travel. A typical request might be: "I want to fly from Seattle to San Francisco" or "I need to get to Seattle." Human agents created the tagged corpus where syllable patterns were linked to concepts. For example, the syllables of "want to fly to," "travel to," "get to," and "buy a ticket to" were grouped into concepts, such as "customer" ... travel to ... "airport." Similarly, syllable patterns describing a city are clustered into an airport's name.

There are a number of interesting aspects to the De Palma (2010) and De Palma et al. (2012) research. First, the number of words in the English language is much larger than the number of syllables. Therefore, it was predicted, and it turned out to be true, that the syllable error rate was much smaller than the word error rate in the sound decoding process. Second, in the context of people working with airline service staff, it was straightforward to determine concepts, given syllable strings, e.g., "fly" has a specific intended meaning. Finally, it seems entirely reasonable that a human's sound patterns should be at least as indicative of their intentions as are their words.
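A toy version of this syllable-to-concept computation is sketched below. The syllables, concepts, priors, and conditional probabilities are invented placeholders, not values from De Palma's corpus; the sketch only shows the shape of the greatest likelihood calculation.

```python
# Invented concept priors and per-syllable conditionals.
priors = {"travel_to": 0.6, "change_reservation": 0.4}

p_syl = {   # p(syllable | concept)
    "travel_to":          {"want": 0.10, "to": 0.20, "fly": 0.15, "get": 0.08},
    "change_reservation": {"want": 0.08, "to": 0.18, "fly": 0.02, "change": 0.12},
}

def gl(syllables, concept):
    """Unnormalized score p(syl_1|con) ... p(syl_n|con) p(con), naive Bayes."""
    score = priors[concept]
    for s in syllables:
        score *= p_syl[concept].get(s, 0.01)   # small floor for unseen syllables
    return score

utterance = ["want", "to", "fly", "to"]
best = max(priors, key=lambda c: gl(utterance, c))
print(best, {c: gl(utterance, c) for c in priors})
```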
We next consider programs that are intended to communicate with humans. In the simplest case, many computer programs answer questions, for example, Amazon's Alexa, Apple's Siri, or even IBM's Watson. In more demanding situations, programs are intended to have a dialogue, i.e., a more complete and goal-directed conversation with the human user. Typical examples of this task might be when a human goes on-line to change a password or, more interestingly, to get financial, insurance, medical, or hardware troubleshooting advice. In these more complex situations, the responding program must have some notion of teleology, or the implicit end purposes of the conversation.

Chakrabarti and Luger (2015) have created such a system where probabilistic finite state machines, such as in Fig. 8.9, monitor whether the human agent's implied goal is met by the computational dialogue system. Meanwhile, at each step in the communication, a data structure, called a goal-fulfillment map, Fig. 8.10, contains the knowledge necessary to answer particular questions.

Fig. 8.9 A probabilistic finite-state automaton for conversations used in the troubleshooting domain, from Chakrabarti and Luger (2015)

This dialog management software demonstrates a method for combining knowledge of the problem domain with the pragmatic constraints of a conversation. A good conversation depends on both a goal-directed underlying process and conversation grounding in a set of facts about the knowledge supporting the task at hand.
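The sketch below shows the flavor of such a probabilistic finite-state conversation machine. Its states and transition probabilities are invented and much simpler than those of Fig. 8.9, and the goal-fulfillment knowledge lookup is omitted; the point is only that a conversation is modeled as a stochastic walk toward a goal state.

```python
import random

# Invented states and transition probabilities for a troubleshooting-style
# conversation; each list of successors sums to 1.0.
transitions = {
    "greeting":          [("gather_problem", 1.0)],
    "gather_problem":    [("propose_fix", 0.7), ("ask_clarification", 0.3)],
    "ask_clarification": [("gather_problem", 1.0)],
    "propose_fix":       [("confirm_resolved", 0.6), ("gather_problem", 0.4)],
    "confirm_resolved":  [("close", 0.8), ("propose_fix", 0.2)],
    "close":             [],
}

def sample(choices):
    """Pick a successor state according to its transition probability."""
    r, total = random.random(), 0.0
    for state, p in choices:
        total += p
        if r <= total:
            return state
    return choices[-1][0]          # guard against floating-point slop

def run_conversation(seed=0, max_turns=20):
    """Walk the machine until the goal state 'close' is reached (or we give up)."""
    random.seed(seed)
    state, path = "greeting", ["greeting"]
    while state != "close" and len(path) < max_turns:
        state = sample(transitions[state])
        path.append(state)
    return path    # the conversation's goal is met if the path ends in "close"

print(run_conversation())
```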
Fig. 8.10 A goal-fulfillment map that supports discussion of a financial transaction, from Chakrabarti and Luger (2015)

Chakrabarti's approach combines content semantics, a knowledge-based system, with pragmatic semantics, in the form of a conversation engine that generates and monitors the conversation. The knowledge engine employs specifically designed goal-fulfillment maps that encode the background knowledge needed to drive conversations. The conversation engine uses probabilistic finite state machines that model the different types of conversations. Chakrabarti and Luger (2015) used a form of the Turing test to validate the success of the resulting conversations. Grice's (1981) maxims were used as measures of the conversation quality. Transcripts of computer-generated conversations and human-to-human dialogues in the same domain were judged by humans as roughly (~86%) equivalent, with the computational dialogue significantly (p < 0.05) more focused on the task or goal of the conversation.

To summarize, the probability-based schemas that represent the knowledge of particular problem domains are driven by the goal-fulfillment state machines in an attempt to both determine and satisfy the concerns of the customer that initiated the conversation. It is important to note that the schema knowledge is about understanding and solving customer problems and, in this example, supports Conjectures 4 and 5 from Sect. 7.4.
Conversations are seen as speech acts (Searle 1969) focused on finding solutions.

The stochastic examples presented in this chapter are intended to demonstrate how the human agent can interpret perception-based patterns. Over time, experience in a world of perception-coupled intentions conditions the expectations of the human agent. Whether the conditioning model is Hebbian, Bayesian, or otherwise, the epistemic issue is that the human agent's expectations are conditioned by these experiences over time. In our examples, we have attempted to demonstrate that probabilistic models are sufficient for capturing aspects of the phenomena of human interpretation. The final section considers two probabilistic monitoring and diagnostic models in complex problem-solving applications.

8.4 Diagnostic Reasoning in Complex Environments

We next describe two programs that use combinations of different AI software tools, including knowledge rules, neural networks, and dynamic Bayes, to address complex problem situations. These examples demonstrate how constructing and training a program to "learn" about its world can then enable it to react appropriately in novel situations.

The first example of model learning comes from the domain of particle accelerator beam tuning. A particle beam accelerator is a device that is used to transport highly charged particles from a source to a target. The beamline consists of a number of devices designed either to change the beam's characteristics, its direction, size, or shape, or to monitor these characteristics.

The purpose of the particle accelerator is to steer, focus, and otherwise modify the beam of sub-atomic particles. The beam must be transported through a "pipe" to a specified location, all the while maintaining desired characteristics of strength and focus. This final beam should reach the target with a set of characteristics determined by the task of the physicists employing the beam. Figure 8.11 shows a simple accelerator beamline, which includes trim magnets for steering, quadrupole magnets for focusing, Faraday cups and stripline detectors for measuring current, and profile and popup monitors for measuring the size and position of the beam (Klein et al. 1999, 2000).

Accelerator beamlines are designed by placing these various components along the beampipe to produce specific effects. A good design will minimize the number of components necessary to maintain acceptable beam conditions while still allowing freedom of control to achieve a range of target conditions. Unfortunately, actual systems rarely work exactly as designed. Problems arise from imperfect beam production, remnant magnetic fields, poorly modeled beam behavior, misplaced or flawed control elements, and changes to the design and use of the beam facility once it has been built. Even with built-in diagnostic tools, the uncertainties of each situation can make beamline control difficult.
Fig. 8.11 A graphic representation (Klein et al. 1999) of an accelerator beam or stripline. The magnets steer and focus the beam by guiding and changing the direction of the particles. Monitors, such as the Faraday cup, measure the beam's strength and profile

Klein and his colleagues (1999, 2000) built an object-oriented control system that utilized AI tools, including connectionist networks, fuzzy reasoning, and teleo-reactive, or goal-oriented, planning (Nilsson 1994). With this approach, they were able to successfully model and control particle beam environments at Brookhaven and Argonne National Laboratories. One of their major achievements shows the power of their modeling tools. The task was to discover the location of a trim magnet at the Argonne ATLAS facility. Because of time, use, and the changing conditions of the Argonne facility itself, the precise location of the magnet was unknown.

The fact that the exact location of a multi-hundred-pound magnet could not be determined is not as impossible as it might seem. Many of these magnets are not physically accessible, buried under the facility where, with earth movements and temperature variations, they can change their location, power, and field strength over time. Klein et al.'s model refinement algorithm was able to re-establish the location and power parameters of the magnet. The model-based approach simply asked, over repeated trials on the beamline, what model or organization of components was most likely to account for the observed behavior.

What is even more interesting to consider is that Klein and his colleagues may not have found the exact location of the magnet at all! But for all the practical purposes required by the experiments involved, the imputed location was a good enough fit. This is an important issue from an epistemic viewpoint: What is really "out there," and in what sense can/do we know and use "it?"

Finally, one of Klein's colleagues, Stern (Stern and Lee 2001), extended this model refinement approach. While working at the Stanford Linear Accelerator Center, SLAC, doing what they describe as model calibration, the research team was able to improve their understanding of, and their models for, the accelerator. They did this by using the accelerator's current, assumed models to get a more precise fit of these models to the actual accelerator hardware. Better calibration of the accelerator model itself makes it conform more closely to the physicists' needs and expectations.

Our final example comes from the domain of building computational models to monitor potential problems in producing electric power using a sodium-cooled nuclear reactor.
Nuclear accidents are rare, but their effects are extremely harmful to people, the environment, and the economy. Jones et al. (2016) and Darling et al. (2018) have designed a computational monitoring system based on dynamic Bayesian networks to support the observations and knowledge of the human monitors. There are several reasons for employing the DBN technology in this challenging environment.

First, the Bayesian network is composed of nodes and links that reflect expert human knowledge and judgment in the field of reactor physics. This fact is important as the day-to-day monitors are usually not as skilled as the experts that designed the system. The probabilities of the DBN also reflect the results of multiple tests on individual components of the power system, such as sensors, as well as on simulations of the full working environment. Thus, the resulting model contains both explicit human physics and engineering knowledge as well as a probabilistic account of the reactor's running health.

Second, in the very complex environment of nuclear power generation, the DBN is able to produce faster than real-time analytic and diagnostic results. Kevin Murphy (2002) has described the transparent and tractable reasoning powers of DBN-based technology. As a result, the human monitor is able to understand events as soon as, and often before, they actually happen. The monitor also receives from the model itself recommendations for remediating potential problems.

Figure 8.12 presents a schematic for a sodium-cooled nuclear reactor that produces electric power. The reactor system, and its model, have multiple sensors monitoring the states of the pumps, the temperatures of the various vessels, the positions of the control rods, and the turbine speeds. The ten monitors of the state of the power generation system are represented by the rectangular boxes of Fig. 8.13. The circles of Fig. 8.13 represent the nodes of the DBN, and the cylinders extending off the circles represent the values of each circle changing over time.

Training the dynamic Bayesian network takes place as the power generation system runs across multiple scenarios and time cycles. Training on near-normal data establishes a state of equilibrium of the DBN model. The DBN model, in a near-normal running situation, can also provide approximate values for missing sensor data from the system. The values proposed for missing information or damaged sensors are what the model determines, using the expectation maximization algorithm, to be most likely, given the current state of the running system (Pless and Luger 2001, 2003). The algorithm used to determine these most likely values is Baum-Welch, a variant of expectation maximization (Dempster et al. 1977; Luger 2009a, Sect. 13.2).

Once the DBN model was trained, the research group generated multiple accident sequences using their simulation system. In each scenario, for example, having differential pressures within the plant's cooling system or performing control rod insertion, the model captured the state of the system as the "accident" evolved. This allowed visualization of all parameters related to each situation as well as presenting options for remediation. The fact that these options could be realized faster than real time supported the human operators' steps toward remediation.

An important component of the Darling et al. (2018) DBN nuclear power generation modeling project was that it supported what Pearl (2000) calls counterfactual reasoning.
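As a toy illustration of the missing-sensor idea mentioned above, the sketch below uses a small, invented joint distribution, standing in for what Baum-Welch/EM training would actually estimate from plant data, to propose the most likely value of a damaged sensor given the working ones. It is not the reactor DBN itself, and all variables and numbers are made up.

```python
from itertools import product

# A toy "trained model" over three discretized reactor variables.
values = {
    "pump":      ["normal", "degraded"],
    "coolant":   ["nominal", "hot"],
    "vibration": ["low", "high"],
}

joint = {  # p(pump, coolant, vibration); the invented entries sum to 1.0
    ("normal",   "nominal", "low"):  0.55,
    ("normal",   "nominal", "high"): 0.05,
    ("normal",   "hot",     "low"):  0.05,
    ("normal",   "hot",     "high"): 0.05,
    ("degraded", "nominal", "low"):  0.05,
    ("degraded", "nominal", "high"): 0.05,
    ("degraded", "hot",     "low"):  0.05,
    ("degraded", "hot",     "high"): 0.15,
}

def impute(missing, **observed):
    """Most likely value of the missing sensor, given the working ones."""
    names = list(values)
    scores = {v: 0.0 for v in values[missing]}
    for assignment in product(*(values[n] for n in names)):
        world = dict(zip(names, assignment))
        if any(world[k] != v for k, v in observed.items()):
            continue
        scores[world[missing]] += joint[assignment]
    return max(scores, key=scores.get), scores

# The vibration sensor is damaged; the model proposes its most likely value.
print(impute("vibration", pump="degraded", coolant="hot"))
```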
Counterfactual reasoning means that the system can reason about situations that are not currently happening in the reactor.
Fig. 8.12 The schematic of a sodium-cooled nuclear power generation system (Darling et al. 2018). The various reservoirs, pumps, control rods, turbine, etc. have sensors reporting their states to the DBN, as seen in Fig. 8.13

Fig. 8.13 The dynamic Bayesian network model of the sodium-cooled nuclear reactor of Fig. 8.12. The ten rectangular boxes represent the monitors collecting sensor data from the reactor. The circles represent the nodes of the DBN, lines from the rectangles to the circles represent the connectivity of the network, and the cylinders emanating from the nodes represent the nodes as they change over time
For example, in a danger situation, the DBN model can be asked "What would be the result of inserting the control rods further into the sodium?" Similarly, when the coolant temperature is getting dangerously high, the monitor could ask "What if I added supplementary coolant to the current state of the system?" The result of such queries is that the model moves forward into future time, predicting what the state of the system would be if these actions were actually taken. This prognostic information can be critical in determining an optimal outcome, given a current danger state of the reactor.

This hypothetical reasoning is supported by the fact that the computational model offers an accurate reflection of the power-producing reactor. The knowledge-based and probabilistic model allows monitors to try out different control strategies and get almost immediate feedback on what would happen, as well as the time sequence for it to occur. Examining these possible responses can direct the reactor monitors to make the most informed decisions at appropriate times. This trained computational model captures the human-like reasoning that an informed diagnostic expert would offer in similar situations.

The research projects of this chapter are presented at a rather high level, and further detail may be found in the references for each project. As noted earlier, the reason for presenting these examples is both to demonstrate their sufficiency as models of human perception, understanding, and decision-making in complex situations and also to offer concrete examples of the epistemic conjectures presented in Sect. 7.4.

8.5 In Summary

Chapter 8 first considered Bayes' theorem, its extensions, and several of its epistemic implications. In offering a demonstration of how Bayes works in simple situations, we developed the intuition of its importance: new information, the a posteriori, is interpreted in the context of the already understood, a priori, knowledge of a situation. This a priori knowledge can be understood as a form of Kant's (1781/1964), Bartlett's (1932), or Piaget's (1970) schemata used in problem resolution.

We presented a number of research projects in the second, third, and fourth sections of the chapter. The goal of presenting these problem scenarios was to show how Bayesian systems are sufficient for characterizing important aspects of human perception and reasoning. The final example, monitoring sodium-cooled nuclear power generation, used a dynamic Bayesian network to show "what would happen if …" scenarios. Visualizing possible alternatives was a direct way to address potential problems. Many aspects of the program examples of this chapter reflect the conjectures supporting a modern epistemology presented in Sect. 7.4.

We begin the final chapter with a brief summary of the task of this monograph. We ask how, through active exploration, an agent can come to understand its environment. We next question what happens when the state of the world no longer matches an agent's expectations.
As an example, we propose a Bayesian belief net explanation for the changes in the stages of early childhood development described by Piaget (1983), Bower (1977), and others (Gopnik 2011a, b). We then make the case for the overall health, excitement, and promise of continuing research in artificial intelligence. We conclude by describing again our human-centric epistemic stance called an active, pragmatic, model-revising realism.

Further Thoughts and Readings Pearl's books have introduced probabilistic reasoning and the Bayesian belief network technology to modern AI (see the Bibliography for full reference details):

Judea Pearl (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
Judea Pearl (2000), Causality.

To understand the importance of probabilistic techniques in modern AI, here are several relevant textbook resources:

Jurafsky and Martin (2020), Speech and Language Processing, third ed.
Luger (2009a), Artificial Intelligence: Structures and Strategies for Complex Problem Solving.
Nilsson (1997), Artificial Intelligence: A New Synthesis.
Russell and Norvig (2010), Artificial Intelligence: A Modern Approach, third ed.

See also Bayesian Epistemology in the Stanford Encyclopedia of Philosophy, at url 8.1. Although based on Bayesian assumptions, it differs from what we propose.

Dr. Dan Pless created the BBN traffic example of Sect. 8.2. Many of my other PhD graduates, especially Dr. Chayan Chakrabarti, Dr. Michael Darling, Dr. Thomas Jones, Prof. Paul De Palma, and Dr. Roshan Rammohan, are responsible for building many of the probabilistic models for the applications presented in this chapter. Figures 8.7 and 8.8 were developed for SBIR research sponsored by the US Navy. We thank Karger Publications, Basel, for permission to use Figs. 8.3, 8.4, and 8.8. These appeared in Luger et al. (2002). Figures 8.9 and 8.10 are from the PhD dissertation at UNM of Dr. Chayan Chakrabarti. Figure 8.11 was developed for the US Dept of Energy, as part of an SBIR contract. We thank Sandia National Laboratories (DOE) for creating Figs. 8.12 and 8.13 as part of our research contract for monitoring sodium-cooled nuclear reactors. I created all other figures in this chapter to support my teaching needs at UNM.

Programming Support There are a number of Bayesian belief net and hidden Markov model software products available on the internet. A probabilistic interpreter, called Generalized Loopy Logic, created by Dr. Daniel Pless as part of his PhD thesis, can be found at url 8.2.
Chapter 9
Toward an Active, Pragmatic, Model-Revising Realism

There are more things in heaven and earth, Horatio, than are dreamt of in our philosophy…
—SHAKESPEARE, Hamlet (First Folio), 1623

The purpose of computing is insight, not numbers…
—RICHARD HAMMING, 1968 ACM Turing Award Winner

Contents
9.1 A Summary of the Project  212
9.2 Model Building Through Exploration  213
9.3 Model Revision and Adaptation  216
9.4 What Is the Project of the AI Practitioner?  221
9.5 Meaning, Truth, and a Foundation for a Modern Epistemology  225
9.5.1 Neopragmatism, Kuhn, Rorty, and the Scientific Method  226
9.5.2 A Category Error  229
9.5.3 The Cognitive Neurosciences: Insights on Human Processing  231
9.5.4 On Being Human: A Modern Epistemic Stance  232

Chapter 9 brings into focus the task of this book, using insights from the histories of philosophy and artificial intelligence as the foundation for a science of understanding ourselves and the world. Section 9.1 briefly reviews the story to this point. Section 9.2 discusses model building through exploring the environment, and Sect. 9.3 suggests several methods for the adaptation of models in light of new discoveries. Section 9.4 makes conjectures about the future of AI, and Sect. 9.5 offers thoughts on the construction of a modern epistemology. With an analysis of category errors in computation, we see humans, with our life, intelligence, and responsibilities, as essentially different from machines.

© Springer Nature Switzerland AG 2021
G. F. Luger, Knowing our World, https://doi.org/10.1007/978-3-030-71873-2_9
9.1 A Summary of the Project

The first chapter addressed the notion of what it means to compute, presenting the Turing machine, the Post production system, and the Church-Turing thesis. We described an important limitation of computation with Turing's undecidability proof. We also addressed the epistemic components of programming. For most AI technicians, programming is an interactive and iterative refinement process, where each new piece of computer code is an experiment in discovery. If that code is successful, it is integrated into the larger program, and if it fails to express the programmer's intentions, it is revised and retested.

This iterative refinement process was initially made possible through the expressive powers of high-level computer languages including Lisp, Prolog, Logo, Smalltalk, ML, OCaml, and Scheme. In fact most modern languages support this active exploratory process. Above all, iterative refinement is an epistemic commitment that supports the programmer as she continues to approximate her desired goals: revising her thoughts and code as she explores its use-based implications.

Chapter 2 offered a review of the philosophical traditions that led to the creation of the digital computer and our present understanding of the world. There were two important themes in Chap. 2. The first theme is skepticism that asks whether the world is actually knowable. Another view of this skepticism is the contention that what we think we know about ourselves and the world may never be proven to be correct. The second theme of Chap. 2 is the use of the scientific method as a strategy for understanding the natural world. Whether one thinks that reality is a form of water, or whether it is earth, air, fire, and water, or fashioned from some atomic substrate, these ideas are proposed as conjectures that can be refuted. In this refutation, there is always the promise of a new synthesis, which again can be questioned.

Chapter 3 included the early history of AI and the Dartmouth College Summer Workshop of 1956. This workshop gathered together the current AI practitioners, adopted the name artificial intelligence, and proposed topics suitable for ongoing research. As the discipline evolved, many philosophical issues, including the idea of trying to better understand how humans solved problems, came into play.

Part II, Chaps. 4 through 6, explored the main representational paradigms of research and development in artificial intelligence. We focused on early examples from each of the symbolic, connectionist, and evolutionary approaches to AI. The goal of these chapters was to represent each approach with examples of early successes as well as to describe their recent products. At the end of each chapter, we summarized the strengths and limitations of that approach to artificial intelligence problem-solving. We also noted the effects on artificial intelligence projects of the rationalist, empiricist, and pragmatist philosophical traditions.

In Chap. 7, we proposed a constructivist epistemology as a synthesis of the philosophical positions of empiricism, rationalism, and pragmatism. After presenting arguments to support this position, we offered five assumptions that give a basis for a modern epistemological science and eight follow-on conjectures that support understanding ourselves and our environment.
9.2 Model Building Through Exploration 213

In Chap. 8, we presented Bayes' rule and gave a suggestive proof using a single disease and symptom. The critical issue with Bayes' formulation is to see a coherent mathematics-supported relationship between a priori knowledge, that is, what an agent already knows, and a posteriori information, new data currently perceived. This mathematical relationship can be computationally interpreted, especially as it is used across periods of time. The formula for Bayes' rule can be seen as an interpreter for the schemas of Kant, Bartlett, and Piaget, as well as those of many AI practitioners. Section 8.2 described several AI programs that implemented this constructivist epistemic stance.

This final chapter concludes with optimistic support for the future of artificial intelligence research and development. Further, with an analysis of category errors in computation, we see humans, with our life, intelligence, and responsibilities, as essentially different from machines. We conclude that AI can rise to, or even surpass, many aspects of human intelligence but that human intelligence and decision-making are different. Through insights gained from our philosophical tradition and the AI endeavor, we suggest that humans are best served by adopting an epistemic stance based on pragmatism, relativism, and an unconditional commitment to the scientific method that supports progressively comprehending and utilizing our ever-evolving life-environment.

9.2 Model Building Through Exploration

We have described at length the AI program designer's use of computer code to explore their world. An important contribution of the AI community is to have built automata that, through exploration, come to know and use their environments. We saw this happen virtually when, in Sect. 6.3, artificial life designers created entities able to survive, procreate, interact in communities, and explore their environments. The robotics community has designed and built many physical entities able to accomplish similar goals.

Early robots, similar to the tripods created by Aeschylus' Hephaestus to serve the Olympian gods, were designed to perform specific tasks. These early robots have been so successful in tasks including production line assembly, automated welding, directing delicate surgeries, and controlling deep-space vehicles that most are no longer even considered to be a component of AI technology.

Shakey was the first mobile robot able to sense and reason about its surroundings. Built in the late 1960s by the Stanford Research Institute, now SRI International, Shakey could follow commands that required making plans for movement and performing simple tasks, such as rearranging objects. The Shakey project was funded by DARPA, and Charles Rosen was the lead designer. Shakey's planning program was STRIPS, the STanford Research Institute Problem Solver (Fikes and Nilsson 1971; see Luger 2009a, Sect. 8.4.2). I first saw Shakey in action at the Third International Joint Conference on Artificial Intelligence
214 9 Toward an Active, Pragmatic, Model-Revising Realism

at Stanford University in 1973, where I presented my first AI paper in a symposium organized for graduate students to present their PhD research.

After Shakey, many different AI groups entered the robotics domain. The more famous include the soccer-playing robots of the annual RoboCup competitions that began in 1996 (see url: 9.1). Also important are the groups at NASA that created the Mars rover Opportunity, which traveled more than 28 miles on the surface of Mars before its demise in 2018. These efforts continue to this day with the design of autonomous vehicles. For our story, however, we next take a slightly different tack and describe several early robotic programs that come to understand their world by actively exploring it.

More than 30 years ago at MIT, Rodney Brooks and colleagues (1986, 1991) designed a robot explorer. Brooks' goal was to search through and accomplish tasks in an environment without any prior knowledge or planning in that space. Brooks' approach actually questioned the need for any centralized representational scheme. Brooks employed a subsumption architecture, see Fig. 9.1, to show how a general intelligent mechanism might evolve from lower supporting forms of intelligence.

Brooks suggests, and gives examples through his robotic creations, that intelligent behavior does not come from disembodied theorem-prover-based planning systems like STRIPS, nor does it require a global memory and control. Intelligence, Brooks claims, is the product of the interaction between an appropriately designed system and its environment. Furthermore, Brooks espouses the view that intelligent behavior emerges from the interactions of architectures of organized simpler behaviors.

Figure 9.1 presents a three-layered subsumption architecture, where each layer is composed of finite-state machines, simple sets of condition → action production rules that run asynchronously. There is no central locus of control. Rather, each machine is data-driven by the information it perceives. The arrival of a message or the expiration of a time period causes the various machines to change state.

Fig. 9.1 A three-layered subsumption architecture adapted from Brooks (1991). The three levels are defined by their EXPLORE, WANDER, and AVOID behaviors
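The layered control just described can be suggested with a small sketch. The following Python fragment is a deliberately simplified, serial approximation of a subsumption controller in the spirit of Fig. 9.1; the layer names follow the figure, but the thresholds, sensor format, and action labels are hypothetical and are not Brooks' implementation, which runs its finite-state machines asynchronously.

```python
import random

def avoid(sonar):
    """Safety layer: turn away whenever any sonar reading is too close."""
    if min(sonar) < 0.5:          # 0.5 m threshold is a hypothetical value
        return "turn-away"
    return None                    # no opinion; defer to the other layers

def wander(sonar):
    """Middle layer: occasionally pick a new random heading."""
    return "random-heading" if random.random() < 0.1 else None

def explore(sonar):
    """Top layer: otherwise keep moving toward unexplored space."""
    return "move-forward"

def control_step(sonar):
    """Each layer is a condition -> action rule; when a lower, safety-critical
    layer fires, its action takes precedence over the layers above it."""
    for layer in (avoid, wander, explore):
        action = layer(sonar)
        if action is not None:
            return action

# Twelve radial sonar readings, one per sensor on the ring
print(control_step([2.1, 1.8, 0.4, 3.0] + [2.5] * 8))   # -> "turn-away"
```

Note that no global model of the world appears anywhere in the sketch: behavior results only from whichever rule's condition is satisfied by the current readings.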
9.2 Model Building Through Exploration 215

Brooks' robot had a ring of 12 sonar sensors around it. Every second, these sensors gave radial depth measurements. The lowest-level layer of the subsumption architecture, AVOID, implements a behavior that keeps the robot from hitting objects, whether these are static or moving. The machine labeled sonar emits an instantaneous signal that is passed on to collide and feelforce, which in turn can generate halt messages for the finite-state machine in charge of running the robot forward. When feelforce is activated, it is able to generate either runaway or avoid instructions.

Brooks' approach to robots learning about their environments was an important first step. AI researchers want robots to be able to enter new and possibly dangerous situations, to be able to explore them, and to draw some conclusions, such as "there is an injured human here." However, as Brooks' robot does not build a model of its world as it discovers new obstacles or passageways, it cannot take that next step. How would it know an "injured human"? How would it even recognize a place that it had previously explored? How can it learn anything? Finally, how could such a robot ever operate in a truly complex environment, e.g., having the knowledge that a taxi, Uber, or other driver needs to navigate a city such as London?

The following generations of robots began to overcome these issues by adding more memory and present-state information to their exploring strategies. For example, Lewis and Luger (2000) created a robot that, building on an architecture adopted from Hofstadter's (1995) work, was able to map and remember wall and navigation pathways as it explored its environment, as shown in Fig. 9.2. Lewis's robot maps possible wall structures using signals from adjacent sonar sensors. In Fig. 9.3, Lewis's robot is able to recognize and travel a pathway through object structures in attempting to achieve a goal. The Brooks and Lewis examples are early attempts to have robots discover and cope with their environment by exploring and building ever-improving models of that environment.

As we saw in Sect. 5.3.2, the Google Brain community (Faust et al. 2018) created a much more modern and powerful solution to this get-to-know-your-world-through-exploration research. The robot system called PRM-RL used deep neural net learning coupled with reinforcement learning to discover goal-focused path components. The PRM-RL robot can then apply this "knowledge" to discover solution paths in entirely new environments.

The goal of this section was to demonstrate several AI problem solvers that used active search to build computational models that approximate the world that they explore. Brooks' subsumption approach used a hierarchy of finite-state machines to explore its environment but had no memory to record its achievements. Lewis and his colleagues added limited memory structures to learn invariants of the explored world, including solid obstacles and passageways, and to reuse these discoveries later in their search. Finally, Faust and her colleagues used a probabilistic planning algorithm, along with reinforcement learning, to train a robot to discover new paths in previously unexplored territory.
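What the Lewis-style robots add is a persistent memory of what the sonar has already revealed. The sketch below is a hypothetical, deliberately simplified illustration of that idea, a grid-style map updated from one sonar reading; it is not the representational structure Lewis (2001) actually used.

```python
def update_map(grid, robot_xy, heading_xy, echo_distance, max_range=4):
    """Record one sonar reading: cells the beam passed through are remembered
    as free passageway; the cell where the echo returned is remembered as a barrier."""
    x, y = robot_xy
    dx, dy = heading_xy
    for step in range(1, max_range + 1):
        cell = (x + dx * step, y + dy * step)
        if step < echo_distance:
            grid[cell] = "free"
        else:
            grid[cell] = "barrier"
            break
    return grid

world_map = {}
update_map(world_map, robot_xy=(0, 0), heading_xy=(1, 0), echo_distance=3)
print(world_map)   # {(1, 0): 'free', (2, 0): 'free', (3, 0): 'barrier'}
```

A robot that accumulates such a map can recognize a previously explored place by comparing new readings against the remembered grid, which is exactly the ability that the memory-free subsumption design lacks.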
216 9 Toward an Active, Pragmatic, Model-Revising Realism Fig. 9.2 Lewis (2001) created representational structures for barrier discovery and location. Feedback from the robot’s sonar sensors indicates the presence or absence of barriers Fig. 9.3 The robot’s path, left, and map or model, right, for barriers and passageways learned from the use of its sonar sensors. Figure adapted from Lewis (2001) 9.3 M odel Revision and Adaptation In Sect. 3.5, based on the modeling traditions of the cognitive science community, we presented several programs that demonstrated the assimilation of new informa- tion into appropriately conditioned cognitive systems. These programs presented mechanisms sufficient to describe many of the conservation tasks in children’s learning behavior described by Jean Piaget (1983) and other developmental psychologists. At the Artificial Intelligence Department of the University of Edinburgh, Young (1976), using production rules, demonstrated children’s seriation skills. In seriation
9.3 Model Revision and Adaptation 217 tasks, children are asked to organize blocks by their sizes, which requires under- standing the relationship of partial and total orderings (Young 1976). Also, at Edinburgh, Luger (1981) and Luger et al. (1983) created a production system accounting for object permanence in children, based on behaviors originally noted by the child psychologist T.G.R. Bower (1977). An object’s permanence is its continued existence across time despite being out of immediate sight. As mentioned in Sect. 3.4, Drescher’s (1991) program at MIT also demonstrated infants’ responses during the stages of object permanence. Finally, Wallace et al. (1987) developed the BAIRN program at Carnegie Mellon that used production rules to demonstrate number conservation. The programs just mentioned described children within their different stages of development. There has been little accounting, however, for how children moved between these stages as they matured. Section 9.3 discusses the issue of model revi- sion. What can be done when perception-based data cannot be interpreted by the present system’s a priori worldview? This is a difficult problem: how to make “adjustments” when new data cannot be interpreted, given the viewer’s current expectations for that data. Figure 9.4 presents an overview of this. On the top row, a cognitive model either offers an interpretation of new data or it does not. Piaget has described these situa- tions as instances of assimilation and accommodation. First, through assimilation, data fit expectations, possibly requiring adjusting its probabilistic measures. Otherwise, through accommodation, the model must reconfigure itself, possibly adding new components. The lower part of Fig. 9.4 presents the COSMOS architec- ture (Sakhanenko et al. 2008) created to address both these tasks. Fig. 9.4 Cognitive model use and failure, above; a model-calibration algorithm, below, for assimi- lation and accommodation of new data. Adapted from (Sakhanenko et al. 2008)
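The two branches of Fig. 9.4 can be suggested with a small sketch. In the hypothetical Python fragment below, new data that the current model still explains well enough are assimilated, that is, the model is simply recalibrated; data that fit too poorly force accommodation, the selection of a better-fitting model from a library. The likelihood test, the threshold, and the toy sensor models are illustrative assumptions, not the COSMOS algorithm itself.

```python
import math

def log_likelihood(model, data):
    """How well a model explains the data: the sum of log probabilities."""
    return sum(math.log(model(d)) for d in data)

def assimilate_or_accommodate(current, library, data, threshold=-10.0):
    """Assimilation keeps (and would recalibrate) the current model;
    accommodation replaces it with the best-fitting model in the library."""
    if log_likelihood(current, data) >= threshold:
        return "assimilate", current
    best = max(library, key=lambda m: log_likelihood(m, data))
    return "accommodate", best

# Two toy models of a binary sensor (1 = event reported, 0 = quiet)
steady = lambda d: 0.95 if d == 0 else 0.05   # events are rare
noisy = lambda d: 0.70 if d == 0 else 0.30    # events are common

readings = [1, 1, 0, 1, 1]                    # a burst of reports
label, chosen = assimilate_or_accommodate(steady, [steady, noisy], readings)
print(label)                                   # -> "accommodate"
```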
218 9 Toward an Active, Pragmatic, Model-Revising Realism

The COSMOS (Sakhanenko et al. 2008) model selection and model calibration algorithm was tested in complex environments including the flow of liquids through pumps, pipes, and filters. The model interprets real-time pressure measures, pipe flows, filter clogging, vibrations, and alignments. When new data arrive, the program must decide whether they fit within its current model of the world or whether it needs to select another model, such as adapting to a clogged filter, from its library of models.

Let's consider a simple example of the model selection and calibration problem. Suppose we are building a program to monitor home burglar alarms. The probabilistic home burglar monitoring program is deployed in a specific location to be trained and tested in realistic situations. In particular, it monitors alarms for false-positive predictions, where the alarm indicates a problem when no problem exists. As this system is trained successfully during the winter months, the probabilistic values of the alarm reports are learned. The day-to-day deployment has produced data that conditions the system. After the training period, the data are assimilated into the model and the resulting trained program successfully monitors both false alarms and actual home break-ins.

Next, suppose that in the Spring months of the year, there are multiple fierce desiccating winds that shake the alarm sensors mounted on doors and windows and dry out their connections. As a result, when the monitoring program presents many more false alarms, it is necessary to readjust the probabilities of the model and to add new model parameters to reflect the Spring weather conditions. The result will be a new extended system that supports alarm monitoring in the Spring.

Further, when the alarm systems are then sold in a new city, it will be necessary to determine which models will best fit that situation. There may be other important disturbances, such as small earthquake tremors; as a result, more variables will need to be represented. Although the problem of model induction in general is intractable, in most situations, useful new models can be created. A search to discover new causal relationships among a model's constraints can often be sufficient for this task.

The problem of model induction is an important component of current research in the development of probabilistic models. This task can be described as follows: given new data, what is the most likely model that can explain that data? Judea Pearl and others (Pearl 2000; Tian and Pearl 2001) began this research. There are many exciting challenges for the use of model induction. These include, given fMRI data related to certain mental disorders, finding the most likely set of cortical connections that can explain this data (Burge et al. 2007). Rammohan (2010) and Oyen (2013) created algorithms for investigating variables and their possible relationships in this structure search environment.

Deep learning coupled with reinforcement learning also offers technology for model building. We saw the DeepMind and AlphaZero programs described in Sect. 5.2. In these programs, the legal moves of the problem were used to search through the problem space to discover and reinforce partial solutions. The Faust et al. (2018) robotics project was also, using reinforcement learning, able to create successful paths for the robot to travel by discovering and linking smaller successful components of paths.
These examples of reinforcement learning showed how searching
9.3 Model Revision and Adaptation 219 and assembling partial components of solutions can lead to successful models of a situation. Our next example of model-revising search is taken from children’s cognitive development. In Piaget’s (1965) conservation experiments, children aged 4–7 mis- takenly confuse the amount of liquid that a glass contains with the height of the glass that contains that liquid. As the child matures and watches her ideas fail in the practical world, i.e., there actually isn’t more juice in the taller, thinner glass, she revises her model of volume. New variables, such as the circumference or diameter of the container, expand her understanding of volume. Children come to understand that the quantity of a liquid is constant regardless of the shape of its container. Figure 9.5 presents the experimental situation where a child sees two containers of liquid, each holding a similar amount. The liquid of one container is then poured into a taller container and the child is asked which container holds the most liquid. The non-conserving child indicates that the taller, thinner glass holds more. A simple Bayesian belief network is sufficient to model the stages of conserva- tion behavior across time. In Fig. 9.6a, a number of perceptual values are associated with the child observing the vessel containing liquid. These perceptions capture height, thickness, color, and so on. Fig. 9.5 A schematic of Piaget’s conservation of liquid experiment. The lower left glass is poured into the taller thinner glass while the lower right glass remains the same. The 4- to 7-year-old child is then asked which glass contains more liquid Fig. 9.6 (a) presents a Bayesian network representing a child seeing perceptual cues. 1 indicates the height of the container, 2 indicates width or diameter, 3 and 4 could be the color of the liquid, etc. In (b), the child associates the container’s height and width to create a measure for volume
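The structural change between the networks of Fig. 9.6a and Fig. 9.6b can also be suggested computationally. The deliberately simplified sketch below contrasts a judgment driven by the height cue alone with one that combines height and width into a volume-like quantity; the dimensions and the combination rule are hypothetical illustrations, not a full belief network.

```python
def pre_conservation_judgment(height_a, width_a, height_b, width_b):
    """Fig. 9.6a-style judgment: only the height cue drives the decision."""
    return "A has more" if height_a > height_b else "B has more"

def conserving_judgment(height_a, width_a, height_b, width_b):
    """Fig. 9.6b-style judgment: height and width are combined into a
    volume-like quantity (proportional to height times width squared)."""
    amount_a = height_a * width_a ** 2
    amount_b = height_b * width_b ** 2
    if amount_a == amount_b:
        return "the same"
    return "A has more" if amount_a > amount_b else "B has more"

# The same liquid in a tall, thin glass (A) and a short, wide glass (B):
print(pre_conservation_judgment(20, 3, 5, 6))   # -> "A has more" (the child's error)
print(conserving_judgment(20, 3, 5, 6))         # -> "the same"
```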
220 9 Toward an Active, Pragmatic, Model-Revising Realism

Figure 9.6b represents the BBN where the perceptual cues for height and width or diameter of the container are combined. The child is able to unite these two perceptual cues when a teacher or parent points out that both the height and the width of the container are necessary for measuring the amount or volume of the liquid. Alternatively, the bright child might learn this association by herself, through experiment, or perhaps by asking how much liquid an even taller and thinner container held.

Later, as the maturing child approaches the formal operational developmental stage, ages 12–16, she realizes that a formula can capture the volume measure precisely. For cylindrical containers, it is π multiplied by the square of the radius, half the diameter, multiplied by the height of the container. The movement through these stages of volume conservation is usually driven by pragmatic concerns, such as "I want to get the most possible juice" or "My teacher told me I didn't understand volume."

The many empirical studies that have identified stages in early human cognitive development, including Piaget (1954, 1983), Bower (1977), Young (1976), Gopnik et al. (2004), and Gopnik (2011a), shed light on the mediating processes through which humans understand and use their environment. These developmental stages suggest a progressive approximation toward a complex equilibration. In many of these situations, dynamic Bayesian models, such as our Piagetian conservation example, can offer a sufficient characterization of the phenomena involved. Even non-human primates seem to go through similar developmental stages as they learn to organize objects by size and shape (McGonigle and Chalmers 2002).

Major advances in the natural sciences can also be viewed as discovering new invariances through questioning the assumptions of older, previously learned, relationships. The insights of Darwin, Einstein, Heisenberg, and Hawking can each be analyzed from this perspective. These insights are captured in new models, which will, in time, again be revised. These models are not the "discovery" of divine truths, as Descartes or Leibniz suggested, but rather discoveries of new and useful accommodations within the environment. These discoveries can be expressed as new models and, as Heisenberg (2000) suggests, usually produce new language constructs. These represent concepts developed through experimentation and extend both current understanding and practical uses of the environment.

Model calibration, extension, and revision remain important research areas for artificial intelligence. When new puzzling situations develop, how can these be best understood and interpreted in the contexts of what the AI model or human subject already knows? What are the best explanations for how we humans learn new relationships? How do practical needs influence the understanding of new phenomena? Our cognitive and computational research communities are challenged to continue addressing these questions.
9.4 What Is the Project of the AI Practitioner? 221 What we have to learn to do, we learn by doing… —ARISTOTLE, Ethics. Where is the knowledge we have lost in information? —T.S. ELIOT, Choruses from the Rock. 9.4 W hat Is the Project of the AI Practitioner? There is no question whether the AI enterprise will continue to be successful. Across the full spectrum of computer-based problem-solving, there are few areas that arti- ficial intelligence techniques have not touched. What some call “first-generation AI” and we prefer “symbol-based AI” is a critical component of that success story. As noted in Chap. 4, applications including controllers for deep-space travel, the Mars rover, guidance systems for complex surgeries, advice systems for medical care, and language communication programs that assist in product sales, are all parts of this early success story. In fact, many components of the symbol-based AI technology are so integrated into commonly used software applications that their AI origins are no longer remembered. As noted in Sects. 4.3 and 6.5, the successes of symbol-based AI have also led to better understanding of its limitations. The iterative refinement process, so impor- tant in creating successful software, also sheds light on the inadequacy of particular approaches. The (rationalist) task of abstraction necessary to create symbols, sym- bol structures, and logic-like control algorithms constrains the designs of programs intended to capture the patterns in ever-changing environments. Symbols too inflex- ible and algorithms too rigid for many tasks have led the AI community to new approaches. There should not be an admission of failure here but rather an acknowl- edgment that responsible use of the methods of science has led the AI community to new insights, technologies, and successes. An important response to several of the limitations of symbol-based AI is the connectionist or neural network approach to problem-solving. The creation of the Boltzmann machine and backpropagation algorithms in the 1985 time period over- came the limitations of the older perceptron. With added access to server farms and vector-based processors, deep learning technology added to the neural network approach by adding multiple hidden layers. This new approach greatly improved many areas of problem-solving including image classification, facial recognition, language translation, text classifiers, and learning expert skills in new games, given only the rules of the game. Successes with deep learning and association-based (empiricist) problem solvers have again suggested their limitations. As noted in Sects. 5.4 and 6.5, research con- tinues in making deep-learning systems more transparent and better able to explain their decisions. This transparency is important in areas where a program makes personal, privacy, medical, or financial recommendations. Research also continues to focus on the meta-parameters of deep learning networks to better understand which learning rates, network sizes, and architectures are most appropriate for par- ticular problem situations.
222 9 Toward an Active, Pragmatic, Model-Revising Realism

In Chap. 6, we presented the genetic and emergent approaches to AI. Although the successes of this approach are not as obvious as those of the symbolic or connectionist, they do offer a radically different perspective. Genetic algorithms and genetic programming are able to evolve new prospective solutions using only reproduction operators and fitness functions.

The artificial life algorithms produce new generations of both individuals and societies. There remains hope that areas including artificial chemistry, physics, and biology can produce useful new life forms. These technologies may be critical for better understanding complex systems, including the human genome and the development of antibody therapies. There also remains the challenge for the artificial life community to shed some insight on the origins of life itself and the processes for producing new species within the a-life world. Section 6.4 discusses the strengths and limitations of emergent approaches to AI.

In Part III, we presented stochastic approaches to building AI solutions. We began Part III by proposing a compromise, a synthesis, among philosophical positions. This compromise recognized the role that the current, a priori, knowledge of an agent has for integrating newly perceived, a posteriori, data. This Bayesian-based methodology echoed the expectations, or schemas, proposed by Kant, Bartlett, Piaget, and others in Chaps. 2–4. Perhaps the most important contributions to the Bayesian approach were the insights of Judea Pearl. Bayesian belief networks (Pearl 1988, 2000) allowed Bayesian representations to be viewed as causal relationships that, when factored, become more tractable, i.e., their solution algorithms were realistically computable. We presented in Chap. 8 a number of examples of Bayesian solutions, including the use of dynamic Bayes.

One singularly important accomplishment of the AI and the cognitive science communities has been to offer an answer to dualism, or the mind-body problem, described in Sect. 2.4. Since the days of Descartes, philosophers have asked for an explanation of the interactions and integration of intelligent responses through the human mind, consciousness, and a physical body. Philosophers have offered every possible response, from total materialism to subjective and objective idealism and the denial of material existence. Several thinkers even proposed the supporting intervention of a benign god. Artificial intelligence and cognitive science research reject Cartesian dualism in favor of a material explanation of intelligence.

The Oxford philosopher Gilbert Ryle (1949) described Descartes' dualism as the presence of a "ghost in the machine." Ryle, following the then-current tradition in psychology, suggests eliminating this ghost through the assumptions of behaviorism. Artificial intelligence has taken an alternative approach to eliminating Ryle's ghost. AI and cognitive science practitioners hypothesize that intelligence, including human intelligence, is based on the physical implementation, or instantiation, of representations in a processing system. Algorithms manipulate these representations in the process of solving problems. The continuing successes of the AI research project are an indication of the validity of this hypothesis.

With AI, Ryle's ghost is removed or "cashed out" from the problem solver and replaced by representations and algorithms that support decision-making. The best-first search algorithm (Sect.
4.1.2), for example, gives an estimated “best judgment”
9.4 What Is the Project of the AI Practitioner? 223 for selecting a next move in a search situation. In deep learning, conditioned micro- decisions developed in a reinforcement structure can control a robot in a new envi- ronment. We have seen multiple examples of reckoning-based judgments and appropriate actions over recent chapters. One reason for the continuing successes of the AI enterprise is the influx of bright, young, and excited researchers to the field. These new collaborators are also a very diverse group that includes linguists, psychologists, computer scientists, physicists, sociologists, medical doctors, and contributors from other fields. As pointed out in Sect. 3.3, it is important not to attempt to limit what it means to be an AI practitioner. When challenges seem unlimited, energy and commitment are important, with the only requirement to remain within the constraints and promise of the scientific method. There are, of course, AI winters, a term used several times over the last 60 plus years by the AI community to indicate deep changes in financial support for particu- lar projects. Most of the research funding in artificial intelligence has been afforded by government agencies, and predominantly in the US by the National Science Foundation and the Department of Defense. As different AI projects show their promise or, over time, their lack thereof, funding goals change. The AI community’s goals can also change: is funding important for human language understanding, for foreign language translation, or for autonomous vehicles? Are projects that are part of the physical symbol system hypothesis important? Why is there not more support for deep learning neural networks? AI’s practitioners have also, at different times, been guilty of overpromising results. When these results are disappointing, lack of interest and funding often follows. A number of consequential challenges remain for building intelligent systems. We propose three questions that need to be addressed as we continue to build more intelligence into mechanical devices: 1 . What is the role of embodiment and culture in intelligence? One of the main assumptions of the computational hypothesis is that the particular instantiation of a symbol or network system is irrelevant; all that matters is material represen- tations and algorithms. This viewpoint has been challenged by a number of thinkers (Searle 1980; Johnson 1987; Agre and Chapman 1987; Varela et al. 1993) who essentially argue that intelligent action in the world requires a physi- cal and social embodiment that allows the agent to be integrated into that world. The architecture of modern computers does not support this degree of situat- edness, requiring that an artificial intelligence agent interacts with its world through the extremely limited window of contemporary input/output devices. If this challenge is correct, then, although some forms of machine intelligence may be possible, full intelligence, as we humans experience it, will require a very dif- ferent machine, as Searle suggests (1980), than that afforded by contemporary computers. Further, as we argued in Chap. 7, knowledge must be regarded as a social as well as an individual construct. In a meme-based theory of intelligence (Edelman 1992), society itself carries essential components of knowledge. It is possible that an understanding of the social context of knowledge and human behavior is
224 9 Toward an Active, Pragmatic, Model-Revising Realism

as important to a theory of intelligence as is an understanding of the dynamics of the individual mind/brain.

2. What is the nature of interpretation, or how does AI address the grounding problem? Most computational models in traditional AI operate within an already interpreted domain. With this approach, there is an implicit and a priori commitment by the system's designers to a set of "meanings" for the program. Once this commitment is made, there is very little flexibility for shifting contexts, goals, or representations as the problem-solving situation evolves.

One AI approach to semantic meaning is the possible worlds of Alfred Tarski (1944, 1956). The Tarskian approach of mapping between sets of symbols and objects in a domain is sufficient to explain truth values for reasoning rules. It is insufficient, however, for explaining how one response may have different interpretations in the light of specific practical goals. Linguists have tried to remedy semantic limitations by adding theories of pragmatics (Austin 1962). Discourse analysis, with its fundamental dependence on symbol use in context, has also dealt with these issues in recent years. The problem, however, is broader in that it deals with the failure of referential tools in general (Lave 1988).

The pragmatist tradition started by Peirce (1958) and James (1981), and continued by Eco (1976), Grice (1981), Sebeok (1985), and others, takes a more radical approach to language and intelligence. It places symbolic expressions within the wider context of signs and interpretation. As Peirce (1958, p. 45) indicates, "… we come down to what is tangible and practical, as the root to every real distinction of thought, no matter how subtle it may be; and there is no difference of meaning so fine as to consist of anything but a possible difference of practice."

This meaning as an expression of practical purpose suggests that a symbol can only be understood in the context of its role as interpretant, that is, in the context of purposeful interaction with its environment. There is in the current AI research community an insufficient understanding of the process by which humans and societies create meaning and change interpretations. We visit these issues again with the discussion of neopragmatism in Sect. 9.5.

3. Can the AI and cognitive science communities design computational models that are falsifiable? Popper (1959) and others have argued that scientific theories must be falsifiable. This means that there must exist circumstances under which the model is not a successful approximation of the phenomenon. The obvious reason for this is that any number of confirming experimental instances is not sufficient for confirmation of a model. Even more importantly, new research is created in direct response to the failure of existing theories or models.

The general nature of most computational models may make them difficult to falsify, and, as a result, of limited use as science. Some AI data structures, for example, the semantic or connectionist networks, are so general that they can model almost anything. Like the universal Turing machine, they can describe any computable function. When an artificial intelligence or cognitive science
9.5 Meaning, Truth, and a Foundation for a Modern Epistemology 225 researcher is asked under what conditions his or her characterization of intelli- gent behavior will not work, the answer can be difficult. Finally, it must be noted that most AI research projects are not focused on build- ing artificial general intelligence or AGI. The possibility of an AGI seems to be the boogeyman of many popular culture warriors. Even projects trying to win the annual Turing Competition (url 9.2) are not pretending to create an AGI. What would this AGI look like? Would it be equivalent to human intelligence, see Sect. 9.5? AI funders, and most all AI researchers, are not interested in this AGI. Research is more committed to expanding our current limited knowledge for solving important prob- lems for both individuals and society. The most exciting aspect of work in artificial intelligence is that to be coherent and contribute to the endeavor we must address these concerns. To understand problem-solving, learning, and language, we must comprehend the philosophical levels of representations and knowledge. We are asked to resolve Aristotle’s tension between theoria and praxis, to fashion a union of understanding and practice, of the theoretical and practical, to live between science and art. Researchers in AI, as practitioners and toolmakers who make representations, algorithms, and languages, enable the design and building of mechanisms that exhibit intelligent behavior. Through experimenting, we test both their computa- tional adequacy for solving problems and our own understanding of intelligent phenomena. There is a tradition for this: Descartes, Leibniz, Bacon, Pascal, Hobbes, Boole, Babbage, Turing, and the others whose contributions were presented in Chap. 2. Engineering, science, and philosophy; the nature of ideas, knowledge, and skill; the power and limitations of formalism and mechanism; these are the expectations and tensions through which the AI vision continues to thrive, and from which we con- tinue our explorations. We are just an advanced breed of monkeys on a minor planet of a very average star. But we can understand the universe. That makes us something very special. —STEPHEN HAWKING. 9.5 M eaning, Truth, and a Foundation for a Modern Epistemology There are four topics in this final section. First, we introduce neo-pragmatism, the continuation of the philosophical traditions introduced in Sect. 2.8. Second, we discuss the computer scientist’s notion of categories: the division of entities into independent irreducible groupings. Third, we consider findings of the neuroscience community that support our current understanding of human perception and performance. Finally, we conclude with a proposal for a modern epistemology and thoughts on being human, addressing relativism, and the use of the scien- tific method.
226 9 Toward an Active, Pragmatic, Model-Revising Realism 9.5.1 Neopragmatism, Kuhn, Rorty, and the Scientific Method A number of commentators on the AI tradition, including Winograd and Flores (1986), Searle (1980, 1990), and Weizenbaum (1976), claim that the most important aspects of intelligence are not, and in principle cannot be, modeled with any com- putational representation. These areas include learning, understanding human lan- guage, and the production of meaningful speech acts. These skeptical concerns have deep roots in our Western philosophical tradition. Winograd and Flores’s criticisms, for example, are based on issues raised in phe- nomenology and postmodern skepticism by Husserl (1970), Derrida (1976), and others. This poststructuralist viewpoint questions the very foundations and growth of our modern intellectual traditions and asks whether any truth can be established. Poststructural skepticism questions the possibility of accumulating knowledge, his- torical processes, and the cultural progress of humanism and the enlightenment. Heidegger (1962) represents an alternative approach to understanding knowl- edge and progress. For Heidegger, reflective awareness is found in a world of embodied experience, a life-world. This position, shared by Winograd and Flores, Searle, Dreyfus, and others, argues that a person’s understanding of things is rooted in the practical activity of using them for coping with the everyday world. This world is essentially a context of socially organized roles and purposes. In the early twentieth century, the pragmatist position was an important compo- nent of the philosophical world view. The pragmatist maxim that the meanings of hypotheses are verified by tracing their practical consequences and implications in specific situations. As noted in Sect. 2.8, William James, Charles Sanders Peirce, and John Dewey were among the primary proponents of this pragmatist position. As the twentieth century progressed, the logical positivist or “scientific philoso- phy” tradition emerged, and with it, modern artificial intelligence. Most of the tech- nical assumptions and tools of modern AI can trace their roots through the logical positivist positions of Carnap, Frege, Russell, Tarski, and Turing through Kant, Leibniz, Hobbes, Locke, and Hume, back to Plato and Aristotle. This tradition argues that intelligent processes conform to quantifiable laws and are, in principle, understandable. But the pragmatist worldview has certainly not ended with the logical positivist. Its revival, often called neopragmatism, included the positions of Hilary Putnam, W.V.O. Quine, Ludwig Wittgenstein, Thomas Kuhn, and Richard Rorty. Neopragmatism, with a primary focus on language and meaning, turned from talk- ing about “mind” and “ideas” to considering language use. The neopragmatists felt that analyzing the role of language could bring new understanding to the notions of meaning, objectivity, and truth. Putnam, in Words and Life (1994, p. 152), advocates fallibilism, a theory that claims that doubts can be raised about any belief. Putnam claims that philosophical skepticism requires as much justification as any other philosophical position. He also claims that there are no philosophical guarantees against the need to revise a
9.5 Meaning, Truth, and a Foundation for a Modern Epistemology 227 belief and that active involvement in the world is primary in philosophy, echoing the Husserl/Heidegger world views. Wittgenstein presents his language game in Philosophical Investigations (2009), revising several of his earlier positions described in the Tractatus Logico- Philosophicus (1922). Wittgenstein (2009, p. 23) sees language use as, primarily, for accomplishing tasks within a societal context, and he contends: It is not only agreement in definitions, but also (odd as it may sound) agreement in judge- ments that is required for communication by means of language… The word “language-game” is used here to emphasize the fact that the speaking of lan- guage is part of an activity, or a form of life. Consider the variety of language games in the following examples, and in others: Giving orders and acting on them— Describing an object by its appearance or by its measurements— Constructing an object from a description (a drawing)— Reporting an event— Speculating about an event— … Translating from one language into another— Requesting, thanking, cursing, greeting, praying. Because of this language game, and as reflected in the “conjectures” of Chap. 7, different social groups can have specific rules governing their communications and circumscribing objects to which their language can refer. As a result, there are often strict limits on communication with other communities. Werner Heisenberg (2000) describes this language incompatibility in his analysis of the evolution of physics. Quine in Word and Object (2013) argues for ontological relativism, claiming that language will never support a non-subjective description of reality. Further, onto- logical relativism claims that things people believe exist are totally dependent on, and delimited by, the subjective mental language used to describe them. Similar to Wittgenstein (2009), and as a strict behaviorist, Quine (2013) contends that a spe- cific language produces words that map concepts to objects in the world. Also, like Wittgenstein, Quine argues that there is no objective method for mapping commu- nications between the languages of different communities. In The Structure of Scientific Revolutions, Kuhn (1962) also appropriates the language game by arguing that our descriptions representing reality are only accept- able if they are sufficient to produce observations and related experiments that expand our knowledge. Kuhn describes these languages as paradigms, where “nor- mal” science operates to better understand the constraints within that paradigm. The important step for Kuhn is when a generally accepted paradigm is thrown over by a new world view, with a revised language that supports new sets of relationships and experiments. Kuhn’s (1962) writings suggest that the model must not be confused with the phenomenon being modeled. Models allow humans to capture useful attributes of phenomena: there will, of necessity, always be a “residue” that is not empirically explained. A model is used to explore, predict, and confirm; and when a model can mediate these goals, it is successful. Further, different models may explain different aspects of one phenomenon, such as the wave and particle theories of light.
228 9 Toward an Active, Pragmatic, Model-Revising Realism We contend, contrary to what Husserl and modern phenomenologists might pro- pose, that when anyone suggests that aspects of intelligent phenomena are outside the scope and methods of the scientific tradition, this statement can only be verified by using that method and tradition. The scientific method is the only tool we have for explaining in what sense issues may still be outside our current understanding. Every viewpoint, even from the phenomenological tradition, if it is to have any meaning, must relate to our current notions of explanation, even to be coherent about the extent to which that phenomenon cannot be explained. Kuhn, as an example, would see a neutrino as a language construct that can be used by physicists to better explain the tensions between matter and antimatter and why there exists a physical reality at all. In a much simpler sense, an electron or π are also useful mental constructs and are therefore meaningful only as a component of a particular explanation or paradigm. Kuhn does not see successive paradigms as moving toward some absolute TRUTH, as Leibniz or Descartes did, or as the modern French philosopher De Chardin (1955) conjectures. Rather, Kuhn understands revised paradigms as simply creating new viewpoints, incommensurable with their predecessors, that describe the world in new ways. All scientific paradigms, according to Kuhn, should be assumed to be both useful as they currently support science, but possibly false, as ever newer paradigms emerge to supersede them. Richard Rorty is seen by many as the primary spokesperson for the neopragma- tist worldview. In Philosophy as the Mirror of Nature, Rorty (1979) argues that the primary problem with modern epistemology is that the human mind is seen as attempting to accurately represent, or mirror, external reality. Since this reality- world is viewed as independent of the mind, this approach must be seen as mis- guided. As an anti-foundationalist, Rorty argues that there is no given in sensory perception or self-evident premises that can act as a fixed foundation for a modern epistemology. In Contingency, Irony, and Solidarity, Rorty (1989) contends that meaning is a product of socio-linguistic agreement and truth only relates to descriptions of things. Rorty states: Truth cannot be out there—cannot exist independently of the human mind—because sen- tences cannot so exist or be out there. The world is out there but descriptions of the world are not. Only descriptions of the world can be true or false. The world on its own, unaided by the describing activities of humans, cannot. This notion of truth led Rorty to be considered a postmodern and deconstruction- ist philosopher. When the utterances of a language are limited to the describing activities of humans, many of traditional philosophy’s assumptions are undermined. An example of this relativistic viewpoint is Rorty’s statement in Contingency, Irony, and Solidarity (1989) that “anything can be made to look good or bad by being re-described.” Near the end of his life, Rorty added a more human dimension to many of his earlier positions, writing on the importance of a quality of life that is supported by
9.5 Meaning, Truth, and a Foundation for a Modern Epistemology 229 democratic traditions and a liberal worldview. In an essay entitled “The Fire of Life” (2007), Rorty speaks of cultures with richer vocabularies and of being human: “I now wish I had spent somewhat more of my life with verse. … men and women are more fully human when their memories are amply stocked with verses.” Rorty gives an interesting perspective on his neopragmatist worldview in url: 9.3. If there is a philosophical tradition supporting the AI enterprise, just as logical positivism seems to support AI’s tool-making requirements, I would contend that it is neopragmatism. There is truth in the “small,” a representation that captures the important parameters of a particular situation, e.g., the model that supports decision making for the sodium-cooled nuclear reactor seen in Sect. 8.4. There are no abso- lute truths on the agenda. An AI program is “successful” if it performs according to its specifications. There is no requirement that a program must generalize its results, transfer to related situations, or unless required by its specifications, be transparent to its human users. Further, the AI community of program designers and builders relies on the scien- tific method as articulated by Thomas Kuhn (1962). This tradition examines data, constructs models, runs experiments, and evaluates results. Experiments lead to refining models for further experiments. This scientific method has brought an important level of understanding, explanation, and the ability to predict to artificial intelligence as well as to many other human endeavors. 9.5.2 A Category Error An important trope of modern computer science is that things, including important abstractions such as π and the truth values of true and false, belong to different cat- egories. Further examples of different categories include the sets of integers and strings of characters or multidimensional arrays and control instructions. This cat- egory difference is truly a pragmatic distinction because these different “things” cannot be combined: added, subtracted, or integrated without a category change. For example, casting can make a truth value into a 1 or 0 so that it can then be added to an integer. These changes are fundamentally pragmatic: making category changes for some utilitarian purpose. We contend, and acknowledging a critical category distinction, that humans and machines are fundamentally different. Humans and machines are members of inde- pendent categories that are not reducible one to the other. Certainly, humans and machines share properties, as rocks and automobiles can share hardness, or birds and airplanes can both fly. But like the different elements on the periodic table, humans and machines belong in separate irreducible categories, changed only by some pragmatic purpose, for example, to determine their combined weights in kilo- grams. One property that humans and properly programmed computers do share is the possession of skills and responses that the informed observer can call intelligent.
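The casting operation described earlier in this section can be made concrete. The short, hypothetical Python sketch below shows a category error, combining a character string with an integer, and the explicit, pragmatically motivated category changes that make combination possible. (Python happens to already treat truth values as a subclass of integers, so the int() cast on True is shown for emphasis rather than necessity.)

```python
# A category error: a character string and an integer cannot be added directly.
try:
    total = "3" + 1
except TypeError as err:
    print("category error:", err)

# A pragmatic category change (a cast) makes the combination possible.
print(int("3") + 1)    # -> 4
print(int(True) + 1)   # -> 2; the truth value True is recast as the integer 1
```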
230 9 Toward an Active, Pragmatic, Model-Revising Realism To point out the differences that categories entail, consider again the assumptions and suppositions of Sect. 7.4. These assumptions and suppositions support the fact that meaning for humans is achieved through a commitment. We create the real through an existential affirmation that a perceived symbol or model is good enough for addressing some of our practical needs and purposes. Searle (1969) contends as much with his notion of speech phenomena as human acts having intention and purpose. To support the point of different categories, consider the grounding prob- lem for computation. Symbol grounding is an AI challenge we have repeatedly dis- cussed: how, specifically, do symbols and systems of symbols have meaning within a computational environment? Grounding or creating meaning for humans is both individual and societal. As a result, meanings may differ across individuals and across groups in society. These differences are often resolved pragmatically, by discovering which meaning com- mitment leads to more satisfactory results. But the fact of having divergent mean- ings is not the critical issue here: it is the phenomenon of meaning itself. The grounding issue is but one reason why computers have fundamental problems with expressions of intelligence, including their ability to understand human language and to learn. What disposition might a computer be given that supports appropriate and flexible purposes and goals? Although some (Dennett 1991) may impute grounding to a computer solving problems requiring and using “intelligence,” the lack of sufficient grounding is easily seen in the com- puter’s simplifications, brittleness, and often limited appreciation of evolving contexts. The use and grounding of symbols by animate agents implies even more. The particular nature of our human embodiment and social contexts mediate our interac- tions with the world. Our auditory and visual systems are sensitive to a particular bandwidth. We view the world as erect bipeds. We have arms, legs, and hands. We are part of a world with weather, seasons, sun, and darkness. We are individuals that are born, reproduce, and die. We operate within a society that itself has evolving goals and purposes. All these attributes are critical components supporting meta- phors of understanding, learning, and language and they mediate our comprehen- sion of art, life, and love, as we note again in our final section. To conclude, humans and computers simply “live” and “make decisions” in alternative search spaces: the many components that make up complex decisions are, simply put, different. Aristotle himself noted in his Essay on Rational Action, “Why is it that I don’t feel compelled to perform that which is entailed?” For humans, sound reasoning is only one part of mature judgments. We must conclude that there are many human activities that play an essential role in responsible human interactions, behaviors, and judgments; these responsibilities cannot be reproduced by or abrogated to machines.
9.5 Meaning, Truth, and a Foundation for a Modern Epistemology 231 9.5.3 The Cognitive Neurosciences: Insights on Human Processing Section 3.5 introduced many early studies in the cognitive science research domain. Even a weak interpretation of the physical symbol system hypothesis, that represen- tations and search offer a sufficient model for intelligent behavior, has produced many powerful and useful results in cognitive science and psychology. Although much early research was inspired by the physical symbol system hypothesis, vari- ous associative and connectionist representations have proven valuable for compu- tational modeling of human language, perception, and performance. Although current research in psychology and neuroscience offers many possible explanations, or models, for aspects of human processing, there remain many more open and interesting questions. Consider the cortical response system, shaped and conditioned by its social and survival needs. In cortex, for example, the amygdala and limbic systems, connected to every aspect of human perception and understand- ing, are responsible for emotional reactions, survival instincts, and memories. Research in cognitive neuroscience (Gazzaniga 2014) has added considerably to our understanding of the components of cortex involved in intellectual activity. A brief summary of open research issues in the cognitive neurosciences include: 1. In the area of perception, attention, and memory formation, there is the binding problem. Perceptual representations depend on distributed neural codes for relat- ing parts and properties of objects to each other. What mechanisms are needed to “bind” the various components of information related to each perceived object and to distinguish that object from others? 2 . In the area of visual search, what neural mechanisms support the perception of objects embedded in large complex scenes? Experiments show that suppression of information from irrelevant objects plays a role in the selection of a visual focus (Luck 1998). 3 . In considering the plasticity of perception, Gilbert (1998), Maturana and Varela (1987), and others contend that what we see is not strictly a reflection of the physical characteristics of a scene. Rather, perception is highly dependent on the processes by which our brain interprets that scene. 4 . How does the cortical system represent and index time-related sequences of information, including the interpretation of perceptions and the production of motor activity? 5. Finally, in memory studies, stress hormones, released during emotionally charged situations, modulate memory processes (Cahill and McGaugh 1998). This relates to the grounding problem: by what physical processes are thoughts, words, and perceptions, along with their emotional entailments, meaningful to a person? We see human processing also through the writings of the philosopher Immanuel Kant (1781) and the psychologist Fredrick Bartlett (1932). Kant proposed the notion of a priori knowledge, represented as a schema, that mediated new perceptions and
232 9 Toward an Active, Pragmatic, Model-Revising Realism understandings of the world. Bartlett, in his work on human memory, proposed similar ideas. Piaget’s genetic epistemology (1965, 1983), with constructs of assim- ilation and accommodation leading to system equilibration, demonstrated this approach through numerous studies of children moving through the different stages of their development. Modern philosophers and psychologists have augmented the ideas of Bartlett and Piaget with the notion that humans develop through their continuing and purposive exploration of their environment (Glymour 2001; Gopnik et al. 2004; Gopnik 2011a, b). Complementing this viewpoint must be a serious dose of pragmatism: Human actions are about something, and every task has an often-implicit meaning and emotional valence. We have proposed the integration of these philosophical and psychological traditions into a computational modeling medium sufficient to cap- ture important aspects of human problem-solving behavior. 9.5.4 On Being Human: A Modern Epistemic Stance Section 7.4 offered a set of five assumptions and eight follow-on conjectures that offer a foundation for a modern epistemology. This epistemic stance positions a survival-driven human agent in an ever-evolving context. The scope of this context is societal, in that all individuals not only need a society to survive but that we fash- ion our reality together. Acts of mutual interaction create symbols and patterns and networks of symbols that we use to both decode our environments and to thrive within them. Together we are the medium for knowledge, meaning, and truth. We introduced Bayes’ theorem, hidden Markov models, Bayesian networks, and dynamic Bayesian networks to offer sufficient modeling tools for understanding how information can be encoded both in the individual and in society. The different probabilistic techniques and examples presented demonstrate how Bayesian tech- nology can support conditioned responses, perceptual learning, as well as the assim- ilation, integration, and use of knowledge. The information patterns that are learned and reinforced over time in probabilis- tic networks are very much in the empiricist tradition, following the insights of David Hume (1748/1975). The general principles and associations encoded in the network reflect the rationalist tradition (Leibniz 1887). Finally, pragmatic require- ments of survival are reflected in the network’s search for satisfactory conditions of equilibrium or what Piaget calls equilibration. We suggest that sets of integrated dynamic Bayesian networks can be interpreted as sufficient models for aspects of human perception, knowledge, and performance. These networks integrate different perceptual modalities, link perception with amygdala-based emotional responses and the cognitive components of the human system. They also control the focusing mechanisms of the prefrontal cortex. Bayesian-like responses are ubiquitous in humans, for example, in tactile aspects of human perception. How is the human able to withdraw a hand from a hot stove faster than the signal “too hot” can travel from the finger to cortex, generate a “move
The interplay of components of the dynamic Bayesian network actively seeks "missing" information that can lead to system equilibration. We see this in developmental psychology and in Piaget's simple conservation experiments. Finding equilibria also drove the more complex scenario of fault detection, remediation, and counterfactual reasoning seen in the sodium-cooled reactor example of Sect. 8.4. In diagnostic situations, the drive for equilibration within a dynamic Bayesian network supports both the search processes that discover and integrate missing data and the explanations that justify these searches and their results.

AI research into model building, refinement, and revision is still active. Although probabilistic models do, by design, seek equilibria, discovering techniques for identifying and integrating new parameters into a model remains a challenge. Hebbian-type conditioning can strengthen components of a model. Connectionist networks that include reinforcement learning attempt to identify and integrate micro-pieces of a solution that can lead to the discovery of larger solutions. Finding methods for refining and extending models, as humans do when facing model failure, remains an open research issue. Answers may emerge from a better understanding of human agents' skills and their purposive exploration of their environments.

The assumptions and conjectures presented in Sect. 7.4 offer a foundation for a modern epistemology. There are multiple conjectures for how the five assumptions might be physiologically enabled within the human subject. The British neuroscientist Karl Friston's (2009) free energy minimization theory, for example, can be seen as equivalent to our "survival" assumption. Free energy minimization also offers an explanation for how humans integrate a priori expectations with related sensory input, Piaget's accommodation.

Karl Friston (2009) and Geoffrey Hinton (2007) also provide insights on how symbols and patterns of network activations might be integrated into the human processing system. Knill and Pouget (2004) describe the Bayesian brain and the neural coding that supports reasoning under uncertainty. Although these researchers describe possible implementation details for components of a human-centric epistemic stance, they do not extend their ideas to building a foundation for a modern epistemology.

There were three goals for creating this book, each inspired by insights gained from research progress made by the artificial intelligence and cognitive science communities. The first goal was to consider and critique the foundational assumptions of AI technology. The second goal was to suggest several data structures, networks, and search algorithms created by the AI community as sufficient models for capturing important components of human perception, understanding, and problem-focused behavior.

The third and most important goal in writing this book was to propose a foundation for a modern epistemology. This goal was described in Chap. 7, with the presentation of five assumptions and a small set of follow-on conjectures. The five assumptions affirm the survival of the individual, and equally that of society, as the motivation for all behavior.
Survival mediates the creation of symbols and models.
The subsequent conjectures are intended to capture the essence of how we humans survive, understand, and flourish in our world. These symbol systems have meaning because the humans using them hold a common agreement and commitment as to what the symbols mean and how they are to be used. As suggested in Conjectures 3 and 5, individuals and society collaborate in creating and giving meaning to symbols and in adopting systems of symbols to represent the knowledge and science that support humans' common purposes.

Only the extreme solipsist, or the mentally challenged, can deny the reality of an extra-subject world. But what is this so-called "real world?" Besides being a complex combination of hard things and soft things, as Putnam (1987) notes, there are "… tables and chairs and ice cubes. There are also electrons and space time regions and prime numbers and people who are a menace to world peace." We would also add that there are systems of atoms, molecules, quarks, gravity, relativity, indeterminacy, cells, DNA, and perhaps even superstrings. All these explanatory constructs are just exploratory models driven by the pragmatic requirements of equilibration-driven humans. These exploratory models are not just about an "external" world. Rather, they capture the dynamic equilibrating tensions of the intelligent and social agent and of a material intelligence evolving and continually calibrating itself within the continuums of space and time.

The assumptions and conjectures of Sect. 7.4 are but a rationalist approximation to, and a beginning understanding of, ourselves operating in an evolving and survival-driven world. The full dynamic integration of the human experience is found through actively interacting within our environment, creating ourselves through our social interactions, and being there.

We can also discover important expressions of human maturity through our artistic and literary traditions. Perhaps Richard Rorty is right in suggesting that literature is the new epistemology, that meaning is coming to terms with ourselves and society, and that our artists have the important project of helping us appreciate this relationship. In introducing their songs, the epic poets, including Homer, Virgil, Dante, and Milton, all invoked the Muses of history, wisdom, and poetry. Albert Camus (1946), in The Stranger, suggests that "Fiction is the lie through which we tell the truth." Joan Didion (1979), in The White Album, proposes that "We tell ourselves stories in order to live…"

Three examples from my own literary background express this human intellectual and emotional need. First, after the destruction by the Greeks of his much-loved Troy, Virgil's Aeneas arrives at Carthage. On entering Dido's palace, he sees a mural depicting battles of the Trojan War and the deaths of his fellow countrymen. Aeneas is saddened by the scene and says, "… sunt lacrimae rerum et mentem mortalia tangunt," or "… there are tears at the heart of things and mortality moves the soul…" What integration of human vision, memory, understanding, and emotion can enable a simple mural to evoke such a response?

As a second example, consider lines from Shakespeare's Sonnet XVIII:

Shall I compare thee to a summer's day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer's lease hath all too short a date...

These lines capture several complex human emotions. Shakespeare questions comparing his lover to a summer day, and immediately qualifies his comparison by saying she is "more lovely and more temperate." What does this comparison and contrast of a lover to a summer day imply in describing a human relationship? Further, what are the limits of love, emotion, and mortality that permeate the last lines: "Rough winds do shake the darling buds of May, and summer's lease hath all too short a date?"

As a final example of the human condition mediating interpretation, consider Dylan Thomas's plea to his dying father, a verse that speaks to all:

Do not go gentle into that good night.
Rage, rage against the dying of the light.

Meaning is a human-created artifact. It is a derivative of the human agent's need to survive both individually and as a community. And truth is a human-created societal norm. Truth is the alignment of a person's or a group of people's meanings with the goal-related meanings of another individual or of a society. Different societies, indeed, different components of the same society, will have different meanings and truths, as is often seen among religious, political, and cultural groups. Examples of societies' different sets of truths can be seen in science journal articles, in beliefs concerning the roles of men and women, or in the declared positions of a political party. The question of whose truth prevails can cause multiple conflicts that are often reconciled only through the pragmatic outcomes of their use.

Although truth may be relative to particular sets of meanings established by individuals and societies, all relativism is confined within the limits of self, society, and science for the measure of what is real. Individuals assume a responsible assimilation of knowledge and a measured commitment to truths. Societies provide their methods for conditioning individual members, including schools, jails, mental institutions, and all too often, wars. There is always the dangerous possibility that fears, beliefs, and unrealistic hopes can create an unsustainable "reality," such as that found in some commonly accepted cultures, myths, political stances, and religions.

We conjecture that, as an agent's needs and maturity require and even demand, this "relativist" stance will continue to evolve (Piaget 1983; Heisenberg 2000; Hawking and Mlodinow 2010). As suggested in Conjecture 8, the individual, society, and science are continually recreating and recalibrating models as well as the language for expressing what is knowable. This scientific methodology is the best guarantor not just of surviving in our world but also of coming to understand and enjoy it.

Exploration-driven relativism may appear as a threat to many. But the response to the criticism of total relativism is centered in the responsibility humans take, both individually and collectively, when creating their symbols, sets of symbols, beliefs, truths, and judgments. Our description of human maturity is of a person and of a society that is open and humble before a world that is never fully understood.
This person and society are ready to learn, always open to new appreciation of what an evolving reality portends, and, above all, ready to acknowledge both ignorance and error. The person is other-oriented, finding full maturity as a component of their social context. This mature person sees all individuals as seekers similar to themselves and sees in society a medium for finding, expressing, and enjoying a common responsibility.

I contend that, by using the heuristic and pragmatic constraints of humility, self-awareness, and self-preservation to come to know ourselves, science, and society, we can both appreciate and embody the epistemic stance of an active, pragmatic, model-revising realism.

Further Reading

The writings of the philosophers Russell Goodman and Clark Glymour, as well as the insights of the developmental psychologist Alison Gopnik, inspired many aspects of this final chapter. Our developmental psychologists have shown that understanding how humans learn as they mature supports insights into how more mature humans experience, explore, and revise their understandings as they come to appreciate and enjoy their world.

Glymour, C. (2001). The Mind's Arrows: Bayes Nets and Graphical Causal Models in Psychology.
Gopnik et al. (2004). A Theory of Causal Learning in Children: Causal Maps and Bayes Nets.
Gopnik, A. (2011a). A Unified Account of Abstract Structure and Conceptual Change: Probabilistic Models and Early Learning Mechanisms.
Gopnik, A. (2011b). Probabilistic Models as Theories of Children's Minds.

I thank my talented graduate students Drs. Joseph Lewis and Nikita Sakhanenko for designing projects described in this chapter and my friend Professor Lydia Tapia for introducing me to the PRM-RL robot project at Google. Thanks also to Professor Russell Goodman for his comments on this chapter. I recommend Goodman's books:

Goodman (1995). Pragmatism: A Contemporary Reader.
Goodman (2002). Wittgenstein and William James.
Goodman (2015). American Philosophy Before Pragmatism.

We thank Karger Publications, Basel, for permission to use Fig. 9.1. This figure appeared in Luger et al. (2002). Figures 9.2 and 9.3 came from the PhD dissertation in Computer Science at UNM of Dr. Joseph Lewis. Figure 9.4 came from the PhD dissertation in Computer Science at UNM of Dr. Nikita Sakhanenko.
Bibliography

Ackley, D.H. and Ackley, E.S. 2016. The ulam programming language for artificial life. Artificial Life 22: 431-450. Cambridge, MA: The MIT Press.
Adami, C. and Brown, C.T. 1994. Evolutionary learning in the 2D artificial life system "Avida". Adaptation, noise, and self-organizing systems. Report No. MAP-173, Cornell, Cornell University.
Adleman, L.M. 1994. Molecular computation of solutions to combinatorial problems. Science 266(5187): 1021-1024.
Agre, P. and Chapman, D. 1987. Pengi: An implementation of a theory of activity. Proceedings of the sixth national conference on artificial intelligence, pp. 268-272. CA: Morgan Kaufmann.
Anderson, J.R. and Bower, G.H. 1973. Human associative memory. Hillsdale, NJ: Erlbaum.
Arbib, M. 1966. Simple self-reproducing universal automata. Information and Control 9: 177-189.
Arulkumaran, K., Antoine, C., and Togelius, J. 2020. AlphaStar: An evolutionary computation perspective. Proceedings of the genetic and evolutionary computation conference companion.
Austin, J.L. 1962. How to do things with words. Cambridge, MA: Harvard University Press.
Awodey, S. 2010. Category theory, Oxford Logic Guides 49. London: Oxford University Press.
Bacon, F. 1620. Novum organum. Londini: Apud (Bonham Norton and) Joannem Billium.
Baker, S. 2011. Final jeopardy: Man vs. machine and the quest to know everything. Boston, New York: Houghton Mifflin Harcourt.
Balestriero, R. and Baraniuk, R.G. 2018. A spline theory of deep networks. Proceedings of the 35th International Conference on Machine Learning, vol. 80, pp. 383-392.
Barkow, J.H., Cosmides, L., and Tooby, J. 1992. The adapted mind. New York: Oxford University Press.
Bartlett, F. 1932. Remembering. London: Cambridge University Press.
Bayes, T. 1763. Essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London. London: The Royal Society, pp. 370-418.
Ben-Amram, A.M. 2005. The Church-Turing thesis and its look-alikes. SIGART News, 36(3): 113-116.
Bender, E.M. and Koller, A. 2020. Climbing towards NLU: On meaning, form, and understanding in the age of data. Proceedings of the 58th Meeting of the Association for Computational Linguistics. ACL: 5185-5198.
Bengio, Y., Ducharme, R., and Vincent, P. 2003. A neural probabilistic language model. Journal of Machine Learning Research 3, pp. 1137-1155.
Berlin, B. and Kay, P. 1999. Basic color terms: Their universality and evolution, 2nd Ed. Stanford: CSLI Publications.
Bishop, C.M. 2006. Pattern recognition and machine learning. New York: Springer.
Black, M. 1946. Critical thinking. New York: Prentice-Hall.
Blackburn, S. 2008. The Oxford dictionary of philosophy, 15th edn. London: Oxford University Press.
Bledsoe, W.W. and Browning, I. 1959. Pattern recognition and reading by machine. Proceedings of the eastern joint computer conference. New York: IEEE Computer Society.
Boole, G. 1847. The mathematical analysis of logic. Cambridge: MacMillan, Barclay & MacMillan.
Boole, G. 1854. An investigation of the laws of thought. London: Walton & Maberly.
Boden, M. 2006. Mind as machine: A history of cognitive science. Oxford University Press.
Bower, T.G.R. 1977. A primer of infant development. San Francisco: W.H. Freeman.
Brachman, R.J. and Levesque, H.J. 1985. Readings in knowledge representation. Los Altos, CA: Morgan Kaufmann.
Bradshaw, G.L., Langley, P., and Simon, H.A. 1983. Studying scientific discovery by computer simulation. Science, 222, 971-975.
Brooks, R.A. 1986. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation 4: 14-23.
Brooks, R.A. 1991. Intelligence without representation. Proceedings of IJCAI-91, pp. 569-595. San Mateo, CA: Morgan Kaufmann.
Brooks, R.A. 1997. The cog project. Journal of the Robotics Society of Japan, Special Issue (Mini) on Humanoid, Vol. 15(7), T. Matsui (Ed.).
Brown, P. 2011. Color me bitter: Crossmodal compounding in Tzeltal perception words. The Senses and Society, 6(1): 106-116.
Brown, T.B. et al. (31 co-authors). 2020. Language models are few-shot learners. https://arxiv.org/abs/2005.14165.
Bruner, J.S., Goodnow, J., and Austin, G.A. 1956. A study of thinking. New York: Wiley.
Buchanan, B.G. and Shortliffe, E.H. eds. 1984. Rule-based expert systems: The MYCIN experiments of the Stanford heuristic programming project. Reading, MA: Addison-Wesley.
Bundy, A. 1983. Computer modelling of mathematical reasoning. New York: Academic Press.
Bundy, A., Byrd, L., Luger, G., Mellish, C., Milne, R., and Palmer, M. 1979. Solving mechanics problems using meta-level inference. Proceedings of IJCAI-1979, pp. 1017-1027.
Burge, J., Lane, T., Link, H., Qiu, S., and Clark, V.P. 2007. Discrete dynamic Bayesian network analysis of fMRI data. Human Brain Mapping 30(1), pp. 122-137.
Burks, A.W. 1971. Essays on cellular automata. University of Illinois Press, Illinois.
Cahill, L. and McGaugh, J.L. 1998. Modulation of memory storage. In Squire and Kosslyn 1998.
Camus, A. 1946. The stranger. New York: Vantage Books.
Carlson, N.R. 2010. Physiology of behavior, 10th edn. Needham Heights, MA: Allyn and Bacon.
Carnap, R. 1928. Der Logische Aufbau der Welt (The Logical Structure of the World). Leipzig: Felix Meiner Verlag.
Castro, F.M., Marin-Jimenez, M.J., Guil, N., Schmid, C., and Alahari, K. 2018. End-to-end incremental learning. https://arxiv.org/abs/1807.09536.
Ceccato, S. 1961. Linguistic analysis and programming for mechanical translation. New York: Gordon & Breach.
Chakrabarti, C. and Luger, G.F. 2015. Artificial conversations for customer service chatter bots: Architecture, algorithms, and evaluation metrics. Expert Systems with Applications 42(20), 6878-6897.
Chakrabarti, C., Pless, D.J., Rammohan, R., and Luger, G.F. 2005. A first-order stochastic prognostic system for the diagnosis of helicopter rotor systems for the US Navy. In Proceedings of FLAIRS-05. Menlo Park, CA: AAAI Press.
Chakrabarti, C., Pless, D.J., Rammohan, R., and Luger, G.F. 2007. Diagnosis using a first-order stochastic language that learns. Expert Systems with Applications 32(3). Amsterdam: Elsevier Press.
Changizi, M.A., Hsieh, A., Nijhawan, R., Kanai, R., and Shimojo, S. 2008. Perceiving the present and a systematization of illusions. Cognitive Science, 32(3): 459-503.
Chen, L. and Lu, X. 2018. Making deep learning models transparent. Journal of Medical AI, 1:5.
Chomsky, N. 1959. A review of B.F. Skinner's verbal behavior. Language 35(1): 26-58.
Church, A. 1935. Abstract No. 204. Bull. Amer. Math. Soc. 41: 332-333.
Church, A. 1941. The calculi of lambda-conversion. Annals of mathematical studies. Vol. 6. Princeton, NJ: Princeton University Press.
Clark, A. 2013. Whatever next? Predictive brains, situated agents, and the future of cognitive science. The Behavioral and Brain Sciences, 36(3), 181-204.
Clark, A. 2015. Radical predictive processing. The Southern Journal of Philosophy, 53 S1.
Codd, E.F. 1968. Cellular automata. New York: Academic Press.
Codd, E.F. 1992. Private communication to J. R. Koza. In Koza.
Collins, A. and Quillian, M.R. 1969. Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8: 240-247.
Cosmides, L. and Tooby, J. 1992. Cognitive adaptations for social exchange. In Barkow et al.
Cosmides, L. and Tooby, J. 1994. Origins of domain specificity: The evolution of functional organization. In Hirschfeld and Gelman.
Crutchfield, J.P. and Mitchell, M. 1995. The evolution of emergent computation. Working Paper 94-03-012. Santa Fe Institute.
D'Amour, A. et al. (40 co-authors) 2020. Underspecification presents challenges for credibility in modern machine learning. https://arxiv.org/abs/2011.03395.
Darling, M.C., Luger, G.F., Jones, T.B., Denman, M.R., and Groth, K.M. 2018. Intelligent monitoring for nuclear power plant accident management. Int. J. of AI Tools. World Scientific Pub.
Darwin, C. 1859. On the origin of species. New York: P.F. Collier & Son.
Davis, M. (ed.) 1965. The undecidable, basic papers on undecidable propositions, unsolvable problems and computable functions. New York: Raven Press.
Davis, L. 1985. Applying adaptive algorithms to epistatic domains. Proceedings of the International Joint Conference on Artificial Intelligence, 1985: 162-164.
Davis, K.H., Biddulph, R., and Balashek, S. 1952. Automatic recognition of spoken digits. Journal of the Acoustical Society of America, 24(6), 637-642.
Dawkins, R. 1976. The selfish gene. Oxford: Oxford University Press.
De Chardin, P.T. 1955. The phenomenon of man. New York: Harper and Brothers.
De Palma, P. 2010. Syllables and concepts in large vocabulary speech recognition. PhD thesis, University of New Mexico, Department of Linguistics.
De Palma, P., Luger, G.F., Smith, C., and Wooters, C. 2012. Bypassing words in automatic speech recognition. MAICS-2012.
Dechter, R. 1986. Learning while searching in constraint-satisfaction problems. Proc. of the 5th National conference on artificial intelligence. AAAI Press, New York.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., and Harshman, R. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6): 391-407.
Dempster, A.P., Laird, N.M., and Rubin, D.B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39, pp. 1-38.
Dennett, D.C. 1991. Consciousness explained. Boston: Little, Brown.
Dennett, D.C. 1995. Darwin's dangerous idea: Evolution and the meanings of life. New York: Simon & Schuster.
Dennett, D.C. 2006. Sweet dreams: Philosophical obstacles to a science of consciousness. Cambridge: MIT Press.
Derrida, J. 1976. Of grammatology. Baltimore, MD: Johns Hopkins University Press.
Descartes, R. 1637/1969. Discourse on method: Meditations on the first philosophy. New York: Dutton.
Descartes, R. 1680. Six metaphysical meditations, wherein it is proved that there is a God and that man's mind is really distinct from his body. W. Molyneux, translator. London: Printed for B. Tooke.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. https://arxiv.org/abs/1810.04805.
Dewey, J. 1916. Democracy and education. New York: Macmillan.