Figure 10.2 A somewhat more varied transportation graph between river towns.

Let us change our example to see what happens if the heuristic is not always admissible and we still want the shortest path. We will have to take some liberties with the graph to make this happen, yielding the revised graph shown in Figure 10.2. In order to shorten travel times, the cities at node A and node X have commissioned a catapult service to fling travelers across the river extremely quickly. This makes our heuristic inadmissible. The move to catapult service was sparked by heavy rains that forced the cancellation of the ferry service and caused major damage to the road between node B and node C. The roads to and from node X also suffered some minor damage.

Ideally, the code would never need to touch any node that it has already touched, but the combination of an inadmissible heuristic and the desire for the shortest path means we need to handle paths that split and then merge differently than before. If a second path makes it to a merge point—a node we have already touched—with a lower g value than the merge point already has, then the first path through the merge point is not the optimal path, and the merge point goes back onto the open list with new values for g and f. This is the case in Figure 10.2.

Again, we start at C. Our final goal is node W instead of node X. Node X is the merge point. The numbers for the nodes are shown in the figure. Node C goes on the open list first. Being the only node there, its neighbors, B and Z, which are not on any list, go on the open list, and C goes on the closed list. We examine the open list for the most promising node.
Z at 16 beats B at 16.9, so Z is examined next. Z cannot improve C's numbers, so C stays on the closed list. Z puts Y on the open list with B because Y is a new node. We are done with Z, so it joins C on the closed list. Again, we examine the open list for the most promising node. Y at 16 also beats B at 16.9, so we examine Y. Y cannot improve on Z's numbers, so Z stays on the closed list. X is new, so it goes on the open list. Y is done, so Y joins Z and C on the closed list.

From the open list, B at 16.9 beats X at 17, so we examine B next. B cannot improve C's numbers, so C stays on the closed list. A is new, so B adds A to the open list with X. B gets put on the closed list with C, Z, and Y. On the open list, X at 17 beats A at 17.7, so we examine X next. X cannot improve Y's numbers, so Y stays on the closed list. X cannot improve A's numbers, so X does not update A on the open list. W is new, so X puts W on the open list. X joins Y, Z, C, and B on the closed list.

On the open list, A at 17.7 beats W at 18, so A is examined next. Here is where we will encounter the effects of an inadmissible heuristic. The catapult method of crossing the river is far faster than the heuristic expects anyone to ever go. This is where the heuristic gives an inadmissible estimate of the cost. Node A cannot improve B's numbers, so B stays on the closed list. Node A can improve X's numbers, so X is updated and moved from the closed list back to the open list to join W. An A* implementation that has the normal optimization of ignoring nodes on the closed list would not have touched X again. Nor would it have taken the effort to see if it could improve the numbers of any node on the closed list, as we did in the preceding paragraphs. Node A goes onto the closed list with B, C, Y, and Z.

The open list is again checked. X at 16.5 beats the existing W at 18, so X is given another chance. X cannot improve A or Y, so they remain on the closed list. X can improve W on the open list, so W goes from 18 to 17.5. X goes to the closed list with all the other nodes except W. Since the goal node now has the best score on the open list, the search is complete.

This example shows that the world does not end if the heuristic is sometimes inadmissible. It can easily happen that with an inadmissible heuristic, A* will not find the shortest path, regardless of whether it reexamines nodes on the closed list or not (see Exercise 2 at the end of this chapter). This is the penalty for
using an inadmissible heuristic. But if catapults are rare and roads are the norm, we might stay with our inadmissible heuristic. We do this because it is more aggressive about discarding paths early, and that is a performance win. The effect in our example would be that sometimes the pathfinder neglects to exploit catapults when they are off the beaten path. In our example, the catapult is 8 times faster than the heuristic expects. If we tune for catapults instead of roads, we have to expand our searching by a factor of 8 to get to the point where we say, "Even a catapult won't help you when you are this far away," and that could be a lot of useless searching. Since reexamining nodes on the closed list does not guarantee getting the shortest path when the heuristic is inadmissible, and reexamining nodes on the closed list is pointless when the heuristic is admissible, the case for reexamining nodes on the closed list is too weak to justify.

Besides holding the f, g, and h numbers, each node also tracks what node came before it in the path. When our code first reached X via Y, X stored Y. When a node is placed on the open list for the first time or updated with better numbers, the node is also updated with the new predecessor in the path. When our example later improved X via A, it updated the costs and also overwrote the stored Y with A. This way, when A* completes, the nodes have stored the best path.
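A minimal sketch of this bookkeeping follows. The names (PathNode, OpenList, ClosedList, RelaxNeighbor) are illustrative rather than taken from any particular implementation, and the open and closed lists are assumed to be simple List collections.

' A sketch of the per-node bookkeeping described above. New nodes
' start with an infinite g so that any real path improves them.
Public Class PathNode
    Public Name As String
    Public G As Double = Double.PositiveInfinity ' cost from the start
    Public H As Double      ' heuristic estimate to the goal
    Public F As Double      ' G + H
    Public Predecessor As PathNode
End Class

Private OpenList As New List(Of PathNode)
Private ClosedList As New List(Of PathNode)

' Called when the search reaches Neighbor from Current along an
' edge costing EdgeCost.
Private Sub RelaxNeighbor(ByVal Current As PathNode, _
        ByVal Neighbor As PathNode, ByVal EdgeCost As Double)
    Dim NewG As Double = Current.G + EdgeCost
    If NewG < Neighbor.G Then
        ' The path through Current is better: update the numbers
        ' and remember the new predecessor.
        Neighbor.G = NewG
        Neighbor.F = NewG + Neighbor.H
        Neighbor.Predecessor = Current
        ' Reexamining closed nodes: if Neighbor was already
        ' finished, pull it back onto the open list.
        If ClosedList.Contains(Neighbor) Then
            ClosedList.Remove(Neighbor)
        End If
        If Not OpenList.Contains(Neighbor) Then
            OpenList.Add(Neighbor)
        End If
    End If
End Sub

Walking the Predecessor references from the goal back to the start recovers the best path found.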
Caveats

A* is popular because it usually performs well and gives optimal results. Performing well does not make it free, however; A* can be costly to run. When no path exists, A* exhibits its maximum cost, as it is forced to explore every node in the graph. If paths usually exist and obstacles are localized, then dealing with the difference in performance between the usual case and the worst case presents a difficult engineering challenge.

A* also depends heavily on the concept of "here" in the graph. The world may be represented at a low level by a mesh of tiny triangles that can be fed to the graphics engine, but that mesh makes for expensive path finding. A room may have a large number of triangles that make up the floor, but for navigation purposes it can be reduced to a single "here" or a small number of "heres." To help with path finding, many games have a coarse navigation mesh with far fewer nodes than the graphics mesh. The navigation mesh makes the global pathfinder run quickly by greatly reducing the number of nodes. The nodes in the navigation mesh can be forced to have nice properties that make them easier to deal with, such as being convex. Path finding within a node in the navigation mesh is done locally, and simple methods often suffice; if the node is convex, then a straight line is all that is required.

An issue with any pathfinder is that long paths are fragile paths. Bridges get blown up, freeways are subject to being closed by accidents, and fly-over rights can be denied after a flight takes off. Before blindly using A* and expecting all paths to work, you should investigate the methods that are being used to deal with path problems.

Before the novice AI programmer rejoices that path finding is solved, he or she needs to consider the needs of the virtual world. A* will rightly determine that using roads and bridges is better than taking cross-country treks and fording streams. What A* does not do is traffic control. Unexpected traffic jams on popular routes and near choke points are realistic, but so are traffic cops. The game's AI might be smart enough to prevent gridlock, but the planning code still has a problem. The planning code has to factor in the effects of the very traffic the planning code is dispatching. If the planning code places a farm tractor, a truck carrying troops, and a motorcycle dispatch rider on the same narrow path in that order, the whole convoy is slowed to the speed of the tractor. A* told the planner that the dispatch rider will get to its destination in far less time than will actually happen. A* told the planner that the troops will get there on time as well. Even worse is the case when opposing traffic is dispatched to both sides of a one-lane bridge. A* by itself may not be enough.

Machine Learning

Machine learning has been "the next big thing" in game AI for many years. The methods have been described as elegant algorithms searching for a problem to solve. The algorithms have been put to good use in other areas of computer science, but they have gained very little traction in game AI. People have been looking for ways of getting around the limits of machine learning in games for some time now [Kirby04]. Realizing the potential of machine learning is difficult.

There are many kinds of machine learning. Two of them are popular discussion topics: neural networks and genetic algorithms. Neural networks appear to be very enticing, particularly to novice game AI programmers. They have been a popular topic for over a decade in the AI roundtables at the Game Developers Conference. The topic is
usually raised by those with the least industry experience and quickly put to rest by those with many years of experience. The conventional wisdom is that neural networks are not used in commercial games. By contrast, genetic algorithms have gained a very modest toe-hold in game AI, but not in the manner that you might first expect, as we shall see. A major goal of this section is to give beginning AI programmers sufficient knowledge and willpower to enable them to resist using either method lightly.

Before considering the different machine-learning algorithms, it is worthwhile to consider how they will be used. The first big question to ask of any proposed game feature is, "Does it make the game more fun?" If cool techniques do not translate into improved player experience, they do not belong in the game.

The next big question to answer is, "Will it learn after it leaves the studio?" For most machine learning used in games, the answer is a resounding "No!" Quality-assurance organizations do not want support calls demanding to know why the AI suddenly started cheating or why it suddenly went stupid. In the first case, the AI learned how to defeat the player easily; in the second case, the AI learned something terribly wrong. Most ordinary car drivers are happy with the factory tuning of their real-world automobiles, and most players are fine with carefully tuned AI.

That being said, the answer is not always "No"; Black & White is an early example of a game that turned machine learning in the field into a gameplay virtue. Learning algorithms are susceptible to learning the wrong thing from outlier examples, and people are prone to the same mistake. Black & White learned in the field, and learning supported a core gameplay mechanic. One of the main features of the game was teaching your creature. If teaching is part of the gameplay, then learning is obviously an appropriate algorithm to support the gameplay. How many games can be classified as teaching games? Most games do not employ teaching in their gameplay, and most of those same games have no need of learning algorithms. There may be other gameplay mechanics besides teaching that are a good fit for machine learning in the field, but it will take an innovative game designer backed by a good development team to prove it.

The final question always to ask is, "Is this the right tool for this job?" For neural networks in game AI, the answer is usually an emphatic "No." In some cases involving genetic algorithms, the answer will be a resounding "Yes." One of the clarifying questions to ask is, "Is it better to teach this than it is to program it?"
Of course, "better" can have many different interpretations: easier to program, produces a better result, can be done faster, etc.

Training

The trick with teaching—or training, as it often is called—is to make sure that the right things are taught. This is not always a given. A system trained to distinguish men from women at first had trouble with long-haired male rock stars. Another image-analysis system learned to distinguish the photographic qualities of the training data instead of differences in the subjects of the photographs. The system is only as good as the training data; it will faithfully reflect any biases in the training data. Lionhead Studios improved the user interface to the learning system in Black & White II to make it explicitly clear to players exactly what feedback they were about to give their creature, solving a major source of frustration with the original game.

The most straightforward way to ensure that learning has taken place is to administer a test with problems that the student has never seen before. This is called an independent test set. A system that is tested using the same data that trained it is subject to memorizing the problems instead of working out the solutions. Just as other software is tested, learning systems need test data that covers a broad range of the typical cases, as well as the borderlines and outliers. The system is proven to work only in the areas that the test set covers. If the system will never encounter a situation outside of the training set, the demand for an independent test set is relaxed. (A sketch of holding out an independent test set appears below.)

Hidden in this coverage of training is a fact that needs to be made explicit: We are talking about two different kinds of training. The two methods are supervised training and reinforcement learning. In supervised training, the learning algorithm is presented with an input situation and the desired outputs. The image-analysis system designed to identify men versus women mentioned earlier was taught this way. The system trained against a set of pictures of people, where each picture was already marked as male or female. Supervised training happens before the system is used and not when the system is in use. If there is a knowledgeable supervisor available at run-time, there is no need for the learning system to begin with. The success of supervised training in learning algorithms outside of the game industry has not been echoed within the game industry. Unless otherwise noted, supervised learning is the default when discussing learning algorithms.
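The independent test set mentioned above can be as simple as data the trainer never shows the system. This is a minimal sketch under an assumed, hypothetical Example type that pairs inputs with the desired output:

' Hold out an independent test set. Example is an illustrative
' type; real training records will carry whatever the network needs.
Public Class Example
    Public Inputs As Double()
    Public DesiredOutput As Integer
End Class

Public Sub SplitData(ByVal AllData As List(Of Example), _
        ByVal TrainingSet As List(Of Example), _
        ByVal TestSet As List(Of Example))
    Static Rng As New Random()
    For Each Item As Example In AllData
        ' Hold out roughly 20 percent; the system never trains on
        ' these, so testing against them measures real learning.
        If Rng.NextDouble() < 0.2 Then
            TestSet.Add(Item)
        Else
            TrainingSet.Add(Item)
        End If
    Next
End Sub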
Reinforcement learning, by contrast, emphasizes learning by interacting with the environment. The system starts with some basic knowledge and the need to maximize a reward of some kind. In operation, the system must balance between exploitation and exploration. At any given point in time, the system might pick the action it thinks is best, exploiting the knowledge it already has. Or it can pick a different action to better explore the possibilities. Reinforcement learning uses exploration to improve future decision making by making changes to the basic knowledge the system starts with. A good introduction to reinforcement learning is given in the first chapter of [Sutton98], which is available in its entirety online. Advanced readers will want to study the method of temporal differences (TD); TD(λ) is of particular interest.

Reinforcement learning has produced a few notable success stories; in commercial video games, Black & White used reinforcement learning. TD-Gammon began as a research project at IBM's Watson Research Center using TD(λ). It plays backgammon at slightly below the skill level of the top human players, and some of its plays have proven superior to the prior conventional wisdom [Tesauro95]. Reinforcement learning creeps into games in small ways as well [Dill10].
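One common way to strike the exploitation-versus-exploration balance described above is an epsilon-greedy policy: most of the time take the best-known action, and occasionally try something else. This is a minimal sketch, not taken from any particular game; the names are illustrative.

' Epsilon-greedy action selection. EstimatedReward holds the
' system's current learned estimate of each action's value.
Private Function ChooseAction(ByVal EstimatedReward As Double(), _
        ByVal Epsilon As Double) As Integer
    Static Rng As New Random()
    If Rng.NextDouble() < Epsilon Then
        ' Explore: try a random action to improve the estimates.
        Return Rng.Next(EstimatedReward.Length)
    End If
    ' Exploit: take the action currently believed to be best.
    Dim Best As Integer = 0
    For i As Integer = 1 To EstimatedReward.Length - 1
        If EstimatedReward(i) > EstimatedReward(Best) Then Best = i
    Next
    Return Best
End Function

An Epsilon of 0.1 explores one decision in ten; tuning it trades learning speed against how often the system deliberately does something it believes is suboptimal.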
Why Don't These Methods Get Used in Games?

Many experienced game AI programmers attribute characteristics of undead vampires to neural networks and genetic algorithms; they are enticing, particularly to the naïve. They keep returning, at least as a topic of conversation, despite heroic efforts to lay them to rest. They cannot be trusted to do what you want them to do, and it takes arcane knowledge and skills to get them to do anything at all. At the end of the project, there is great fear that they will turn to dust should the game ever see the light of day.

All kidding aside, what are the real reasons? There are three basic concerns: control over the AI, the lack of a past history of achieving a successful AI with these methods in games, and control over the project.

Many game designers demand arbitrary control over what the game AI does. The smarter the AI gets, the more it wants to think for itself. In and of itself, that is not a problem; the problem comes when a designer wants to override the AI, usually for dramatic impact or playability reasons. As we shall see, neural networks effectively are a black box. They might as well carry the standard warning label, "No user-serviceable parts inside." You can train them (repeatedly if necessary), you can use them, and you can throw them away. For all of the other methods that we have covered in prior chapters, there are places where the AI programmer can intervene. Places where numbers can be altered or scripts can be injected abound in those methods, but not inside these black boxes. They have to be trained to do everything expected of them, and that is often no small task.

Supervised training is notorious for its need of computational resources. In addition, every tweak to the desired outputs is essentially a "do-over" in terms of training. The combination of these two issues is a rather non-agile development cycle. My non-game industry experience with supervised training could be summarized as, "Push the Start Learning button on Monday; come back on Friday to test the results." When a game designer says, "Make it do this!" the AI programmer starts the entire learning process over from scratch. Worse, there is no guarantee that the system will learn the desired behavior. Nor is there any guarantee that old behaviors will not degrade when the system is taught new ones. Effectively training a learning system is a black art, and expertise comes mainly through experience. Getting the desired outputs is not always intuitive or obvious. This lack of certainty makes the methods a technical risk.

Just as building a learning system is an acquired skill, so is testing one. The learning systems I built were never released for general availability until after they had successfully passed a field trial in the real world. Learning systems in games need their own particular test regimes [Barnes02]. The task is not insurmountable, but it does create project schedule risks—the test team has to learn how to do something new.

The issue of completeness also crops up. A neural network operates as an integrated whole. All of the inputs and desired outputs need to be present to conduct the final training. So the AI cannot be incrementally developed in parallel with everything else. Prior versions may not work in the final game. From a project-management perspective, this is usually unacceptable; it pushes a risky technology out to the end of a project, when there is no time left to develop an alternative if learning does not work out. Good project management calls for mitigating risks as early as possible.

Neural Networks

Neural networks loosely mimic the structure of the human brain. As shown in Figure 10.3, a typical network has three layers: the input layer, a hidden layer, and
an output layer. The hidden layer helps provide a way to create associations among multiple inputs and multiple outputs.

Figure 10.3 A neural network for determining monster state.

Key to exploiting neural networks is knowing that they are classifiers and that they need the right inputs.

Classifiers

Neural networks are classifiers. They take inputs and render an opinion on what they are. As a learning algorithm, they infer relationships instead of requiring the relationships to be explicitly programmed in. This is critical to understanding the utility of the method. When you know the relationships among the inputs, you can get exactly what you want by directly programming them. When you do not, or when they tend to defy an exact definition, it may be easier and more accurate for the network to infer, or learn, the relationship.

At this point, we know enough to ask the final question we should always ask, "Is this the right tool for the job?" If the job of our AI is to classify the situation before it, we might still be okay. However, the job of most game AI is to answer the question, "What should I do?" It is not always obvious that the answer to "What is the situation?" that we get from a classifier will directly yield the answer to "What should I do?" that the AI needs to provide.

The neural network shown in Figure 10.3 could be used to determine the best next state in the FSM from Chapter 3, "Finite State Machines (FSMs)." The network classifies the world by indicating the next state that is the best fit to the conditions at hand. Using a neural network for this simple example is total overkill, but
recall our discussion of complexity in an FSM. The complexity of the transition rules can be the downfall of an FSM that has a manageable number of states. Using a neural network frees programmers from the task of programming the transition rules, provided they can teach the network instead.

An advantage of using a neural net to classify is that the outputs are not fixed to binary values. In our existing monster FSM AI, when the monster is attacking or fleeing, it has no middle ground in which it is considering both. A neural network can be trained to consider two or more different alternatives. If the training data does not have a sharp cutoff between attacking and fleeing, the neural network will reflect that for those inputs that had more than one valid output. The training data would indicate that sometimes the monster flees players when hit points are at a given level, and sometimes it continues an attack. The FSM code that disambiguates multiple possible next states can weigh the options presented by the outputs. If the highest output is always used, the behavior will be the same as before. If a weighted random selection is used, the behavior will more closely match the proportions in the training data.
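A minimal sketch of that weighted random selection follows. It assumes the network produces one non-negative output per candidate next state; the names are illustrative.

' Pick an output index with probability proportional to its
' activation, instead of always taking the highest output.
Private Function WeightedPick(ByVal Outputs As Double()) As Integer
    Static Rng As New Random()
    Dim Total As Double = 0
    For Each Value As Double In Outputs
        Total += Value
    Next
    ' Spin a roulette wheel whose slots are sized by the outputs.
    Dim Spin As Double = Rng.NextDouble() * Total
    For i As Integer = 0 To Outputs.Length - 1
        Spin -= Outputs(i)
        If Spin <= 0 Then Return i
    Next
    Return Outputs.Length - 1  ' Guards against rounding error.
End Function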
Using a different example, the output of a neural network trained to recognize letters might show strong outputs for Q, G, O, and C when presented with a Q to recognize. If the network did not have a strong preference, the next character might be used to assist in the decision. In English, Q is almost always followed by U or u. Having weighted outputs instead of binary outputs coming from the neural network allows the overall system greater leeway in making decisions. This is true of other methods as well.

Classifiers such as recognition systems are oracles. When asked a question—any question—they give an answer. On the surface this capability seems impressive, but it is also true of the Magic 8-Ball toy. If you feed text that uses the Greek or Cyrillic alphabet to a recognizer trained on the Roman characters present in English, the recognizer will output what it thinks are English characters. It would make the same attempt when fed Hebrew or Hiragana characters or even Kanji symbols and random applications of the spray-paint tool on a bitmap.

One of the more important design decisions to make with such a system is what to do when presented with inputs that have no particular correct response. Such inputs are commonly called garbage, but they include ambiguous input as well. There are two ways to approach this problem. The simplest is to demand, "Give me an answer, even if it's wrong." In this case, the system is trained with the broadest array of data available and sent into battle. The alternative is to design a system that has an "I don't think so" output. In these systems, the training data includes properly marked garbage. Good garbage for training can be hard to find, and tuning systems that have a garbage output can be a black art.

The question for the designer is to ponder the difference between an AI that is resolute, if occasionally stupid, compared with one that sometimes does not know what to do. If the designer says, "All the choices are always good, just sometimes some are better than others," then the resolute system is fine. On the other hand, if there is a good default behavior available to the AI, especially when some of the outputs could be badly inappropriate, then it may be worth the effort to implement a garbage output. Whether it is programmed to recognize garbage or not, a classifier should always be tested with data that has no right answer.

Input Features

Getting the inputs right can make or break a neural network. The inputs are collectively known as the features presented to the network. We could have preprocessed two of the inputs, hit points and max hit points, and used the ratio as a single input instead. In our particular case, a properly engineered network will infer the relationship, but this is not always a given.

Consider a monster that is in combat with a powerful foe. The monster is losing hit points in large chunks. The monster will go from the attacking state to dead without ever having a chance to flee. Our current set of features is not sufficient. If we add the rate of change of hit points as an input feature, our monster can act more intelligently. It can know that it can stay in combat longer against a weak foe, even if the monster is hurting. And it will know to flee early if it is being hammered by a potent foe. Getting the features right is critical if the network needs to deal with time-varying data.

The rate of change in a value (the first derivative, in mathematical terms) can be very useful, as we saw in the monster-hit-points example. The rate of change of a first derivative (the second derivative of the value) can be useful as well. When the steering wheel of a car is turned, the car begins to deviate from a straight course and to rotate. The change rate of the vehicle's heading (the first derivative) gives a sense of how fast the vehicle is getting around the corner. The second derivative, combined with other data (notably the position of the steering wheel and its first derivative), tells the driver if he or she has exceeded the available traction and is beginning to spin out. The increasing rate of heading change, easily noticed when the second derivative is positive, means that the car is
not only cornering, but spinning as well. Stable cornering has a constant rate of change in heading and therefore a zero second derivative (the rate of change of a constant is zero). The amount of counter-steering required to catch the spin depends on how bad the spin is; the second derivative is a very useful number.

Going the other way and summing data over time may also have merit. If the monster is tracking the cumulative number of points of damage it is dealing out to a foe, the monster may have a good idea of how soon it will vanquish the foe without resorting to cheating and looking at the foe's actual hit points. The monster is guessing at the number of hit points the foe started with, but it will know the cumulative number of hit points that the foe has lost. The cumulative sum is the first integral of damage delivered. The monster may decide to stay in a difficult combat because it thinks it can finish off a potent foe before the foe can return the favor. A single value for damage delivered does not tell enough of the story to be useful.

Time-varying data typically needs special handling to create effective features to input to a neural network. As mentioned earlier, using a neural network in place of our familiar monster FSM is total overkill. It is used as a familiar example only. There are better ways than neural networks to do everything we have illustrated so far.
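The feature computations themselves are straightforward; it is choosing them that takes care. A sketch for the monster example, with illustrative names, where DeltaTime is the seconds elapsed since the AI last thought:

' Turning raw, time-varying data into features for the network.
Public Class MonsterFeatures
    Private LastHitPoints As Double
    Public CumulativeDamageDealt As Double

    ' A preprocessed ratio in place of two raw inputs.
    Public Function HitPointRatio(ByVal HitPoints As Double, _
            ByVal MaxHitPoints As Double) As Double
        Return HitPoints / MaxHitPoints
    End Function

    ' First derivative: how fast are we losing hit points?
    Public Function HitPointRate(ByVal HitPoints As Double, _
            ByVal DeltaTime As Double) As Double
        Dim Rate As Double = (HitPoints - LastHitPoints) / DeltaTime
        LastHitPoints = HitPoints
        Return Rate
    End Function

    ' First integral: total damage dealt to the foe so far.
    Public Sub AccumulateDamage(ByVal Damage As Double)
        CumulativeDamageDealt += Damage
    End Sub
End Class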
Genetic Algorithms

Genetic algorithms are rarely if ever used for game AI. Instead, they are sometimes used for tuning variables in games. Motorsports provide us with a good example [Biasillo02]. A mechanic setting up a car manages many tunable parameters to obtain optimum performance on any particular track. There are the camber, caster, and toe angles to be adjusted when doing a basic wheel alignment. There is the braking proportion between front and rear tires—a setting more easily understood if you consider the rear braking needs of an empty pickup truck compared to a fully loaded one. An empty pickup truck needs minimal rear braking force because the lightly loaded rear tires provide minimal traction. A heavily loaded truck benefits from having a great deal of rear braking force to exploit the greater traction available. This is only a partial list of tuning parameters; as you might expect, there can be interactions between variables. Genetic algorithms can help us in our search for the optimal settings.

Genetic algorithms roughly mimic biological evolution. We start with a population that has different characteristics. Some are more fit than others. Cross-breeding and mutation yield new members with different characteristics. Some of these will hopefully be more fit than any of their parents. The best are emphasized for further improvement, possibly along with a small number of less-fit individuals in order to keep the diversity of the population high. The best need not be restricted to children of the current generation; it could include members from older generations as well. We continue until we have a good reason to stop. Let us expand on the concepts of this paragraph in detail.

In programming terms, our variables to tune are the characteristics of our individuals. Each of the angles used in wheel alignment is a characteristic. The braking proportion is another. For production vehicles, you might think that these values have already been optimized by the factory, but the factory settings may degrade lap times to decrease tire wear or increase stability. Even relatively uncompromising road-legal sports cars such as the Lotus Elise are specified for road use and need to be tuned for optimal track handling; specialized vehicles also offer room for improvement on the track.

Selection requires a way to tell which individual is more fit than another. At first blush, we can pick lap time as the fitness function. Why would lap time not be the best fitness function? Most races run more than one lap. One of the compromises in street specifications for wheel alignment is a trade-off between handling and tire wear. Do racers care about tire wear? Racers care about the lowest total time to complete the race (or the farthest distance traveled in a given time period in endurance racing). There is more to it than stringing together a sequence of fast laps. Time spent changing tires might or might not have an impact. Autocross racers have trouble keeping their tires warm between runs and might not wear out a fresh set of tires in an entire day's competition. Things are very different in NASCAR, especially at tire-hungry tracks such as Talladega. The point is that great care must be taken in selecting a fitness function for a genetic algorithm, or your system will tune for the wrong situation. In motorsports games, as in real life, vehicles are often tuned to individual tracks.

Hidden in the selection criteria is the cost of the fitness function. In our motorsports case, we have to simulate a lap or possibly an entire race with each member of the new population that we want to evaluate. In our case, not only do we have to run a simulated race, but we need a good driver AI. We are going to optimize the car not only for a particular track but also for a particular driver. The cost to mix and match characteristics is typically quite modest when compared to the cost to evaluate them. Not only do we have to select for the right thing, but the very act of measuring for selection must have a reasonable cost.
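In code, the individual and the fitness function for our example might look like the following sketch. The names are illustrative, and SimulateRace is a hypothetical stand-in for whatever lap or race simulation the game can afford to run.

' The individual: one complete car setup.
Public Class CarSetup
    Public CamberAngle As Double
    Public CasterAngle As Double
    Public ToeAngle As Double
    Public FrontBrakeProportion As Double
End Class

' The fitness function is the expensive part: it runs a simulated
' race with a given setup, track, and driver AI. Lower total race
' time means a fitter individual, so we negate it.
Public Function Fitness(ByVal Setup As CarSetup) As Double
    Dim TotalRaceTime As Double = SimulateRace(Setup)
    Return -TotalRaceTime
End Function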
Crossbreeding in a genetic algorithm involves how we select which parent provides each characteristic of the child. There are a number of ways to make the selection, but it is worth emphasizing that the process is selection, not averaging. The child car will not have the toe angle set to the average of the two parents; it will get the exact toe angle value from one of them.

Mutation calls for changing a value in a child randomly to something other than the two values present in the parents. At first glance, it may seem that if the starting population is diverse enough, there is no need for mutation. However, the selection process will tend to suppress diversity as it emphasizes fit individuals who are likely to resemble each other. Diversity is required to keep the algorithm from falsely converging early on a sub-optimal solution.

If we are to keep our population from growing without bound, we need to figure out which individuals are not going to make it to the next run. An obvious method is to replace the worst individuals. Another method would be to replace the members who score closest to the new individuals. Doing selection this way also helps keep a diverse population.

There are many reasons for stopping the algorithm. If the new generations have stopped improving, then perhaps the optimal population has been found, and there is no reason to continue. If the latest generation has individuals that are "good enough," then it may be time to go on to other things. If each generation takes a large amount of computation, time will be a limiting factor. And finally, if the original population contains individuals thought to be optimal, a few generations may be all that is needed to confirm their status.

Genetic algorithms are good at getting close to an optimal solution, but they can be very slow to converge on the best solution. If the problem can be expressed in terms of genes and fitness, the algorithm gives us a way to search arbitrarily complex spaces. It does not need heuristics to guide it or any expertise in the area searched. Our search speed will depend heavily on how expensive our fitness function is to run.
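A sketch of crossover and mutation for the CarSetup above makes the selection-not-averaging point concrete. RandomToeAngle is a hypothetical helper that returns a fresh legal value, and the 5 percent mutation rate is only a plausible starting point.

' Crossover selects each characteristic whole from one parent or
' the other; it never averages them.
Public Function Crossover(ByVal Mom As CarSetup, _
        ByVal Dad As CarSetup) As CarSetup
    Static Rng As New Random()
    Dim Child As New CarSetup()
    Child.CamberAngle = If(Rng.Next(2) = 0, Mom.CamberAngle, Dad.CamberAngle)
    Child.CasterAngle = If(Rng.Next(2) = 0, Mom.CasterAngle, Dad.CasterAngle)
    Child.ToeAngle = If(Rng.Next(2) = 0, Mom.ToeAngle, Dad.ToeAngle)
    Child.FrontBrakeProportion = _
        If(Rng.Next(2) = 0, Mom.FrontBrakeProportion, Dad.FrontBrakeProportion)
    ' Occasionally mutate a value to keep the population diverse.
    If Rng.NextDouble() < 0.05 Then
        Child.ToeAngle = RandomToeAngle(Rng)
    End If
    Return Child
End Function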
Behavior Trees

Behavior trees made it onto the industry's radar in a big way with Damian Isla's paper at the 2005 Game Developers Conference [Isla05]. The paper was about managing complexity in the AI. Behavior trees bring welcome relief to that problem area without introducing any new major problems of their own.

One of the drawbacks of an FSM is that the number of possible transitions between states grows with the square of the number of states. An FSM with a large-but-manageable number of states may find itself with an unmanageable number of transitions. In Chapter 3, we brushed on hierarchical state machines as a way to control state explosion, with the idea that two-level machines often suffice. If we emphasize the hierarchy more than the state machines, we wind up with something like behavior trees. Behavior trees attempt to capture the clarity of FSMs while reducing their complexity. Behavior trees have been used effectively in some very popular games. Figure 10.4 shows a subset of the behavior tree for Halo 2, and Figure 10.5 shows a subset of the behavior tree used in Spore.

Figure 10.4 A subset of the behavior tree used in Halo 2. (Image courtesy of Damian Isla.)

Figure 10.5 A subset of the behavior tree used in Spore. (Image courtesy of Chris Hecker.)

There are a few important observations to make about the trees. The first observation is that the number of levels in the tree is arbitrary. The second
observation is that the tree is not a proper tree in that children can have more than one parent. The careful eye will note that Grenade appears in many places in the Halo 2 tree. REST and FIGHT appear more than once in the Spore tree. These are not different behaviors with the same name; they are the same behaviors with multiple parents in the tree. The third and less obvious observation is that there are no loops in the trees. A final observation is that these diagrams are far easier to understand than an equivalent single-level FSM diagram would be. Behavior trees attempt to keep the best parts of an FSM while managing their drawbacks. We will look at how behavior trees work by looking at the two ways we can evaluate them: top-down and bottom-up.

Top-Down Evaluation

So how do behavior trees work? The Spore behavior tree in Figure 10.5 is marked with asterisks to denote the current state of the AI, which is EAT_FOOD. Each time the AI is asked to think, it starts at the top of the tree and checks that the current decision is still a good one. A common arrangement for the items at any level is a prioritized list. Note that IDLE is listed last on both diagrams. All the ways of disambiguating states can come into play here. All the states have a decider that indicates if it wants to activate and possibly how strongly it wants to activate. So in the Spore case, FLEE, GUARD, and FIGHT can interrupt EAT. If any of them do want to interrupt, the machine has EAT_FOOD terminate, then EAT is asked to terminate, and then the new sequence is started. With no interruptions at the highest level, EAT continues to run. FIND_FOOD has
priority over EAT_FOOD, so it can interrupt EAT_FOOD if it wants in the next level down.

Actual implementations of behavior trees include some subtleties. A number of good ideas from other AI methods find a home in behavior trees as well. The current behavior gets a boost in priority when it is rechecked against alternatives. This helps stop the AI from equivocating between two actions. In actual use, the trees need some help to better deal with events in the world. The sound of a gunshot is an event; it is highly unlikely that the AI will ever "hear" a gunshot by examining the current world state. Instead, the event has to be handled immediately or remembered until the next time the AI thinks, and the behavior tree needs hooks to make that memory happen.

An important subtlety more particular to top-down evaluation in behavior trees is that a high-level behavior cannot decide that it wants to activate unless it knows that at least one of its children will also want to activate. In real life, it is wasted effort to drive into the parking lot of a restaurant if no one in the car wants to eat there. This requirement needs to hold true all the way down the tree to the level of at least one usable behavior. The lowest-level items in the tree that actually do something are the behaviors, and all the things above them are deciders. Deciders need to be fast because they are checked often. The Spore diagram only shows deciders.

Most of the nodes in the tree we have covered so far do selection; they pick a single child to run. Nodes can also do sequencing, where they run each child in turn.
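A minimal sketch of those two kinds of nodes follows. The class and method names are illustrative; here, Tick returns True when a node succeeds.

' The base type every node in the tree shares.
Public MustInherit Class BTNode
    Public MustOverride Function Tick() As Boolean
End Class

' Selection: try children in priority order and succeed on the
' first child that succeeds.
Public Class Selector
    Inherits BTNode
    Public Children As New List(Of BTNode)
    Public Overrides Function Tick() As Boolean
        For Each Child As BTNode In Children
            If Child.Tick() Then Return True
        Next
        Return False
    End Function
End Class

' Sequencing: run each child in turn and fail on the first child
' that fails.
Public Class Sequence
    Inherits BTNode
    Public Children As New List(Of BTNode)
    Public Overrides Function Tick() As Boolean
        For Each Child As BTNode In Children
            If Not Child.Tick() Then Return False
        Next
        Return True
    End Function
End Class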
Bottom-Up Evaluation

If there is sufficient CPU available, we can switch from the top-down evaluation described to a bottom-up evaluation. The top-down method requires the highest-level deciders to accurately predict whether any of their descendants wants to activate in the current situation. This capability is critical. Given that assurance, the tree evaluates very quickly, and top-down evaluation gives better performance. Bottom-up evaluation starts by having the lowest-level leaves of the tree, the behaviors, comment on the current situation and feed their opinions up the tree to the deciders. The deciders no longer have to predict whether one of their descendant behaviors wants to activate; instead, they are armed with sure knowledge that includes how much each descendant wants to activate. The deciders are then left with the task of prioritizing the available options. The results are fed up the tree to the highest level for final selection.

What benefit does bottom-up evaluation provide that is worth the performance hit? Bottom-up offers the possibility of better decision making. In particular, nuanced and subtle differences in behaviors can be more easily exhibited in a bottom-up system. The highest-level deciders in a top-down system need to be simple in order to run quickly and to keep complexities from the lowest levels from creeping up throughout the tree. The high-level deciders in top-down make broad-brush, sweeping, categorical decisions. The tree in Figure 10.5 decides between flee, guard, fight, eat, or idle at the highest level before it decides how it will implement the selected category of action. It decides what to do before it works out how to do it. With a bottom-up system, we can let the question of how we do something drive the decision whether to do it.

Let us consider an example of how this might work. In a fantasy setting, an AI character is being driven by the decision tree shown in Figure 10.6. Decider nodes have italic text, and behavior nodes have bold text. Extraneous nodes have been left out for clarity.

Figure 10.6 A partial decision tree used by a fantasy character.

A top-down evaluation decides to evade at the highest level. The next level down, it has to decide between running away and hiding. It decides to hide in a nearby building, simplified as the Hide node in Figure 10.6. The next level down, it decides between picking the lock on the door, using a spell to unlock the door, or breaking the door down. If the character has no picks and no spells, the only option is to break the door down. Unfortunately, breaking the door down will be obvious to any pursuers. It would be better to keep running away in this case, but the decision between running and hiding has already been made at a higher level, and hiding is usually better than running. Hiding would have been the best choice if the character had picks or a spell.

A bottom-up evaluation takes care of this problem. Both the Pick Lock behavior and the Cast Unlock Spell behavior return a very strong desire to activate if the
character is properly equipped and no desire at all to activate otherwise. The Break Down Door behavior always returns a weak desire to activate. The Run Away behavior returns a medium desire to activate. The Hide node will pick the best of the three methods for opening the door and offer it to the Evade node in the next level up in the tree. The selected hiding result will be compared to running away. If the character is properly equipped, hiding will be preferred to running away. If the character lacks picks or spells, the character runs away.

Looking at all the leaves this way may seem vaguely familiar. Bottom-up evaluation in a behavior tree is analogous to a rule-based system with an explicit hierarchical disambiguation system. The argument for or against bottom-up evaluation centers on whether or not nuances are lost by making high-level decisions first. If there are no nuances to lose, top-down evaluation is simple and fast. If low-level complexities keep creeping into the high-level deciders, the system is losing the simplicity and possibly the speed that top-down systems are supposed to deliver. In such a case, the system can be simplified by pushing the detail back down to the leaves where it belongs and evaluating from the bottom up.
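The door example reduces to a few lines of code. This is a sketch with made-up desire values; the text describes only "strong," "weak," and "medium" desires, so the numbers here are illustrative.

' Bottom-up evaluation of the Hide subtree: each behavior reports
' a desire, and the decider offers the best one upward.
Public Function HideDesire(ByVal HasPicks As Boolean, _
        ByVal HasUnlockSpell As Boolean) As Double
    Dim PickLock As Double = If(HasPicks, 0.9, 0.0)
    Dim CastUnlockSpell As Double = If(HasUnlockSpell, 0.9, 0.0)
    Dim BreakDownDoor As Double = 0.2   ' always a weak desire
    Return Math.Max(PickLock, Math.Max(CastUnlockSpell, BreakDownDoor))
End Function

' The Evade decider compares hiding against running away.
Public Function EvadeChoice(ByVal HasPicks As Boolean, _
        ByVal HasUnlockSpell As Boolean) As String
    Dim RunAway As Double = 0.5         ' a medium desire
    If HideDesire(HasPicks, HasUnlockSpell) > RunAway Then
        Return "Hide"
    End If
    Return "Run Away"
End Function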
Advantages

Behavior trees provide two major advantages: They are easy to understand, and they allow for sophisticated behavior selection. The diagrams shown are easy for AI programmers to understand, and they are easy for experienced game designers to understand. This point is critical; if the vision of the designer is to make it into the AI of the game, the designer needs to understand what the AI will do. Besides being easy for a designer to understand, the diagrams are straightforward for an AI programmer to implement. The method is also easy to debug. All these advantages make for a more productive team, allowing more time to get the AI right because of the clarity provided.

Where does that clarity come from? The diagram partitions the AI problem, allowing the team to look at one node (along with its parents and children) and safely ignore details that are important only to other nodes. FSMs also divide the AI problem, but behavior trees add their hierarchy as an additional way of partitioning the problem that FSMs lack. Hierarchical FSMs are a middle ground between pure FSMs and behavior trees in terms of partitioning. With FSMs, the mindset at any level is finishing the statement, "I am...," yielding a flat diagram. In contrast, a typical behavior tree works on the idea of "The big decisions count the most," yielding an appropriate number of levels in the diagram. In an FSM, the partitioning is done once; in a behavior tree, it is done as many times as needed. Superior partitioning localizes concerns better. Graphically, this can be seen if the nodes in a behavior tree have few links and the states in an FSM have many transitions. In addition, behavior trees do not loop.

Like all divide-and-conquer methods, great pains must be taken in the divide. If the divide is not clean, then the conquer is unlikely to succeed. The divide step is also a good time to evaluate which method to use. If your tree has minimal levels and a relatively broad base, an FSM or a hierarchical FSM might be the optimal solution. If your tree has much depth or is uneven, with broad parts here and deep parts there, a behavior tree may be best.

Throughout the discussion, we have tied behavior trees to diagrams. Non-graphical methods of specifying a tree or an FSM may detract from the clarity of the design, but decisions about what AI algorithm you use should be made holistically. Tool support may have a strong influence on those decisions; the Spore developers described their particular way of defining their behavior tree as "fairly self-documenting" while at the same time having "the side benefit of compiling extremely rapidly" [Hecker07]. Implicit in that statement is the idea that their design definition was used to generate code, avoiding any disconnect between the design and the code at the potential price of a design that was harder for people to deal with.

The sophisticated behavior selection available does a very good job at handling AI that thinks in terms of "What should I be doing right now?" Whether we evaluate in the typical top-down or the more expensive bottom-up sequence, we arrive at a single thing that the AI should be doing. This is predicated on a fundamental question to ask before considering a behavior tree for your AI: "Can I make high-level decisions or categorizations at all?" Along with that is the question of whether doing just one thing is appropriate.

The benefits to behavior trees come from the fact that they control complexity. If your tree starts with a single root and then explodes immediately into a large number of leaves, you have lost the simplifying power of the method. Put another way, you need to ask, "Can I effectively organize the AI behaviors into a hierarchy?" If there is no benefit to the hierarchy, perhaps another method would be better. If you cannot because you need to string a bunch of behaviors together, a planning architecture may be what you need.
Planning

"Can't the AI just work out all of the steps it needs to pull this off by itself?" is the essential question that brings us to planning. We got a taste of planning in Chapter 6, "Look-Ahead: The First Step of Planning." In general terms, our AI searched for things to do that would take it to some goal. The preceding sentence seems pretty innocuous, but if we analyze it in careful detail, we can use it to examine any planning system.

Note that we used the term search. Recall the astronomical numbers we computed in Chapters 6, "Look-Ahead: The First Step of Planning," and 7, "Book of Moves." Planners do searching as well, so it should come as no surprise that their computational complexity is a serious issue. Writing a planner is no small task, either, but their novel capability of sequencing steps together at run-time makes them well worth examining.

You may recall from Chapter 6 that systems that think about the future need a way to represent knowledge that is independent from the game world. This is especially true for planning systems. Good knowledge representation needs to be designed in from the start. The AI is not allowed to change the world merely by thinking about the world; it can only change the world by acting on the world.

Planning AI emphasizes capability instead of action. The elements in a planning system are couched in terms of their results on the world instead of the actions they will perform. Planning elements declare, "This is what I make happen," instead of emphasizing procedure by saying, "This is what I do." They are declarative instead of procedural. The overall system is tasked with meeting goals instead of performing actions. With planning, the system is given a goal and given the command, "Make it so," instead of telling all the elements what they are supposed to do.

Consider a visitor to a large city who is talking with friends, some old and some recently met. "I need a place to sleep," the visitor says. This is declarative. "You can sleep on my sofa," one friend says, which is the only thing the visitor was expecting. "You can sleep on the other half of my king-sized bed," another friend offers. "I'm the night manager of a hotel; I can get you a room at cost," says a third. None of the other friends speak up. All three options meet the goal, and two of them were unexpected. The visitor decides the best plan. The visitor could have said, "Can I sleep on your sofa?" to the one friend and left it at that. This is procedural. While planners deal with dynamic situations gracefully, other,
cheaper methods do as well. Planners' strong suit is more than just picking the single best action.

The biggest benefit of planning comes when the AI assembles multiple actions into a plan. This capability is a reasonable working definition for a planner and the biggest reason to justify its costs. Planning systems search the space of available actions for a path through them that changes the state of the world to match the goal state. Let us look at an example, loosely based on the old children's folk song, "There's a Hole in the Bucket." The setting is a farm back in the days when "running water" meant someone running out to the well to get water. Our agent, named Henry, is tasked with fetching water. So the goal is to deliver water to the house. There are many objects available for Henry to use, including a bucket with a hole in it. Henry searches for a way to achieve the goal with the objects at hand. He will patch the bucket with straw. The straw needs to be cut to length. He will cut the straw with a knife. The knife needs to be sharpened. He will sharpen the knife on a stone. At this point, we will depart from the song, because in the song you need water in order to sharpen the knife on the stone, and as programmers we despise infinite loops. In our example, the stone will be perfectly fine for sharpening the knife. Our Henry's plan goes like this: Sharpen the knife on the stone. Cut the straw to fit. Patch the bucket with the cut straw. Use the bucket to fetch water.

Just as behavior trees scale better than pure FSMs, planning systems scale better than behavior trees when the need is for sequences of actions. In a behavior tree, the sequences are assembled by the programmer as part of the tree. With a planner, the AI dynamically performs the assembly, although not without cost. Once again, we see an almost Zen-like condition where adding sophistication makes the systems simpler. The execution systems are becoming more abstract and more data driven. This pulls the action code into smaller, more independent chunks. What used to be code is slowly turning into something more like data. This phenomenon is not restricted to AI; it is a common progression seen in nearly all programming areas.

The cost we mentioned was further alluded to when we said, "adding sophistication." By analogy, consider simple programs that use numbers and strings as input data. By now, every reader of this book has written some of them. Consider the task of writing a compiler, such as the VB compiler we have been using. While at the lowest level the input data to the VB compiler is still simple strings, in reality that data is written at a much higher level. The VB compiler as a program is far
more sophisticated than nearly all of the programs it compiles. The output of the VB compiler is far more flexible and wider ranging than the simpler programs it usually compiles. Few of the readers of this book are sufficiently qualified to write a VB compiler. "Adding sophistication" raises the bar for the programmers. We need not debate whether writing a planner is more or less difficult than writing a compiler, but we should again emphasize that more-sophisticated programs consume more-sophisticated data. VB code is more than simple strings, and the actions made available to a planner are likewise more than simple strings.

We will examine three planning systems: STRIPS, GOAP, and HTN. The latter two (along with behavior trees) are very current topics among industry professionals.

STRIPS

STRIPS stands for the Stanford Research Institute Problem Solver and dates to 1971 [Fikes71]. The basic building blocks in STRIPS are actions and states. From these, we can develop plans to reach goals. Let us examine these states, goals, actions, and plans. Along the way, we will touch on conditions as well.

States are not the states of an FSM, but the state of the virtual world. States can be partial subsets of the world state. States in this sense are a collection of attributes that have specific values. A state for nice weather might call for clear skies and warm temperatures. Goals are specified as states in this manner. A goal state is defined by a set of conditions that are met when the goal is reached. Goal states differ from other states in that the AI places importance on achieving the goal states, and it does not particularly care about other states along the way to the goal.

Actions have preconditions and postconditions. An action cannot be performed unless all its preconditions are met. If an action succeeds, all the postconditions of the action will become true, whether they were previously true or not. As far as the planner is concerned, only things expressed in the preconditions or postconditions matter. Planners are results oriented. Let us examine the sharpen knife action from Henry's plan to mend the bucket earlier in this section. It would have two preconditions: Henry has to possess the knife, and he has to possess the stone. As a postcondition of the action, the knife will be sharp.
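A minimal sketch of such an action in code reduces the world state to a set of condition strings. The class and member names are illustrative, not the STRIPS notation itself.

' A STRIPS-style action: it can run only when every precondition
' is true, and running it makes every postcondition true.
Public Class PlannerAction
    Public Name As String
    Public Preconditions As New List(Of String)
    Public Postconditions As New List(Of String)

    Public Function CanRun(ByVal WorldState As HashSet(Of String)) As Boolean
        For Each Condition As String In Preconditions
            If Not WorldState.Contains(Condition) Then Return False
        Next
        Return True
    End Function

    Public Sub Apply(ByVal WorldState As HashSet(Of String))
        For Each Condition As String In Postconditions
            WorldState.Add(Condition)
        Next
    End Sub
End Class

In this scheme, the sharpen knife action would carry the preconditions "Have a knife" and "Have a stone" and the single postcondition "Knife is sharp."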
A plan is a sequence of actions leading to a goal. Every precondition of every action that is not forced to be true by a prior action in the plan must be true in the initial state that is used to develop the plan. If those conditions are true and the plan is successfully executed, then the world state will match the goal state at the end.

Let us revisit our example of Henry and the bucket. Our states are given in Table 10.1. We have two states of interest: the state of the world and our goal state. The goal state has only one condition, showing that most states are far simpler than the world state.

Table 10.1 Hole-in-the-Bucket States

State   Conditions
World   Kitchen has no water. Bucket has a hole. Have straw. Straw is too long. Have a knife. Knife is dull. Have a stone.
Goal    Kitchen has water.

To transform the world state so that it also matches the goal state, we have the actions listed in Table 10.2. The column order shows how the action transforms the state of the world; start with the preconditions and perform the action to get the postconditions.

Table 10.2 Hole-in-the-Bucket Actions

Precondition               Action                    Postcondition
Bucket is mended           Use bucket                Water is delivered.
Cut straw is available     Mend bucket with straw    Bucket is mended.
Have a knife               Cut straw to length       Cut straw is available.
Have straw
Knife is sharp
Have a stone               Sharpen knife             Knife is sharp.
Have a knife
We start with the goal. The only action that transforms the world state to the goal state is the "use bucket" action. The precondition for using the bucket is not met, so we seek actions that will transform the world state into one where the preconditions are met. The "mend bucket with straw" action has the postcondition of the bucket being mended, which meets the precondition needed for using the bucket.

This might not be the first time that the bucket needed to be mended. If the world state had included leftover straw from the last time the bucket was mended, all of the outstanding preconditions would be met, and we would get a shorter plan that used the state of the world instead of more actions. In our case, we do not have cut straw. The "cut straw to length" action has that as a postcondition. To make use of this action, we need to meet three preconditions. The state of the world provides us with a knife and with straw, meeting two of the preconditions, but the knife is dull, giving our plan another outstanding precondition. If we cannot find a way to meet this outstanding precondition, this particular plan will fail. The "sharpen knife" action can meet the outstanding precondition, but it adds two preconditions of its own. The world state provides both the knife and the stone, meeting all of our outstanding preconditions. This leaves us with a workable plan; the world state, when transformed by a string of actions, meets the goal state.

Our simple example illustrates chaining actions together into a plan but leaves out dealing with alternatives and failures. We might also have scissors capable of cutting the straw but lack a sharpening stone. As you might expect, an initial plan to use the knife would fail to meet all of its outstanding preconditions, and the planner would explore using the scissors instead. We will see these facets of planning in the sections on GOAP and HTN planners.

The ideas behind STRIPS are reasonably easy to grasp. The complexities of programming the system are not trivial. A great deal of effort needs to be applied to the knowledge-representation problem presented by states, preconditions, and postconditions. The knowledge-representation issues have implications for the interface between the planning system and the game. STRIPS is not widely used in games.
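The chaining just walked through can be sketched as a recursive search backward from the goal. This is a minimal illustration built on the hypothetical PlannerAction class and condition strings from the earlier sketches; it takes the first action that can supply a condition, and it does not handle the alternatives, failures, or cyclic domains a production planner must face.

Module BackwardChainer
    ' Returns the actions needed to make one condition true, or Nothing
    ' when an outstanding precondition cannot be met and the plan fails.
    Public Function PlanFor(condition As String, world As HashSet(Of String),
                            actions As List(Of PlannerAction)) As List(Of PlannerAction)
        ' Conditions already true in the world state need no actions.
        If world.Contains(condition) Then Return New List(Of PlannerAction)
        For Each act In actions
            If act.Postconditions.Contains(condition) Then
                Dim plan As New List(Of PlannerAction)
                For Each pre In act.Preconditions
                    Dim subPlan = PlanFor(pre, world, actions)
                    If subPlan Is Nothing Then Return Nothing ' precondition unmet
                    plan.AddRange(subPlan)
                Next
                plan.Add(act)
                Return plan
            End If
        Next
        Return Nothing ' no action supplies this condition
    End Function
End Module

Called with "Kitchen has water" and the bucket domain, this returns sharpen knife, cut straw to length, mend bucket with straw, and use bucket, in that order.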
GOAP

If Damian Isla's presentation on behavior trees at the 2005 Game Developers Conference put that method on the radar of game AI programmers, then Jeff Orkin's articles and presentations did the same thing for goal-oriented action planning (GOAP) [Orkin03], [Orkin06]. For our purposes, GOAP (rhymes with soap) can roughly be considered as STRIPS adapted to games, and it uses the same vocabulary. We will look at how GOAP is typically applied to games.

When used in games, actors controlled by GOAP systems typically have multiple goals, but only one goal active at a time. The active goal is the highest-priority goal that has a valid plan. For independent agents, this single-mindedness is not a major drawback. Some games require that multiple goals be active at the same time. In this case, the planner needs to be more sophisticated. It may want to accept a higher-cost plan for one goal so that other goals can be pursued in parallel. It may need to ensure that the many plans have no resource-contention problems. It is up to the game designer to decide whether the game is better served by having a wise AI that plans around these issues or a potentially more realistic AI that correctly models the problems. Adding such a capability adds complexity to an already complex algorithm, and it adds CPU demands to an algorithm that already demands uncomfortable amounts of processing power.

Note: These are real-world problems; a full-scale review by the U.S. military many years ago found that in one scenario, the combined demands of the worldwide commands collectively called for five times the total airlift capacity of the Military Airlift Command and deployment of the Rapid Deployment Force simultaneously to 10 different locations.

In games, GOAP actions are typically simplified so that each action can report a cost value, and that cost can be used to compare the action to other actions. These costs can be used to tune the AI's preference for one action over another. The lowest-cost action might have a cost of 1. If costs tend to be roughly comparable, the planner will prefer plans with fewer steps. It will accept Rube-Goldberg-style plans only when there are no simpler means of accomplishing the goal. The actions themselves are implemented with scripts or small FSMs that are restricted to doing just the one action. FSMs work quite well because the implementation of an action is often heavily tied to animation sequences, and those sequences are often easily controlled by FSMs.

Given those simplifications to the actions, GOAP planners in games can use the now-familiar A* algorithm to search the action space for a path between the goal and the current state. With A*, the direction in which we search has an impact, and to use A* at all, we need a heuristic. We will examine both of these facets in detail.
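Goal selection itself is simple once planning is available. The sketch below illustrates the highest-priority-goal-with-a-valid-plan rule; the Goal class and the TryPlan placeholder are assumptions standing in for a real planner, and the code assumes the Imports lines from the earlier sketches.

Module GoalSelection
    Public Class Goal
        Public Name As String
        Public Priority As Integer
    End Class

    Public Function TryPlan(g As Goal) As List(Of PlannerAction)
        ' Placeholder: a real GOAP system would run A* over the action space here.
        Return Nothing
    End Function

    ' The active goal is the highest-priority goal that has a valid plan.
    Public Function SelectActiveGoal(goals As List(Of Goal)) As Goal
        For Each g In goals.OrderByDescending(Function(x) x.Priority)
            If TryPlan(g) IsNot Nothing Then Return g
        Next
        Return Nothing ' no goal currently has a valid plan
    End Function
End Module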
Figure 10.7 The direction of the search matters.

Note the direction of the search, from the goal to the current state. This is for performance reasons. As long as A* has a good heuristic, it will find the lowest-cost path, but the performance will vary depending on which end of the final path the search starts from. The typical GOAP or STRIPS case gives the actor many possible actions that might be checked if we start by acting on the current state. Most of those actions will not take us closer to our goal, which gives us a large search space to examine and discard. If we start at the goal, we are far more likely to see only a few actions that need to be searched to take us toward the current state. As Figure 10.7 suggests, if you start at the base of a tree looking for an ant that is on the tip of one of the leaves, it would be easier for the ant to find you at the base of the tree than it would be for you to find the ant.

If all the actions are simple enough that each one changes only a single postcondition, then we can create a good heuristic for A* to use when estimating. That heuristic is that the minimum cost to finish from any point is the number of conditions that still need to change. If the minimum cost of a step is 1 and each action can change only one condition, this heuristic never overestimates, so it holds true. Plans could cost more if the designer gave some steps higher weight than the minimum to help steer the search. Plans could also cost more if there are no direct paths. Indirect paths happen when the steps that lead to the goal add extra unmet conditions that will need to be satisfied by later actions. A real-world example might be in order.
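Here is a sketch of that heuristic, assuming the world and goal states are the sets of condition strings used in the STRIPS sketches. Because every action changes one condition and costs at least 1, the count of unmet conditions never overestimates the remaining cost.

Module GoapHeuristic
    ' Estimated cost to finish: the number of goal conditions not yet true.
    Public Function EstimateRemainingCost(current As HashSet(Of String),
                                          goal As HashSet(Of String)) As Integer
        Return goal.Count(Function(c) Not current.Contains(c))
    End Function
End Module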
A farmer who stands at the door of his house wants to go out into the night to find out what is making noise in the barn. A single action, "GOTO," would take him to the barn. That action has a precondition, "has light to see by," that is unmet. The lantern has the capability to change that condition, but it also has preconditions for being in hand, for fuel, and for being lit. Since the lantern has fuel, the precondition for fuel does not count. The precondition of being lit does count because the lantern is not currently lit. The matches have the capability to light things on fire, and they carry the precondition of being in hand. So the final plan will have grown to get matches, get the lantern, light a match, light the lantern, and finally go to the barn.

The designer might have included a flashlight object that has a much lower cost than the lantern, making it preferable because a dropped flashlight is less likely to cause a barn fire than a dropped lantern. In this case, the flashlight has no batteries, and there are no batteries in the house, so the A* search stopped progress on the path using the flashlight when it hit the high cost of driving into town to get more batteries.

A* connected the dots to form a plan to meet the goal. The AI programmer did not have to do it himself or herself the way he or she would have for an FSM. The programming paradigm shifts to something like, "Toss in capabilities in the form of actions and stir them with goals to see what plans you get." The simplicity of the actions is supposed to make programming the set of them simpler in total than trying to program the equivalent capability in an FSM or a behavior tree.

Knowledge representation is critical to the success of the method. The actions have to be able to get all the inputs they need from the state representation and make concrete changes to it. Notice that many of our preconditions and postconditions are simple Booleans that can be represented by one bit of a bit field. Performance of the knowledge representation is critical, and bit fields are a common way to help achieve it. Goal-oriented agents coded this way resemble goal-driven people; they do not stop to smell the flowers along the way, but they do get the details right.
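A sketch of what the bit-field representation might look like follows. The condition names are hypothetical; the point is that checking preconditions and counting unmet goal conditions become cheap integer operations.

Module BitFieldState
    ' Hypothetical condition bits; a real game would generate these.
    Public Const HasLight As Integer = 1 << 0
    Public Const LanternLit As Integer = 1 << 1
    Public Const HasMatches As Integer = 1 << 2

    ' All precondition bits must be set in the world state.
    Public Function MeetsPreconditions(world As Integer, required As Integer) As Boolean
        Return (world And required) = required
    End Function

    ' Count the goal bits not yet set: the heuristic from the previous sketch.
    Public Function UnmetConditionCount(world As Integer, goal As Integer) As Integer
        Dim unmet As Integer = goal And (Not world)
        Dim count As Integer = 0
        While unmet <> 0
            count += 1
            unmet = unmet And (unmet - 1) ' clear the lowest set bit
        End While
        Return count
    End Function
End Module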
HTN

"Don't we have a plan for that? Or something close?" One of the caveats that we gave about A* is that although we appreciate its performance, we know it is not free. A* connects the dots for us in GOAP. Maybe there is a way to save and reuse chunks of plans the way we might pre-compute popular paths. Each of the travelers who passed through the Cumberland Gap or crossed the Rocky Mountains on the various trails did not come up with a brand-new plan before setting out (with the possible exception of the disastrous Donner party). We might like our AI to have a deep bag of tricks to use in terms of pre-computed plans while still being able to re-plan when things go decidedly wrong.

At the lowest level, a hierarchical task network (HTN) resembles the other planners we have mentioned. We have actions that can be directly executed if their preconditions are met, providing us with a change to the state of the world. These are often referred to as primitive tasks. HTNs also include a task library. Any task that is not a directly executable primitive task is called an abstract task. The abstract tasks in the task library decompose into lower-level abstract tasks and finally into primitive tasks. An example might be in order.

Figure 10.8 A subset of a hierarchical task network.

As shown in Figure 10.8, our farmer from the GOAP example has a "light source" abstract task in his task library. That task decomposes into the "use lantern" and "use flashlight" abstract tasks. The "use lantern" abstract task decomposes into a "light lantern" abstract task and the highly useful "get" abstract task. As they must, all the remaining abstract tasks similarly reduce to primitive tasks, although for simplicity, the diagram does not show the full decomposition of everything. Details of the flashlight and other methods for making fire are not shown.
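A sketch of how such a task library might be typed is shown below, assuming each abstract task stores alternative decompositions, with each alternative being an ordered list of subtasks. The class names are assumptions.

' Tasks in an HTN are either directly executable or decomposable.
Public MustInherit Class HtnTask
    Public Name As String
End Class

' Directly executable; a real system would also carry preconditions and effects.
Public Class PrimitiveTask
    Inherits HtnTask
End Class

' Each inner list is one alternative way to accomplish this task,
' such as "use lantern" versus "use flashlight" for "light source."
Public Class AbstractTask
    Inherits HtnTask
    Public Decompositions As New List(Of List(Of HtnTask))
End Class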
The decompositions are hierarchical. In a rich environment, there will be many such decompositions to choose from for each task; our farmer has a flashlight and a lantern at his disposal for use as a light source, and probably more than one means of making fire. The task library is comparable to a rich script library [Kelly07] and can be computed offline [Kelly08].

There are some compelling advantages to HTN: the plans can be searched rapidly in the forward direction, and they deal gracefully with changing conditions. While task libraries can be computed offline, designers can also create plans to their liking. The highest-level abstract tasks tend to be a very close match to the typical goals that might be presented to a STRIPS or GOAP planner. Instead of running A* from the end goal back to the current state, we can more rapidly find matching high-level abstract tasks that take us to our goal, a welcome performance improvement. Instead of hoping that the ant on the leaf finds us at the base of the tree, we pull out the closest thing we have to an ant-locator plan and follow it directly to the ant. The forward search with HTN is faster than the backward search with GOAP.

Because of their forward and hierarchical nature, HTNs are also part of plan execution. Again using our farmer example, suppose the farmer starts to use the flashlight, and the batteries fail before he gets out the door. The light source task needs to re-plan, but nothing higher in the hierarchy has to change. The basic plan for getting to the barn and exploring the noises is sound. Only the part about getting some light has to change. A GOAP or STRIPS planner would have to restart the search because its plans do not natively retain the dependencies explicitly or keep alternatives that were discarded along the way. Once a GOAP or STRIPS planner discovers the flashlight in our example, it will discard using the lantern. Even if a clever implementation were to recognize that only a portion of the plan had failed, GOAP or STRIPS would need to search through all actions for alternative means of making light. The HTN planner in Figure 10.8 has the alternative light sources encoded in the abstract tasks.

To make sure everything works properly, HTNs include a new feature known as critics. Critics detect conflicts early in an effort to minimize backtracking. Instead of going out to investigate noises in the barn, suppose our farmer was going out after dark to light a bonfire for roasting marshmallows. Critics would detect the fact that he might not have enough matches to both light the lantern and light the bonfire.
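A minimal forward-decomposition sketch built on the hypothetical task classes above follows. When one alternative fails, the next is tried without disturbing anything higher in the hierarchy, which is exactly the local re-planning behavior described for the failed flashlight.

Module HtnDecomposer
    ' Expand a task into primitive tasks, trying alternatives in order.
    ' Returns False only when every alternative decomposition fails.
    Public Function Decompose(task As HtnTask, plan As List(Of PrimitiveTask)) As Boolean
        Dim prim = TryCast(task, PrimitiveTask)
        If prim IsNot Nothing Then
            plan.Add(prim)
            Return True
        End If
        Dim abs = DirectCast(task, AbstractTask)
        For Each alternative In abs.Decompositions
            Dim attempt As New List(Of PrimitiveTask)(plan)
            If alternative.All(Function(t) Decompose(t, attempt)) Then
                plan.Clear()
                plan.AddRange(attempt)
                Return True ' this alternative worked; keep its expansion
            End If
        Next
        Return False ' no alternative worked; the parent must try another
    End Function
End Module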
HTN planners are gaining traction in real-time strategy games [Brickman08], [Cerpa08] and other genres. The potential for better performance is always welcome in computer games. The increase in intelligence is equally welcome; not only do agents make workable plans, but they re-plan when things go wrong.

Resources

The hot topics among professional game AI programmers are always the subject of presentations at the Game Developers Conference (GDC), particularly at the AI summits that have been held in recent years. Along with academics and various other people outside of the industry, this same group of professional game AI programmers has authored the articles in the AI Game Programming Wisdom series of books. They also present at other conferences such as AIIDE and AAAI. Some of them even run Web sites devoted to game AI.

The first volume of the AI Game Programming Wisdom series, edited by Steve Rabin, has an entire section devoted to A*. Machine learning is the subject of the articles in section 10 of AI Game Programming Wisdom 2, and neural networks are the subject of the articles in section 11, although by volume 4, the editor openly asks, "What happened to Learning?" [Rabin08]. The entire four-volume series was written by and for industry professionals, so some of the articles may be hard going for novice AI programmers. A full list of the articles in the series, as well as articles from many other sources, can be found at http://www.aiwisdom.com/.

The noteworthy Web sites for game developers are too numerous to keep track of, so we will mention only a few:

- http://www.gamedev.net. The AI section of gamedev.net is reasonably extensive.
- http://aigamedev.com. The aigamedev.com site is focused entirely on game AI at a professional level, although some of the articles require paid membership to view.
- http://www.gamasutra.com. The gamasutra.com site tends to collect GDC-related material; Damian Isla's behavior tree paper is available there.
- http://www.wikipedia.org. While in no way guaranteed to be academically rigorous or even factually correct, Wikipedia is probably the best place on the Internet to read about a topic for the first time, including AI-related topics. Although Wikipedia is a good place to get the first clue about a topic, all articles should be read while wondering what mistakes are in them. The references section of the articles that have them is a rich hunting ground for good information, often pointing to papers of higher quality than the Wikipedia article citing them.
- Your favorite search engines.
Chapter Summary

This chapter covers a wide range of topics. Every game AI programmer needs to be fluent in A*. From there, we find that the more sophisticated topics have to be judged on the basis of whether they will be useful in a project. The important point is not that machine learning or planning systems are cool, which they are, but whether they are the right tool for the job. Unlike the topics in prior chapters, these were not picked for near-universal applicability or ease of understanding. By the same token, there is no VB code project for a neural network or some flavor of planner. These are topics that aspiring AI programmers who have made it this far should strive toward. The code for them would not be in keeping with the non-threatening nature of the code in this book, particularly for those novice AI programmers who come from a background other than hard-core programming, such as animators, producers, or even managers who have worked thus far to gain a solid background in game AI.

Chapter Review

Answers are in the appendix.

1. What are the two concepts in A* that let us perform the best-first search?

2. What conditions could cause a node to move from the closed list back onto the open list in A*?

3. When is a machine-learning system easier to implement than a directly programmed system?

4. What is the major advantage of behavior trees over FSMs?

5. Why does the search run backward in a GOAP system and forward in an HTN system?
Exercises

1. Search for A* on the Internet. Write an A* implementation that directs the fox in Fox and Hounds when there is an opening.

2. Change the graph in Figure 10.2 so that the catapult service is between node A and node W at cost 0.5. Reduce the cost of road travel to or from node X back to 4. A path from node C to node W through A now has an actual cost of 12.5, while a path through node Z has cost 16. Prove to yourself that regardless of whether the algorithm reexamines nodes on the closed list, the inadmissible heuristic will cause the algorithm to return the longer path.

3. Search for neural network implementations on the Internet. Write one in VB for our monster. When training the network, leave out training data with hit-point values between the always fight and always flee levels. Watch the network outputs in the transition area.

References

[Barnes02] Barnes, Jonty; Hutchens, Jason. "Testing Undefined Behavior as a Result of Learning." AI Game Programming Wisdom. Charles River Media, 2002.

[Biasillo02] Biasillo, Gari. "Training an AI to Race." AI Game Programming Wisdom. Charles River Media, 2002.

[Brickman08] Brickman, Noah; Nishant, Joshi. "HTN Planning and Game State Management in Warcraft II." Available online at http://users.soe.ucsc.edu/~nishant/CS244.pdf.

[Cerpa08] Cerpa, David; Obelleiro, Julio. "An Advanced Motivation-Driven Planning Architecture." AI Game Programming Wisdom 4, pp. 373–382. Charles River Media, 2008.

[Dill10] Dill, Kevin. Comments regarding Axis & Allies and Kohan 2 in private correspondence, 2010.

[Fikes71] Fikes, Richard; Nilsson, Nils. "STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving." Artificial Intelligence, v2, pp. 189–208, 1971.

[Hecker07] Hecker, Chris. Liner Notes for Spore/Spore Behavior Tree Docs, Web page, 2007. Available online at http://chrishecker.com/My_Liner_Notes_for_Spore/Spore_Behavior_Tree_Docs.

[Isla05] Isla, Damian. "Handling Complexity in the Halo 2 AI." Proceedings of the 2005 Game Developers Conference. CMP Media, 2005. Available online at http://www.gamasutra.com/view/feature/2250/gdc_2005_proceeding_handling_.php.

[Kelly07] Kelly, John-Paul; Botea, Adi; Koenig, Sven. "Planning with Hierarchical Task Networks in Games." Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling, American Association for Artificial Intelligence, 2007. Available online at http://www.plg.inf.uc3m.es/icaps-pg2007/papers/Planning%20with%20Hierarchical%20Task%20Networks%20in%20Video%20Games.pdf.

[Kelly08] Kelly, John-Paul; Botea, Adi; Koenig, Sven. "Offline Planning with Hierarchical Task Networks in Video Games." Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference, American Association for Artificial Intelligence, 2008. Available online at http://www.aaai.org/Papers/AIIDE/2008/AIIDE08-010.pdf.

[Kirby04] Kirby, Neil. "Getting Around the Limits of Machine Learning." AI Game Programming Wisdom 2, pp. 603–611. Charles River Media, 2004.

[Orkin03] Orkin, Jeff. "Applying Goal-Oriented Action Planning to Games." AI Game Programming Wisdom 2, pp. 217–228. Charles River Media, 2003.

[Orkin06] Orkin, Jeff. "Three States and a Plan: The A.I. of F.E.A.R." Proceedings of the 2006 Game Developers Conference. Available online at http://web.media.mit.edu/~jorkin/gdc2006_orkin_jeff_fear.pdf.

[Rabin08] Rabin, Steve. Preface, AI Game Programming Wisdom 4, p. x. Course Technology, 2008.

[Sutton98] Sutton, Richard; Barto, Andrew. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts, 1998. Available online at http://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html.

[Tesauro95] Tesauro, Gerald. "Temporal Difference Learning and TD-Gammon." Communications of the ACM, volume 38, Association for Computing Machinery, 1995.
Appendix
Answers to Chapter Review Questions

Most of the review questions have straightforward answers. Others call for opinions, and opinions may vary. The answers given here are one set of opinions.

Chapter 1: What Is Game AI?

1. What are the three parts to our definition of game AI? The AI has to be able to act, it has to be intelligent, and it has to deal with changing conditions.

2. Why is game physics not game AI? All choices are forced in physics, so there is no room for intelligence.

Chapter 2: Simple Hard-Coded AI

1. What are the common drawbacks to hard-coded AI? Hard-coded AI lacks a formal methodology for determining which behavior to employ. Without formal organization, hard-coded AI tends to grow quickly in size and complexity. It can be very difficult to change or maintain.

2. What are the advantages to hard-coded AI? Hard-coded AI is intuitive and can be fast to write and fast to execute.

3. Complexity can be as low as the sum of the parts and as high as the product of the parts. What is the relationship between the parts when complexity is the sum? What is it when complexity is the product?
When complexity is the sum of the number of parts, the parts are independent. When it is the product, all the parts are interrelated.

4. What is the design of the data called when the data is information the AI uses to help it think about (or even imagine about) the world? It is called knowledge representation.

5. Critique the expediencies in the code that interrogates the world in the four-set-point thermostat. Comment on the dangers versus the additional complexity needed to mitigate the risks. In general, the wrapper code knows the internal implementation of the world. It does not ask the world for values; it goes in and finds those values. The world code and the wrapper code are two separate files that must be kept synchronized. If the wrapper asked the world via a function in the world code, the world code could change however it liked as long as it kept the function valid. The function would be in the world-code file. It is easier to keep one file consistent than it is to keep two files synchronized. Internally in the wrapper, the parallel arrays are very handy, but they must be kept synchronized. In addition, the settings are in time-sorted order, and the search for the right setting silently depends on this fact. If the settings were allowed to be changed, the code that stores the new settings would have to sort them. Expediencies should be kept localized; within a routine is fine, within a file is tolerable, and across files is questionable.

6. Why is the side effect in the code that gets the set-point temperature in the four-set-point thermostat important? The side effect of showing the temperature used lets us see what the AI is thinking.

Chapter 3: Finite State Machines (FSMs)

1. Define a finite state machine and tell what each part does. A finite state machine is composed of states and transitions. The transitions are used to change from one state to another. The states are used to define the different things the machine will do.

2. What are the advantages of a finite state machine compared to hard-coded AI? The formal organization combats complexity by making the states independent. The transitions remain, but they are organized into a regular system. The code is less fragile.
3. What are some indicators that a finite state machine is inappropriate to use? An FSM has fundamental problems when it needs to be in more than one state at a time. It has problems if the states are not inherently discrete. If the complexity rises past a certain level, other methods may be easier to implement.

4. What do we mean by ambiguous transitions? When more than one transition is valid, they are ambiguous.

5. What do we call it when ambiguous transitions exist? What are three ways of dealing with them? When there are ambiguous transitions, we have a race condition. We can ignore them (which is free), we can fully specify the transitions (this is usually a very bad idea), or most likely we will prioritize the transitions.

Chapter 4: Rule-Based Systems

1. What are the two parts of a rule in a rule-based system? The rules have a matching part and an execution part.

2. What does the framework do in a rule-based system? The framework presents the current conditions to the rule base, selects a rule (or rules) to execute, and executes it. If more than one rule is to execute, the framework ensures that there are no conflicts between them.

3. Why is it that a rule-based system can play like both a human and a machine at the same time? The rule-based system plays like a human when it exploits human-sourced behaviors. It plays like a machine in that it never gets tired of picking the same best way to respond to a given situation.

4. What makes a rule-based AI appear intelligent? What makes it appear stupid? A rule-based system appears intelligent when it comes up with a good or possibly even great response for a situation. It appears stupid when the rules are too sparse and the best response it has is not appropriate for the situation. It also has issues when it lacks good defaults or the ability to recognize when to use them.
Chapter 5: Random and Probabilistic Systems

1. What are three ways to get odds for a game? You can get odds for a game by precomputing them, by Monte Carlo methods, or by faking them and tuning to suit.

2. What are the drawbacks to these methods? The first potential drawback to these methods is that they demand good numbers and possibly a large number of them. Furthermore, tuning those numbers is an acquired skill.

Chapter 6: Look-Ahead: The First Step of Planning

1. What does an evaluation function do? How is it similar to or different from a goal? An evaluation function attempts to comment on how good an indeterminate situation is. The function should correspond in some way to how close the situation is to a goal, but it does not require that the goal be in sight.

2. What is a heuristic? How do heuristics help? Heuristics are general guidelines. They can help guide the search toward fruitful paths and away from poor ones. They can help evaluate indeterminate situations.

3. What is pruning and how does it help? Pruning is the act of discarding paths deemed to have low potential for success. It narrows the search space, allowing a deeper look down higher-potential paths.

4. What is the most common drawback to look-ahead? There is rarely enough processor time to search every promising path.

Chapter 7: Book of Moves

1. Describe how moves in a book of moves and heuristics are similar and how they are different. Both can guide the AI, but moves tend to be concrete actions posed in game terms, and heuristics can be more abstract and generalized. Moves are about how the game is played, and heuristics tend to be about how the game is evaluated.

2. How is a book of moves similar to a rule-based AI? How would you decide which label to use on a particular system? The book of moves is a specialized form of a rule-based AI. A system that uses rules for basic AI is probably a rule-based system, particularly if it includes general rules and defaults. An AI that is assisted by a specialized rule base might be said to be using a book of moves.
Chapter 8: Emergent Behavior

1. List the elements and characteristics of a system that allows and encourages emergent behaviors. Systems that feature independent agents who employ simple behaviors and interact tend to exhibit emergent behaviors.

2. Describe the effects of feedback and the effects of feedback rates. Feedback is what enables continued interaction. If the rate is too slow, the interaction will fade. If the interaction is too fast, the system may not allow any significant variations to develop.

Chapter 9: Evoking Emotions on the Cheap

1. Many aspects of a game have an emotional payload. What additional attribute is required to make these aspects part of the overall AI? They have to be changeable in order for the AI to have anything to do.

2. Describe the critical difference between games and simulations with regard to what they are trying to do with emotions. Simulations attempt to model emotions accurately as a first priority. Games attempt to evoke emotional responses in the players as a first priority.

3. Some of the techniques in this chapter are subtle. How can the game make sure the player catches on? The most straightforward way is to tell the player in some more direct manner.

4. List some general categories of places where some AI control adds to the ability of the game to deliver emotional content. The AI can help control music, ambient settings (mood), plot, and even the camera itself.

Chapter 10: Topics to Pursue from Here

1. What are the two concepts in A* that let us perform the best-first search? A* tracks the cost so far and estimates the cost remaining to the goal. It is useful to have an admissible heuristic, but tracking and estimating cost allows the algorithm to be best-first.
2. What conditions could cause a node to move from the closed list back onto the open list in A*? This can happen when the heuristic is inadmissible and there are branching paths that can rejoin. This never happens if the algorithm is written with the usual optimization to ignore all nodes on the closed list.

3. When is a machine-learning system easier to implement than a directly programmed system? Machine learning is superior when it is easier to teach the machine than it is to directly program the machine.

4. What is the major advantage of behavior trees over FSMs? They control complexity better, allowing faster iteration cycles in development.

5. Why does the search run backward in a GOAP system and forward in an HTN system? In GOAP, the number of actions that are close to the goal is typically smaller than the number of actions that are close to the agent. A* is faster when it has fewer branches to explore. HTN task libraries are designed to present goal states to the agents, making a forward search faster.