
Game Audio Programming: Principles and Practices


Description: Welcome to Game Audio Programming: Principles and Practices! This book is the first of its kind: an entire book dedicated to the art of game audio programming. With over fifteen chapters written by some of the top game audio programmers and sound designers in the industry, this book contains more knowledge and wisdom about game audio programming than any other volume in history.

One of the goals of this book is to raise the general level of game audio programming expertise, so it is written in a manner that is accessible to beginners, while still providing valuable content for more advanced game audio programmers. Each chapter contains techniques that the authors have used in shipping games, with plenty of code examples and diagrams.


However, by subdividing the rectangle, the algorithm is no longer an analytical solution and the process introduces a numerical component. The complexity-to-accuracy tradeoff is also quite high, which leads us to the final example in this chapter.

12.15 AVERAGE DIRECTION AND SPREAD USING A UNIFORM GRID OR SET OF POINTS

The last example, although yielding interesting results, is fairly constrictive in its uses. However, it was what brought me to the next algorithm, which is possibly the most useful of all the ideas presented so far. I have worked on more than one game where regions of the game world are marked up by a 2D grid broadly labeling spaces. For audio, entering such a region would kick off an ambient loop. It would be much nicer if this were a 3D sound which transitioned to 2D when the listener entered the space, eliminating any time-based component or fade-ins (Figure 12.12).

FIGURE 12.12  Average weighted direction and spread of grid cells where each cell is treated as a single point.

Notice from the analytical solution of a rectangle that, as the rectangle is subdivided further, the area, the delta x, and the delta y become less significant in the integration. If the rectangles become sufficiently small compared to the circle represented by the attenuation range, they can be approximated by discrete points. This does something nice to the equation: integration becomes summation over a set of points. Here is one possible implementation:

template<class PointContainer>
Vector TotalAttenuatedDirectionPoints(
    const Sphere& sphere,
    const PointContainer& points,
    const double grid_cell_size, // 0 is valid
    double& out_spread)
{
    static_assert(
        std::is_same<Vector, typename PointContainer::value_type>::value,
        "PointContainer must iterate over Vectors.");

    // Assumes point is the center of a grid cell.
    const double minimum_distance = grid_cell_size / 2.0;
    double total_weight = 0.0;
    Vector total_dir = { 0.0, 0.0, 0.0 };
    for (const Vector& point : points)
    {
        const Vector direction = point - sphere.center;
        if (grid_cell_size > 0.0 &&
            fabs(direction.x) < minimum_distance &&
            fabs(direction.y) < minimum_distance &&
            fabs(direction.z) < minimum_distance)
        {
            // Inside a grid cell, so treat the sound as 2D.
            out_spread = 1.0;
            return{ 1.0, 0.0, 0.0 };
        }
        const double distance = Length(direction);
        if (distance < sphere.radius)
        {
            const double weight = sphere.radius - distance;
            total_dir += (weight / distance) * direction;
            total_weight += weight;
        }
    }
    out_spread = 1 - Length(total_dir) / total_weight;
    return total_dir;
}

In this implementation, the details of the order in which points are iterated are purposely left out.
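As a usage sketch only (not from the original text), here is one way the function might be called, reusing the Vector, Sphere, and Length types assumed above; the member order of Sphere and Vector, the cell positions, and the radii are illustrative assumptions:

#include <vector>

// Hypothetical usage: the points are the centers of the grid cells that make up
// an ambient region, and the sphere is the attenuation range around the listener.
void UpdateAmbientRegionEmitter(double& out_spread, Vector& out_direction)
{
    const std::vector<Vector> cell_centers = {
        { 10.0, 0.0, 4.0 }, { 12.0, 0.0, 4.0 }, { 14.0, 0.0, 4.0 },
        { 10.0, 0.0, 6.0 }, { 12.0, 0.0, 6.0 }, { 14.0, 0.0, 6.0 },
    };

    const Sphere attenuation = { /*center*/ { 0.0, 0.0, 0.0 }, /*radius*/ 40.0 };
    out_direction = TotalAttenuatedDirectionPoints(
        attenuation, cell_centers, /*grid_cell_size*/ 2.0, out_spread);

    // The direction of out_direction is where a single ambient emitter would be
    // positioned relative to the listener, and out_spread is forwarded to the
    // audio engine's spread parameter; a spread of 1.0 means "play fully 2D."
}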

How this code is integrated into the game heavily depends on the context. For example, if the geometry is a set of distinct objects in the world, such as a large number of torches bordering a room, the container might be a vector or array. On the other hand, if the points being iterated are a search space around a listener, as would happen with a 2D grid in the world, the container might iterate more efficiently starting at the listener and fill out until a maximum number of points have been found. Another example of a custom container is one that only processes as many points as it can in some limited amount of time, or spreads the processing over multiple frames depending on how much other audio activity is happening.

12.16 CONCLUSION

The last example is a starting point for a huge amount of flexibility and creativity. Even the geometry mentioned earlier in this chapter (line segments, rectangular volumes, spheres) can be discretized or voxelized and solved as the average of a set of 3D points. Obstruction and occlusion that would have been difficult to reason about can be handled by testing each point or a subset of them.

The focus of this chapter was on ambient sounds, but as hinted at earlier, this approach can also be applied to effects, weapons, or spells which cover an area. Examples for line segments might be a beam weapon or a spell where the caster leaves a trail of fire with clear start and end points. Examples for a grid could be an explosion or a skill that creates a large number of randomly placed events. Another extension I have used is to make time an additional parameter in the weighting function. Using time, points can fade out of the equation, for example, for damage-over-time effects such as a poison pool or a fire which burns out.

CHAPTER 13

Techniques for Improving Data Drivability of Gameplay Audio Code

Jon Mitchell
Blackbird Interactive, Vancouver, British Columbia, Canada

CONTENTS
13.1  Problems with Embedding Audio Behavior in Gameplay Code
13.2  Model-View-ViewModel
13.3  Creating AudioViewModels from Data
13.4  Remapping and Manipulating Game Values
13.5  Blending Game Values
13.6  RTPC Blend Tree
13.7  Heavy
13.8  Control Sources
13.9  Control Emitters
13.10 Control Receivers
13.11 Uses of Control Sources
      13.11.1 Volumetric Control Emitters for Reverb Zones
      13.11.2 Mixing
      13.11.3 Controlling a 2D Layered Background Ambience
13.12 Speeding Up Control Sources
      13.12.1 Spatial Hashing
      13.12.2 Reducing the Update Rate
13.13 Conclusions
Reference

13.1 PROBLEMS WITH EMBEDDING AUDIO BEHAVIOR IN GAMEPLAY CODE

On many games I've worked on, even larger scale ones making use of well-developed middleware, the audio-related behaviors have become gradually and unpleasantly intertwined with the gameplay code. This is a fairly natural consequence of the game audio behaviors starting off very simple—usually no more than "play sound x when y happens"—and evolving from there. There is nothing wrong with this, of course. It probably covers a large percentage of all the game sounds since game audio began, and even in more complex games is still a sizable percentage, especially in simpler and smaller scale games.

However, it doesn't scale well as the complexity of the audio behaviors we require grows, and it has at least three major problems. First, it's fragile: when coders restructure the gameplay code, there's a good chance the audio hooks and associated logic will be made obsolete, refactored out of existence, or, in the best-case scenario, made more complicated to refactor. Second, it's often repetitive: whether the audio behavior is for a crowd system, a model of vehicle audio, or weapons firing, there are core aspects that are often the same for each. Third, embedding audio behavior in gameplay code requires a programmer. It must be done either by the gameplay coder or by an audio programmer. The less programmer time required to get the desired audio behavior, the happier the designers will be, and (usually) the happier the programmers are. In an ideal world, the audio designers would have tools that empower them to create their own audio control logic in a way that scales well as the game grows, is validatable and solid, and is powerful enough to support some fairly complex behaviors.

13.2 MODEL-VIEW-VIEWMODEL

Dealing with the first problem, fragility, is really just a matter of applying good software engineering practices. In general, gameplay code is hard to keep tidy. It often has to pull together and drive AI, rendering, physics, and UI code, in addition to audio, and the more these various aspects of the game code can be cleanly separated, the more robust the code will be. Gameplay audio code is actually quite similar in many ways to the code that drives UI elements.

Both are representations of game entity state, need to monitor game entity state (while usually leaving it unaffected), respond rapidly to changes, and often require transitional behaviors and lifetimes that don't correspond exactly to those of the underlying game objects. The Model-View-ViewModel pattern used by Microsoft for their WPF tools is their approach to decoupling application logic from UI behavior logic, and those concepts map very neatly to the challenges we have as audio behavior logic becomes more complex.

In a game audio context, the Model is solely responsible for providing methods to query the current state of properties of our game objects, whereas the View solely manages submission of state to the presentation layer of the audio. The ViewModel binds the two together, providing a way to translate game events and changing gameplay values into the control parameters understood by the game audio engine. Figure 13.1 illustrates the Model-View-ViewModel pattern in the context of an audio engine.

FIGURE 13.1  Model-View-ViewModel.

13.3 CREATING AUDIOVIEWMODELS FROM DATA

At a minimum, the AudioViewModel for a specific type of game object can be broken out into a separate class or structure, to be controlled by the gameplay code. While this still requires dedicated programmer time, it reduces the coupling to the gameplay code. When common patterns in the AudioViewModel code are identified, they can be extracted and reused. Better still, they can be exposed to the designers, enabling them to create their own view model behaviors.

13.4 REMAPPING AND MANIPULATING GAME VALUES

Often, we want to use a game parameter value (velocity, time in a certain state, health, etc.) to drive one of our audio parameters. In the simplest case, all our ViewModel needs to do is read from the Model (usually a property on the game entity) and pass that value to the AudioView.
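As an illustration only (the chapter does not prescribe specific interfaces, so the class and method names here are hypothetical), a minimal AudioViewModel along these lines might read a value from the Model, remap it, and forward it to the View:

#include <algorithm>

// Hypothetical Model/View interfaces: the Model exposes read-only game state,
// and the View forwards control values to the audio engine (e.g., as an RTPC).
struct IVehicleModel
{
    virtual float GetSpeed() const = 0;   // m/s, read from the game entity
    virtual ~IVehicleModel() = default;
};

struct IAudioView
{
    virtual void SetParameter(const char* name, float value) = 0;
    virtual ~IAudioView() = default;
};

// The ViewModel binds the two: it polls the Model, remaps the raw game value
// into the 0..1 range the designer authored against, and pushes it to the View.
class VehicleAudioViewModel
{
public:
    VehicleAudioViewModel(const IVehicleModel& model, IAudioView& view)
        : m_model(model), m_view(view) {}

    void Update()
    {
        const float speed = m_model.GetSpeed();
        const float normalized = std::clamp(speed / kMaxSpeed, 0.0f, 1.0f);
        m_view.SetParameter("vehicle_speed", normalized);
    }

private:
    static constexpr float kMaxSpeed = 40.0f;  // tuning value chosen for this example
    const IVehicleModel& m_model;
    IAudioView& m_view;
};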

More often than not, though, it's more convenient for the designer to work with a modified version. Some manipulation, such as scaling and clamping, can be done in the RTPC response curves in most audio middleware authoring tools, but it's often cleaner to modify the values before they're sent to the audio engine.

13.5 BLENDING GAME VALUES

Much like remapping, you can perform basic combinations of game parameters by adding multiple RTPCs to your objects and using your audio engine to add them together—but there are lots of other useful blend operations that aren't often supported, and trying to do it in the playback tool will often just complicate the designers' lives.

13.6 RTPC BLEND TREE

One very flexible method I've used for allowing designers to manipulate RTPC values is a blend tree. Each leaf node in the tree provides an RTPC value, and the parent nodes apply a blend operation and pass the results to their parent nodes, until the final value is reached at the root and can be sent to the desired parameter on the AudioView. Figure 13.2 shows an evaluator that takes the larger of two game parameters, then scales the result by 2.5.

Conceptually, this is very similar to a simple programming language's expression tree. While it's probably not a good idea to perform any heavyweight calculations this way, for the sake of both your game's performance and your designers' sanity, it does work well for performing a lot of commonly needed operations. Here are the core node types we use in the project I'm working on at the moment (a minimal code sketch of such a tree follows the list):

• Constant: Provides a constant floating-point value.

• Blend: Combines the values of one or more child nodes, using the blend types add, min, max, multiply, clamp, and average.

• Select: Selects between two child nodes, depending on whether a given condition is true or false. Our game engine has the concept of a Requirement, which is a predicate base class that returns true or false when evaluated on a given game entity. We have conditions which take other conditions, such as OrRequirement and AndRequirement, so in combination with the Blend node, the fairly simple tree structure does give us a lot of flexibility.

FIGURE 13.2  An evaluator that takes the larger of two game parameters, then scales the result.

• GameParameter: Retrieves an int or float parameter by name from the game entity the audio view is attached to.

• GameParameterMap: Using a dictionary-like key/value collection, this is used to retrieve non-numeric game entity data such as strings or enum values before mapping them to a floating-point parameter.

• GameParameterGraph: Much like GameParameter, this retrieves a floating-point value by name from the game entity but, before returning it to the blend tree, applies a piecewise linear function—much like the RTPC curves in Wwise.

• ActiveTagCount: Each of the audio objects and active audio events in the game can be tagged with a set of 32-bit hashed IDs, used for grouping sounds into collections of sounds that share certain properties. This node returns the number of active objects or events containing the passed-in tag. This might seem a little abstract but has turned out to be useful for all sorts of things. For example, in my current project, this node is used to drive a nearby_combat_intensity parameter by counting all the events within a given radius that contain the combat tag. Any sound event triggered by one of our WeaponView objects will have this applied.

• ControlSourceReceiver: Similar to ActiveTagCount, this node is a way for audio objects' parameters to be driven not by their own internal state, but by the state of objects around them. These read values from ControlSourceEmitters, which can be used to route parameters from one object to another, or to provide values for object-to-object parameters such as distance between objects. I'll go into more detail about Control Sources in Sections 13.8–13.10.
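As a rough sketch of the idea (using invented class names, not the chapter's actual implementation, and only the max blend for brevity), the tree can be modeled as polymorphic nodes that each evaluate to a single float, mirroring the evaluator in Figure 13.2:

#include <algorithm>
#include <functional>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical node interface: every node evaluates to one float.
struct IRtpcNode
{
    virtual float Evaluate() const = 0;
    virtual ~IRtpcNode() = default;
};

struct ConstantNode : IRtpcNode
{
    float value = 0.0f;
    float Evaluate() const override { return value; }
};

// Leaf that reads a named game value; the getter would be bound to the Model.
struct GameParameterNode : IRtpcNode
{
    std::function<float()> read_from_entity;
    float Evaluate() const override { return read_from_entity(); }
};

// "Blend" node restricted to the max operation.
struct MaxBlendNode : IRtpcNode
{
    std::vector<std::unique_ptr<IRtpcNode>> children;
    float Evaluate() const override
    {
        float result = std::numeric_limits<float>::lowest();
        for (const auto& child : children)
            result = std::max(result, child->Evaluate());
        return result;
    }
};

struct ScaleNode : IRtpcNode
{
    std::unique_ptr<IRtpcNode> child;
    float scale = 1.0f;
    float Evaluate() const override { return child->Evaluate() * scale; }
};

// Each update, the root's value would be pushed to the AudioView, e.g.
// view.SetParameter("engine_intensity", root->Evaluate());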

13.7 HEAVY

While I've found it's best to roll your own parameter manipulation code, Enzien Audio's Heavy¹ tool can be very useful as a quick-and-dirty method of allowing designers to do their own manipulation of game parameters. It takes Pure Data (PD) patches and compiles them into plugins for most popular audio engines. It was quick to use to create plugins that blend multiple parameters by taking the min, max, average, and so on, as I later implemented in our game engine as part of our parameter blend tree. Figure 13.3 shows an example PD patch which can be compiled with Heavy; it takes two input parameters and outputs the maximum to a third parameter.

13.8 CONTROL SOURCES

The core idea behind control sources is allowing audio entities to broadcast and/or receive control parameter values from other audio entities, giving the designers the power to build interesting interactions between them. They are a little like patch cords on modular synths—if the programmers concentrate on building boxes that do interesting and useful things, then hopefully the designers can connect them together in ways the programmers hadn't originally even dreamed of. Although, admittedly, there's an equally good chance that they'll use them in ways that may give the programmers nightmares.

FIGURE 13.3  A PD patch which takes two input parameters and outputs the maximum to a third parameter.

13.9 CONTROL EMITTERS

Much like an audio emitter, a control source has a position and an area of effect. But instead of being a source of sound, it is a source of something that controls the sound, like an RTPC. An audio emitter is only audible when within range of a listener object, and a control source only does anything when queried by a Control Receiver.

13.10 CONTROL RECEIVERS

Like an audio listener, a control receiver's job is to determine which control emitter objects are within range and apply those parameters. Not all emitters are of interest to all receivers, so each emitter and receiver has a channel ID they broadcast/receive on. Unlike a listener, there are a variety of ways to combine the received values beyond just summing. We have the following types in our system (a small sketch of a receiver update follows the list):

• Sum: Adds the values of any emitters within range and broadcasting on the receiver's channel.

• Average: The same as Sum, but the total value is divided by the number of active emitters.

• Nearest: Only the value of the nearest in-range emitter is received.

• Select: As with the Select node in the parameter blend tree, this uses our requirement system to filter out what would otherwise be candidate control emitters.
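As an illustrative sketch only (the types, fields, and 2D-vs-3D details are assumptions, not the chapter's code), a receiver update for the Sum and Nearest modes might look like this:

#include <cmath>
#include <limits>
#include <vector>

// Hypothetical control emitter: broadcasts one value on a channel within a radius.
struct ControlEmitter
{
    int   channel;
    float value;
    float radius;
    float x, y, z;
};

enum class ReceiveMode { Sum, Nearest };

// Returns the received control value at position (rx, ry, rz) for one channel.
float UpdateControlReceiver(const std::vector<ControlEmitter>& emitters,
                            int channel, ReceiveMode mode,
                            float rx, float ry, float rz)
{
    float sum = 0.0f;
    float nearest_value = 0.0f;
    float nearest_distance = std::numeric_limits<float>::max();

    for (const ControlEmitter& e : emitters)
    {
        if (e.channel != channel)
            continue;
        const float dx = e.x - rx, dy = e.y - ry, dz = e.z - rz;
        const float distance = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (distance > e.radius)
            continue;  // Out of range: this emitter contributes nothing.

        sum += e.value;
        if (distance < nearest_distance)
        {
            nearest_distance = distance;
            nearest_value = e.value;
        }
    }
    return (mode == ReceiveMode::Sum) ? sum : nearest_value;
}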

13.11 USES OF CONTROL SOURCES

13.11.1 Volumetric Control Emitters for Reverb Zones
One of the first uses of control emitters in our game was as a basic implementation of reverb zones. Since each emitter is essentially just a sphere controlling a game parameter, the distance from the center of the sphere can be used to control the send level of a reverb effect. Overlapping control emitters with carefully tuned radii and falloff parameters let us approximate some of the curvier and more organic areas where we needed reverb, such as a riverbed. It's worth noting that this was more a case of the designers doing something because it was the only way they had at the time than a way I would particularly recommend, but one of the nice things about giving your designers flexible tools is that even what you might consider misuse of a system can point to areas to improve or serve as requirements for entirely new subsystems.

13.11.2 Mixing
The first version of control sources in the game was written to perform a sort of dynamic mixing. Our game is RTS-like, and like any RTS, one of the big challenges for audio is dealing with closely packed units. If we have 50 vehicle units in an area, and they are all playing a loud engine sound, the game will start to sound loud and unpleasant pretty quickly. My first impulse was to simply limit the number of vehicle voices and put a limiter on the buss, but this meant that large groups of units would steal from small groups or single units, when often they were just as important to hear. Similarly, just using the limiter meant that single units were drowned out, as large groups activated the limiter. What we needed was some way of reducing the volume of the engines depending on how many other loud sounds were nearby. With this approach, dense groups of units would all reduce each other's volume substantially, but sparse groups and single units would be less affected. Figure 13.4 shows an example of this in action.

FIGURE 13.4  Control Surface mixing in action.

13.11.3 Controlling a 2D Layered Background Ambience

Initially, background ambience in our game was handled by simply placing a large number of positional 3D emitters and allowing the voices to be blended and culled by our audio middleware. While this sounded great, the large number of audio objects and simultaneously active streams was a performance issue, as well as cluttering up our audio profiler logs. To keep things to a constant number of streams and active audio objects, the audio emitters were replaced with control source emitters, with one control source channel for each ambience layer. The N ambience layers were instead triggered as a 2D sound on a single audio object, and the final volume of each layer was controlled by a control receiver placed at the listener's position, calculating the sum of the active control emitters within range of the receiver.

I've implemented similar systems on previous games to control things such as the volume of crowd audio layers in response to the number of nearby pedestrians, but those have usually been bespoke code—in this case, the designers were able to implement a similar system without coder intervention (occasional advice and discussion aside).

13.12 SPEEDING UP CONTROL SOURCES

My first naive implementation of control sources worked well but performed a distance check for every receiver against every emitter, every frame. Our game can have thousands of active objects, so running O(N²) tests like this meant endangering our frame budget pretty badly.

13.12.1 Spatial Hashing
Using spatial hashing improved the speed of the implementation dramatically. Rather than performing a distance check against every existing emitter, I was able to loop through the grid cells known to be within the radius of each receiver and build a collection of candidate emitters from their contents. This worked especially well when the majority of the emitters were offscreen, although it didn't help too much when multiple dense clusters of objects were within range. One problem I'm still looking for a solution to is finding a good way to trivially accept when an emitter should be active, as well as trivially reject.
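A minimal sketch of the idea (hypothetical types; the 2D grid, cell size, and hashing scheme are assumptions, not the chapter's implementation): bucket emitters into a uniform grid keyed by integer cell coordinates, then only visit the cells overlapped by a receiver's radius.

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct EmitterId { std::uint32_t value; };

// Hypothetical uniform grid over the XZ plane: emitters are bucketed by cell.
class EmitterSpatialHash
{
public:
    explicit EmitterSpatialHash(float cell_size) : m_cell_size(cell_size) {}

    void Insert(EmitterId id, float x, float z)
    {
        m_cells[Key(CellCoord(x), CellCoord(z))].push_back(id);
    }

    // Collect candidate emitters in all cells overlapped by the query circle.
    // Exact distance checks still happen afterwards, but only on candidates.
    std::vector<EmitterId> Query(float x, float z, float radius) const
    {
        std::vector<EmitterId> candidates;
        const int min_cx = CellCoord(x - radius), max_cx = CellCoord(x + radius);
        const int min_cz = CellCoord(z - radius), max_cz = CellCoord(z + radius);
        for (int cx = min_cx; cx <= max_cx; ++cx)
        {
            for (int cz = min_cz; cz <= max_cz; ++cz)
            {
                const auto it = m_cells.find(Key(cx, cz));
                if (it != m_cells.end())
                    candidates.insert(candidates.end(),
                                      it->second.begin(), it->second.end());
            }
        }
        return candidates;
    }

private:
    int CellCoord(float v) const { return static_cast<int>(std::floor(v / m_cell_size)); }
    static std::uint64_t Key(int cx, int cz)
    {
        // Pack the two signed cell coordinates into one 64-bit map key.
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
               static_cast<std::uint32_t>(cz);
    }

    float m_cell_size;
    std::unordered_map<std::uint64_t, std::vector<EmitterId>> m_cells;
};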

13.12.2 Reducing the Update Rate
Another big improvement was reducing the rate at which the control receiver values were calculated. A quick A/B test with our designers confirmed that we couldn't tell the difference between calculating the control values every frame (~30 fps) or once every 8 frames (~4 fps), especially with parameter rate limiting and interpolation enabled in Wwise. Having a calculation spike every 8 frames isn't substantially better than having it continually, so I added code to split the receivers into N buckets and updated only one bucket per frame. Amortizing update costs across multiple frames is a simple but effective technique I've used in many audio subsystems, so much so that it's often worth thinking about your likely required update rate when you start implementing one, and finding a consistent way to decouple it from the graphical frame rate.

13.13 CONCLUSIONS

As well as giving the designers more power, making an effort to keep our gameplay audio code modular and extensible, with as few underlying assumptions about the type of game we're making as possible, is paying dividends in making it reusable across multiple projects—both in terms of code reuse, and in providing a somewhat consistent cross-project interface for audio designers, who at many companies will work on more than one project, or have periods of helping out when one project is particularly busy. While much of this chapter describes work we've done for our custom game engine, a similar system should be easily applicable in most Entity Component System style engines, including commercially available ones such as Unity and Unreal.

REFERENCE

1. Downloadable from https://enzienaudio.com/.

CHAPTER 14

Data-Driven Sound Limitation System

Akihiro Minami and Yuichi Nishimatsu
Square Enix, Tokyo, Japan

CONTENTS
14.1 Introduction
14.2 System Design
     14.2.1 Data Driven
     14.2.2 Flexibility
     14.2.3 Simplicity
     14.2.4 Extensibility
     14.2.5 Macro
14.3 Implementation
14.4 Case Study
     14.4.1 Case 1
     14.4.2 Case 2
     14.4.3 Case 3
     14.4.4 Case 4
     14.4.5 Case 5
     14.4.6 More Complex Usages
14.5 GUI Design
14.6 Conclusion

14.1 INTRODUCTION

Looking back into the history of video game audio, the number of sounds that could be played was very limited due to hardware constraints.

When Dragon Quest and Final Fantasy first appeared, their hardware platform was strictly restricted to five channels of audio output. But today, after 30 years of advancement in technology, we do not need to worry about those hardware audio channel restrictions. The only factor that limits the number of sounds being played simultaneously is the CPU, and even considering the fact that we must share it with other systems, we have abundant power to play sounds beyond human cognition. But even if we went further and could play an arbitrarily large number of sounds—that is, playing sounds on every possible occasion—should we do it in video games? The idea of sound limitation begins with this question.

There are all sorts of hooks for sound effects in a game context: character motion, collision, VFX, HUD, landscape, etc. Although firing off the sound effects on all those occasions may be correct from a physical perspective, it may not always correctly reflect the sound designers' aesthetics. Even without taking psychoacoustics into account, sound effects for the player-controlled protagonist are much more important compared to those for a random NPC just passing by. This differentiation can be seen in graphics as well: non-player characters are intentionally authored and rendered to be much more inconspicuous compared to the protagonists.

And of course, that is not the only situation where there is a mismatch between the game world and the real world. Consider a situation where an enemy is about to attack the player-controlled character. A sound effect can be an effective warning of the oncoming attack, especially if the enemy is out of sight. When the attack would be fatal for the player character, that sound effect could be the most important aspect of the game at that moment. These decisions of choosing what to play, what not to play, and what to emphasize are an important part of video game audio design. We can see that our answer to the aforementioned question "should we play sounds on every possible occasion?" is "clearly not."

With that said, now we need to figure out how we are going to limit the sounds that are playing. One basic approach is to have a priority for each and every sound. That is, when the number of sounds exceeds a threshold, the one with the lowest priority will be stopped. Although this will work to a certain extent, it is not always something that will satisfy the sound designers' aesthetics. Some of the things they will say are the following:

- I don't want two of the same sound effect to be played simultaneously
- When there are too many sounds playing, I want the one with multiple instances to be stopped before others

- I don't want sound effect A to play when sound effect B is playing
- I want to duck other sounds down while this sound effect is playing

A basic priority-based limitation system is not able to handle these requests. But building and debugging a new system every time a sound designer comes up with a new limitation rule can become burdensome. In order to solve this problem in a generic fashion, we built a system called the Sound Limitation Macro System, or simply "Macro."

14.2 SYSTEM DESIGN

In designing the Macro system, there were four key concepts: data driven, flexibility, simplicity, and extensibility.

14.2.1 Data Driven
The task of sound limitation design must be in the sound designers' hands. Our system must be data driven in order to accomplish this goal: the data that the sound designers create is the main variable controlling the sound limitation. Also, this will decrease the cost of iteration, because the sound designers only need to replace the data instead of waiting for the programmers to fix the code. Note that even if the code modification were just a couple of lines, the sound designers might have to wait for hours to compile if this were done in code. This way of working is extremely powerful when adjusting audio scenes, where it can take copious amounts of trial and error to get things sounding right.

14.2.2 Flexibility
There are a variety of situations where sound limitation will come into play, and thus the system must be extremely flexible to accommodate any conditions that may happen in the game scene. One sound designer may wish to limit sounds based on characters, and another may wish to do it using a sound category, such as music, UI, or footsteps. Going even further, a very special sound limitation could become necessary on certain occasions, such as "do not play this sound if this and that are playing" or even "stop all sounds belonging to a certain category when this sound is played."

14.2.3 Simplicity
Given the above two requirements, we should never create a system that is incomprehensible to the sound designers, who are not programmers and therefore do not have strong backgrounds in algorithm design. If the system is too complicated, sound designers may have to ask the programmers every time to review whether they are doing the right thing. That interaction would be a failure for a system designed for sound designers to use. We also believe it is important to keep the system itself simple so that users new to Macro can quickly pick up on it. Being simple does not constrain the sound designers to simple use cases; it also enables the users to easily construct a complex Macro once they achieve mastery.

14.2.4 Extensibility
Lastly, we need to always be mindful of the future. In video game development, it is not unusual to extend a system out of unpredictable necessity. Therefore, we must avoid as much as possible the case of those unseen implementations being impossible due to system limitations.

14.2.5 Macro
To meet the above requirements, we decided to implement a command-based system. The series of commands is packaged with the sound effect data and triggered upon start, stop, or even every frame of its life. It is flexible due to the variety of commands that are available. It is a simple structure where a series of simple commands are executed one by one from top to bottom. New features can be added easily by implementing a new command. In designing, or scripting, the sound limitation, the macro instruction of commands is what the system revolves around, which led us to the name "Macro."

14.3 IMPLEMENTATION

Now that the design of the system is set, let's go to the actual code implementation. There are two types of commands in our Macro system: Filter commands and Execute commands. Filter commands are in charge of extracting the desired sounds from the mass of sounds that are currently in play at the moment. Once the search is done by the Filter commands, Execute commands will execute the sound limitation given certain conditions. Below are some of the commonly used Filter and Execute commands.

Filter Commands:

Filter: Clear
Filter: Same Sound
Filter: Same Category
Filter: Same Macro
Filter: Sound ID [parameter: Sound ID]
Filter: Category [parameter: Category Number]
Filter: Priority [parameter: Priority]
Filter: Priority Difference [parameter: Difference]
Filter: Panning [parameter: Panning]
Filter: Panning Difference [parameter: Difference]

Execute Commands:

Exec: Cancel Play
Exec: Stop Oldest
Exec: Stop Furthest
Exec: Set Own Sound Volume [parameter: Volume, Fade Time]
Exec: Set Sound Volume [parameter: Sound ID, Volume, Fade Time]
Exec: Set Category Volume [parameter: Category, Volume, Fade Time]

Note that some of the commands take parameter arguments. For example, the Exec: Set Own Sound Volume command takes Volume and Fade Time as its arguments. Using these parameters, sound designers are able to make adjustments to the volume of the sound that triggered the Macro. Also, Execute commands all have a common, extremely important parameter: count. This is the count of the sounds left in the list after the Filter commands. This variable enables the Execute commands to be executed only when the number of sounds matches a certain condition. Taking the Exec: Set Own Sound Volume command once again, it would be used as below.

Exec: Set Own Sound Volume if (count >= 2) [Volume = 0.5, Fade time = 1.0 sec]

Another useful type of command is the Difference Filter commands. Other Filter commands take an immediate value and find sounds with matching values, but Difference Filter commands are more complex. Let's look at Filter: Priority Difference as an example. This command takes Difference as its parameter and performs a subtraction between the priority of the sound that triggered the Macro and the priority of every other sound in play. With this command, sound designers are able to not just eliminate sounds with lower priority, but keep the sounds with close priority and kill all others with lower priority.

Taking the command design pattern, we believe the class structure is nothing difficult. Each Filter command and Execute command is a command class, and a static manager class that receives the Macro (a series of commands) from the sound at an arbitrary timing will execute the commands in order. There is absolutely no need to go back to what has already been done, so unlike the advanced implementations of the command pattern, only the execute function is implemented. That is all there is to it: simplicity is the ultimate sophistication.
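A minimal sketch of that class structure (hypothetical names and simplified data, not the authors' actual code) might look like the following; it also happens to mirror Case 1 in the case study below (Filter: Same Sound followed by Exec: Stop Oldest):

#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

// Simplified stand-in for a playing sound; the field names are assumptions.
struct SoundInstance
{
    std::uint32_t sound_id = 0;
    double        start_time = 0.0;
    bool          stop_requested = false;
};

// Shared context: the candidate list that Filter commands narrow down and
// Execute commands act upon, plus the sound that triggered the Macro.
struct MacroContext
{
    SoundInstance*              triggering_sound = nullptr;
    std::vector<SoundInstance*> filtered;  // starts as "all currently playing sounds"
};

// Only an execute function is needed; there is no undo, unlike richer
// implementations of the command pattern.
struct IMacroCommand
{
    virtual void Execute(MacroContext& context) const = 0;
    virtual ~IMacroCommand() = default;
};

// Filter: Same Sound -- keep only other instances of the triggering sound.
struct FilterSameSound : IMacroCommand
{
    void Execute(MacroContext& context) const override
    {
        const std::uint32_t id = context.triggering_sound->sound_id;
        auto& sounds = context.filtered;
        sounds.erase(std::remove_if(sounds.begin(), sounds.end(),
                         [&](const SoundInstance* s)
                         { return s == context.triggering_sound || s->sound_id != id; }),
                     sounds.end());
    }
};

// Exec: Stop Oldest if (count >= minimum_count)
struct ExecStopOldest : IMacroCommand
{
    int minimum_count = 1;
    void Execute(MacroContext& context) const override
    {
        if (static_cast<int>(context.filtered.size()) < minimum_count)
            return;
        auto oldest = std::min_element(context.filtered.begin(), context.filtered.end(),
            [](const SoundInstance* a, const SoundInstance* b)
            { return a->start_time < b->start_time; });
        if (oldest != context.filtered.end())
            (*oldest)->stop_requested = true;  // the engine stops it on its next update
    }
};

// The manager simply runs a Macro's commands from top to bottom.
void RunMacro(const std::vector<std::unique_ptr<IMacroCommand>>& macro,
              MacroContext& context)
{
    for (const auto& command : macro)
        command->Execute(context);
}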

14.4 CASE STUDY

Now, let's look at a few examples of Macros that have actually been written by our sound designers.

14.4.1 Case 1

Macro on play:
  Filter: Same Sound
  Exec: Stop Oldest if (count >= 1)

Let us start with a simple but frequently used Macro. This one is set as an "on play" Macro, meaning that it is triggered when the sound is about to be played. To walk through the commands, Filter: Same Sound will search for all the instances of the sound that triggered the Macro. Then Exec: Stop Oldest comes into action. Because the count parameter is set to 1, the Execute command will not do anything if the sound effect instance that triggered the Macro is the only one alive. However, when another instance of the same sound is about to be created, this Macro will stop the older sound instance. Although the count may vary depending on the sound effect, this is used to prevent the same sound from accumulating to the point where the mix gets too crowded. Applied with different combinations of Filter commands and count parameters, this is the most useful Macro when it comes to limiting the number of sounds played at once.

14.4.2 Case 2

Macro on play:
  Filter: Same Sound
  Exec: Cancel Play if (count >= 1)

This one seems to be very similar to the first one, but its outcome and usage are very different.

The change we made from the first example is in the second command: Exec: Stop Oldest has turned into Exec: Cancel Play. So, instead of stopping the older sound, we will now cancel the new sound instance that was about to be played. This is commonly used for character voices, because it is extremely unnatural for a character to suddenly stop in the middle of a sentence and start a new one.

14.4.3 Case 3

Macro on play:
  Filter: Same Sound
  Filter: Elapsed Time [Elapsed < 0.5 sec]
  Exec: Cancel Play if (count >= 1)

Again, working from the previous case, this one adds another Filter command before the Execute command. Filter: Elapsed Time takes a time parameter as its argument and filters out the sounds that have been playing for more than the given time (half a second in this example). So, translated into English, this Macro states "do not play this sound if the same sound has been played within the last half second." This is the type of Macro used to avoid repetition within a short amount of time.

This Macro is also sometimes used to avoid bugs due to the game behavior. For example, when footsteps are triggered from collision, a complex form of collision may result in awkward audio bugs such as triggering multiple footstep sounds in a single frame or several frames in a row. Near the end of the development phase, programmers and level designers are often reluctant to fix these bugs if their impact is confined to audio, because the changes may cause yet another bug while time is running out. With the power of Macros, sound designers can solve these problems on their own.

14.4.4 Case 4

Macro on play:
  Filter: Same Sound
  Filter: Panning Difference [abs(Difference) < 15 degrees]
  Exec: Set Volume if (count >= 1) [Volume = 0.5, Fade time = 0.1 sec]

Let's try something different now. In this example, the Filter: Same Sound command comes with a Filter: Panning Difference command. As we saw earlier, Difference Filter commands compute the difference between the sound that triggered the Macro and the sounds in play.

This one works on panning, checking whether the direction of the sound is within a range of ±15°. So, with this combination of Filter commands, the instances of the same sound effect that exist in the same general direction are collected. Then, the Exec: Set Volume command will change the volume of the sounds that were captured by the Filter commands; in this case, the volume is halved.

This type of Macro is sometimes used in a scene where the player is surrounded by a crowd of enemy characters of the same type. The sound designers do not want to play every single sound of the enemies at full level, but at the same time, they want to be clever about it and keep the sense of being surrounded. Looking only at the numbers does not help maintain spatialization, so they had to use the difference between the panning vectors. This way, older sounds coming from the same direction will be ducked down while the fresh ones are clear on their attack.

14.4.5 Case 5

Macro on play:
  Exec: Set Category Volume if (count >= 0) [Category = BGM, Volume = 0.3, Fade time = 1.0 sec]

Macro on stop:
  Exec: Set Category Volume if (count >= 0) [Category = BGM, Volume = 1.0, Fade time = 1.0 sec]

Macros can work in combination as well. All sounds that are started must stop at some point, so making a pair of Macros on play and on stop can be useful for making changes back and forth. In this case, the Macro works as a ducking process to turn down the background music volume. This is commonly used for music to avoid a dissonant mixture of two musical chords.

14.4.6 More Complex Usages

The examples above only contain one Execute command per Macro, but of course, multiple Execute commands are allowed in a single Macro. Not only that, the user may use another Filter command after the first Execute command to narrow down the target sounds of the second Execute command. If the sound designer wishes to put two totally different processes in a single Macro, that is possible as well.

The Filter: Clear command will clear the current result of the Filter commands and get the user ready to start a whole new process.

14.5 GUI DESIGN

We have implemented the Macro system, but we still have some work left to do: creating a user-friendly tool to write the Macros. Again, the users who write the Macros are not programmers, and we definitely should not tell the sound designers to write scripts like the above in text editors. There is absolutely no need for them to type every single command word for word only to find themselves stuck on syntax errors. What we implemented is a spreadsheet-like GUI where the first column contains the command name, and the following columns contain the numerical parameters. Command names are selected from a list that appears upon clicking the add command button, and the rest are simple numerical text inputs. By using a list of commands, we can avoid a variety of human errors such as misspelling, misremembering, and making up commands that do not even exist. This user interface has a huge advantage for the rest of the system, as we can save the data in a structured format and do not have to write a text analysis algorithm. We can replace each command with an enumerated command ID at tool time and simply save and load the numbers.

14.6 CONCLUSION

Using a data-based, script-like sound limitation system, sound designers are able to control the psychoacoustics of the game and emphasize what is important in the audio scene. With the power of the system, sound designers are much more capable of supporting the gameplay, enlivening the narrative, and even alleviating (or heightening) the players' stress. There is great merit for programmers as well. The computation resources freed by limiting the number of sounds can be used for other audio algorithms such as audio propagation, signal processing, and procedural audio synthesis to enhance the audio quality. We believe the Sound Limitation Macro System is one of the most versatile solutions to "level up" our games.



CHAPTER 15

Realtime Audio Mixing

Tomas Neumann
Blizzard Entertainment, Irvine, California

CONTENTS
15.1 Introduction
15.2 Clarifying Terminology
     15.2.1 Mixing
     15.2.2 Offline Mixing
     15.2.3 Realtime Mixing
15.3 The Purpose of Realtime Mixing
15.4 Realtime Mixing Techniques
     15.4.1 Playback Limit
     15.4.2 HDR Audio
     15.4.3 Importance
     15.4.4 Player Perspective
     15.4.5 Busses and DSP Chain
     15.4.6 Mix States
15.5 Monitoring
15.6 Other Mix-Related Audio Features
15.7 It Is Mixing Time
15.8 Future Techniques
15.9 Conclusion
References

15.1 INTRODUCTION

In-game mixing is a very broad topic and nothing new conceptually.

The first primitive forms of dynamic volume changes or sound swapping in video game history happened, I am sure, several decades ago. Since then, the industry has understood that creating great-sounding audio assets is only one part of a game's audio. The other part is to make them sound great together. Video games have continued to sound better and better over the past several years, and that is to a large extent because the quality of the in-game mixing is continually improving. There have also been some excellent articles written about this topic, as well as presentations at conferences [1–3]. New technologies were invented and integrated into game engines and audio middleware. You can find links to some of them at the end of this chapter. You might think this should be a settled case by now. So, given the amount of content already out there about this subject, why should we address it in a dedicated chapter in an audio programming book?

Even after many years and fantastic advances in great games, realtime mixing is still an open field with new technical approaches yet to be discovered by all of us. That's why I think it is valuable to write this chapter specifically for a programming audience, and hopefully bring you up to speed in an informative and entertaining way on the following pages. There are many tools at our disposal already, so we first need to establish what we mean when we talk about realtime mixing. I will clarify the terminology for the different ways a mix can be manipulated efficiently. You will understand how and why it evolved, and where it might be heading, so your contribution can advance this field even further. What you will not find in this chapter are very detailed and explicit implementation examples for each method we discuss. Each of those deserves its own chapter in a future book. I hope you will gain some insight into what to keep in mind when you implement your feature set for realtime mixing.

15.2 CLARIFYING TERMINOLOGY

We need to take a quick moment to agree on basic terms and interpretations. This will make it easier to follow this chapter, and also to talk to other people in the game audio field such as sound designers, other programmers, or middleware representatives.

15.2.1 Mixing
In short, mixing for a video game is the act of bringing all the audio assets of different disciplines, such as music, voice, ambiance, SFX, or UI, together in a way that listening to the result is enjoyable, not fatiguing, and supports the gameplay.

15.2.2 Offline Mixing
Mixing offline describes the process of defining the volume and frequencies of audio assets in a separate program (usually a DAW), so that the result is statically saved out and then used in the game exactly as authored. This definition of mixing is pretty much identical to mixing for TV or movies.

15.2.3 Realtime Mixing
In this chapter, we will focus on mixing in real time, so that properties such as the volume or frequencies of the audio assets are changed while the game is running. This requires additional computational effort. There are two different types of realtime mixing: passive and active.

Passive Realtime Mixing
Passive mixing occurs when the metadata and behavior of the audio are authored in a static way (for example, routing and ducking of busses), so that the audio signal itself changes dynamically by being routed through a DSP effect. One example would be a compressor on the master bus, or a sidechain ducking of the music bus by the voice bus.

Active Realtime Mixing
We speak of active mixing when events in the gameplay manipulate the mix dynamically, and the authored audio assets are changed on the fly. A classic example would be the simulation of a tinnitus effect when the game detects that an explosion occurred very close to the player, by ducking the SFX bus for a short time and playing a ringing sound.

15.3 THE PURPOSE OF REALTIME MIXING

Many reasons come to mind when you ask yourself why offline mixing would not be enough for a video game. Affecting the audio dynamically to enhance the player's experience is desirable because it allows us to decide what this experience should invoke. Similar to choosing different music for certain gameplay moments, realtime mixing is used to influence the state of the player in a way that deepens their engagement with the game [4]. It is very important to understand that different games need different mixing. There is no single best way. A sports game requires different techniques and results than a horror game or a procedural open-world game. I encourage you to experiment with the techniques your game needs and discuss the features with the sound designers so that everyone works toward the same goal.

15.4 REALTIME MIXING TECHNIQUES

Now that we understand what realtime mixing means and what it is for, we can look at how different techniques each contribute a certain part toward the broader concept.

15.4.1 Playback Limit
Since the beginning, every gaming platform has had a limitation on how many sounds can be played simultaneously. Depending on the hardware and terminology, this can also be called a channel or voice limit. The first PlayStation, for example, had an impressive 24 channels. But often the game can request more sounds at one time than the hardware supports. So, you want to choose which sounds to play physically and which should be rejected or virtually simulated, so they can be easily swapped back in when a slot opens up again.

One of the first techniques to achieve this was to assign a priority number to each sound, sort by it, and then play the first n instances which the hardware supported. This way, a gunshot sound could win over a reload sound. The problem with only using this approach in a 3D game is that the gunshot could be very far away and the other sound much closer and louder. It is rather common now to define a basic priority value, and additionally change the value up or down depending on the listener's distance to the sound. You can achieve a very similar effect by using the audio signal strength or volume to cull the sounds which are simply quieter than the n loudest sounds. The problem with this approach is that a bee sound which was authored pretty loud could now win over a gunshot sound. As a result, we have lost the meaning of the sounds. Combining both techniques is a great start to work within the hardware limitation of your platform.

Powerful modern hardware allows hundreds of sounds, and the challenge is to make the result sound clear and not muddy. That's why, instead of a global playback limit, we use scoped limits on busses, categories, or objects. If your game does not need unlimited gunshot sounds, then this bus could be limited to, for example, 30 instances, so other sounds still have a chance to appear. This creates a more diverse spectral audio signal and keeps the gunshots which do play a bit clearer.
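A minimal sketch of the combined approach (hypothetical types; the scoring formula and falloff constant are assumptions, not from the chapter): compute an effective priority from the authored base priority and the listener distance, then let only the n best candidates play physically.

#include <algorithm>
#include <vector>

// Hypothetical voice candidate: base priority is authored, distance is measured
// from the listener each frame.
struct VoiceCandidate
{
    int   voice_id = 0;
    float base_priority = 0.0f;   // higher is more important
    float distance = 0.0f;        // meters from the listener
    bool  should_play_physically = false;
};

// Effective priority: the authored value, reduced the farther away the sound is.
static float EffectivePriority(const VoiceCandidate& v)
{
    const float distance_penalty = v.distance * 0.1f;  // example tuning value
    return v.base_priority - distance_penalty;
}

// Mark the best max_physical_voices candidates to be played physically; the rest
// would be rejected or tracked virtually by the engine until a slot opens up.
void ApplyPlaybackLimit(std::vector<VoiceCandidate>& candidates,
                        size_t max_physical_voices)
{
    std::sort(candidates.begin(), candidates.end(),
              [](const VoiceCandidate& a, const VoiceCandidate& b)
              { return EffectivePriority(a) > EffectivePriority(b); });

    for (size_t i = 0; i < candidates.size(); ++i)
        candidates[i].should_play_physically = (i < max_physical_voices);
}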

15.4.2 HDR Audio
Another technique that can push other sounds out of being physically played is called high dynamic range (HDR) audio, which simulates the ear's impressive ability to hear quiet and loud sounds, but not necessarily at the same time. There are a few different methods, all with their own pros and cons. At its core, HDR audio is implemented by creating a distinction between the volume of a sound file and its loudness. Without HDR, a sound designer would author the loudness into the file using a DAW. If we represent volume as a float from 0.0 to 1.0, then a quiet sound would peak somewhere around 0.2, while a loud sound would be around 0.9. The same result can be achieved with normalized sounds, so both files peak at 1.0, but some volume metadata is attached to the sounds so that at runtime they get scaled to 0.2 and 0.9, respectively. The limitation of this system is that we cannot effectively represent a volume difference of 10 or 20 times, because the quiet sound would not be audible, and we cannot peak higher than 1.0 without introducing clipping.

HDR solves this problem by using normalized audio files, and instead of defining volume we add a loudness value to the files. As an example, the gunshot could be 130 dB and the bee sound could be 20 dB. Then we define the height of a moving decibel window, which can vary for a given output system, such as a TV or high-quality speakers. In our example, we'll say that it is 100 dB tall. When the bee sound plays with a loudness of 20 dB, it will fit through the window and is audible. Once the gunshot starts, however, the upper edge of the window is pushed toward 130 dB, and everything under the lower edge of 30 dB will be culled. This creates the impression that the gunshot is so loud that it completely covers quieter sounds. Once the gunshot rings out, the window starts to shift down again toward the now-loudest sound (Figure 15.1).

FIGURE 15.1  The gunshot sound pushes the lower edge of the HDR window up and effectively cuts out the bee sound for a while.

HDR is very effective for keeping the mix clear in different loudness scenarios, but it also has limitations. If the loudness values are authored statically, then HDR cannot differentiate between important and less important loud sounds. For example, the same gunshot aimed straight at you or slightly beside you is treated identically.
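To make the windowing concrete, here is a small sketch (hypothetical structures; the attack and release behavior of a real implementation is omitted): the top of the window follows the loudest playing sound, and anything below the window's bottom edge is culled.

#include <algorithm>
#include <vector>

// Hypothetical HDR voice: loudness is authored metadata in dB, while the audio
// file itself is normalized.
struct HdrVoice
{
    float loudness_db = 0.0f;
    bool  audible = true;
};

// Cull every voice whose loudness falls below the moving window. The window's
// upper edge snaps to the loudest active sound here; a real implementation
// would ease it up and down over time rather than jumping instantly.
void ApplyHdrWindow(std::vector<HdrVoice>& voices, float window_height_db)
{
    if (voices.empty())
        return;

    float window_top_db = voices.front().loudness_db;
    for (const HdrVoice& v : voices)
        window_top_db = std::max(window_top_db, v.loudness_db);

    const float window_bottom_db = window_top_db - window_height_db;
    for (HdrVoice& v : voices)
        v.audible = (v.loudness_db >= window_bottom_db);
}

// With the chapter's example values (gunshot 130 dB, bee 20 dB, 100 dB window),
// the window bottom sits at 30 dB while the gunshot plays, so the bee is culled.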

15.4.3 Importance
I recently contributed to a mixing technique we simply called Importance [5]. Our game analyzes how important an opponent is in regard to you by looking at the damage dealt, your size on their screen, their distance and view direction, or whether they have you in the center of their sniper scope. All the values are weighted and combined, and then we sort the opponents into buckets of different sizes. The most important bucket holds the opponent that is most relevant to the player. The other buckets hold opponents who are less relevant or can be safely ignored. Each bucket drives most of the mixing techniques mentioned above on all the sounds related to the opponents in it. We change the priority of the most important opponent dynamically to make sure the most valuable sounds are reliably heard on the platform. We change the volume and pitch to simulate an HDR effect, where non-important sounds are pushed out of the audible window. But what was most important to us was that we don't statically author loudness by category, such as footsteps or gunshots, because depending on the gameplay situation both could be critically important to help you hear your opponent's approach and learn to listen to the game to become a better player [6]. This allows us to solve the dilemma that the footsteps of someone sneaking up behind you with a loaded shotgun should be much louder than the gunshots of someone firing at you from a distance.
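As an illustration only (the weights, bucket sizes, and names below are invented for the example, not taken from the shipped system), the scoring and bucketing could be structured like this:

#include <algorithm>
#include <vector>

// Hypothetical per-opponent inputs to the importance score.
struct OpponentState
{
    int   opponent_id = 0;
    float recent_damage_dealt = 0.0f;   // damage dealt to the player recently
    float screen_size = 0.0f;           // 0..1, how large the player is on their screen
    float distance = 0.0f;              // meters
    bool  aiming_at_player = false;
};

struct RankedOpponent
{
    int   opponent_id = 0;
    int   bucket = 0;        // 0 = most important
    float importance = 0.0f;
};

// Weighted sum of the inputs; all weights are example tuning values.
static float ImportanceScore(const OpponentState& o)
{
    float score = 0.0f;
    score += o.recent_damage_dealt * 1.0f;
    score += o.screen_size * 50.0f;
    score += (o.aiming_at_player ? 25.0f : 0.0f);
    score -= o.distance * 0.5f;   // farther opponents matter less
    return score;
}

// Sort by importance and assign buckets of fixed sizes (e.g., 1, 3, everyone else).
std::vector<RankedOpponent> BucketOpponents(const std::vector<OpponentState>& opponents)
{
    std::vector<RankedOpponent> ranked;
    for (const OpponentState& o : opponents)
        ranked.push_back({ o.opponent_id, 0, ImportanceScore(o) });

    std::sort(ranked.begin(), ranked.end(),
              [](const RankedOpponent& a, const RankedOpponent& b)
              { return a.importance > b.importance; });

    for (size_t i = 0; i < ranked.size(); ++i)
        ranked[i].bucket = (i < 1) ? 0 : (i < 4) ? 1 : 2;

    // Each bucket then drives priority, volume, and pitch offsets on the sounds
    // belonging to the opponents it contains.
    return ranked;
}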

15.4.4 Player Perspective
I want to write about an example in which sound asset authoring itself can be interpreted as realtime mixing. It is easy to imagine how one sound can be used in different contexts. For example, a boot footstep on a stone surface can be used on the player, an enemy, or an NPC. I like to look at this scenario and ask myself: should that sound be different when perceived through the player's perspective?

If I play in first-person view, then my footsteps could sound less spatialized and give me a good understanding of what surface I am walking on. If an NPC is walking by, then maybe I do not really care about hearing her footsteps at all, unless no other sounds are around. But when an enemy is walking around, I want to understand really quickly that there is a potential threat, over a larger distance [7]. We could use this sound in different perspectives. 1P stands for first-person of the player, 3P is the third-person view of the player, and 3PR stands for third-person of a remote entity (Table 15.1).

TABLE 15.1  Example of How Differently Authored Audio Assets Affect the Mix

Perspective      Spatialization   Radius (m)
1P               2D               —
3P               3D               10
3PR (enemy)      3D               70
3PR (friendly)   3D               30

We can achieve the different 3PR entries either by authoring different sounds with different radii, or by combining the player perspective with the Importance technique. Then the fact that a friendly unit is less important would result in a steeper falloff curve of 30 m, and we can make the sound quieter and even use different sound layers compared to the important enemy version.

Another example we used a lot was to have different bullet impacts for player-generated shots than for impacts triggered by other players. Authoring different sound assets gives the sound designers the freedom to define large falloff radii for the player, so she can hear her glass shatter from far away; but when the same window is shot by someone else, the player might not be interested in this at all.

15.4.5 Busses and DSP Chain
The sound designers will creatively decide the layout of the busses. Modern games can easily have up to a hundred busses, with unique needs for DSP effects, reverb properties, or audio signal routing. As mentioned above, the bus on which critical dialog is played could be side-chained to drive a compressor effect on the music bus.

The more that sound designers inflate these structures and exercise control, the more computational cost is needed. It is very valuable to develop CPU budgets, in addition to those for the decoding of audio files, so that the additional signal mixing stays under control.

15.4.6 Mix States
Imagine a modern mixing console with different channels and all their knobs for volume and pitch. Years ago, someone in the studio would need to write down every single value so a band could play the same song identically. Mix states are essentially snapshots of the audio properties of a bus hierarchy, which we can restore, blend in and out, and discard dynamically at runtime. A decade ago, games such as Heavenly Sword and Hellgate: London used techniques like that. I also programmed a mix state system for CryEngine called "SoundMoods," which was able to blend multiple mix states together across different properties such as volume, low pass, and pitch, and then apply them to different busses. That way we could create a specific mix for complex gameplay situations, for example, being low on health and driving a tank after crashing into a lake. Since then, most audio middleware engines have integrated and improved this technique, and now it is much easier for a programmer to add a mix state feature for sound designers for all sorts of use cases. Mix states can be activated by different areas in a map, by changing player stats such as health, or by player actions such as firing a gun. It is a powerful tool of realtime mixing, and it is not hard to imagine a great use case for your game.

When you design a feature like this, think about which states are mutually exclusive, or whether you need to have multiple states active at the same time. Just like looping sounds, you never want to leak a sound state. It can be useful to organize states in independent groups, for instance, UI and gameplay. The UI group could have states such as "main menu," "options," "team roster," and "score flash." The gameplay group could have states such as "isShooting," "isDiving," "usesUltimate," and "isDead." Both groups can be active independently and be reset at specific times.
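A small sketch of such grouping (a hypothetical API for bookkeeping only; real middleware such as Wwise exposes its own state mechanism, and the actual property blending is left to the engine): each group tracks its own active states independently, and activation is always paired with a deactivation or a group reset so that states are never leaked.

#include <string>
#include <unordered_map>
#include <unordered_set>

class MixStateManager
{
public:
    void Activate(const std::string& group, const std::string& state)
    {
        m_groups[group].insert(state);
        // Here the engine would also start blending the state's snapshot in
        // (volumes, low-pass settings, pitches) over its fade time.
    }

    void Deactivate(const std::string& group, const std::string& state)
    {
        m_groups[group].erase(state);
        // ...and blend the snapshot back out.
    }

    // Reset a whole group at a well-defined time (e.g., when leaving the front
    // end), which protects against leaked states.
    void ResetGroup(const std::string& group)
    {
        m_groups[group].clear();
    }

    bool IsActive(const std::string& group, const std::string& state) const
    {
        const auto it = m_groups.find(group);
        return it != m_groups.end() && it->second.count(state) != 0;
    }

private:
    std::unordered_map<std::string, std::unordered_set<std::string>> m_groups;
};

// Usage: manager.Activate("gameplay", "isShooting"); ... later
// manager.Deactivate("gameplay", "isShooting"); manager.ResetGroup("UI");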

The unit is usually displayed as dB, and most middleware presents this information when connected to the game. But, depending on the needs of your game, you can think about other places to display this information. If your game has an in-game overlay, for example, it can be invaluable to enable a debug mode that displays a VU meter on the screen.
The second piece of information that sound designers are interested in is a spectral view of the frequency distribution of a sound, a bus, or a game object. This allows sound designers to see where a sound sits spectrally and how it combines with other sounds. A mix that reduces the overlap of sounds in the frequency domain is easier to listen to and can hold more information for the player.
In addition to volume and frequency spectrum, CPU and memory consumption can be very important values to monitor. For example, the audio middleware Wwise by Audiokinetic allows connecting its authoring tool to the running game in order to monitor various statistics, and other audio middleware solutions provide similar tools. This capability provides visibility for sound designers to stay within budget with their DSP and mixing costs (Figure 15.2).

FIGURE 15.2  Performance graphs in Wwise.

15.6 OTHER MIX-RELATED AUDIO FEATURES
If you define mixing very broadly, then every falloff curve, compression setting, or reverb setting would count as a mixing technique. They do affect the outcome of the audio signal, but their main purpose is to create a specific effect. Falloff curves simulate distance, compression settings control data sizes and decoding load, and reverb simulates environmental dimension and changes. But they can have a noticeable effect on the overall mix.
Another example is how you compute obstruction and occlusion in a game to simulate sound propagation. Newer approaches with voxel-based

data sets and path data can be used to create impulse responses, which strongly manipulate how a player hears a sound, and in which situation or location.
The basic functionality of human hearing relies on only three core abilities:
• Analyzing changes in the frequency domain due to distance, obstruction, or the incoming angle affected by your ear flap
• Detecting time delay and volume differences of incoming sound waves between the two ears due to the extra distance to reach around your head
• Filtering of repetitions in the audio signal caused by early and late reflections
When we use DSP effects, we try to replicate these signal changes to trick the ear into "believing" our virtual sound world. If we apply a low-pass filter to convey that the player has low health, we run the risk that the ear might also mistakenly process this as sounds being further away. In that sense, certain mixing effects can combine in a counterproductive way that influences the psycho-acoustic experience negatively. This can be a risky pitfall, and it is good practice to be aware of harming the player experience by applying all kinds of DSP effects onto the sound signal.
Another application which might be negatively affected by filtering is Spatial Audio, the process of enhancing the virtual positioning of sounds in AR/VR by using head-related transfer functions. Depending on the incoming angle of a spatialized sound, the signal gets filtered with high and low passes. If your obstruction and mixing techniques already apply those, then the quality of the perceived positioning can be harmed.

15.7 IT IS MIXING TIME
It is very common in the industry for the sound designers to dedicate some time shortly before shipping the game to do a final mix, often in a different room or facility. There is great information available about how to start up such a session from a creative point of view and how to avoid fatigue. I want to add some thoughts specifically for programmers, who should try to prepare, participate in, and support these efforts.

Here are a few features that allow sound designers to be most efficient during a mix session:
• Mobility: The game can be moved and transported to different rooms or external facilities, potentially without a connection to game servers or databases.
• Security: To avoid data breaches and leaks, all game data should be encrypted and securely stored overnight and during breaks.
• Connectivity: Changes to the mix are stored locally and can be easily synced with the game data at the studio, or transported in any other form, while avoiding merge conflicts with changes made at the studio. One brute-force solution is to keep all audio data exclusively checked out while the mix session is ongoing.
• Independence: Changes to the game data can be applied on the spot, without the need to wait for a sync or an overnight build from the studio.
• Iteration: Allow sound designers to listen, change, and verify their work as fast as possible. Often games allow a runtime connection to audition changes on the fly.
• Information: Sound designers can quickly detect which sound is currently playing in the game and how to find it in your tool set. They can also easily visualize sound properties in the game, such as the 3D max radius.
• Monitoring: Loudness and frequency values can be easily observed and tested.
• Filter: Sounds can be filtered/muted/soloed by category, mix, bus, or discipline so these sounds can be easily mixed in a consistent way.
• Efficiency: Any part of the game can be easily reached by cheats and repeated, instead of needing to reach it by actually playing.
There is one potential pitfall to a dedicated mix session, especially if no game designer is present: the core gameplay could be mixed more toward sounding cinematic than toward supporting the game, but this is clearly the responsibility of the design experts. However, if most of the features above are available to the sound designers during the development of the game,

they can react to and apply feedback from testers and designers efficiently, so that the game's mix is already in great shape. Essentially, the game is mixed as we go. This is also valuable when showcasing the game internally or externally to publishers. The final mix session might then be very short, less costly, or even unnecessary altogether.

15.8 FUTURE TECHNIQUES
We can talk about where realtime mixing techniques will go next by asking two questions:
• What mixing will the games we will be making in the future need?
• What technology will become available, through increased resources, for mixing games similar to the current ones?
I think we might see more AR/VR games on mobile devices, at home, or as spectator experiences at larger (eSports) events. These games would require extra attention to mixing for low-quality speakers or headphones, or for loud background noise. Maybe we will see noise cancellation technology appear in AR audio mixes.
AR and VR typically require more explicit 3D sounds, because 2D sound beds do not work well with head movement. These games could use their understanding of the real or virtual world around the player to actively affect not only the reverb, but the mix itself. Cameras and microphones could be driving forces in adapting the mix for each player.
More powerful hardware will allow us to compute more of what we already know. We can author more complex sounds, and turn more layers on and off depending on the mixing requirements. It is very likely that machine learning will be integrated into the audio process to assist sound designers in finding the best mix. If thousands of players do not react to a critical sound, these algorithms can detect that and either change the mix accordingly, or bubble this information up to someone who can. Maybe it is AI itself which can listen to a mix and tell us if it was able to play the game better with it.
Regardless of what technology is at your disposal or you are able to develop, I encourage you to always question whether the game actually needs it. Does the player get an advantage out of a mixing technique? Avoid using something just because everyone else is doing it. Defining

and solving your unique needs could lead you to an innovative discovery we currently cannot imagine.

15.9 CONCLUSION
By now, I hope you have gained good insight into the complexity of realtime mixing and into the great techniques that are already available for you to use. It is clear that we have not reached the perfect solution yet, because all games are different. Not only do our mixing techniques keep getting better, but they can also interfere with each other, and there are plenty of technical challenges for you to discover and solve.

REFERENCES
1. Garry Taylor, Blessed are the Noisemakers, February 20, 2012. http://gameaudionoise.blogspot.com/p/all-in-mix-importance-of-real-time.html.
2. Rob Bridgett, The Game Audio Mixing Revolution, Gamasutra, June 18, 2009. www.gamasutra.com/view/feature/132446/the_game_audio_mixing_revolution.php.
3. Rob Bridgett, The Future of Game Audio—Is Interactive Mixing the Key, Gamasutra, May 14, 2009. www.gamasutra.com/view/feature/132416/the_future_of_game_audio__is_.php.
4. Etelle Shur, Remixing Overwatch: A Case Study in Fan Interactions with Video Game Sound, Claremont Colleges, 2017. http://scholarship.claremont.edu/cgi/viewcontent.cgi?article=2019&context=scripps_theses.
5. Steven Messner, How Overwatch Uses Sound to Pinpoint Threats You Can't See, PC Gamer, March 21, 2016. www.pcgamer.com/how-overwatch-uses-sound-to-pinpoint-threats-you-cant-see/.
6. Marshall McGee, Is Overwatch the Best Sounding Shooter Ever? www.youtube.com/watch?v=MbV_wKScrHA.
7. Scott Lawlor, Tomas Neumann, GDC 2016, Overwatch—The Elusive Goal: Play by Sound. www.youtube.com/watch?v=zF_jcrTCMsA.



CHAPTER 16
Using Orientation to Add Emphasis to a Mix
Robert Bantin
Massive Entertainment - An Ubisoft Studio, Malmö, Sweden

CONTENTS
16.1 A Basis of Reality
16.2 A Practical Example of Heightened Reality Mixing
16.3 How Can We Model the Dub Mixer's Behavior in a Non-scripted Environment?
16.4 Microphone Polar Pattern Modeling
  16.4.1 The Polar Pattern Function
  16.4.2 Morphing between Polar Patterns
  16.4.3 Regression Analysis to Find the Morphing Function
  16.4.4 Optimizing for Cosine
  16.4.5 Converting between the Dot Product Direction Test and Theta
16.5 Implementing SIMD Optimizations
  16.5.1 SIMD Optimizing the Function f(x)
  16.5.2 SIMD Optimizing the Function h(θ)
16.6 The Overall Implementation So Far
16.7 Listening to the Result
16.8 An Arrangement That Needs Less Management
16.9 A Final Word on Azimuth and Zenith Orientation Handling
16.10 Conclusion

16.1 A BASIS OF REALITY
When we think about designing sound placement in 3D games, we tend to start out with a basis of reality and build our bespoke systems on top of that: attenuation is based on distance to the listener; the loudspeaker matrix is based on orientation to the listener. With the exception of 3rd-person player cameras (a special topic handled in Chapter 11 of Game Audio Programming: Principles and Practices, Volume 1), it should be pretty straightforward, and for a low number of sound-emitting actors, it usually is. However, with the technological advances we've had over the last few console generations, the number of sound emitters we've been allowed to use has increased dramatically. Add to that the merit of being able to use ambient processing in multiple zones to create an ever more detailed environment, and we should be getting ever nearer to our understanding of what "reality" should be. And yet with all that, the audio mix can end up sounding muddled and undesirably chaotic—much like a documentary film of a gun battle with location sound. If this is happening to you, you should not feel bad. After all, it is exactly what you designed in the first place! If it's not what you wanted, then perhaps consider that what you were looking for was not "reality," but rather "heightened reality."

16.2 A PRACTICAL EXAMPLE OF HEIGHTENED REALITY MIXING
In film postproduction (and let's face it, for most of us that is the "gold standard"), the dubbing stage mixer will be very selective in what they bring into focus in the mix and what they'll allow to be masked, and this will change from moment to moment. Consider the government lobby scene from the first Matrix movie. It is constantly isolating some sounds while blurring out others: single bullets leaving the chamber of a single gun when the film cuts to the close-up of that gun, and wall tiles cracking to bullet impacts when the film cuts to the close-up of a wall. Meanwhile, every other sound is noticeably attenuated or low-pass filtered for that moment. I've picked this movie because it is an extreme example of the concept I'm trying to illustrate, but in lesser forms it is rather commonplace in film mixing. It brings emphasis to the mix so that the viewer can be drawn to what's important.

16.3 HOW CAN WE MODEL THE DUB MIXER'S BEHAVIOR IN A NON-SCRIPTED ENVIRONMENT?
Normally we can't have the creative mix of a human in games unless it's a cutscene. In the more typical scenarios, the player has agency over the

point of view, NPCs will have complex behaviors, and then there's all that rigid-body physics triggering sound effects to boot. It all makes for a highly emergent (and therefore unpredictable) system—essentially the exact opposite of a movie, in fact. What we can do, though, is define certain mixing behaviors that we can apply to some of our sound-emitting actors. We just need to decide what those rules are and what to apply them to.

16.4 MICROPHONE POLAR PATTERN MODELING
When you think about it, a listener whose sensitivity has no preference over its orientation to an emitter is rather like using an omnidirectional mic on a film set and then simply throwing the result at the audience. For reasons that should now be apparent, that isn't always useful, which is why on-set sound recordists and Foley artists will usually exploit the discriminatory nature of directional mics to control what gets captured. We can't change how our sound assets were recorded after the fact, but what we can do is apply a similar emphasis rule (attenuation and/or filtering) to some of the emitters based on their orientation to the listener, i.e., prioritize the perceptual loudness of some sound-emitting actors according to how central they are to the player's view (Figure 16.1).

FIGURE 16.1  How emitter positions could be emphasized as if the listener was like a single cardioid-pattern mic.
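Before deriving the details, here is a rough preview of the shape of the system we are heading toward: every frame, each sound-emitting actor is given an emphasis value based on its orientation to the listener, and that value drives whatever the sound designer hooks up to it (attenuation, filtering, and so on). Everything in this sketch is a stand-in: the vector types are minimal, and emphasisGain() uses a fixed cardioid-like response instead of the morphable polar pattern developed over the rest of this chapter.

#include <cmath>

struct Vector3 { float x, y, z; };

static Vector3 normalized(const Vector3& v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

static float dot(const Vector3& a, const Vector3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

struct Listener
{
    Vector3 position;
    Vector3 forward; // normalized orientation
};

struct Emitter
{
    Vector3 position;
    float emphasis = 1.0f; // fed to whatever the sound designer hooks up
};

// Crude placeholder for the orientation-based gain derived in this chapter:
// 1.0 when the emitter is dead center in the player's view, 0.0 directly behind.
static float emphasisGain(const Vector3& emitterPos, const Listener& listener)
{
    const Vector3 toListener = normalized({ listener.position.x - emitterPos.x,
                                            listener.position.y - emitterPos.y,
                                            listener.position.z - emitterPos.z });
    const float d = dot(toListener, listener.forward); // -1 = facing the emitter
    return 0.5f * (1.0f - d);
}

void updateEmphasis(const Listener& listener, Emitter* emitters, int count)
{
    for (int i = 0; i < count; ++i)
        emitters[i].emphasis = emphasisGain(emitters[i].position, listener);
}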

So first of all, we need an algorithm that can:
i. Control a level based on angle like a microphone: let's say omnidirectional to hyper-cardioid. Let's not specify what kind of emphasis that is yet, as we'll want to allow the sound designer to try different things, not just attenuate. The control level should be used as a real-time parameter to drive something else.
ii. Vary the polar pattern smoothly from one extreme to the other so we can adjust it whenever we need to.
iii. Orient off the result of a dot product of two normalized direction vectors in the range of {−1 ≤ d ≤ +1}, so that the game-side programmer can use a unit of measure they are familiar with. This can work for both azimuth and zenith planes, but for now we'll assume we're only interested in the azimuth plane.

16.4.1 The Polar Pattern Function
We can address the first requirement with a little bit of mathematics if we first consider how multi-pattern microphones work. Normally, this is achieved with two capsules and a preamplifier circuit that allows their signals to interact (Figure 16.2). If we define our virtual microphone in the same way (and we ignore the fact that the real-world polar pattern would almost always be omnidirectional at very high frequencies), we just need to describe the effect of the pressure zone and the pressure gradient of the two opposing capsules and add them together.

g(θ) = (PZF + PGF·cos θ − PZB + PGB·cos θ) / 2

θ    Azimuth or zenith angle (the response is the same on either plane)
PZF  Pressure zone (front capsule)
PGF  Pressure gradient (front capsule)
PZB  Pressure zone (back capsule)
PGB  Pressure gradient (back capsule)

Using θ in the typical range of {0 ≤ θ ≤ 2π} or {−π ≤ θ ≤ π} radians, we then get a linear gain value in the range of {−1 ≤ g ≤ 1}. The only question is what to set for PZF, PGF, PZB, and PGB, and an interesting aspect of this equation is that there are multiple parameter combinations that produce the same solution. For example, Table 16.1 and Figure 16.3 show one solution for the typical polar patterns; a small code sketch of the generator function follows below.
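As a quick illustration, here is a minimal, non-optimized transcription of g(θ) (the function name is made up; the encapsulated, SIMD-friendly version is developed later in the chapter):

#include <cmath>

// Direct transcription of g(theta): two opposing capsules, each made of a
// pressure-zone term and a pressure-gradient term, summed and halved.
float polarPatternGain(float theta,
                       float PZF, float PGF,
                       float PZB, float PGB)
{
    const float c = std::cos(theta);
    return 0.5f * (PZF + PGF * c - PZB + PGB * c);
}

With the cardioid row of Table 16.1 (PZF = 1, PGF = 1, PZB = PGB = 0), this returns 1.0 on-axis (θ = 0) and 0.0 at the rear (θ = π), as expected.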

FIGURE 16.2  Multi-pattern microphones normally contain two capsules back to back. Image by Dan Tyler (http://www.mephworks.co.uk/).

TABLE 16.1  Coefficients That Can Produce the Standard Polar Patterns—This Is Just One of Multiple Solutions
Polar Pattern     PZF    PGF    PZB   PGB
Omnidirectional   2      0      0     0
Sub-cardioid      1.31   0.69   0     0
Cardioid          1      1      0     0
Super-cardioid    0.67   0.67   0     0.67
Hyper-cardioid    1/2    1/2    0     1

16.4.2 Morphing between Polar Patterns
To address the second requirement, we need to discover a combination of PZF, PGF, PZB, and PGB that has a distinct progression from one extreme (omnidirectional) to the other (hyper-cardioid). Once we have a progression that looks like it's close enough to what we want, we can assign

an arbitrary ordinal value from one end of the progression to the other, and then derive a mathematical formula via a (most likely) nonlinear regression.

FIGURE 16.3  Comparative polar patterns (magnitude gain, not dB).

As it turns out, you just need one set of capsule parameters (I picked PZF and PGF) to go from an omnidirectional pattern to a hyper-cardioid pattern, as the other capsule is only strictly necessary to generate a perfect bidirectional pattern. Table 16.2 and Figure 16.4 show a progression that I found empirically that works quite well.

16.4.3 Regression Analysis to Find the Morphing Function
At this point, we have a polar pattern generator that's simpler than where we started, and we have a set of parameter values for PZF and PGF that will give us the five configurations that we're looking for. What we're missing

is a morphing function that transforms those two parameters according to those preferred values as we vary the directivity Q.

TABLE 16.2  PGF and PZF Progression from an Ordinal Value of 1–5, Expressed as Exact Ratios
Directivity Q   PZF     PGF
1               2       2 − 2 = 0
2               √2      2 − √2 ≈ 0.59
3               1       2 − 1 = 1
4               1/√2    2 − 1/√2 ≈ 1.29
5               1/2     2 − 1/2 = 1.5

FIGURE 16.4  The same PZF and PGF progression shown graphically.

A good solution is a polynomial fit, as it's the sort of mathematical template that can be calculated with the simple operators {add | subtract | multiply | divide}, and especially because these operators can be grouped together in SIMD instructions—making them very CPU efficient. I'm going to skip the part where I explain how I do the polynomial fit (as this would be a chapter in itself); suffice it to say I normally use Microsoft Excel with the "Analysis ToolPak" enabled. In other words, I don't need any exotic tools.

Based on the observations we have to regress, we'll use a third-order polynomial that takes this form:

f(x) = x⁰k0 + x¹k1 + x²k2 + x³k3

Naturally, x⁰ = 1, so we often ignore that operator. In this case, I've kept it there to remind us later to initialize the first unit of the first SIMD block with 1.0. I also recommend that you avoid any math power function calls by using in-line multiplies, since we know the powers are whole numbers and never change. In other words, where you see x² or x³, your non-SIMD-optimized code should replace the power function with hard-coded multiplies like this:

float calcPZF(float x)
{
    // Build the powers of x with plain multiplies -- no pow() calls needed.
    float x2 = x*x;
    float x3 = x2*x;
    return k0 + x*k1 + x2*k2 + x3*k3;
}

We're going to replace this function with a SIMD-optimized version later on, so I've left it here as a non-encapsulated method with the k constants declared in a wider scope for simplicity. In the event that SIMD isn't available on your platform, you should ultimately implement this version of the function and constants within the scope of a class (e.g., as a private method and members).
The coefficients from the third-order polynomial fit of PZF are shown in Table 16.3. Since PGF = 2 − PZF, we only need to calculate the polynomial for PZF and then infer PGF using the same 2 − PZF conversion.

TABLE 16.3  All Four "k" Coefficients for the Polynomial Fit of PZF
k0                  k1                   k2                  k3
2.74873734152917    −0.871494680314067   0.113961030678928   −0.00592231765545629

Table 16.4 shows the resulting progression for PZF and PGF by substituting Q for x. As you can see, we've now got almost the same morphing behavior without using any expensive mathematical operations. The overall function then looks like this:

g(θ, Q) = ( f(Q) + (2 − f(Q))·cos θ ) / 2

TABLE 16.4  The Polynomial-Fitted Version of PZF and the Inferred PGF
Q   Q²   Q³    Original PZF         3rd-Order Poly-Fit PZF   Inferred PGF
1   1    1     2                    1.985281374              2 − 1.985281374 = 0.014718626
2   4    8     √2 = 1.414213562     1.414213562              2 − 1.414213562 = 0.585786438
3   9    27    1                    1                        2 − 1 = 1
4   16   64    1/√2 = 0.707106781   0.707106781              2 − 0.707106781 = 1.292893219
5   25   125   1/2                  0.5                      2 − 0.5 = 1.5

Since the attenuation of super- and hyper-cardioid mics can go negative, I've added a modulus operator to the whole equation to make sure we only get values of g in the range {0 ≤ g ≤ 1}. This will make things simpler if we want to use this equation for an emphasis control value.

16.4.4 Optimizing for Cosine
There's just one niggle left, and that's the use of cosine. Trigonometric functions are very expensive operations, so executing them at frame or sample rate is a no-no. Thankfully, we don't need as much accuracy as provided by a native CPU instruction, and our input range will never exceed the limits of {0 ≤ θ ≤ π} because we're going to be using as input the converted radian value of a dot product in the range {−1 ≤ d ≤ +1}. We can, therefore, safely approximate cos θ with a low-order Taylor series polynomial.

cos θ ≈ 1 − θ²/2! + θ⁴/4! − θ⁶/6! + θ⁸/8!

Now, we can't use it as-is yet, because we need to make a few minor adjustments so we can turn those 1/x! parts into constant coefficients we can just multiply with. Table 16.5 shows these coefficients.

TABLE 16.5  The "c" Coefficients Needed to Approximate Cosine Using the Taylor Series
Coefficient Label   Factorial Ratio   Substitute Coefficient
c1                  −1/2!             −0.5
c2                  1/4!              0.041666667
c3                  −1/6!             −0.001388889
c4                  1/8!              2.48016E−05

FIGURE 16.5  cos θ and its five-term Taylor series equivalent.

So, now we compute a new function h(θ) that approximates cos θ using coefficients c1, c2, c3, and c4:

h(θ) = 1 + c1·θ² + c2·θ⁴ + c3·θ⁶ + c4·θ⁸

Using Orientation to Add Emphasis to a Mix   ◾    269 Again, we’re going to replace this code with a SIMD-optimized version later on, so I’ve shown the function here as a nonencapsulated method with the c constants declared in a wider scope for simplicity. If SIMD is not available on your platform, you should implement this function and con- stants within the scope of a class (e.g., as a private method and members). Finally, we can substitute cos θ with our new function h(θ) into our final equation like so: g (θ ,Q) = f (Q)+ (2 − f (Q))h(θ ) 2 16.4.5 Converting between the Dot Product Direction Test and Theta Remember that although we’re testing source orientation with respect to the listener, we’re actually going to apply the emphasis to the source, so we need to view the orientation from source’s perspective (Figure 16.6). For this to work as seen from the source, you’ll first need to generate the normalized direction vector from the source to the listener like this: n =source-to-listener plistener − psource plistener − psource FIGURE 16.6  The listener when viewed from the source.

270   ◾    Game Audio Programming 2 nsource-to-listener Normalized direction from the source to the listener psource Position of the source in cartesian space plistener Position of the listener in cartesian space (npoxzrm=a√llyptx2h+e paz2zim: tuhteh magnitude of the vector along the x and z axes tion your engine uses). plane, adjust according to whatever axis conven- Second, you’ll need to perform a dot product with that normalized direction vector and the listener’s orientation (which is also a normalized direction vector you should already have). d = nsource-to-listener ⋅nlistener d Scalar rotation factor nlistener Listener orientation vector (normalized direction vector) Note: You should consider the dot product of a·b to be equivalent to axbx + azbz if we ­consider those axes to be the azimuth plane. When the listener is pointing directly at the source, the result will be −1, while when the listener is pointing directly away from the source, the result will be +1. When the two normalized direction vectors are orthogonal, the result will be 0. Converting this number range to radi- ans is a s­ traightforward linear conversion from the range {−1 ≤ d ≤ +1} to {0 ≤ θ  ≤ π} and looks like this: θ = (d + 1)π 2 Code-wise you could implement it as in inline method like this: inline float DirectionalEmphasis::directionTestToRadians(float d) {  return 0.5f * ((d + 1.0f) * c_PI); }  Assume c_PI is a constant brought in via a header or as a protected mem- ber of the class (or its base). 16.5 IMPLEMENTING SIMD OPTIMIZATIONS SIMD is a feature of modern CPUs to facilitate multiple numerical opera- tions in single calls. Typically, these function calls are hardware dependent,

Using Orientation to Add Emphasis to a Mix   ◾    271 so using them normally involves the use of intrinsic code woven into your C/C++ source. For the purposes of this chapter, we’ll assume that we’re using an x86_64 CPU or similar. 16.5.1 SIMD Optimizing the Function f(x) As you may recall, we need to implement this function: f (x ) = x0k0 + x1k1 + x 2k2 + x3k3  Rather than calculating PZF twice (the second time for PGF = 2 − PZF), we opt here to take a reference to placeholders of both PZF and PGF so that they can be calculated together: #include <pmmintrin.h> static const __m128 DirectionalEmphasis::k_constants = _mm_set_ps( 2.74873734152917f, -0.871494680314067f, 0.113961030678928f, -0.00592231765545629f); void DirectionalEmphasis::fastPZFandPGF( float Q, float& PZF, float& PGF) {  float Q2 = Q*Q; __m128 x_values = _mm_set_ps(1.0f, Q, Q2, Q*Q2); __my_m128 f_values; f_values.m128_v4 = _mm_mul_ps(k_constants, x_values); f_values.m128_v4 = _mm_hadd_ps(f_values.m128_v4, f_values.m128_v4); f_values.m128_v4 = _mm_hadd_ps(f_values.m128_v4, f_values.m128_v4); PZF = f_values.m128_f32[0]; PGF = 2.0f – PZF; }  Since the k values are constant, we calculate them once and store them as a 4x float vector within the scope of the class. For clarity in context, I’ve initialized the k_constants member just ahead of the encapsulated method that uses it, but in your complete class implementation should probably initialize all the static constants in the same place (e.g., at the top of the .cpp file), most likely as protected members. Directivity Q is squared once and put temporarily on the stack so it can be used twice locally with the function. A 4x float vector for the x values