minimum-phase system as long as the RCs have magnitude less than one, which can be verified while solving the normal equation during LP analysis.

For the pitch synthesis filter with system function (4.89), the system poles are found by solving

$1 + b z^{-T} = 0$, or $z^{-T} = -b$.    (4.90)

There are a total of T different solutions for z, and hence the system has T different poles. These poles lie at the vertices of a regular polygon of T sides that is inscribed in a circle of radius $|b|^{1/T}$. Thus, in order for the filter to be stable, the following condition must be satisfied:

$|b| < 1$.    (4.91)

An unstable pitch synthesis filter arises when the magnitude of the numerator of (4.84) is greater than that of the denominator, resulting in $|b| > 1$. This usually happens when a transition from an unvoiced to a voiced segment takes place and is marked by a rapid surge in signal energy. When processing a voiced frame that occurs just after an unvoiced frame, the denominator quantity $\sum_n e_s^2[n-T]$ involves the sum of the squares of amplitudes in the unvoiced segment, which is normally weak. On the other hand, the numerator quantity $\sum_n e_s[n]\,e_s[n-T]$ involves the sum of the products of the higher amplitudes from the voiced frame and the lower amplitudes from the unvoiced frame. Under these circumstances, the numerator can be larger in magnitude than the denominator, leading to $|b| > 1$. Therefore, an unstable pitch synthesis filter can arise when the signal energy shows a sudden increase. To ensure stability, the long-term gain is often truncated so that its magnitude is always less than one.

Constraining the long-term gain to a magnitude strictly less than one, however, is often not a good strategy, since subjective quality can be adversely affected. This is true for various kinds of speech sounds generated by a sudden release of pressure, such as the stop consonants b and d. By easing the constraint on the long-term gain, sounds of a transient, noncontinuant nature can be captured more accurately by the underlying model, leading to an increase in subjective quality. Thus, it is common for coding algorithms to tolerate short-term instability in the pitch synthesis filter. A popular choice for the upper bound of the long-term gain is between 1.2 and 2.
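To make (4.90) and (4.91) concrete, the sketch below (Python with NumPy; the function names and the clamping bound of 0.98 are illustrative choices, not taken from the text) computes the T poles of the one-tap pitch synthesis filter and applies the kind of gain truncation described above.

```python
import numpy as np

def pitch_filter_poles(b, T):
    """Poles of 1 / (1 + b z^-T): the T roots of z^T = -b.

    They lie on a circle of radius |b|**(1/T), at the vertices of a
    regular T-sided polygon.
    """
    return np.roots(np.concatenate(([1.0], np.zeros(T - 1), [b])))

def clamp_long_term_gain(b, bound=0.98):
    """Truncate the long-term gain so its magnitude stays below `bound` (illustrative value)."""
    return np.sign(b) * min(abs(b), bound)

poles = pitch_filter_poles(b=1.5, T=50)
print(np.max(np.abs(poles)))        # about 1.5**(1/50) = 1.008: marginally unstable
print(clamp_long_term_gain(1.5))    # 0.98: stable after truncation
```

For b = 1.5 and T = 50 the pole radius is less than 1% outside the unit circle, which is why some coders simply clamp the gain below one while others tolerate values up to about 1.2 to 2, as noted above.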
4.8 PRACTICAL IMPLEMENTATION

In general, LP analysis is a well-behaved procedure in the sense that the resultant synthesis filter is guaranteed to be stable as long as the magnitudes of the RCs are less than one (Section 4.4). In practice, however, there are situations under which stability can be threatened. For instance, under marginally stable conditions, the limited precision of the computational environment can lead to errors high enough to produce an unstable filter; this can happen for signals with sustained oscillation, where the spectrum is associated with poles close to the unit circle. In this section we study several techniques employed in speech coding to fix the described problem, all of them aimed at alleviating ill-conditioning during LP analysis while improving the stability of the resultant synthesis filter and the quality of the synthetic speech. These techniques can be used in isolation or combined.

Pre-emphasis of the Speech Waveform

The typical spectral envelope of the speech signal has a high-frequency roll-off due to radiation effects of the sound from the lips. Hence, high-frequency components have relatively low amplitude, which increases the dynamic range of the speech spectrum. As a result, LP analysis requires high computational precision to capture the features at the high end of the spectrum. More importantly, when these features are very small, the correlation matrix can become ill-conditioned and even singular, leading to computational problems. One simple solution is to process the speech signal with the filter having system function

$H(z) = 1 - \alpha z^{-1}$,    (4.92)

which is highpass in nature. The purpose is to augment the energy of the high-frequency spectrum. The effect of the filter can also be thought of as a flattening process, where the spectrum is "whitened." Denoting $x[n]$ as the input to the filter and $y[n]$ as the output, the following difference equation applies:

$y[n] = x[n] - \alpha x[n-1]$.    (4.93)

The filter described in (4.92) is known as the pre-emphasis filter. By pre-emphasizing, the dynamic range of the power spectrum is reduced. This process substantially reduces numerical problems during LP analysis, especially for low-precision devices. A value of $\alpha$ near 0.9 is usually selected.

It is common in a typical speech coding scheme for the input speech to be pre-emphasized first using (4.92). To keep a similar spectral shape for the synthetic speech, it is filtered by the de-emphasis filter with system function

$G(z) = \dfrac{1}{1 - \alpha z^{-1}}$    (4.94)

at the decoder side, which is the inverse filter with respect to pre-emphasis. Figure 4.27 shows the magnitude plots of the filters' transfer functions.

Figure 4.27 Magnitude plots of the transfer functions of the pre-emphasis filter ($\alpha = 0.9$).
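A minimal sketch of the pre-emphasis/de-emphasis pair (4.92)-(4.94) is shown below (Python, using scipy.signal.lfilter; the function names and the test signal are illustrative, not from the text). Cascading the two filters recovers the original signal, confirming that (4.94) is the inverse of (4.92).

```python
import numpy as np
from scipy.signal import lfilter

ALPHA = 0.9  # typical pre-emphasis constant

def pre_emphasis(x, alpha=ALPHA):
    # y[n] = x[n] - alpha * x[n-1], Eq. (4.93)
    return lfilter([1.0, -alpha], [1.0], x)

def de_emphasis(y, alpha=ALPHA):
    # Inverse filter 1 / (1 - alpha z^-1), Eq. (4.94)
    return lfilter([1.0], [1.0, -alpha], y)

x = np.random.randn(240)  # illustrative frame of input samples
np.testing.assert_allclose(de_emphasis(pre_emphasis(x)), x, atol=1e-10)
```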
Bandwidth Expansion Through Modification of the LPC

In the application of linear prediction, the resultant synthesis filter might become marginally stable due to poles located too close to the unit circle. The problem is aggravated in fixed-point implementations, where a marginally stable filter can actually become unstable (with poles located outside the unit circle) after quantization and loss of precision during processing. This problem creates occasional "chirps" or oscillations in the synthesized signal.

Stability can be improved by modifying the LPCs according to

$a_i^{\text{new}} = \gamma^i a_i, \quad i = 1, 2, \ldots, M$,    (4.95)

with $\gamma < 1$ a positive constant. The operation moves all the poles of the synthesis filter radially toward the origin, leading to improved stability. By doing so, the original spectrum is bandwidth expanded, in the sense that the spectrum becomes flatter, especially around the peaks, where the width is widened. Typical values for $\gamma$ are between 0.988 and 0.996.

Another advantage of the bandwidth expansion technique is the shortening of the duration of the impulse response, which improves robustness against channel errors. This is because the excitation signal (in some speech coders the excitation signal is coded and transmitted), when distorted by channel errors, is filtered by the synthesis filter, and a shorter impulse response limits the propagation of channel error effects to a shorter duration.
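Bandwidth expansion is a one-line operation on the LPC vector. The sketch below (Python; the function name is illustrative, and γ = 0.994 is simply one value from the typical range quoted above) scales each coefficient as in (4.95).

```python
import numpy as np

def bandwidth_expand(lpc, gamma=0.994):
    """Replace a_i by gamma**i * a_i, Eq. (4.95): poles move radially toward the origin."""
    lpc = np.asarray(lpc, dtype=float)
    return lpc * gamma ** np.arange(1, len(lpc) + 1)

# Illustrative check with a single-pole synthesis filter 1 / (1 + a1 z^-1):
a = np.array([-0.999])             # pole at z = 0.999, very close to the unit circle
print(bandwidth_expand(a, 0.994))  # about -0.993, i.e., the pole radius shrinks to 0.993
```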
Figure 4.28 Magnitude of the transfer function (left) and impulse response (right) of the original (solid line) and bandwidth-expanded (dotted line) synthesis filters.

Example 4.11 The LPCs from Example 4.10 are modified for bandwidth expansion, using a constant $\gamma$ of 0.92. Figure 4.28 shows a comparison between the original and modified magnitude response and impulse response. Note how the bandwidth-expanded version has a smoother, flatter frequency response; in addition, the impulse response decays faster toward zero. Poles of the system function are plotted in Figure 4.29, where, after bandwidth expansion, they are pulled toward the origin.

Figure 4.29 Plot of poles for the original (×) and bandwidth-expanded (⋄) synthesis filters.
Figure 4.30 Comparison between the magnitude plots of the synthesis filter's transfer functions before and after white noise correction.

White Noise Correction

White noise correction mitigates ill-conditioning in LP analysis by directly reducing the spectral dynamic range and is accomplished by increasing the autocorrelation coefficient at zero lag by a small amount. The procedure is described by

$R[0] \leftarrow \lambda \cdot R[0]$

with $\lambda > 1$ a constant, usually selected to be slightly above one. For the G.728 LD-CELP coder (Chapter 14), $\lambda = 257/256 = 1.00390625$, an increase of 0.39%. The process is equivalent to adding to the original signal a white noise component whose power is 24 dB below the original average power. This directly reduces the spectral dynamic range and lowers the possibility of ill-conditioning in LP analysis. The drawback is that the operation elevates the spectral valleys. By carefully choosing the constant $\lambda$, the degradation in speech quality can be made imperceptible.

Example 4.12 Figure 4.30 compares the magnitude plots of the synthesis filter before and after white noise correction, where the LPCs are the same as in Example 4.10 and $\lambda = 257/256$. Note that the dynamic range of the original function is reduced, with the lowest portion elevated significantly.
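White noise correction amounts to a single scaling of the zero-lag autocorrelation value. A sketch (Python; the function name and autocorrelation values are made up for illustration), using the G.728 value λ = 257/256:

```python
import numpy as np

def white_noise_correction(r, lam=257.0 / 256.0):
    """Scale the zero-lag term: R[0] <- lam * R[0], with lam slightly above one."""
    r = np.array(r, dtype=float)   # copy, so the caller's values are untouched
    r[0] *= lam
    return r

r = np.array([1.0, 0.95, 0.85, 0.70])  # illustrative autocorrelation values R[0..3]
print(white_noise_correction(r))       # R[0] becomes about 1.0039
```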
Spectral Smoothing by Autocorrelation Windowing

In the bandwidth expansion method described earlier, the spectrum represented by the LPCs is smoothed by manipulating the values of the coefficients. The technique is applied after the LPCs are obtained.

Figure 4.31 Gaussian windows and their Fourier transforms (magnitude normalized), for $\beta$ = 0.001, 0.005, and 0.01.

On some occasions, it is desirable to introduce some smoothing before obtaining the LPCs, since the solution algorithms (Levinson–Durbin or Leroux–Gueguen) require many computational steps, leading to error accumulation. This can be done by windowing the autocorrelation function. Since the autocorrelation function and the power spectral density form a Fourier transform pair (Chapter 3), multiplying the autocorrelation values by a window (in the lag domain) has the effect of convolving the power spectral density with the Fourier transform of the window (in the frequency domain) [Oppenheim and Schafer, 1989]. By selecting an appropriate window, the desired effect of spectral smoothing is achieved. Given the autocorrelation function $R[l]$, windowing is performed with

$R_{\text{new}}[l] = R[l] \cdot w[l], \quad l = 0, 1, \ldots, M$;    (4.96)

a suitable choice for $w[l]$ is the Gaussian window defined by

$w[l] = e^{-\beta l^2}$,    (4.97)

where $\beta$ is a constant. Figure 4.31 shows some plots of the Gaussian window for various values of $\beta$.

The described technique can be used to alleviate ill-conditioning of the normal equation before it is solved; after the convolution in the frequency domain, sharp spectral peaks are smoothed out. The spectral dynamic range is reduced, with the poles of the associated synthesis filter moved farther away from the unit circle.

Example 4.13 The autocorrelation values corresponding to the LPCs of Example 4.10 are Gaussian windowed with $\beta = 0.01$. Figure 4.32 compares the original spectrum with the one obtained after smoothing: note how the sharp peaks are lowered and widened. The net effect is similar to a bandwidth expansion procedure with direct manipulation of the LPCs.
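A sketch of the lag-windowing step (4.96)-(4.97) is given below (Python; the function name and autocorrelation values are illustrative). In practice this step can be combined with white noise correction and bandwidth expansion, as suggested in Exercise 4.10.

```python
import numpy as np

def smooth_autocorrelation(r, beta=0.01):
    """Apply the Gaussian lag window w[l] = exp(-beta * l**2), Eqs. (4.96)-(4.97)."""
    r = np.asarray(r, dtype=float)
    lags = np.arange(len(r))
    return r * np.exp(-beta * lags ** 2)

r = np.array([1.0, 0.95, 0.85, 0.70, 0.52])  # illustrative values R[0..M], M = 4
print(smooth_autocorrelation(r, beta=0.01))  # higher lags are attenuated the most
```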
Figure 4.32 Comparison between the magnitude plots of the synthesis filter's transfer functions before and after spectral smoothing.

4.9 MOVING AVERAGE PREDICTION

The discussion so far is based on the AR model. Figure 4.33 shows the block diagrams of the AR process analyzer and synthesizer filters, where a predictor with the difference equation given by (4.1) is utilized. It is straightforward to verify that these block diagrams generate the exact same equations as the AR model. In practical coding applications, parameters of the predictor are often found from the signal itself, since a computationally efficient procedure is available, enabling real-time adaptation.

The MA model, as explained in Chapter 3, is in a sense the dual of the AR model. Figure 4.34 shows the predictor-based block diagrams of the analyzer and synthesizer filters. In this case, however, the difference equation of the predictor is given by

$\hat{s}[n] = -\sum_{i=1}^{K} b_i x[n-i]$,    (4.98)

Figure 4.33 Block diagram of the AR analyzer filter (left) and synthesizer filter (right).
Figure 4.34 Block diagram of the MA analyzer filter (left) and synthesizer filter (right).

with $K$ the order of the model and $b_i$ the MA parameters. When compared with (4.1), we can see that "prediction" is now based on a linear combination of excitation, or samples of the prediction error $x[n]$, which in theory are white noise.

Unlike the AR model, where the optimal parameters can be found by solving a set of linear equations based on the statistics of the observed signal, the MA parameters can only be found from a set of nonlinear equations, which in practice is highly computationally demanding. Hence, other approaches are normally applied to find the model parameters; these include spectral factorization [Therrien, 1992] and adaptive filtering techniques such as the least-mean-square (LMS) algorithm [Haykin, 1991], as well as other iterative methods.

Even though (4.98) is a sort of "linear prediction" scheme, where the prediction is based on a linear combination of samples, the name LP is traditionally associated with AR modeling. When prediction is based on the MA model, it is explicitly referred to as "MA prediction" in the literature. Why do we bother with MA prediction? The technique offers some unique advantages, which will be explained in Chapter 6, where differential pulse code modulation (DPCM) is introduced, and also in Chapter 7, with the introduction of predictive vector quantization (PVQ). Finally, in Chapter 15, MA prediction is applied to the design of a predictive quantizer for linear prediction coefficients.

4.10 SUMMARY AND REFERENCES

In this chapter, the theoretical foundation and practical implementation of linear prediction are thoroughly explained. Linear prediction is described as a system identification problem, where the parameters of an underlying autoregressive model are estimated from the signal. To find these parameters, autocorrelation values are obtained from the signal and a set of linear equations is solved. The resultant estimation is optimal in the sense that the variance of the prediction error is minimized. For nonstationary signals such as speech, the LP analysis procedure is applied to each short interval of time, known as a frame. The LPCs extracted from each frame result in a time-varying filter representing the activity of the human speech production organs. LP is often associated with the acoustic tube model of speech production. Details can be found in Rabiner and Schafer [1978]. Efficient algorithms to solve the normal equation were introduced. Two such procedures—the Levinson–Durbin algorithm and the Leroux–Gueguen algorithm—can be used, with the latter more suitable for fixed-point implementation since all intermediate quantities of the procedure are bounded.
The method of LP analysis presented in this chapter is known in the literature as the autocorrelation method. Other schemes exist for LP analysis. The covariance method, for instance, formulates the problem in a different way, with the sum of squared error minimized inside the frame. This method has not received wide acceptance, mainly because it cannot be solved as efficiently as the autocorrelation method; also, no simple procedure allows a stability check. For additional information readers are referred to classical textbooks such as Markel and Gray [1976] and Rabiner and Schafer [1978]. A discussion of the computational cost of various LP analysis procedures is found in Deller et al. [1993].

Long-term linear prediction is an efficient scheme in which the correlation of the speech signal is modeled by two predictors. The short-term predictor is in charge of the correlation between nearby samples, while the long-term predictor is in charge of the correlation between samples located one or multiple pitch periods apart. The method described in this chapter is known as the one-tap predictor; that is, prediction is based on one single sample from the distant past. For a multitap long-term predictor, see Ramachandran and Kabal [1989]. However, the extra complexity and slight performance improvement limit the application of the multitap long-term predictor in practice [Kroon and Atal, 1991]. See Veeneman and Mazor [1993] for additional insight.

Several techniques to alleviate ill-conditioning, improve stability, and increase the quality of the synthetic speech are presented. In a typical speech coding algorithm, these methods are used separately or combined, and they are often included as standard computational steps. These procedures are cited in subsequent chapters, where different standard coders are studied. Autocorrelation windowing was introduced in Tohkura et al. [1978], developed originally to combat bandwidth underestimation. See Chen [1995] for a discussion of the incorporation of white noise correction, autocorrelation windowing, and bandwidth expansion into the framework of the LD-CELP coder.

Prediction can also be defined within the context of other signal models, such as MA. Good coverage of various statistical models can be found in Therrien [1992], as well as in other textbooks such as Haykin [1991] and Picinbono [1993].

One of the criticisms of the application of LP in speech modeling is the fact that no zeros are incorporated in the system function of the synthesis filter, which introduces inaccuracies when representing certain classes of signals, such as nasal sounds. Difficulties related to a pole-zero type of system function, or ARMA model, are mainly due to the lack of an efficient computational procedure to locate the parameters of the model. See Lim and Lee [1993] for pole-zero modeling of speech signals.

EXERCISES

4.1 Within the context of linear prediction, let $e[n]$ denote the prediction error under optimal conditions. Show that

$E\{e[n]\, s[n-k]\} = 0$
for $k = 1, 2, \ldots, M$. That is, $e[n]$ is orthogonal to $s[n-k]$. The relation is known as the principle of orthogonality.

4.2 An alternative way to obtain

$J_{\min} = R_s[0] + \sum_{i=1}^{M} a_i R_s[i]$

is by substituting (4.6), the condition required to minimize the cost function $J$ (4.3), into $J$ itself. Show the details of this alternative derivation.

4.3 In internal prediction, where the analysis interval (for autocorrelation estimation) is the same as the prediction interval (the derived LPCs are used to predict the signal samples), find out the prediction gain when different windows are involved in the LP analysis procedure. Using a prediction order of ten and a frame length of 240 samples, calculate the segmental prediction gain by averaging the prediction gain results for a large number of signal frames for the two cases where the rectangular window or the Hamming window is involved. Which window provides higher performance?

4.4 Consider the situation of external prediction where the autocorrelation values are estimated using a recursive method based on the Barnwell window. Using a prediction order of 50 and a frame length of 20 samples, measure the prediction gain for a high number of frames. Repeat the experiment using various values of the parameter $\alpha$ of the window (Chapter 3). Plot the resultant segmental prediction gain as a function of $\alpha$. Based on the experiment, what is the optimal value of the parameter $\alpha$?

4.5 From the system function of the pitch synthesis filter, find the analytical expression of the impulse response. Plot the impulse response of the pitch synthesis filter for the following two cases:

(a) $b = 0.5$, $T = 50$.
(b) $b = 1.5$, $T = 50$.

What conclusions can be reached about the stability of the filter?

4.6 Within the context of the Levinson–Durbin algorithm, (a) prove that

$J_l = J_0 \prod_{i=1}^{l} \left(1 - k_i^2\right)$,

which is the minimum mean-squared prediction error achievable with an $l$th-order predictor; (b) prove that the prediction gain of the $l$th-order linear predictor is

$\mathrm{PG}_l = -10 \log_{10} \left[ \prod_{i=1}^{l} \left(1 - k_i^2\right) \right].$
Figure 4.35 Equivalent signal flow graph of a long-term prediction-error filter with fractional delay.

4.7 In Example 4.8, where the simple linear interpolation procedure is applied to create fractional delay, show that the long-term prediction-error filter can be implemented as in Figure 4.35, with the long-term LPCs summarized in Table 4.3, where $b$ is the long-term gain given by (4.84). Thus, the considered long-term predictor with fractional delay is indeed a two-tap long-term predictor. What happens in the cases when two or more bits are used to encode the fractions?

4.8 In the long-term LP analysis procedure, minimization of $J$ is equivalent to maximizing the quantity

$\dfrac{\left(\sum_n e_s[n]\, e_s[n-T]\right)^2}{\sum_n e_s^2[n-T]}.$

Justify the above statement. Develop a more efficient pseudocode to perform the task.

4.9 One suboptimal way to perform long-term LP analysis is by determining the pitch period $T$ based on maximizing the autocorrelation

$R[T] = \sum_n e_s[n]\, e_s[n-T].$

Note that the sum of squared error $J$ is not necessarily minimized. An advantage of the method is the obvious computational saving. Write down the pseudocode to perform long-term LP analysis based on the described approach. Mention the property of the resultant long-term gain $b$. Hint: The maximum autocorrelation value is in general greater than or equal to zero.

TABLE 4.3 Long-Term LPCs for a Prediction-Error Filter with Two Fractional Values: 0 or 0.5

Fraction    b0      b1
0           b       0
1/2         b/2     b/2
4.10 Use some speech signal to obtain a set of autocorrelation values for a 10th-order predictor. Find the corresponding LPCs and plot the magnitude response of the associated synthesis filter. Also, plot the poles of the system function. Repeat using the LPCs obtained by first applying a white noise correction ($\lambda = 257/256$), followed by Gaussian windowing ($\beta = 0.001$), and finally applying bandwidth expansion with $\gamma = 0.98$ to the resultant LPCs.

4.11 Within the context of AR modeling, where the prediction error is $e[n]$ and the prediction is $\hat{s}[n]$, derive the difference equation relating $e[n]$ to $\hat{s}[n]$, and show that the system function of the filter with $e[n]$ as input and $\hat{s}[n]$ as output is

$H(z) = \dfrac{-\sum_{i=1}^{M} a_i z^{-i}}{1 + \sum_{i=1}^{M} a_i z^{-i}}.$

4.12 Develop the pseudocode to perform long-term LP analysis using the fractional delay scheme described in Example 4.8. Consider two cases: an exhaustive search approach, where all possible delay values are evaluated, and a two-step suboptimal scheme, where the integer pitch period is located first, followed by a fractional refinement near the integer result found.

4.13 In the long-term and short-term linear prediction model for speech production, the long-term predictor has a delay of $T$, while the short-term predictor has an order of $M$, with $T > M$. Is it functionally equivalent to replace the cascade connection of the pitch synthesis filter and the formant synthesis filter with a single filter composed of a predictor of order $T$ with system function

$-\sum_{i=1}^{T} a_i z^{-i},$

where $a_i = 0$ for $i = M+1, M+2, \ldots, T-1$? Why or why not?
CHAPTER 5

SCALAR QUANTIZATION

Representation of a large set of elements with a much smaller set is called quantization. The number of elements in the original set is in many practical situations infinite, like the set of real numbers. In speech coding, prior to storage or transmission of a given parameter, it must be quantized. Quantization is needed to reduce storage space or transmission bandwidth so that a cost-effective solution is deployed. In the process, some quality loss is introduced, which is undesirable. How to minimize the loss for a given amount of available resources is the central problem of quantization.

In this chapter, the basic definitions involved with scalar quantization are given, followed by an explanation of uniform quantizers—a common type of quantization method widely used in practice. Conditions to achieve optimal quantization are included, with the results applied toward the development of algorithms used for quantizer design. Algorithmic implementation is discussed in the last section, where computational cost is addressed. The presentation of the material is intended to be mathematically rigorous. However, the main goal is to understand the practical aspects of scalar quantization, so as to incorporate the techniques in the coding of speech.

5.1 INTRODUCTION

In this section the focus is on the basic issues of scalar quantization.

Definition 5.1: Scalar Quantizer. A scalar quantizer $Q$ of size $N$ is a mapping from the real number $x \in \mathbb{R}$ into a finite set $Y$ containing $N$ output values (also known
as reproduction points or codewords) $y_i$. Thus,

$Q: \mathbb{R} \rightarrow Y$,

where

$y_1, y_2, \ldots, y_N \in Y$.

$Y$ is known as the codebook of the quantizer. The mapping action is written as

$Q(x) = y_i; \quad x \in \mathbb{R}, \quad i = 1, \ldots, N.$    (5.1)

In all cases of practical interest, $N$ is finite, so that a finite number of binary digits is sufficient to specify the output value. We further assume that the indexing of output values is chosen so that

$y_1 < y_2 < \cdots < y_N$.

Definition 5.2: Resolution. We define the resolution $r$ of a scalar quantizer as

$r = \log_2 N \equiv \lg N,$    (5.2)

which measures the number of bits needed to uniquely specify the quantized value.

Definition 5.3: Cell. Associated with every $N$-point quantizer is a partition of the real line $\mathbb{R}$ into $N$ cells $R_i$, $i = 1, \ldots, N$. The $i$th cell is defined by

$R_i = \{x \in \mathbb{R}: Q(x) = y_i\} = Q^{-1}(y_i).$    (5.3)

It follows that

$\bigcup_i R_i = \mathbb{R},$    (5.4)

and if $i \neq j$,

$R_i \cap R_j = \varnothing.$    (5.5)

Definition 5.4: Granular Cell and Overload Cell. A cell that is unbounded is called an overload cell. The collection of all overload cells is called the overload region. A cell that is bounded is called a granular cell. The collection of all granular cells is called the granular region.

The set of numbers

$x_0 < x_1 < x_2 < \cdots < x_N$
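As a small illustration of Definitions 5.1-5.3, the sketch below (Python; the codebook values are arbitrary, and the cells are induced here by a nearest-neighbor rule, which is one possible choice rather than part of the definition) maps an input to a codeword and reports the resolution r = lg N.

```python
import numpy as np

codebook = np.array([-0.75, -0.25, 0.25, 0.75])   # y_1 < y_2 < ... < y_N (illustrative)

def quantize(x, y=codebook):
    """Map x to the nearest codeword; the cells R_i follow from this rule."""
    i = int(np.argmin(np.abs(y - x)))
    return i, y[i]

resolution = np.log2(len(codebook))               # r = lg N, Eq. (5.2)
print(quantize(0.4), resolution)                  # (2, 0.25) and 2.0 bits
```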
Figure 6.15 PCM quantized signal (left) with input from Figure 6.14, and quantization error (right).

removing redundancy. After all, there is no need to transmit if one can predict from the past.

It is important to point out that the above example only serves the purpose of illustration. One can rely on a fixed predictor only if the signal source is stationary. Otherwise, the predictor must change with time to adapt to the input signal properties.

Figure 6.16 DPCM quantized signal with (a) input from Figure 6.14, (b) quantization error, and (c) prediction error.
Figure 6.17 Encoder (top) and decoder (bottom) of DPCM with MA prediction.

Principles of DPCM are applied not only to speech coding, but to many other signal compression applications as well.

DPCM with MA Prediction

The predictor in Figure 6.13 utilizes the past quantized input samples, therefore obeying the AR model. An alternative is to base the prediction on the MA model (Chapter 3), where the input to the predictor is the quantized prediction-error signal, as shown in Figure 6.17. Performance of the MA predictor is in general inferior; however, it provides the advantage of being more robust against channel errors.

Consider what happens in the DPCM decoder of Figure 6.13 when a channel error is present; the error not only affects the current sample but will propagate indefinitely toward the future due to the loop involved. For the DPCM decoder of Figure 6.17, however, a single error will affect the current decoded sample plus a finite number of future samples, with that number determined by the order of the predictor. Thus, DPCM with MA prediction behaves better under noisy channel conditions.

Often, in practice, the predictor combines the quantized input and the quantized prediction error to form the prediction. Hence, the high prediction gain of the AR model is combined with the high robustness of the MA model, resulting in an ARMA model-based predictor (Chapter 3).
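To make the error-propagation argument concrete, here is a minimal DPCM codec whose predictor operates on the last K quantized prediction-error samples, in the spirit of Figure 6.17 (Python; the predictor coefficients and the uniform quantizer are toy choices, not from the text).

```python
import numpy as np

B = np.array([0.5, 0.25])        # illustrative MA predictor coefficients (order K = 2)

def q(e, step=0.1):
    """Toy uniform quantizer for the prediction error."""
    return step * np.round(e / step)

def dpcm_ma_encode_decode(x):
    mem = np.zeros(len(B))                   # past quantized prediction errors
    x_hat = np.zeros_like(x)
    for n in range(len(x)):
        pred = np.dot(B, mem)                # prediction from past quantized errors only
        e_hat = q(x[n] - pred)               # quantize the prediction error
        x_hat[n] = pred + e_hat              # the decoder forms exactly the same value
        mem = np.concatenate(([e_hat], mem[:-1]))
    return x_hat

x = np.sin(2 * np.pi * 0.01 * np.arange(200))
print(np.max(np.abs(x - dpcm_ma_encode_decode(x))))   # bounded by half the quantizer step
```

Because the decoder runs the same recursion on the same quantized errors, the reconstruction error stays within the quantizer step; a single corrupted quantized error would perturb only the next K predictions, which is precisely the robustness advantage described above.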
6.4 ADAPTIVE SCHEMES

In scalar quantization, adaptation is necessary for optimal performance when dealing with nonstationary signals like speech, where the properties of the signal change rapidly with time. These schemes are often referred to as adaptive PCM (APCM) and are the topic of this section.

Forward Gain-Adaptive Quantizer

Forward adaptation can accurately control the gain level of the input sequence to be quantized, but side information must be transmitted to the decoder. The general structure of a forward gain-adaptive quantizer is shown in Figure 6.18.

A finite number N of input samples (a frame) is used for gain computation, where $N \geq 1$ is known as the frame length. The estimated gain is quantized and used to scale the input signal frame; that is, $x[n]/\hat{g}[m]$ is calculated for all samples pertaining to a particular frame. Note that a different index m is used for the gain sequence, with m being the index of the frame. The scaled input is quantized, with the indices $i_a[n]$ and $i_g[m]$ transmitted to the decoder. These two indices represent the encoded bit-stream. Thus, for each frame, N indices $i_a[n]$ and one index $i_g[m]$ are transmitted. If transmission errors occur at a given moment, distortions take place in one frame or a group of frames; however, subsequent frames are unaltered. With sufficiently low error rates, the problem is not serious.

Many choices are applicable for gain computation. Some popular schemes are

$g[m] = k_1 \max_n \{|x[n]|\} + k_2,$    (6.12)

$g[m] = k_1 \sum_n x^2[n] + k_2,$    (6.13)

with the range of n pertaining to the frame associated with index m, and $(k_1, k_2)$ positive constants.

Figure 6.18 Encoder (top) and decoder (bottom) of the forward gain-adaptive quantizer.
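A sketch of forward gain adaptation based on (6.12) is shown below (Python; the constants, the frame length, and the toy uniform amplitude quantizer are illustrative assumptions, and the gain is left unquantized for brevity).

```python
import numpy as np

K1, K2 = 1.0, 1e-3                   # illustrative constants; k2 avoids division by zero

def frame_gain(frame):
    """g[m] = k1 * max|x[n]| + k2, Eq. (6.12)."""
    return K1 * np.max(np.abs(frame)) + K2

def encode_frame(frame, step=1.0 / 128):
    """Scale the frame by its gain, then quantize the scaled samples."""
    g_hat = frame_gain(frame)                          # a real coder would quantize this too
    idx = np.round(frame / g_hat / step).astype(int)   # amplitude indices i_a[n]
    return g_hat, idx

def decode_frame(g_hat, idx, step=1.0 / 128):
    return g_hat * idx * step

frame = 0.3 * np.random.randn(160)                     # illustrative frame of samples
g, idx = encode_frame(frame)
print(np.max(np.abs(frame - decode_frame(g, idx))))    # small reconstruction error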
As we can see, the purpose of the gain is to normalize the amplitude of the samples inside the frame, so that high-amplitude frames and low-amplitude frames are quantized optimally with a fixed quantizer. To avoid numerical problems with low-amplitude frames, $k_2$ is incorporated so that divisions by zero are avoided.

For nonstationary signals like speech, which have a wide dynamic range, the use of APCM is far more efficient than a fixed quantizer. At a given bit-rate, the SNR and especially the SSNR are greatly improved with respect to PCM (see Chapter 19 for the definition of SSNR).

Backward Gain-Adaptive Quantizer

In a backward gain-adaptive quantizer, the gain is estimated on the basis of the quantizer's output. The general structure is shown in Figure 6.19. Such schemes have the distinct advantage that the gain need not be explicitly retained or transmitted, since it can be derived from the output sequence of the quantizer. A major disadvantage of backward gain adaptation is that a transmission error not only causes the current sample to be incorrectly decoded but also affects the memory of the gain estimator, leading to forward error propagation.

Similar to the case of the forward gain-adaptive quantizer, the gain is estimated so as to normalize the input samples. In this way, a fixed amplitude quantizer is adequate to process signals with a wide dynamic range. One simple implementation consists of setting the gain g[n] proportional to a recursive estimate of the variance

Figure 6.19 Encoder (top) and decoder (bottom) of the backward gain-adaptive quantizer.
for the normalized-quantized samples, where the variance is estimated recursively with

$\sigma^2[n] = \alpha \sigma^2[n-1] + (1-\alpha) y^2[n]$,    (6.14)

where $\alpha < 1$ is a positive constant. This constant determines the update rate of the variance estimate. For faster adaptation, set $\alpha$ close to zero. The gain is computed with

$g[n] = k_1 \sigma^2[n] + k_2$,    (6.15)

where $k_1$ and $k_2$ are positive constants. The constant $k_1$ fixes the amount of gain per unit variance. The constant $k_2$ is incorporated to avoid division by zero. Hence, the minimum gain is equal to $k_2$.

In general, it is very difficult to determine analytically the impact of the various parameters ($\alpha$, $k_1$, $k_2$) on the performance of the quantizer. In practice, these parameters are determined experimentally, depending on the signal source.
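A sketch of backward gain adaptation built around (6.14) and (6.15) is given below (Python; the function name, the constants, and the toy uniform quantizer are illustrative and would be tuned experimentally, as just noted). The key point is that the gain is computed only from past quantizer outputs, so the decoder can reproduce it without side information.

```python
import numpy as np

ALPHA, K1, K2 = 0.95, 1.0, 1e-3     # illustrative constants, tuned per signal source in practice

def backward_gain_adaptive(x, levels=16):
    """Toy backward gain-adaptive quantizer: the gain is derived from past outputs only."""
    var, g = 0.0, K2
    x_hat = np.zeros_like(x)
    for n in range(len(x)):
        y = np.clip(np.round(x[n] / g * levels) / levels, -1.0, 1.0)  # fixed quantizer on x/g
        x_hat[n] = g * y                                   # the decoder forms the same value
        var = ALPHA * var + (1.0 - ALPHA) * y ** 2         # Eq. (6.14), uses y[n] only
        g = K1 * var + K2                                  # Eq. (6.15), gain for the next sample
    return x_hat

x = np.concatenate([0.01 * np.random.randn(200), 0.5 * np.random.randn(200)])
print(np.mean((x - backward_gain_adaptive(x)) ** 2))       # the gain tracks the level change
```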
Adaptive Differential Pulse Code Modulation

The DPCM system described in Section 6.3 has a fixed predictor and a fixed quantizer; much can be gained by adapting the system to track the time-varying behavior of the input. Adaptation can be performed on the quantizer, on the predictor, or on both. The resulting system is called adaptive differential PCM (ADPCM).

Figure 6.20 shows the encoder and decoder of an ADPCM system with forward adaptation. As for the forward APCM scheme, side information is transmitted, including gain and predictor information. In the encoder, a certain number of samples (a frame) are collected and used to calculate the predictor's parameters. For the case of the linear predictor, a set of LPCs is determined through LP analysis (Chapter 4). The predictor is quantized, with the index $i_p[m]$ transmitted.

As in DPCM, the prediction error is calculated by subtracting the prediction $x_p[n]$ from the input $x[n]$. A frame of the resultant prediction-error samples is used in gain computation, with the resultant value quantized and transmitted. The gain is used to normalize the prediction-error samples, which are then quantized and transmitted. Note that the quantized quantities (samples of normalized prediction error, gain, and the predictor's parameters) are used in the encoder to compute the quantized input $\hat{x}[n]$, and the prediction $x_p[n]$ is derived from the quantized input. This is done because, on the decoder side, it is only possible to access the quantized quantities; in this way, synchronization is maintained between encoder and decoder, since both are handling the same variables.

As we will see, many speech coding algorithms use a scheme similar to forward-adaptive ADPCM. In many such algorithms, LP analysis is performed, with the resultant coefficients quantized and transmitted. Thus, a good understanding of ADPCM allows a better digestion of the material presented in subsequent chapters.

Figure 6.20 Encoder (top) and decoder (bottom) of a forward-adaptive ADPCM quantizer.

One shortcoming of the forward-adaptation scheme is the delay introduced due to the necessity of collecting a given number of samples before processing can start. The amount of delay is proportional to the length of the frame. This delay can be critical in certain applications, since echo and annoying artifacts can be generated.

Backward adaptation is often preferred in those applications where delay is critical. Figure 6.21 shows an alternative ADPCM scheme with backward adaptation. Note that the gain and predictor are derived from the quantized-normalized prediction-error samples; hence, there is no need to transmit any additional parameters except the index of the quantized-normalized samples.

Similar to DPCM, the prediction is subtracted from the input to obtain the prediction error, which is normalized, quantized, and transmitted. The quantized-normalized prediction error is used for gain computation. The derived gain is used in the denormalization of the quantized samples; these prediction-error samples are added to the predictions to produce the quantized input samples.
Figure 6.21 Encoder (top) and decoder (bottom) of a backward-adaptive ADPCM quantizer.

The predictor is determined from the quantized input $\hat{x}[n]$. Techniques for linear predictor calculation are given in Chapter 4. Using recursive relations for the gain calculation ((6.14) and (6.15)) and for linear prediction analysis, the amount of delay is minimized, since a sample can be encoded and decoded with little delay. This advantage is due mainly to the fact that the system does not need to collect the samples of a whole frame before processing. However, the reader must be aware that backward schemes are far more sensitive to transmission errors, since these errors affect not only the present sample but all future samples, due to the recursive nature of the technique.

6.5 SUMMARY AND REFERENCES

The major facts about PCM are presented in this chapter, where the performance of uniform quantization as a function of resolution is found. For resolution higher
                                
                                