
Visual Media Coding and Transmission

Published by Willington Island, 2021-07-26 02:21:34

Description: Visual Media Coding and Transmission is an output of VISNET II NoE, an EC IST-FP6 collaborative research project by twelve esteemed institutions from across Europe in the fields of networked audiovisual systems and home platforms. The authors provide information that will be essential for the future study and development of visual media communications technologies. The book contains details of video coding principles, which lead to advanced video coding developments in the form of Scalable Coding, Distributed Video Coding, Non-Normative Video Coding Tools and Transform Based Multi-View Coding. Having detailed the latest work in visual media coding, the book then covers the networking aspects of video communication. Various Wireless Channel Models are presented to form the basis for both link level quality of service (QoS) and cross network transmission of compressed visual data. Finally, Context-Based Visual Media Content Adaptation is discussed with some examples.



9 Enhancement Schemes for Multimedia Transmission over Wireless Networks

Visual Media Coding and Transmission, Ahmet Kondoz. © 2009 John Wiley & Sons, Ltd. ISBN: 978-0-470-74057-6

9.1 Introduction

Third-generation (3G) access networks were designed from the outset to provide a wide range of bearer services with different levels of quality of service (QoS) suitable for multimedia applications with bit rates of up to 2 Mbps. The bearer services are characterized by a set of transport channel parameters, which include transport block size, CRC code length, channel coding scheme, RLC mode, MAC type, transmission time interval, rate matching, and spreading factor. The perceived quality of the application seen by the end user is greatly affected by the settings of these transport channel parameters. The optimal parameter settings depend highly on the characteristics of the application, the propagation conditions, and the end-user QoS requirements.

This section will examine the effect of these transport channel (network) parameter settings upon the performance of MPEG-4-coded video telephony and AMR-WB speech applications, and will investigate the optimal radio bearer design for real-time speech and video transmission over UTRAN, GPRS, and EGPRS. The influence of the network parameter settings and different channel and interference conditions upon the received video/speech quality and network performance will be assessed experimentally using the real-time UMTS, GPRS, and EGPRS emulators described in Chapter 8. Furthermore, differences between packet-switched and circuit-switched radio bearer configurations for conversational video applications over UMTS will be investigated.

9.1.1 3G Real-time Audiovisual Requirements

The conversational service class is the most demanding communication class in terms of application requirements. The real-time conversational scheme is characterized by two main requirements: very low end-to-end delay and the preservation of time relations between information entities in the stream. The maximum end-to-end delay is decided by human perception. Therefore, the limit for acceptable delay is very strict, as failure to provide sufficiently low delay will result in unacceptable quality. Conversational traffic is symmetric or nearly symmetric in nature. The characteristics of the conversational applications as specified in [1] are shown in Table 9.1.

Table 9.1 Characteristics of conversational (real-time) applications [1]

  Application example            Videophone
  Degree of symmetry             Two-way
  Data rates                     32 to 384 kbps
  One-way end-to-end delay       <150 ms preferred, <400 ms limit
  Frame jitter                   <100 ms for lip synch
  Information loss               <1% FER

In order to allocate the scarce radio resources fairly and flexibly between different types of services with their respective quality demands, an end-to-end quality of service (QoS) architecture is used in UMTS. Here, QoS is viewed as a series of chained services operating at different levels of the mobile environment, and the required QoS is realized through several different bearer services. A radio access bearer (RAB), which is based on the characteristics of the radio interface, provides the service quality over the radio interface. Clearly defined QoS attributes (parameters) are used to characterize the services and the functionality of the application. The specified QoS attributes for the conversational class are shown in Table 9.2.

Table 9.2 Value ranges of radio access bearer service attributes for the conversational class [2]

  QoS attribute                  Value range
  Traffic class                  Conversational class
  Maximum bit rate (kbps)        <2048
  Correct delivery order         Yes/No
  Maximum SDU size (octets)      1500 or 1502
  Delivery of erroneous SDUs     Yes/No/-
  Residual BER                   5 × 10^-2, 1 × 10^-2, 5 × 10^-3, 1 × 10^-3, 1 × 10^-4, 1 × 10^-6
  SDU error ratio                1 × 10^-2, 7 × 10^-3, 1 × 10^-3, 1 × 10^-4, 1 × 10^-5
  Transfer delay (ms)            80 to maximum value
  Guaranteed bit rate (kbps)     <2048
  Allocation/retention priority  1, 2, 3

For physical realization of the intended QoS for the transfer, all these QoS parameters must be mapped onto the transport/physical channel parameters, such as spreading factor, transport format, ARQ parameters, channel-protection and error-detection schemes, RLC/MAC type, and rate matching. The effectiveness of radio resource allocation algorithms and the system performance are greatly affected by the settings of these radio access network parameters. The optimal settings depend highly on the characteristics of the application, the propagation conditions, and the end-user quality requirements. For illustration purposes, an example of the settings of a radio access bearer for 64 kbps conversational video applications is shown in Table 9.3.

Table 9.3 Transport channel parameters for 64 kbps conversational radio bearer (DL/CS RAB) [3]

  RLC
    Logical channel type                             DTCH
    RLC mode                                         TM
    Payload sizes, bit                               640
    Max data rate, bps                               64 000
    TrD PDU header, bit                              0
  MAC
    MAC header, bit                                  0
    MAC multiplexing                                 N/A
  Layer 1
    TrCH type                                        DCH
    TB sizes, bit                                    640
    TTI, ms                                          20 (or 40)
    Coding type                                      TC
    CRC, bit                                         16
    Max number of bits/TTI after channel coding      3948 (alt. 7884)
  DPCH Downlink
    Spreading factor                                 32
    DPCCH: Number of TFCI bits/slot                  8
    DPCCH: Number of TPC bits/slot                   4
    DPCCH: Number of Pilot bits/slot                 8
    DPDCH: Number of data bits/slot                  140
    DPDCH: Number of data bits/frame                 2100

9.1.2 Video Transmission over Mobile Communication Systems

The video encoding process can be considered under two main categories, namely open-loop encoding and closed-loop encoding. In the open-loop encoding process, the input video sequence is encoded with fixed (preset) quantization settings for each frame. To prevent the propagation of errors in an encoded video sequence, intra-coded frames, which do not use any information from previously encoded frames, are inserted at regular intervals. This process leads to highly variable encoded video frame sizes and is often referred to as variable bit rate (VBR) encoding.

Closed-loop encoding sends the encoded video frames into an output buffer, which is located at the encoder (see Figure 9.1(b)). The buffer threshold setting controls the output video bit rate. If the input bit rate to the buffer exceeds the output bit rate set by the controller, the encoder is told to adjust the quantization step size of the current frame or macro blocks in such a way as to realize the required video bit rate. Therefore, closed-loop (rate-controlled) encoding generates a more or less constant bit rate output.

Figure 9.1 Video encoding process: (a) open-loop (variable bit rate) encoding, with a preset quantization step size; (b) closed-loop encoding, with a rate controller that sets the quantization step size from the transmitter buffer threshold

The intra-coded frames are much larger than the inter (predictive)-coded frames. An increase of quantization step size in order to achieve the required bit rate will result in poor-quality intra-coded frames. Moreover, as they are referenced by following frames, a dramatic drop in the quality of the entire video sequence is visible. In order to mitigate the error propagation and to minimize the quantization distortion while achieving the target output bit rate, adaptive intra refresh (AIR) and cyclic intra refresh (CIR) algorithms are often used in conjunction with the rate control algorithms. In both algorithms, only a selected number of macro blocks (MBs) within each video frame are intra-coded. The AIR technique refreshes the macro blocks belonging to high-activity regions of the video frame. The CIR technique refreshes macro blocks in a cyclic manner, starting with the first macro block of the video frame. This avoids the propagation of errors in low-motion regions.

Figure 9.2 shows the traffic characteristics of the Suzie sequence, which is encoded with and without rate control enabled. The quantization step size is selected for the open-loop encoding so as to achieve the same average bit rate as the closed-loop encoding output. In addition to the TM5 rate control algorithm, AIR/CIR algorithms as described in Annex E1.5 of the MPEG-4 standard are used in the closed-loop encoding. The numbers of MBs selected for the AIR and CIR algorithms are eight and two respectively. This provides a frame refresh rate that is equivalent (on average) to the insertion of one intra-coded frame every 10 frames. Figure 9.2 clearly illustrates the variable frame sizes caused by open-loop encoding.
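The refresh-rate figure quoted above can be checked with a short sketch. It assumes QCIF frames (176 × 144 pixels, the format used in the experiments later in this chapter) divided into 16 × 16 macro blocks, and it assumes that the AIR and CIR selections in a frame do not overlap. The function names are illustrative and are not taken from the MPEG-4 reference software.

```python
# Illustrative sketch only: cyclic intra refresh (CIR) scheduling for QCIF.
# Assumption: 16 x 16 macro blocks, so a 176 x 144 frame holds 11 x 9 = 99 MBs.
QCIF_MBS = (176 // 16) * (144 // 16)


def cir_macroblocks(frame_index, mbs_per_frame=2, total_mbs=QCIF_MBS):
    """Return the macro block indices that CIR intra-codes in a given frame.

    The refresh window starts at the first macro block of the first frame
    and advances cyclically by mbs_per_frame on every frame.
    """
    start = (frame_index * mbs_per_frame) % total_mbs
    return [(start + k) % total_mbs for k in range(mbs_per_frame)]


def frames_per_full_refresh(intra_mbs_per_frame, total_mbs=QCIF_MBS):
    """Average number of frames needed to intra-code every macro block once."""
    return total_mbs / intra_mbs_per_frame


print(cir_macroblocks(0))              # refresh window in the first frame
print(cir_macroblocks(49))             # window wrapping around the frame end
print(frames_per_full_refresh(8 + 2))  # AIR (8 MBs) plus CIR (2 MBs)
```

With eight AIR and two CIR macro blocks per frame, 99 macro blocks are covered in just under 10 frames on average, which matches the stated equivalence to one intra-coded frame every 10 frames.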
The intra-coded frames are in fact roughly six times the size of the inter-coded frames. Figure 9.3 shows the corresponding frame peak signal-to-noise ratio (PSNR) values for the open-loop and the closed-loop encoding. The distributions of frame sizes and frame PSNR values are shown in Figure 9.4. As can be seen in Figure 9.4(b), the variance of the video frame size distribution is greatly reduced by the closed-loop encoding process. However, this improvement is obtained at the expense of video quality. As demonstrated in Figures 9.3 and 9.4(d), the resulting frame PSNR from the closed-loop encoding is highly variable compared to that of the open-loop encoding. Referring to Figure 9.3, open-loop encoding gives higher frame PSNR, while a quality drop is visible in the closed-loop encoding in high-motion sections (circled in the figure). This is due to the need for a coarse quantizer to compensate for the relatively large number of information bits generated in high-motion regions.

Figure 9.2 Frame size variation of MPEG-4-coded video

Figure 9.3 Frame quality variation of MPEG-4-coded video (mean frame PSNR in dB against frame number for the Suzie sequence, open loop and closed loop). The circled areas represent high-activity regions

Figure 9.4 Frame size and PSNR statistics of MPEG-4-coded video: (a) pdf of frame size for open-loop encoding; (b) pdf of frame size for closed-loop encoding; (c) pdf of frame PSNR for open-loop encoding; (d) pdf of frame PSNR for closed-loop encoding

Basic comparisons of tradeoffs and potentials between the open-loop and the closed-loop encoded video transmission can be found in [4]. A smoother output bit rate is favored in fixed channel allocation schemes, where the radio channel is allocated according to the target rate setting at the rate controller. Given the relatively high video quality and smoother frame quality variation of open-loop encoding, perceived quality may be improved by pairing it with an adaptive channel allocation scheme. Closed-loop video encoding is used in all the experiments described in this chapter.

9.1.2.1 Video Performance in Error-free Environments

The optimal video quality that can be delivered to the user at different source rates was experimentally derived for the selected test sequences. The sequences are coded with source rates varying from 26 to 300 kbps, while the video frame rate is set to 10 fps. The frame rate of 10 fps is considered to be sufficient to provide acceptable quality for relatively low-motion video conferencing applications [5]. Video performance over the error-free environment (see Figure 9.5(a)) shows a linear relationship between mean frame PSNR and video source rate at higher source rates. At lower source rates, the mean frame PSNR decreases dramatically. It can be seen that the quality of the Suzie sequence is consistently 3 to 4 dB better than that of the Carphone sequence. Similarly, the Carphone sequence shows better performance than the Foreman sequence throughout. This is mainly due to the differences in motion activity between the sequences.
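Frame PSNR is the quality metric used throughout these results. The chapter does not spell out the formula, so the sketch below uses the standard definition for 8-bit samples (peak value 255) as an assumption, with frames passed as flat lists of luminance values. The function names are illustrative.

```python
import math


def frame_psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized frames given as flat pixel lists."""
    if len(original) != len(decoded) or not original:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error between the original and the decoded samples.
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames have unbounded PSNR
    return 10 * math.log10(peak ** 2 / mse)


def mean_frame_psnr(originals, decodeds):
    """Average of the per-frame PSNR values over a sequence of frames."""
    values = [frame_psnr(o, d) for o, d in zip(originals, decodeds)]
    return sum(values) / len(values)
```

Averaging per-frame values, rather than computing one PSNR over the whole sequence, is what makes the metric sensitive to the frame-to-frame quality variation discussed above. A sequence containing a perfectly reconstructed frame would need special handling, since its per-frame value is infinite.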
Figure 9.5(b) illustrates the effect of frame rate on video performance over an error-free environment. Transmission of the Suzie sequence is considered in the experiment. Low frame rates result in low quantization distortion and hence high frame PSNR. However, the frame PSNR calculation only demonstrates the relative quality of individual decoded frames compared to the original video frame. For video communication, the quality seen in the time domain is also important in assessing the performance. When the frame rate is too low, motion becomes jerky when viewing the decoded video stream. This jerkiness is disturbing to the viewer and tends to degrade the perceptual video quality.

Figure 9.5 MPEG-4 performance over an error-free channel (mean PSNR against source rate in kbps): (a) effect of activity of the video sequence (Carphone, Foreman, Suzie); (b) effect of frame rate (Suzie at 5, 10, and 15 fps) at various source bit rates

9.1.2.2 Real-time Video Communications over UTRAN

UMTS is designed to support both circuit-switched and packet-switched multimedia applications. Packet-switched multimedia communication is realized using standard IETF-defined protocols, such as IP, UDP, and RTP. The session initiation protocol (SIP) and session description protocol (SDP) are used to negotiate and open an IP connection between the terminals [6]. Two radio bearers are considered for conversational multimedia applications. Media data is transmitted over the specified radio access bearer (RAB), while the necessary control information is conveyed over the accompanying signaling radio bearer (SRB) [3]. The corresponding transport channels for the radio bearers are separately channel-protected and formatted. The transport channels are finally multiplexed onto the same physical channel at the physical layer in order to transmit the data over the air interface.

RLC transparent mode operation is considered for both circuit-switched and packet-switched transmission. In other words, no retransmission mechanism is considered. Unlike other media applications, where the received corrupted data packets are more likely to be dropped at the RLC layer, for video applications it is necessary to pass all received data (including the corrupted data) to the application layer.
This is because the error-resilience/concealment mechanisms implemented in the MPEG-4 decoder can be used to recover or conceal the corrupted data bits to some extent. In this scenario, the use of CRC bits at the RLC packet level is not only redundant but also reduces the available bandwidth for the source data. Hence, no CRC attachment is considered.

Full error-resilience-enabled MPEG-4-coded video transmission is considered in the experiments discussed in this section. QCIF (176 × 144)-formatted sequences are transmitted over the specified channels for a duration of 30 s. For video sequences captured at 30 fps, this allows transmission of 900 frames. To capture the effect of the bursty nature of the channel on the received video quality, each experiment was repeated 10 times. The results for all three selected sequences were averaged to obtain a meaningful figure. This means each point represents an average frame PSNR value over 27 000 frames of the original video sequences captured at 30 fps (900 frames × 10 repetitions × 3 sequences).

9.1.3 Circuit-switched Bearers

Typical parameter sets for reference RABs, SRBs, and important combinations of them for conversational multimedia applications are presented in [3]. In the simulation a 3.4 kbps SRB, which is specified in [3], is used for a dedicated control channel (DCCH). The throughput available to the application depends mainly on the spreading factor, the rate-matching ratio, and the channel-coding scheme used for the bearer configuration. The CRC attachment, the protocol layer operation modes, and the transport block size also influence the available application throughput. However, as mentioned earlier, transparent mode is selected and zero CRC attachment is considered for video telephony applications. With no CRC bits, the transport block size can be assumed to have an insignificant effect on the application throughput. No network protocol layer overhead is added for circuit-switched connections. The calculated information data rates, according to the simulation parameter settings for circuit-switched connections, are shown in Table 9.4.

Table 9.4 UMTS traffic capacity (kbps) for circuit-switched radio bearers: CC, convolutional code; TC, turbo code; RM, rate matching ratio

  Spreading   CC 1/2             CC 1/3             TC 1/3             No coding
  factor      RM 1.0   RM 0.9    RM 1.0   RM 0.9    RM 1.0   RM 0.9    RM 1.0   RM 0.9
  128         16.2     18        10.6     11.8      10.85    12.05     33.1     36.7
  64          39.5     44        26.10    29.1      26.7     29.65     80.6     89.5
  32          97       107.75    64.5     71.85     65.5     72.8      197.1    219
  16          206.1    229       137.4    152.6     139.55   155.05    419.1    465.6
  8           442.8    492       295.1    327.6     299.3    332.6     899.1    999
  4           915.75   1016.8    610      679.05    619.2    688.05    1859.1   2065.6

The error-free results above can be used in conjunction with the UMTS traffic capacity figures shown in Table 9.4 to determine the optimum settings of source and network parameters for conversational video communications over UTRAN. The throughput capacity for a spreading factor 128 realization, which is approximately 12 kbps, is too low to support video applications. In fact, the minimum source rate at which a high-motion sequence (such as Foreman or Carphone) can be encoded is limited to 28 to 29 kbps by the maximum quantization step size at 10 fps. A spreading factor of 64 can only support source rates around 26 kbps (with 1/3 rate channel coding). This means that even a spreading factor of 64 may not be adequate to realize video communications over UTRAN.
Therefore, only spreading factor settings of 32, 16, and 8 are considered for the investigation of video performance over error-prone environments. Table 9.5 summarizes the maximum average frame PSNR value that can be achieved with various radio bearer configurations for the video telephony application over UTRAN. Note that video encoded at 5 fps is used for transmission over the spreading factor 64 channels.

Table 9.5 Maximum frame PSNR value variation with radio bearer configurations

  Spreading   Video frame   Source rate   PSNR     Source rate   PSNR     Source rate   PSNR
  factor      rate (fps)    at CC 1/2     CC 1/2   at CC 1/3     CC 1/3   at TC 1/3     TC 1/3
                            (kbps)                 (kbps)                 (kbps)
  64          5             39.5          29.94    26.10         28.31    26.7          28.37
  32          10            97            34.43    64.5          32.48    65.5          32.51
  16          10            206.1         38.02    137.4         36.06    139.55        36.13
  8           10            442.8         41.65    295.1         39.71    299.3         39.78
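The bearer selection reasoning above amounts to a table lookup. The sketch below copies the rate matching ratio 1.0 columns of Table 9.4 for the three coded schemes and returns the largest spreading factor whose capacity covers a given source rate, on the assumption that a larger spreading factor consumes less radio resource and is therefore preferred. The function name and data layout are illustrative.

```python
# Capacity in kbps at rate matching ratio 1.0, copied from Table 9.4.
CAPACITY_KBPS = {
    "CC 1/2": {128: 16.2, 64: 39.5, 32: 97.0, 16: 206.1, 8: 442.8, 4: 915.75},
    "CC 1/3": {128: 10.6, 64: 26.1, 32: 64.5, 16: 137.4, 8: 295.1, 4: 610.0},
    "TC 1/3": {128: 10.85, 64: 26.7, 32: 65.5, 16: 139.55, 8: 299.3, 4: 619.2},
}


def largest_spreading_factor(source_rate_kbps, coding):
    """Largest spreading factor whose capacity covers the source rate.

    Returns None when no spreading factor in the table is sufficient.
    """
    candidates = [sf for sf, capacity in CAPACITY_KBPS[coding].items()
                  if capacity >= source_rate_kbps]
    return max(candidates) if candidates else None


# A high-motion sequence needs at least about 28 kbps at 10 fps.
print(largest_spreading_factor(28, "TC 1/3"))
print(largest_spreading_factor(28, "CC 1/2"))
```

For a 28 kbps source the lookup gives spreading factor 32 with rate 1/3 turbo coding, since the 26.7 kbps available at spreading factor 64 falls short, which is consistent with the choice to study spreading factors 32, 16, and 8.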

9.1.3.1 Channel Utilization and Delay–Jitter Variation

Low frame delay variation (frame jitter) is an important requirement for video telephony applications. For accurate lip synchronization, the maximum allowed frame delay variation is limited to 100 ms [1]. As can be seen in Figure 9.4(b), the output video frame sizes vary around the mean value by a considerable amount, even for the closed-loop (rate-controlled) encoding process. This could lead to frame delay variation when transmitted over fixed-rate channels. This section investigates frame delay (jitter) variation resulting from the source traffic characteristics, and its effect on channel utilization.

The video encoding frequency is set to 10 fps. This means that the RLC/MAC layer receives an encoded video frame every 100 ms. The received video frame is segmented into RLC blocks according to the specified RLC block size, which is equivalent to the transport block size for transparent mode operation. If the information bits in a video frame do not fit into an integer number of RLC blocks then zero padding bits are added to the last RLC block. The segmented blocks are channel coded and stored in a transmitter buffer to be transmitted over the physical channel in every transmission time interval (TTI), which is an integer multiple of 10 ms. As in a practical system, if the number of data blocks in the transmitter buffer is less than the required number of data blocks in a TTI frame then dummy bits are used to complete the current TTI frame. Another media stream can be used instead of dummy bits, and two media streams can be multiplexed into the same TTI frame in order to maximize the system utilization, as recommended by 3GPP [7]. However, the intention of this work is to examine the effect of video traffic characteristics alone on the system performance. Therefore, no media multiplexing is performed.
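The segmentation step described above can be sketched as follows. This is a simplified illustration: it accounts only for the zero padding added to the last RLC block of each video frame and ignores the dummy bits used to complete a partly filled TTI frame. The function names and the frame sizes in the usage lines are made up for illustration and are not measurements from the chapter.

```python
def segment_frame(frame_bits, rlc_block_bits):
    """Return (number of RLC blocks, zero padding bits) for one video frame."""
    # Ceiling division in integer arithmetic: the last block may be partial.
    blocks = -(-frame_bits // rlc_block_bits)
    padding = blocks * rlc_block_bits - frame_bits
    return blocks, padding


def padding_efficiency(frame_sizes_bits, rlc_block_bits):
    """Share of RLC payload bits that carry video data rather than padding."""
    useful = sum(frame_sizes_bits)
    sent = sum(segment_frame(size, rlc_block_bits)[0] * rlc_block_bits
               for size in frame_sizes_bits)
    return useful / sent


# Hypothetical frame sizes in bits, segmented into 640-bit RLC blocks.
print(segment_frame(6500, 640))
print(padding_efficiency([6500, 1280], 640))
```

Because every video frame wastes on average about half an RLC block in padding, larger blocks waste more bits per frame, which suggests why efficiency falls as the RLC block size grows.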
Channel utilization is calculated as the ratio between the total number of useful bits transmitted and the total number of available bits for a given RLC block size. The resulting channel utilization is shown in Figure 9.6 for a spreading factor 32 realization and 1/3 rate channel coding. As can be seen from the figure, the channel utilization decreases with an increase in the RLC block size for both long-duration (30 s) and short-duration (5 s) connections. Experiments show similar performance for other spreading factor realizations.

Figure 9.6 Protocol efficiency in circuit-switched video telephony (channel utilization in % against RLC block size for long-duration (30 s) and short-duration (5 s) connections)

Figure 9.7 illustrates the effect of RLC block size upon the frame delay variations. The definition of the frame jitter variation follows the statistical delay jitter bound with 95% probability:

\[ \mathrm{prob}(J_i \le J_{\max}) \ge U_{\min} \quad \text{for all } i \tag{9.1} \]

where $J_i$ and $J_{\max}$ represent the frame jitter of the $i$th frame and the maximum frame jitter value, respectively, and $U_{\min}$ is the lower bound of the probability that $J_i$ is within its limit. Frame jitter is defined as:

\[ J_i = |D_i - D| \quad \text{for all } i \tag{9.2} \]

where $D$ denotes the ideal or target frame delay, and $D_i$ is the actual frame delay resulting from the transmission.

Figure 9.7 Frame delay jitter characteristics of circuit-switched video telephony (jitter in ms against RLC block size): (a) effect of sequence duration (long-duration (30 s) and short-duration (5 s)); (b) effect of sequence (Suzie, Foreman, Carphone)

The frame jitter variation caused by the variable nature of the application throughput can be controlled by reducing the source bit rate or managing the RLC buffer at the transmitter. Both of these methods have weaknesses associated with them. Reducing the source throughput results in an inefficient system. The buffer management techniques control the size of the transmitter buffer in order to maintain time correlation between the transmitted frames. Information data that cannot be transmitted to the receiver within the required time window is dropped at the transmitter.
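The jitter definitions in Equations (9.1) and (9.2) can be evaluated on measured frame delays with a sketch like the one below. It treats the statistical bound as an empirical percentile: the smallest observed jitter value that at least a fraction `u_min` of the frames do not exceed. That reading, the function names, and the delay values in the usage lines are assumptions made for illustration.

```python
def frame_jitter(delays_ms, target_delay_ms):
    """Per-frame jitter J_i = |D_i - D| as in Equation (9.2)."""
    return [abs(delay - target_delay_ms) for delay in delays_ms]


def jitter_bound(jitters_ms, u_min=0.95):
    """Smallest observed J_max with prob(J_i <= J_max) >= u_min (Equation 9.1)."""
    ordered = sorted(jitters_ms)
    n = len(ordered)
    for k, value in enumerate(ordered, start=1):
        # k of the n frames have jitter no larger than this value.
        if k / n >= u_min:
            return value
    raise ValueError("need at least one sample and u_min no greater than 1")


# Hypothetical frame delays in ms for a target delay of 100 ms.
delays = [100 + i for i in range(20)]
print(jitter_bound(frame_jitter(delays, 100)))
```

A bearer configuration would then satisfy the lip synchronization requirement if the value returned for `u_min=0.95` stays below 100 ms.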
As even the most important data, such as VOP header information, can be dropped at the transmitter, buffer overflow could lead to severe degradation of received video quality unless the application layer and the RLC layer interact. The maximum affordable transport block sizes that satisfy the delay requirements (<100 ms) are calculated for different spreading factor realizations and are listed in Table 9.6. The table also shows the resulting channel utilization for these channel configurations.
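As a minimal sketch of how the jitter bound in Equations (9.1) and (9.2) could be evaluated from measured frame delays, the following computes the per-frame jitter and the smallest J_max that holds with probability U_min. The function names and the sample delays are hypothetical and are not taken from the experiments.

```python
import math


def frame_jitter(delays_ms, target_ms):
    """Equation (9.2): J_i = |D_i - D| for every frame i."""
    return [abs(d - target_ms) for d in delays_ms]


def jitter_bound(jitters_ms, u_min=0.95):
    """Smallest J_max with prob(J_i <= J_max) >= u_min, as in Equation (9.1)."""
    ordered = sorted(jitters_ms)
    # Position of the sample at or below which at least u_min of samples fall.
    k = math.ceil(u_min * len(ordered)) - 1
    return ordered[k]


# Hypothetical per-frame delays (ms) against a 100 ms target delay.
delays = [100, 104, 98, 130, 101, 95, 100, 250, 102, 99]
jitters = frame_jitter(delays, target_ms=100)
j95 = jitter_bound(jitters, u_min=0.95)

# Assumed acceptance rule: the 95th percentile jitter stays under 100 ms.
meets_requirement = j95 < 100
```

With only ten samples a single late frame dominates the 95th percentile, which is why a bound of this kind is normally estimated over a long sequence.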

Table 9.6 Transport block size and channel utilization (95th percentile frame delay jitter: 100 ms)

Spreading factor   Maximum RLC block size (bits)   Channel utilization (%)
32                 216                             98.4
16                 560                             98.25
8                  1000                            98.2

However, for optimal performance of the interleaving and convolutional coding algorithms, the input block size to the channel coder should be less than 504 bits. Otherwise, block segmentation is performed at layer 1 (see Section 9.3). Therefore, for optimal performance and system utilization, the RLC block size should be set to less than 504 bits for spreading factor 16 and 8 realizations with convolutional coding.

9.1.3.2 Video Performance in Error-prone Environments

This section describes the performance of video telephony over simulated error-prone channel conditions using the developed UTRAN emulator. The channel environments investigated are the vehicular A (50 km/h mobile speed) and pedestrian B (3 km/h mobile speed) multipath propagation conditions, as defined in [8]. MPEG-4-coded video with full error resilience enabled and rate control (using the TM5 rate control algorithm) is considered in the experiments.

The influence of MPEG-4 codec parameter settings on the received video quality over W-CDMA-based radio networks is discussed in [9]. The results show that a video packet size of around 500 to 700 bits provides the optimal setting. Furthermore, fixed settings of 2 and 10 intra-coded MBs in each frame for CIR and AIR, respectively, are shown to provide the optimal video quality over the range of channel error rates considered likely for W-CDMA mobile networks. Therefore, in the experiments discussed in this section, the number of intra-coded MBs per frame is set to 2 for CIR and 10 for AIR, and a video packet size of 600 bits is used.

Experiments were conducted to investigate the influence of network parameter settings on video performance.
Mainly, the effects of spreading factor allocation, channel coding scheme, and fast power control were examined.

Effect of Source Frame Rate

Despite the temporal domain quality degradation discussed earlier, video encoded at a low frame rate shows higher spatial quality (frame PSNR) under error-free (good) channel conditions. However, in the presence of channel errors, high-frame-rate video performs better than low-frame-rate video. This is mainly because temporal error propagation is stopped sooner. For example, a setting of 12 intra-MBs per frame refreshes one complete frame every 550 ms at a frame rate of 15 fps, while one complete frame is refreshed only every 1650 ms at 5 fps. However, as can be seen in Figure 9.8, the performance at 15 fps shows only a slight improvement compared to that at 10 fps. Therefore, video sequences are encoded at a frame rate of 10 fps for the experiments described in this section.
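The refresh intervals quoted above can be reproduced with a short calculation. This is a sketch under two assumptions that the text does not state explicitly: the frames are QCIF (99 macroblocks), which is what makes the 550 ms and 1650 ms figures come out, and every intra-coded MB in a frame refreshes a macroblock not yet refreshed in the current cycle.

```python
# Assumption: QCIF frames (176 x 144 pixels) contain 11 x 9 = 99 macroblocks.
QCIF_MACROBLOCKS = 99


def refresh_interval_ms(intra_mbs_per_frame, fps, mbs_per_frame=QCIF_MACROBLOCKS):
    """Time needed to intra-refresh every macroblock of a frame once."""
    frames_needed = mbs_per_frame / intra_mbs_per_frame
    return 1000.0 * frames_needed / fps


# 2 CIR + 10 AIR intra-coded MBs per frame, as used in the experiments.
intra_mbs = 2 + 10
interval_15fps = refresh_interval_ms(intra_mbs, fps=15)
interval_10fps = refresh_interval_ms(intra_mbs, fps=10)
interval_5fps = refresh_interval_ms(intra_mbs, fps=5)
```

Under these assumptions the 10 fps setting used in the experiments refreshes a complete frame roughly every 825 ms, between the two values given in the text.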

[Figure 9.8 Effect of frame rate on video performance in error-prone environments: mean PSNR (dB) versus Eb/No (dB) at 5, 10, and 15 fps; MPEG-4 over vehicular A, 1/3 CC, Suzie, SF 32]

[Figure 9.9 MPEG-4 performance over vehicular A environment: (a) 1/3 CC; (b) 1/3 TC and no power control. Both panels plot mean PSNR (dB) against Eb/No (dB) for SF 32, SF 16, and SF 8 at 10 fps]

Effect of Spreading Factor

The received video quality for different spreading factor realizations is measured in terms of average frame PSNR, and the results are depicted in Figure 9.9 and Figure 9.10. Video sequences are coded at the appropriate rates listed in Table 9.4. A 1/3 rate convolutional code is used to protect the video data, and the protected data is transmitted over the simulated vehicular A channel environment. As expected, allocation of SF 32 provides slightly better performance than the others at poor channel conditions, due to the better channel-protection capability of higher spreading factors. At better channel conditions, allocation of SF 16 provides superior video quality compared to the others. Both SF 16 and SF 32 reach the maximum achievable video quality

at a channel Eb/No value of 10 dB. SF 8 considerably underperforms all other schemes, even under good conditions. This is due to the inter-symbol interference experienced in multipath channels. The channel coding algorithm tends to mitigate the inter-symbol interference effect. However, significant performance degradation is visible with low spreading factors (such as 8).

Similar performance is visible for spreading factors 32 and 16 with turbo coding when operating over the vehicular A propagation environment. However, spreading factor 8 performs much better with turbo coding than with convolutional coding. As stated in Section 9.4.1.1, the main reason is the better performance of the turbo encoder/decoder in the presence of large input block sizes. Even though both SF 16 and SF 32 approach the maximum achievable frame PSNR values (36.13 dB and 32.51 dB, respectively) at an Eb/No value of 10 dB, the SF 8 results do not reach the maximum PSNR value (39.78 dB) within the Eb/No range considered in the experiment.

[Figure 9.10 MPEG-4 performance over pedestrian B environment with 1/3 CC and no power control: mean PSNR (dB) versus Eb/No (dB) for SF 32, SF 16, and SF 8; Suzie, 10 fps]

With no fast power control, the bit error rate performance of the pedestrian B channel is significantly worse. This is due to the poor performance of the block-based interleaver and channel decoder in the presence of the long bursts of poor channel conditions that occur at low mobile speeds. Therefore, video telephony over the pedestrian B environment shows poor performance relative to that over the vehicular A environment. For the 1/3 rate convolutional code, a spreading factor of 32 gives the best performance, while a spreading factor of 8 shows the worst performance.
The performance for the 1/3 rate turbo code shows similar behavior.

Effect of Channel Coding

Figure 9.11 shows the effectiveness of different channel coding schemes for video applications over the vehicular A environment. The allocated spreading factor is 32 and video source rates are set according to the values shown in Table 9.4. Figure 9.11 clearly illustrates the performance improvement achieved by turbo coding for operation over low-quality channels.

[Figure 9.11 Effect of channel coding scheme on MPEG-4 performance over the vehicular A channel (SF 32): mean PSNR (dB) versus Eb/No (dB) for 1/3 CC, 1/2 CC, and 1/3 TC at 10 fps]

The frame PSNR result for turbo coding is about 4 to 5 dB better than for 1/3 rate convolutional coding, and 9 to 10 dB better than for 1/2 rate convolutional coding. For good channel conditions, 1/2 rate convolutional coding outperforms the others by a PSNR value of 2 dB, mainly because of the low quantization distortion seen at higher source rates. All three coding schemes obtain the maximum possible frame PSNR values at a channel Eb/No of around 10 dB. The expected performance improvement with turbo coding is also visible for video transmission over the pedestrian B propagation environment.

Effect of Fast Power Control

As outlined in Section 9.5.1.2, the employment of a fast power control algorithm can improve the bit error characteristics of the propagation environment. Experiments were carried out to investigate the influence of fast power control on the performance of video applications over various propagation environments. The average frame PSNR values for sequences protected with various channel coding schemes and a spreading factor of 32 over vehicular A and pedestrian B channel conditions are shown in Figure 9.12. Dashed lines in Figure 9.12 indicate the performance achieved without power control.

Slight performance improvements can be seen in the vehicular A propagation environment. For example, turbo-code-protected transmission achieves its maximum video quality at an Eb/No setting of around 9 dB without fast power control, whereas with fast power control the maximum value is reached at an Eb/No setting of 8 dB. More noticeable improvement is visible over the pedestrian B propagation environment.
For the turbo code realization, the frame PSNR improvement obtained with power control relative to operation without it varies between 3 to 4 dB under good channel conditions and 8 to 9 dB under moderate channel conditions. About 5 to 8 dB frame PSNR improvement is visible for the 1/3 rate convolutional code. Even better improvement can be seen for the 1/2 rate convolutional code.

[Figure 9.12 MPEG-4 performance over (a) vehicular A; (b) pedestrian B channel with fast power control (SF 32). Both panels plot mean PSNR (dB) against Eb/No (dB) at 10 fps for 1/3 CC, 1/2 CC, and 1/3 TC, with and without power control]

So far, the performance of video telephony over UTRAN under various radio bearer configurations and different propagation environments has been presented in terms of average frame PSNR values vs. channel Eb/No conditions. Even though PSNR is the standard measure for judging video quality, it has some limitations. Video sequences can generally be characterized by their spatial and temporal information. Spatial information is described by frame size, resolution, and chrominance and luminance accuracy. The temporal domain is described by the frame rate, the motion involved, and scene changes. The PSNR calculation is based only on the spatial information of the video frames, hence it provides no indication of the temporal quality of the video. Although it may be used as an assessment figure in performance comparisons, the absolute (perceptual) quality of video should be judged based on subjective quality tests. Unfortunately, there is no standard subjective quality measurement method specified for video applications. However, experimental results show that if the average frame PSNR value is higher than 20 dB, the video output shows an acceptable visual quality.

In order to put the above PSNR performance figures into perspective, the minimum Eb/No required to deliver perceptually acceptable video to end users is calculated for all radio bearer configurations investigated.
An average frame PSNR value of 20 dB is taken as the reference limit, and the results obtained are listed in Table 9.7. The minimum required Eb/No values for acceptable-quality video with turbo coding are lower than those required for the convolutional codes, and 1/3 rate convolutional codes require lower Eb/No than 1/2 rate convolutional codes. Spreading factors 16 and 32 show similar Eb/No requirements for acceptable video quality, while spreading factor 8 shows a relatively higher minimum Eb/No requirement. Eb/No values as low as 4.5 to 5.0 dB are sufficient to provide acceptable-quality video with 1/3 rate turbo coding over UMTS multipath propagation environments. Perceptually acceptable video quality can be obtained at an Eb/No of around 7 dB with the 1/2 rate convolutional code over fast power-controlled channels. The 1/3 rate convolutional code requires a minimum Eb/No of around 5.5 to 6.5 dB for reasonable-quality video.
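One way to read a minimum Eb/No off a measured PSNR curve is to interpolate linearly between the two measurement points that straddle the 20 dB reference limit. The sketch below illustrates that idea only; the text does not say how the thresholds were actually extracted, and the function name and the sample curve are hypothetical.

```python
def min_ebno_for_psnr(curve, threshold_db=20.0):
    """Lowest Eb/No (dB) at which the interpolated mean PSNR reaches the threshold.

    curve is a list of (ebno_db, mean_psnr_db) pairs sorted by increasing Eb/No.
    Returns None if the threshold is never reached within the measured range.
    """
    previous = None
    for ebno, psnr in curve:
        if psnr >= threshold_db:
            if previous is None:
                # Already acceptable at the first measured point.
                return ebno
            e0, p0 = previous
            # Linear interpolation between the straddling points (p0 < threshold <= psnr).
            return e0 + (threshold_db - p0) * (ebno - e0) / (psnr - p0)
        previous = (ebno, psnr)
    return None


# Hypothetical PSNR curve, not measured data from the experiments.
example_curve = [(3, 8.0), (4, 12.0), (5, 16.0), (6, 24.0), (7, 29.0)]
required_ebno = min_ebno_for_psnr(example_curve)
```

For the hypothetical curve the threshold is crossed between 5 dB and 6 dB, so the interpolated requirement is 5.5 dB.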

Table 9.7 Minimum Eb/No requirement (dB) for acceptable video quality over UTRAN

                     Vehicular A                  Pedestrian B
Spreading factor     1/2 CC   1/3 CC   1/3 TC     1/2 CC   1/3 CC   1/3 TC
Without fast power control
32                   7.2      6.3      5.0        9.0      8.0      6.5
16                   -        6.5      5.0        -        8.5      6.5
8                    -        8.5      6.0        -        10.0     7.0
With fast power control
32                   6.6      5.5      4.5        7.0      6.5      5.0
16                   -        5.8      4.5        -        6.6      5.2
8                    -        7.7      5.0        -        7.0      6

It is necessary to emphasize that the values shown in Table 9.7 are the required minimum information bit energies per received noise spectral density (Eb/No), and not the transmitted power per received noise spectral density (Pt/No). The relationship between Pt/No and Eb/No is:

$$P_t/N_o = (E_b/N_o) \cdot R \qquad (9.3)$$

where $R$ is the video source rate. Therefore, minimum Pt/No requirements can be obtained by multiplying the Eb/No values shown in Table 9.7 by the corresponding source rates shown in Table 9.4. As the 1/2 rate code can support higher source rates than the 1/3 rate codes, the minimum Pt/No requirement for acceptable video quality will be much higher for the 1/2 rate convolutional code than for the 1/3 rate codes. Also, the source rate increases with decreasing spreading factor, so the required Pt/No for acceptable quality increases as the spreading factor decreases for each channel coding scheme. Consequently, a spreading factor of 32 with 1/3 rate turbo coding provides acceptable video quality with the least transmit power over UMTS networks.

9.1.4 Packet-switched Bearers

Packet-switched conversational multimedia applications are considered within the 3GPP-specified IP multimedia subsystem (IM subsystem) framework. The service architecture, call control, and media capability control procedures for the packet-switched multimedia application have been defined in [10]. These functions are defined based on a modified version of the IETF session initiation protocol (SIP) and the session description protocol (SDP).
Packet-switched multimedia terminals are expected to have these functions built in to support packet-switched multimedia telephony. In packet-switched conversational multimedia applications, the individual media types are independently encoded and packetized separately into real-time transport protocol (RTP) packets. The encapsulated packets are then transmitted end-to-end over IP connections inside UDP datagrams. Inter-media synchronization among the received media streams is performed at the receiver based on the RTP timestamps. To allow communication over heterogeneous networks, the same multimedia codecs as in circuit-switched telephony are selected for implementation in the packet-switched domain. A

thorough discussion of codec selection and codec format for packet-switched conversational multimedia applications can be found in [6].

For packet-switched video telephony applications, MPEG-4-coded video frames are separately encapsulated within RTP packets following the RTP fragmentation rules specified in [11]. The RTP packet size is defined by the maximum transmission unit (MTU) setting. One RTP packet can only contain an integer number of video packets, and every video frame should start in a new RTP packet. In other words, a video packet cannot be split over multiple RTP packets. Also, data belonging to different video frames should not be packetized into the same RTP packet. This allows the use of RTP timestamps to indicate the VOP time framing. The resulting RTP packets may contain slightly fewer information bits than the limit defined by the path-MTU setting. Finally, each RTP packet is encapsulated within a UDP/IP packet for transmission. The RTP header and the UDP header contain 96 bits and 32 bits, respectively. The header size of IP (version 4) is 192 bits. Therefore, a total overhead of 320 bits is added to each RTP payload by the transport and network layers. No network layer header compression is assumed at the PDCP layer.

9.1.4.1 Traffic Capacity for Packet-switched Bearer

The presence of network layer overhead reduces the throughput available to the application. The available application throughput in this scenario is experimentally determined, and the results are presented in Figure 9.13 in terms of overhead percentage vs. channel bit rate for two specific MTU sizes (576 and 288 bytes). As expected, the network layer overhead percentage decreases with an increase in MTU size. Also, the overhead percentage shows approximately exponential decay with respect to the channel bit rate.
At low channel bit rates, the resulting overhead percentage can go up to 11% and 17% for radio bearer realizations with MTU sizes of 576 bytes and 288 bytes, respectively.

The experimentally determined source traffic capacities for packet-switched video telephony over UMTS are listed in Table 9.8. An MTU size of 576 bytes is assumed in the calculation. A significant reduction in source throughput compared to equivalent circuit-switched connections is visible at high spreading factor realizations.

[Figure 9.13 Throughput loss due to network layer overhead in packet-switched connections: overhead percentage (%) versus bit rate (kbps) for packet sizes of 576 and 288 bytes]
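The dependence of the overhead percentage on MTU size and bit rate can be illustrated with a simple model built on the 320-bit header total stated earlier. This is a rough sketch, not the experimental procedure: it assumes the MTU counts the whole IP packet including headers, that each video frame is carried in its own RTP packets, and it ignores the rule that a packet holds an integer number of video packets. The function names are hypothetical.

```python
import math

# RTP + UDP + IP header total per packet, as stated in the text.
HEADER_BITS = 320


def overhead_full_packet(mtu_bytes):
    """Header share of a packet filled up to the MTU (assumed to include headers)."""
    return HEADER_BITS / (mtu_bytes * 8)


def overhead_for_frame(frame_bits, mtu_bytes):
    """Header share when one video frame is split into as few packets as possible."""
    payload_max = mtu_bytes * 8 - HEADER_BITS
    packets = math.ceil(frame_bits / payload_max)
    header_total = packets * HEADER_BITS
    return header_total / (frame_bits + header_total)


floor_576 = overhead_full_packet(576)   # about 6.9 %
floor_288 = overhead_full_packet(288)   # about 13.9 %

# A small frame (2000 bits, e.g. 20 kbps at 10 fps) cannot fill a 576-byte packet,
# so the header share rises well above the full-packet value.
small_frame_576 = overhead_for_frame(2000, 576)
```

In this model the overhead approaches the full-packet value as frames grow with the bit rate, and rises at low bit rates where packets are only partly filled, which is the general trend described for Figure 9.13.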

Table 9.8 UMTS traffic capacities (kbps) for packet-switched connections (MTU: 576 bytes): CC, convolutional code; TC, turbo code; RM, rate matching ratio

                   CC 1/2             CC 1/3             TC 1/3             No coding
Spreading factor   RM 1.0   RM 0.9    RM 1.0   RM 0.9    RM 1.0   RM 0.9    RM 1.0    RM 0.9
64                 35.85    39.82     23.24    26.12     23.78    26.62     74.06     82.24
32                 89.16    99.56     58.75    65.71     59.66    66.57     182.95    203.28
16                 191.30   212.56    127.19   141.22    129.16   143.48    391.86    435.34
8                  412.17   457.97    274.63   306.31    278.60   309.98    840.66    934.06
4                  856.23   950.71    570.35   634.91    578.95   643.33    1738.26   1931.34

9.1.4.2 Video Performance over Packet-switched Bearers

A packet-switched connection provides operational flexibility for media services, particularly with respect to data delivery over the core network. However, careful design strategies should be followed to avoid throughput loss, which occurs due to the extra overhead added at the network layer, and to avoid the extra information loss that results from corrupted network layer header information in the radio access network.

The effective probability of loss depends on many factors, such as error-resilience/concealment techniques, data format, temporal error propagation, probability of packet header corruption, and the probability of VOP header loss. The channel distortion arising from corrupted packet headers also depends on these factors. In this section, experiments are carried out to investigate real-time video performance over packet-switched connections in UMTS radio access networks. Section 9.2 will further explain these issues based on a mathematically derived distortion model for video transmission over an error-prone channel.

Figure 9.14 shows the results obtained for video transmission over packet-switched connections with selected MTU sizes. A spreading factor of 32 and a 1/3 rate convolutional code are used in the radio bearer configuration, and the vehicular A propagation environment is assumed.
The figure shows that the received video quality over a packet-switched connection is always lower than that over a circuit-switched connection. For good channel conditions, a slight performance loss (about 0.5 to 1 dB) is visible. This is mainly because of the source throughput reduction resulting from the network layer overhead. However, for low channel quality, the performance loss due to corrupted packet headers becomes the dominating factor, and a PSNR loss of about 3 to 4 dB is noticeable compared to circuit-switched performance. Although the experimental results are shown only for a specific radio bearer setting, similar performance losses can also be expected with other radio bearer configurations.

9.1.5 Video Communications over GPRS

The MPEG-4 video encoder was operated with the rate control algorithm enabled and set to the desired level. Each video frame was encapsulated into a separate RTP packet, unless a maximum RTP-PDU size of 512 octets (4096 bits) was reached, in which case the video frame was segmented. This value was chosen because, although an increase in average IP packet size results in more efficient usage of the available throughput by reducing the protocol overheads, excessively large packets will be more vulnerable to information loss

due to header corruption. It is also assumed that 96 bits are required for the RTP header, 32 bits for the UDP header, 192 bits for the IP header, 8 bits for the SNDC-PDU header, 24 bits for the LLC header, and a further 24 bits for the LLC frame check sequence.

[Figure 9.14 MPEG-4 performance over a packet-switched connection: mean PSNR (dB) versus Eb/No (dB) for circuit-switched and packet-switched (MTU 576 and 288 bytes) connections; vehicular A, 1/3 CC, SF 32]

9.1.6 GPRS Traffic Capacity

The payload available in a GPRS RLC/MAC block depends upon the channel coding scheme used. As can be seen in Table 9.9, the data rate of the RLC/MAC data payload, i.e. the rate presented to the LLC layer, varies from 8 kbps for CS-1 to 20.35 kbps for CS-4. The throughput available to a single terminal will be a multiple of these rates, depending upon the multislotting capabilities of the terminal. As it is envisaged that the CS-1 and CS-2 schemes will be used for video applications, emphasis is placed on these two schemes.

Table 9.9 GPRS channel coding schemes

Scheme   Code rate   Pre-coded USF   Data payload   BCS   Tail   Radio block size (headers + data)   Data rate (kbps)
CS-1     1/2         3               160            40    4      181                                 8.0
CS-2     ≈2/3        6               247            16    4      268                                 12.35
CS-3     ≈3/4        6               291            16    4      312                                 14.55
CS-4     1           12              407            16    -      428                                 20.35

As described, the data rates given in Table 9.9 represent the throughput at which LLC-PDUs are transmitted across the Um and Gb interfaces to the serving GPRS support node (SGSN) over the RLC/MAC and BSSGP interfaces. In the work described in this section, it is assumed that traffic bottlenecks are caused solely by restrictions in timeslot allocation across the Um interface, and never by restrictions imposed by the BSSGP layer. When considering the

protocol stack across the Um interface it can be seen that, in addition to the video information, the RLC/MAC data payload contains header and other related signaling overheads from the LLC, SNDC, IP, UDP, and RTP layers. The presence of these overheads reduces the true throughput available at the application layer, which in the case being studied is the MPEG-4 encoder. In order to determine the throughput per timeslot available with coding schemes CS-1 to CS-3, the Akiyo and Foreman video sequences were encoded at a number of different source rates and passed through the GPRS data flow simulator to determine the average number of source bits transmitted in each RLC/MAC block, as shown in Figure 9.15. This was done in order to determine the output rate that must be set at the source video encoder for the combination of timeslot allocation and channel coding used.

[Figure 9.15 GPRS protocol efficiency: protocol efficiency of the GPRS stack versus source throughput (kbps) for Foreman and Akiyo at 5 fps]

Analysis of the protocol efficiency in the payload of the RLC/MAC blocks for the operating conditions described above shows that when encoding at both 5 fps and 10 fps the efficiency is in excess of 88%. This means that fewer than 15% of the bits in the payload of a radio block are used up by header information belonging to the overlying protocols. The efficiency is generally seen to increase with source rate, an indication of the increasing size of LLC-PDUs. Similar experiments carried out for sequences encoded at 10 fps show a variation in efficiency from around 89% to 90%. This, together with the significant differences between the Akiyo and Foreman sequences, clearly indicates that a 15% reduction in the data rate per timeslot as seen by the video encoder is enough to compensate for all the protocol overheads.
The newly computed radio block data rates are given in Table 9.10.

Table 9.10 Source throughput for different allocation schemes (units in bps)

Timeslots   1        2        3        4        5        6         7         8
CS-1        6800     13 600   20 400   27 200   34 000   40 800    47 600    54 400
CS-2        10 500   21 000   31 500   42 000   52 500   63 000    73 500    84 000
CS-3        12 200   24 400   36 600   48 800   61 000   73 200    85 400    97 600
CS-4        17 200   34 400   51 600   68 800   86 000   103 200   120 400   137 600
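Since every entry in Table 9.10 is the single-slot rate multiplied by the number of timeslots, the allocation needed for a given source rate can be worked out directly. The following is a small sketch using the single-slot values from the table; the function names and the eight-slot ceiling (the number of timeslots in the table) are illustrative assumptions.

```python
import math

# Single-timeslot source throughput (bps), first column of Table 9.10.
PER_SLOT_BPS = {"CS-1": 6800, "CS-2": 10500, "CS-3": 12200, "CS-4": 17200}
MAX_SLOTS = 8  # Table 9.10 lists one to eight timeslots


def source_throughput_bps(scheme, slots):
    """Source throughput for a coding scheme and a number of timeslots."""
    return PER_SLOT_BPS[scheme] * slots


def min_timeslots(scheme, source_rate_bps):
    """Fewest timeslots that carry the source rate, or None if more than eight are needed."""
    slots = math.ceil(source_rate_bps / PER_SLOT_BPS[scheme])
    return slots if slots <= MAX_SLOTS else None


cs2_three_slots = source_throughput_bps("CS-2", 3)
slots_cs2_50k = min_timeslots("CS-2", 50_000)
slots_cs3_50k = min_timeslots("CS-3", 50_000)
```

For a 50 kbps source this gives five timeslots with either CS-2 or CS-3.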

Experimental results show that at 5 fps, under conditions of no channel errors, at least two slots are required to encode video sequences, and this is only possible using codes CS-2 and CS-3. If channel conditions dictate that CS-1 must be used, then three-slot operation is necessary. It can also be seen that the differences in radio block payload size result in a significant quality deficiency when operating at CS-1 as compared to CS-2 or CS-3, whereas there is little visible difference between the quality achieved with CS-2 and CS-3.

The results obtained when transmitting at 10 fps are similar to those observed at 5 fps. Once again, the throughput advantage of codes CS-2 and CS-3 allows them to support this frame rate using only three timeslots, whereas five timeslots are required using CS-1. It can also be seen that there is little difference between codes CS-2 and CS-3, particularly at lower levels of multislotting.

The spatial quality of a video sequence is not the only factor that must be taken into consideration when assessing the performance of video coding mechanisms. Video sequences are three-dimensional signals, in which the temporal resolution and quality play an important part in determining whether a received signal is acceptable or not. Under many circumstances, particularly at lower rates, the MPEG-4 rate control mechanism is forced to discard some frames without encoding them so as not to exceed the stipulated output rate, as shown in Figure 9.17. This causes temporary "freezing" of the sequence at the receiver, as an expected frame is not received and the receiver has to wait for the following video frame. This jitter, or variation in the rate of display at the receiver terminal, is subjectively very annoying, and therefore efforts must be made to reduce its occurrence as much as possible.
In Figure 9.16 it can be observed that an increase in the available throughput not only increases the spatial quality of the sequences, but also brings about an improvement in their temporal quality. The converse is also true, and is possibly more relevant to the relatively low bit rates made available by GPRS. Although decent PSNR values for sequences at 5 fps are achieved using two slots at CS-2 and CS-3, these levels can only be achieved if 35% and 20% of source frames, respectively, are not transmitted. This results in a considerable degradation in sequence quality. Use of three-slot operation reduces these values to between 1% and 5%, which can be considered acceptable. A similar situation occurs when transmitting at 10 fps. Although it is possible to support this rate using three timeslots at CS-2 and CS-3, these rates are only sustainable by discarding between 30% and 35% of the source frames.

[Figure 9.16 Quality performance at 10 fps (no channel errors)]

In fact, loss

rates below 5% are only possible at throughputs in excess of 50 kbps, representing at least five slots using CS-2 and CS-3, and are unobtainable using CS-1.

[Figure 9.17 Frames dropped by MPEG-4 rate control: frame dropping probability versus bit rate (kbps) at 5 fps and 10 fps]

9.1.7 Error Performance

One of the most critical factors in providing video communications over mobile channels, such as the GPRS PDTCH, is the high level of errors that occur on the link. Traditionally, compressed video has been extremely susceptible to bit and packet errors, as any loss or corruption of data could easily lead to the loss of synchronization of the variable-rate packets. In order to counter this problem, several error resilience techniques have been proposed and implemented for use in different coding schemes. As a result of these techniques, superior image quality may be obtained by using corrupted video information rather than discarding all the bits in such blocks. This is because, although it may be ascertained that total data integrity has been compromised, in general the information available in corrupted data blocks allows for a greater improvement in received quality than the potential distortion caused by the bits in error. For this reason, in the simulations described in this section, it is assumed that GPRS reliability class 5 is used. This specifies unacknowledged operation in GTP and in the LLC layer, with no data protection enabled. The RLC block operation is also set to unacknowledged mode. This means that corrupted RLC blocks are forwarded to the LLC layer, which then also forwards corrupted LLC frames and frames in which data has been lost.
9.1.7.1 Simulation Conditions

The following tests were carried out using the bit error patterns generated by the CCSR GPRS physical link layer simulator described in Section 9.1.5 for the following conditions (no implementation loss was included):

- TU50, ideal frequency hopping, 1800 MHz
- TU1.5, ideal frequency hopping, 1800 MHz
- TU1.5, no frequency hopping, 1800 MHz

Table 9.11 Characteristics of source video sequences

                    Akiyo         Foreman       Carphone
Video packet size   600 bits      600 bits      600 bits
INTRA spacing       10 frames     10 frames     10 frames
Data partitioning   Enabled       Enabled       Enabled
RVLCs               Enabled       Enabled       Enabled
Frame rate          5 fps         5 fps         5 fps
Bit rate            48.791 kbps   32.359 kbps   32.28 kbps
PSNR                28.35 dB      29.359 dB     30.65 dB

Each point in the experiments corresponding to a particular channel condition was determined by carrying out 10 runs each of the Akiyo, Carphone, and Foreman sequences using randomly selected starting positions in the bit error files. This is equivalent to averaging the measured PSNR values of 1790 frames, and was found to be sufficient to provide reliable results.

9.1.7.2 Effect of Bit Errors

Figure 9.18 shows the average PSNR values for sequences protected using schemes CS-1 to CS-3 when corrupted by bit errors characteristic of the TU50 IFH propagation conditions. It can be seen that for the CS-1 scheme the maximum achievable PSNR is approached at a C/I value of around 14 dB. For CS-2 and CS-3, such levels are only reached at C/I values in excess of 19 dB, although for most operating conditions below this point the CS-2 scheme gives results superior to those obtained with CS-3 by about 2.5 dB.

A similar situation can be seen when examining the results obtained using the TU1.5 IFH propagation model. Although once again the CS-1 code gives maximal results at around 14 dB, there is now a more noticeable difference between the CS-2 and CS-3 schemes. In fact, the difference between these two schemes varies between 3.5 and 4 dB, while the asymptotic limits are reached at around 18.5 dB and above 21 dB, respectively.
Figure 9.18 Video performance over GPRS channels: (a) TU50 IFH 1800 MHz; (b) TU1.5 IFH 1800 MHz [plots of PSNR (dB) against C/I (dB) for CS-1, CS-2, and CS-3]

9.1.7.3 Channel Coding Scheme Selection

The experiments described thus far clearly demonstrate the compromise that must be made between payload capacity and error-correcting capability when selecting the appropriate channel coding scheme for video transmission. It is necessary to assess the variation in received quality for a given number of timeslots under different C/I ratios, so as to determine the operating range of each coding scheme in terms of received C/I. It must be remembered that the criteria for the selection of coding schemes for real-time applications are different from those for data transfer applications, where data integrity must be maintained and consequently the delay limits allow for the use of backward error-correction mechanisms. For data transfer, the optimum choice of coding scheme is that which provides the highest throughput of error-free data (after error detection and retransmission) at the LLC layer at a given C/I ratio. In real-time multimedia communications, on the other hand, the optimum coding scheme is that which provides the best subjective quality at the particular C/I conditions being tested.

Figure 9.19 shows the averaged PSNR results for the Akiyo and Foreman sequences using three slots. It must be noted that the PSNR is calculated between the received video frame and the equivalent original frame, with no concessions made for frames being discarded by the source rate control mechanism. This means that when frames are discarded, a misalignment occurs between the source and receiver, thereby affecting the computed PSNR values. It is felt that for the length of sequences used in these tests, the reduction in PSNR caused by such misalignment gives a useful indication of the reduction in quality due to the effect of lost frames. While this is most definitely not the optimum way of providing an objective metric of spatial and temporal quality, it can nonetheless serve to give a useful indication of the received video quality.
As expected, when using the TU1.5 IFH and TU50 IFH models, the CS-1 code gave optimal performance under high interference levels, although when the channel conditions allowed for use of the CS-2 code, a very noticeable improvement in quality over that obtainable using CS-1 was observed. This difference is partly attributable to the severe throughput limitations of three-slot operation. In fact, using three slots at CS-1 only allows for a source throughput of 20 kbps. Allocating four slots to the user will somewhat reduce this difference.

Figure 9.19 Rate adaptive performance with three slots: (a) TU50 IFH 1800 MHz; (b) TU1.5 IFH 1800 MHz [plots of PSNR (dB) against C/I (dB) for CS-1, CS-2, and CS-3]

Table 9.12 Operational scenarios for video over GPRS

Channel model         CS-1      CS-2          CS-3
TU50 IFH 1800 MHz     <15 dB    15 to 21 dB   >21 dB
TU1.5 IFH 1800 MHz    <15 dB    15 to 22 dB   >22 dB
TU1.5 NFH 1800 MHz    <14 dB    14 to 23 dB   >23 dB

It is immediately obvious from Table 9.12 that transmitting video in real time (i.e. without retransmissions) requires low interference levels, which are considerably more demanding than for ordinary data transfer applications. Typically a C/I ratio of at least 14 dB (with no implementation or antenna loss included) is required.

9.1.8 Video Communications over EGPRS

EGPRS allows for a considerable increase in throughput availability to a single user, given enough traffic availability and benign interference conditions. This means that video services can be provided with higher data rates than is possible with GPRS. In Figure 9.15, it was shown that on average, assuming no header compression, a protocol efficiency of between 88 and 90% is achieved. In order to allow for a fair comparison with the results obtained for GPRS, an 85% efficiency level is assumed. The resulting throughput allocation per timeslot at the application level, or as seen by the video codec, is shown in Table 9.13. The source throughput capacity for a single channel is seen to vary from 7.5 kbps for MCS-1 to about 50 kbps for MCS-9. This means that there is a much greater spread in available throughput values for video services over EGPRS, assuming that the bit error conditions are met.

9.1.9 Traffic Characteristics

Tests were carried out to evaluate the video quality that can be achieved over the EGPRS data channels with different modulation coding schemes. Figure 9.20 shows the PSNR values, averaged over the three sequences, for error-free transmission using schemes MCS-1 to MCS-9 with up to five-channel multislotting.
Table 9.13 EGPRS multislotting capacity for video (kbps)

Scheme   1 TS    2 TS    3 TS    4 TS    5 TS    6 TS    7 TS    8 TS
MCS-1    7.5     15      22.5    30      37.5    45      52.5    60
MCS-2    9.6     19.2    28.8    38.4    48      57.6    67.2    76.8
MCS-3    12.6    25.2    37.8    50.4    63      75.6    88.2    100.8
MCS-4    15      30      45      60      75      90      105     120
MCS-5    19      38      57      76      95      114     133     152
MCS-6    25.2    50.4    75.6    100.8   126     151.2   176.4   201.6
MCS-7    38      76      114     152     190     228     266     304
MCS-8    46.2    92.4    138.6   184.8   231     277.2   323.4   369.6
MCS-9    50.3    100.6   150.9   201.2   251.5   301.8   352.1   402.4

From the results in Figure 9.20 it can be seen that there is a greater quality difference between one slot and two slots than between any other multislotting combination. In fact, as the number of slots is increased, the relative difference between two adjacent multislotting settings decreases. When transmitting sequences at 5 fps, at least three slots are required to transmit video information at MCS-1, and two slots is the minimum for schemes MCS-2 to MCS-4. The 8-PSK data channels (MCS-5 upwards) can all support video using a single slot only. In addition, it is also evident that a progression in modulation coding scheme from MCS-1 to MCS-2 and from MCS-2 to MCS-3 results in a greater increase than that observed between other adjacent schemes.

Figure 9.20 Received video quality at: (a) 10 fps; (b) 5 fps [plots of PSNR (dB) against modulation coding scheme / throughput (kbps), MCS-1 to MCS-9, with one curve for each of 1 TS to 5 TS]

Similar results were obtained for the MPEG-4 quality traces for video sequences encoded at 10 fps. As expected, the PSNR values for given combinations of timeslots and coding scheme were approximately 2 dB lower than the corresponding values for sequences encoded at 5 fps. In addition, it was seen that sequences can be encoded at the higher rate of 10 fps using a single slot only if channel conditions allow for the use of scheme MCS-7 or higher. The availability of only two slots allows for operation with schemes MCS-5 and MCS-6, whereas three slots are necessary to provide 10 fps video using schemes MCS-2 to MCS-4. If channel conditions are such that it is necessary to use MCS-1, then four-slot operation is required.

9.1.10 Error Performance

In order to determine the effect of propagation conditions upon video sequences encoded with MPEG-4, the three sequences used previously were encoded for operation at three-slot usage for coding schemes MCS-1 to MCS-6.
Full error-resilience tools were enabled with intra-frame spacing set to 10, video packet size set to 600 bits, and reversible codewords and data partitioning enabled. Simulations were carried out at TU1.5 IFH at a carrier frequency of 1800 MHz. The results of these experiments are shown in Figure 9.21. The experiments were repeated 10 times for each sequence, so as to ensure that meaningful averages were obtained. When operating at TU1.5 IFH, it can be seen that MCS-1 gives better performance than MCS-2 at all C/I values up to around 20 dB. At this value, however, MCS-5 begins to provide a superior video quality to either of these schemes. MCS-6 does not match this performance until at least a C/I value of 30 dB. It can also be seen that the MCS-3 model considerably underperforms compared to all other codes, as does MCS-4, whose results are not displayed on the graph.

Figure 9.21 5 fps video quality at TU1.5 IFH 1800 MHz [plot of PSNR (dB) against C/I (dB) for MCS-1, MCS-2, MCS-3, MCS-5, and MCS-6]

9.1.11 Voice Communication over Mobile Channels

The AMR-WB codec includes a set of fixed-rate speech and channel codec modes, a voice activity detector, discontinuous transmission functionality in GSM, UMTS and source-controlled rate functionality in 3G, in-band signaling for codec mode transmission, and link adaptation to control the mode selection. The AMR-WB codec adapts the bit-rate allocation between speech and channel coding, optimizing speech quality to prevailing radio channel conditions. AMR-WB is also very robust against transmission errors due to multi-rate operation and adaptation. The AMR-WB codec has been developed for use in several applications, including GSM full rate channel, GERAN, UTRAN, and voice-over-IP applications. A wide range of applications is envisioned for the AMR-WB codec, which includes ISDN wideband telephony, audiovisual teleconferencing, voice-over-IP, IP video conferencing, voice mail, voice chat, broadcast, and voice streaming. The AMR-WB codec was selected as a harmonized wideband codec for GSM, 3G WCDMA, and ITU-T at a 2001 Rapporteur's meeting, and approved by the ITU in January 2002, when it became known as ITU-T G.722.2. The acceptance of a single harmonized speech codec allows ease of implementation of wideband voice applications and services across a wide range of communication systems and platforms without the use of transcoding between wireless and wired infrastructure.

The AMR-WB codec is based on the algebraic code excited linear prediction (ACELP) technology. ACELP has been successfully used in a wide range of speech compression standards, such as 3GPP AMR, ETSI EFR, NA-TDMA IS-641, NA-CDMA IS-127, ITU-T G.729, and ITU-T G.723.1 codecs. However, these were developed for narrow-band signals. AMR-WB includes much functionality to make the speech signal robust to wideband channel errors. A detailed overview of the AMR-WB codec can be found in [12]. The AMR-WB codec operates at a 16 kHz sampling rate. Coding is performed in blocks of 20 ms. The AMR-WB speech codec consists of nine speech coding modes with bit rates of 6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, and 23.85 kbps. In addition, AMR-WB includes a background noise mode, which is designed to be used in discontinuous transmission (DTX) operation. This acts as a low-bit-rate source-dependent mode for coding background noise in other systems. The bit rate of this mode is 1.75 kbps. The 12.65 kbps mode and the modes above it offer high-quality wideband speech. The two lowest modes, 6.6 and 8.85 kbps, are intended to be used only temporarily during severe radio channel conditions. The bit allocation of the codec at different modes of operation is shown in Table 9.14.

Table 9.14 AMR-WB operational modes

Mode   Bit rate (kbps)   Total number of bits per frame
0      6.6               132
1      8.85              177
2      12.65             253
3      14.25             285
4      15.85             317
5      18.25             365
6      19.85             397
7      23.05             461
8      23.85             477

9.1.12 Support of Voice over UMTS Networks

The AMR-WB codec was tested in several phases during standardization in 3GPP, and performance figures are given in [13]. The codec was tested for a number of different languages, error conditions in mobile communication channels, input levels, and background noise levels. The test results were compared to the performance of other speech codecs. The AMR-WB codec provides speech quality that substantially exceeds that provided by existing speech codecs. In typical operating conditions, AMR-WB gives superior quality to all other GSM and AMR-NB codecs. Even in poor radio channel conditions, AMR-WB still offers comparable quality to AMR-NB and far exceeds the quality of other GSM codecs.

In this section, voice communication over UTRAN is investigated. In particular, the performance of different AMR-WB modes over a range of channel conditions and bearer configurations is examined. The calculated information data rates over UMTS bearers are shown in Table 9.15. This shows that only AMR-WB modes 0 to 2 can be supported using channel-protected SF 128 channels. All operation modes can be supported with uncoded (unprotected) SF 128 channels. However, the channel bit error rates on these channels are significantly higher, hence they are not recommended for voice applications. For higher modes, a channel-protected SF 64 channel is required.
Table 9.15 UMTS traffic capacity (kbps): CC, convolutional code; TC, turbo code; RM, rate matching ratio

Spreading factor   CC 1/2    CC 1/3   TC 1/3    No coding
128                16.2      10.6     10.85     33.1
64                 39.5      26.10    26.7      80.6
32                 97        64.5     65.5      197.1
16                 206.1     137.4    139.55    419.1
8                  442.8     295.1    299.3     899.1
4                  915.75    610      619.2     1859.1
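The per-frame bit counts listed in Table 9.14 follow from the mode bit rate and the 20 ms coding block stated in Section 9.1.11. The short Python sketch below reproduces that column as a consistency check; the function and constant names are illustrative only and are not taken from the codec specification.

```python
# Frame (coding block) length in milliseconds, as stated in the text.
FRAME_MS = 20

# Mode index -> bit rate in kbps, copied from Table 9.14.
AMR_WB_RATES_KBPS = [6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Bits produced in one frame: kbit/s multiplied by ms gives bits."""
    # round() absorbs floating-point noise such as 6.6 * 20 = 132.00000000000003
    return round(rate_kbps * frame_ms)


for mode, rate in enumerate(AMR_WB_RATES_KBPS):
    print(f"mode {mode}: {rate} kbps -> {bits_per_frame(rate)} bits per frame")
```

For example, mode 0 gives 6.6 kbps × 20 ms = 132 bits, matching the first row of Table 9.14.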

Table 9.16 Performance of AMR-WB in error-free conditions

Mode             0        1        2        3        4        5        6        7        8
Rate (kbps)      6.6      8.85     12.65    14.25    15.85    18.25    19.85    23.05    23.85
PESQ MOS value   2.8695   3.1885   3.4783   3.5751   3.6213   3.7112   3.7282   3.7852   3.7759

9.1.13 Error-free Performance

The optimal performance of the AMR-WB speech codec under different modes of operation over the error-free communication channel is experimentally evaluated for speech encoder test sequences T00.INP to T21.INP, provided in 3GPP [14]. These sequences cover a wide range of speech samples encountered in a normal conversation. T04 and T06 to T11 are female speech samples collected in ambient noise and car noise environments, while T07 and T12 to T17 are male speech in ambient and babble noise environments. The T04 and T05 sequences contain a lot of low-frequency components, and T18 and T19 contain some "all zeros" frames in between speech segments.

The performance is evaluated based on the perceptual evaluation of speech quality (PESQ) [15] and the results are presented in Table 9.16 and Figure 9.22. PESQ provides an objective method for predicting the subjective quality of speech samples. Even though subjective quality measuring methods, such as MOS and DMOS tests, are better suited for evaluating the quality of speech, such tests require large numbers of users to evaluate large numbers of decoded speech sequences in order to achieve meaningful average performance figures. The work carried out in this section only considers the comparative or relative performance improvement of speech generated from different encoding modes, thus an objective quality measure is suitable for performance analysis. PESQ MOS is calculated by comparing the decoded speech sequences to the original speech sequences. The test sequences T00 to T21 were used in this experiment.
The PESQ MOS values shown in Table 9.16 and Figure 9.22 were derived by averaging the calculated PESQ MOS values for all sequences.

Figure 9.22 Performance of AMR-WB in error-free environments [plot of PESQ value against mode]

Figure 9.22 illustrates the effects of different encoding modes on the voice performance over an error-free environment. Mode 0 shows the lowest quality, and quality rises with the mode, levelling off at about 3.78 for modes 7 and 8 (in Table 9.16, mode 7 scores marginally higher than mode 8). The improvement is due to the lower quantization distortion seen with the higher source rates of the higher modes.

9.1.14 Error-prone Performance

This section describes the performance of voice over simulated error-prone channel conditions using the developed UTRAN emulator. The channel environment investigated is vehicular A (with 50 km/h mobile speed) multipath propagation conditions, as defined in [8]. Experiments were conducted to investigate the influences of network parameter settings and encoder mode on voice performance. In particular, the effect of spreading factor allocation was examined.

The maximum PESQ MOS value that can be achieved for a given encoder mode rate is shown in Table 9.16. These values can be considered as reference performance figures for the experimental results described in this section. The received voice quality for different encoder modes is measured in terms of average PESQ MOS, and the results are depicted in Figure 9.23. A rate-1/3 convolutional code is used to protect the voice data, and the protected data is transmitted over the simulated vehicular A channel environment. As expected, higher modes provide better performance than lower modes at good channel conditions, due to the lower quantization distortion seen at higher rates. At poor channel conditions, allocation of lower modes provides slightly better voice quality. Figure 9.23(a) shows the voice performance over SF 128 channels, while Figure 9.23(b) shows that over SF 32 channels.

9.1.15 Support of Voice over GPRS Networks

In order to meet the UMTS service requirements outlined above, the GSM/EDGE radio access network must be able to support real-time voice services in packet-switched environments.
Figure 9.23 Performance of AMR-WB over error-prone channels [panels (a) and (b): plots of PESQ MOS against Eb/No (dB), from 4 to 12 dB, with one curve per encoder mode]

Operating in packet-switched mode at the radio interface should allow for the exploitation of the bursty characteristics of human speech, thereby increasing radio capacity by implementing statistical multiplexing techniques. None of the GPRS service classes provide QoS levels sufficient for the provision of low-latency packet speech services. For this reason, much research has been carried out into the evolution of GPRS and EGPRS in order to provide speech bearers which offer at least the same quality and efficiency as can be achieved with GSM.

The schemes proposed have been shown to provide similar subjective quality in interference-limited channel conditions to that obtained using circuit-switched operation, while providing for a significant improvement in the traffic capacity over a single carrier by employing statistical multiplexing. This was demonstrated by carrying out mean opinion score (MOS) testing on 16 subjects, as shown in Figure 9.24.

Figure 9.24 (a) Statistical multiplexing gain obtained using voice over GPRS; (b) subjective speech quality results of voice over GPRS [(a) statistical multiplexing gain against number of traffic channels; (b) MOS against C/I (dB), from error-free down to 4 dB, for GSM and GPRS-B]

In fact, when 7 traffic channels were allocated to voice services, it was observed that 10 users can be accommodated while sustaining a packet loss rate due to the contention mechanism of less than 1%. This represents a statistical multiplexing gain of 10/7, or about 1.43. Figure 9.24(a) shows the statistical multiplexing gain, i.e. the number of voice users divided by the number of required channels, as a function of the number of traffic channels allocated to voice traffic. The proposed access mechanism was also shown to have a negligible effect upon the delay performance of the data traffic.
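The statistical multiplexing gain quoted above is simply the ratio of supported voice users to allocated traffic channels. A minimal Python sketch of that arithmetic follows; the function name is illustrative, and the only data used are the 10 users and 7 channels reported in the text.

```python
def statistical_multiplexing_gain(voice_users, traffic_channels):
    """Number of voice users divided by the number of traffic channels."""
    if traffic_channels <= 0:
        raise ValueError("traffic_channels must be positive")
    return voice_users / traffic_channels


# Operating point reported in the text: 10 users on 7 traffic channels.
gain = statistical_multiplexing_gain(10, 7)
print(f"gain = {gain:.4f}")
```

A gain above 1 means that more simultaneous voice users are carried than there are channels, which is possible only because speech is bursty and not every user is talking at the same instant.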
Further research has since been carried out on providing voice services over packet-switched networks, with emphasis being placed on EGPRS. The results obtained indicated that when using 3/9 and 4/12 reuse patterns, packet-switched voice provides greater capacity than circuit-switched services under most propagation conditions. The quality obtained may be further improved by using adaptive codecs such as ETSI's adaptive multirate (AMR).

9.1.16 Conclusion

This section has given an overview of the technologies and concepts involved in providing voice and video communications over mobile access links. In particular, the role of speech and video compression technologies was discussed, with emphasis on the effect of high error rates on such schemes. UMTS and EGPRS were chosen as the access networks for use in the experiments. These offer end-to-end packet transfer capabilities, which, if tailored for the demands of real-time services, have been shown to provide significant performance and flexibility advantages over equivalent circuit-switched access networks in the provision of voice services.

The results presented in this section highlight a number of key factors that should be considered in the provision of real-time video applications over UMTS networks. Compared to delay-insensitive media applications, the strict delay requirements seen in real-time video applications restrict the use of some error-resilience and congestion-control mechanisms. Powerful automatic repeat request (ARQ) techniques, which are widely used in streaming video applications to enhance received video quality, may not be suitable for real-time conversational video applications. Also, the encoding/decoding and error resilience should not be too complex, in order to avoid extra processing delay. The results shown in Section 9.1.1 illustrate that the delay requirements stated in [1] for real-time video communication can be satisfied with careful selection of network parameter settings for the rate control of MPEG-4-encoded video transmission.

The networks should satisfy not only the delay requirement but also the error-rate requirement, in order to achieve acceptable-quality video transmission over error-prone channels. Bit error rates of around $10^{-3}$ are the maximum tolerable by a DCT-based source codec such as MPEG-4. As demonstrated in Section 9.1.3.2, an Eb/No value of 5 to 6 dB is sufficient to provide acceptable video quality over the UMTS multipath channels.

There are two main reasons that video quality is degraded when transmitted over error-prone environments. The first is the unrecoverable quantization distortion resulting from the operation of compression algorithms at the encoder, and the second is the channel distortion due to the information loss from transmission over error-prone channels.
A compromise between source throughput capacity and error-correcting capability is necessary to obtain the optimal video quality for a given channel condition. In addition, the transmission power requirement plays an important role in W-CDMA-based UMTS networks. This is because UMTS is an interference-limited system. In other words, increases in transmit power for one user increase the interference power for other users, which tends to reduce the overall system capacity. Therefore, careful selection of channel coding schemes, spreading factor, source throughput, and transmit power is essential in achieving optimal video quality and maximum system capacity for video applications over UMTS.

Even though high operational flexibility can be achieved with packet-switched connections, especially in the core network, careful design criteria should be followed to avoid quality degradation seen over the radio interface. Experimental results illustrate that there is a 1 to 3 dB performance loss (in average frame PSNR value) for packet-switched operation, compared to circuit-switched operation. The performance loss seen with good channels is mainly due to the source throughput loss resulting from the network layer overhead. The main reason for the additional performance loss seen in the presence of channel errors is the information loss resulting from the corrupted packet headers. As will be explained in Section 9.2, a compromise between throughput loss and information loss should be considered in selecting the optimal packet size for video transmission over error-prone channels.

Radio link quality can be improved with the use of performance enhancement techniques such as fast power control for downlink transmission. Experimental results demonstrate a tremendous improvement in video quality with fast power control for slow mobile speed channels.
Other transmission diversity techniques such as space time transmit diversity and closed-loop transmit diversity may further enhance video performance over UMTS networks.

9.2 Link-level Quality Adaptation Techniques

(Portion reprinted, with permission, from C. Kodikara, S.T. Worrall, A.M. Kondoz, "Energy efficient video telephony over UMTS", IEEE VTC-Spring 2004, 17-19 May 2004, Milan, Italy. © 2004 IEEE.)

Efficient transmission power utilization is an important design criterion in interference-limited cellular systems, such as UMTS networks. System capacity is limited by the total interference experienced within the cell coverage. Thus, the optimization of power consumption for an individual user can provide an increase in system capacity, as well as in the QoS experienced by the user.

Recently, several energy minimization techniques for wireless video applications have been proposed [17,18]. All of these techniques are optimized to achieve a target video quality while minimizing the transmission power. In [18], joint error resilience and transmission power management at video frame level is proposed. However, as the video frame quality is variable in nature (even in an error-free environment), controlling the transmission power to achieve a target frame quality at the video frame level is inaccurate and would result in poor system performance. This problem can be solved by minimizing the total consumed power within a certain period, while achieving the optimal average video quality. The method proposed in [18] employs this concept for intra-refreshed video sequences, thus video performance is optimized for a fixed intra-refresh period. However, this method cannot be applied in conjunction with rate-controlled AIR techniques [19], which are commonly used to produce a smoother output bit rate for transmission over a fixed-bandwidth channel.

Another issue that should be considered in the design of transmit-power optimization schemes is the support of network compatibility and interoperability between different networks and platforms.
Transmit power is normally allocated at the physical layer for a given TTI [20]. If the power-allocation algorithms are closely coupled with the video-compression formatting, it is impossible to implement such algorithms at the physical layer without modifying the entire protocol stack of the existing network. The algorithm proposed in this section takes these issues into account in its design and implementation. In contrast to the method in [18], the proposed scheme can be applied equally to rate-controlled AIR video sequences and intra-refreshed sequences.

The proposed method combines an unequal error-protection (UEP) technique and an unequal power-allocation (UPA) technique to obtain energy-efficient video transmission. UEP is performed with the use of two different radio bearers. At the start of every video frame, the transmission energy for different bearers is selected in such a way as to achieve the maximum expected video frame quality for an increment in the transmission power step.

9.2.1 Performance Modeling

The bitstream syntax of MPEG-4 uses a hierarchical structure. Each video frame is partitioned into smaller rectangular regions called macroblocks (MBs), each 16 × 16 pixels in size. Each MB is coded either in inter-mode or intra-mode. Intra-mode MBs are transform coded directly without applying motion compensation, while inter-mode MBs use motion compensation. Consecutive MBs are grouped to form video packets (VPs). The MPEG-4-adopted VP format is shown in Figure 9.25. Synchronization markers are used to isolate VPs from one another. Following the concept of data partitioning, data within a VP is further divided into two main parts. The motion-related information for all the MBs contained in a given VP is placed in the motion part, and the relevant DCT data is placed in the texture part [21].

Figure 9.25 MPEG-4 VP format [fields in order: VP header, motion, texture data; labelled as first partition and second partition]

Combining VPs in a video frame forms a video object plane (VOP). The most important information the decoder needs to know prior to the decoding of compressed video data is placed in the VOP header part. This includes the spatial dimensions of the video frame, the time stamps associated with the current frame, presentation information, and the mode in which the current frame is coded.

The video information is prioritized based on the MPEG-4 data partition techniques [22]. Information within the first partition is sent over a higher-priority channel. A lower-priority channel is used to transmit the data within the second partition. VP header information is added to the beginning of the second partition in order to guarantee accurate stream synchronization at the decoder. The VOP header is also repeated in the lower-priority channel at the start of the video frame. The video data formats in prioritized streams are shown in Figure 9.26.

Video performance is modeled as a combination of quantization distortion, $E(D_{Q,pv})$, and channel distortion. Channel distortion is divided into three parts, namely spatial concealment distortion, temporal concealment distortion, and distortion due to error propagation. The distortion method adopted in this paper is similar to the method proposed in [18]. However, the model in [18] is enhanced for the use of AIR and features improved error-propagation estimation.

If the video frame configuration information or VP headers are lost in the transmission, the decoding process is impossible and the data belonging to the VP has to be discarded at the decoder.
However, the error-concealment tools implemented at the decoder replace the discarded data with error-concealed data from the neighboring packets. The distortion resulting from this process is called the spatial-concealment distortion, $E(D_{s\_con,pv})$, as only spatial error concealment is involved in the process. On the other hand, if the configuration information, VP header information, and motion information are received correctly but the DCT information is corrupted, the decoder only discards the corrupted DCT data and replaces it with the corresponding concealed data from the previous frame. The distortion resulting from motion-compensated error concealment is called temporal-concealment distortion, $E(D_{t\_con,pv})$.

Figure 9.26 Video data format in prioritized streams [high-priority stream, of size Z1: VP header, motion; low-priority stream, of size Z2: VP header, texture data]
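The prioritization described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VideoPacket` class and `prioritize_frame` function are hypothetical names, byte strings stand in for the real MPEG-4 bitstream fields, and no actual MPEG-4 syntax is parsed. The sketch follows the rules given in the text: motion data goes to the high-priority stream, texture data goes to the low-priority stream with the VP header repeated in front of it, and the VOP header is sent at the start of the frame on both streams.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VideoPacket:
    """Hypothetical container for one data-partitioned video packet (VP)."""
    header: bytes   # VP header
    motion: bytes   # first partition: motion-related data
    texture: bytes  # second partition: DCT (texture) data


def prioritize_frame(vop_header: bytes,
                     packets: List[VideoPacket]) -> Tuple[bytes, bytes]:
    """Split one video frame into (high_priority, low_priority) streams."""
    high = bytearray(vop_header)   # VOP header on the high-priority channel
    low = bytearray(vop_header)    # VOP header repeated on the low-priority channel
    for vp in packets:
        high += vp.header + vp.motion    # VP header + first partition
        low += vp.header + vp.texture    # VP header repeated + second partition
    return bytes(high), bytes(low)


# Toy frame with two video packets; the byte strings are placeholders.
frame = [VideoPacket(b"H1", b"m1", b"t1"), VideoPacket(b"H2", b"m2", b"t2")]
high_stream, low_stream = prioritize_frame(b"VOP", frame)
print(high_stream)
print(low_stream)
```

Repeating the headers costs some throughput on the low-priority bearer, but it lets the decoder resynchronize the texture stream independently of the motion stream.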

Errors can propagate in two ways: either in the temporal domain or in the spatial domain. Frame-to-frame error propagation through motion prediction and temporal concealment is called temporal-domain error propagation, $\phi_{tp}$. The propagation of errors from neighboring VPs via spatial concealment is considered spatial-domain error propagation, $\phi_{sp}$.

Taking the VP as the base unit, the expected frame quality can be written as:

$$E(Q_f^{j}) = 10 \cdot \log\left( \gamma \Big/ \sum_{i=0}^{I_j} E(D_{pv}^{i,j}) \right) \qquad (9.4)$$

where $E(Q_f^{j})$ is the expected quality, $E(D_{pv}^{i,j})$ is the expected distortion of the VP, and $I_j$ is the total number of VPs in the jth video frame. The indices i and j denote the ith VP of the jth video frame. $\gamma$ is a constant defined by the dimension of the frame. $E(D_{pv}^{i,j})$ can be written as:

$$E(D_{pv}^{i,j}) = E(D_{Q,pv}^{i,j}) + \rho_{u,pv}^{i,j} \, E(D_{s\_con,pv}^{i,j}) + \rho_{d,pv}^{i,j} \, E(D_{t\_con,pv}^{i,j}) + \phi_{tp}^{i,j} + \phi_{sp}^{i,j} \qquad (9.5)$$

where $\rho_{u,pv}^{i,j}$ denotes the probability of receiving an undecodable VP. This includes the corruption of VOP headers, VP headers, or motion data. $\rho_{u,pv}^{i,j}$ equals the probability of finding an error in the first partition. The probability of receiving a decodable VP, but with errors, where the DCT data is corrupted but other information is received correctly, is denoted by $\rho_{d,pv}^{i,j}$. This equals the probability of finding an error in the second partition but not in the first partition. Both probability terms are functions of channel bit error rate, transmission bit energy, and total background noise power.

9.2.2 Probability Calculation

Let the probability of receiving a VOP header with errors in the high-priority channel be $\rho_{vop1}^{i,j}$, the probability of receiving data within the first partition with errors be $\rho_{M}^{i,j}$, and the probability of finding an error in the second partition be $\chi^{i,j}$.
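Then:

$$\rho_{u,pv}^{i,j} = 1 - (1 - \rho_{vop1}^{i,j}) \cdot (1 - \rho_{M}^{i,j}) \tag{9.6}$$

$$\rho_{d,pv}^{i,j} = (1 - \rho_{vop1}^{i,j}) \cdot (1 - \rho_{M}^{i,j}) \cdot \chi^{i,j} \tag{9.7}$$

If the probabilities of channel bit errors in channel 1 and channel 2 are denoted by $\rho_{b1}$ and $\rho_{b2}$ respectively, then it can be shown that:

$$\rho_{vop1}^{i,j} = \sum_{v=1}^{V} (1 - \rho_{b1})^{v-1} \rho_{b1} = 1 - (1 - \rho_{b1})^{V} \tag{9.8}$$

where V represents the VOP header size. Similarly:

$$\rho_{M}^{i,j} = 1 - (1 - \rho_{b1})^{Z_1} \tag{9.9}$$

$$\chi^{i,j} = 1 - (1 - \rho_{b2})^{V} + (1 - \rho_{b2})^{V} \cdot \sum_{z=1}^{Z_2} \binom{Z_2}{z} (1 - \rho_{b2})^{Z_2 - z} \rho_{b2}^{z} = 1 - (1 - \rho_{b2})^{V} \cdot (1 - \rho_{b2})^{Z_2} \tag{9.10}$$

$Z_1$ and $Z_2$ are as defined in Figure 9.26.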
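The probability terms in Equations (9.6) to (9.10) can be evaluated numerically. The following Python sketch is illustrative only: the function names are hypothetical, bit errors are assumed independent (as the closed forms imply), and all sizes are assumed to be given in bits.

```python
from math import comb


def p_any_error(bit_error_prob, n_bits):
    """Probability that at least one of n_bits independent bits is in error."""
    return 1.0 - (1.0 - bit_error_prob) ** n_bits


def vp_probabilities(rho_b1, rho_b2, v_bits, z1_bits, z2_bits):
    """Return (rho_u, rho_d) for one VP, following Equations (9.6)-(9.10)."""
    rho_vop1 = p_any_error(rho_b1, v_bits)        # (9.8)
    rho_m = p_any_error(rho_b1, z1_bits)          # (9.9)
    chi = p_any_error(rho_b2, v_bits + z2_bits)   # (9.10), closed form
    first_partition_ok = (1.0 - rho_vop1) * (1.0 - rho_m)
    rho_u = 1.0 - first_partition_ok              # (9.6)
    rho_d = first_partition_ok * chi              # (9.7)
    return rho_u, rho_d


def chi_long_form(rho_b2, v_bits, z2_bits):
    """Left-hand form of Equation (9.10), kept only to check the closed form."""
    q = 1.0 - rho_b2
    tail = sum(comb(z2_bits, z) * q ** (z2_bits - z) * rho_b2 ** z
               for z in range(1, z2_bits + 1))
    return 1.0 - q ** v_bits + q ** v_bits * tail
```

Calling `vp_probabilities` with the bit error rates of the two channels gives the weights applied to the concealment distortions in Equation (9.5); `chi_long_form` is only practical for modest partition sizes because the binomial coefficients grow quickly.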

9.2.3 Distortion Modeling

The expected distortions of the MBs, $E(D_{Q,pv})$, $E(D_{s\_con,pv})$ and $E(D_{t\_con,pv})$, are calculated as specified in [18]. The quantization distortion is computed by comparing the reconstructed MBs and the original MBs at the encoder. Concealment distortions are computed in a similar manner. The transmitted video data belonging to each MB is corrupted using a noise generator located at the encoder. Corrupted data is replaced by the concealed data, and the data belonging to the original and concealed MBs is compared. For the spatial-concealment distortion calculation, only a spatial-concealment algorithm is used to generate the concealed data. For the temporal-concealment distortion calculation, a temporal-concealment algorithm alone is used to conceal the corrupted data. The correct reception of neighboring VPs and reference video frames is assumed in the calculations.

9.2.4 Propagation Loss Modeling

Correlation between the corrupted VPs in the same frame and the distortion due to the MB mismatch in adjacent video frames is quantified by the spatial and temporal error propagation terms in (9.5). The temporal propagation loss, $f_{tp}^{i,j}$, represents the propagation of corrupted information through predictive coding. An adaptive intra-refresh (AIR) algorithm, which uses a selected number of intra-coded MBs in a video frame, is used to prevent the error propagation. The temporal error propagation is calculated at MB level. Let $f_{tp,mb}^{k,j}$ be the temporal error propagation of the kth MB in the jth frame. $f_{tp,mb}^{k,j}$ depends on the coding mode used in the encoding of the MB.
$f_{tp,mb}^{k,j}$ is calculated for inter-coded MBs as:

$$f_{tp,mb}^{k,j} = P_{TP}^{j}\left[\rho_{u,pv}^{i,j} \cdot E(D_{s\_con,mb}^{k,i,j}) + \rho_{d,pv}^{i,j} \cdot E(D_{t\_con,mb}^{k,i,j}) + f_{sp,mb}^{k,j} + (1 - \rho_{u,pv}^{i,j}) \cdot f_{tp,mb}^{k,j-1}\right] \tag{9.11}$$

and for intra-coded MBs as:

$$f_{tp,mb}^{k,j} = P_{TP}^{j}\left[\rho_{u,pv}^{i,j} \cdot E(D_{s\_con,mb}^{k,i,j}) + f_{sp,mb}^{k,j}\right] \tag{9.12}$$

where $E(D_{s\_con,mb}^{k,i,j})$ and $E(D_{t\_con,mb}^{k,i,j})$ represent the spatial-concealment distortion and the temporal-concealment distortion of the MB, respectively. As before, i and j represent the ith VP in the jth video frame. $P_{TP}^{j}$ quantifies the fraction of distortion of the MB which should be considered in the propagation loss calculation. $f_{sp,mb}^{k,j}$ is the spatial error propagation of the kth MB in the jth video frame. $P_{TP}^{j}$ is a function of the effective channel bit error rate, $\rho_{beff}$, and experimentally it is found to be approximated by:

$$P_{TP}^{j} = 1 - (1 - \rho_{beff})^{F_j} \tag{9.13}$$

$$\rho_{beff} = (X \cdot \rho_{b1} + Y \cdot \rho_{b2}) / (X + Y) \tag{9.14}$$

where $F_j$ is the size of the jth frame, and X and Y are the total amounts of data belonging to the first and the second partition, respectively. $f_{tp}^{i,j}$ is computed as:

$$f_{tp}^{i,j} = (1 - \rho_{u,pv}^{i,j}) \cdot \sum_{k \in W} f_{tp,mb}^{k,j-1} \tag{9.15}$$

where W denotes the set of MBs in the ith VP in the jth video frame. The spatial error propagation terms, $f_{sp}^{i,j}$ and $f_{sp,mb}^{k,j}$, depend on the number of corrupted VPs in a frame and only propagate through the spatial-concealment process. They are modeled as:

$$f_{sp,mb}^{k,j} = P_{SP}^{j} \cdot \rho_{u,pv}^{i,j}\, E(D_{s\_con,mb}^{k,i,j}) \tag{9.16}$$

$$f_{sp}^{i,j} = P_{SP}^{j} \cdot \rho_{u,pv}^{i,j}\, E(D_{s\_con,pv}^{i,j}) \tag{9.17}$$

$$P_{SP}^{j} = N_{vp}^{j} \cdot \left(1 - (1 - \rho_{b})^{VP_j}\right) \tag{9.18}$$

where $N_{vp}^{j}$ and $VP_j$ represent the number of VPs and the average size of a VP in the jth frame, respectively.

9.2.5 Energy-optimized UEP Scheme

Let the user-requested video quality in terms of average frame PSNR be $Q_{target}$. The total channel interference and noise experienced is denoted by the noise power spectral density, $N_o$. The minimum required transmission energy for an information bit to satisfy the user quality requirement under a given channel condition is $E_{b\,min}$. The expected video frame quality, $E(Q_{f\,min})$, is computed using (9.4). It is assumed that the data on both the higher-priority and lower-priority channels is transmitted with equal bit energies, $E_{b\,min}$. Point A in Figure 9.27 represents the estimated quality, $E(Q_{f\,min})$. Point E shows the expected quality if the transmission energy on both channels is incremented by 1 dB. The goal is to find the combination of transmission energies in the two channels which maximizes the current video frame quality for an increment in the transmission energy. The optimum energy allocation is to be found in the vicinity of point B. The possible combinations of transmission energy allocation in different priority channels are listed in Table 9.17.

Figure 9.27 Calculation of optimal transmission energy levels (expected frame quality E(Qf) against Eb at Eb min, Eb min + 1 and Eb min + 2; points A to K as listed in Table 9.17)

Table 9.17 Possible combinations of transmission energy allocation

Point   Energy level on high-priority channel   Energy level on low-priority channel
A       Eb min                                  Eb min
B       Eb min                                  Eb min + 1
C       Eb min                                  Eb min + 2
D       Eb min + 1                              Eb min
E       Eb min + 1                              Eb min + 1
F       Eb min + 1                              Eb min + 2
G       Eb min + 2                              Eb min
H       Eb min + 2                              Eb min + 1
K       Eb min + 2                              Eb min + 2

Even though it is valid to consider other transmission energy levels, the settings listed in Table 9.17 are more likely to provide the optimum energy allocation. Moreover, this simplifies the searching procedure. The expected video frame qualities are computed for the energy settings listed in Table 9.17, and the corresponding points are shown in Figure 9.27. The transmission energy levels corresponding to the point that shows the highest gradient from point A provide the optimal energy levels for the transmission of the current video frame. This algorithm operates at the video-frame level to find the optimal operating point. It guarantees the end-user quality requirement and the optimal energy setting throughout the transmission.

9.2.6 Simulation Setup

Realization of the proposed energy-optimized UEP algorithm over a UMTS network is shown in Figure 9.28. At the encoder, each encoded video frame is separated into two streams based on MPEG-4 data partitioning. The separated streams are mapped on to two transport channels. The high-priority data is sent over the highly-protected channel, which is protected with a 1/3 rate convolutional code. A 1/2 rate convolutional code is used to protect the lower-priority channel. At the physical layer, the information on the transport channels is allocated the selected transmission bit energy and is multiplexed on to the same physical channel for transmission over the air interface. A UMTS physical link-layer simulator is developed.
The simulator includes all the radio configurations, channel structures, channel coding/decoding, spreading/despreading, modulation parameters, and transmission modeling, and their corresponding data rates according to the UMTS specifications. The transmitted signal is subjected to a multipath fast-fading environment. The multipath-induced inter-symbol interference is implicit in the developed chip-level simulator. A detailed discussion of the channel simulations can be found in Chapter 8. Using the developed simulator, error characteristics of the transmission channel are simulated for a range of channel conditions and for different physical layer configurations.
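The search over the energy settings of Table 9.17 (Section 9.2.5) can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: `expected_quality` is a caller-supplied stand-in for the model of Equation (9.4), `toy_quality` is a purely hypothetical curve, and the energy increment of each point is assumed here to be the mean of the two per-channel offsets in dB, which the text does not specify.

```python
from itertools import product


def best_energy_allocation(eb_min_db, expected_quality, offsets_db=(0, 1, 2)):
    """Pick the Table 9.17 point with the steepest quality gain over point A.

    expected_quality(e_high_db, e_low_db) must return the expected frame
    quality in dB for the given per-channel bit energies.
    Returns (e_high_db, e_low_db, gradient).
    """
    q_a = expected_quality(eb_min_db, eb_min_db)  # point A
    best = None
    for d_high, d_low in product(offsets_db, repeat=2):
        if d_high == 0 and d_low == 0:
            continue  # point A itself is the reference
        # Assumption: energy increment = mean of the two offsets in dB.
        extra_energy_db = (d_high + d_low) / 2.0
        gain = expected_quality(eb_min_db + d_high, eb_min_db + d_low) - q_a
        gradient = gain / extra_energy_db
        if best is None or gradient > best[2]:
            best = (eb_min_db + d_high, eb_min_db + d_low, gradient)
    return best


def toy_quality(e_high_db, e_low_db, eb_min_db=6.0):
    """Hypothetical quality curve, for illustration only."""
    high_gain = 3.0 * min(e_high_db - eb_min_db, 1.0)  # saturates after 1 dB
    low_gain = 1.0 * (e_low_db - eb_min_db)
    return 20.0 + high_gain + low_gain
```

With the toy curve, `best_energy_allocation(6.0, toy_quality)` selects point D (1 dB extra on the high-priority channel only), because that point gives the largest quality gain per unit of added energy under the stated assumption.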

Figure 9.28 Realization of proposed UEP/UPA over UMTS (blocks: MPEG-4 encoder, stream formation into Stream 1 and Stream 2, transport channel mapping to Tch 1 and Tch 2 in the UMTS protocol stack, transmit power decision unit, transport channel multiplexing, physical channel, transmission over wireless channels)

The vehicular A propagation condition and downlink transmission are considered in the experiment discussed in this section. The mobile speed is set to 50 km/h. A spreading factor of 32 is used in the physical channel configuration. The experimentally-evaluated channel block error rates (BLERs) over the vehicular A environment are listed in Table 9.18.

Video sequences are encoded according to the MPEG-4 simple profile format. This includes error-resilience tools such as video packetization, data partitioning, and reversible variable-length codes. The first video frame is intra-coded, while the others use inter-coding. The TM5 rate control algorithm is used to achieve a smoother output bit rate, while an adaptive intra-refresh algorithm [19] is used to stop temporal error propagation. The ITU test sequence Suzie is used as the source signal in the experiments. The QCIF (176 × 144 pixels) sequence is coded at 10 fps. Use of SF 32 in the channel configuration permits 64 and 97 kbps information rates with 1/3 rate and 1/2 rate convolutional coding, respectively. For the proposed algorithm, video coded at 88 kbps provides the appropriate source-channel coding ratio for the Suzie sequence.

Table 9.18 Channel BLER for vehicular A environment: CC, convolutional code

Eb/No    1/2 CC    1/3 CC
3 dB     0.92      0.78
4 dB     0.78      0.53
6 dB     0.31      0.13
8 dB     0.047     0.013
10 dB    0.0020    0.0010
12 dB    0.0010    0.000

9.2.7 Performance Analysis

The accuracy of the developed distortion model is evaluated by comparing the estimated performance and the actual video performance over a simulated UMTS environment. Video performance is measured in terms of frame peak signal-to-noise ratio (PSNR), which is the standard objective quality measurement. Each experiment is repeated 20 times in order to average the effect of bursty channel errors on the performance. The average frame PSNR is obtained by averaging over 6000 frames, and the results are shown in Figure 9.29. The experimentally-evaluated channel BLER (see Table 9.18) is used in the theoretical performance calculation. As can be seen in Figure 9.29, the theoretical PSNR values closely match the actual PSNR values for a wide range of channel conditions.

Experiments were conducted for a range of channel conditions for both a data partition-based UEP (DP-based UEP) scheme and the proposed power-optimized UEP scheme. Results are shown in terms of average Eb/No vs. average frame PSNR in Figure 9.30. The performances of 1/2 rate and 1/3 rate convolutional codes without application of UEP are also shown. The figure clearly demonstrates that more efficient energy utilization is achieved with the proposed method than with the traditional DP-based UEP scheme or the non-UEP schemes. Under good channel conditions, video performance is limited by the quantization distortion. In such situations, increasing the transmit power will not further increase the performance. This effect is well captured by the proposed algorithm, and the highest allocated transmit Eb/No is limited to 11.2 dB. For the allocation of higher transmission energies, the proposed algorithm shows performance close to that of a DP-based UEP scheme. However, the proposed algorithm considerably outperforms DP-based UEP schemes at lower transmit energy.
For example, for transmission of the Suzie sequence, the achieved average frame PSNR with a DP-based UEP scheme is 18 dB for 6 dB energy allocation. The proposed power-optimized UEP scheme results in an average frame PSNR of 24 dB for the same transmit bit energy allocation.

Figure 9.29 Validation of model performance for DP based UEP scheme (average frame PSNR in dB against Eb/No in dB; curves: actual, estimated)
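The frame PSNR used as the quality measure in Section 9.2.7 can be computed as in the following minimal Python sketch. It assumes 8-bit samples (peak value 255) passed as flat sequences of equal length; the function names are illustrative and do not come from the text.

```python
import math


def frame_psnr(original, decoded, peak=255):
    """PSNR in dB between two frames given as flat sequences of samples."""
    if len(original) != len(decoded) or not original:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def average_frame_psnr(frame_pairs, peak=255):
    """Mean of the per-frame PSNR values over (original, decoded) pairs."""
    values = [frame_psnr(o, d, peak) for o, d in frame_pairs]
    return sum(values) / len(values)
```

Averaging per-frame PSNR values, rather than computing one PSNR from the pooled error, matches the "average frame PSNR" wording used above; identical frames give an infinite PSNR and would need separate handling in a long average.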

Figure 9.30 Performance of the proposed algorithm for transmission of Suzie (transmit Eb/No in dB against average frame PSNR in dB; curves: 1/2 rate CC, 1/3 rate CC, DP based UEP, power optimized UEP)

9.2.8 Conclusion

An energy-efficient, network-compatible performance-enhancement method is proposed for video communication over direct-sequence CDMA cellular networks. Prioritized video information is transmitted over two different transport channels with different error-protection capabilities. Transmit energy for each bearer is selected to maximize the expected frame quality for an increment in transmit energy. The experiment carried out over the simulated UMTS system shows significant performance improvement with the proposed algorithm over the traditional DP-based UEP schemes and non-UEP schemes.

9.3 Link Adaptation for Video Services

(Portion reprinted, with permission, from H. Kodikara, S.T. Worrall, A.M. Kondoz, A.H. Sadka, "Combined adaptive spreading gain control and unequal error protection for real-time video communications over WCDMA System", 5th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS'2004), Lisbon, Portugal, 21-23 April 2004. ©2004 IEEE.)

The previous sections investigated the performance of video in error-prone environments. The time-varying channel conditions resulting from the mobility of the terminal and the surroundings were not considered. The effects of time-varying channel conditions (slow fading) on video performance and the techniques that can be used to mitigate these effects are investigated in this section. A performance enhancement method for real-time video communications employing link adaptation is discussed. A novel link-adaptation algorithm, which is based on feedback channel information including BLER, received signal strength (RSS), and the first-order statistic of the RSS, is proposed and analyzed.
Two approaches for link adaptation are investigated. First, the effects of a link-adaptation scheme at the video frame level, which aims to optimize video quality by varying the channel-coding scheme and video source rate for a fixed channel allocation, are studied. Second, a link-adaptation algorithm with the goal of maximizing the overall access network throughput is developed at the radio block level. Another link-adaptation algorithm is proposed for real-time streaming video applications. Link adaptation is not generally considered suitable for multi-user streaming video, because it usually requires interaction between the link-layer protocol and the encoder to perform source rate adaptation. However, the technique presented here facilitates stream switching and hence requires no such interaction with the encoder, thereby significantly simplifying the link-adaptation system. The benefits of the link-adaptation algorithms are demonstrated for MPEG-4-coded video transmissions over the simulated EGPRS access network. In addition, the effects of feedback delay, a noisy feedback channel, and bursty channel errors on the algorithms' performances are investigated. Finally, a link-adaptation algorithm that makes use of variable spreading factor assignment in a UMTS network is examined for real-time video communications. Algorithm performance is further enhanced by combined application of unequal error-protection (UEP) and link-adaptation techniques. Results illustrate a significant performance improvement in perceptual video quality.

9.3.1 Time-varying Channel Model Design

The time-varying channel model used in the simulator consists of three main components: fast fading, shadow fading, and propagation loss. The fast-fading model follows the description of multipath propagation models in [23] and [8]. In these models, it is assumed that the mobile radio environment is dispersive, with several reflectors and scatterers at different distances from the line-of-sight path between the mobile terminal and the base station. Shadow fading and propagation path loss are modeled as described below.
9.3.1.1 Shadow-fading Model

Shadowing is modeled as a log-normally distributed random variable with correlated consecutive samples. The form of the autocorrelation of the shadowing process depends on the user velocity, v, and the correlation distance, $d_c$, of the particular channel. The auto-covariance, $C_\zeta(\tau)$, of the shadowing process can be modeled as [24]:

$$C_\zeta(\tau) = \sigma^2 e^{-v\tau/d_c} \tag{9.19}$$

where $\sigma^2$ is the variance of the log-normal distribution.

9.3.1.2 Path Loss Model

The COST 231-Walfisch-Ikegami model was used to approximate the path loss experienced in an urban environment when the cell radius is less than 5 km. The following parameters have been used:

width of the road: 20 m
height of building rooftops: 15 m
height of base station antenna: 17 m
height of mobile station antenna: 1.5 m
road orientation to direct radio path: 90°
building separation: 40 m.

For GSM 900 the corresponding propagation loss is:

L = 132.8 + 38 log(d)

For DCS 1800 the corresponding propagation loss is:

L = 142.9 + 38 log(d) for medium-sized cities
L = 145.3 + 38 log(d) for metropolitan centers

The path loss model used for the UMTS vehicular test environment is [8]:

L = 128.1 + 37.6 log(d)

where L is in dB and d is in km. The following parameter values are assumed in the UMTS test environment:

difference between the mean building height and the mobile antenna height: 10.5 m
difference between the base station antenna height and the mean building rooftop height: 15 m
horizontal distance between the mobile and the diffracting edges: 15 m
average separation between rows of buildings: 80 m
carrier frequency: 2000 MHz.

9.3.1.3 Mobility Model

A pseudo-random mobility model with semi-directed trajectories is used to model the user mobility. The mobile terminal's position is updated according to the de-correlation length, and the direction is changed at each position update according to the given probability [8]. The mobility model is defined by the following parameters:

speed (assumed to be constant): 3 km/h; 50 km/h
probability of changing direction at position update: 0.2
maximal angle for direction update: 45°
de-correlation length: 5 m (corresponding to 3 km/h mobile speed); 20 m (corresponding to 50 km/h mobile speed).

Mobile terminals are uniformly distributed and their direction is randomly chosen at initialization.
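The shadowing and path-loss models of Sections 9.3.1.1 and 9.3.1.2 can be sketched in Python as follows. The text does not say how correlated shadowing samples are generated; a first-order autoregressive recursion is assumed here because it reproduces the exponential auto-covariance of Equation (9.19) at the sampling instants. The function names and dictionary keys are hypothetical, σ is taken as the standard deviation in dB, and "log" in the path-loss formulas is taken as the base-10 logarithm.

```python
import math
import random

# (intercept in dB, slope in dB per decade of distance in km), from the text
PATH_LOSS_COEFFS = {
    "GSM900": (132.8, 38.0),
    "DCS1800_medium_city": (142.9, 38.0),
    "DCS1800_metropolitan": (145.3, 38.0),
    "UMTS_vehicular": (128.1, 37.6),
}


def path_loss_db(distance_km, model="GSM900"):
    """Path loss L in dB at the given distance in km."""
    intercept, slope = PATH_LOSS_COEFFS[model]
    return intercept + slope * math.log10(distance_km)


def shadowing_samples_db(n, sigma_db, speed_mps, dt_s, d_corr_m, seed=None):
    """n correlated Gaussian samples in dB (log-normal in linear scale).

    Assumed AR(1) recursion: successive samples have correlation
    a = exp(-v * dt / d_c), so the covariance at lag m*dt is sigma^2 * a^m.
    """
    rng = random.Random(seed)
    a = math.exp(-speed_mps * dt_s / d_corr_m)
    sample = rng.gauss(0.0, sigma_db)
    samples = [sample]
    for _ in range(n - 1):
        innovation = rng.gauss(0.0, sigma_db)
        sample = a * sample + math.sqrt(1.0 - a * a) * innovation
        samples.append(sample)
    return samples
```

For example, `shadowing_samples_db(1500, 7.0, 3 / 3.6, 0.02, 5.0)` produces 30 s of shadowing at 20 ms spacing for a 3 km/h user with a 5 m correlation distance; the 20 ms spacing is an illustrative choice.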

9.3.1.4 CIR Calculation

The capacity of a cellular radio network is interference-limited. The network operator is assigned a band of frequencies by the regulatory authorities, and must strive to reuse the frequency band in an efficient manner to maximize the number of subscribers to the service. Cells are tessellated to form clusters, and each cluster may use the entire allocated spectrum. Cells in neighboring clusters use the same frequencies, and hence mobile terminals in these cells may interfere with one another, causing co-channel interference. To reduce the co-channel interference, cells are further divided into sectors within each cluster. In GSM [23], three main cluster/sector configurations, namely 4/12, 3/9, and 1/3, are defined.

Only the C/I ratio for the downlink is investigated in this study. Each base station is assumed to be transmitting equal power, $P_T$. As the BS-transmitted power is independent of mobile position, frequency hopping does not affect the calculation of the C/I ratio. The C/I ratio is calculated for each of the different cluster sizes and sectorization configurations defined in GSM. Because the C/I ratio in the downlink is location-dependent, the C/I ratios for different locations within the cell are also calculated.

Consider a mobile that receives in the kth slot on carrier $f_i$ in the 0th cell. The jth co-channel interfering BS creates interference with this mobile in the 0th cell when it communicates with a mobile using the kth slot of carrier $f_i$ in its own cell. The geometrical arrangement is shown in Figure 9.31. Let the path-loss component be represented by L(f, d, x), where f is the transmitting frequency, d is the distance between transmitter and receiver, and other factors such as geometric orientation and antenna heights are represented by x. Shadow fading, S, is a function of velocity, v, de-correlation length, $d_c$, and log-normal standard deviation, $\sigma$.
Assume that the fast fading is effectively combated by the use of channel equalization, frequency hopping, and signal processing. Thus, the received power at the mobile is:

$$P_R = P_T \cdot L(f, d, x) \cdot S(v, d_c, \sigma) \tag{9.20}$$

where $P_R$ and $P_T$ are the received power and the transmitted power, respectively. Referring to Figure 9.31, the total carrier power received by the mobile in the 0th cell is:

$$P_C = P_T \cdot L_0(f, d_0, x_0) \cdot S_0(v, d_c, \sigma) \tag{9.21}$$

The total interfering power received by the mobile is:

$$P_I = P_T \cdot \sum_j L_j(f, d_j, x_j) \cdot S_j(v, d_c, \sigma) \tag{9.22}$$

where subscript j represents the jth interfering cell.

Figure 9.31 Downlink interference from the jth BS to MS in the 0th cell (the MS is at distance d0 from the 0th BS and at distance d from the jth BS; D is the distance between the two BSs)

The application of voice-activity detection results in discontinuous transmission, thereby reducing the interfering power received by the mobile. In order to consider the effect of voice-activity detection on the C/I ratio, a voice-activity variable $v_j$ is introduced. Variable $v_j$ is defined as [25]:

$$v_j = \begin{cases} 1, & \text{with a probability of } \mu \\ 0, & \text{with a probability of } 1 - \mu \end{cases} \tag{9.23}$$

where the mean value of the voice-activity random variable is given by $E[v_j] = \mu$. Following the above argument, Equation (9.22) can be modified as:

$$P_I = \mu \cdot P_T \cdot \sum_j L_j(f, d_j, x_j) \cdot S_j(v, d_c, \sigma) \tag{9.24}$$

Therefore:

$$C/I = \frac{P_C}{P_I} = \frac{L_0(f, d_0, x_0) \cdot S_0(v, d_c, \sigma)}{\mu \cdot \sum_j L_j(f, d_j, x_j) \cdot S_j(v, d_c, \sigma)} \tag{9.25}$$

Calculation of C/I Variation Due to Path Loss

For a hexagonal cell pattern, there are always six close interferers, irrespective of the number of cells per cluster, as can be seen from Figure 9.32. However, the distance between co-channel cells depends on the number of cells, N, in a cluster, and is given by:

$$D = R \cdot \sqrt{3N} \tag{9.26}$$

where D is the distance between co-channel cell centers and R is the cell radius. Using the COST 231-Walfisch-Ikegami path-loss model and a hexagonal cell structure, the variation of C/I due to path loss is calculated for each of the cluster/sector configurations specified in GSM [23].

Figure 9.32 Downlink: (a) 3/9 cluster/sector configuration CIR for MS at normalized distance, r, from the BS (curves for VAD = 3/8, VAD = 1/2 and VAD = 1); (b) CIR for MS at normalized distance, r, from the BS; triangular mark, 3/9 configuration; circle mark, 1/3 configuration; square mark, 4/12 configuration
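Equations (9.25) and (9.26) can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the calculation used for Figure 9.32: each product L·S is supplied as a gain in dB (the negative of the loss), sectorization is ignored, and in `first_tier_cir_db` all six interferers are placed at the reuse distance D from the mobile, so the mobile's offset from its own BS is neglected on the interfering paths. Function names are hypothetical.

```python
import math


def reuse_distance(cell_radius, cluster_size):
    """Equation (9.26): distance between co-channel cell centers."""
    return cell_radius * math.sqrt(3 * cluster_size)


def cir_db(carrier_gain_db, interferer_gains_db, voice_activity=1.0):
    """Equation (9.25) with every L*S product expressed as a gain in dB."""
    carrier = 10.0 ** (carrier_gain_db / 10.0)
    interference = voice_activity * sum(
        10.0 ** (gain / 10.0) for gain in interferer_gains_db)
    return 10.0 * math.log10(carrier / interference)


def first_tier_cir_db(r_norm, cluster_size, slope_db=38.0, voice_activity=1.0):
    """Path-loss-only C/I for a mobile at normalized distance r_norm (R = 1).

    The path-loss intercept is common to all links and cancels in the ratio,
    so only the slope (dB per decade) is needed.
    """
    d_reuse = reuse_distance(1.0, cluster_size)
    carrier_gain = -slope_db * math.log10(r_norm)
    interferer_gain = -slope_db * math.log10(d_reuse)
    return cir_db(carrier_gain, [interferer_gain] * 6, voice_activity)
```

Under these assumptions the C/I falls as the mobile moves towards the cell boundary and rises by about 3 dB when the voice-activity factor is halved, which is the qualitative behavior described in the text.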

The worst C/I condition occurs when the mobiles are located near the cell boundary. The difference between the two extreme cases, where the mobile is located near the BS or near the cell boundary, is about 35 dB. Figure 9.32 illustrates the effects of different cluster/sector configurations on the C/I ratio. In the case of the 1/3 configuration, the effect of a second set of interferers is also considered. The 4/12 cluster/sector configuration gives the best performance, while the 1/3 cluster/sector configuration gives the worst. The difference between these two cases is about 10 dB.

Calculation of C/I Variation Due to Shadow Fading

Consider a single-interferer case. Equation (9.25) can be rewritten as:

$$C/I = L_0 + S_0 - L_I - S_I \tag{9.27}$$

where all the terms are in dB. As shadowing is modeled as a log-normal process with zero mean and standard deviation $\sigma$, the terms $S_0$ and $S_I$ in Equation (9.27) are random variables with a normal distribution. Following the properties of a normal distribution, it can be shown that the distribution of $S_0 - S_I$ (denoted $S_{0-I}$) is also a normal distribution, with zero mean and standard deviation $\sigma_{0-I}$ given by [24]:

$$\sigma_{0-I}^2 = \sigma_0^2 + \sigma_I^2 - 2 \cdot \rho \cdot \sigma_0 \cdot \sigma_I \tag{9.28}$$

where $\rho$ defines the shadowing correlation coefficient. $\rho = 0$ indicates no shadowing correlation, resulting in the worst-case scenario. $\sigma_{0-I}$ decreases as the correlation increases, reaching a minimum value when $\rho = 1$. When $\rho = 1/2$ and $\sigma_0 = \sigma_I = \sigma$, $\sigma_{0-I}$ becomes $\sigma$. Following the above argument, it can be stated that $\sigma$ always indicates one instance of the resultant $\sigma_{0-I}$ of the shadowing process, even in the scenario where more than one interferer is taken into account.

Applying the mobility model, each position of the mobile is estimated for the duration of the conversation. The C/I ratio due to the path loss is calculated at each position of the mobile, and the shadowing process is simulated separately as explained above for $\sigma$ = 7 dB, v = 3 km/h, and $d_c$ = 5 m.
The total C/I ratio is estimated by superimposing the simulated shadowing process over the calculated C/I ratio due to the path loss. Figure 9.33 shows the estimated C/I ratio for a 30 s duration when the mobile travels at 3 km/h, taking into account both propagation loss and shadowing. The parameters used are listed in Table 9.19.

Figure 9.33 Simulated channel CIR for 30 s duration (CIR in dB against time)
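The special cases quoted for Equation (9.28) can be checked numerically. The Python sketch below is a small illustration; the function name is hypothetical, and the inputs are taken to be standard deviations in dB.

```python
import math


def shadowing_difference_std(sigma_0, sigma_i, rho):
    """Standard deviation of S_0 - S_I, from Equation (9.28)."""
    variance = sigma_0 ** 2 + sigma_i ** 2 - 2.0 * rho * sigma_0 * sigma_i
    return math.sqrt(variance)


# Cases discussed in the text, with sigma_0 = sigma_I = 7 dB
uncorrelated = shadowing_difference_std(7.0, 7.0, 0.0)    # worst case
half_correlated = shadowing_difference_std(7.0, 7.0, 0.5)  # equals sigma
fully_correlated = shadowing_difference_std(7.0, 7.0, 1.0)  # minimum
```

With equal standard deviations, a correlation coefficient of 1/2 returns the single-link value of 7 dB, which is why a single shadowing process with that σ can stand in for the difference term.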

Table 9.19 Parameters used in the channel implementation

Log-normal standard deviation    7 dB
De-correlation distance          5 m
Radius of hexagonal cell         200 m
Propagation frequency            900 MHz
Vehicular speed                  3 km/h
Channel environment              TU3
CIR margin                       9 dB
Fading characteristics           Rayleigh fading
Cell configuration               4/12
Frequency hopping                Ideal frequency hopping
Path loss model                  COST 231-Walfisch-Ikegami model

9.3.2 Link Adaptation for Real-time Video Communications

One fundamental characteristic of cellular systems is the time-varying channel conditions experienced by mobile users due to differences in distance to the base station, slow/fast-fading characteristics of the channel, and other user/cell interference. However, real-time multimedia services such as audio and video require maintenance of a certain carrier-to-interference ratio (CIR) to give good perceptual quality. To keep the performance at a desirable level, a communication system is traditionally designed for the average or worst-case situation. This, however, results in a severely under-utilized system when the channel is in a good state. Power control and diversity techniques can be used to mitigate the effects of the time-varying nature of the channel. However, power control may add extra interference to other users, while diversity techniques may require complex processing. Another way to increase the robustness of the radio link to varying channel quality is to employ link-adaptation techniques [26]. The main idea of link adaptation is to adapt the modulation and coding levels to the feedback channel information.

Link-adaptation techniques have attracted a lot of attention recently. Many data rate-adaptation techniques based on the estimated signal-to-interference-plus-noise ratio have been proposed to adapt coded modulation schemes, thus improving the data throughput of mobile channels [27,28].
The theoretical basis for optimal switching in practical mobile systems involving several different adaptation parameters is investigated in [29]. However, such previous investigations were based on data and voice transmission with simplifying assumptions such as coherent detection, perfect synchronization of the transmission modes, zero feedback delay, and noiseless feedback channels. In contrast, the performance of the proposed adaptive system for real-time video communication is evaluated based on practical considerations of channel estimation noise, feedback noise, and feedback delay.

9.3.2.1 Link Adaptation for Real-time Video Communication in EGPRS Networks

The flow diagram of the proposed scheme is shown in Figure 9.34(a). The EGPRS physical layer, including interleaver, modulator, equalizer, demodulator, and de-interleaver, was implemented as explained in [31] to simulate the reception performance of the EGPRS receiver. The effect of errors upon the EGPRS protocols at the radio interface was simulated by integrating the physical layer model with a radio access data flow model.

Figure 9.34 (a) Flow diagram of the proposed link adaptation scheme (transmitting end: source encoder, transport/network layer and SNDC/LLC protocols, RLC/MAC protocol, EGPRS physical layer, link adaptation decision unit, channel state predictor; wireless channel and feedback channel; receiving end: EGPRS physical layer, RLC/MAC protocol, transport/network layer and SNDC/LLC protocols, source decoder, channel quality measurement unit); (b) radio block structure in EGPRS channels, one RLC block per 20 ms [30] (RLC/MAC header, data, BCS and TB fields; convolutional coding; puncturing; SB; 464 bits)

In addition to the EGPRS protocol layer units, the system consists of a channel state predictor and a link adaptation decision unit at the transmitter. The channel state predictor estimates the channel condition based on the feedback information supplied. According to this estimate, the link adaptation decision unit commands the RLC/MAC layer protocol and the source encoder to vary the modulation coding scheme, the allocated number of timeslots, and the source bit rate for the current transmission. At the receiver, a channel quality measurement unit is located at the RLC/MAC layer. This measures the channel quality in terms of RSS and radio block error occurrence, and feeds this information back to the transmitter via a noisy feedback link with a certain delay. Mode synchronization is attained with a closed-loop method, as described in [30]. A control word describing the transmission modes for the radio block is embedded into the radio-block header, as illustrated in Figure 9.34(b). The header format is indicated by the stealing bits (SB) of the
