
10.3.3.1 EGPRS Module

The EGPRS module of the QoS mapping emulator supports two modes:

1. Emulator (EMU) mode: The QoS mapping emulator connects the EGPRS emulator and the UMTS emulator. In the UMTS-to-EGPRS direction, the EGPRS module encapsulates the MPEG-4 data decapsulated by the UMTS module and sends it to the EGPRS emulator; in the EGPRS-to-UMTS direction, it decapsulates the data received from the EGPRS emulator and transfers it to the UMTS module.

2. Mobile Station (MS) mode: The QoS mapping emulator connects only the MPEG-4 file transmitter and the MPEG-4 decoder. The EGPRS simulation (encapsulation and decapsulation) is done entirely in this module. This mode is intended only for setups in which no separate EGPRS emulator or UMTS emulator is present. If an EGPRS emulator and a UMTS emulator are available, MS mode is not recommended, because it cannot provide as informative a display or as flexible parameter settings.

10.3.3.2 EGPRS Data Flow Model

As Figure 10.8 shows, the basic concept of the simulation is to encapsulate the MPEG-4 data from the transport layer down to the physical link layer, or to decapsulate data received from another emulator in the reverse direction. The following subsections describe the encapsulation process.

Transport Layer (RTP/UDP/IP)

The frames of the encoded MPEG-4 video are segmented into packets with a maximum size of 1520 bytes, and a 40 byte RTP/UDP/IP header is prepended to the packet data. The content of the header is not important: this project focuses on the extent to which header corruption may affect QoS. In the program, 40 continuous "0"s are inserted at the front of the packet data as the header flag. If the receiver detects that the RTP/UDP/IP header does not contain 40 continuous "0"s, it sets the whole packet to "0"s, which adds additional errors to the transmitted stream.

SNDCP Layer

The main role of the SNDCP layer is to control header compression for IP packet data. QoS for any real-time traffic that uses small IP packets (e.g. speech) may require compression. For example, voice-over-IP (VoIP) is often carried in RTP/UDP/IP, where the uncompressed header has a size of 40 bytes but the speech typically occupies only 20-24 bytes (depending on the voice codec). Compression can cut the header to only four bytes most of the time. Given its complexity, the emulator does not implement the header compression function; in most cases video packets are much bigger than the header, so header compression would have little effect on the emulation results, and it is not always used in real environments anyway.

SNDCP has two different SN-PDU formats:

1. SN-DATA PDU, for acknowledged data transfer. Its header occupies three bytes.

2. SN-UNITDATA PDU, for unacknowledged data transfer. Its header occupies four bytes [17].

For the same reason mentioned above, the emulator uses the SN-UNITDATA PDU format. The header content is again not important. In the program, "1234" is set as the SNDCP header flag. If the receiver detects that the SNDCP header is not "1234", the whole packet is set to "0"s, which adds additional errors to the transmitted stream.
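The flag-based encapsulation and the receiver-side check described above can be summarized in a short sketch. The following C++ fragment is illustrative only: the constants mirror the sizes quoted in the text (1520 byte packets, a 40 byte header emulated as zeros), while the function names and the assumption that the 1520 byte limit includes the header are ours, not the emulator's.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Sizes quoted in the text; treating 1520 bytes as the total packet size is an assumption.
    constexpr std::size_t kIpHeaderBytes  = 40;
    constexpr std::size_t kMaxPacketBytes = 1520;

    // Segment an encoded frame and prepend 40 "0" bytes standing in for the RTP/UDP/IP header.
    std::vector<std::vector<std::uint8_t>> packetize(const std::vector<std::uint8_t>& frame) {
        std::vector<std::vector<std::uint8_t>> packets;
        const std::size_t maxData = kMaxPacketBytes - kIpHeaderBytes;
        for (std::size_t off = 0; off < frame.size(); off += maxData) {
            std::vector<std::uint8_t> pkt(kIpHeaderBytes, 0x00);   // header flag of zeros
            std::size_t n = std::min(maxData, frame.size() - off);
            pkt.insert(pkt.end(), frame.begin() + off, frame.begin() + off + n);
            packets.push_back(std::move(pkt));
        }
        return packets;
    }

    // Receiver-side check: a corrupted header flag blanks the whole packet,
    // which injects the additional errors described above.
    void checkIpHeader(std::vector<std::uint8_t>& pkt) {
        for (std::size_t i = 0; i < kIpHeaderBytes && i < pkt.size(); ++i) {
            if (pkt[i] != 0x00) { std::fill(pkt.begin(), pkt.end(), 0x00); return; }
        }
    }

The SNDCP flag ("1234") can be handled with the same pattern, one layer further down.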

Figure 10.9  EGPRS LLC frame format [18]

LLC Layer

Each LLC frame consists of a header, a trailer, and an information field. The frame header consists of the address field and the control field, as shown in Figure 10.9. The address field occupies one octet, while the control field typically consists of between one and three octets [18]. In the program, the LLC header is set to occupy three octets. The content of the header is not important, and is not a main concern of this project. The trailer is the frame check sequence (FCS) field, which consists of a 24 bit cyclic redundancy check (CRC) code; the emulator uses the same value as the FCS size. The minimum value of the maximum number of octets in an information field (N201) is 140 octets, and the maximum value is 1520 octets. To reduce complexity, the emulator sets N201 equal to 1520 octets. If the packet data is greater than 1520 octets, it is segmented.

RLC/MAC Layer

The simulation in this layer is important, since GPRS and EGPRS have different structures in this layer. Although channel coding is not implemented in this layer, the coding schemes affect the RLC/MAC block structure.

For GPRS

The RLC/MAC block for GPRS data transfer consists of a MAC header (fixed length) and an RLC data block, as shown in Figure 10.10. The RLC data block consists of an RLC header (variable length), an RLC data unit, and spare bits. GPRS uses the CS-1 to CS-4 coding schemes (Table 10.3).

Figure 10.10  RLC/MAC block structure for data transfer for GPRS [19]
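As an illustration of the LLC framing just described, the sketch below segments an SDU at N201 = 1520 octets and appends a three-octet header and a 24 bit FCS. The FCS shown is a simple placeholder accumulator, not the CRC-24 polynomial defined in TS 44.064, and all identifiers are ours.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    constexpr std::size_t kLlcHeaderOctets = 3;     // address + control, as set in the emulator
    constexpr std::size_t kN201            = 1520;  // maximum information field used here

    // Placeholder 24-bit check value; NOT the CRC-24 of TS 44.064.
    std::uint32_t placeholderFcs24(const std::vector<std::uint8_t>& bytes) {
        std::uint32_t acc = 0;
        for (std::uint8_t b : bytes) acc = (acc * 31u + b) & 0xFFFFFFu;
        return acc;
    }

    // Split an SDU into LLC frames: 3-octet header, information field, 3-octet trailer.
    std::vector<std::vector<std::uint8_t>> llcFrames(const std::vector<std::uint8_t>& sdu) {
        std::vector<std::vector<std::uint8_t>> frames;
        for (std::size_t off = 0; off < sdu.size(); off += kN201) {
            std::size_t n = std::min(kN201, sdu.size() - off);
            std::vector<std::uint8_t> f(kLlcHeaderOctets, 0x00);
            f.insert(f.end(), sdu.begin() + off, sdu.begin() + off + n);
            std::uint32_t fcs = placeholderFcs24(f);
            f.push_back(static_cast<std::uint8_t>(fcs >> 16));
            f.push_back(static_cast<std::uint8_t>(fcs >> 8));
            f.push_back(static_cast<std::uint8_t>(fcs));
            frames.push_back(std::move(f));
        }
        return frames;
    }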

Table 10.3  Parameters for GPRS channel coding schemes with one timeslot [20]

Scheme | Code rate | USF | Pre-coded USF | Radio block excl. USF and BCS | BCS | Tail bits | Coded bits | Punctured bits | Data rate (kbps)
CS-1   | 1/2       | 3   | 3             | 181                           | 40  | 4         | 456        | 0              | 9.05
CS-2   | ~2/3      | 3   | 6             | 268                           | 16  | 4         | 588        | 132            | 13.4
CS-3   | ~3/4      | 3   | 6             | 312                           | 16  | 4         | 676        | 220            | 15.6
CS-4   | 1         | 3   | 12            | 428                           | 16  | -         | 456        | -              | 21.4

The emulator focuses only on the header size and payload size; the other parameters of the block can be ignored. In the program, three continuous "12"s are set as the RLC/MAC header flag. If the receiver cannot detect the flag, the corresponding RLC block is dropped.

For EGPRS

The RLC/MAC block for EGPRS data transfer consists of a combined RLC/MAC header and one or two RLC data blocks, as shown in Figure 10.11. Each RLC data block contains octets from one or more upper-layer protocol data units (PDUs). Depending on the modulation and coding scheme, one or two RLC data blocks are contained in one RLC/MAC block: for MCS-1 to MCS-6 there is one RLC data block, whereas for MCS-7 to MCS-9 there are two, as shown in Table 10.5. In each transfer direction, uplink and downlink, three different header types are defined (Table 10.4):

1. Header type 1 is used with MCS-7 to MCS-9.
2. Header type 2 is used with MCS-5 and MCS-6.
3. Header type 3 is used with MCS-1 to MCS-4 [19].

Different header types have different header sizes in the downlink and uplink directions. As mentioned before, the emulator considers only the header size and payload size; other parameters can be ignored.

Retransmission

EGPRS supports retransmission with resegmentation, while GPRS only supports retransmission with the same coding scheme. Tables 10.6 and 10.7 show the choice of MCS for retransmissions. The emulator supports the retransmission function, with some modifications to simplify the implementation. The transmitter pushes each packet into a queue. After an encapsulated packet is sent out, it waits for the acknowledgement. If the receiver (EGPRS emulator) detects that the received packet cannot be decapsulated correctly, it sends a NACK to request retransmission. If the packet is received correctly, it sends an ACK to signal that the next packet can be sent, and the transmitter pops the corresponding packet from the queue. It should be noted that the 3GPP standard requires layer 2 retransmission at the RLC PDU level; in the developed emulator, however, retransmission is performed at the RLC SDU level. This minimizes the emulator complexity without significantly affecting the overall emulator performance.

Figure 10.11  RLC/MAC block structure for data transfer for EGPRS [19]
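The simplified SDU-level retransmission scheme can be pictured as a small queue driven by ACK/NACK feedback. The sketch below is a minimal illustration of that behaviour under the stated simplifications; the class and field names are ours, not taken from the emulator.

    #include <cstdint>
    #include <deque>
    #include <utility>
    #include <vector>

    struct Packet { std::uint32_t id; std::vector<std::uint8_t> data; };

    // Stop-and-wait style retransmission at the SDU level: the head of the
    // queue is (re)sent with the same coding scheme until it is acknowledged.
    class RetxQueue {
    public:
        void push(Packet p) { q_.push_back(std::move(p)); }

        // Packet to transmit next (head of the queue), if any.
        const Packet* headOrNull() const { return q_.empty() ? nullptr : &q_.front(); }

        // ACK releases the head so the next packet can be sent.
        void onAck() { if (!q_.empty()) q_.pop_front(); }

        // NACK leaves the head in place; it will simply be sent again as-is.
        void onNack() { /* nothing to do: no resegmentation is performed */ }

    private:
        std::deque<Packet> q_;
    };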

Table 10.4  RLC/MAC header sizes in EGPRS

                               | Header type 1 | Header type 2 | Header type 3
Header size in uplink (bits)   | 46            | 29            | 31
Header size in downlink (bits) | 40            | 28            | 31

Table 10.5  Coding parameters for EGPRS coding schemes with one timeslot [20]

Scheme | Code rate | Header code rate | Modulation | RLC blocks per radio block (20 ms) | Raw data within one radio block | Family | BCS    | Tail  | HCS | Data rate (kbps)
MCS-9  | 1.0       | 0.36             | 8PSK       | 2                                  | 2 × 592                         | A      | 2 × 12 | 2 × 6 | 8   | 59.2
MCS-8  | 0.92      | 0.36             | 8PSK       | 2                                  | 2 × 544                         | A      | 2 × 12 | 2 × 6 | 8   | 54.4
MCS-7  | 0.76      | 0.36             | 8PSK       | 2                                  | 2 × 448                         | B      | 2 × 12 | 2 × 6 | 8   | 44.8
MCS-6  | 0.49      | 1/3              | 8PSK       | 1                                  | 592 (544 + 48 padding)          | A      | 12     | 6     | 8   | 29.6 (27.2 with padding)
MCS-5  | 0.37      | 1/3              | 8PSK       | 1                                  | 448                             | B      | 12     | 6     | 8   | 22.4
MCS-4  | 1.0       | 0.53             | GMSK       | 1                                  | 352                             | C      | 12     | 6     | 8   | 17.6
MCS-3  | 0.80      | 0.53             | GMSK       | 1                                  | 296 (272 + 24 padding)          | A      | 12     | 6     | 8   | 14.8 (13.6 with padding)
MCS-2  | 0.66      | 0.53             | GMSK       | 1                                  | 224                             | B      | 12     | 6     | 8   | 11.2
MCS-1  | 0.53      | 0.53             | GMSK       | 1                                  | 176                             | C      | 12     | 6     | 8   | 8.8

NOTE: figures in parentheses apply when padding is used.

Timeslots

RLC/MAC blocks are allocated to timeslots. Any number of timeslots from one to eight can be allocated (one block per timeslot). The emulator allows the user to change the number of timeslots allocated to each connection.

RLC/MAC blocks are then forwarded to the physical link layer. To simulate transmission corruption in this layer, error pattern files are introduced. The emulator reads the file at a random position and performs an exclusive OR operation between the encapsulated information data and the bits of the error pattern file. After the corruption simulation, the data stream is transferred to the EGPRS emulator.

The emulator offers header corruption configuration. A user can tick a check box to see the effect with or without header corruption on each layer. If the header corruption setting of a layer is ticked off, the header flag of that layer is reset in the data for transmission; this reset operation is carried out after the corruption simulation process. Decapsulation is similar to encapsulation, but in the reverse direction.
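The corruption step and the per-layer header-protection option translate into only a few lines of code. The sketch below assumes the error pattern is handled byte-wise rather than bit-wise, which is a simplification of the description above; the function signature and names are illustrative.

    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // XOR the data against the error pattern, starting from a random offset.
    // If header corruption is disabled for this layer, the header flag is
    // re-asserted after the corruption pass, mirroring the emulator's option.
    void corrupt(std::vector<std::uint8_t>& data,
                 const std::vector<std::uint8_t>& errorPattern,
                 bool protectHeader,
                 std::size_t headerBytes) {
        if (errorPattern.empty()) return;
        std::size_t start = static_cast<std::size_t>(std::rand()) % errorPattern.size();
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] ^= errorPattern[(start + i) % errorPattern.size()];
        if (protectHeader)
            for (std::size_t i = 0; i < headerBytes && i < data.size(); ++i)
                data[i] = 0x00;   // e.g. restore the 40 "0"s transport-layer flag
    }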

Table 10.6  Choice of MCS for retransmissions with resegmentation [19]

The MCS to use for retransmissions when resegmentation is carried out (RESEGMENT bit set to '1'), as a function of the scheme used for the initial transmission and the commanded MCS. The commanded-MCS columns are, in order: MCS-9, MCS-8, MCS-7, MCS-6-9, MCS-6, MCS-5-7, MCS-5, MCS-4, MCS-3, MCS-2, MCS-1. "(pad)" indicates that padding is applied.

Initial MCS-9: MCS-9, MCS-6, MCS-6, MCS-6, MCS-6, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3
Initial MCS-8: MCS-8, MCS-8, MCS-6 (pad), MCS-6 (pad), MCS-6 (pad), MCS-3 (pad), MCS-3 (pad), MCS-3 (pad), MCS-3 (pad), MCS-3 (pad), MCS-3 (pad)
Initial MCS-7: MCS-7, MCS-7, MCS-7, MCS-5, MCS-5, MCS-5, MCS-5, MCS-2, MCS-2, MCS-2, MCS-2
Initial MCS-6: MCS-9, MCS-6, MCS-6, MCS-9, MCS-6, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3
Initial MCS-5: MCS-7, MCS-7, MCS-7, MCS-5, MCS-5, MCS-7, MCS-5, MCS-2, MCS-2, MCS-2, MCS-2
Initial MCS-4: MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-1, MCS-1, MCS-1
Initial MCS-3: MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3
Initial MCS-2: MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2
Initial MCS-1: MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1

Table 10.7  Choice of MCS for retransmissions without resegmentation [19]

The MCS to use for retransmissions when resegmentation is not allowed (RESEGMENT bit set to '0'), as a function of the scheme used for the initial transmission and the commanded MCS. The commanded-MCS columns are, in order: MCS-9, MCS-8, MCS-7, MCS-6-9, MCS-6, MCS-5-7, MCS-5, MCS-4, MCS-3, MCS-2, MCS-1. "(pad)" indicates that padding is applied.

Initial MCS-9: MCS-9, MCS-6, MCS-6, MCS-6, MCS-6, MCS-6, MCS-6, MCS-6, MCS-6, MCS-6, MCS-6
Initial MCS-8: MCS-8, MCS-8, MCS-6 (pad), MCS-6 (pad), MCS-6 (pad), MCS-6 (pad), MCS-6 (pad), MCS-6 (pad), MCS-6 (pad), MCS-6 (pad), MCS-6 (pad)
Initial MCS-7: MCS-7, MCS-7, MCS-7, MCS-5, MCS-5, MCS-5, MCS-5, MCS-5, MCS-5, MCS-5, MCS-5
Initial MCS-6: MCS-9, MCS-6, MCS-6, MCS-9, MCS-6, MCS-6, MCS-6, MCS-6, MCS-6, MCS-6, MCS-6
Initial MCS-5: MCS-7, MCS-7, MCS-7, MCS-5, MCS-5, MCS-7, MCS-5, MCS-5, MCS-5, MCS-5, MCS-5
Initial MCS-4: MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-4, MCS-4
Initial MCS-3: MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3, MCS-3
Initial MCS-2: MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2, MCS-2
Initial MCS-1: MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1, MCS-1

10.3.3.3 UMTS Module

The UMTS module of the QoS mapping emulator also supports two modes:

1. EMU mode: In the UMTS-to-EGPRS direction, the module decapsulates MPEG-4 data received from the UMTS emulator and sends it to the EGPRS module, while in the EGPRS-to-UMTS direction it encapsulates the data forwarded from the EGPRS module and transfers it to the UMTS emulator.

2. MS mode: The UMTS simulation (encapsulation and decapsulation) is done entirely in this module. It should be noted that the AM mode of the RLC layer is temporarily unsupported in this mode. This mode is not recommended when the EGPRS emulator and UMTS emulator are available.

10.3.3.4 UMTS Data Flow Model

Figure 10.12 shows the UMTS module data flow. Encapsulation is simply the reverse process of decapsulation. In MS mode, the emulator implements both the encapsulation and the decapsulation process; in EMU mode, only one of these processes is implemented.

Transport Layer

Two connection types are supported in this layer:

1. Packet-switched (PS): A 40 byte header is inserted in front of the encoded video frame. In the program, 40 continuous "0"s are set as the flag. Segmentation is applied when the encoded video frame size is greater than 1520 bytes.

2. Circuit-switched (CS): No header is added to the video frame.

Figure 10.12  UMTS module data flow

PDCP Layer

The main role of the PDCP layer is similar to that of the SNDCP layer in GPRS: control of header compression for IP packet data. In the emulator, the PDCP-no-header PDU is adopted; this PDU type does not introduce any overhead to the PDCP SDU [21].

RLC/MAC Layer

The RLC layer segments the data forwarded from higher layers into RLC blocks, whose size is determined by the transport block (TB) size. There are three types of RLC PDU for data transmission:

1. Transparent Mode Data (TMD) PDU: used to transfer user data when RLC is operating in transparent mode. No overhead is added to the SDU by RLC.

2. Unacknowledged Mode Data (UMD) PDU: used to transfer user data when RLC is operating in unacknowledged mode. The UMD PDU header occupies the first octet of the data stream.

3. Acknowledged Mode Data (AMD) PDU: used to transfer user data, piggybacked status information, and the polling bit when RLC is operating in acknowledged mode. The AMD PDU header occupies the first two octets of the data stream.

The emulator supports the acknowledged mode, with some simplifications to reduce the implementation complexity. When the emulator works in acknowledged mode, the transmitting side pushes every frame into a queue. When the receiver detects header corruption in the received RLC blocks, it orders the transmitter to retransmit the whole frame, which is stored in the transmitter's queue. Retransmission of individual RLC blocks has been tried, but the results were not acceptable because the whole emulation system slowed down; further optimization may solve this problem. MAC entities can handle either the dedicated transport channel or the high-speed downlink shared channel. If channel multiplexing is performed at the MAC layer, a 4 bit MAC header is added to each RLC block [22].

Layer 1 (Physical Layer)

Layer 1 attaches a selected CRC to the RLC/MAC blocks. TTI blocks are formed by combining RLC/MAC blocks according to the specified TTI length, before transmission over the air interface. The channel coding scheme, spreading factor, and rate-matching ratio, which can be specified by the user, determine the number of PDUs to be encapsulated within the TTI blocks. Error pattern files are introduced in this layer to emulate the corruption that is possible in the real environment. The emulator reads the error pattern file from a random position and performs an exclusive OR operation between the encapsulated information data and the bits of the error pattern file.
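A rough sketch of the RLC segmentation step follows, using the per-mode header overheads quoted above (0, 1, and 2 octets) and the transport block size as the PDU size. It ignores the 4 bit MAC multiplexing header and all other header fields; the names and structure are ours, not the emulator's.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    enum class RlcMode { Transparent, Unacknowledged, Acknowledged };

    // Header overheads quoted in the text: TMD 0, UMD 1, AMD 2 octets.
    std::size_t rlcHeaderOctets(RlcMode m) {
        switch (m) {
            case RlcMode::Transparent:    return 0;
            case RlcMode::Unacknowledged: return 1;
            case RlcMode::Acknowledged:   return 2;
        }
        return 0;
    }

    // Segment an SDU into PDUs of tbSizeOctets, leaving room for the RLC header.
    std::vector<std::vector<std::uint8_t>> rlcSegment(const std::vector<std::uint8_t>& sdu,
                                                      std::size_t tbSizeOctets,
                                                      RlcMode mode) {
        std::vector<std::vector<std::uint8_t>> pdus;
        std::size_t hdr = rlcHeaderOctets(mode);
        std::size_t payload = tbSizeOctets > hdr ? tbSizeOctets - hdr : 0;
        for (std::size_t off = 0; off < sdu.size() && payload > 0; off += payload) {
            std::size_t n = std::min(payload, sdu.size() - off);
            std::vector<std::uint8_t> pdu(hdr, 0x00);   // RLC header placeholder
            pdu.insert(pdu.end(), sdu.begin() + off, sdu.begin() + off + n);
            pdus.push_back(std::move(pdu));
        }
        return pdus;
    }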

10.3.4 General User Interface

The graphical user interface is designed in the Visual C++ environment. It provides a platform for user interaction as well as for user QoE evaluation. The user is allowed to change most of the radio network parameters and to visualize the effect of these parameters on multimedia quality. Figure 10.13 shows a snapshot of the developed graphical user interface.

Figure 10.13  Graphical user interface of QoS architecture

In the main interface (Figure 10.14), the user can set the radio bearer parameters of the EGPRS module and the UMTS module. The transmission status and performance of each module are also displayed in the main interface. In addition, the user is allowed to change the number of timeslots allocated to the current connection and the total number of users in the system while media is being delivered. This allows the received video quality to be visualized instantaneously.

Figure 10.14  Main interface of QoS mapping emulator

Figure 10.15 shows the QoS parameter configuration dialog. The user can select the desired QoS parameters, such as type of service, traffic class, data rate, residual bit error rate, and transfer delay. The page also shows operator control parameters, the connection type, the PDCP connection type, and the number of multiplexed transport channels. Not all parameters are available; some are designed for future use.
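For illustration, the parameters exposed by these dialogs could be grouped as plain data structures along the following lines. The field names and default values are ours and are not taken from the emulator.

    #include <string>

    // QoS parameters selectable in the configuration dialog (illustrative grouping).
    struct QosParameters {
        std::string trafficClass;                  // e.g. conversational, streaming
        double      dataRateKbps             = 64.0;
        double      residualBitErrorRate     = 1e-4;
        double      transferDelayMs          = 250.0;
        std::string connectionType;                // packet-switched or circuit-switched
        std::string pdcpConnectionType;
        int         multiplexedTransportChannels = 1;
    };

    // Physical/radio channel settings from the corresponding dialog (illustrative).
    struct RadioChannelParameters {
        double      carrierFrequencyMHz = 1800.0;
        std::string channelEnvironment;            // e.g. typical urban
        double      mobileSpeedKmph     = 3.0;
        int         rakeFingers         = 4;
        bool        powerControl        = false;
    };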

Figure 10.15  QoS parameters configuration interface

The physical/radio channel parameter dialog (Figure 10.16) displays the appropriate spreading factor calculated from the requested QoS parameters. In this dialog, the user can select the radio channel-related settings (carrier frequency, channel environment, mobile speed) and receiver characteristics (number of rake fingers, rake combining, diversity techniques, and power control).

Figure 10.16  Physical/radio channel parameters configuration interface

Figure 10.17  Transport channel (data channel) parameter configuration interface

The transport channel (data channel) parameter dialog (Figure 10.17) allows users to define the logical channel type, RLC mode, MAC channel type, MAC multiplexing, layer 1 parameters, TTI, channel coding scheme, and CRC. The TB size and rate-matching ratio, calculated from the other input parameter values, are also displayed. The user can change the settings of the channel coding scheme, header corruption, and retransmission in the EGPRS property dialog (Figure 10.18).

Figure 10.18  EGPRS parameter configuration interface. Reproduced by permission of © 2005 IEEE

In general the parameters are set before the emulator runs, but adaptive parameter configuration at runtime is also supported.

10.4 Performance of Video Transmission in Inter-networked Systems

An MPEG-4 encoder and decoder pair is implemented in real time at the mobile terminals. The connection setup is based on the IP/UDP/RTP transport protocols [23]. Video frames are segmented if the video frame size exceeds the specified maximum transfer unit (MTU) size, before being encapsulated into IP packets for transmission. At the receiving end, if IP/UDP/RTP headers are found to be corrupted, the data encapsulated within those packets is dropped at the network layer.

The aim of the QoS mapping emulation system is to obtain the optimal bearer configurations for the EGPRS network and the UMTS network. Experiments are carried out to investigate the influence of different bearer configurations on an EDGE-to-UMTS system. First, video transmission is emulated in the EDGE and UMTS systems separately. Second, encoded video is transmitted over the joint EDGE-to-UMTS system: the transmission is emulated over the EDGE system, then the output data from the EDGE emulator is forwarded to the QoS mapping emulator, where the appropriate radio resources are allocated for transmission over the UMTS system. After transmission over the UMTS system, the received data is forwarded to the receiving terminal emulator for decoding and display.

10.4.1 Experimental Setup

First, video transmission emulations are made in both the EGPRS emulator and the UMTS emulator. The MPEG-4 decoder records the received video data and writes it to a YUV file. The peak signal-to-noise ratio (PSNR) of the received data, used here as the objective quality measure, is then calculated. Next, the video is transmitted to the EGPRS emulator, and the encapsulated output data from the EGPRS emulator is forwarded to the QoS mapping emulator. The UMTS emulator receives the output data from the QoS mapping emulator. The output video data from the UMTS emulator is also recorded to a YUV file by the MPEG-4 decoder, and its PSNR is calculated and compared with the PSNR obtained in the first step.

A 300 frame video (suzie.cmp) with a frame rate of 10 fps is selected for the test. Fully error-resilience-enabled MPEG-4-coded video transmission is considered in the experiments discussed. In addition, the TM5 rate-control algorithm, together with an adaptive intra-refresh algorithm [24], is used to stop temporal error propagation and to achieve a smoother output bit rate. The ITU test sequences Suzie, Foreman, and Carphone are used as the source signals. QCIF (176 by 144 pixels) sequences are coded at 10 fps. The received video quality is measured in terms of average frame PSNR. The video encoder parameter settings used in the experiments are listed in Table 10.8.

Table 10.8  Parameters and performance of the sample video

Parameter              | Setting
Source rate            | 64 kbps
Frame rate             | 10 fps
Error resilience       | Enabled
Video packet size      | 600 bits
Adaptive intra refresh | 10 intra-coded MBs per frame
Cyclic intra refresh   | 2 intra-coded MBs per frame
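The quality measure used throughout the experiments, average frame PSNR, can be computed from the decoded YUV files roughly as follows (luma-only, reference and received frames assumed to be the same size). This is a generic sketch rather than the tool actually used in the experiments.

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // PSNR of a single 8-bit luma frame against its reference.
    double framePsnr(const std::vector<std::uint8_t>& ref, const std::vector<std::uint8_t>& rec) {
        if (ref.empty()) return 0.0;
        double mse = 0.0;
        for (std::size_t i = 0; i < ref.size(); ++i) {
            double d = double(ref[i]) - double(rec[i]);
            mse += d * d;
        }
        mse /= double(ref.size());
        if (mse == 0.0) return 99.0;                          // identical frames: cap the value
        return 10.0 * std::log10(255.0 * 255.0 / mse);
    }

    // Average frame PSNR over a sequence (e.g. 300 QCIF frames of 176x144 luma samples).
    double meanPsnr(const std::vector<std::vector<std::uint8_t>>& refFrames,
                    const std::vector<std::vector<std::uint8_t>>& recFrames) {
        if (refFrames.empty()) return 0.0;
        double sum = 0.0;
        for (std::size_t f = 0; f < refFrames.size(); ++f)
            sum += framePsnr(refFrames[f], recFrames[f]);
        return sum / double(refFrames.size());
    }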

10.4.2 Test for the EDGE System

The throughput allocation per timeslot at the application level (as seen by the video codec) in the EDGE system is shown in Table 10.9 for different modulation and coding schemes.

Table 10.9  EDGE multislotting capacity for video (kbps). Reproduced by permission of © 2005 IEEE

Scheme | 1 TS | 3 TS  | 6 TS  | 7 TS  | 8 TS
MCS-1  | 7.5  | 22.5  | 45    | 52.5  | 60
MCS-3  | 12.6 | 37.8  | 75.6  | 88.2  | 100.8
MCS-5  | 19   | 57    | 114   | 133   | 152
MCS-6  | 25.2 | 75.6  | 151.2 | 176.4 | 201.6
MCS-9  | 50.3 | 150.9 | 301.8 | 352.1 | 384

Simulations are carried out for different propagation environments; the results for the typical urban (TU) environment are shown in Figure 10.19. Ideal frequency hopping is assumed for a 3 km/h mobile speed. It can be seen that MCS-1 gives better performance than MCS-2 at all C/I values up to around 20 dB. At this value, however, MCS-5 begins to provide superior video quality to either of these schemes, and MCS-6 does not match this performance until a C/I value of at least 30 dB.

Further tests were conducted for different timeslot configurations. An operating frequency of 1800 MHz in a TU50 multipath fading channel environment is considered. Two channel coding schemes are selected for the test: MCS-1 and MCS-5.

Figure 10.19  Video quality at TU1.5 IFH 1800 MHz (mean PSNR versus C/I for MCS-1, MCS-2, MCS-5, and MCS-6)
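The per-timeslot figures of Table 10.9 scale roughly linearly with the number of allocated timeslots, as the short sketch below illustrates. Note that the table itself caps the 8-timeslot MCS-9 entry at 384 kbps, so the simple multiplication is only an approximation at the top end; the function name is ours.

    #include <cstdio>

    // Application-level throughput for a given number of timeslots, using the
    // single-timeslot figures of Table 10.9 (kbps).
    double multislotThroughputKbps(double perSlotKbps, int timeslots) {
        return perSlotKbps * timeslots;
    }

    int main() {
        std::printf("MCS-1, 5 timeslots: %.1f kbps\n", multislotThroughputKbps(7.5, 5));
        std::printf("MCS-5, 2 timeslots: %.1f kbps\n", multislotThroughputKbps(19.0, 2));
        return 0;
    }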

The C/I ratio settings for MCS-1 vary from 9 to 18 dB, while the C/I ratio settings for MCS-5 vary from 12 to 24 dB, since MCS-1 offers better performance against interference. The original video source rate is 64 kbps. In the experiment, the number of timeslots allocated for RLC blocks using MCS-1 is set to five, so the source rate is limited to 8.8 kbps × 5 = 44 kbps. The number of timeslots allocated for RLC blocks using MCS-5 is set to two in order to keep roughly the same source rate (22.4 kbps × 2 = 44.8 kbps). The retransmission function is used, but without resegmentation, which means the transmitter only uses the same coding scheme to perform the retransmission. Header corruption is enabled. For each C/I ratio setting, the same test is run 10 times in order to average out the effect of burst channel errors.

Table 10.10  PSNR of the received video (MCS-1, EGPRS, five timeslots)

C/I ratio (dB) | 18        | 15        | 12        | 9
BLER           | 0.0002    | 0.007     | 0.095     | 0.035
PSNR (dB)      | 35.222392 | 34.542415 | 28.062148 | 11.335004

Figure 10.20  MPEG-4 performance over EGPRS, MCS-1, five timeslots (mean PSNR versus C/I)

From Table 10.10 and Figure 10.20 it can be seen that the mean PSNR is very low when the C/I ratio is set to 9 dB. Hence, for video communication the channel quality should be above 9 dB. In the following QoS mapping tests, the C/I ratio setting in the EGPRS module of the QoS mapping emulator is therefore never 9 dB when the channel coding scheme is MCS-1. From Table 10.11 and Figure 10.21 it can be seen that the PSNR is very low when the C/I ratio is set to 12 dB. Thus, in the following QoS mapping tests, the C/I ratio in the EGPRS module of the QoS mapping emulator is not set to 12 dB when the channel coding scheme is MCS-5. Compared with MCS-1 at the same source rate, MCS-5 has worse performance.

Table 10.11  PSNR of the received video (MCS-5, EGPRS, two timeslots)

C/I ratio (dB) | 24        | 18        | 15        | 12
BLER           | 0.022     | 0.065     | 0.15      | 0.25
PSNR (dB)      | 34.050303 | 29.323671 | 24.243836 | 12.408263
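Decisions such as "do not use 9 dB with MCS-1" or "do not use 12 dB with MCS-5" amount to reading the lowest acceptable C/I from measurements like those in Tables 10.10 and 10.11. A minimal illustrative helper (not part of the emulator) could look as follows; it assumes a C++17 compiler.

    #include <map>
    #include <optional>

    // Given measured mean PSNR per C/I setting (dB), return the lowest C/I
    // that still meets a target quality, so poorer settings are excluded from
    // the later QoS mapping tests. The map iterates in ascending C/I order.
    std::optional<int> minimumUsableCi(const std::map<int, double>& psnrByCi,
                                       double targetPsnrDb) {
        for (const auto& [ci, psnr] : psnrByCi)
            if (psnr >= targetPsnrDb) return ci;
        return std::nullopt;
    }

    // Example with the MCS-1, five-timeslot measurements of Table 10.10:
    //   {9: 11.34, 12: 28.06, 15: 34.54, 18: 35.22} with a 25 dB target -> 12 dB.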

Figure 10.21  MPEG-4 performance over EGPRS, MCS-5, two timeslots (mean PSNR versus C/I). Reproduced by permission of © 2005 IEEE

10.4.3 Test for the UMTS System

The actual information data rate in a UMTS system is a function of the spreading factor (SF), rate-matching ratio, channel coding scheme, CRC attachment, operation mode, and transport block (TB) size. The information data rates calculated for the different simulation parameter settings are shown in Table 10.12.

Table 10.12  Source throughput (kbps) capacity in UMTS. SF, spreading factor; CC, convolutional code; TC, turbo code. Reproduced by permission of © 2005 IEEE

SF | CC 1/2 | CC 1/3 | TC 1/3 | None
64 | 39.5   | 26.10  | 26.7   | 80.6
32 | 97     | 64.5   | 65.5   | 197.1
16 | 206.1  | 137.4  | 139.55 | 419.1
8  | 442.8  | 295.1  | 299.3  | 899.1
4  | 915.75 | 610    | 619.2  | 1859.1

The received video qualities for different spreading factor realizations are depicted in Figure 10.22. Video sequences are coded at the appropriate rates listed in Table 10.12. As expected, allocation of SF 32 provides slightly better performance than the others in poor channel conditions, due to the better channel-protection capability of higher spreading factors. In better channel conditions, allocation of SF 16 provides superior video quality. SF 8 considerably underperforms compared to all other schemes, even in good conditions. This is due to the inter-symbol interference experienced in multipath channels; the channel coding algorithm tends to mitigate the inter-symbol interference effect, but significant performance degradation is still visible for low spreading factors (such as 8). Results are shown only for convolutional coding, as similar performance is achieved with turbo coding.

10.4.4 Tests for the EDGE-to-UMTS System

10.4.4.1 Tests for EGPRS MCS-1, UMTS CC 1/3, SF 32

The EGPRS emulator uses MCS-1 as the channel coding scheme, while the UMTS emulator selects the 1/3 convolutional code as its channel coding scheme, with the SF set to 32. The results are listed in Table 10.13.

Figure 10.22  Effect of spreading factor (MPEG-4 performance over Veh A, CC 1/3, no power control, 10 fps; mean PSNR versus Eb/No for SF 32, SF 16, and SF 8)

Table 10.13  PSNR of the received video (EGPRS MCS-1, UMTS CC 1/3, SF 32)

PSNR (dB)          | UMTS Eb/N0 = 9.6 dB | 8.7 dB    | 7.7 dB    | 6.9 dB    | 5.9 dB
E/GPRS C/I = 18 dB | 33.687812           | 33.225360 | 32.146420 | 27.658593 | 19.766769
E/GPRS C/I = 15 dB | 33.383844           | 32.318679 | 31.145216 | 27.157409 | 20.100920
E/GPRS C/I = 12 dB | 27.289343           | 27.059496 | 24.233767 | 20.245704 | 17.789202

The performance results are compared to those over a single network connection. For the results shown in Figures 10.23-10.25, the C/I of the EDGE channel is set to 18 dB (good channel), 15 dB (moderate channel), and 12 dB (poor-quality channel), respectively, while the channel condition of the UMTS link is varied from Eb/No = 6 dB to Eb/No = 10 dB. When the channels on both connections are in good condition, video performance improves; however, the video quality is always lower than that over the single UMTS network. This is due to the QoS parameter mismatch between the two networks. Figure 10.25 shows the video performance when the channel conditions on the EDGE link change from good to poor; here the C/I over EDGE is 12 dB. The results are 5-7 dB lower than the quality received over a single UMTS network. This is because when the received quality over the first link is poor it is impossible to achieve better quality even when the second link is in good condition. This illustrates that it is necessary to perform joint QoS optimization over heterogeneous networks for efficient resource utilization and improved end-user quality.
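The observation that the cascaded EDGE-to-UMTS quality stays below that of either single network has a simple intuition: assuming independent losses, a block must survive both hops. The following back-of-envelope sketch illustrates this; it is an intuition aid, not the emulator's actual computation.

    #include <cstdio>

    // Combined block loss over two cascaded links with independent losses.
    double endToEndBler(double blerEdge, double blerUmts) {
        return 1.0 - (1.0 - blerEdge) * (1.0 - blerUmts);
    }

    int main() {
        // e.g. 2% loss on the EDGE hop and 5% on the UMTS hop give roughly 6.9% overall,
        // so the cascade is always at least as lossy as the worse of the two links.
        std::printf("combined BLER: %.3f\n", endToEndBler(0.02, 0.05));
        return 0;
    }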

Figure 10.23  MPEG-4 performance in QoS mapping: EGPRS MCS-1, C/I = 18 dB (fixed), UMTS CC 1/3, SF 32 (variable)

Figure 10.24  MPEG-4 performance in QoS mapping: EGPRS MCS-1, C/I = 15 dB (fixed), UMTS CC 1/3, SF 32 (variable)

Figure 10.25  MPEG-4 performance in QoS mapping: EGPRS MCS-1, C/I = 12 dB (fixed), UMTS CC 1/3, SF 32 (variable)

10.4.4.2 Tests for EGPRS MCS-1, UMTS CC 1/2, SF 16

From Figures 10.26-10.28 it can again be seen that the PSNR in the QoS mapping emulation is lower than the PSNR in either the EGPRS network emulation or the UMTS network emulation. The results are listed in Table 10.14. The PSNR values here are low, which shows that an improper setting (channel coding scheme, spreading factor, etc.) may result in a significant drop in video quality.

Table 10.14  PSNR of the received video (EGPRS MCS-1, UMTS CC 1/2, SF 16)

PSNR (dB)          | UMTS Eb/N0 = 9.6 dB | 8.7 dB    | 7.7 dB    | 6.9 dB    | 5.9 dB
E/GPRS C/I = 18 dB | 32.799385           | 29.148538 | 20.096635 | 18.539844 | 15.943978
E/GPRS C/I = 15 dB | 31.167210           | 28.571704 | 20.655864 | 18.902620 | 18.822408
E/GPRS C/I = 12 dB | 29.295367           | 26.219025 | 20.017732 | 19.196145 | 14.213557

10.4.4.3 Tests for EGPRS MCS-5, UMTS CC 1/2, SF 16

The results obtained are listed in Table 10.15. Figures 10.29-10.31 show very low PSNR in this experiment. It is clear that MCS-5 gives worse QoS performance than MCS-1. If the UMTS network also uses a channel coding configuration with low QoS performance (in these experiments, CC 1/2 with SF 16 performs much worse than CC 1/3 with SF 32), the final received video quality of the QoS mapping is not acceptable.

Figure 10.26  MPEG-4 performance in QoS mapping: EGPRS MCS-1, C/I = 18 dB (fixed), UMTS CC 1/2, SF 16 (variable)

Figure 10.27  MPEG-4 performance in QoS mapping: EGPRS MCS-1, C/I = 15 dB (fixed), UMTS CC 1/2, SF 16 (variable)

Figure 10.28  MPEG-4 performance in QoS mapping: EGPRS MCS-1, C/I = 12 dB (fixed), UMTS CC 1/2, SF 16 (variable)

Table 10.15  PSNR of the received video (EGPRS MCS-5, UMTS CC 1/2, SF 16)

PSNR (dB)          | UMTS Eb/N0 = 9.6 dB | 8.7 dB    | 7.7 dB    | 6.9 dB    | 5.9 dB
E/GPRS C/I = 24 dB | 32.093485           | 28.697846 | 23.199515 | 17.286461 | 14.928519
E/GPRS C/I = 18 dB | 28.048199           | 24.693708 | 19.796522 | 17.319961 | 16.112598
E/GPRS C/I = 15 dB | 20.711277           | 19.382652 | 18.095708 | 15.677293 | 14.798710

Figure 10.29  MPEG-4 performance in QoS mapping: EGPRS MCS-5, C/I = 24 dB (fixed), UMTS CC 1/2, SF 16 (variable)

Figure 10.30  MPEG-4 performance in QoS mapping: EGPRS MCS-5, C/I = 18 dB (fixed), UMTS CC 1/2, SF 16 (variable)

Figure 10.31  MPEG-4 performance in QoS mapping: EGPRS MCS-5, C/I = 15 dB (fixed), UMTS CC 1/2, SF 16 (variable)

10.4.4.4 Tests for EGPRS MCS-5, UMTS CC 1/2, SF 32

The results obtained are listed in Table 10.16. It can be seen from Figures 10.32-10.34 that in most conditions the UMTS module of the QoS mapping emulator gives better performance when the spreading factor is increased and the coding scheme is kept the same. However, if the PSNR values are already very low in the EGPRS network, changing the spreading factor does not provide a significant improvement.

Table 10.16  PSNR of the received video (EGPRS MCS-5, UMTS CC 1/2, SF 32)

PSNR (dB)          | UMTS Eb/N0 = 9.6 dB | 8.7 dB    | 7.7 dB    | 6.9 dB    | 5.9 dB
E/GPRS C/I = 24 dB | 33.700760           | 32.058648 | 28.062257 | 22.535770 | 16.964270
E/GPRS C/I = 18 dB | 28.921348           | 26.527228 | 23.079590 | 19.022860 | 15.795742
E/GPRS C/I = 15 dB | 20.199557           | 19.275756 | 18.908845 | 17.757226 | 14.192458

Figure 10.32  MPEG-4 performance in QoS mapping: EGPRS MCS-5, C/I = 24 dB (fixed), UMTS CC 1/2, SF 32 (variable)

Figure 10.33  MPEG-4 performance in QoS mapping: EGPRS MCS-5, C/I = 18 dB (fixed), UMTS CC 1/2, SF 32 (variable)

Figure 10.34  MPEG-4 performance in QoS mapping: EGPRS MCS-5, C/I = 15 dB (fixed), UMTS CC 1/2, SF 32 (variable)

10.5 Conclusions

This work has examined the issues involved in QoS mapping. The idea of QoS mapping can be interpreted in a number of different ways, making it important to define the interpretation used in this document. In the interpretation used here, it is assumed that in a real communications scenario two different mobile network types, possibly owned by two different network operators, are connected by a core network. It should be noted that there are no direct relays between the two mobile networks, so the two networks must be configured separately. The intention of the work described here has been to generate a set of performance curves for different QoS parameter settings over a range of the channel conditions that might be experienced in a mobile network. These curves will enable network operators to configure their systems in an optimized manner.

QoS parameter mapping for video transmission over an EDGE-to-UMTS system has been investigated by means of system emulation. The QoS mapping emulation system is built to evaluate the effect when an EGPRS network user transmits video to a user in a different network such as UMTS, and vice versa. The system consists of five components: the MPEG-4 file transmitter, EGPRS emulator, QoS mapping emulator, UMTS emulator, and MPEG-4 decoder. The EGPRS and UMTS emulators have been developed to support communication with other emulators, and the QoS mapping emulator acts as a bridge between the EGPRS emulator and the UMTS emulator. A new MPEG-4 terminal program, which has a mobile-phone shape, combines the functions of the MPEG-4 file transmitter and the MPEG-4 decoder. The intention of the emulator design is to enable investigation of optimal QoS settings when transmitting video between two different network types. The results give service providers and network operators a set of performance figures for different combinations of QoS parameters in the two networks, over a range of channel conditions. These performance figures can be used for QoS parameter selection in real networks.

The experiments have shown that the video quality for EDGE-to-UMTS network transmission is lower than for EDGE-to-EDGE or UMTS-to-UMTS transmission, due to the parameter mismatch between the two networks. If the QoS of the video transmission in a single network is poor, the network parameters on the second connection should be carefully selected to avoid further quality degradation.

Otherwise, the received quality will be unacceptable to the end user. There are many different possible combinations of channel conditions and scenarios that might be encountered in real networks. Therefore, it is not possible to give a short guide to the best possible set of parameters to use. Instead, the data from these simulations could be used to construct a further software package, capable of producing a set of optimized UMTS parameters given certain EDGE parameter inputs.

References

[1] http://www.anwire.org.
[2] http://context.upc.es.
[3] http://www.everest-ist.upc.es.
[4] http://www.netlab.nec.de/Projects/INTERMON.htm.
[5] G. M. Muntean, P. Perry, and L. Murphy, "A new adaptive multimedia streaming system for all-IP multi-service networks," IEEE Transactions on Broadcasting, Vol. 50, No. 1, pp. 1-10, Mar. 2004.
[6] L. Huang, S. Kumar, and C. C. J. Kuo, "Adaptive resource allocation for multimedia QoS management in wireless networks," IEEE Transactions on Vehicular Technology, Vol. 53, No. 2, pp. 547-558, Mar. 2004.
[7] L. Tong and P. Ramanathan, "Adaptive power and rate allocation for service curve assurance in DS-CDMA network," IEEE Transactions on Wireless Communications, Vol. 3, No. 2, pp. 555-564, Mar. 2004.
[8] W. Kumwilaisak, Y. T. Hou, Q. Zhang, W. Zhu, C. C. J. Kuo, and Y. Q. Zhang, "A cross-layer quality-of-service mapping architecture for video delivery in wireless networks," IEEE Journal on Selected Areas in Communications, Vol. 21, No. 10, pp. 1685-1698, Dec. 2003.
[9] A. Garibbo, M. Marchese, and M. Mongelli, "Mapping the quality of service over heterogeneous networks: a proposal about architectures and bandwidth allocation," Proc. IEEE International Conference on Communications, ICC2003, Anchorage, AK, Vol. 3, pp. 1690-1694, May 2003.
[10] G. Araniti, P. De Meo, A. Iera, and D. Ursino, "Adaptively controlling the QoS of multimedia wireless applications through user profiling techniques," IEEE Journal on Selected Areas in Communications, Vol. 21, No. 10, pp. 1546-1556, Dec. 2003.
[11] A. V. Moorsel, "Metrics for the Internet age: quality of experience and quality of business," Technical Report No. HPL-2001-179, http://www.hpl.hp.com/techreports.
[12] P. Wright, "A framework for analyzing user experience," 2003, http://www.usabilitynews.com/news/article1008.asp.
[13] 3rd Generation Partnership Project, "GERAN: GSM/EDGE radio access network (GERAN): overall description: stage 2 (release 5)," TS 43.051, v. 5.0.0, Jan. 2001.
[14] 3rd Generation Partnership Project, "Physical channels and mapping of transport channels onto physical channels (FDD) (release 4)," TS 25.211, v. 4.6.0, Sep. 2002.
[15] 3rd Generation Partnership Project, "Universal mobile telecommunications system (UMTS): multiplexing and channel coding (FDD) (release 6)," TS 25.212, v. 6.1.0, Mar. 2004.
[16] 3rd Generation Partnership Project, "Universal mobile telecommunications system (UMTS): medium access control (MAC) protocol specification (release 6)," TS 25.214, v. 6.0.0, Dec. 2003.
[17] 3rd Generation Partnership Project, "Digital cellular telecommunications system (phase 2+): mobile station (MS): serving GPRS support node (SGSN): subnetwork dependent convergence protocol (SNDCP) (release 5)," TS 44.065, v. 5.1.0, Sep. 2003.
[18] 3rd Generation Partnership Project, "Digital cellular telecommunications system (phase 2+): mobile station (MS): serving GPRS support node (MS-SGSN) logical link control (LLC) layer specification (release 5)," TS 44.064, v. 5.1.0, Mar. 2002.
[19] 3rd Generation Partnership Project, "Digital cellular telecommunications system (phase 2+): general packet radio service (GPRS): mobile station (MS): base station system (BSS) interface: radio link control/medium access control (RLC/MAC) protocol (release 5)," TS 44.060, v. 5.10.0, Dec. 2004.
[20] 3rd Generation Partnership Project, "Digital cellular telecommunications system (phase 2+): overall description of the GPRS radio interface: stage 2 (release 4)," TS 43.064, v. 4.1.0, Apr. 2001.
[21] 3rd Generation Partnership Project, "Universal mobile telecommunications system (UMTS): packet data convergence protocol (PDCP) specification (release 6)," TS 25.323, v. 6.0.0, Dec. 2003.

[22] 3rd Generation Partnership Project, "Universal mobile telecommunications system (UMTS): medium access control (MAC) protocol specification (release 6)," TS 25.321, v. 6.0.0, Dec. 2003.
[23] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, and H. Kimata, "RFC 3016: RTP payload format for MPEG-4 audio/visual streams".
[24] "Information technology: generic coding of audio-visual objects: part 2: visual," ISO/IEC JTC 1/SC 29/WG 11, ISO/IEC 14496-2, Jul. 2001.

11 Context-based Visual Media Content Adaptation

11.1 Introduction

The future media Internet will allow new applications to be realized, with support for ubiquitous media-rich content service technologies. Virtual collaboration, extended home platforms, augmented, mixed and virtual realities, gaming, telemedicine, e-learning, and so on, in which users with possibly diverse geographical locations, terminal types, connectivity, usage environments and preferences access and exchange pervasive yet protected and trusted content, are only a few of the possibilities. These multiple forms of diversity require content to be transported and rendered in different forms, which in turn requires the use of context-aware content adaptation. This avoids the alternative of predicting, generating, and storing all the different forms required for every item of content. Figure 11.1 provides a generic representation of the entities and contextual descriptions that are engaged in the delivery of context-aware multimedia applications.

For the aforementioned reasons, there is a growing need to devise adequate concepts and functionalities for a context-aware content adaptation platform that suits the requirements of such multimedia application scenarios. This platform needs to be able to consume low-level contextual information to infer higher-level contexts, and thus decide the need for, and type of, adaptation operations to be performed upon the content. In this way, usage constraints can be met while the restrictions imposed by digital rights management (DRM) governing the use of protected content are satisfied.

This chapter is dedicated to providing comprehensive discussions on the use of contextual information in adaptation decision operations, with a view to managing DRM and authorization for adaptation, and consequently outlining appropriate adaptation decision techniques and adaptation mechanisms. The main challenges arise from identifying integrated tools and systems that support adaptive, context-aware, and distributed applications that react to the characteristics and conditions of the usage environment and provide transparent access and delivery of content, where digital rights are adequately managed.

Figure 11.1  Generic scenario illustrating the entities of a context-aware service. Reproduced by permission of © 2008 IEEE

To meet these challenges, the discussions in this chapter focus on describing a scalable platform for context-aware and DRM-enabled adaptation of multimedia content. The platform has a modular architecture to ensure scalability, and well-defined interfaces based on open standards for interoperability as well as portability. The modules are classified into four categories, namely: (1) the adaptation decision engine (ADE); (2) the adaptation authorizer (AA); (3) context providers (CxPs); and (4) adaptation engine stacks (AESs), each comprising adaptation engines (AEs). The platform makes use of ontologies during the adaptation decision-taking stage in order to enable semantic description of real-world situations. The decision-taking process is triggered by low-level contextual information and driven by rules provided by the ontologies. It supports a variety of adaptations, which can be dynamically configured. The overall objective of the platform is to enable the efficient gathering and use of context information, in order to ultimately build content adaptation applications that maximize user satisfaction.

This chapter is structured as follows: Section 11.2 presents an overview of the state of the art in context-aware content adaptation, focusing on context-aware applications, systems, and existing standards in the area. Section 11.3 details the perspective adopted in this chapter for generating and aggregating contextual information from diverse sources, and profiling for generic multimedia communication and/or content access/distribution scenarios. Section 11.4 describes a selected application scenario for context-based adaptation of governed media content, identifying the scenario-based requirements in terms of contextual information. Section 11.5 presents the mechanisms that process the contextual information in order to take decisions regarding possible reactive measures, and also discusses the use of ontologies in context-aware content adaptation, particularly in the adaptation decision process.

These reactive measures request an appropriate adaptation of the content to suit the conditions and constraints imposed by the context of usage in the specified application scenario. This section presents DRM-based adaptation and authorization for an adaptation operation, while also providing a description of the use of the adaptation mechanism in the sequence of the decision process. It also describes user-centric content adaptation tools, which can be exploited as a form of AE to process the decision taken by the ADE with assistance from the DRM and authorization modules. Section 11.6 describes the interfaces required between the different modules of the content adaptation platform and the sequence of events that take place while performing DRM-based adaptation for a particular user or group of users in specific situations. Finally, Section 11.7 draws conclusions.

11.2 Overview of the State of the Art in Context-aware Content Adaptation

This section provides a summary of recent developments and standardization efforts in context-aware content adaptation systems. It presents established concepts in context-aware computing, context models, and existing systems, architectures, and frameworks for context-aware content adaptation. It addresses aspects related to the acquisition of context, forms of processing contextual information, privacy aspects and the protection of digital rights, and finally forms of reacting to context.

11.2.1 Recent Developments in Context-aware Systems

Context-awareness can be defined as the ability of systems, devices, or software programs to be aware of the characteristics and constraints of the environment and accordingly to perform a number of actions or operations automatically to adapt to sensed environmental changes. The use of contextual information is instrumental in the successful implementation of useful and meaningful content adaptation operations. These, in turn, are becoming extremely important for the implementation of systems and platforms that deliver multimedia services to the end user. Content adaptation has in fact already gained considerable importance in today's multimedia communications, and will certainly become an essential functionality of any service, application, or system in the near future. Continuing advances in technology will only emphasize the great heterogeneity that exists today in devices, systems, services, and applications. This will also increase consumers' desire for more choice, better quality, and more personalization options. However, to empower these systems to perform meaningful content adaptation operations that meet users' expectations and satisfy their usage environment constraints, it is imperative that they use contextual information and take decisions based on that information. This section provides an overview of recent developments concerning the use of contextual information. Research on context-awareness started more than a decade ago [1]; however, it was only recently that this concept gained popularity within the multimedia research community and began to be addressed at the standardization level.

11.2.1.1 Concepts and Models

Many different definitions can be found for context-aware applications, but most clearly relate them to adaptation operations. Accordingly, we can formulate the following definition, which agrees with most of the research work of recent years:

Context-aware applications are those having the ability to detect, interpret and react to aspects of a user's preferences and environment characteristics, device capabilities, or network conditions by dynamically changing or adapting their behavior based on those aspects that describe the context of the application and the user.

A more generic definition of context-aware applications can be used, which does not make any explicit reference to adaptation capabilities. Quoting [2]:

A system is context aware if it uses context to provide relevant information and/or services to the user.

Whereas this is sufficiently generic to include the previous definition or other possible ones, for the purpose of this chapter the first definition is used. This is because the objective is to outline ways of using contextual information to assist and enhance adaptation operations in order to ultimately improve the quality of the user experience, while also satisfying the restrictions imposed by DRM on the use of protected content. Accordingly, this chapter focuses on the use of context information to react to the characteristics and conditions of that context and to trigger adequate adaptation operations.

Early definitions of context were limited, or specific to a given application, as they were usually made by taking examples of the type of context information being used. Research work on context-aware services was initially focused on the mobile applications area, mainly on processing information regarding the location and the type of device being used to receive and present the content. The work then evolved, not only in the mobile communications area but in others too, and other types of information collected through various sensors started to be used. One example is the area of human-computer interfaces (HCI), where information regarding user gender and age, emotions, disabilities, and environmental conditions was used to adapt an application's interface to particular usages. However, the above definition is more generic, as it does not depend on the type of additional information being used to deliver the service, but rather on the effects that information may have on the service's behavior. Reference [3] provides a good generic definition of context that can be applied no matter what type of information is being used or which application is in use. This is probably the most quoted definition of context:

Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and application themselves.

This definition implicitly states that any application using additional information with the capacity to condition the way the user interacts with the content is context-aware. Contextual information can be any kind of information that characterizes or provides additional information regarding any feature or condition of the complete delivery and consumption environment.

This diversity of information can be grouped into four main context classes: Computing, User, Physical, and Time. The first group addresses the characteristics, capabilities, and conditions of the resources that can or must be used to provide the context-aware service. Although the word "computing" could be interpreted as referring exclusively to the terminal device, it in fact has a broader scope, as it refers to all resources involved in the delivery of the service: the terminal where the content is to be consumed, the network through which the content is to be delivered, and any additional service or engine that might provide added value to the multimedia service. Accordingly, we rename this group "Resources". Below, some examples of the types of contextual information that fall into each of these four categories are provided:

- Resources Context: Description of the terminal in terms of its hardware platform, including any property such as processor, screen size, and network interface; description of the terminal in terms of its software platform, such as operating system, installed software multimedia codecs, and any other software application; description of the network, such as maximum capacity, instantaneously available bandwidth, and losses; description of multimedia servers, for example in terms of the maximum number of simultaneous users or maximum throughput; description of transcoding engines in terms of their hardware and software platforms, such as network interface, input/output formats allowed, and bit-rate range supported.

- User Context: Description of the user's general characteristics, such as gender, nationality, and age; description of the user's preferences related to the consumption of content, such as the type of media and language preferred; description of the user's preferences in terms of their interests, related to the high-level semantics of the content, such as local news versus international news, or action movies versus comedy; description of the user's emotions, such as anxious versus relaxed, or happy versus sad; description of the user's status, such as online versus offline, or stationary versus walking; description of the history/log of actions performed by the user.

- Physical Context: Description of the natural environment surrounding the user, such as lighting and sound conditions, temperature, and location.

- Time Context: Indication of the time at which variations in the context occurred, and scheduling of future events.

Besides these different types of contextual information, which together can describe any entity involved in the delivery and consumption of the multimedia content, another dimension of the contextual information should also be considered. This dimension refers to the characteristics and nature of the contextual information itself, and not to the type of entity that it describes. In this dimension, the following aspects should be considered:

- The accuracy or level of confidence of the contextual information.

- The period of validity: Contextual information may be static, thus having an unlimited validity period, or it may be dynamic and valid for only a given period of time. For example, general characteristics of the user are obviously static and do not require any additional information to be used; on the other hand, the user's emotions will essentially be dynamic.
Likewise, conditions of the natural environment such as lighting or background noise will also be dynamic.

• The dependencies on other types of contextual information: Reasoning about one type of contextual information may depend on other types.

Most of these different types of contextual information and their characteristics can be seen as low-level or basic, in the sense that they can be directly generated by some software or hardware appliance. Based on this basic contextual information, applications may formulate higher-level concepts. Of the several examples given above, clearly only some can be seen as being high-level. This is the case, for example, with some descriptions within the User Context, such as those concerning the emotions of the user and their physical state. This information can be inferred by analyzing low-level contextual information obtained by imaging and sound sensors. Location information belonging to the Physical Context can be low-level contextual information when expressed, for example, by geographic coordinates, or high-level context when referring to the type of physical space the user is in (e.g. indoor versus outdoor, train station, football stadium, etc.).

Context-aware applications must thus initially acquire any contextual information, then process it, reason about it to formulate concepts, and take decisions on when and how to react. Furthermore, the acquisition of context, at least of some types of contextual information, must be a continuous (periodic or not) process, so that changes in the context may be perceived by the application. Of course, the process of reasoning about the basic contextual information will also be continuous, being conducted whenever changes in the basic contextual information are detected. The three phases are sometimes designated as sensing the low-level context, generating higher-level contextual information, and sensing the context changes.

Considerable research work has been conducted worldwide on context-awareness in various areas. Until recently, one of the main challenges faced in context-aware computing was the lack of standardized models to represent, process, and manage contextual information. Using proprietary formats and data models does not allow different systems to interoperate, and thus seriously limits the use of context in distributed heterogeneous environments. Although standardization bodies such as the World Wide Web Consortium (W3C) and the Moving Picture Experts Group (MPEG) have recently started working on specifications to represent and exchange context, the mechanisms provided for establishing relationships among contextual information and constraints are still very limited. They provide efficient frameworks for developing moderately simple context-aware systems, but they do not address more complex and demanding context-aware applications, as they fail to provide the required support to represent additional information about context (as referred to above) or to reason about it. Ongoing major standardization efforts in the area of context representation are described in Section 11.2.2.

In an attempt to overcome these limitations, researchers have been studying and proposing the use of ontologies to build context-aware systems. Ontologies provide the means to establish relationships between pieces of contextual information, to reason about them, and to formalize concepts. Thus, they enable the building of complex context models based on knowledge.
Moreover, the use of ontologies based on open formats and with good declarative semantics can be the vehicle to achieve interoperability, enabling different systems to reason about the available information, using the part relevant to their contexts. The use of ontologies to provide efficient and comprehensive models to build context-aware systems seems indeed to be one of the most important issues currently being addressed by researchers working in this area.
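To make the notions introduced above more concrete, the following Python sketch illustrates, in a deliberately simplified way, how basic contextual information from the four classes might be represented together with its validity period, and how an application could periodically acquire it, derive higher-level concepts, and react. All names, values, and rules are hypothetical and serve only to illustrate the acquire/reason/react cycle; a real system would rely on a proper context model or ontology rather than on hard-coded thresholds.

```python
import time
from dataclasses import dataclass

@dataclass
class ContextItem:
    context_class: str   # "Resources", "User", "Physical" or "Time"
    name: str
    value: object
    valid_for: float     # validity period in seconds (float("inf") for static items)

def acquire_low_level_context():
    # Hypothetical sensor/monitor readings; a real system would query
    # terminal, network, and environment probes here.
    return [
        ContextItem("Resources", "available_bandwidth_kbps", 384, valid_for=5.0),
        ContextItem("Physical", "illumination_lux", 15, valid_for=60.0),
        ContextItem("User", "preferred_language", "en", valid_for=float("inf")),
    ]

def reason(items):
    # Derive simple higher-level concepts from the basic context.
    ctx = {i.name: i.value for i in items}
    return {
        "network_state": "constrained" if ctx.get("available_bandwidth_kbps", 0) < 512 else "good",
        "environment": "dark" if ctx.get("illumination_lux", 1000) < 50 else "bright",
    }

def react(high_level):
    # Trigger (hypothetical) adaptation actions based on the inferred context.
    if high_level["network_state"] == "constrained":
        print("request a lower bit-rate variation of the content")
    if high_level["environment"] == "dark":
        print("increase display brightness")

for _ in range(3):                 # continuous (here: periodic) acquisition
    react(reason(acquire_low_level_context()))
    time.sleep(1)                  # re-sense after a short interval
```

The loop makes explicit that both sensing and reasoning are repeated operations, so that changes in the basic contextual information can be perceived and acted upon.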

[4] discusses the suitability of using ontologies for modeling context-aware service platforms. It investigates the use of Semantic Web technologies [5] to formalize the properties and structure of context information in order to guarantee common understanding. The Context Ontology Language (CoOL) is an example of a context-oriented ontology approach developed recently.

[6] presents an approach to context mediation by developing a model using the Web Ontology Language (OWL) [5]. The idea is to provide a common open model that can promote interoperability between systems using context information and operating in heterogeneous environments such as the Web.

[7] proposes an extensible context ontology to build context-aware services and allow their interoperability in an ambient intelligence environment. Its proposal is to define a core ontology based on OWL and built around four major entities, among which the user assumes the central role. This ontology defines concepts and their relationships for each of these four entities, trying to address the commonly-accepted basic needs of the majority of context-aware applications, while providing the flexibility to scale to new requirements.

[8] defines a context space and formulates a conceptual model for context-aware services. The idea is to build a methodology for designing context-aware services, providing guidelines on the selection of context information and adaptation alternatives. The model can also be used to assess the developed context-aware services.

[9] describes work on context-aware applications in the area of pervasive computing. The research group at the College of Computing at the Georgia Institute of Technology was involved in diverse projects concerning context-aware applications in different domains, including personal location-aware services and context-aware services for elderly people in the home environment. The motivation of the research in this group is the development of intelligent services for the elderly and/or physically impaired. The initial focus of its context-aware research was on the use of location information to build context-aware services. The work evolved to design a more generic platform to assist the implementation of context-aware applications. Its outcomes will be further cited in the next subsection, when describing the Context Toolkit.

[10] describes a CoOL derived from its proposed approach to model context. It develops an aspect-scale-context (ASC) model where each aspect can have several scales to express some context information. The ASC model can be very interesting for implementing semantic service discovery using context.

11.2.1.2 Architectures and Frameworks

In parallel to the recent activity on the use of ontologies to develop meaningful and efficient models to support the mediation of context, research groups have also engaged in the development of generic architectures and frameworks, in many cases also using the ontology-based model approach, to support the implementation of context-aware applications.

[11] has investigated flexible architectures using Web Services technologies to allow a dynamic extension of the type of context information without the need to introduce modifications to the systems providing the context-aware applications.
In this work, a platform has been built where all the pre- and post-processing of the context information is performed by context plug-ins and context services, which communicate with the core system via Web Services. [11] defines its own eXtensible Markup Language (XML) schema to specify the syntax and semantics of the context information passed on to the platform.

It does not address the problem of the quality of the service, nor the use of feedback to assess the results.

[12] has also investigated the impact of using context information on the core system for the delivery of services. This work proposes a flexible architecture that can easily be extended to use different types of context information. It is developed within the framework of the Simple Environment for Context-Aware Systems (SECAS) project. This project aims at defining ways of capturing context and passing it to the application, which uses it to perform adaptation operations that suit the context of usage. One of the main concerns is to evaluate the impact of using contextual information at the application level. The SECAS architecture is illustrated in Figure 11.2.

Figure 11.2 The SECAS functional architecture [12]

SECAS uses context brokers in charge of collecting contextual information, a context provider module that encapsulates the contextual information in an XML document, and a context interpreter that maps low-level contextual parameters' values to a high-level representation. While this architectural approach is interesting from the viewpoint of isolating the higher layers of the application from the modules that actually retrieve and process context, it requires each application or service to register itself with the context mediation platform, indicating what type of contextual information will be of value to it.

This requires that applications are aware of the type of contextual information that can be generated, and assumes that there are rather static relationships between context characteristics/conditions and the service being delivered. It somewhat limits the flexibility of the application to react differently according to various user profiles, or even according to the type of content being used by the service or the available adaptation mechanisms. Some of these limitations could be overcome by allowing applications to frequently update their registration parameters. However, this would not benefit the performance of the system, and it still does not account for personalized adaptation.

[13] and [14] report the work conducted by the Distributed Systems and Computer Networks (DistriNet) group of the Department of Computer Science at the Katholieke Universiteit Leuven, in consortium with other institutions, within the projects Context-Driven Adaptation of Mobile Services (CoDAMoS) and Creation of Smart Local City Services (CROSLOCIS). In the first project, the group developed a middleware framework, designated context gatherer, interpreter, and transformer using ontologies (CoGITO) [15], to allow the creation of context-aware mobile services in ambient intelligence environments, where it was assumed that a user was surrounded by other multimedia-enabled devices. Figure 11.3 shows a simplified view of the CoGITO architecture.

Figure 11.3 Simplified representation of the CoGITO middleware

This work focuses on the possibility of transferring services to nearby devices, or of replacing a component to save resources. The middleware includes a specific context-aware layer that detects changes in the context of the user and accordingly initiates service relocation and/or replacement of components. In CROSLOCIS, the main focus is on the development of a service architecture that enables the creation of context-aware services, providing the mechanisms to collect, safely distribute, and use contextual information while respecting privacy issues.

The Context Broker Architecture (CoBrA) project developed by the ebiquity group at the University of Maryland [16] is an agent-based architecture that supports context-aware systems in smart spaces. It uses OWL to define a specific ontology for context representation and modeling, enabling the description of persons, intentions, and places. Figure 11.4 illustrates the use of CoBrA for the development of context-aware applications, showing its layered architecture.

Figure 11.4 The use of CoBrA for context-aware applications [16]

An intelligent agent, designated the context broker, maintains a model of the context that can be shared by a community of agents, services, and devices present in the space. The system provides mechanisms to ensure privacy protection for users. The project particularly focuses on the intentions of the sensed users, and not on describing system or device characteristics and capabilities.

The Context Toolkit [17] is a framework developed to assist the implementation of context-aware applications. It offers a distributed architecture that interconnects a series of software appliances, designated context widgets. These software components work as plug-ins to contextual information sources, enabling applications to access different types of contextual information. They can be seen as services providing a uniform and seamless interface to various sensing devices, isolating the applications from the specificities of the context sensors. The services offered by the Context Toolkit and implemented through the widgets include access to network-related contextual information through a network application programming interface (API), and the storage of contextual information. Figure 11.5 presents the conceptual architecture of the Context Toolkit in terms of its functional components.

The Service-Oriented Context-Aware Middleware (SOCAM) framework [19] defines a service-oriented architecture for building high-level context-aware applications using ontology-based reasoning based on W3C specifications.

Figure 11.5 The functional blocks of the Context Toolkit [18]

The SOCAM platform is based on four major modules:

1. a context provider;
2. a context interpreter;
3. a context database;
4. a service discovery engine.

Figure 11.6 presents the different components that form the SOCAM platform.

Figure 11.6 Components of the SOCAM architecture [20]

In SOCAM, the role of the context provider (CxP) is to abstract the applications that process the contextual information from the devices that actually sense the low-level context. Applications can access a CxP service by querying the service discovery engine. The context interpreter conducts logic reasoning to obtain a higher-level context, using a context ontology stored in the context database.

Several other research groups worldwide have performed investigations on these topics and are still making efforts to build intelligent systems that produce contextual information and react to context changes, while focusing on designing middleware layers that allow interoperability with multiple external sensing devices. Such is the case with the project Oxygen at the Massachusetts Institute of Technology (MIT) Media Labs [21], the project CoolTown at Hewlett Packard (HP) Research [22], and the International Business Machines Corporation (IBM) project BlueSpace [23].

Most early context-aware projects were developed within the mobile services application domain. They mostly tried to explore context to improve usability aspects by sensing how the available devices were being used. Generally, they reacted directly to the sensed low-level context. They usually lacked flexibility, as they either did not make use of ontologies or made only a rather static and limited use of them. Therefore, they did not explore the inter-relations among different types of low-level contextual information, and thus did not sufficiently address the aspects of interoperability, scalability, and security/privacy. In fact, earlier research was typically application-centric, overlooking aspects of gathering different types of contextual information from different spaces, and interoperability in heterogeneous environments. Likewise, aspects concerning the security of content and context metadata, and ensuring the privacy of the user, have only recently started to be addressed. The new generation of projects is now focusing on these aspects, relying on the use of standard ontology specifications, context representation frameworks, and middleware layers. Such is the case with the CoBrA and SOCAM projects.

The current state of research in context-awareness can be summarized as follows:

• The first generation of context-aware systems was mainly concerned with improving usability aspects by directly using individual low-level or explicit contextual information, among which location was assumed to play a central role.
• The second-generation systems focused on abstracting the application from the specificities of the process of sensing the low-level contextual information. They also explored the inter-relations between different types of low-level contextual information, incorporating functionalities to build higher-level or implicit context information. This generation focused on aspects of interoperability at the system level, and on the gathering and use of explicit contextual information from multiple sources.
• The third-generation systems are now focusing on inferring more complex implicit contexts using ontologies that also take into consideration security and privacy issues. Interoperability continues to be a central concern. Accordingly, current research is also looking at context representation through the development of formal specification frameworks. Common ontologies are also a vehicle for enabling interoperability from the semantic point of view.
Different applications can use different sets of rules to reason about the same set of low-level contexts using the same ontology.
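As a toy illustration of this last point, the following sketch shows two applications applying different rule sets to the same set of low-level context values and arriving at different high-level interpretations. The rules and values are invented and not tied to any particular ontology language; real systems would express such rules over a shared ontology rather than in ad hoc code.

```python
# Shared low-level context, e.g. produced by a common context provider.
low_level_context = {"location_type": "train_station", "noise_db": 78, "hour": 18}

# Rule set of a (hypothetical) video-streaming application:
# decide on audio/subtitle handling.
def streaming_rules(ctx):
    if ctx["noise_db"] > 70:
        return {"subtitles": "on", "audio_boost": True}
    return {"subtitles": "off", "audio_boost": False}

# Rule set of a (hypothetical) messaging application:
# decide on the notification modality.
def messaging_rules(ctx):
    if ctx["location_type"] == "train_station" or ctx["noise_db"] > 70:
        return {"notification": "vibrate"}
    return {"notification": "ring"}

print(streaming_rules(low_level_context))   # {'subtitles': 'on', 'audio_boost': True}
print(messaging_rules(low_level_context))   # {'notification': 'vibrate'}
```

The same sensed situation thus leads each application to its own, service-specific reaction, which is precisely the flexibility that a shared ontology with application-specific rule sets is intended to support.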

11.2.2 Standardization Efforts on Contextual Information for Content Adaptation

This subsection describes the recent developments made in relevant international standardization bodies, namely the W3C with its Composite Capability/Preference Profiles (CC/PP) specification, and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) MPEG with the MPEG-7 and MPEG-21 Digital Item Adaptation (DIA) standards. In addition to providing support for standardized basic context descriptors essential for the implementation of content adaptation operations, MPEG-21 DIA also provides the means to describe how those descriptors contribute to the authorization of adaptation operations (in Amendment 1). MPEG-21 is the standard adopted in this chapter for describing the content adaptation platform. As such, this subsection provides a succinct description of the most relevant DIA tools, indicating their functionality and scope of application. A description of the W3C standards is also included, due to the wide applicability of these standards and their considerable importance within the Semantic Web paradigm.

We start by making reference to MPEG-7, notably its Part 5, Multimedia Description Schemes (MDS), since it is probably the most comprehensive multimedia description standard. It has a wide scope of applicability, in particular in content adaptation applications. MPEG-7 MDS provides support for the description of user preferences and usage history, as well as for adaptation tools [24]. Additionally, the MPEG-21 standard makes use of these kinds of MPEG-7 description tools within the same scope.

It should be emphasized that this subsection does not aim to provide a full description of the standards referred to. Instead, it aims at highlighting some specific features of each standard that can play an important role in the development of context-aware content adaptation systems. For a full description of each standard, the reader should refer to the official documentation and other publications given in the references section of this chapter.

11.2.2.1 MPEG-7

MPEG-7 [25,26] is a very comprehensive multimedia description standard that provides the mechanisms to describe information about the content (also referred to as contextual information of the content within traditional content description environments, such as libraries) and the information contained within the content. The first class of descriptions includes, for example, the author of the content and its creation date. The second class in reality comprises three different types of description capability: 1) low-level features of the content, such as color histogram, motion intensity, and shot boundary detection; 2) structural information related to the encoding parameters, such as coding scheme and aspect ratio format (also referred to as media characteristics); 3) high-level semantics describing the nature of the content (for example, "quiet open-air scene with bright blue sky and green grass fields").

Within the scope of content adaptation applications, MPEG-7 plays an important role in describing information about the content, notably related to the format of the content, i.e. the technical parameters with which the content is encoded, and to its media characteristics.
This kind of description can also be used to describe the capabilities of the adaptation engines (AEs), as in fact what is necessary to know is the range of encoding parameters within which each specific AE can work. Part 5 of the MPEG-7 standard (MPEG-7 MDS) [26] provides the framework for the description of these kinds of features.

For example, the possible variations that can be obtained from a given content can be described using the MPEG-7 variation description tool. Moreover, one of the attributes of this tool, namely the variation relationship attribute, is able to specify the type of operation that needs to be performed to obtain a specific variation. The types of operation specified include transcoding, which in turn may involve modification of the spatio-temporal resolution, bit rate, or color depth; transmoding or summarization; and so on [27-29]. When the type of adaptation is a transcoding operation, applications may use the transcoding hints tool, which provides guidelines on how to implement the transcoding operation, so that quality may be preserved or low-delay and low-complexity requirements may be met.

11.2.2.2 MPEG-21

MPEG-21 [30,31] is an ISO/IEC standard, currently in its final phase of development in MPEG. It focuses on the development of an extensive set of specifications, descriptions, and tools to facilitate the transaction of multimedia content in heterogeneous network environments. The standard is currently divided into 18 parts. The basic concepts of MPEG-21 are the User and the Digital Item (DI). The User is any type of actor that manipulates content, be it a person or a system (e.g. subscriber, producer, provider, network). A DI is the smallest unit of content for transaction and consumption, and at the conceptual level it can be seen as a package of multimedia resources related to a certain subject (e.g. an MPEG-2 video stream, an MP3 audio file, a set of Joint Photographic Experts Group (JPEG) pictures, a text file, etc.), together with associated descriptions (e.g. rights descriptions, other context descriptors, MPEG-7 audio and video descriptors, etc.), including the respective digital item declaration (DID).

Part 2 of the standard, namely the Digital Item Declaration Language (DIDL), specifies a standardized method based on XML Schema for declaring the structure of the DI. Inside this XML document (i.e. the DID), standard MPEG-21 mechanisms are used either to convey resources and descriptions embedded directly in the DID, or to provide information on the location of the resources to be fetched. The DID provides an indication of the structure of the DI as well as of the relationships among its several constituents. A DI is thus used to create the concept of a package or single unit, around which a variety of multimedia information is bound by some common attribute. Figure 11.7 illustrates the concept of the MPEG-21 DI.

Part 7 of the standard, namely DIA [32,33], provides a set of tools allowing the description of the characteristics and capabilities of networks, terminals, and environments, as well as of the preferences of users. In addition, this set of tools also provides the definition of the operations that can be performed upon the content and the results that can be expected. Figure 11.8 provides an illustration of the available tools in MPEG-21 DIA and their use for adaptation purposes. Among others, specific adaptation tools and descriptions of the MPEG-21 DIA standard, such as the usage environment descriptor (UED), adaptation quality of service (AQoS), and universal constraints descriptor (UCD), define a set of descriptors and methodologies to describe the context of usage, the operations that can be performed upon the content, and the result that can be expected.
Accordingly, these tools can be used to implement context-aware content adaptation systems. They can be used to analyze the current status of the consumption environment and decide upon the need to perform adaptation, including the type of adaptation to perform. The outcome of this process can be used to notify encoders or servers, receivers, or intermediate AEs to adapt the stream to particular usage constraints and/or user preferences.

Figure 11.7 Example of an MPEG-21 digital item (DI)

The former can include, for example, the available network bandwidth and terminal display capabilities. The latter can indicate the removal of undesired objects from an MPEG-4 video scene or the filtering out of some of the media components of a DI. In addition, MPEG-21 DIA provides the means to enable finer-grained control over the operations that can be performed when interacting with the content (e.g. playing, modifying, or adapting DIs), in the form of declarative restrictions and conversion descriptions.

Figure 11.8 Tools and concepts of MPEG-21 DIA. Reproduced by permission of © 2007 IEEE

Systems adopting this approach are able to produce different results for the same query, according to the diverse usage environment constraints and/or user preferences/profiles. The concept of adaptation within MPEG-21 is illustrated in Figure 11.9.

Figure 11.9 Concept of DIA

Adaptation of DIs involves both resource and descriptor adaptation. Various functions, such as temporal and spatial scaling, cropping, improving error resilience, prioritization of parts of the content, and format conversion, can be assigned to the AE. The implementation of an AE has not been normatively defined in the MPEG-21 standard, and therefore many technologies can be utilized.

As illustrated in Figure 11.8, Part 7 of the MPEG-21 standard, DIA, specifies the following set of eight tools to assist adaptation operations:

• UED, usage environment description tools: To provide descriptions related to user characteristics, terminal capabilities, network characteristics, and natural environment characteristics.
• AQoS, terminal and network quality of service tools: To provide descriptions of QoS constraints, the adaptation operations capable of meeting those constraints, and the effects of those adaptations upon the DI in terms of its quality.
• UCD, universal constraints description tools: To allow descriptions of constraints on adaptation operations, and control over the types of operation that are executed upon the content when interacting with it.

• BSDL (Bitstream Syntax Description Language) and gBS (generic Bitstream Syntax) tools: To provide the means to describe the high-level structure of bitstreams using XML and XML schemas. Their goal is to allow the manipulation of bitstreams through the use of editing-style operations (e.g. data truncation) with a format-agnostic processor. Thus, they provide the means to manipulate the content at the bitstream syntax level.
• Metadata adaptability tools: To provide information that can be used to reduce the complexity involved in adapting the metadata contained in a DI, allowing the filtering and scaling of metadata as well as the integration of XML instances.
• Session mobility tools: To provide the means to transfer state information regarding the consumption of a DI from one device to another.
• DIA configuration tools: To carry information required for the configuration of DI AEs.

Among the above tools, the first three provide the core support for building context-aware content adaptation systems, by describing the characteristics of the usage environment as well as the operations that can be performed on the multimedia content, and the expected results. Their use enables a decision mechanism to select the adaptation operation that satisfies the constraints of the consumption environment while maximizing a given parameter or utility. This utility is most often realized as the quality of the service or the quality of the user experience. These tools are briefly described below. Furthermore, a description of the BSDL and gBS tools is also included, as they provide support for the implementation of adaptation operations that are independent of the format of the resources to be adapted. As such, in spite of being particularly suited for use with scalable formats, they can also be used in a great variety of situations.

UED Tools

UED tools allow the description of the characteristics of the environment in which the content is consumed, notably the capabilities of the terminal, the characteristics of the network, and information regarding the user and their surrounding natural environment.

1. Terminal capabilities: Refers to the capabilities of the terminal where the content is to be consumed, in terms of the types of encoded format that are supported (i.e. "codec capabilities"), the display and audio output device characteristics and the available input devices (i.e. "input/output (I/O) capabilities"), and finally the physical characteristics of the terminal, such as processing power, amount of available storage and memory, and data I/O capabilities (i.e. "device properties").
2. Network characteristics: Includes a description of the static attributes of the network, such as maximum channel capacity (i.e. "network capabilities"), and a description of parameters of the network that may vary dynamically, such as instantaneous available bandwidth, error rate, and delay (i.e. "network conditions"). The former tools are used to assist the selection of the optimum operation point during the setup of the connection, whereas the latter are used to monitor the state of the service and update the initial setup accordingly.
3. User characteristics: There are four different subsets of descriptions that fall into this category, as follows:
(i) A generic group of descriptions using MPEG-7 description schemes (DS), more precisely the "user agent" DS.
This group provides information on the user themselves, as well as indications of general user preferences and usage history.

(ii) A second subset of descriptions related to the preferences of the user regarding the way the audiovisual (AV) information is presented, such as audio power and equalizer settings, or video color temperature, brightness, contrast, and so on.
(iii) A third subset of descriptions providing indications regarding auditory and visual impairments of the user, such as hearing frequency thresholds or deficiencies in the perception of colors.
(iv) A final subset of descriptions related to the instantaneous mobility and destination of the user.
4. Natural environment characteristics: Provides information regarding the characteristics of the natural environment surrounding the user who wishes to consume the content. Using MPEG-7 DSs, this provides information regarding the location and the time at which the content is consumed, as well as information describing AV attributes of the usage environment, such as the noise levels experienced in the surrounding environment or the light/illumination conditions.

Figures 11.10 and 11.11 show example excerpts of UED files, describing the characteristics of a terminal and the conditions of a network, respectively.

Figure 11.10 Excerpt of a terminal UED

Figure 11.11 Excerpt of a network UED
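The XML excerpts originally shown in Figures 11.10 and 11.11 are not reproduced here. As a rough, non-normative illustration of the kind of information such descriptions carry, the following Python sketch assembles a simplified terminal/network description; the element and attribute names are only loosely inspired by the DIA usage environment descriptors, and the normative namespaces, types, and structure of a real UED are omitted.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative usage environment description (NOT the normative
# MPEG-21 DIA schema; element names are hypothetical stand-ins).
ued = ET.Element("UsageEnvironment")

terminal = ET.SubElement(ued, "Terminal")
display = ET.SubElement(terminal, "Display")
ET.SubElement(display, "Resolution", horizontal="320", vertical="240")
ET.SubElement(terminal, "Codec", type="decoder", format="MPEG-4 Visual")

network = ET.SubElement(ued, "Network")
ET.SubElement(network, "Capability", maxCapacity="384000")        # bit/s, static
ET.SubElement(network, "Condition", availableBandwidth="256000",  # bit/s, dynamic
              errorRate="1e-4", delay="120")

print(ET.tostring(ued, encoding="unicode"))
```

In an actual UED, these properties would be conveyed with the normative DIA descriptors introduced above ("device properties", "network capabilities", "network conditions", and so on).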

AQoS and UCD Tools

The main purpose of adapting the content is to provide the user with the best possible experience in terms of perceived quality. The perceived quality is greatly conditioned by the instantaneous availability of network and terminal resources, although it is also highly subjective, depending on the user and on a number of factors pertaining to their surrounding environment. To cope with varying network and terminal conditions, it is possible in most cases to envisage a number of different reactive measures or operations to perform upon the content. The MPEG-21 DIA standard provides a mechanism that enables decisions on which of those reactive measures to endorse in a given situation, while maximizing the user experience in terms of perceived quality. Through the inclusion of descriptors that indicate the expected result in terms of quality for a set of different operations performed upon the content under certain context usage constraints, adaptation decision engines (ADEs) can obtain the best possible operation point and the corresponding transformation to perform. This is accomplished through the AQoS and UCD description tools. The former provides the indication of different sets of encoding parameters and the resulting quality of the encoded bitstream for each of those sets. The latter enables the transformation of that information, together with the information about the current conditions of the usage context conveyed as UED, into the form of restrictions that can further be used by the ADEs.
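The decision logic enabled by AQoS and UCD can be pictured with the following simplified sketch. The operation points, constraints, and quality values are invented, and the real descriptors are XML-based rather than Python literals; the point is only to show how a decision engine can pick, from the adaptation possibilities described by AQoS, the one that maximizes the expected quality while respecting the constraints derived from the UED/UCD.

```python
# Hypothetical AQoS-like data: candidate operation points for one video resource,
# each with its resulting bit rate (kbit/s) and an expected quality score (0-100).
operation_points = [
    {"frame_rate": 30, "width": 704, "height": 576, "bitrate": 1500, "quality": 92},
    {"frame_rate": 25, "width": 352, "height": 288, "bitrate": 700,  "quality": 80},
    {"frame_rate": 15, "width": 352, "height": 288, "bitrate": 350,  "quality": 68},
    {"frame_rate": 10, "width": 176, "height": 144, "bitrate": 128,  "quality": 45},
]

# Constraints derived from the usage environment (UED transformed into UCD-like limits).
constraints = {"max_bitrate": 512, "max_width": 352, "max_height": 288}

def satisfies(point, limits):
    return (point["bitrate"] <= limits["max_bitrate"]
            and point["width"] <= limits["max_width"]
            and point["height"] <= limits["max_height"])

feasible = [p for p in operation_points if satisfies(p, constraints)]
best = max(feasible, key=lambda p: p["quality"]) if feasible else None
print(best)   # -> the 350 kbit/s, 352x288 variant in this toy example
```

The selected operation point then determines the transformation that the AE is asked to perform upon the resource.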

Bitstream Syntax Description (BSD) Tools

Within large-scale networked video communications, when some kind of adaptation of the content needs to be performed due to a quality drop, it may prove quite difficult to gain access to transcoders or media processors that are able to actually understand the specifications of a particular encoding scheme in order to process and adapt the original encoded bitstream. Quite often, it would be desirable to alter the characteristics of the video bitstream along the transmission chain, but only at specific points, so as not to interfere with users sharing the same content who are happy with the level of service quality that they are experiencing. This requires the availability of content adaptation gateways placed along the distribution chain, which have the ability to process specific encoding formats.

Given the large number of existing encoding formats and their variants, it may prove cumbersome to deploy such media gateways. In addition, the solution is not easily expandable or scalable towards new encoding formats. A much better approach is to enable the processing of encoded bitstreams without the need to actually understand their specific encoding syntaxes, i.e. in a format-agnostic way.

The BSD tools of MPEG-21 DIA provide the means to convey a description of the encoded bitstream using XML, and to perform the transformation of this description in the XML space. This can be done, for example, through the use of eXtensible Stylesheet Language Transformations (XSLT). The adapted bitstream is then generated from the original bitstream using the modified description. Figure 11.12 illustrates this process.

Figure 11.12 Editing-like content adaptation using BSDL and gBS

In order to be able to generate universally-understandable descriptions of specific encoding schemes, the standard has developed an XML-based language, the BSDL. To generate generic descriptions of binary sources independently of the encoding format, the standard has specified the gBS schema. The description of the syntax of the specific encoding of a bitstream is generated at the encoder side as a BS schema. The schema either travels together with the bitstream or is requested when needed. It essentially provides a high-level description of the structure or organization of the bitstream. Therefore, it only allows simple adaptation operations, such as filtering, truncation, and removal of data. Nonetheless, more advanced forms are possible when using the gBS schema. This tool enables the use of semantic labels and the establishment of associations between those labels and the syntactical elements being described. In addition, it provides the means to describe the bitstream in a hierarchical way.

BSD tools can be used with any kind of encoding format, but they are extremely interesting when used with scalable encoding formats, as they may enable very simple operation. For example, when the constraints of the environment impose a reduction of the bit rate, having a description of the syntax of the scalable bitstream greatly simplifies the role of an AE in identifying the layer(s) to be dropped.
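As a simplified illustration of this editing-style adaptation, the following sketch manipulates a toy, gBS-like XML description of a scalable bitstream by dropping the entries for the enhancement layers. The element names and byte ranges are invented; in a real system the transformation would typically be expressed in XSLT, and the adapted bitstream would then be regenerated from the original bitstream using the modified description.

```python
import xml.etree.ElementTree as ET

# Toy, gBS-like description of a scalable bitstream (byte ranges are invented).
bsd_xml = """
<BitstreamDescription>
  <Unit marker="header"      start="0"     length="64"/>
  <Unit marker="base-layer"  start="64"    length="20000"/>
  <Unit marker="enh-layer-1" start="20064" length="15000"/>
  <Unit marker="enh-layer-2" start="35064" length="12000"/>
</BitstreamDescription>
"""

root = ET.fromstring(bsd_xml)

# Constraint from the decision engine: keep only the base layer.
for unit in list(root):
    if unit.get("marker", "").startswith("enh-layer"):
        root.remove(unit)

# The transformed description now drives the (format-agnostic) generation of the
# adapted bitstream: only the byte ranges still listed would be copied from the
# original bitstream.
print(ET.tostring(root, encoding="unicode"))
```

Note that the processor never interprets the video syntax itself; it only edits the description and copies the byte ranges it still references, which is exactly the format-agnostic behavior the BSD tools are designed to enable.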

Worldwide research groups are investigating the use of the tools specified in the MPEG-21 standard to develop efficient AEs and to improve the proposed approaches [24,34-43].

DANAE (Dynamic and distributed Adaptation of scalable multimedia coNtent in a context-Aware Environment) [39,40], an Information Society Technologies (IST) European Commission (EC) co-funded project, addressed the dynamic and distributed adaptation of scalable multimedia content in a context-aware environment. The objective of DANAE was to specify and develop an advanced MPEG-21 infrastructure for such an environment, so as to permit flexible and dynamic media content adaptation, delivery, and consumption. The goal was to enable end-to-end quality of multimedia services at a minimal cost for the end user, with the integration of features and tools available from MPEG-4 and MPEG-7, and DRM support in the adaptation chain. The system focused on DIA and digital item processing (DIP, specified in Part 10 of the MPEG-21 standard), and contributed to the MPEG-21 standardization efforts. DANAE supports the application of DIP for DIA to adapt DIs in a globally-optimized and semantically-aware way, in order to allow a dynamic change in the usage context, as well as to enable the adaptation of live content, such that the content may be delivered while still being created.

The ENTHRONE project [41,42], another IST project co-funded by the EC under Framework Programme (FP) 6, has developed a content mediation and monitoring platform based on the MPEG-21 specifications and distributed technologies. The ENTHRONE platform features a core system designated the "ENTHRONE Integrated Management Supervisor (EIMS)", which maintains distributed databases and repositories of MPEG-21 DIs, and enables customized access to them by multiple users through heterogeneous environments and networks. The EIMS incorporates an ADE that uses MPEG-21 DIA tools, notably UED, UCD, and AQoS. The ENTHRONE ADE transforms the UEDs that it receives into UCDs. Different UEDs are used: one to express the capabilities of terminals and the user preferences generated at the consumption peer; another to express the capabilities of available AEs; and a third to express the condition of the network. Thus, the ADE transforms the characteristics of the current usage environment into a constraint representation. It then uses AQoS to describe the operations that can be performed upon the content (i.e. the possible variations of the content through their characteristics, such as frame rate, aspect ratio, and bit rate) and the result in terms of quality that can be obtained with each variation. Although the project provides a complete framework for the generation and management of, and customized access to, multimedia content using the MPEG-21 specifications, the work of ENTHRONE in content adaptation to date has been limited to altering the encoding parameters of each media resource.

The Department of Information Technology (ITEC) at Klagenfurt University is actively contributing to the development of the MPEG-21 specifications, in particular those that aim to assist the context-aware adaptation of content. This research group has implemented MPEG-21 applications, such as multimedia adaptation tools (e.g. gBSDtoBin, BSDLink Webeditor) and the ViTooKi (Video ToolKit) software, which provides support for describing terminal capabilities and user preferences using MPEG-21 [43].
The Catalan Integrated Project aims to develop an advanced Internet environment based on a universal multimedia access (UMA) [33,44] concept using MPEG-21 standard tools. The project intends to employ the MPEG-21 and MPEG-7 standards extensively for the adaptation decision-taking procedures, which will select the best adaptation of a specific content, taking into account network characteristics, terminal capabilities, and the state of the AV content transcoding servers.

11.3 Other Standardization Efforts by the IETF and W3C

In addition to the tools and descriptions specified in MPEG-21, other standardization bodies have also conducted work towards the support of context-aware applications and the assistance of adaptation operations. Such standardization organizations include the W3C and the Internet Engineering Task Force (IETF), both of which have delivered related specifications over recent years.

Content adaptation at the network edges is already being performed by distributed content mediators in order to balance the network load, or in an attempt to satisfy different user groups' profiles and to redirect requests based on geography and/or other profile characteristics. This is being achieved through the use of content distribution networks (CDNs), peer-to-peer (P2P) technologies, and personalization techniques [34].

In the mobile networking world, content filtering services are being added to most caches. Wireless network proxies transform both protocols and Web content, converting HyperText Markup Language (HTML) into Wireless Markup Language (WML) for small-screen displays, and HyperText Transfer Protocol (HTTP) into Wireless Datagram Protocol (WDP) for wireless delivery purposes. Adaptation of content, aiming at increased penetration, is already happening, in particular in the mobile world. However, it still offers somewhat limited functionality, as it essentially accounts for bandwidth restrictions and limited device capabilities. The focus is still on the adaptation of the layout of the service rather than on the content itself. The same is happening in the TV broadcasting world, aimed at the adaptation of interactive applications and Web-based content to multiple non-interoperable platforms.

The IETF has produced the Internet Content Adaptation Protocol (iCAP) specification, which is basically an HTTP-based remote procedure call protocol that allows clients to send HTTP messages instructing iCAP servers, or generic Web servers, to perform some kind of operation on the content. iCAP servers are dedicated to performing specific tasks, and thus are expected to perform such tasks better or more efficiently than generic Web servers. The targeted adaptation operations were defined from a business perspective, and are seen as value-added services to the client. Examples of the envisaged adaptation operations are virus scanning, advertisement insertion, and also some forms of content translation or filtering.

The Open Mobile Alliance has standardized the User Agent Profile (UAProf) specification [45], which provides an open vocabulary for Wireless Application Protocol (WAP) clients to communicate their capabilities to servers. It defines the data structure to describe client devices and to transport that information to the servers. This information may include hardware characteristics, such as screen size or type of keyboard; software characteristics, such as browser manufacturer; and also user preferences (e.g. sound enabled, color choices, etc.). The idea is to empower the service provider with information that may assist it in customizing the service, to a certain extent, to the end user's needs. The UAProf vocabulary has six components:

• HardwarePlatform: Characteristics of the hardware of the terminal, including the type of device, model, display size, and memory.
• SoftwarePlatform: Characteristics of the operating environment of the device.
• NetworkCharacteristics: Information about the network infrastructure, such as bearer information.
• BrowserUA: Identification of the browser application available on the device.

• WapCharacteristics: The WAP capabilities of the terminal.
• PushCharacteristics: Push specifications of the device, such as the maximum size of a push message.

The Resource Description Framework (RDF) [46] and OWL [5] are W3C specifications that constitute general description frameworks suitable for expressing context. RDF is a generic specification for representing metadata, in particular metadata related to resources on the Web,¹ and is strongly associated with the Semantic Web or Web 2.0 concepts. RDF can be represented in one of three different formats, as follows:

1. XML syntax, which is the format commonly used to exchange RDF data between machines.
2. Triple representation, consisting of sets of (Subject, Predicate, Object) values.
3. Graph representation.

Figure 11.13 provides examples of the same metadata represented in RDF in each of its three possible formats. In this figure, the metadata is a simple sentence written in English, which is then represented using RDF in: i) its triple format; ii) the XML format; and finally iii) the graph format.

Figure 11.13 The three different formats for RDF representation

The W3C has also delivered the CC/PP specification [47], which defines an RDF-based framework for describing device capabilities and user preferences. It provides the means to specify client capabilities (i.e. the "user agent" information) and user preferences using uniform resource identifiers (URIs) and RDF text sent in HTTP requests. The user agent specifies the preferences of the user, such as versions of content or languages, in the header of the client HTTP request, and is empowered with negotiation capabilities.

¹ "The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author and modification date of a Web page", from the RDF Primer [45].
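Since the artwork of Figure 11.13 is not reproduced here, the following sketch, using the third-party rdflib package and a hypothetical vocabulary namespace, shows a simple device statement first as (Subject, Predicate, Object) triples and then serialized to the RDF/XML syntax; the graph form is simply the same set of triples drawn as nodes and arcs. A CC/PP profile is essentially a set of such triples organized into components.

```python
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/vocab#")           # hypothetical vocabulary
device = URIRef("http://example.org/devices/pda-01")  # the Subject (a Web resource)

g = Graph()
# Each g.add() call records one (Subject, Predicate, Object) triple.
g.add((device, EX.screenWidth, Literal(320)))
g.add((device, EX.screenHeight, Literal(240)))
g.add((device, EX.preferredLanguage, Literal("en")))

for triple in g:
    print(triple)                      # triple representation

print(g.serialize(format="xml"))       # RDF/XML syntax
```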

Although the CC/PP specification was originally developed to be used mainly on wireless devices, it can be applied to any Web-enabled terminal. It uses RDF in its XML-serialized format to exchange profiles between devices, with information on the user agent's capabilities and the user's preferences. A CC/PP profile can be seen as a two-level tree containing components and attributes of those components. Components can be the hardware or software platforms of the terminal, or any specific application running on top of those platforms. Different components may appear in one CC/PP profile. A reference to the schema that indicates the types of component that appear in that profile, as well as the rules concerning the attributes of those components, is included in the profile, so that the recipient may correctly interpret and use the information contained in it.

Since the CC/PP specification uses RDF, its profiles are composed of sets of (Subject, Predicate, Object). The components (the Subject) have named attributes (the Predicate) and values for those attributes (the Object). The most common components in CC/PP refer to the hardware and software platforms of the terminal device, and thus are related to the capabilities of the terminal. However, there are other types of component that can be described in a CC/PP profile, such as the location. CC/PP uses a vocabulary to define the format and language for specifying the names and values of components as well as their attributes. However, different CC/PP profiles or applications may use different vocabularies. This is a particular feature of CC/PP that allows different applications to define and use vocabularies that suit their particular needs. One of the vocabularies traditionally used in CC/PP profiles is UAProf, in the mobile world. For other application areas, and in general terms, the W3C recommends the use of RDF to define vocabularies. In Figure 11.14, an example of a CC/PP profile is illustrated for a hypothetical personal digital assistant (PDA).

The W3C is quite active in the field of context-aware applications, and as such is contributing to the enabling of content adaptation. Many of the current specifications and approaches being studied or developed towards the implementation of content adaptation operations rely on the use of W3C technologies. Butler provides in [48] a survey of technologies that can be used to enable devices to communicate their capabilities to the providers of services, enabling them to customize content for use. This work is focused on the use of W3C and Open Mobile Alliance technologies in mobile environments. The work also concentrates on designing a system that enables mobile clients to negotiate the format of the content delivered to them through the use of the CC/PP and UAProf specifications [49]. Furthermore, Gilbert and Butler [50] defined a data model to allow the use of profiles with multiple vocabularies when using the CC/PP specification to allow mobile devices to communicate their capabilities to servers.

In spite of its wide applicability, the CC/PP model presents some limitations when addressing more complex context-aware scenarios. Although in principle any kind of contextual information could be described using CC/PP, as long as that information can be described using RDF, it does not provide the mechanisms to carry additional information, such as temporal information or the resolution of the contextual information.
However, the most important limitations may come from the fact that CC/PP components and attributes (or subtypes of them) have a limited set of values and a restricted syntax. For example, CC/PP does not provide any support for expressing relationships or constraints. It also has some limitations regarding the type of information.
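The two-level component/attribute structure discussed above can be pictured with the following minimal sketch, which uses plain Python data rather than RDF and invented values loosely following the UAProf vocabulary; it also shows the kind of trivial capability check a server might perform before selecting a content variation.

```python
# Two-level structure: components -> attributes (values are invented).
ccpp_like_profile = {
    "HardwarePlatform": {"Vendor": "ExampleCorp", "ScreenSize": "320x240",
                         "Keyboard": "PhoneKeypad", "Memory_MB": 64},
    "SoftwarePlatform": {"OSName": "ExampleOS",
                         "CcppAccept": ["text/html", "image/jpeg"]},
    "NetworkCharacteristics": {"CurrentBearerService": "GPRS"},
}

def supports_mime(profile, mime_type):
    # A server-side check before choosing which variation of the content to send.
    return mime_type in profile["SoftwarePlatform"]["CcppAccept"]

print(supports_mime(ccpp_like_profile, "image/jpeg"))  # True
print(supports_mime(ccpp_like_profile, "video/mp4"))   # False
```

In a conforming profile, the same information would be expressed as RDF triples, with the components as subjects and the attributes as predicates, as described above.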

Figure 11.14 Example of a CC/PP

11.4 Summary of Standardization Activities

This section has provided an overview of major standardization efforts that have emerged in recent years and can be generically applied to the implementation of services and tools that contribute to universal access to multimedia content, i.e. to realize UMA. Context-aware content adaptation systems provide the means to implement UMA, and can thus be built based on the technologies described. Having carefully analyzed these standards, it seems fair to say that the MPEG-21 standard is the most complete one. This is easily explained by the fact that the scope of MPEG-21 is much more than "just" adaptation or "just" context description, as it:

• Defines a complete infrastructure for packaging all the intervening elements in a compact way that contributes to augmented access to and use of multimedia content, such as content and context descriptions, content representation and composition, management of intellectual property rights (IPRs) and digital rights, enforcement of digital rights and coarse control of content consumption, and so on.
• Provides a comprehensive framework for the use of context information, and accordingly mediates optimum access to content while also deciding on the need for adaptation.
• Has a scope of adaptation aiming at all types of content, with its tools/descriptions targeting all kinds of adaptation.

• Makes use of a comprehensive set of descriptions (also through the use of MPEG-7) that are able to characterize virtually any of the entities usually cooperating in the delivery of the adapted multimedia service.
• Is flexible enough to scale to new requirements as the technology progresses.

MPEG-21 provides the means to bridge the gap between being able to understand and use context and evaluating the quality of the result achieved by the adaptation mechanisms. The AQoS tool, from the set of DIA tools, is a step in this direction. Nevertheless, more work is needed to actually evaluate the results of adaptation decision-taking, and of the adaptation itself, in terms of user expectations and the degree of satisfaction of the user experience. Likewise, further investigation is needed to fully understand context and how to use it optimally, especially in what concerns the generation of implicit or higher-level context. In this particular aspect, it seems of utmost interest to bring in the work of the research community engaged in the use of ontologies to build contextual models, as described in Section 11.2.1.

Nevertheless, it should be emphasized that the W3C specifications are very important, not only due to their flexibility, but also because of their wide acceptance and already extensive usage. Thus, cooperation between the two standardization activities in MPEG and W3C would enable the former to benefit from the larger market penetration, as well as from the use of Web 2.0 technologies, and the latter from the extensive knowledge gained on multimedia processing.

11.4.1 Integrating Digital Rights Management (DRM) with Adaptation

(Portion reprinted, with permission, from M.T. Andrade, H. Kodikara Arachchi, S. Nasir, S. Dogan, H. Uzuner, A.M. Kondoz, J. Delgado, E. Rodriguez, A. Carreras, T. Masterton and R. Craddock, "Using context to assist the adaptation of protected multimedia content in virtual collaboration applications", Proc. 3rd IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'2007), New York, NY, USA, 12-15 November 2007. © 2007 IEEE.)

A wide variety of research work has been conducted on DRM [51-53] and adaptation [54-56] to date. However, such work has essentially been carried out independently, without any significant exchange of information between the different groups addressing each of the two topics. Nonetheless, adaptation is an operation to be performed upon the content. Accordingly, as long as the content is governed or protected, the content adaptation operation should also be subject to certain rules and rights. Therefore, it is inevitable that these two separate communities will eventually cross the borders between them and start to work together.

11.4.2 Existing DRM Initiatives

We consider DRM to be any of the several technologies used by copyright owners to control access to and usage of digital data and hardware, handling usage restrictions associated with a specific instance of a piece of digital work. The most significant initiatives trying to standardize open DRM systems (which guarantee interoperability) are MPEG-21 [57], Open Mobile

