Special Feature: Quality of Experience (QoE) Design and Management for Audiovisual Communication Services
Media-layer Objective Video Quality Assessment Technology for Video Communication Services (ITU-T J.247)
Abstract
We summarize the media-layer objective video quality assessment technology standardized as ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) Recommendation J.247. This technology objectively and accurately estimates the user’s quality of experience (QoE) of video distorted by encoding and packet loss in IP-based video communication services (IP: Internet protocol).
1. Background
Broadband services and video communication services for personal computer and cell-phone users have been expanding rapidly. Providing such services to customers at an appropriate level of quality requires quality assessment technology that can accurately measure the quality of experience (QoE) of the video communication services. To design and manage services efficiently with service quality taken into consideration, we need a video quality assessment technology that assesses this quality automatically.
The video quality assessment technology for assessing the coding distortion of MPEG-2 (MPEG: Moving Picture Experts Group) encoding on standard-definition television (SDTV) signals has been standardized as ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) Recommendation J.144 [1]. This conventional technology, however, cannot assess the effects of the diverse coding systems and bit rates used in video communication services, nor can it assess video degraded by packet loss in IP (Internet protocol) networks. To solve this problem, the Video Quality Experts Group (VQEG), an international study group in ITU consisting of video quality researchers, conducted a technical examination [2]. As a result, four systems, including an NTT method, were adopted as an international standard called ITU-T Recommendation J.247 in August 2008 [3].
In this article, we first explain the technical features, target applications, and some application examples of the J.247 standardized technology. Then, we describe the J.247 international standardization algorithm (NTT method) and its quality estimation accuracy. Finally, we mention future developments.
2. Summary of J.247
2.1 Overview
Recommendation J.247 describes a media-layer objective video quality assessment technology that estimates, from pixel information, the quality of the video watched by customers.
Specifically, it quantifies quality by comparing the pixel information of reference and degraded videos. As such, it is a full-reference-type objective quality assessment technology (Fig. 1).
The flow of a video delivery service (a video communication service as an example) is shown in Fig. 1 from left to right. First, the video content to be delivered is encoded to compress the amount of information to be transmitted. Second, the compressed video data is delivered over the network from the delivery server. The customer receives the data at his or her location, decodes it, and finally watches the video. Here, the reference video is the content before encoding, and the degraded video is the video either just after encoding or just after decoding. The video quality assessment method compares the pixel information of the reference and degraded videos and accurately estimates the user’s QoE, taking human visual characteristics into consideration. This technology lets us assess the effects of the diverse coding systems and bit rates used in video communication services. It also lets us assess the quality of video distorted by packet loss in IP networks.
2.2 Target applications
The target applications of this technology are video delivery services for personal computers, smartphones, and other devices, as well as videophone and videoconferencing services. Details of the application domain are given in Table 1. The target video resolutions are QCIF (176 × 144 pixels), CIF (352 × 288 pixels), and VGA (640 × 480 pixels). The target video codecs cover almost all the main video codecs used in actual video delivery services, such as H.264/AVC, MPEG-4, Windows Media, and RealVideo. The types of video distortion, which are caused by packet loss and affected by the various codec types, bit rates, and frame rates, were selected taking into consideration their variations in actual services.
The applications of this technology include in-service quality monitoring at the head end, remote destination quality monitoring when a copy of the source is available, quality verification of archived video, and codec performance comparisons. If we monitor the coding quality in real time at the time of encoding, we can quickly see any problems in the encoding process. When this technology is applied to these applications, it provides the following benefits.
(1) Reduces personnel expenses incurred by service providers by automating the pre-delivery content quality check that is currently performed visually.
(2) Raises customer satisfaction through speedy troubleshooting and responses to customer complaints.
(3) Reduces the extent of quality degradation by monitoring and managing quality as customers actually perceive it.
3. NTT algorithm in J.247
The J.247 international standardization algorithm (NTT method) is shown in Fig. 2. Specifically, this method assesses subjective quality influenced by video distortion through the following steps.
Step 1: Temporal/spatial alignment process between the reference and distorted videos
This step matches the pixels and frames of the reference and degraded videos so they can be compared appropriately. Unless all pairs of pixels in the reference and degraded videos are aligned correctly, a pixel-wise full-reference objective video assessment method cannot properly estimate subjective video quality in the subsequent estimation process.
First, macro-alignment is performed. This process consists of temporal/spatial alignment, noise removal, and gain/offset alignment. Temporal/spatial alignment is performed once per pair of video clips, i.e., the reference and degraded videos, to align all the pixels in the spatial and temporal directions. Noise removal removes the influence of high-frequency noise in the degraded video that is imperceptible to humans. Gain/offset alignment matches the pixel value distribution of the reference video with that of the degraded video. A mismatch between the two distributions is caused by the color adjustment in the decoder or player (including a video board) that receives the video.
Second, micro-alignment is performed to match the frames between the reference and degraded videos, taking into consideration the influence of video frame skipping and freezing.
Step 2: Coding quality estimation model
This step derives three characteristic parameters related to encoding distortion [4].
(1) Overall distortion that occurs throughout all the frames is derived by calculating the luminance difference between the reference and degraded videos.
(2) Distortion in the form of block distortion is derived by calculating the ratio between horizontal and vertical edges and other edges.
(3) Distortion in the form of motion blur is derived by calculating the frame-to-frame luminance differences expressed for each 8 × 8-pixel block between the reference and degraded videos.
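The gain/offset alignment of Step 1 and the overall luminance-difference parameter (1) of Step 2 can be illustrated with a minimal sketch. It is not the J.247 procedure itself: it assumes that the mismatch can be modeled as degraded ≈ gain × reference + offset, estimates the two values by an ordinary least-squares fit, and uses the mean squared luminance difference as a stand-in for the overall distortion. The function names and the tiny 2 × 2 luminance frame are hypothetical.

```python
from statistics import fmean


def fit_gain_offset(reference, degraded):
    """Least-squares fit of degraded ~= gain * reference + offset (assumed model)."""
    ref = [p for row in reference for p in row]
    deg = [p for row in degraded for p in row]
    mean_ref, mean_deg = fmean(ref), fmean(deg)
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    if var_ref == 0:
        # A flat reference frame carries no gain information; fit the offset only.
        return 1.0, mean_deg - mean_ref
    cov = sum((r - mean_ref) * (d - mean_deg) for r, d in zip(ref, deg))
    gain = cov / var_ref
    return gain, mean_deg - gain * mean_ref


def undo_gain_offset(degraded, gain, offset):
    """Map degraded pixel values back onto the reference value range."""
    if gain == 0:
        raise ValueError("gain of zero cannot be inverted")
    return [[(p - offset) / gain for p in row] for row in degraded]


def mean_squared_luma_difference(reference, degraded):
    """Mean squared luminance difference over one aligned frame pair."""
    return fmean(
        (r - d) ** 2
        for ref_row, deg_row in zip(reference, degraded)
        for r, d in zip(ref_row, deg_row)
    )


# Hypothetical 2 x 2 luminance frame and a copy with a pure gain/offset shift.
reference = [[16.0, 64.0], [128.0, 235.0]]
degraded = [[0.9 * p + 10.0 for p in row] for row in reference]

gain, offset = fit_gain_offset(reference, degraded)
aligned = undo_gain_offset(degraded, gain, offset)

before = mean_squared_luma_difference(reference, degraded)
after = mean_squared_luma_difference(reference, aligned)
print(f"gain={gain:.3f} offset={offset:.3f} before={before:.3f} after={after:.6f}")
```

In this sketch the difference measured after alignment is close to zero, because the only change applied to the frame was the gain/offset shift. Without the alignment, that shift would be counted as coding distortion.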
Step 3: Packet-loss-related degradation estimation model
This step derives two additional parameters related to video distortion caused by packet loss [5].
(4) Local block distortion that occurs locally in specific frames is derived by calculating the degree of temporal variation of the local block distortions in all the frames when frame-to-frame luminance differences between the reference and degraded videos are large.
(5) Distortion in the form of freeze distortion and variance of the frame rate is derived by calculating the weighted duration of time in which the same image is displayed while the reference image changes.
Step 4: Overall quality estimation
This step estimates the total effect of quality degradation on subjective quality by calculating the weighted sum of the five characteristic parameters derived in steps 2 and 3.
4. Quality estimation accuracy
Here, we show one verification example of the video quality estimation accuracy of the NTT model. We assessed degraded videos encoded by two encoding methods by using eight different video scenes with VGA resolution that were not used in optimizing the model. The experimental parameters were bit rate, frame rate, and packet-loss ratio. We compared the subjective assessment values derived in the subjective experiment with the objective assessment values. Subjective assessment was performed using the 5-grade ACR-HR (absolute category rating with hidden reference) method with 24 subjects [6]. The results estimated by the conventional method, which uses the peak signal-to-noise ratio (PSNR) as an objective index of coding quality, and by the NTT method are shown in Figs. 3 and 4, respectively. The NTT method estimated subjective quality more accurately than the conventional method. Both Figs. 3 and 4 show the estimated quality of every video individually. In some cases, maximizing the average subjective quality of multiple videos is important, e.g., in optimizing the parameters of the codec.
Therefore, we averaged the assessment values of the eight video scenes per experiment condition. The results are shown in Fig. 5. The correlation coefficient between the subjective and objective assessment values is 0.94.
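The evaluation procedure described above can be sketched in a few lines: a PSNR baseline, averaging of per-scene scores for each experimental condition, and the Pearson correlation coefficient between subjective and objective values. This is only an illustration under stated assumptions: 8-bit luminance with a peak value of 255 is assumed for PSNR, the function names are hypothetical, and the scores below are made-up placeholders, not the data behind Figs. 3 to 5.

```python
import math
from collections import defaultdict
from statistics import fmean


def psnr(reference, degraded, peak=255.0):
    """Peak signal-to-noise ratio in dB for one frame pair (8-bit peak assumed)."""
    mse = fmean(
        (r - d) ** 2
        for ref_row, deg_row in zip(reference, degraded)
        for r, d in zip(ref_row, deg_row)
    )
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def average_per_condition(scores):
    """Average (condition, score) pairs over all scenes sharing a condition."""
    grouped = defaultdict(list)
    for condition, score in scores:
        grouped[condition].append(score)
    return {condition: fmean(values) for condition, values in grouped.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Placeholder scores: two scenes for each of three hypothetical conditions.
subjective = [("c1", 5.0), ("c1", 4.0), ("c2", 3.5), ("c2", 2.5), ("c3", 2.0), ("c3", 1.0)]
objective = [("c1", 4.2), ("c1", 3.8), ("c2", 3.4), ("c2", 3.0), ("c3", 1.9), ("c3", 1.3)]

subjective_mean = average_per_condition(subjective)
objective_mean = average_per_condition(objective)
conditions = sorted(subjective_mean)
r = pearson(
    [subjective_mean[c] for c in conditions],
    [objective_mean[c] for c in conditions],
)
print(f"correlation over {len(conditions)} conditions: {r:.3f}")
```

Averaging before correlating removes scene-to-scene variation within a condition, which is why the per-condition correlation is typically higher than the per-video one.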
5. Future development
NTT Service Integration Laboratories intends to expand the scope of the NTT method to high-definition television (HDTV) videos and to have it standardized. In addition, we intend to contribute to the implementation of quality monitoring systems for video delivery services in the ubiquitous-broadband era as well as the implementation of these technologies in quality estimation and monitoring devices.
References