Feature Articles: QoE Estimation Technologies
Monitoring the Quality of IPTV Services
This article introduces a quality-monitoring model that can be used to estimate end-users' quality of experience of Internet protocol television services using packet header information. This model can be used for in-service quality monitoring.
1. Introduction
Internet protocol television (IPTV) services are now widely provided. The quality of experience (QoE) of IPTV services is affected by the audiovisual content, encoding and decoding techniques, network performance, and display technology. It is therefore important to ensure that end-users receive high-quality IPTV content. Ideally, the service provider would identify the quality degradation factors, monitor QoE by taking the main degradation factors into account, and finally investigate the cause of any quality degradation and address the problem.
2. Activities of ITU-T SG12 in quality monitoring
An IPTV processing chain is shown in Fig. 1. This processing chain consists of video acquisition and editing, encoding, network transmission, decoding, and display, as mentioned in section 1. All of these elements can affect end-users’ QoE. ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) Recommendation G.1081 defines performance monitoring points for IPTV services that enable the service provider and/or network operator to monitor the performance of the entire IPTV service delivery process. Performance is monitored at five points in the processing chain: the source media and metadata are monitored at point 1; encoded and packetized source media are monitored at point 2; packet transmission characteristics are monitored at point 3; received packets are monitored at point 4 to determine whether they can adequately provide the required QoE at the client terminal; and the displayed media are monitored at the client terminal at point 5. If all the information obtained at these monitoring points is integrated, the locations where quality has degraded can be determined.
ITU-T Recommendation G.1081 does not define how objective quality assessment models should be applied at each performance monitoring point. Monitoring the QoE at the head end is important because quality degradation at points 1 and 2 influences the QoE of all users; full-reference (FR) media-layer models are suitable for monitoring quality at these points. At points 3 and 4, it is preferable to analyze packets using a method with a low computational load because the analysis needs to be implemented in terminals such as mobile terminals, home gateways, and set-top boxes (STBs). Packet-layer models, which estimate QoE from IP packet header information, are suitable at these points. At point 5, the QoE needs to be monitored using a no-reference (NR) media-layer model, which estimates QoE from the media signals received at the client terminal. Because FR media-layer models that can be applied to head-end QoE monitoring have already been standardized, NTT has focused on developing packet-layer models that can be used for end-user QoE monitoring using packet header information.
Application areas can be categorized into lower resolution (LR: quarter common intermediate format (QCIF), quarter video graphics array (QVGA), and half VGA (HVGA)) and higher resolution (HR: standard definition (SD) and high definition (HD)) areas. The LR application area corresponds to mobile IPTV services, and the HR application area corresponds to STB-based IPTV services. In 2006, ITU-T Study Group 12 (SG12) launched a project to develop quality-monitoring models for the LR and HR application areas. In October 2012, ITU-T standardized two Recommendations: ITU-T Recommendation P.1201.1 for the LR application area and P.1201.2 for the HR application area. This article introduces ITU-T Recommendation P.1201.1, which is the more relevant of the two given the recent rapid growth of mobile IPTV services.
The P.1201.1 model was developed by integrating the NTT and Huawei models and was verified in an ITU-T performance evaluation contest.
3. P.1201.1 model
A block diagram of the P.1201.1 model is shown in Fig. 2. The model consists of three modules: a parameter-extraction module, a parameter-calculation module, and a quality-estimation module.
1) Parameter-extraction module
This module extracts the timestamp, sequence number, and marker bit from Real-time Transport Protocol (RTP) headers, and obtains the rebuffering start time and length from the client terminal.
2) Parameter-calculation module
This module calculates audio- and video-related quality parameters (e.g., coding bit rate and packet loss) using parameters extracted by the parameter-extraction module.
3) Quality-estimation module
This module estimates audio, video, and audiovisual quality using quality parameters calculated by the parameter-calculation module. Additionally, each module has a sub-module, as shown in Fig. 2.
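As an illustration of the header fields named for the parameter-extraction module, the following Python sketch reads the sequence number, timestamp, and marker bit from the fixed 12-byte RTP header defined in RFC 3550, and counts gaps in the sequence numbers as a simple packet-loss indicator. The names RtpFields, parse_rtp_header, and count_lost_packets are hypothetical and are not taken from Recommendation P.1201.1; the loss count assumes that packets arrive in order and without duplicates.

```python
import struct
from typing import List, NamedTuple


class RtpFields(NamedTuple):
    sequence_number: int
    timestamp: int
    marker: bool


def parse_rtp_header(packet: bytes) -> RtpFields:
    """Read the fixed 12-byte RTP header (RFC 3550) of one packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte header")
    # Byte 0: V(2) P(1) X(1) CC(4); byte 1: M(1) PT(7);
    # then 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, _ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return RtpFields(seq, ts, bool(b1 & 0x80))


def count_lost_packets(seqs: List[int]) -> int:
    """Count missing sequence numbers, allowing for 16-bit wraparound.

    Assumption for this sketch: in-order arrival, no duplicates.
    """
    lost = 0
    for prev, cur in zip(seqs, seqs[1:]):
        lost += (cur - prev - 1) % 65536
    return lost


# Illustrative packet: version 2, marker set, payload type 96.
example = struct.pack("!BBHII", 0x80, 0x80 | 96, 65535, 90000, 0x1234)
fields = parse_rtp_header(example)
```

Only header bytes are read, which is consistent with the low computational load required at monitoring points 3 and 4.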
This article summarizes the quality degradation factors processed by the quality-estimation modules. The details of the parameter extraction and calculation modules are not explained here, as they are beyond the scope of this article.
3.1 Audio quality estimation module
Audio quality is affected by the codec (coder-decoder) type, coding bit rate, packet loss, and rebuffering, so it is necessary to model the relationship between the following quality factors and the subjective audio quality.
- Effect of audio codec (i.e., AMR-NB (adaptive multi-rate narrowband), AMR-WB+ (extended adaptive multi-rate wideband), AAC-LC (advanced audio coding low complexity), HE-AACv1 (high-efficiency AAC, version 1), and HE-AACv2) on audio quality
- Effect of coding bit rate on audio quality
- Effect of lost audio frame length due to packet loss on audio quality
- Effect of rebuffering on audio quality (under study)
3.2 Video quality estimation module
As with audio quality, video quality is affected by the codec type, coding bit rate, packet loss, and rebuffering. Video quality is also affected by the number of bits per video frame type because it varies depending on the spatio-temporal information of the video content. Therefore, it is necessary to model the relationship between the following quality factors and the subjective video quality.
- Effect of video codec (i.e., MPEG-4 (Moving Picture Experts Group, version 4) and H.264) on video quality
- Effect of video resolution (i.e., QCIF, QVGA, and HVGA) on video quality
- Effect of coding bit rate, frame rate, and ratio of I-frame (intra-coded frame) bit count to the total bit count on video quality
- Effect of the number of packet-loss events, number of damaged video frames, and the video frame area damaged by the packet losses on video quality
- Effect of the number of rebuffering events, average rebuffering length, and average interval between rebuffering events on video quality
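The three rebuffering-related factors in the last item can be illustrated with a short Python sketch. It takes a list of (start time, length) pairs in seconds, such as those reported by the client terminal, and returns the number of events, the average length, and the average interval. The function name is hypothetical, and measuring the interval from the end of one event to the start of the next is an assumption of this sketch; the Recommendation may define it differently.

```python
from typing import List, Tuple


def rebuffering_stats(events: List[Tuple[float, float]]) -> Tuple[int, float, float]:
    """Return (event count, average length, average interval) in seconds.

    events: (start_time, length) pairs for each rebuffering event.
    Assumption: the interval runs from the end of one event to the
    start of the next one.
    """
    events = sorted(events)
    count = len(events)
    if count == 0:
        return 0, 0.0, 0.0
    avg_length = sum(length for _, length in events) / count
    gaps = [
        next_start - (start + length)
        for (start, length), (next_start, _) in zip(events, events[1:])
    ]
    avg_interval = sum(gaps) / len(gaps) if gaps else 0.0
    return count, avg_length, avg_interval


# Illustrative values only, not measured data.
stats = rebuffering_stats([(10.0, 2.0), (20.0, 4.0), (40.0, 3.0)])
```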
3.3 Audiovisual quality estimation module
Audiovisual quality is estimated based on audio- and video-related quality factors since audiovisual quality is affected by both audio and video quality.
Packet headers do not indicate the codec type and implementation or the video resolution, so the coefficients of the audio, video, and audiovisual quality estimation modules need to be optimized for each of these factors. ITU-T Recommendation P.1201.1 provides the optimized coefficients.
The rest of this section describes how the number of damaged video frames and the damaged video frame area are derived. When packets of a video frame are lost, the degradation propagates to the following video frames until the next I-frame is received (Fig. 3). Therefore, determining the number of damaged video frames is an effective way of estimating quality. In addition, the damaged video frame area itself affects video quality. The damaged area is derived as in the following example, in which each video frame is taken to consist of two packets: 1) the second packet of the “i+1”th video frame is lost, so the module outputs 50% as the damaged video frame area; 2) the first packet of the “i+2”th video frame is lost, so the module outputs 100% as the damaged video frame area; 3) the “i+3”th video frame has not lost any packets, but the module outputs 100% as the damaged video frame area because the previous video frame was damaged and degradation propagation lasts until the next I-frame.
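The derivation in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure specified in the Recommendation: all packets of a frame are assumed to cover equal areas, the area from the first lost packet to the end of the frame is treated as damaged, a frame inherits the larger of its own damage and the propagated damage, and an I-frame resets the propagation. The function name and the input format are hypothetical.

```python
from typing import List, Tuple


def damaged_areas(frames: List[Tuple[bool, List[bool]]]) -> List[float]:
    """Return the damaged area (0.0 to 1.0) of each video frame.

    frames: (is_i_frame, received) pairs, where received[k] is True if
    the k-th packet of the frame arrived.
    """
    areas = []
    carried = 0.0  # damage propagated from earlier frames
    for is_i_frame, received in frames:
        if is_i_frame:
            carried = 0.0  # an I-frame stops the propagation
        own = 0.0
        if False in received:
            first_lost = received.index(False)
            # Everything from the first lost packet onward is damaged.
            own = (len(received) - first_lost) / len(received)
        carried = max(carried, own)
        areas.append(carried)
    return areas


# Frames i (I-frame), i+1, i+2, i+3, and the next I-frame,
# with two packets per frame as in the example above.
example_frames = [
    (True, [True, True]),
    (False, [True, False]),   # second packet lost
    (False, [False, True]),   # first packet lost
    (False, [True, True]),    # no loss, but damage propagates
    (True, [True, True]),
]
areas = damaged_areas(example_frames)
damaged_frame_count = sum(1 for a in areas if a > 0)
```

Under these assumptions, the sketch yields damaged areas of 50%, 100%, and 100% for frames i+1 to i+3, matching the three steps described in the text, and counts three damaged frames.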
4. Performance of P.1201.1 model
ITU-T SG12 verified the validity of the P.1201.1 model by conducting numerous subjective tests using audio, video, and audiovisual sequences that were generated by varying the codec type, coding bit rate, packet-loss pattern, packet-loss concealment, and rebuffering pattern. The audio, video, and audiovisual quality estimation modules were evaluated using the root mean square error (RMSE) and Pearson’s correlation coefficient (PC) between the estimated and subjective quality. Table 1 (see Appendix of ITU-T Recommendation P.1201) lists the RMSE and PC values. Because the RMSE was small and the PC was high, as listed in Table 1, ITU-T concluded that the P.1201.1 model had reached a sufficient level of quality-estimation accuracy.
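For reference, the two measures can be computed as in the following Python sketch, which uses their standard textbook definitions. The scores below are made-up illustrative values, not results from the ITU-T evaluation, and any pre-processing applied in that evaluation is not reproduced here.

```python
import math
from typing import Sequence


def rmse(estimated: Sequence[float], subjective: Sequence[float]) -> float:
    """Root mean square error between estimated and subjective scores."""
    n = len(estimated)
    return math.sqrt(sum((e - s) ** 2 for e, s in zip(estimated, subjective)) / n)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores on a 1-to-5 scale, for illustration only.
estimated_scores = [1.0, 2.0, 3.0, 4.0]
subjective_scores = [2.0, 3.0, 4.0, 5.0]
error = rmse(estimated_scores, subjective_scores)
correlation = pearson(estimated_scores, subjective_scores)
```

In this example the estimates follow the subjective scores perfectly in trend but are offset by one point, which shows why both measures are reported: the correlation alone would not reveal the offset.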
5. Application scenario
In-service quality monitoring is an important application of the P.1201.1 model. The P.1201.1 model can be applied as follows:
1) Customer complaints are often caused by quality degradation. By monitoring the quality level, service providers can respond to such complaints and can also resolve problems before customers complain, even when the end-user has not yet noticed any degradation.
2) Quality degradation locations can be detected quickly by gathering data on quality and the causes of quality degradation from many users.
6. Conclusion
This article introduced the quality-estimation model we developed for IPTV services and the P.1201.1 model into which it was integrated. The P.1201.1 model has reached a sufficient level of quality-estimation accuracy and can be applied to monitor the quality of mobile IPTV. Because it was developed for real-time IPTV services, the P.1201.1 model cannot be used as is for transmission control protocol (TCP)-based progressive video streaming. However, quality factors such as coding and rebuffering are common to real-time video streaming and TCP-based video streaming. Therefore, it may be possible to extend the model to TCP-based video streaming if the parameter-extraction module can process TCP headers. In addition, although the application area of the P.1201.1 model is limited to lower resolutions, it is desirable to extend the model to the higher resolution application area because newer mobile terminals support higher resolutions. ITU-T SG12 is planning to extend the P.1201.1 model in response to these technological developments.
In addition to promoting the P.1201.1 model, to which NTT contributed, our group is involved in globally competitive research and development of a quality-monitoring technique for a new service.