Recent Activities of QoE-related Standardization in ITU-T SG12

Yoichi Matsuo, Masanori Koike, and Kazuhisa Yamagishi

Abstract

To provide communication services of appropriate quality, network and service design and management are essential, and both require technologies for measuring and evaluating quality quantitatively. ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) Study Group 12 (SG12) studies evaluation methods, measurement techniques, and specified values for quality of experience (QoE), that is, the quality users perceive from a service, and for the quality of service (QoS) required to meet QoE targets. This article introduces the latest trends in standardization for QoE/QoS evaluation and management technologies for video media.

Keywords: ITU-T SG12, quality of experience, video quality

1. ITU-T SG12

The International Telecommunication Union - Telecommunication Standardization Sector (ITU-T) Study Group (SG) 12 is the lead SG within ITU-T for quality of service (QoS)/quality of experience (QoE) studies. Standardization of media-quality assessment methods is also being conducted by ETSI (European Telecommunications Standards Institute) in Europe and ATIS (Alliance for Telecommunications Industry Solutions) in North America. Network QoS specifications are also being developed by various standardization bodies, such as the IETF (Internet Engineering Task Force) and 3GPP (Third Generation Partnership Project). ITU-T SG12 therefore takes the global lead while taking these standards into account, so that the documents produced by the different bodies remain consistent with one another.

2. Recommendation P.1199: Parametric object-recognition-ratio-estimation model for remote monitoring of surveillance video delivered from autonomous vehicles

The levels of automated driving are defined by the Society of Automotive Engineers (SAE) in six levels, from Level 0 to Level 5, according to who performs the driving and the area where automated driving is possible [1]. For Level 4, which corresponds to automated driving without a driver (specified automated operation), regulations require the installation of a remote monitoring device and the assignment of a person (specified automated operation supervisor) to carry out remote monitoring [2]. As shown in Fig. 1, the operator monitors the automated driving system’s operational status and checks for road obstacles using video from the vehicle’s onboard cameras, which is transmitted to the monitoring center. The quality of the video transmitted from the onboard cameras must therefore be sufficient for operators to recognize objects. NTT has established a model that derives the probability that operators can recognize objects when viewing video transmitted from the onboard camera. This model was standardized by ITU-T as Recommendation P.1199 in November 2025. With this technology, it is possible to monitor whether the quality of the video transmitted from the onboard camera is sufficient for object recognition by estimating the object-recognition ratio.

Fig. 1. Remote monitoring system configuration.

As shown in Fig. 2, the model takes as input encoded video parameters (bitrate, frame rate, resolution), transmission-related data-loss parameters (packet-loss rate, frame-loss count), and vehicle speed, and outputs an estimated object-recognition ratio. Note that the quality of surveillance video transmitted from autonomous vehicles also depends on whether the vehicle is operating in the daytime or at nighttime, the weather, and the in-vehicle camera settings. We therefore constructed a framework capable of handling various situations by treating these conditions as prior information and optimizing the estimation coefficients for each set of prior information.
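The input/output structure described above can be sketched as follows. This is only an illustration of the idea of selecting a coefficient set by prior information; the functional form of Recommendation P.1199 is not given in this article, so the logistic form, the coefficient values, and all names (`PriorInfo`, `SessionParams`, `COEFFS`, `estimate_recognition_ratio`) are hypothetical placeholders.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorInfo:
    """Conditions treated as prior information (labels are hypothetical)."""
    time_of_day: str      # "day" or "night"
    weather: str          # e.g. "clear", "rain"
    camera_setting: str   # label for the in-vehicle camera configuration


@dataclass(frozen=True)
class SessionParams:
    """Model inputs named in the text."""
    bitrate_kbps: float
    frame_rate_fps: float
    resolution_height: int
    packet_loss_rate: float   # fraction in [0, 1]
    frame_loss_count: int
    vehicle_speed_kmh: float


# One coefficient set per prior-information set.
# ASSUMPTION: these numbers are invented for illustration, not fitted values.
COEFFS = {
    PriorInfo("day", "clear", "default"): (-6.0, 0.6, 0.3, 0.4, 0.2, 0.05, 0.01),
    PriorInfo("night", "rain", "default"): (-7.5, 0.6, 0.3, 0.4, 0.3, 0.08, 0.02),
}


def estimate_recognition_ratio(p: SessionParams, prior: PriorInfo) -> float:
    """Return a value in (0, 1) using a placeholder logistic form."""
    b0, b_br, b_fr, b_res, b_plr, b_fl, b_v = COEFFS[prior]  # KeyError if unknown
    z = (
        b0
        + b_br * math.log(p.bitrate_kbps)
        + b_fr * math.log(p.frame_rate_fps)
        + b_res * math.log(p.resolution_height)
        - b_plr * (100.0 * p.packet_loss_rate)  # loss expressed in percent
        - b_fl * p.frame_loss_count
        - b_v * p.vehicle_speed_kmh
    )
    return 1.0 / (1.0 + math.exp(-z))


day = PriorInfo("day", "clear", "default")
params = SessionParams(2000, 30, 1080, 0.01, 0, 40)
print(f"estimated recognition ratio: {estimate_recognition_ratio(params, day):.3f}")
```

Keeping the coefficients in a table keyed by prior information means a new condition (for example, a different camera setting) only requires fitting and registering another coefficient set, without changing the estimation function.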

Fig. 2. Input/output of Recommendation P.1199.

While this model targets surveillance video encoded with H.265/High Efficiency Video Coding (HEVC), its extension to AOMedia Video 1 (AV1) encoding, which has seen increasing use, is also under consideration and will be verified in the future.

3. Recommendation P.1204: Video quality assessment of streaming services over reliable transport for resolutions up to 4K

Recommendation P.1204 was established to define quality-monitoring technology for adaptive bitrate streaming that is compatible with 4K video and H.265/HEVC. It specifies models that estimate the quality of input video, with multiple modes corresponding to the input data available. The following modes had already been established: Recommendation P.1204.3, which estimates quality using all bitstream information, including metadata (bitrate, resolution, frame rate); Recommendation P.1204.4, which uses video signals; and Recommendation P.1204.5, which uses both metadata and video signals. Recommendation P.1204.1, which uses only metadata, and Recommendation P.1204.2, which uses metadata plus frame-level information, were the subject of lengthy integration discussions. Their integration was approved at the September 2025 SG12 meeting, and they were subsequently standardized.
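The relationship between the modes and their input data can be summarized as a simple lookup. The sketch below is an illustration only: the `Available` flags and the `usable_modes` helper are hypothetical names, and the mapping reflects just the description in this article, not the normative text of the recommendations.

```python
from __future__ import annotations

from enum import Flag, auto


class Available(Flag):
    """Kinds of input data a monitoring point may have access to."""
    METADATA = auto()      # bitrate, resolution, frame rate
    FRAME_INFO = auto()    # frame-level information
    BITSTREAM = auto()     # full bitstream (metadata can be derived from it)
    VIDEO_SIGNAL = auto()  # video signal


# Mode -> input data it relies on, as described in the article.
MODE_INPUTS = {
    "P.1204.1": Available.METADATA,
    "P.1204.2": Available.METADATA | Available.FRAME_INFO,
    "P.1204.3": Available.BITSTREAM,
    "P.1204.4": Available.VIDEO_SIGNAL,
    "P.1204.5": Available.METADATA | Available.VIDEO_SIGNAL,
}


def usable_modes(available: Available) -> list[str]:
    """List the modes whose required inputs are all available."""
    return [
        mode
        for mode, needed in MODE_INPUTS.items()
        if (needed & available) == needed
    ]


# A client-side probe that sees metadata and the video signal:
print(usable_modes(Available.METADATA | Available.VIDEO_SIGNAL))
```

A monitoring system that only sees metadata (for example, at a network node handling encrypted traffic) would be limited to the metadata-only mode, whereas a probe with access to the video signal could also use the signal-based modes.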

With the increasing adoption of AV1-encoded video distribution, studies are underway to adapt to this encoding format. Recommendation P.1204.4 has been extended to support AV1 encoding, and AV1 extensions for the other modes remain under consideration.

Establishing recommendations for the new modes and adapting the models to new encoding formats will enable appropriate quality monitoring of video-streaming services.

4. Recommendation P.940: Computational model used for the monitoring and quality assessment of videotelephony services

Recommendation P.940 was established to monitor the QoE of videotelephony services. It consists of six assessment blocks: audio quality, video quality, audiovisual quality, audiovisual interaction delay, audiovisual media synchronization, and videotelephony quality. The key feature of this model is that, in addition to the audiovisual-quality assessment found in conventional objective video-quality models, it handles quality factors specific to videotelephony, such as delay and media synchronization, which arise from the service’s bidirectional nature. It ultimately outputs a mean opinion score.
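One way to picture the six blocks is as a small pipeline that ends in a single mean opinion score. The sketch below is a structural illustration only: the article does not specify how the blocks are connected or what formulas they use, so the wiring (audio and video feeding an audiovisual block, which is then combined with delay and synchronization terms), the `VideotelephonyModel` class, and the toy stand-in functions are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VideotelephonyModel:
    """Six assessment blocks, each supplied as a callable (all hypothetical)."""
    audio_quality: Callable[[dict], float]
    video_quality: Callable[[dict], float]
    audiovisual_quality: Callable[[float, float], float]
    interaction_delay: Callable[[dict], float]
    media_synchronization: Callable[[dict], float]
    videotelephony_quality: Callable[[float, float, float], float]

    def estimate_mos(self, session: dict) -> float:
        # ASSUMED wiring: media blocks first, then service-specific factors.
        audio = self.audio_quality(session)
        video = self.video_quality(session)
        audiovisual = self.audiovisual_quality(audio, video)
        delay_term = self.interaction_delay(session)
        sync_term = self.media_synchronization(session)
        mos = self.videotelephony_quality(audiovisual, delay_term, sync_term)
        # Keep the result on the 1-5 opinion scale.
        return min(5.0, max(1.0, mos))


# Toy stand-ins, not the formulas of the recommendation.
model = VideotelephonyModel(
    audio_quality=lambda s: 4.0,
    video_quality=lambda s: 3.0,
    audiovisual_quality=lambda a, v: (a + v) / 2,
    interaction_delay=lambda s: 0.5,       # degradation caused by delay
    media_synchronization=lambda s: 0.0,   # degradation caused by A/V skew
    videotelephony_quality=lambda av, d, sy: av - d - sy,
)
print(model.estimate_mos({}))
```

Treating each block as a replaceable component makes the point of the design visible: the delay and synchronization blocks exist alongside, not inside, the audiovisual-quality block.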

This recommendation enables quantification of the degree of quality degradation on the basis of network conditions and video information available at the calling terminal. It is expected to facilitate appropriate monitoring of the QoE of videotelephony services.

5. Recommendation P.MLS: Quality of speech generated with machine learning techniques

Data generated using machine learning techniques have been used in various fields. Speech-data generation is used in scenarios such as text-to-speech conversion and voice-based chatbots. While Recommendation P.806 has been established as a subjective quality-assessment method for speech quality and Recommendation P.863 and Recommendation G.107 as objective quality-assessment methods for estimating perceived quality, it has not been verified whether these specifications can adequately address generated speech data.

Therefore, a work item has been launched to verify the applicability of existing recommendations to speech generated using machine learning techniques. By analyzing the perceptual characteristics of the generated speech data, the plan is to either extend existing recommendations or define new recommendations capable of addressing generated speech data.

Plans include discussing experimental conditions and conducting subjective evaluation experiments using the generated speech data.

6. Future outlook

In the standardization of media-quality evaluation methods within ITU-T SG12, support for voice calls and video streaming has largely been established. However, work continues on existing recommendations, including revisions to support new codecs and studies of new types of data generated using machine learning techniques. Studies have also begun on the quality of services that had not previously been examined, such as surveillance video used for autonomous driving.

A variety of new services are expected to emerge as fifth-generation mobile communication systems (5G) and 6G advance, and the design and management of QoS/QoE for these services will become increasingly critical. It therefore remains essential to monitor the progress of SG12’s discussions.

References

[1]	SAE International, “Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-road Motor Vehicles,” 4970.724, pp. 1–5, 2018.
[2]	National Police Agency, “The Act Partially Amending the Road Traffic Act (Act No. 32 of 2022),” Apr. 2022.

Yoichi Matsuo
Senior Research Engineer, Network Service Systems Laboratories, NTT, Inc.
He received an M.E. and Ph.D. in applied mathematics from Keio University, Kanagawa, in 2012 and 2015, respectively. Since joining NTT, he has been engaged in research on network management using AI technologies such as deep learning and reinforcement learning. He has been contributing to ITU-T SG12 since 2020 and has been engaged in research on quality degradation of adaptive audiovisual streaming services. He has been an associate rapporteur of Question 13/12 since 2025.

Masanori Koike
Research Engineer, Network Service Systems Laboratories, NTT, Inc.
He received a B.E. in mathematical engineering and information physics in 2015 and an M.S. in information science and technology in 2017 from the University of Tokyo. Since joining NTT in 2017, he has been engaged in research related to video quality. He has been contributing to ITU-T SG12 since 2023.

Kazuhisa Yamagishi
Senior Research Engineer, Supervisor, Network Service Systems Laboratories, NTT, Inc.
He received a B.E. in electrical engineering from Tokyo University of Science in 2001 and an M.E. and Ph.D. in electronics, information, and communication engineering from Waseda University, Tokyo, in 2003 and 2013, respectively. In 2003, he joined NTT, where he has been engaged in the development of objective quality-estimation models for multimedia telecommunications. He has been contributing to ITU-T SG12 since 2006. He has been a rapporteur of Question 13/12 since 2017, was vice-chair of Working Party 3 in SG12 for 2021–2024, has been chair of Working Party 3 in SG12 since 2025, and has been a vice-chair of SG12 for the 2022–2024 and 2025–2028 study periods.
