Feature Articles: Fixed-mobile Convergence Services with IOWN

Vol. 23, No. 1, pp. 22–28, Jan. 2025. https://doi.org/10.53829/ntr202501fa2

Communication Control Platform for Advanced Real-time Communication

Yusuke Hara, Akira Ijuin, Yasuharu Yamashita, Kenta Maeda, and Rihito Suzuki

Abstract

Providing communication services over Internet Protocol networks currently requires developing, for each service, a mechanism for controlling voice and video sessions according to the types of participating terminals and connections used. This burden prevents content providers from devoting their resources to enriching the content of their communication services. This article introduces our research and development activities toward solving this problem by providing a communication control platform that can be applied to various communication services offered by network operators.

Keywords: real-time communication, WebRTC, standardization

1. Advancement of communication services

Communication services over Internet Protocol (IP) networks are accessed from various types of user equipment (UEs), such as personal computers (PCs) and smartphones, in various environments, such as fixed and mobile networks, to exchange voice and video media. Use cases have emerged in which artificial intelligence (AI) analytical engines and non-player characters collect and analyze non-verbal information, such as emotions, through media connections to achieve metacommunication. In such use cases, content providers (CPs) need to develop, for each communication service they offer, session control methods that take into account differences in UEs, network environments, and media. This need has prevented them from concentrating their resources on the one thing that should be their focus, namely, enriching content in communication services.

To solve this problem, we are undertaking research and development for a flexible yet high-quality communication control platform that can be incorporated in carrier-quality communication services. This article introduces our activities for standardizing media control signals and the technology for achieving the Immersive Real-Time Communication (RTC) Platform, a communication control platform that supports signaling for media session control.

2. Standardization activities related to the specifications for implementing the carrier WebRTC platform

With the advent of new extended reality (XR) devices and the growing use of XR applications in the industrial, educational, and entertainment fields, network operators and vendors have come to consider immersive interaction as an important element of next-generation real-time communication. Considering recent trends, the 3rd Generation Partnership Project (3GPP), which defines international standard specifications for mobile communication technologies, is studying technical specifications for metaverse and XR services provided by mobile network operators.

Two major methods are being studied for achieving real-time communication services in immersive spaces. One is for extending the current IP Multimedia Subsystem (IMS), which is an architectural framework used in public IP telephone networks. The other is for introducing a new WebRTC server network. NTT has led a study that focuses on the WebRTC-based method. We chose to focus on this method because it is more compatible with web development, faster to develop and deploy, and more suitable for quickly meeting user needs than extending the IMS, the specifications of which are stringent for stable operations and are becoming more complex due to functional extensions.

One of the key issues in standardizing the WebRTC-based method is the lack of interoperability between the server networks of different vendors and network operators and between terminal devices and server networks. This lack of interoperability arises because the Internet Engineering Task Force (IETF), which develops protocols for the Internet, has not specified a common signaling protocol for WebRTC session control.

To address this issue, NTT has been leading the development of two specifications, assuming the architecture shown in Fig. 1 (which is specified in 3GPP TS 26.506 [1]): RESPECT (REaltime & REality media Setup Protocol, Extensible and CompacT), a session-control signaling protocol for WebRTC, and the application programming interface (API) specifications (i.e., the service control API) required to control network functions on the service.

RESPECT is a media-session-control protocol compliant with the WebRTC standard, featuring high reliability and compatibility with web development. Its transaction management and timeout functions, together with an identification system that enables verification of the calling party’s authorization, provide reliable session control suitable for use in network operator services. Its compatibility with web development comes from using WebSocket for signaling paths and adopting JSON (JavaScript Object Notation)-format messaging, which enables web developers to use existing libraries, frameworks, and a web browser’s debugging functions.

The service control API is designed for CPs using the WebRTC platform of network operators, enabling them to manage operator-provided resources according to their service requirements. It is characterized by the ability to configure media and data forwarding controls in detail. In general WebRTC services, the connection type (e.g., VR conference room or webinar) is designed and fixed for each service. With this API, CPs can configure a variety of connection types themselves, which enables faster and more flexible service provisioning on demand. These key findings from the study in 3GPP Release 18 have been agreed upon and formalized in the technical report 3GPP TR 26.930 [2].
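
As a rough illustration of what transaction management with timeouts can look like on a JSON-over-WebSocket signaling path, the following Python sketch tracks outstanding requests by transaction ID. The class name, the field names ("transaction", "method", "body"), and the 5-second timeout are assumptions made for this example; they are not taken from the RESPECT specification.

```python
import itertools
import json


class TransactionTable:
    """Tracks outstanding signaling requests until a response or a timeout."""

    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s
        self._ids = itertools.count(1)
        self._pending = {}  # transaction id -> deadline (seconds)

    def build_request(self, method, body, now):
        """Return a JSON text frame and remember its transaction id."""
        tid = next(self._ids)
        self._pending[tid] = now + self.timeout_s
        # Hypothetical field names, not the RESPECT wire format.
        return json.dumps({"transaction": tid, "method": method, "body": body})

    def on_response(self, frame):
        """Return True if the frame answers a pending transaction."""
        tid = json.loads(frame).get("transaction")
        return self._pending.pop(tid, None) is not None

    def expire(self, now):
        """Drop and return the ids of transactions whose deadline has passed."""
        late = sorted(t for t, d in self._pending.items() if d <= now)
        for t in late:
            del self._pending[t]
        return late
```

A caller would send the frame returned by build_request over the WebSocket, pass each received frame to on_response, and call expire periodically to detect requests that were never answered.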


Fig. 1. RTC architecture (content provider collaboration).

The architecture and interfaces of the Immersive RTC Platform provided by NTT are designed and implemented on the basis of the above 3GPP standard.

3. Immersive RTC Platform

Figure 2 shows an overview of our Immersive RTC Platform. Transmission and reception of media, such as audio, video, and data, between UEs are achieved using two components: the signaling controller, which executes session control of media communication, and the media processor, which aggregates and distributes media communications from end users and CP UEs. The following sections introduce these components, the service control API used by CPs to handle information necessary for signaling control, monitoring and operational control that supports stable operation of the Immersive RTC Platform, and test and deployment strategies for achieving and maintaining high-quality performance.


Fig. 2. Overview of the Immersive RTC Platform.

3.1 Immersive RTC Platform: signaling control

For UEs to execute media communication, they need to convey to the intended receiver information about the media they intend to send and receive information from that receiver about the media that will be sent back. The Immersive RTC Platform achieves these interactions by using RESPECT-compliant control signals (hereafter referred to as RESPECT signals) that it exchanges with UEs.

The Platform first receives, from a UE via a RESPECT signal, media information on the data, audio, and video the UE is to send. On the basis of this information, it determines to which UE it will send each type of media and via which instance of the media processor. It then sends a signal instructing the media processor to send the required media to, and receive it from, the UE. It also sends the UE a RESPECT signal containing the information required for connecting to the media processor as well as information about the media to be received from other UEs.
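
The decision step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the article does not say how a media processor instance is chosen, so the least-loaded rule, the MediaProcessor class, and the plan_forwarding function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MediaProcessor:
    name: str
    sessions: int = 0


def plan_forwarding(sender, media_kinds, receivers, processors):
    """Pick a processor instance and list the receivers of each media kind."""
    # Assumption: choose the instance that currently holds the fewest sessions.
    target = min(processors, key=lambda p: p.sessions)
    target.sessions += 1
    # Every kind of media goes to all participants except the sender itself.
    routes = {kind: [r for r in receivers if r != sender] for kind in media_kinds}
    return {"processor": target.name, "routes": routes}
```

The returned plan carries the two pieces of information the text mentions: which media processor instance the UE should connect to, and which UEs each type of media is sent to.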

The Platform also uses a RESPECT signal to control the start and stop of media transmission and reception by UEs. Some communication services require a certain connection sequence between UEs. For example, a service may first connect an end user’s UE to a CP-provided UE that handles that end user’s information and then start media transmission, after which the end user’s UE is instructed to start sending and receiving media. To enable UEs to send and receive media appropriately, the Immersive RTC Platform controls UEs by combining the above functionality with a mechanism for sending notifications to the CP that is triggered by events such as a new UE connection.

These controls are predefined for each connection type. A CP can carry out the control needed for the communication service it provides by simply registering the relevant connection type in advance with the Immersive RTC Platform. The platform currently supports several major connection types, but we are developing a generic connection-type model that can be applied to a wider variety of services.
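
One way to picture a predefined connection type is as a table that maps events to the control commands issued in response. The sketch below assumes this representation; the ConnectionType class, the event name "ue_connected", and the command names are hypothetical and do not come from the Platform.

```python
class ConnectionType:
    """A registered connection type: event name -> ordered control commands."""

    def __init__(self, name, on_event):
        self.name = name
        self.on_event = on_event


def handle_event(conn_type, event, notify_cp):
    """Notify the CP of the event and return the commands to issue, in order."""
    notify_cp(event)
    return conn_type.on_event.get(event, [])


# Hypothetical 1:1 type: connect the CP-provided UE first, then start the end user's media.
avatar = ConnectionType(
    "avatar-1to1",
    {"ue_connected": [("cp_ue", "connect"), ("end_user_ue", "start_media")]},
)
```

Because the sequence lives in registered data rather than in service-specific code, a CP would only need to register the type in advance, as described above.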

3.2 Immersive RTC Platform: media processing

In the Immersive RTC Platform, two media processing components—the selective forwarding unit (SFU), which distributes audio, video and data, and the multipoint control unit (MCU), which executes media conversion and synthesis—are central components in configuring star-shaped communication paths to UEs. The star configuration has the advantage of low loads on communication paths and easy scaling when the number of users increases.
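
To make the scaling advantage concrete, the short sketch below compares the number of communication paths in a star configuration with a full mesh in which every UE connects directly to every other UE. The mesh comparison is our illustration and is not discussed in the article; the function names are hypothetical.

```python
def mesh_total_paths(n):
    """Paths in a full mesh of n UEs: one per pair of UEs."""
    return n * (n - 1) // 2


def mesh_uplinks_per_ue(n):
    """Streams each UE must send when it connects to every other UE."""
    return n - 1


def star_total_paths(n):
    """Paths in a star: one between each UE and the central SFU/MCU."""
    return n
```

For 10 UEs, a full mesh needs 45 paths and 9 uplink streams per UE, whereas a star needs 10 paths and one uplink per UE, so the load on each UE's path stays constant as users are added.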

By controlling media on the basis of control signals from the signaling controller, the SFU and MCU enable CPs to provide communication services. The following three types of connections are assumed in the communication services use case (Fig. 3).

(1) 1:1 service in which an end user controls the image of their avatar

(2) 1:N service in which live video is distributed to multiple end users

(3) N:N service in which multiple end users talk with each other or with generative AIs, or in which audio and video received from end users are subjected to emotion analysis


Fig. 3. Use case of communication services.

To provide bidirectional media communication over the Internet via network address translation (NAT), the Immersive RTC Platform uses Interactive Connectivity Establishment (ICE). ICE establishes media sessions directly between UEs and the SFU/MCU with the help of a Session Traversal Utilities for NAT (STUN) server.
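
For readers unfamiliar with STUN, the sketch below builds the 20-byte header of a STUN Binding Request as defined in the IETF STUN specification (message type 0x0001, magic cookie 0x2112A442, 96-bit transaction ID). It only illustrates the message a UE sends to a STUN server to learn its address as seen from outside the NAT; the function name is hypothetical, and the Platform's actual ICE implementation is not described in the article.

```python
import os
import struct

STUN_BINDING_REQUEST = 0x0001
STUN_MAGIC_COOKIE = 0x2112A442


def build_binding_request(transaction_id=None):
    """Return a STUN Binding Request header with no attributes (20 bytes)."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction id must be 12 bytes")
    # Layout: type (2 bytes), attribute length (2 bytes), magic cookie (4 bytes), id (12 bytes).
    return struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE) + tid
```

In practice, browsers and WebRTC libraries generate these messages themselves as part of ICE; applications normally supply only the STUN server address.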

We are developing a client library that makes it easy for UEs to handle media transmission and reception control. By using the client library, CPs can develop client-side programs that conform to the terminal browser’s API at a lower cost than developing them from scratch, which enables CPs to focus on enriching content (Fig. 4).


Fig. 4. Client library.

We are also developing simulcast, which selects and distributes an appropriate media type on the basis of the reception capability of the UE. Simulcast makes it possible to distribute media to a wide variety of devices, such as PCs and smartphones, enabling CPs to incorporate media distribution into communication services that target a wide range of users and usage scenarios.
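
The selection step can be sketched as follows, assuming that the sender provides several encodings of the same media and that reception capability is expressed as an available bitrate. The layer names, the bitrates, and the selection rule are hypothetical values chosen for this example.

```python
# Hypothetical simulcast encodings: (layer name, bitrate in kbit/s).
LAYERS = [("low", 150), ("mid", 500), ("high", 1500)]


def select_layer(receiver_kbps, layers=LAYERS):
    """Pick the highest-bitrate layer that fits; fall back to the lowest one."""
    fitting = [layer for layer in layers if layer[1] <= receiver_kbps]
    if fitting:
        return max(fitting, key=lambda layer: layer[1])[0]
    return min(layers, key=lambda layer: layer[1])[0]
```

With these values, a receiver that can take 600 kbit/s is given the "mid" layer, while one that can take 2000 kbit/s is given "high".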

3.3 Immersive RTC Platform: provision of an API for CPs

For the signaling controller and media processor to operate, they require information specific to the communication service a CP provides, such as the connection type of the communication service.

The Immersive RTC Platform provides a service control API in the REST (Representational State Transfer) format. This API enables CPs to register and delete these types of information.

The API has authentication and authorization functions based on credential information, such as API keys provided by an external authentication infrastructure, and verifies that given operations are being carried out by a communication service operator of a correctly authorized CP. It also has security functions, such as restricting the number of simultaneous requests, to ensure stable operation.
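
The two checks described above, credential verification and a limit on simultaneous requests, can be sketched as a small gate placed in front of the API operations. This is a minimal sketch under stated assumptions: the ServiceControlGate class is hypothetical, an in-memory dictionary stands in for the external authentication infrastructure, and the HTTP-style status codes are our choice.

```python
import hmac
import threading


class ServiceControlGate:
    """Authenticates a caller by API key and caps simultaneous requests."""

    def __init__(self, api_keys, max_concurrent):
        self._api_keys = api_keys  # API key -> CP identifier (stand-in for an external service)
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def _authenticate(self, presented):
        for key, cp in self._api_keys.items():
            # Constant-time comparison avoids leaking key contents through timing.
            if hmac.compare_digest(key, presented):
                return cp
        return None

    def handle(self, api_key, operation):
        cp = self._authenticate(api_key)
        if cp is None:
            return 401, None  # unknown credential
        if not self._slots.acquire(blocking=False):
            return 429, None  # too many simultaneous requests
        try:
            return 200, operation(cp)
        finally:
            self._slots.release()
```

Rejecting excess requests immediately, rather than queuing them, keeps the load on the signaling controller and media processor bounded; whether the Platform rejects or queues is not stated in the article.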

3.4 Immersive RTC Platform: component monitoring and operational control

If the signaling controller or media processor were to stop functioning, the operation of the communication services that incorporate the Immersive RTC Platform would be affected, which in turn would affect the end user’s service experience. This section introduces the monitoring and operational control used by the Immersive RTC Platform to ensure stable operation.

The Immersive RTC Platform runs on a public cloud and uses the public-cloud-provided monitoring function to monitor the basic status of each instance of the signaling controller and media processor. Since the public-cloud-provided function only monitors and controls the operating state from the perspective of the public cloud service, it cannot guarantee that the Immersive RTC Platform is in such a state that it can provide normal functionality. Therefore, the Immersive RTC Platform is equipped with a mechanism for executing those monitoring and operational control functions that are not supported by the public cloud. Application-level monitoring is adopted as follows. Using the same mechanism as that used by the signaling controller and media processor for their collaboration, the monitoring unit periodically sends monitoring signals. If the monitored instance does not return a normal response, the monitoring unit notifies the operator of an error, enabling early detection of a state in which functions cannot be provided.
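
A single round of the application-level monitoring described above can be sketched as follows. The probe and notify_operator callables and the "ok" response value are assumptions for illustration; in the Platform, the probe would use the same mechanism that the signaling controller and media processor use to collaborate.

```python
def check_instances(instances, probe, notify_operator):
    """Probe each instance once and report those without a normal response."""
    failed = []
    for name in instances:
        try:
            healthy = probe(name) == "ok"
        except Exception:
            # No reply, a timeout, or a transport error all count as abnormal.
            healthy = False
        if not healthy:
            failed.append(name)
            notify_operator(name)
    return failed
```

A scheduler would call check_instances at a fixed interval, which is left out here to keep the sketch short.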

One function for controlling each component of the Immersive RTC Platform enables the operator to inhibit, or cancel the inhibition of, the establishment of new control sessions within instances of the signaling controller and media processor. With this function, the operator can stop an instance after the number of control sessions for that instance falls to zero, reducing the number of units of each component without affecting the communication service provided.
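
The inhibit-and-drain behavior can be sketched with a small state holder per instance. The Instance class and its method names are hypothetical and serve only to show the rule that an instance may be stopped once new sessions are inhibited and the existing ones have ended.

```python
class Instance:
    """One signaling controller or media processor instance."""

    def __init__(self, name):
        self.name = name
        self.inhibited = False
        self.sessions = 0

    def open_session(self):
        """Accept a new control session unless establishment is inhibited."""
        if self.inhibited:
            return False
        self.sessions += 1
        return True

    def close_session(self):
        self.sessions = max(0, self.sessions - 1)

    def can_stop(self):
        """Safe to stop only when inhibited and no control session remains."""
        return self.inhibited and self.sessions == 0
```

Cancelling the inhibition is simply a matter of setting inhibited back to False, after which the instance accepts new sessions again.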

3.5 Immersive RTC Platform: test and deployment strategies

We use agile development so that the Immersive RTC Platform can quickly respond to changes in connected devices and environments. The current development team uses a two-week sprint cycle. We adopted a variety of measures to quickly deploy each sprint’s deliverables and evaluate their quality. Three measures we have adopted from the viewpoint of quality control are introduced below.

The first is regression testing of existing functions. Regression testing is carried out as automated end-to-end (E2E) testing from a web browser. This testing ensures that existing functions continue to work correctly when new functions are added or code is changed. Specifically, automated testing is conducted on each deployment in collaboration with continuous integration tools. This process enables early detection and correction of bugs, ensuring quality while maintaining development speed.

The second is the evaluation of newly released functions. It is important that we start making the evaluation plan for new functions during the development sprint. During this sprint, we formulate detailed development requirements and test plans. At this stage, we clearly define user stories and acceptance criteria and create test cases. After determining the scope of unit testing for each component, we conduct test design at the phases of integration testing and E2E testing. This enables us to conduct testing in phases as development progresses and check quality so that no major rework becomes necessary.

The third is the establishment of a deployment pipeline to ensure continuous deployment. Deployment in each environment (development, staging, and production) is executed through an automated deployment pipeline. This ensures consistency in the deployment process, reduces manual intervention, and increases deployment speed. Differences between environments can also be easily managed in the repository. This enables early identification of the causes of malfunctions that arise due to differences in the environment.
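
When each environment's settings are kept in the repository, the differences between two environments can be listed mechanically. The sketch below assumes that each environment is described by a flat key-value mapping; the config_diff function and the example keys are hypothetical.

```python
def config_diff(base, other):
    """Return {key: (value in base, value in other)} for keys that differ."""
    keys = sorted(set(base) | set(other))
    return {k: (base.get(k), other.get(k)) for k in keys if base.get(k) != other.get(k)}


# Hypothetical settings for two environments.
staging = {"replicas": 1, "log_level": "debug", "region": "tokyo"}
production = {"replicas": 4, "log_level": "info", "region": "tokyo"}
```

Here config_diff(staging, production) reports only replicas and log_level, which narrows the search when a malfunction appears in one environment but not the other.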

Through these efforts, we efficiently deploy the deliverables of each two-week sprint and rapidly evaluate their quality. We are continually improving these processes and searching for better methods.

4. Toward diverse communication services

We will expand the range of communication services that can incorporate the Immersive RTC Platform. Communication services are currently closed within each CP. By opening them up, we aim to promote the creation of various services driven by new ideas so that end users will be able to enjoy diverse and complex services provided by a wide variety of CPs.

References

[1] 3GPP TS 26.506: “5G Real-time Media Communication Architecture (Stage 2).”
[2] 3GPP TR 26.930: “Study on the enhancement for Immersive Real-Time communication for WebRTC.”
Yusuke Hara
Senior Research Engineer, Network Control Software Project, NTT Network Innovation Center.
He received a B.S. and M.S. from Osaka University in 2008 and 2010. From 2010, he was engaged in the development of datacenter infrastructure management software at a system vendor. Since joining NTT Network Innovation Center in 2024, he has been engaged in the R&D of immersive real-time communication systems.
Akira Ijuin
Research Engineer, Network Control Software Project, NTT Network Innovation Center.
He received a B.S. and M.S. from Keio University in 2007 and 2009. Since joining NTT Network Service Systems Laboratories in 2009, he has been engaged in the R&D of telecommunication systems and immersive real-time communication systems. He is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan.
Yasuharu Yamashita
Senior Research Engineer, Network Control Software Project, NTT Network Innovation Center.
Since joining NTT Switching System Laboratories in 1990, he has been engaged in the R&D of advanced IN (Intelligent Network) of public switched telephone networks, IP telephone control systems, WebRTC control systems, etc.
Kenta Maeda
Senior Research Engineer, Network Control Software Project, NTT Network Innovation Center.
Since joining NTT Network Innovation Center in 2021, he has been engaged in IP telephone control systems, WebRTC control systems, etc.
Rihito Suzuki
Researcher, NTT Network Service Systems Laboratories.
He received a B.E. in electronic information communication engineering from the University of Tokyo in 2018. Since joining NTT Network Service Systems Laboratories in 2021, he has been involved in the standardization of immersive real-time communication in 3GPP SA4.