Regular Articles

Vol. 13, No. 8, pp. 53–59, Aug. 2015. https://doi.org/10.53829/ntr201508ra2

Improving User Capacity and Disaster Recovery Time in IP Telephone Service Systems

Hiroshi Shibata, Kouki Minamida, Hiroshi Miyao,
Takashi Nambu, and Toru Takahashi

Abstract

The NTT Group is working to improve Internet protocol (IP) telephone networks and to solve several problems associated with them. Specifically, efforts are underway to improve the accommodation rate, to reduce network operating costs as the number of network users increases, and to reduce service recovery time when a network is damaged by a natural disaster. To resolve these problems, we developed a system architecture that simplifies the operation of IP telephone networks and uses network equipment much more efficiently. The architecture is made possible by a subscriber data management server. We describe here our new network architecture, the mechanisms that solve these problems, and the effect on IP telephone networks.

Keywords: IP telephone, disaster recovery, call session control function server, home subscriber server

1. Introduction

Essential social infrastructures such as electric power systems, water systems, and communication systems were destroyed over a huge area after the Great East Japan Earthquake struck on March 11, 2011, and these systems could not be used for a long time after that. Specifically, 29,000 mobile communications base stations and 1.9 million fixed telephone lines could not be used. It took about three days to recover telephone services provided by the NTT Group [1]. As a social infrastructure provider, the NTT Group learned from this experience that network operators must ensure consistent availability of services and early recovery of services even after wide-area disasters.

The devices used for Internet protocol (IP) telephone service range from smartphones to conventional fixed telephones. Therefore, IP telephone services must accommodate frequent terminal location registrations and high-frequency use of each terminal.

We describe here the new IP telephone network architecture that will improve the flexibility of subscriber accommodation, and we explain the results of a study on how the new network architecture reduces the time to recover from a disaster.

2. Characteristics of and problems with IP telephone network architecture

Currently, the NTT Group offers a variety of IP telephone services such as Hikari Denwa and the 050 number voice over IP (VoIP) service. The IP Multimedia Subsystem (IMS) architecture [2–4] is partially adopted in these IP telephone services.

The IP telephone service was initially available for fixed telephones via VoIP routers for private use. Office use, such as the pilot telephone number service, also began in conventional fixed telephone networks. Therefore, we adopted a network architecture that efficiently performs a call control process; the correspondence between the call control server (the call session control function (CSCF)) and the user (telephone number) is fixed, and the CSCF is designed to accommodate each number range, the same as for the traditional public switched telephone network (PSTN). The CSCF holds the subscriber data (service contract information and authentication information) related to the telephone number to be accommodated [5]. Routing information to identify the CSCF that accommodates the number range is managed by each CSCF. Therefore, the CSCF that receives a call request from a terminal (indicated as (1) in Fig. 1) determines the destination CSCF that accommodates the terminal by using its own routing information (2), and transfers the call request from the terminal to the destination CSCF (3). In addition, to maintain reliability and non-interrupted service during system changes, each CSCF adopts an active and standby (ACT/SBY) dual system as a redundant system.
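The call flow (1) to (3) described above can be sketched as follows. This is only an illustrative sketch: the number ranges, the CSCF names, and the function names are hypothetical and do not come from the deployed system. It shows why every CSCF needs its own copy of the number-range routing table in the old architecture.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical number ranges and CSCF names, for illustration only.
# Each entry is (first number, last number, CSCF accommodating the range).
ROUTING_TABLE = [
    (5000000, 5099999, "cscf-a"),
    (5100000, 5199999, "cscf-b"),
    (5200000, 5299999, "cscf-c"),
]
RANGE_STARTS = [first for first, _, _ in ROUTING_TABLE]


def find_destination_cscf(number: int) -> Optional[str]:
    """Step (2): look up the CSCF whose fixed number range contains `number`."""
    index = bisect_right(RANGE_STARTS, number) - 1
    if index < 0:
        return None
    first, last, cscf = ROUTING_TABLE[index]
    return cscf if first <= number <= last else None


def handle_call_request(receiving_cscf: str, callee: int) -> str:
    """Steps (1) to (3): receive the request, resolve the destination, forward it."""
    destination = find_destination_cscf(callee)
    if destination is None:
        return "reject: number not accommodated"
    if destination == receiving_cscf:
        return "handle locally"
    return f"forward to {destination}"
```

Because the table is fixed per number range and replicated in each CSCF, moving users from one CSCF to another means editing this table everywhere.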


Fig. 1. Old IP telephone network architecture.

When extending the system to cope with growth in the number of users and the amount of traffic, it is conventionally assumed that traffic is proportional to the number of users; resources are therefore dimensioned by calculating the load from the traffic rate and average holding time, and equipment is added as traffic increases. However, softphone users, for example, include many non-active users* who install an application on their terminal but rarely use it, so the number of users can be large while the load on the IP telephone network is low. In that case, the CPU (central processing unit) utilization is low even when a CSCF holds the maximum amount of subscriber data it can accommodate; conversely, the CPU can be fully loaded while subscriber data capacity remains unused. Either way, equipment resources cannot be fully used. As a result, equipment utilization efficiency is low compared to the older IP telephone network (Problem 1).
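A small worked example may make the mismatch concrete. The capacity figures and user mixes below are hypothetical and chosen only for illustration; they are not measurements from the NTT network.

```python
def utilization(users: int, busy_hour_calls: int,
                max_users: int, max_busy_hour_calls: int) -> tuple:
    """Return (subscriber-data utilization, CPU utilization) as fractions."""
    return users / max_users, busy_hour_calls / max_busy_hour_calls


# Hypothetical capacity of one CSCF.
MAX_USERS = 100_000
MAX_BUSY_HOUR_CALLS = 200_000

# Mostly non-active softphone users: data capacity is exhausted first.
softphone = utilization(100_000, 20_000, MAX_USERS, MAX_BUSY_HOUR_CALLS)

# Heavily used lines: CPU capacity is exhausted first.
office = utilization(20_000, 200_000, MAX_USERS, MAX_BUSY_HOUR_CALLS)

print(softphone)  # (1.0, 0.1)
print(office)     # (0.2, 1.0)
```

In both cases one resource is saturated while the other sits largely idle, which is the inefficiency labeled Problem 1.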

Load balancing across CSCFs would mitigate this, but it requires changing the routing information in every CSCF so that each one can identify the new destination CSCF of the relocated users, which is a significant operational burden (Problem 2).

In addition, with the current redundant configuration, both the ACT and SBY systems could be damaged in a large-scale disaster, which raises a problem for the availability of the social infrastructure, namely the recovery time when both systems fail (Problem 3). We explain the details of this problem in the next section.

Our solution to these problems is to separate the subscriber data and routing information from the CSCF. The home subscriber server (HSS) manages the subscriber data and routing information instead, so the relationship between telephone numbers and CSCFs becomes flexible. We refer to this as the CSCF-HSS separated architecture. Note that in separating the two, we must take into account the impact on services specific to conventional fixed telephone networks, such as the pilot telephone number and call pick-up services (Problem 4).

* A non-active user is an IP telephone service subscriber who has not used an IP telephone for a given period (for example, a month). All other subscribers are active users.

3. Issues regarding service recovery time after natural disaster

We now explain the details of Problem 3. NTT Network Service Systems Laboratories has been working to improve reliability during serious disasters by building on knowledge and experience gained in the aftermath of the Great East Japan Earthquake in 2011. With the conventional network configuration, we must assume that both systems of a redundant pair can fail simultaneously in a serious disaster. Because the subscriber data are bound to a specific CSCF, service for those subscribers can be restored only by rebuilding equivalent devices.

Specifically, the following procedure is required.

• Replace afflicted CSCF hardware.

• Restore operating system (OS) and applications.

• Restore subscriber data.

It takes several days to complete this procedure, which is itself an issue. (Note that the recovery time depends on the manpower and equipment available for the recovery work.)

4. CSCF-HSS separated architecture

We introduce here the CSCF-HSS separated architecture to solve the above-mentioned problems.

4.1 Effective use of server resources for managing users

In the CSCF-HSS separated architecture (Fig. 2), HSSs manage the master data of user profiles, and the CSCF downloads user profile data on demand from the HSSs when the user equipment registers its location information via an IP telephone. After that, the CSCF retains the user profile data as a cache during the validity period of the location information. With this mechanism, the CSCF does not retain the user profiles for non-active users who have not used an IP telephone for a certain period of time. Therefore, the CSCF can manage more user profiles of active users because it does not consume as much memory as the old architecture. Thus, we can use the CSCF’s resources effectively to solve Problem 1.


Fig. 2. CSCF-HSS separated architecture.
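The on-demand download and validity-period cache described in this section might be sketched as follows. This is a simplified model under stated assumptions: the class names `Hss` and `Cscf`, their methods, and the profile contents are hypothetical, and the real servers exchange profiles over network interfaces rather than in-process calls.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class Hss:
    """Holds the master copy of every user profile."""

    def __init__(self, profiles: Dict[str, dict]) -> None:
        self._profiles = dict(profiles)
        self.downloads = 0  # counts on-demand downloads, for illustration

    def download(self, user: str) -> dict:
        self.downloads += 1
        return self._profiles[user]


class Cscf:
    """Caches profiles only while a user's location registration is valid."""

    def __init__(self, hss: Hss,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._hss = hss
        self._clock = clock
        self._cache: Dict[str, Tuple[dict, float]] = {}

    def register(self, user: str, validity_s: float) -> None:
        """On location registration, download the profile and cache it."""
        profile = self._hss.download(user)
        self._cache[user] = (profile, self._clock() + validity_s)

    def cached_profile(self, user: str) -> Optional[dict]:
        """Return the cached profile, or None once the registration expired."""
        entry = self._cache.get(user)
        if entry is None:
            return None
        profile, expires_at = entry
        if self._clock() >= expires_at:
            del self._cache[user]  # expired: free the memory
            return None
        return profile

    def cached_users(self) -> int:
        return len(self._cache)
```

A user who never registers never occupies CSCF memory, and a user whose registration lapses is dropped from the cache, so memory is spent only on active users.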

4.2 Automatic updating of routing information

When user profile data move from one CSCF to another, the routing information used to find the CSCF that manages a given user must be updated. We therefore introduce a mechanism in which the HSS automatically updates its routing table whenever a CSCF downloads user profile data from it (Fig. 2). Individual CSCFs no longer need their own routing tables because they can find the destination CSCF that manages the user profile data through the HSS. As a result, service operators do not need to update the routing information manually, and we can solve Problem 2.
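The idea of updating routing information as a side effect of the profile download can be sketched as below. The `Hss` class and its `download` and `locate` methods are hypothetical names for illustration, and the sketch assumes that a CSCF asks the HSS which CSCF currently serves a user.

```python
from typing import Dict, Optional


class Hss:
    """Master profile store that also tracks which CSCF serves each user."""

    def __init__(self, profiles: Dict[str, dict]) -> None:
        self._profiles = dict(profiles)
        self._serving_cscf: Dict[str, str] = {}  # the routing table

    def download(self, user: str, cscf_name: str) -> dict:
        """Hand out a profile and record the requester as the serving CSCF."""
        # Recording the requester here is the "automatic update": no operator
        # action and no per-CSCF table edit is involved.
        self._serving_cscf[user] = cscf_name
        return self._profiles[user]

    def locate(self, user: str) -> Optional[str]:
        """Return the CSCF currently serving `user`, if any."""
        return self._serving_cscf.get(user)


hss = Hss({"alice": {"service": "basic"}})
hss.download("alice", "cscf-a")
first = hss.locate("alice")   # "cscf-a"
hss.download("alice", "cscf-b")
second = hss.locate("alice")  # "cscf-b"
```

The routing entry follows the user to whichever CSCF last downloaded the profile, so relocating users requires no change in the other CSCFs.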

4.3 Maintaining communication quality for fixed telephone service

We provide fixed IP telephone services for businesses and user-grouping services such as the pilot telephone number and call pick-up services. In these services, the cache of user profile data in the CSCFs must be updated whenever the master data in the HSS are updated because a service order is submitted or changed. In the IMS standard models, a CSCF downloads user profile data from an HSS on demand when it receives a call; for a grouping service, however, this means downloading the profile data of all group members, which delays call processing. Thus, communication quality decreases.

Therefore, we extended the HSS function defined in the IMS standard [6] that pushes user profile data to CSCFs. Our HSS pushes the profile data of all users in a group at the same time. In our new architecture, the CSCFs already hold all profile data for the group thanks to the extended push function, so they do not need to download the group members' profile data on demand, they access the HSSs less often, and they can respond rapidly. Thus, the communication quality of user-grouping services is maintained (Fig. 3) and we can solve Problem 4.


Fig. 3. Rapid response by pushing all group user profiles to CSCFs.
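A minimal sketch of the group push, under stated assumptions, is given below. The class and method names (`Hss.update_profile`, `Hss.push_group`, `Cscf.receive_push`), the group name, and the profile contents are hypothetical, and the standard push message itself is not modeled; the sketch only shows that one master-data update refreshes every group member's profile in the CSCF cache.

```python
from typing import Dict, List


class Cscf:
    """Holds a cache of user profiles that the HSS can refresh by push."""

    def __init__(self) -> None:
        self.cache: Dict[str, dict] = {}

    def receive_push(self, profiles: Dict[str, dict]) -> None:
        self.cache.update(profiles)


class Hss:
    """Master profile store with a group-aware push (hypothetical API)."""

    def __init__(self, profiles: Dict[str, dict],
                 groups: Dict[str, List[str]]) -> None:
        self.profiles = dict(profiles)
        self.groups = {name: list(members) for name, members in groups.items()}

    def push_group(self, group: str, cscf: Cscf) -> None:
        """Push the profiles of all members of `group` in one operation."""
        cscf.receive_push({u: self.profiles[u] for u in self.groups[group]})

    def update_profile(self, user: str, profile: dict, cscf: Cscf) -> None:
        """Apply a service-order change, then push every affected group."""
        self.profiles[user] = profile
        for group, members in self.groups.items():
            if user in members:
                self.push_group(group, cscf)


hss = Hss(
    profiles={"101": {"pickup": False}, "102": {"pickup": False}},
    groups={"sales": ["101", "102"]},
)
cscf = Cscf()
hss.update_profile("101", {"pickup": True}, cscf)
```

After the update, the CSCF cache already contains the profiles of both "101" and "102", so a later group call needs no on-demand download.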

5. New recovery procedure and effectiveness

We solve Problem 3 by creating recovery procedures for the new architecture in which HSSs are separated from CSCFs. We introduce two new recovery procedures. First, we describe the recovery procedure (pattern 1), in which the time to restore user profile data is reduced compared to the old procedure. Second, we describe the procedure (pattern 2) in which we use server resources of other surviving CSCFs to recover the failed CSCFs. We describe these procedures and their effectiveness below.


Pattern 1: Other surviving CSCFs do NOT have enough unused server resources.

(1) Recovery procedure

Step 1: Replace completely failed CSCF hardware with new hardware.

Step 2: Restore the OS and application software.

After step 2 is completed, the rebuilt CSCFs can serve users that were prevented from using IP telephone services due to failed CSCFs by downloading user profile data on demand from the HSSs (Fig. 4).


Fig. 4. CSCF recovery procedure in new architecture (pattern 1).

(2) Effectiveness

We can reduce the recovery time by skipping unnecessary steps such as restoring user profile data in the rebuilt CSCFs.


Pattern 2: Other surviving CSCFs HAVE enough unused server resources and can serve users supported by failed CSCFs.

(1) Recovery procedure

Step 1: Reassign the affected users to other surviving CSCFs instead of the failed CSCFs.

Specifically, we change the settings in the HSSs so that the surviving CSCFs serve the affected users instead of the failed CSCFs. After that, the routing tables in the HSS are updated automatically, so we do NOT need to change the routing tables of all surviving CSCFs (Fig. 5).


Fig. 5. CSCF recovery procedure in new architecture (pattern 2).
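The reassignment in step 1 of pattern 2 might look like the following sketch. The function name, the data layout, and the rule of choosing the surviving CSCF with the most spare capacity are assumptions made for illustration; the article does not specify how users are distributed among survivors. Returning `None` stands for the case in which spare resources are insufficient and pattern 1 applies instead.

```python
from typing import Dict, Optional, Set


def reassign_users(assignment: Dict[str, str], failed: Set[str],
                   spare_capacity: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Move users of failed CSCFs to surviving CSCFs in the HSS settings.

    assignment:     user -> CSCF currently set in the HSS
    failed:         names of CSCFs that failed completely
    spare_capacity: CSCF -> number of additional users it can serve
    """
    affected = [user for user, cscf in assignment.items() if cscf in failed]
    remaining = {c: n for c, n in spare_capacity.items() if c not in failed}

    # Not enough unused resources on the survivors: fall back to pattern 1.
    if sum(remaining.values()) < len(affected):
        return None

    new_assignment = dict(assignment)  # leave the input untouched
    for user in affected:
        # Assumed policy: pick the survivor with the most spare capacity.
        target = max(remaining, key=remaining.get)
        new_assignment[user] = target
        remaining[target] -= 1
    return new_assignment


current = {"u1": "cscf-a", "u2": "cscf-a", "u3": "cscf-b"}
recovered = reassign_users(current, {"cscf-a"}, {"cscf-b": 1, "cscf-c": 5})
```

Only the HSS-side assignment changes; no surviving CSCF has to be reconfigured, which is why this pattern avoids the hardware and software rebuild steps.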

(2) Effectiveness

We can recover IP telephone services for affected users without building new CSCFs to replace the failed CSCFs, so the disaster recovery time drastically decreases.

Fig. 6 compares the old and new IP telephone network architectures in terms of recovery time when redundant CSCFs fail completely. With the old architecture, it takes several days to recover services. Recovery is fastest in pattern 2, in which services can be recovered within several hours. Thus, we can solve Problem 3.


Fig. 6. Improved disaster recovery time in new architecture.

6. Future work

We plan to work on developing a much faster recovery method and will therefore investigate a recovery procedure that involves not only servers such as HSSs and CSCFs but also user equipment, transport network elements, and business support systems.

References

[1] Ministry of Internal Affairs and Communications, “Part 1: Information and Communications in the Aftermath of the Great East Japan Earthquake,” in The 2011 White Paper on Information and Communications in Japan.
http://www.soumu.go.jp/johotsusintokei/whitepaper/eng/WP2011/part1.pdf
[2] 3GPP, “TS 23.228: IP Multimedia Subsystem (IMS); Stage 2 (Release 7),” 2007.
[3] ETSI/TISPAN, “TS 182.006: IP Multimedia Subsystem (IMS); Stage 2 description,” 2008.
[4] ETSI/TISPAN, “ES 282 001: NGN Functional Architecture,” 2009.
[5] M. Miyasaka, N. Horikome, and K. Kishida, “Bandwidth Management and Control Technology in the NGN,” NTT Technical Review, Vol. 7, No. 7, Jul. 2009.
https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr200907sf5.html
[6] 3GPP, “TS 29.229: Cx and Dx Interfaces based on the Diameter Protocol; Protocol details (Release 5),” 2007.
Hiroshi Shibata
Research Engineer, NTT Network Service Systems Laboratories.
He received his B.S. and M.S. from Osaka University in 1994 and 1996, respectively. He joined NTT Network Service Systems Laboratories in 1996. His current R&D area is software engineering. He is a member of the Institute of Electronics, Information and Communication Engineers (IEICE).
Kouki Minamida
Research Engineer, NTT Network Service Systems Laboratories.
He received his B.S. and M.S. from the University of Tokyo in 1992 and 1994, respectively. He joined NTT in 1994. His current R&D area is software engineering. He is a member of IEICE.
Hiroshi Miyao
Senior Research Engineer, NTT Network Service Systems Laboratories.
He received his B.S. in physics from Ibaraki University in 1988 and his M.S. in physics from Hokkaido University in 1990. He joined NTT Transmission Systems Laboratories in 1990. His current R&D area is software engineering.
Takashi Nambu
Senior Research Engineer, NTT Network Service Systems Laboratories.
He received his B.E. and M.E. from the University of Tokyo in 1998 and 2000, respectively. He joined NTT Network Service Systems Laboratories in 2000 and engaged in VoIP system development. He is now engaged in R&D on improving software quality and productivity.
Toru Takahashi
Senior Research Engineer, Supervisor, NTT Network Service Systems Laboratories.
He received his B.E. from Tokyo University of Science in 1991 and joined NTT Communication Switching Laboratories the same year. His current R&D area is session control technologies. He is a member of IEICE.
