Feature Articles: Network Technology for Digital Society of the Future—Toward Advanced, Smart, and Environmentally Friendly Operations
Failure Localization in Optical Transmission Networks
The core network is the backbone of society’s telecommunications infrastructure, and it therefore requires rapid failure localization in order to handle diverse types of failures. This article introduces a failure localization method being studied by NTT Network Service Systems Laboratories in collaboration with an NTT Group company.
Keywords: core network, failure localization, optical parameters
The core network continues to increase in capacity to support a wide variety of services, and consequently, rapid failure localization is necessary when a failure occurs. The traditional role of the network operator has been to understand signal quality within the network by monitoring alarms issued by transmission equipment and performance monitor (PM) information, to identify the failure location based on that information whenever a failure occurs, and to carry out facility restoration.
However, there are times when isolating the location of a failure from alarms and PM information can be difficult. For example, if the mechanism that adjusts the optical power of wavelength division multiplexing signals for each optical path should fail, the optical power of a certain optical path may increase. This increase in optical power on that path can intensify the nonlinear effects of the optical fiber and degrade signal quality, which can spread to other optical paths propagating along the same optical fiber. This degradation in signal quality can be detected at a transponder (TRPD) where optical signals terminate, but since the site that raises the alarm differs from the actual location of the failure, time is needed to determine the failure’s impact and to troubleshoot the cause, and even more time is needed to completely restore facilities.
2. Failure localization method
NTT Network Service Systems Laboratories, in cooperation with an NTT Group company, is studying a method that enables rapid failure localization, drawing on actual examples of anomalies and major failures. An overview of the proposed failure localization method is shown in Fig. 1. We consider the case in which transmission equipment implementing the optical power adjustment mechanism has failed in one of the many NTT central offices that make up the core network. First, in Step 1, a failure is detected based on temporal degradation in signal quality as monitored by a TRPD. Next, the correlation between signal quality and high-resolution time-series data of optical parameters is analyzed. These parameters, which are newly monitored in this method, include phase, amplitude, frequency, and polarization.
The correlation analysis is done to identify which parameters are contributing to the degradation in signal quality. Since the optical parameters are related to the state of transmission links, this information can be used to infer the cause of failure. The result of Step 1 (inference result) is forwarded to the network control server from the TRPD that terminates the optical paths with degraded signal quality.
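As a rough illustration of the failure detection part of Step 1, the following Python sketch flags the point at which signal quality shows a sustained degradation over time. It is not the detection logic actually used in the TRPD: the function name, the baseline window, the multiplication factor, and the run length are all assumptions chosen for this example, and the BER values are made up.

```python
from statistics import mean


def detect_degradation(ber_series, baseline_len=5, factor=10.0, min_run=3):
    """Return the index where a sustained BER degradation starts, or None.

    Hypothetical rule (an assumption, not taken from the article): a sample is
    "degraded" when it exceeds `factor` times the mean of the first
    `baseline_len` samples, and a failure is declared once `min_run`
    consecutive samples are degraded.
    """
    if len(ber_series) <= baseline_len:
        return None
    threshold = mean(ber_series[:baseline_len]) * factor
    run = 0
    for i in range(baseline_len, len(ber_series)):
        if ber_series[i] > threshold:
            run += 1
            if run == min_run:
                return i - min_run + 1  # first sample of the degraded run
        else:
            run = 0
    return None


# Made-up BER time series: stable at first, then clearly degraded.
ber = [1e-6, 1.1e-6, 0.9e-6, 1e-6, 1e-6, 1.2e-6, 5e-5, 6e-5, 8e-5, 9e-5]
onset = detect_degradation(ber)
print("degradation starts at sample", onset)
```

Requiring several consecutive degraded samples is one simple way to avoid reacting to a single noisy measurement; other criteria would work equally well in this role.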
Step 1 could instead be performed on the network control server, but this would send a massive amount of data across the data communication network (DCN), which is the IP (Internet protocol) network used for monitoring and control that sits between the transmission equipment and the network control server, and would create congestion. The proposed method keeps this data off the DCN by performing failure detection and failure cause inference at the TRPD and forwarding only the inference results to the network control server.
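To give a feel for why forwarding only the inference result reduces the load on the DCN, the Python sketch below compares the size of a compact result message with the size of the raw time-series data it summarizes. The message format, field names, sample counts, and bytes per sample are all hypothetical; the article does not specify any of them.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InferenceResult:
    # All field names are hypothetical; no message format is defined in the article.
    trpd_id: str
    optical_path_id: str
    onset_index: int
    contributing_parameters: list


def encode_result(result):
    """Serialize the inference result as it might be sent over the DCN."""
    return json.dumps(asdict(result)).encode("utf-8")


def raw_payload_bytes(n_samples, n_parameters, bytes_per_sample=4):
    """Size of the raw time series if it were forwarded unprocessed."""
    return n_samples * n_parameters * bytes_per_sample


result = InferenceResult("trpd-01", "path-A", 6, ["polarization"])
summary_size = len(encode_result(result))
# Assumed volume: 100,000 samples for each of four optical parameters.
raw_size = raw_payload_bytes(n_samples=100_000, n_parameters=4)
print(f"inference result: {summary_size} bytes, raw data: {raw_size} bytes")
```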
In Step 2, we use a network control server that we are studying in collaboration with NTT Communications. Using the results of Step 1 together with the network topology information and the route information of optical paths, the network control server narrows down the failure coverage area to the level of individual NTT central offices. It then combines the failure cause inferred in Step 1 with conventional technology to determine which package needs replacing, after which facility restoration is carried out.
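One simple way to picture how topology and route information can narrow down the failure area in Step 2 is sketched below in Python. It is a heuristic assumed for illustration only, not the algorithm running on the network control server: it keeps the links between central offices that every degraded optical path traverses and discards links that are also used by a healthy path. The office names, path identifiers, and function names are hypothetical.

```python
def path_links(route):
    """Directed links (pairs of central offices) traversed by an ordered route."""
    return set(zip(route, route[1:]))


def localize_failure(topology, routes, degraded_paths):
    """Return candidate links for the failure location.

    Assumed heuristic: a candidate link exists in the topology, is traversed
    by every degraded path, and is not traversed by any healthy path.
    """
    degraded = [path_links(routes[p]) for p in degraded_paths]
    if not degraded:
        return set()
    candidates = set.intersection(*degraded) & topology
    for path_id, route in routes.items():
        if path_id not in degraded_paths:
            candidates -= path_links(route)
    return candidates


# Hypothetical topology: links between central offices A to F.
topology = {("A", "B"), ("B", "C"), ("C", "D"), ("C", "F"), ("B", "E")}
routes = {
    "p1": ["A", "B", "C", "D"],  # degraded
    "p2": ["A", "B", "C", "F"],  # degraded
    "p3": ["A", "B", "E"],       # healthy
}
suspects = localize_failure(topology, routes, degraded_paths={"p1", "p2"})
print(suspects)
```

In this made-up example both degraded paths share the links A to B and B to C, but the healthy path p3 also uses A to B, so only the link from office B to office C remains as a candidate.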
3. Failure detection and failure cause inference at TRPD
The process in Step 1, up to failure cause inference, is shown in detail in Fig. 2. The TRPD is a transmission package that converts an optical signal into a client signal. First, an optical device within the TRPD performs photoelectric conversion while preserving optical parameter information. Next, a digital signal processor (DSP) compensates the optical parameters, and the symbol detection process converts the signal into a bit sequence, at which point the optical parameter information is discarded. Finally, an optical transport network (OTN) framer performs bit error correction and deframing to produce a client signal. Signal quality is conventionally monitored using a parameter such as bit error rate (BER), which becomes available only after the optical parameter information has been lost. While this conventional monitoring can detect failures, it cannot identify the optical parameters contributing to the signal quality degradation.
By extracting multiple optical parameters from the DSP, the proposed method can monitor the state of transmission links and infer the cause of a failure. Commercial systems are currently unable to monitor the state of transmission links, so the method collects high-resolution time-series data of both the BER output by the OTN framer, as is done at present, and the optical parameters obtained from the DSP. Failures are detected from the temporal degradation in signal quality. A correlation analysis is then performed between the time-series data of each optical parameter and that of signal quality, and the correlation is evaluated against a criterion, for instance a correlation coefficient close to 1. In this way, the proposed method can identify the optical parameters contributing to signal degradation (infer the cause of failure).
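The correlation analysis described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation in the TRPD: it uses the Pearson correlation coefficient, treats a magnitude of 0.9 or more as "close to 1" (both the threshold and the use of the absolute value are assumptions), and runs on made-up time series. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def contributing_parameters(quality, parameters, threshold=0.9):
    """Return the optical parameters whose correlation with signal quality
    meets the (assumed) criterion |r| >= threshold, with their coefficients."""
    scores = {name: pearson(series, quality) for name, series in parameters.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Made-up data in arbitrary units: BER rises toward the end of the window.
ber = [1, 1, 1, 2, 4, 8]
optical_parameters = {
    "polarization": [0.1, 0.1, 0.1, 0.2, 0.4, 0.8],  # tracks the BER
    "frequency": [5, 3, 4, 5, 3, 4],                 # unrelated fluctuation
    "amplitude": [1, 1, 1, 1, 1, 1],                 # constant
}
causes = contributing_parameters(ber, optical_parameters)
print(sorted(causes))
```

With these illustrative series only the polarization data moves together with the BER, so it is the only parameter reported as contributing to the degradation.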
4. Future plan
Because the location that issues an alarm can differ from the actual failure location, some failures have required considerable time to assess their impact and to troubleshoot their cause. The proposed method is aimed at localizing such failures. Since it obtains optical parameters as high-resolution time-series data, it also looks promising for the early detection and prediction of failures. Desktop studies and the development of prototype equipment are currently in progress, and verification experiments began in the spring of 2019.
Development of a failure localization method is necessary for core networks that are expected to offer increased capacity in the future. We hope to advance this study and contribute to reducing the maintenance and operation overheads of the NTT Group.