Feature Articles: Network Technology for Digital Society of the Future—Toward Advanced, Smart, and Environmentally Friendly Operations

Failure Point Estimation Using Rule-based Learning

Naomi Murata, Fumika Asai, Taisuke Yakawa, Satoshi Suzuki, Haruo Oishi, and Akira Inoue

Abstract

NTT Access Network Service Systems Laboratories aims to achieve smart and advanced network operations supporting the digital transformation of the NTT Group. This article introduces a means of failure point estimation using rule-based learning that immediately presents potential failure points at the time of a failure. This estimation technique is based on technology for autonomously deriving cause-and-effect relationships (rules) between failure points and alarms.

Keywords: rule-based learning, failure point estimation, Network-AI


1. Introduction

A failure in a large-scale network generates many alarms. A skilled maintenance operator must then analyze these alarms and isolate the failure point by testing or other means. We are researching and developing failure point estimation technology using rule-based learning to shorten this analysis and troubleshooting work and, through prompt failure recovery, to reduce the burden of maintenance tasks (Fig. 1). The technology is also expected to reduce operating expenses.


Fig. 1. Prompt recovery from failure.

2. Failure point estimation using rule-based learning

In this section, we describe the key features of our failure point estimation technology.

2.1 Reduction of operator analysis/troubleshooting work

Failure point estimation using rule-based learning is based on decision-making using rules. A rule is an if-then construct of the form “if condition then conclusion,” expressing the conclusion that follows when a certain condition holds. Applied to network failures, the if portion of a rule designates a combination of events (an event group), such as alarms and log information originating in network equipment at the time of a failure, and the then portion designates the cause and location of that failure. When a failure occurs, comparing the observed alarms with these rules efficiently yields candidate failure points in the network. A maintenance operator can then respond to the failure based on these candidates. This reduces the workload of time-consuming alarm analysis and troubleshooting-related diagnosis, and it offers the potential for failure response that does not depend on operator skill.
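As a minimal sketch of the if-then matching described above, the following Python example treats a rule as firing when every event in its condition appears among the observed events. The `Rule` class, the `estimate_failure_points` function, and the alarm and cause strings are hypothetical names introduced for illustration only; the article does not specify the actual rule format or matching semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical if-then rule: if all events in `condition` are observed,
    conclude `conclusion` (the cause and location of the failure)."""
    condition: frozenset  # event group: alarm/log identifiers
    conclusion: str       # cause and location of the failure


def estimate_failure_points(observed_events, rules):
    """Return the conclusions of all rules whose condition is fully observed.

    Assumption: a rule fires when its event group is a subset of the
    observed events. A real rule engine may use richer matching.
    """
    observed = set(observed_events)
    return [rule.conclusion for rule in rules if rule.condition <= observed]


# Illustrative rules; the identifiers are invented for this sketch.
RULES = [
    Rule(frozenset({"LINK_DOWN:sw1-p3", "LOS:olt2"}),
         "fiber cut between sw1 and olt2"),
    Rule(frozenset({"FAN_FAIL:sw1", "TEMP_HIGH:sw1"}),
         "cooling failure in sw1"),
]

candidates = estimate_failure_points(
    ["LOS:olt2", "LINK_DOWN:sw1-p3", "TEMP_HIGH:sw1"], RULES
)
print(candidates)  # ['fiber cut between sw1 and olt2']
```

Only the first rule fires here, because the second rule also requires a fan-failure alarm that was not observed.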

2.2 Systemization

By combining this technology with a commercially available rule engine (an engine that performs processing based on if-then rules), we constructed a highly accurate failure point estimation system using rule-based learning, as shown in Fig. 2. The system holds the configuration information of the managed environment as topology data in a format that it can analyze. When a failure occurs in the target environment, the system takes an event group consisting of alarm and log data as input and presents the operator with failure points estimated from the rules. If no rule corresponding to the current failure case has been registered, the operator can enter the correct cause of the failure through a graphical user interface. The case is then saved as a past failure example, ready for rule learning.


Fig. 2. Failure point estimation system using rule-based learning.
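The flow in Fig. 2 might be sketched roughly as follows: the system estimates candidates from registered rules and, when no rule matches, stores the operator-supplied cause as a past failure example. The `EstimationSystem` class, its methods, and the `ask_operator` callback (standing in for the graphical user interface) are assumptions made for this illustration. Topology data and the commercial rule engine are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class EstimationSystem:
    """Hypothetical stand-in for the system in Fig. 2 (topology data omitted)."""
    # Maps an event group (frozenset of event IDs) to a failure cause/point.
    rules: dict = field(default_factory=dict)
    # Past failure examples: (event group, correct cause) pairs kept for learning.
    past_cases: list = field(default_factory=list)

    def estimate(self, events):
        """Return causes of all rules whose event group is fully observed."""
        observed = frozenset(events)
        return [cause for condition, cause in self.rules.items()
                if condition <= observed]

    def handle_failure(self, events, ask_operator):
        """Present candidates; if none exist, record the operator's answer."""
        candidates = self.estimate(events)
        if candidates:
            return candidates
        # No registered rule covers this case: ask the operator for the
        # correct cause and save the case for later rule learning.
        cause = ask_operator(events)
        self.past_cases.append((frozenset(events), cause))
        return []


system = EstimationSystem()
result = system.handle_failure(
    ["LOS:olt2"],
    ask_operator=lambda events: "olt2 optical module fault",  # invented answer
)
print(result, len(system.past_cases))  # [] 1
```

Keeping unmatched cases separate from the rules reflects the article's description that saved cases are "ready for rule learning" rather than immediately turned into rules.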

Here, rule learning does more than add a new rule. It also uses the rule set, including the added rules, to examine whether all stored past failure examples can still be judged correctly. Each past failure example consists of an event group made up of alarm and log data, plus the cause and point of failure for that case. Because the know-how of maintenance operators who perform actual failure analysis and troubleshooting is learned in the form of rules, this system can also help convert failure-response actions (operator know-how) into knowledge.
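One way to picture this check is the sketch below, which accepts a candidate rule only if every stored failure example, including the new one, is still judged correctly by the resulting rule set. The article does not describe the actual learning algorithm, so the `learn_rule` function, the "exactly the recorded cause" acceptance criterion, and the event names are all assumptions for illustration.

```python
def estimate(rules, events):
    """Return the set of causes concluded by rules whose events are all observed."""
    observed = frozenset(events)
    return {cause for condition, cause in rules if condition <= observed}


def learn_rule(rules, past_cases, new_case):
    """Try to add a rule derived from `new_case`.

    Assumed criterion: the rule is kept only if every past case and the new
    case yield exactly their recorded cause under the trial rule set.
    Returns (rule set, accepted flag); inputs are not modified.
    """
    events, cause = new_case
    trial_rules = rules + [(frozenset(events), cause)]
    for case_events, case_cause in past_cases + [new_case]:
        if estimate(trial_rules, case_events) != {case_cause}:
            return rules, False  # the new rule would break a stored case
    return trial_rules, True


# Invented example data.
rules0 = [(frozenset({"A1", "A2"}), "cause X")]
cases0 = [(["A1", "A2"], "cause X")]

# A rule on an unrelated event does not disturb the stored case.
rules1, ok1 = learn_rule(rules0, cases0, (["A3"], "cause Y"))

# A rule on "A1" alone would also fire on the stored {A1, A2} case,
# adding a second cause to it, so it is rejected.
rules2, ok2 = learn_rule(rules1, cases0, (["A1"], "cause Z"))

print(ok1, ok2, len(rules2))  # True False 2
```

A rejected rule in this sketch signals that the candidate condition is too general; a real system might instead refine the condition rather than discard the case.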

3. Future outlook

Going forward, we plan to study ways of improving the accuracy of failure point estimation by using enhanced learning algorithms and to expand the application scope of the proposed technology.

Naomi Murata
Researcher, Access Network Operation Project, NTT Access Network Service Systems Laboratories.
She joined NTT Communications in 2006 and is currently engaged in developing operation support systems of access networks.
Fumika Asai
Researcher, Access Network Operation Project, NTT Access Network Service Systems Laboratories.
She received an M.E. in civil engineering from Tokyo Institute of Technology in 2014. She joined NTT EAST in 2014 and is currently developing operation support systems of access networks at NTT.
Taisuke Yakawa
Research Engineer, Access Network Operation Project, NTT Access Network Service Systems Laboratories.
He joined NTT WEST in 2000 and is currently involved in developing operation support systems of access networks.
Satoshi Suzuki
Senior Research Engineer, Access Network Operation Project, NTT Access Network Service Systems Laboratories.
He received an M.E. in global environmental engineering from Kyoto University in 1995. Since joining NTT in 1995, he has mainly been researching and developing network operation support systems in access networks and wide area Ethernet networks.
He is a member of the Institute of Electronics, Information and Communication Engineers.
Haruo Oishi
Senior Research Engineer, Supervisor, Access Network Operation Project, NTT Access Network Service Systems Laboratories.
He received an M.E. from Tokyo Institute of Technology in 1999. He joined NTT in 1999 and is currently engaged in the development of operation support systems of access networks.
Akira Inoue
Senior Research Engineer, Supervisor, Access Network Operation Project, NTT Access Network Service Systems Laboratories.
He received an M.E. in mechanical engineering from Osaka University in 1994. He joined NTT in 1994 and is currently engaged in the development of operation support systems of access networks.
