Feature Articles: Network Technology for Digital Society of the Future - Toward Advanced, Smart, and Environmentally Friendly Operations

Automatic Generation of Recovery-command Sequences

Takehiro Kawata, Yoichi Matsuo, Hiroki Ikeuchi,
and Yuka Hashimoto

Abstract

We describe technology for automatically generating recovery-command sequences, which is intended to support quick recovery actions by system operators and to achieve automatic recovery from ICT (information and communication technology)-system failures.

Keywords: recovery-command sequence, seq2seq, automation

1. Introduction

In current large-scale ICT (information and communication technology) systems, troubleshooting has become more complicated as the causes of network failures have diversified, and rising operational costs have become a serious problem. We are developing technology that automatically generates recovery-command sequences to help system operators recover from failures quickly and, ultimately, to automate recovery operations [1].

2. Overview of technology

An overview of our technology is shown in Fig. 1. Recovery-command sequences are estimated using a sequence-to-sequence (seq2seq) model [2], a neural network that learns the relationship between an input sequence and an output sequence (Fig. 2).


Fig. 1. Outline of automatic generation of recovery-command sequences.


Fig. 2. Estimation by seq2seq.

Seq2seq is widely used in translation systems and dialog tasks. In our technology, the input sequence is a series of log identifiers (IDs). The log IDs are generated by assigning unique numbers to the system logs and alarms related to system failures [3]. The output sequence is the recovery-command sequence, treated as a sequence of words. By learning the relationship between the input sequence and the output sequence, the model can estimate a command sequence that will restore the system when a new failure occurs.
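As an illustration of the input and output sequences described above, the following minimal Python sketch prepares training pairs for a seq2seq model. It is not the authors' implementation. The failure records, the log IDs, the commands, the `<sos>` and `<eos>` markers, and the function names `build_vocab` and `make_training_pairs` are all assumptions made for the example, as is splitting commands into words on whitespace.

```python
from typing import Dict, List, Tuple

# Hypothetical start/end markers for the output (command) sequence.
SOS, EOS = "<sos>", "<eos>"


def build_vocab(token_seqs: List[List[str]]) -> Dict[str, int]:
    """Map each distinct word to an integer index, in first-seen order."""
    vocab = {SOS: 0, EOS: 1}
    for seq in token_seqs:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def make_training_pairs(
    records: List[Tuple[List[int], str]],
) -> Tuple[List[Tuple[List[int], List[int]]], Dict[str, int]]:
    """Turn (log-ID sequence, command text) records into index sequences.

    The input side is already a sequence of log IDs, so it is kept as is.
    The output side is split into words (whitespace split is an assumption)
    and wrapped in start/end markers.
    """
    targets = [cmd.split() for _, cmd in records]
    tgt_vocab = build_vocab(targets)
    pairs = []
    for (log_ids, _), words in zip(records, targets):
        tgt = [tgt_vocab[SOS]] + [tgt_vocab[w] for w in words] + [tgt_vocab[EOS]]
        pairs.append((list(log_ids), tgt))
    return pairs, tgt_vocab


# Invented past-failure records: observed log IDs and the commands that
# an operator used for recovery.
records = [
    ([12, 7, 7, 31], "systemctl restart httpd"),
    ([4, 19], "ip link set eth0 up"),
]
pairs, tgt_vocab = make_training_pairs(records)
```

Each resulting pair can be fed to any encoder-decoder model: the log-ID sequence goes to the encoder, and the indexed command words are the decoder targets.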

Before an estimated command sequence is executed, its reliability and its impact on the system must be assessed. We define the reliability of a command sequence as the product of the generation probabilities of the words that make up the sequence. The reliability can therefore be regarded as the probability that the system recovers when the command sequence is executed. We define the impact on the system from records of how system performance was affected when recovery-command sequences were executed in past failures. These two indicators, reliability and impact, can be used to decide whether to execute the generated command sequence.
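The reliability defined above, the product of the per-word generation probabilities, can be sketched in a few lines of Python. This is a hedged illustration only. The probabilities, the numeric impact score (taken here to lie between 0 and 1), the thresholds, and the function names `sequence_reliability` and `should_execute` are assumptions; the article does not specify how impact is quantified or how the two indicators are combined.

```python
import math
from typing import Sequence


def sequence_reliability(word_probs: Sequence[float]) -> float:
    """Reliability = product of the generation probabilities of each word."""
    if not word_probs:
        raise ValueError("a command sequence needs at least one word")
    return math.prod(word_probs)


def should_execute(
    reliability: float,
    impact: float,
    min_reliability: float = 0.9,  # hypothetical threshold
    max_impact: float = 0.1,       # hypothetical threshold
) -> bool:
    """Execute only if reliability is high enough and impact is low enough."""
    return reliability >= min_reliability and impact <= max_impact


# Invented per-word probabilities for a three-word command sequence.
probs = [0.99, 0.98, 0.97]
reliability = sequence_reliability(probs)  # 0.99 * 0.98 * 0.97
decision = should_execute(reliability, impact=0.05)
```

Because the reliability is a product of values no greater than 1, it shrinks as sequences get longer, so a fixed threshold tends to favor shorter command sequences. For long sequences, summing log-probabilities is a common way to avoid numerical underflow.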

3. Future work

We will continue to work on verifying our technology by using data obtained from commercial systems and improving the accuracy of the estimated recovery-command sequences. We will also improve the definitions of the reliability and the impact from the viewpoint of practical system operation to achieve automated recovery operations.

References

[1] H. Ikeuchi, A. Watanabe, Y. Matsuo, and T. Kawata, “Automatic Generation of Recovery Command Sequences Using Seq2Seq,” IEICE General Conference, B-7-25, Tokyo, Japan, Mar. 2019 (in Japanese).
[2] I. Sutskever, O. Vinyals, and Q. Le, “Sequence to Sequence Learning with Neural Networks,” Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014.
[3] T. Kimura, A. Watanabe, T. Toyono, and K. Ishibashi, “Proactive Failure Detection Learning Generation Patterns of Large-scale Network Logs,” Proc. of the 11th International Conference on Network and Service Management (CNSM 2015), Barcelona, Spain, Nov. 2015.
Takehiro Kawata
Senior Research Engineer, Communication Traffic & Service Quality Project, NTT Network Technology Laboratories.
He received a B.E. in applied mathematics and physics from Kyoto University in 1993. Since joining NTT in 1993, he has been researching management and performance analysis of computer networks and cybersecurity. From February 2004 to January 2005, he was a visiting researcher at Columbia University, USA. He received the Institute of Electronics, Information and Communication Engineers (IEICE) Network Systems Research Award in 2007. He is a member of IEICE.
Yoichi Matsuo
Researcher, NTT Network Technology Laboratories.
He received an M.E. and Ph.D. in applied mathematics from Keio University, Tokyo, in 2012 and 2015. Since joining NTT in 2015, he has been conducting research on network management.
Hiroki Ikeuchi
Researcher, Traffic Engineering Group, Communication Traffic & Service Quality Project, NTT Network Technology Laboratories.
He received a B.S. and M.S. in physics from the University of Tokyo in 2014 and 2016. Since joining NTT in 2016, he has been researching network management. He is a member of IEICE.
Yuka Hashimoto
NTT Network Technology Laboratories.
She received an M.S. in mathematical science from Keio University, Tokyo, in 2018. Since joining NTT in 2018, she has been involved in research on automation technologies for network operation. She is a member of IEICE.