Global Standardization Activities
Vol. 20, No. 5, pp. 45–50, May 2022. https://doi.org/10.53829/ntr202205gls
Latest Research Results and ITU-T Standardization Activities on Soft Errors Caused by Cosmic Rays
Due to advances in digital transformation, it is important to take measures against soft errors caused by cosmic rays to maintain a safe and secure network. This article explains the latest research results on soft errors from NTT laboratories, commercialization of soft-error test technology, and standardization activities in the International Telecommunication Union - Telecommunication Standardization Sector (ITU-T).
Keywords: soft error, cosmic ray, ITU-T
1. Background and overview
As services become more varied and people seek greater convenience, modern social infrastructure is undergoing digital transformation. At the same time, unexplained failures of electronic devices caused by cosmic rays are increasing. When cosmic rays from outer space collide with oxygen or nitrogen atoms in the atmosphere, neutrons are generated. When these neutrons strike semiconductors inside electronic devices, they can cause soft errors, in which the data stored in the devices are rewritten. A soft error is a temporary failure (memory bit inversion) caused by electrical noise; unlike a hard error, in which a semiconductor device fails permanently, the device recovers when it is restarted or the data are overwritten. Soft errors can induce failures that critically impact social infrastructure (Fig. 1) [1, 2]. Various measures are taken to ensure stable operation of social infrastructure, such as error-prevention measures within electronic devices and redundancy in equipment and systems. However, as semiconductors become more highly integrated and miniaturized, electronic devices are increasingly affected by neutrons. As shown in Fig. 1, the soft-error failure rate has risen sharply with miniaturization, whereas the hard-error failure rate has remained essentially unchanged. For example, one semiconductor device rated at 10,000 failures in time (FIT: the number of failures per 1 billion device-hours) experiences about 0.09 failures per year. If a network is operated with 5000 units of telecommunication equipment, each containing 6 such devices (30,000 devices in total), about 2,600 failures can be expected per year. The cause of a soft error can be difficult to identify: the rewritten data may lead to a malfunction or system shutdown, but once the power is turned off, no trace is left.
Because a soft error occurs with extremely low probability in any single electronic device and is difficult to reproduce, investigating the cause and taking countermeasures can be a heavy burden for the network operator. Against this background, soft-error countermeasures have become important for telecommunication systems that require high reliability.
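The FIT arithmetic in the example above can be checked with a short sketch. The function name and the 8,760-hour year are assumptions made for illustration; the FIT rating and device counts are those quoted in the text.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored (assumption)
FIT_HOURS = 1e9            # 1 FIT = 1 failure per 10^9 device-hours


def failures_per_year(fit: float, devices: int = 1) -> float:
    """Expected failures per year for `devices` identical devices rated at `fit` FIT."""
    return fit / FIT_HOURS * HOURS_PER_YEAR * devices


# One device rated at 10,000 FIT.
per_device = failures_per_year(10_000)

# 5000 units of equipment with 6 such devices each.
fleet = failures_per_year(10_000, devices=5000 * 6)

print(f"{per_device:.4f} failures/year per device")
print(f"{fleet:.0f} failures/year across {5000 * 6} devices")
```

The per-device result rounds to the 0.09 failures per year quoted above, and the fleet figure is that rate multiplied by the 30,000 devices in the network.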
Therefore, to enable evaluation of and countermeasures against such soft errors, NTT laboratories have established a soft-error testing technology that can reproduce soft errors in a short time and calculate, with high accuracy, the soft-error occurrence rate in the natural world and in various other environments. We have also commercialized the technology and standardized it in the International Telecommunication Union - Telecommunication Standardization Sector (ITU-T) Study Group 5 (SG5).
2. Measurement of neutron energy characteristics that cause soft errors
To take measures against soft errors, designers of semiconductors and systems need to know how many soft-error-induced failures arise per hour or per day. Calculating this number for a variety of environments requires the soft-error rate, i.e., the rate at which soft errors occur at a given neutron speed or energy.
The soft-error rate varies depending on the neutron energy level. The neutrons flying in an environment have an energy distribution that varies from place to place, e.g., on the Earth, in outer space, or on other planets. Therefore, to determine the number of failures caused by soft errors, it is necessary to take the number of neutrons at a given energy in each type of environment into consideration. This is calculated as follows:
(i) Let the number of neutrons at energy E be φ(E).
(ii) Multiply this by the energy-dependent soft-error rate σ(E).
(iii) The number of failures caused by neutrons at E can be calculated by φ(E) × σ(E).
The total number of failures caused by soft errors at a given place or environment is obtained by integrating the product in (iii) over all the energies E distributed in that environment, as in Eq. (1).
$$\text{Number of failures caused by soft errors} = \int \phi(E)\,\sigma(E)\,dE \qquad (1)$$
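As a minimal sketch of how Eq. (1) could be evaluated from tabulated data, the following uses the trapezoidal rule. The function names, the units, and every number in the tables are hypothetical placeholders chosen for illustration, not measured values.

```python
def trapezoid(xs, ys):
    """Integrate tabulated y(x) with the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


def soft_error_failures(energies, flux, sigma):
    """Approximate the integral of phi(E) * sigma(E) over the tabulated energies."""
    integrand = [f * s for f, s in zip(flux, sigma)]
    return trapezoid(energies, integrand)


# Hypothetical placeholder tables (NOT measured data).
# Assumed units: E in MeV, phi in neutrons/(cm^2 s MeV), sigma in cm^2,
# so the result is in failures per second.
energies = [1.0, 10.0, 100.0, 800.0]
flux = [1e-3, 1e-4, 1e-5, 1e-7]
sigma = [1e-10, 5e-10, 1e-9, 1e-9]

print(soft_error_failures(energies, flux, sigma))
```

A finer energy grid, or integration on a logarithmic energy axis, would probably be preferable for real spectra, which span many orders of magnitude.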
Therefore, data on the neutron-energy-dependent soft-error rate ((ii) above) are essential for calculating the number of failures caused by soft errors. However, accelerator-based measurements have so far yielded soft-error rates only at discrete energies, which has made accurate calculation of the number of failures extremely difficult. An accurate calculation requires soft-error rates measured at continuously varying neutron energies, but such measurement had been considered impossible. NTT has developed an ultra-high-speed error-detection circuit that precisely measures the flight times of neutrons arriving at a semiconductor, even when their velocities are close to the speed of light. From the flight time, we can deduce the speed, and hence the energy, of the neutrons causing the soft errors. The circuit makes it possible to measure soft errors caused by neutrons across an extremely wide range of energies up to 800 MeV (Fig. 2).
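The step from flight time to neutron energy can be illustrated with standard relativistic kinematics. This is a sketch only: the function name, the 20 m flight path, and the 100 ns flight time are assumptions for illustration and do not describe the actual measurement setup.

```python
import math

C = 299_792_458.0           # speed of light in m/s
NEUTRON_MASS_MEV = 939.565  # neutron rest energy in MeV


def neutron_energy_mev(path_m: float, tof_s: float) -> float:
    """Kinetic energy (MeV) of a neutron that covers `path_m` metres in `tof_s` seconds."""
    beta = path_m / (tof_s * C)  # speed as a fraction of c
    if not 0.0 < beta < 1.0:
        raise ValueError("flight time implies a speed outside (0, c)")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return (gamma - 1.0) * NEUTRON_MASS_MEV


# Hypothetical example: 20 m flight path, 100 ns flight time.
print(f"{neutron_energy_mev(20.0, 100e-9):.1f} MeV")
```

Near the speed of light a small change in flight time corresponds to a large change in energy, which is why the timing precision of the error-detection circuit matters at the high-energy end of the range.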
3. Soft-error test using accelerator-driven neutron sources
Since the soft-error rate per semiconductor device is extremely low, it is difficult to reproduce soft errors at the development stage of telecommunication systems. Soft errors can, however, be generated in a short time by irradiating a device with several orders of magnitude more neutrons than it would receive in the natural world. Conventionally, a single semiconductor device has been tested to reproduce soft errors using a high-energy accelerator (several hundred MeV), such as the powerful accelerator at the Los Alamos Neutron Science Center (LANSCE) at Los Alamos National Laboratory, USA. Because the neutron-energy-dependent soft-error rate described above had not been clarified, a high-energy accelerator capable of generating a neutron spectrum with almost the same shape as that of the natural world (Fig. 3) was used [4–6]. If the spectrum shapes are the same, the acceleration factor relative to the natural world is simply the ratio of the number of neutrons from the accelerator to the number of neutrons in the natural world. However, because only a few accelerators in the world meet such specifications, securing machine time and covering the cost have been major hurdles. Once data on the neutron-energy-dependent soft-error rate are available, test results can be converted into the number of soft-error failures in the natural world, in other environments (not only at ground level but also at high altitudes, in space, or even on another planet), or in other accelerator environments, even if the neutron spectrum shape is different. NTT has demonstrated that soft errors can be reproduced with the compact accelerator-driven neutron source owned by Hokkaido University, which uses relatively low-energy electrons (33 MeV), even though its neutron spectrum differs from that of the natural world (Fig. 3). We also confirmed that soft errors could be evaluated at the development stage of telecommunication systems (Fig. 4).
Many accelerators with specifications of this level exist in Japan, and they can readily be used at the development stage in terms of both machine-time availability and cost.
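The conversion between an accelerator test and the natural world described above can be sketched as a ratio of two spectrum-weighted integrals. All function names and numbers here are hypothetical placeholders, and the same energy-dependent soft-error rate σ(E) is assumed to apply in both environments.

```python
def trapezoid(xs, ys):
    """Integrate tabulated y(x) with the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


def failure_rate(energies, flux, sigma):
    """Spectrum-weighted failure rate: integral of phi(E) * sigma(E) dE."""
    return trapezoid(energies, [f * s for f, s in zip(flux, sigma)])


def acceleration_factor(energies, flux_accel, flux_natural, sigma):
    """How many times faster soft errors occur in the accelerator than in nature."""
    return (failure_rate(energies, flux_accel, sigma)
            / failure_rate(energies, flux_natural, sigma))


def natural_rate_from_test(errors_observed, test_seconds, accel):
    """Scale an error rate observed under irradiation back to the natural world."""
    return errors_observed / test_seconds / accel


# Hypothetical placeholder tables (NOT measured data); the spectra differ in shape.
energies = [1.0, 10.0, 30.0]          # MeV
sigma = [1e-10, 5e-10, 8e-10]         # cm^2
flux_natural = [1e-3, 1e-4, 3e-5]     # neutrons/(cm^2 s MeV)
flux_accel = [5e4, 2e3, 1e2]          # neutrons/(cm^2 s MeV)

accel = acceleration_factor(energies, flux_accel, flux_natural, sigma)
print(natural_rate_from_test(errors_observed=120, test_seconds=3600.0, accel=accel))
```

When the two spectra have the same shape, the ratio of integrals reduces to the simple ratio of neutron counts mentioned in the text; the weighted form is what allows a source with a different spectrum to be used.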
Since soft errors are a problem not only for telecommunication systems but also for all electronic devices that require high reliability, such as those used in infrastructure, demand for this test is expected. We conducted a joint experiment with Nagoya University and SHI-ATEX Co., Ltd. toward commercialization, and in December 2016, NTT Advanced Technology started a commercial service for soft-error testing. Soft-error tests are now also being conducted on electronic devices outside telecommunications.
4. ITU-T’s standardization
Against this background, at the October 2015 meeting of ITU-T SG5, commencement of a study on soft errors in telecommunication equipment was approved with the intention of defining requirements for soft-error mitigation measures, ranging from design techniques to evaluation methods. The member companies of the Ad Hoc Committee on Soft Error Testing (SOET Adhoc) worked together and developed draft Recommendations. Six Recommendations for soft errors were then enacted at ITU-T in 2018. Revised versions reflecting the latest research described above and addressing issues in soft-error testing were subsequently enacted. These Recommendations provide design methodologies, test methods, quality-estimation methods, information on semiconductor devices, and reliability requirements for soft-error mitigation of telecommunication systems (Fig. 5, Table 1).
Specifically, K.124 provides an overview of the effects of particle radiation and of design methods to mitigate the impact of soft errors. K.131 describes the principles and design methods for soft-error mitigation measures in the equipment that makes up carrier telecommunication networks. K.150 defines the characteristic parameters and functions of semiconductor devices that a telecommunication-equipment designer needs when implementing soft-error mitigation measures. In addition, K.139 defines the reliability requirements for telecommunication systems in relation to soft errors, K.130 specifies soft-error test methods using an accelerator, and K.138 describes methods for estimating reliability from soft-error test results, taking into account the severity of the effects of soft errors. These Recommendations allow telecommunication-system suppliers to understand soft-error tolerance before actual operation and make clear how much tolerance manufacturers should provide.
5. Future perspective
We plan to proceed with research on new countermeasures and evaluation technologies for the space environment by building on the ground-based technologies we have developed thus far.