Feature Articles: Fixed-mobile Convergence Services with IOWN
Vol. 23, No. 1, pp. 43–49, Jan. 2025. https://doi.org/10.53829/ntr202501fa5

Power-saving Enabler That Uses Software Technology to Reduce Network Power Consumption

Abstract

The amount of traffic flowing through networks is increasing yearly, and the increase in power consumption of network devices is becoming an issue. This article introduces research and development efforts to reduce power consumption by using software technology within the context of software processing of networks.

Keywords: communication software, power saving, low latency

1. Software implementation of network functions and power saving

Network devices that have been implemented with dedicated hardware are increasingly being replaced by software running on general-purpose servers. This implementation method is often referred to as network function virtualization (NFV). The widespread transition to NFV can be attributed to the improved performance of general-purpose central processing units (CPUs) and advances both in software technology for processing communication functions and in virtualization technology. The advantages of NFV are that initial costs for installing hardware devices are low because mass-produced general-purpose servers can be used; the development time for software implementation of functions is shorter than that for hardware circuit design; and flexible operations, such as dynamic configuration changes, are possible.

While NFV has many advantages, it also faces challenges. One is how to achieve high performance. With dedicated hardware, the circuit design can be optimized for each application, leading to high processing efficiency and high performance. In contrast, NFV is implemented using software running on a general-purpose server, and the use of such general-purpose hardware often leads to inefficiencies. Therefore, high performance cannot be achieved without innovation. Examples of such innovation include dedicating a large number of CPU cores to continuously monitoring packet arrivals so that packets are processed quickly and suppressing CPU idle states to avoid the overhead of recovering from the idle state. However, all these innovations sacrifice power efficiency. There is generally a trade-off between high performance and power saving. This is especially true for systems that require high throughput and low latency. An example of a system that requires high throughput and low latency is a virtual radio access network (vRAN).

2. Introduction to a use case: vRAN

Mobile communication networks use a vast number of wireless base stations to provide communication services across a wide area. The main roles of base stations are to transmit and receive radio waves to and from mobile communication devices for wireless communication and to process radio signals. Base stations are required to provide high throughput and meet very stringent low-latency requirements. For this reason, dedicated hardware optimized for use in base stations has conventionally been used. The advantages of NFV have motivated studies of vRANs, in which these functions operate as software running on a general-purpose server. Figure 1 shows an example of a vRAN system configuration. A RAN consists of three types of functional units: radio units (RUs), distributed units (DUs), and central units (CUs). The functional units being studied for implementation using software running on a general-purpose server are DUs and CUs, which process radio signals.
The vRAN transmits and receives radio signals to and from terminals, such as mobile phones. vRAN radio resources are multiplexed in the frequency and time domains, and radio waves are exchanged in short time slots on the order of microseconds. DUs and CUs must process radio signals without delay within these time slots. Therefore, they need to meet stringent, microsecond-order low-latency requirements. At the interface between an RU and a DU, called the fronthaul, the DU needs to provide high throughput of more than 20 Gbit/s in the Split 7.2 functional deployment specified by the O-RAN ALLIANCE.

It is technically challenging to implement DU or CU functions with software running on a general-purpose server because of the need to achieve throughput exceeding 20 Gbit/s while meeting microsecond-order latency requirements. For example, Linux is widely used as an operating system (OS), but the standard packet transmission and reception mechanism in the Linux kernel can cause delays of several hundred microseconds to milliseconds [1], making it difficult to achieve throughput as high as 20 Gbit/s. Achieving power savings while meeting stringent low-latency and high-throughput requirements is even more technically challenging. For example, many widely used general-purpose CPUs have a function to put themselves into an idle state when there is no task to process, but the overhead required to recover from a deep idle state can exceed 100 microseconds. If a CPU core frequently transitions into and out of an idle state, the low-latency requirements may not be met. Thus, it is challenging to satisfy the low-latency and high-throughput requirements and achieve power savings at the same time.
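As a concrete illustration of the idle-state recovery overhead mentioned above, the following minimal sketch (assuming a Linux host that exposes the standard cpuidle sysfs interface; the use of cpu0 and the file paths are illustrative) reads the name and advertised worst-case exit latency of each CPU idle state. On many servers this latency exceeds 100 microseconds for the deepest states.

```c
/* Minimal sketch: list the advertised exit latency of each CPU idle
 * state (C-state) via the standard Linux cpuidle sysfs interface.
 * Assumes a Linux host; cpu0 is used purely as an example. */
#include <stdio.h>
#include <string.h>

static int read_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

int main(void)
{
    char path[128], name[64], lat[64];

    for (int state = 0; ; state++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name", state);
        if (read_line(path, name, sizeof(name)) < 0)
            break;  /* no more idle states */
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/latency", state);
        if (read_line(path, lat, sizeof(lat)) < 0)
            break;
        /* "latency" is the worst-case exit latency in microseconds */
        printf("state%d: %-10s exit latency = %s us\n", state, name, lat);
    }
    return 0;
}
```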
3. Power-saving enabler that uses software technology to reduce power consumption

NTT Network Innovation Center is undertaking research and development of a power-saving enabler, POSENA (POwer Saving ENAbler), with the aim of using software technology to achieve power savings in applications that have stringent low-latency and high-throughput requirements. Figure 2 shows an overview of POSENA. The concept of POSENA is to use software to control processing of the given load in such a way that the minimum necessary computing resources are used. Great power savings can be achieved, particularly when the load is low.
POSENA uses multiple technologies. Power saving is enabled by adding POSENA to applications, middleware, OSes, or virtualization software as an add-in*. Particularly promising use cases are NFV applications with stringent low-latency and high-throughput requirements, such as vRANs, virtual reality (VR), augmented reality (AR), media processing, routers, and other functional parts of a network. This article introduces two technologies used by POSENA: power-saving low-latency data transfer technology [2] and CPU idle-aware optimal tuning and control technology [3], with examples of their use in a vRAN, a promising use case.
4. Power-saving low-latency data transfer technology

4.1 Conventional high-speed, low-latency packet transmission and reception methods

Linux, a widely used OS, has a packet transmission/reception function in its kernel so that application developers can easily develop programs that send and receive packets. The packet transmission/reception function is designed for general-purpose use. For example, it diverts CPU resources to packet reception only when a packet arrives so that it can function even when the computer has only a single CPU core. Specifically, when a packet arrives at the network interface card (NIC), the Linux kernel is notified by a hardware interrupt. Triggered by this interrupt, the Linux kernel stops processing the current task and begins handling the packet reception. This packet reception method is called the interrupt method and is highly efficient because it diverts CPU resources only when a packet arrives. It is also highly versatile because it can operate even when the computer has only a single CPU core. The downside is that, when an interrupt occurs, this method incurs overhead due to context switching, such as interrupting the current processing task and saving the data being computed. In addition, when packet reception is handled in the Linux kernel, a large delay may occur because packet data must be copied from memory in the kernel space to memory in the user space, making it difficult to achieve high throughput.

To avoid this problem, there is a mechanism for bypassing the kernel in cases where low latency and high throughput are required, such as NFV applications. This mechanism executes packet transmission/reception in the user space rather than using the networking functions provided by the Linux kernel. One example is the Data Plane Development Kit (DPDK), which is provided as open source. DPDK creates a resident thread in the user space, called a polling thread, that continuously monitors packet arrivals at the NIC. Since the CPU is thus constantly monitoring packet arrivals, it can achieve high responsiveness when receiving packets. For this reason, DPDK is often used in vRAN systems, which need to meet microsecond-order latency requirements and provide high throughput in excess of 20 Gbit/s. Although this polling scheme is excellent in terms of low latency and high throughput, it is problematic because the polling thread constantly uses CPU resources to monitor packet arrivals. Even when no packets are arriving, the CPU utilization is always 100%, resulting in high power consumption. Figure 3(a) illustrates this problem.

4.2 Packet transmission/reception method that satisfies both high-speed/low-latency and low-power-consumption requirements

A method of achieving low latency and high throughput while still saving power involves revising the above-mentioned polling thread in such a way that the thread sleeps during periods when no packets are arriving, as shown in Fig. 3(b). This avoids the constant 100% CPU-core usage described above, thus reducing power consumption. A potential problem with this is that the arrival of a new packet may go unnoticed during the sleep period. To prevent this from happening, hardware interrupts are used so that packet arrivals can be noticed immediately even when the thread is asleep. However, hardware interrupts cause the interrupted thread to suspend processing and incur the overhead of a context switch, making it difficult to achieve low latency and high throughput.
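Before turning to the solution, the following minimal sketch recaps the conventional busy-polling receive loop of Section 4.1 and Fig. 3(a) using DPDK-style APIs. Port and queue initialization are omitted, and the loop body is illustrative rather than taken from any product; the point is that the loop never blocks, so the core it runs on stays at 100% utilization even when no packets arrive.

```c
/* Minimal sketch of the conventional DPDK busy-polling receive loop
 * (Fig. 3(a)): the polling thread never yields the CPU core, so core
 * utilization stays at 100% even when no packets arrive.
 * Port/queue initialization is omitted for brevity. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void rx_busy_poll_loop(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *pkts[BURST_SIZE];

    for (;;) {
        /* Poll the NIC RX queue; returns 0 when no packets are waiting */
        uint16_t nb = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);

        for (uint16_t i = 0; i < nb; i++) {
            /* ... process the packet ... */
            rte_pktmbuf_free(pkts[i]);
        }
        /* No sleep: the loop immediately polls again, consuming the core */
    }
}
```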
We solved this problem by noting that a hardware interrupt causes no major harm as long as the interrupted thread is not executing any work at the moment of interruption. We therefore created a polling thread that is dedicated to monitoring packet arrivals and to packet-reception processing and does no other work. This thread sleeps when no packets are arriving, so when a packet arrives and a hardware interrupt occurs, the thread is not executing any processing that could be disturbed. Hardware interrupts are handled at very high priority. If the hardware interrupt handler were required to execute a large amount of processing, it would consume CPU time, leaving too little for stable system operation and leading in turn to system instability. To avoid this problem, the only processing required of the handler for the hardware interrupt issued by the NIC when a packet arrives is to wake up the polling thread. The packet-reception processing is executed in the context of the polling thread rather than by the hardware interrupt handler. Therefore, even if a packet arrives while the thread is asleep, the thread can be woken instantly and the packet-reception processing can be executed.
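The following is a minimal sketch of this sleep-and-wake pattern in the style of DPDK's RX-interrupt mode (as used, for example, in the l3fwd-power sample application); it is an illustration of the idea described above, not the POSENA implementation itself. Device configuration with RX interrupts enabled is assumed and omitted, header names can differ slightly between DPDK versions, and the idle-poll threshold is an arbitrary example value. The NIC interrupt merely wakes the thread out of rte_epoll_wait(); packet reception then runs in the polling thread's own context.

```c
/* Minimal sketch of a sleep-capable polling thread using DPDK's
 * RX-interrupt mode (cf. the l3fwd-power sample). The NIC interrupt
 * only wakes the thread; packet reception itself runs in the polling
 * thread's context. Port/queue setup with intr_conf.rxq enabled at
 * device configuration time is assumed and omitted. */
#include <rte_ethdev.h>
#include <rte_interrupts.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32
#define IDLE_POLLS_BEFORE_SLEEP 100   /* arbitrary example threshold */

static void rx_poll_with_sleep(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *pkts[BURST_SIZE];
    struct rte_epoll_event ev;
    unsigned int idle_polls = 0;

    /* Register this RX queue's interrupt with the per-thread epoll fd */
    rte_eth_dev_rx_intr_ctl_q(port_id, queue_id, RTE_EPOLL_PER_THREAD,
                              RTE_INTR_EVENT_ADD, NULL);

    for (;;) {
        uint16_t nb = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);

        if (nb == 0) {
            if (++idle_polls < IDLE_POLLS_BEFORE_SLEEP)
                continue;               /* keep polling a little longer */

            /* No traffic: arm the RX interrupt and sleep until the NIC
             * interrupt wakes this thread (the handler does nothing else) */
            rte_eth_dev_rx_intr_enable(port_id, queue_id);
            rte_epoll_wait(RTE_EPOLL_PER_THREAD, &ev, 1, -1 /* no timeout */);
            rte_eth_dev_rx_intr_disable(port_id, queue_id);
            idle_polls = 0;
            continue;
        }

        idle_polls = 0;
        for (uint16_t i = 0; i < nb; i++) {
            /* Packet reception/processing in the polling thread's context */
            rte_pktmbuf_free(pkts[i]);
        }
    }
}
```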
To evaluate the effectiveness of this method, we conducted experiments using a simple software switch that sends and receives packets and confirmed that this method saves about 4.5 W per polling thread. We also confirmed that the increase in latency caused by putting the polling thread into a sleep state was as small as 4 to 7 microseconds. We also evaluated the practicality of this method by conducting a demonstration test using a commercial vRAN vDU product and confirmed that it reduced power consumption without adversely affecting wireless signal processing. The number of polling threads depends on the throughput required for a particular system, and in most cases a system will have multiple polling threads. As the number of polling threads increases, so does the power-saving effect. This sleep control method can be applied not only to the network input/output (I/O) of packet transmission/reception processing but also to accelerator I/O and to inter-process communication.

5. CPU idle-aware optimal tuning and control technology

5.1 Power-saving mechanism via hardware control provided by the CPU

Most widely used general-purpose CPUs have a hardware-controlled power-saving mechanism. For example, when there is no task to be processed, power is saved by stopping the CPU’s operating clock, lowering the operating voltage, and disabling the cache function. This mechanism is called CPU idle or low power idle (LPI); in the case of Intel CPUs, it is called a C-state. CPU idle is very useful for power saving but introduces latency when a CPU core in the idle state needs to return to the active state. If the CPU idle function is used without careful consideration, latency and throughput requirements may not be met. Therefore, when stringent latency or high throughput is required, the CPU idle function is disabled.

5.2 CPU idle control that satisfies both low-latency/high-throughput and power-saving requirements

This technology achieves power savings by dividing CPU cores into an idle CPU core group, which is allowed to transition to a “deep” CPU idle state, and an active CPU core group, which can only enter a “shallow” CPU idle state. Power savings are achieved by controlling which CPU core group a given task is assigned to. Figure 4 shows an overview of this technology. It is desirable to assign tasks with stringent latency requirements, tasks that require high throughput, and frequently executed tasks to the active CPU core group. It is preferable to assign tasks with less stringent latency and throughput requirements and sparsely executed tasks to the idle CPU core group. When a task is assigned to a CPU core that is in a deep CPU idle state, a lengthy recovery time is required before the CPU core returns to the active state, as mentioned above. A way of shortening this recovery time is to wake the core that is in a deep CPU idle state in advance of assigning a task to it. This enables us to benefit from the power-saving effect of transitioning a core to a deep CPU idle state without the added latency caused by the recovery time.
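As a rough illustration of this core-grouping idea on a Linux host, the sketch below disables deep idle states on an assumed "active" core group via the standard cpuidle sysfs interface (which requires root privileges) and pins a latency-sensitive thread to that group with pthread_setaffinity_np. The core numbers and the allowed idle-state depth are hypothetical examples, and POSENA's actual task assignment and pre-wake control are not shown.

```c
/* Minimal sketch of the core-grouping idea in Section 5.2 on Linux:
 * forbid deep C-states on an "active" core group via the cpuidle sysfs
 * "disable" knob (root required) and pin a latency-sensitive thread to
 * that group. Core numbers and depth threshold are illustrative. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Allow only "shallow" idle states (index < deepest_allowed) on one core */
static void limit_idle_depth(int cpu, int deepest_allowed)
{
    char path[128];

    for (int state = deepest_allowed; state < 16; state++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/disable",
                 cpu, state);
        FILE *f = fopen(path, "w");
        if (!f)
            break;              /* no more idle states on this CPU */
        fputs("1", f);          /* 1 = do not enter this idle state */
        fclose(f);
    }
}

/* Pin the calling thread to the active core group */
static int pin_to_active_group(const int *cores, int n)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    for (int i = 0; i < n; i++)
        CPU_SET(cores[i], &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    /* Example split: cores 0-3 form the active group; the remaining
     * cores (the "idle" group) stay free to enter deep idle states. */
    const int active[] = { 0, 1, 2, 3 };

    for (int i = 0; i < 4; i++)
        limit_idle_depth(active[i], 2);   /* forbid states 2 and deeper */

    if (pin_to_active_group(active, 4) != 0)
        fprintf(stderr, "pthread_setaffinity_np failed\n");

    /* ... latency-sensitive work runs here on the active group ... */
    return 0;
}
```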
To evaluate the effectiveness of this technology, we implemented a prototype and evaluated it on a general-purpose server with a CPU featuring the CPU idle function. The evaluation results varied depending on the number of CPU cores mounted on the server hardware and the type of task to be processed, but we confirmed a reduction in power consumption of more than 100 W per server. To confirm the practicality of this technology, we conducted a demonstration experiment by applying it to a commercial vRAN vDU product and confirmed that it reduces power consumption without adversely affecting radio signal processing.

6. Future outlook

The amount of communication traffic flowing through networks is increasing yearly, and the load on the network devices that handle this traffic is increasing with it. Without technical innovation, the increase in communication traffic will continue to drive up power consumption. It is therefore urgent that we take specific measures that reduce power consumption while ensuring high performance. We will continue to create highly effective power-saving technologies and implement them in practical applications.

References