

Regular Articles Routing and Spectrum Assignment Using Deep Reinforcement Learning in Optical NetworksAbstractIn future optical networks, effective use of spectral resources will be an issue as complexity increases due to diversified and dynamic requirements. In this article, we first give an overview of the routing and spectrum assignment (RSA) problem in elastic optical networks. We then introduce our novel RSA algorithm called Mask RSA, which allows for efficient route and spectral resource selection by deep reinforcement learning. Keywords: routing and spectrum assignment, reinforcement learning, optical networks 1. IntroductionInternet traffic has been grown rapidly due to increasingly diversified traffic demands as a result of both videostreaming and cloudcomputing services. To handle the changing requirements of future traffic in backbone networks, the concept of the elastic optical network (EON) was proposed [1]. EONs have a finely granular frequency grid (typically a multiple of either 6.25 or 12.5 GHz) to allocate the minimum frequency bandwidth to each channel, compared with the traditional flexedgrid (e.g., 50 GHz). EONs allocate a different number of frequency slots (FSs) to a connection request in accordance with the bandwidth demand for more efficient spectral utilization. However, nonuniform FS allocation results in spectral fragmentation that degrades spectral utilization. Therefore, efficient use of the spectral resources in EONs requires a routing and spectrum assignment (RSA) algorithm that can prevent spectral fragmentation. Research into optical networking has yielded proposals based on RSA using deep reinforcement learning (DRL) [2] in EONs. The pioneering studies are on DeepRMSA [3] and its improved version [4], which determine the routing path among K shortest paths (KSPs), outperforming traditional RSA based on shortest path and KSP algorithms. These pioneering studies indicated that DRLbased RSA is promising. In this article, we introduce the basic concept of RSA in EONs followed by DRLbased RSA algorithms for efficient network planning in EONs. We then explain our proposed DRLbased RSA algorithm called Mask RSA along with its performance evaluation. 2. EONsEONs have emerged as one of the most promising network technologies for nextgeneration optical networks. In simulation, an EON consists of nodes and links, and each link has FSs. FSs are represented as indexed list of FS status, e.g., available and occupied FSs are represented as 1 and 0, respectively, then FSs indicate a list [0,0,0,1,1,1,0,1,1]. Each connection request has a duration, and when the duration elapses, the FSs that were used are released. Figure 1 illustrates an example of a simulated EON that consists of four nodes, A, B, C, and D, and four links, AB, BC, CD, and DA. Each link has eight FSs. Let us take the case of an incoming connection request from node A to node C with two FSs. To check which FSs are available to assign in the route ABC, FSs of links in the route are calculated by the bitwise AND operation; the FSs of AB are [0,1,1,1,0,0,1,1] and BC are [1,1,1,0,0,1,1,1], and the resulting FS status is [0,1,1,0,0,0,1,1]. In the route, assignable candidates are the 2^{nd} and 3^{rd} and 7^{th} and 8^{th} FSs. Like this example, a spectrum continuity constraint (i.e., same FSs should be used across links) exists, which ensures that all links in the endtoend route use the same FSs. Spectrum assignment (SA) algorithms determine which candidates are used, as explained in the following section.
3. RSATo give an overview of RSA, let us take an example of a wellknown heuristic algorithm, the KSP and first fit (KSPFF) algorithm. The KSPFF algorithm solves the routing subproblem (by KSP) and SA subproblem (by FF) separately. It first determines a route then the FSs to be used. Figure 2 shows an overview of determining which route and FSs.
In routing, the KSPFF algorithm first precomputes a routing table that is an ordered list of the K routes between all pair of nodes. Next, when a connection request arrives, KSPs are obtained from the routing table by using the pair of source and destination nodes. Finally, a path where assignable FSs exist is searched for in order of shorter paths. When available FSs do not exist in K routes, the connection request is rejected, which is called blocking. After routing, spectrum assignment by FF is executed; in all assignable FSs, the lowest indexed FSs are selected. This assignment procedure packs existing connections into the smallest number of FSs, leaving a larger number of available FSs. An efficient RSA algorithm can prevent spectral fragmentation. Let us take two cases in which FSs are [1,0,0,1,0,0,1] and [0,0,0,0,1,1,1] with a 2FS request. In the first case, there are many spectral losses, making it impossible to assign the 2FS request. The second case allocates FSs consecutively, so there is no spectrum loss and 2FS requests can be allocated. Therefore, preventing spectral fragmentation can allocate more requests. 4. DRLbased RSAAn overview of dynamic RSA in the format of a wellknown DRL modeling is shown in Fig. 3. At each time step, the agent takes an action, and the action space is predefined at the formulation phase. For example, when DRL is applied to the routing subproblem that selects one of the KSPs, the action space is {1, 2, ..., K}, the action of which is mapped to the corresponding shortest path. Next, the agent receives an observation and reward from the environment. Observation is the status of the current environment which includes a request and a status of FS utilization. Reward is a value that represents how good the action is. An agent takes actions on the basis of the observation, and parameters of the agent’s actiondecision function, i.e., a deep neural network (DNN) is updated to maximize the total number of rewards.
One of the key problems with DRLbased RSA is how to define the action space; assignable FSs vary time to time and depends on routing paths. Figure 4 shows an example of the FS selection from action spaces proposed in a previous study [4]. The action space is {1, 2}, the action of which is mapped to assignable candidates. In case 1, the first candidate is 1–2 FSs, and the second one is 6–7 FSs. The DNN determines which candidate should be used to accommodate as many future connection requests as possible. Unlike case 1, the problem in case 2 is that the number of assignable candidates is less than the size of the action space. In this case, for actions that are not mapped to any assignable candidates, blocking is mapped; the blocking action is selected, and its connection request is rejected. Since this definition of action cannot be used to evaluate all assignable candidates properly when the size of the action space is less than the number of assignable candidates, a trained agent would be suboptimal. Therefore, efficient DRLbased RSA algorithms need to be flexible in accordance with the changing number of assignable candidates while avoiding spectral fragmentation.
5. Mask RSATo handle the dynamic changes in the number of assignable candidates without spectral fragmentation, our Mask RSA masks unassignable candidates to take into account assignable ones. Mask RSA is based on our past study [5]. First, we explain the definition of an action space in Mask RSA. Mask RSA implements routing by selecting one of the KSPs, and SA is executed by selecting the first index of used FSs. Thus, an action space in Mask RSA is defined as {1, 2, ..., S × K, S × K + 1}, where S and K are the numbers of FSs and paths, respectively. The option of an action is donothing; if assignable resources do not exist in a KSP, take the action of donothing, which leads to blocking. For example, when the selected action number is 120, (1) donothing if 120 > S × K; otherwise (2) the selected path is and the start index of used FSs is 120 mod S. This formulation makes an agent select a routing path and FSs concurrently. Figure 5 gives an overview of Mask RSA inference. To handle dynamic changes in the number of assignable candidates, the masking approach is used. First, for each routing path, an assignable boundary slot mask is generated as a vector with its assignable position at a boundary of 1; otherwise, 0. An ABSM can prevent spectral fragmentation since it takes into account only boundaries. Next, a donothing mask is created; 0 when assignable candidates exist among KSPs, otherwise 1. Finally, both K ABSMs and donothing mask are concatenated to generate an RSA assignable mask Let softmax() and argmax() be softmax and argmax functions, respectively. The mapping function from convolutional neural network (CNN) output to an action is written as where is a DNN output vector, and is the Hadamard product. This mask prevents both the examination of unassignable choices and spectral fragmentation.
6. DemonstrationWe evaluated the performance of Mask RSA by comparing it with the KSPFF algorithm through simulations. The simulations involved a dynamic traffic scenario in which requests were generated on the basis of a Poisson process following a uniform traffic distribution. The average arrival rate and service duration for training were 10 and 12, respectively. The requested FS width was randomly selected in the range of 1 to 8. The tested networks were JPN25 (25 nodes and 43 links) and JPN48 (48 nodes and 82 links) [6]. Each FS was set to be 12.5 GHz, and each link had 320 FSs. Figure 6 shows request blocking probabilities versus traffic loads from 110 to 200 Erlangs. In both networks, our Mask RSA outperformed the KSPFF algorithm even when the traffic load differed from that used in training. These simulations in two topologies showed that the masking approach enabled efficient RSA regardless of the network size or traffic load used for training; accordingly, Mask RSA outperformed the KSPFF algorithm.
7. Further studiesIn this article, we introduced the application of DRL to dynamic optical network planning in EONs. The proposed DRLbased RSA algorithm, Mask RSA, is enhanced with domainspecific knowledge, outperforming heuristic algorithms. We plan to expand our study to more complicated networks, e.g., multicore, multilayer networks, and impairmentaware RSA, to enable more efficient network planning. References
