# **Ultralow-latency Optical Circuit Based on Optical Pass Gate Logic**

### Akihiko Shinya, Tohru Ishihara, Koji Inoue, Kengo Nozaki, and Masaya Notomi

#### Abstract

A novel light speed computing technology has been developed by NTT, Kyoto University, and Kyushu University that employs nanophotonic technology in critical paths and thus overcomes the problem of operational latency that is the chief limiting factor in conventional electronic circuits. The ultimate objective of this work is to develop an ultrahigh-speed optoelectronic arithmetic processor. This article provides an overview of our recent work and describes the successful implementation of this novel optical computing technology.

Keywords: photonic integration, ultralow latency, nanophotonics

#### 1. Importance of ultralow latency operations

Processor throughput is still improving as the number of cores increases and parallelization is enhanced. However, the frequency response has leveled off, as one can see in **Fig. 1(a)** [1]. In other words, basic throughput continues to improve through integration and parallelization, but reductions in latency or delay have reached a plateau. Particularly in situations requiring a response as fast as a spinal reflex, a significant technological breakthrough is needed to develop arithmetic processors capable of responding at very high speed.

#### 2. Introduction of optical technology in arithmetic chips

Research and development in optical computing, aimed at ultrahigh-speed calculation exploiting the immense bandwidth of light, continued throughout the 1980s. The problem with this approach is that optical transistors are quite large and vastly inferior to complementary metal oxide semiconductor (CMOS) transistors in density, power consumption, cascadability, and other factors. It is not surprising that research in this area fell off sharply in the 1990s. At the same time, however, optical communications has proven vastly superior for long-haul communications, and research is ongoing to exploit the vast bandwidth of light in optical interconnects within and between chips. Today we see a marriage between light and electronics (optoelectronics) that exploits light for information transport and electronic circuitry for information processing.

More recently we have seen remarkable progress in nanophotonic technology as new solutions have been found dealing with problematic issues that plagued optical and optoelectronic computing research in the past. In photonic crystal technology, for example, optical elements have been significantly downscaled to a mere 1/1000th the size they were a decade ago with corresponding decreases in power consumption, which brings optical elements into close competition with CMOS circuits. It is time that we reconsider the prevailing division between optical and electronic, with optical used primarily for transport and electronic for information processing.

#### 3. Arithmetic chip delay factor

The frequency response rate-limiting issue mentioned earlier can be attributed to resistance (R) and capacitance (C) in the wiring path of CMOS circuits. The gate switching time of CMOS transistors has been sharply reduced by advances in semiconductor micro-fabrication technology, but the total delay of CMOS gates levels off at around 10 ps due to R and C in the transistor interconnects, as shown in **Fig. 1(b)** [2]. Moreover, R and C in the wiring only increase as transistors become more compactly integrated and wiring is stretched thinner and longer, which further increases the latency of actual circuits.

Fig. 1. Processor trends and transistor-generated delay. (SPECint: Standard Performance Evaluation Corporation integer benchmark)

Fig. 2. Circuit configuration comparison.

Electronic circuits also inevitably exhibit a certain amount of latency due to their structure. One of the most widely used circuit configurations is the AND/OR logic circuit shown in **Fig. 2(a)**. The output signal from one logic gate drives the following logic gate, so the latter gate cannot do anything until the output signal from the previous gate arrives. The wait time involved in these gate operations is proportional to the number of gates, which makes for substantial arithmetic delay.

#### 4. Arithmetic chip with optical and electronic elements integrated at transistor level

One solution to wiring-induced latency is on-chip optical communications. This is essentially a photonic technology for conveying information between cores, but here we extend this approach to the transistor level as a solution to the architecture-induced latency problem. In trying to come up with the ideal circuit configuration, we can find a valuable clue in the field of electronics. A schematic pyramid-shaped tree circuit based on a *binary decision diagram* (BDD) [3] is shown in **Fig. 2(b)**. We assume a configuration in which "1" is output from the signal source located in the leaf part of the tree at the base of the pyramid, and Boolean operations are performed by selecting either signal source "1" or no signal source "0" depending on the combination of external inputs ($x_1$, $x_2$, ...). Various methods for simplifying BDDs have been proposed, and if these methods can be applied to the BDD-based circuit, the number of switches could be greatly reduced.

Fig. 3. Schematic diagrams of digital adder circuits.

This type of circuit configuration is called a *pass transistor logic circuit*. The signal passing through the circuit is called a *carry*, and an operation is performed by steering the carry flow with $2 \times 1$ switches.

Here, we refer to the optical version of this structure as an *optical pass gate logic circuit*, and we replace the electronic switches with $2 \times 1$ and $2 \times 2$ optical gates. In this architecture, light is used as the carry signal.
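The idea of steering a carry through a tree of $2 \times 1$ switches can be illustrated with a minimal Python sketch. The class names `Leaf` and `Switch`, the function `evaluate`, and the example diagram for AND are hypothetical names introduced here for illustration; they are not taken from the article's figures.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Leaf:
    # True models a signal source ("1"); False models no source ("0").
    lit: bool


@dataclass(frozen=True)
class Switch:
    # A 2x1 switch driven by one external input (hypothetical model).
    var: str
    if_one: "Node"   # branch selected when the input is 1
    if_zero: "Node"  # branch selected when the input is 0


Node = Union[Leaf, Switch]


def evaluate(root: Node, inputs: Dict[str, int]) -> int:
    """Follow the path selected by the inputs from the root to a leaf."""
    node = root
    while isinstance(node, Switch):
        node = node.if_one if inputs[node.var] else node.if_zero
    return int(node.lit)


# Illustrative BDD for x1 AND x2: only the (1, 1) path ends at a source.
and_bdd = Switch("x1", Switch("x2", Leaf(True), Leaf(False)), Leaf(False))

if __name__ == "__main__":
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, evaluate(and_bdd, {"x1": x1, "x2": x2}))
```

Every switch setting depends only on the external inputs, never on the output of another switch, which is why all switches can in principle be set at the same time.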

The optical pass gate logic circuit has a number of significant advantages:

- All switches making up the critical path operate collectively: We saw earlier that the gate operation wait time is proportional to the number of gates in an AND/OR logic circuit since subsequent gates cannot act until they receive the carry signal from the previous gate. Optical pass gate logic circuits, in contrast, operate all gates collectively, so they support critical paths requiring only a few picoseconds at most.
- Light speed operations: Since the optical carry does not sense R or C in the optical path, circuits are not slowed by R and C limitations in paths. Although optical gate operations do incur some RC delay, the operation time is affected very little since all gates operate collectively.
- Logic operation without optical transistor: Operations that require an optical transistor, in which the optical carry is controlled by another light signal, are very difficult to implement since with today's technology they consume enormous amounts of energy, generate practically the same amount of latency as CMOS gates, and have a host of other issues. However, our optical pass gate logic circuit performs logic operations without an optical transistor simply by passing the optical carry through electrically controlled optical gates.

One might assume that this configuration could be just as easily implemented with electronic circuitry, but the carry signal passes right through the series resistance of multiple transistors, which would drive up R and make it virtually impossible to fabricate a high-speed response circuit.

In contrast, our optical carry scheme is independent of R and C, so the carry propagation time is dramatically reduced by exploiting nanophotonic technology. For example, the propagation time for an optical gate length of 100 µm is on the order of 1 ps. This is just a fraction of the latency generated by a CMOS gate.
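As a rough check on the 1-ps figure, the transit time through a gate can be estimated as length times group index divided by the speed of light in vacuum. The group index of 3.0 used below is an assumption chosen for illustration, not a value given in the article; the function name `propagation_time_ps` is likewise hypothetical.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def propagation_time_ps(length_um: float, group_index: float) -> float:
    """Transit time in picoseconds through a guide of the given length."""
    length_m = length_um * 1e-6
    seconds = length_m * group_index / C_VACUUM_M_PER_S
    return seconds * 1e12


if __name__ == "__main__":
    # Assumed group index of 3.0 (illustrative only).
    print(f"{propagation_time_ps(100.0, 3.0):.2f} ps")
```

With these assumed values the estimate comes out at roughly 1 ps, consistent with the order of magnitude quoted above; a different group index scales the result proportionally.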

#### 5. Ultralow-latency optical parallel adder

Let us consider a specific circuit configuration as an example of a digital adder. A typical electronic circuit configuration is illustrated in **Fig. 3(a)**. The carry signal ($c_i$) operates the gate in the ($i$ + 1)th logic block, and the result generates the next carry signal ($c_{i+1}$). One will note that a certain amount of wait time is generated for the gate operations in the various logic blocks by this step. The new circuit configuration we propose is shown in **Fig. 3(b)**. In this scheme, all gates in the logic blocks are operated collectively, and this fundamentally changes the structure of carry signal propagation.

Let us first configure a BDD-based full adder (FA) as the ($i$ + 1)th logic block.



Fig. 4. Schematic diagram of optical FA.

An FA takes two 1-bit inputs ($x$ and $y$) representing the two significant bits to be added. In the circuit shown in **Fig. 4(a)**, a Mach-Zehnder interferometer (MZI) is incorporated as a 2 × 1 switch. The switch is configured to select the upper (lower) input port when the input signal ($x_i$, $y_i$, $c_i$) is "1" ("0"). The circuit selects the light source located in the leaf part of the tree structure according to the truth table in **Table 1**. Note that $x_i$, $y_i$, and $c_i$ are all input at the same time, and consequently, all the MZIs are driven at the same time. This allows the carry operation [$c_{i+1}$ = CARRY ($x_i$, $y_i$, $c_i$)] and *i*th digit addition [$s_i$ = SUM ($x_i$, $y_i$, $c_i$)] to be completed just by propagation of light from the light source.
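The leaf-selection behavior described above can be sketched in Python as two trees of 2 × 1 switches, one for CARRY and one for SUM, whose leaves are 1 (light source) or 0 (no source). The ordering of the switches in the tree (here $c_i$ at the root, then $x_i$, then $y_i$) is an assumption made for illustration and may differ from the layout in Fig. 4(a); `select` and `optical_full_adder` are hypothetical names.

```python
def select(control: int, upper: int, lower: int) -> int:
    """2x1 switch: pass the upper port when control is 1, else the lower."""
    return upper if control else lower


def optical_full_adder(x: int, y: int, c: int) -> tuple:
    """Return (carry_out, sum) by selecting a leaf; 1 = light source."""
    carry = select(
        c,
        select(x, select(y, 1, 1), select(y, 1, 0)),  # c = 1 branch
        select(x, select(y, 1, 0), select(y, 0, 0)),  # c = 0 branch
    )
    total = select(
        c,
        select(x, select(y, 1, 0), select(y, 0, 1)),  # c = 1 branch
        select(x, select(y, 0, 1), select(y, 1, 0)),  # c = 0 branch
    )
    return carry, total


if __name__ == "__main__":
    for c in (1, 0):
        for x in (1, 0):
            for y in (1, 0):
                print(c, x, y, optical_full_adder(x, y, c))
```

No `select` call uses the result of another as its control input, which mirrors the statement that all MZIs are driven at the same time.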

Note, however, that this circuit only adds two 1-bit inputs, $x + y$. In order to add multi-bit inputs, the optical carry signal ($c_{i+1}$) output from the *i*th FA circuit must be capable of operating the gate of the ($i$ + 1)th FA circuit. For example, this could be achieved by using an optoelectronic (OE) converter to convert $c_{i+1}$ to an electronic signal, but the conversion involves latency, which again raises the issue of delayed operation time.

This led us to implement the block diagram shown in **Fig. 4(b)** [4]. This circuit operates according to the truth table in **Table 2**, which redefines the truth table in Table 1. Instead of the light source in Fig. 4(a), here we employ optical $c_i$ and $x_i$ signals. Light $c_i$ uses output from the *i*th FA circuit, while the optical $x_i$ signal is produced by combining light from the light source and from the MZI in the upper left. As is apparent from Table 2, the CARRY and SUM operations respectively select $c_i$ ($x_i$) and $\overline{c_i}$ ($c_i$) when exclusive or (XOR) ($x_i$, $y_i$) = 1 (0). This operation drives the three MZIs shown on the right side of Fig. 4(b). For example, the SUM operation is executed when $c_i$ ($\overline{c_i}$) is input to the port in the upper left (lower left) of the MZI in the middle of the right side, and by selecting the port in the lower left (upper left) when XOR ($x_i$, $y_i$) = 1 (0). In this architecture, only one MZI is in the path where $c_i$ is input and $c_{i+1}$ is output. This is the critical path that limits addition operations.

Table 1. FA truth table.

| Input $c_i$ | Input $x_i$ | Input $y_i$ | Output $c_{i+1}$ | Output $s_i$ |
|-------------|-------------|-------------|------------------|--------------|
| 1           | 1           | 1           | 1*               | 1*           |
| 1           | 1           | 0           | 1*               | 0            |
| 1           | 0           | 1           | 1*               | 0            |
| 1           | 0           | 0           | 0                | 1*           |
| 0           | 1           | 1           | 1*               | 0            |
| 0           | 1           | 0           | 0                | 1*           |
| 0           | 0           | 1           | 0                | 1*           |
| 0           | 0           | 0           | 0                | 0            |

\* Light source

Table 2. FA truth table in which light source is replaced by input signals $c_i$, $x_i$.

| Input $x_i$ | Input $y_i$ | Output $c_{i+1}$ | Output $s_i$     |
|-------------|-------------|------------------|------------------|
| 1           | 1           | $x_i$            | $c_i$            |
| 1           | 0           | $c_i$            | $\overline{c_i}$ |
| 0           | 1           | $c_i$            | $\overline{c_i}$ |
| 0           | 0           | $x_i$            | $c_i$            |

Fig. 5. Simulation results for 4-bit digital adder.
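The selection rule of Table 2 can be checked with a short Python sketch that chains full adders into a multi-bit adder. This models only the logic, not the timing: the loop visits digits one after another, whereas in the circuit all gates are set at once and only the light propagates. The names `fa_table2` and `add_bits` are hypothetical.

```python
def fa_table2(x: int, y: int, c: int) -> tuple:
    """Return (carry_out, sum) using the Table 2 selection rule."""
    if x ^ y:
        # XOR = 1: the carry passes through, the sum is the inverted carry.
        return c, 1 - c
    # XOR = 0: the carry is taken from x, the sum is the incoming carry.
    return x, c


def add_bits(a: int, b: int, width: int = 4) -> int:
    """Add two unsigned integers digit by digit, least significant first."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        carry, s = fa_table2(x, y, carry)
        result |= s << i
    return result | (carry << width)


if __name__ == "__main__":
    print(add_bits(0b1011, 0b0110))  # 11 + 6
```

In `fa_table2`, the carry passes through exactly one selection per digit, which corresponds to the single MZI on the critical path described above.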

The simulation results for 4-bit addition are presented in **Fig. 5**. The leading edge of each digit's signal reveals the response speed of XOR operations. Note that the arithmetic latency of XOR does not accumulate as the number of digits increases. However, $\tau$ in the figure shows the cumulative arithmetic delay over four digits, which amounts to about 1 ps per digit using a 100-µm-long MZI. This is far smaller than the 22-ps-per-digit latency of current state-of-the-art circuits implemented in CMOS.
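To put the two per-digit figures side by side, the following sketch multiplies them by the number of digits. Treating the delay as strictly linear in the digit count is a simplifying assumption made here for illustration; the constants are the approximate values quoted above, and `carry_delay_ps` is a hypothetical helper.

```python
OPTICAL_PS_PER_DIGIT = 1.0   # approximate value quoted for a 100-µm-long MZI
CMOS_PS_PER_DIGIT = 22.0     # value quoted for state-of-the-art CMOS circuits


def carry_delay_ps(digits: int, ps_per_digit: float) -> float:
    """Cumulative delay under a simple linear per-digit model."""
    return digits * ps_per_digit


if __name__ == "__main__":
    for digits in (4, 32, 64):
        optical = carry_delay_ps(digits, OPTICAL_PS_PER_DIGIT)
        cmos = carry_delay_ps(digits, CMOS_PS_PER_DIGIT)
        print(f"{digits:2d} digits: optical {optical:.0f} ps, CMOS {cmos:.0f} ps")
```

Under this linear model the ratio between the two stays at 22 regardless of word length; real circuits may deviate from it.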

#### 6. Future prospects

This article introduced ultralow-latency optical pass gate logic circuits using a digital adder as an example. We plan to build on this new architecture as we pursue operational trials on ultrasmall-feature devices that we are now developing as a concurrent project.

#### References

- [1] K. Rupp, "40 Years of Microprocessor Trend Data." https://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/
- [2] The Semiconductor Industry Association (SIA), "The National Technology Roadmap for Semiconductors, 1997 Edition," 1997.
- [3] T. Asai, Y. Amemiya, and M. Kosiba, "A Photonic-crystal Logic Circuit Based on the Binary Decision Diagram," Proc. of International Workshop on Photonic and Electromagnetic Crystal Structures (PECS), T4-14, Sendai, Miyagi, Japan, Mar. 2000.
- [4] T. Ishihara, A. Shinya, K. Inoue, K. Nozaki, and M. Notomi, "An Integrated Optical Parallel Adder as a First Step Towards Light Speed Data Processing," Proc. of the 13th International SoC Design Conference (ISOCC 2016): Smart SoC of Intelligent Things, pp. 123–124, Jeju, South Korea, Oct. 2016.



#### Akihiko Shinya

Group Leader, Senior Research Scientist, Supervisor, Photonic Nanostructure Research Group, NTT Basic Research Laboratories and NTT Nanophotonics Center.

He received a B.E., M.E., and Ph.D. in electrical engineering from Tokushima University in 1994, 1996, and 1999. He joined NTT Basic Research Laboratories in 1999. His current research involves photonic crystal devices. Dr. Shinya is a member of the Japan Society of Applied Physics (JSAP) and the Laser Society of Japan.

#### Tohru Ishihara

Associate Professor, Department of Communications and Computer Engineering, Kyoto University.

He received a Dr.Eng. in computer science from Kyushu University, Fukuoka, in 2000. For the next three years, he was a Research Associate in the VLSI Design and Education Center at the University of Tokyo. From 2003 to 2005, he was a researcher with Fujitsu Laboratories of America in the Advanced CAD Technology Group. From 2005 to 2011, he was with Kyushu University as an Associate Professor. In April 2011 he joined Kyoto University, where he is currently with the Department of Communications and Computer Engineering. His research interests include low-power design methodologies and power management techniques for embedded systems. He has served on the program committee of numerous conferences. Dr. Ishihara is a member of the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computing Machinery, the Information Processing Society of Japan, and the Institute of Electronics, Information and Communication Engineers (IEICE).



#### Koji Inoue

Professor, Department of I&E Visionaries, Kyushu University.

He received a B.E. and M.E. in computer science from Kyushu Institute of Technology, Fukuoka, in 1994 and 1996. He also received a Ph.D. from the Department of Computer Science and Communication Engineering, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, in 2001. In 1999, he joined Halo LSI Design & Technology, Inc., NY, USA, as a circuit designer. He is currently a professor in the Department of I&E Visionaries, Kyushu University. His research interests include power-aware computing, high-performance computing, dependable processor architecture, secure computer systems, three-dimensional microprocessor architectures, and multi/many-core architectures.





#### Kengo Nozaki

Research Engineer, Photonic Nanostructure Research Group, NTT Basic Research Laboratories and NTT Nanophotonics Center.

He received a B.E., M.E., and Ph.D. in electrical and computer engineering from Yokohama National University, Kanagawa, in 2003, 2005, and 2007. He joined NTT Basic Research Laboratories in 2008. His current interests include all-optical switches, memories, and electro-optic devices based on photonic crystals and related photonic nanostructures. He received the Best Paper Award from Photonics in Switching (PS) in 2012, the IEICE Electronics Society Young Researchers Award in 2014, and the Best Paper Award from OECC (OptoElectronics and Communications Conference)/PS in 2016. Dr. Nozaki is a member of JSAP.

#### Masaya Notomi

Senior Distinguished Scientist, Photonic Nanostructure Research Group, NTT Basic Research Laboratories; Project Leader of NTT Nanophotonics Center.

He received a B.E., M.E., and Ph.D. in applied physics from the University of Tokyo in 1986, 1988, and 1997. He joined NTT in 1988. Since then, his research has focused on controlling the optical properties of materials/devices by artificial nanostructures (quantum wires/dots and photonic crystals). In addition to his work at NTT, he has also been a professor in the Department of Physics, Tokyo Institute of Technology, since 2017. He received the IEEE/LEOS (Lasers & Electro-Optics Society) Distinguished Lecturer Award (2006), the JSPS prize from the Japan Society for the Promotion of Science (2009), a Japan Academy Medal (2009), and the Commendation for Science and Technology by the Japanese Minister of Education, Culture, Sports, Science and Technology (2010). Dr. Notomi is an IEEE Fellow and a member of JSAP, the American Physical Society, and the Optical Society (OSA).