Feature Articles: Keynote Speeches at NTT R&D FORUM 2024 - IOWN INTEGRAL

IOWN INTEGRAL

Abstract

This article introduces the initiatives, practical examples, and future outlook of NTT’s large language model “tsuzumi” and the Innovative Optical and Wireless Network (IOWN). It is based on the keynote speech given by Shingo Kinoshita, NTT senior vice president, head of research and development planning, at the “NTT R&D FORUM 2024 - IOWN INTEGRAL” held from November 25th to 29th, 2024.

Keywords: IOWN, tsuzumi, artificial intelligence

1. NTT R&D Forum 2024 overview

In the title of my talk, “IOWN INTEGRAL,” INTEGRAL has two meanings: “integration” and “indispensable.” “Integration” refers to the application and integration of the Innovative Optical and Wireless Network (IOWN) across a wide range of areas, and “indispensable” means that IOWN will become indispensable to the Earth and humankind. At NTT R&D Forum 2024, the exhibition areas were divided into RESEARCH, DEVELOPMENT, and BUSINESS.

1.1 RESEARCH area: recommended exhibits

(1) Personal Sound Zone, non-invasive glucose sensor

Let me first introduce active noise cancelling technology for the Personal Sound Zone. Open-ear headphones that do not cover the ear and leak no sound are now available, but users can still hear ambient noise in trains and other noisy places. This exhibit introduced active noise cancelling technology that eliminates this noise so that users can hear the music they want to hear. It also presented technology that blocks ambient noise when a person enters a dome-like space.

We also introduced a non-invasive glucose sensor, a technology for measuring blood sugar level without pricking the skin with a needle. It uses a device that is even smaller than the prototype presented in 2023, bringing us even closer to commercialization (Fig. 1).

(2) Drug-ingredient-penetration promotion technology, visualization of hand and foot dexterity

The first of these exhibits introduced drug-ingredient-penetration promotion technology, which facilitates the penetration of drug ingredients in, for example, a beauty facial mask through ionization using NTT battery technology with low environmental impact. The other exhibit introduced technology for visualizing hand and foot dexterity using a smartphone.

(3) Quantum computer

NTT has undertaken the development of quantum computers. Mainstream quantum computers are built using the superconducting method or the neutral-atom method. These methods, however, operate only at extremely low temperatures, so equipment for continuous cooling is necessary and the computers are inevitably large. In contrast, the quantum computer targeted by NTT uses optical pulses, the same as those used in optical communications, to achieve the quantum states called quantum bits, or qubits, that serve as the basis of computation. Our method enables large-scale computation as long as there is equipment for generating optical pulses. It also removes the need for cooling to the extremely low temperatures required by other methods, and with it the need for large-scale equipment. With this method, we would like to use the optical communications technology developed by NTT to accelerate current trends in quantum computers toward a large-scale quantum computer capable of practical general-purpose computation.

1.2 DEVELOPMENT area: recommended exhibits

(1) C89 space business brand, wireless energy transmission technology

The “NTT C89” brand in the space business field was officially launched on June 13, 2024. NTT C89 defines the businesses, services, and R&D activities of NTT Group companies in the space field as “stars.” It expresses the idea of creating an 89th constellation by organically linking these stars*.
By organically linking the businesses and services of NTT Group companies in the space field and proposing solutions that meet customer needs, we aim to strengthen the business of NTT Group companies in the space sector while creating synergies and opening up new markets in space.

Regarding wireless energy transmission technology, consider, for example, running mini-cars (rovers) over long distances on the surface of the moon using built-in batteries charged by solar cells. Temperature differences in the lunar environment are extreme, however, and solar cells cannot be relied on if the battery is not operating well or if the vehicle enters a shaded area. How to supply energy in a lunar environment is therefore an issue. In response to this problem, we developed technology for supplying power remotely by using electric-field surface waves, passing electromagnetic waves along the sand covering the lunar surface.

By changing the communication format used from observation satellites to ground stations from conventional radio-frequency communication to optical communication, we aim to create businesses on a scale of tens of billions of yen in annual revenue.

2. IOWN

2.1 IOWN roadmap

I would like to introduce the IOWN roadmap, from IOWN 1.0 to IOWN 4.0. IOWN 1.0 is a technology for establishing connections using photonics between datacenters. IOWN 2.0 is board-to-board connections using photonics for devices accommodated within datacenter racks. IOWN 3.0 is package-to-package connections using photonics, and IOWN 4.0 is intra-chip, or die-to-die, connections using photonics. This is how IOWN is evolving.

The All-Photonics Network (APN) is an elemental technology for configuring each of the above versions of IOWN. With IOWN 1.0, the APN will feature wider bandwidths and reduced power consumption. Photonics-electronics convergence (PEC) technology will evolve along with IOWN versions as PEC-2, PEC-3, and PEC-4. Likewise, the next-generation computing platform, the data-centric infrastructure (DCI), will evolve together with the evolution of PEC (Fig. 2).

2.2 All-Photonics Connect powered by IOWN

In March 2023, NTT EAST and NTT WEST launched an APN IOWN 1.0 service. On December 1, 2024, they began providing a new service, All-Photonics Connect powered by IOWN, to expand and enhance bandwidth, coverage areas, and types of interfaces. The following are three key features of this new service.

1) A maximum bandwidth guarantee of 800 Gbit/s, meeting the world’s highest level
2) Wide-area service provision with connections between major cities
3) Enhanced service configurations/interfaces and low power consumption

We used to provide only the OTU4 (optical transport unit 4) optical interface, but on hearing from corporate customers that “Ethernet interfaces are easier to use,” we decided to provide Ethernet interfaces as well. The provision of Ethernet interfaces made terminating equipment at customer sites unnecessary, thus saving space and reducing power consumption (a maximum reduction of 940 W at both sites).

2.3 IOWN APN steps 1 and 2 for enterprise

On August 29, 2024, an APN connection was established between Japan and Taiwan as a world first. A variety of video-based demonstrations were set up between Taiwan and Japan. Although the distance between Japan and Taiwan is about 3000 km, a delay time of approximately 17 ms was achieved. Since the propagation delay of optical fiber alone over this distance is said to be 15 ms, this means that we have achieved stable communication with low latency and no jitter (Fig. 3). Several experiments are also being conducted using the APN between Japan and Taiwan.
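
The delay figures quoted for the Japan-Taiwan link can be sanity-checked with a short calculation. The following Python sketch is only an illustration: the refractive index of 1.468 is an assumed typical value for silica fiber (it is not given in the talk), the function name is hypothetical, and the comparison assumes that the 17 ms and 15 ms figures refer to the same direction of travel.

```python
# Speed of light in vacuum in km/s (exact by SI definition).
C_VACUUM_KM_S = 299_792.458


def one_way_fiber_delay_ms(distance_km: float, refractive_index: float = 1.468) -> float:
    """One-way propagation delay over optical fiber, in milliseconds.

    refractive_index=1.468 is an ASSUMED typical value for silica fiber,
    not a figure from the talk.
    """
    speed_in_fiber_km_s = C_VACUUM_KM_S / refractive_index
    return distance_km / speed_in_fiber_km_s * 1000.0


distance_km = 3000.0       # approximate Japan-Taiwan distance quoted in the talk
measured_delay_ms = 17.0   # approximate measured delay quoted in the talk

fiber_ms = one_way_fiber_delay_ms(distance_km)
overhead_ms = measured_delay_ms - fiber_ms  # delay not explained by the fiber itself

print(f"fiber propagation: {fiber_ms:.1f} ms, remaining overhead: {overhead_ms:.1f} ms")
```

Under these assumptions the fiber alone accounts for about 14.7 ms, consistent with the roughly 15 ms cited, so only a couple of milliseconds of the measured delay would be attributable to equipment along the path.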

(1) Ultra-high-speed data backup by using the APN

In the event of a major disaster, for example, plant data could be simultaneously backed up not only in a datacenter in Japan but also in a datacenter in Taiwan. A long transfer distance slows down transfer speed and significantly increases backup time. However, data transfer can be greatly sped up using the APN, minimizing system-restoration time when a disaster occurs. Effective transfer speeds differ even at the same nominal transmission speed of 10 Gbit/s. For example, a comparable dedicated line, Interconnected WAN, achieved a transfer speed of only 2.81 Gbit/s, while the APN nearly doubled this to almost 5 Gbit/s. As a result, backup time could also be decreased from three minutes to one minute, enabling highly efficient data backup.

(2) High-efficiency remote production by using the APN

Live broadcasting of soccer, baseball, and other sports events requires a large broadcast van and more than 50 people for each match to provide on-site support over a long period. This demands a huge amount of resources from a broadcast station, so making program production more efficient has become urgent. In response to this problem, we can connect a studio, stadium, or other site to the APN and send all data via the cloud. We can also store editing software on the cloud to enable remote editing from a production base. This scheme would enable the production of high-quality programs with one-third the staff required in the past.

2.4 IOWN APN steps 1 and 2 for datacenter exchange

(1) APN connections between datacenters overseas

We are also undertaking APN connections between datacenters located in other countries. In India, for example, we set up connections between three datacenters in Mumbai in September 2024 and conducted verification experiments of APN connections between datacenters in the United States and in the United Kingdom in 2023.
Thus, we are working to establish use cases of distributed datacenters by using the APN overseas to support NTT’s global business.

(2) Watt/bit linking by using the APN

The APN is also expected to be used for watt/bit linking as part of a plan in which the government maintains the power grid and communications infrastructure in an integrated manner. By connecting regionally distributed datacenters through datacenter exchange, computer processing can be executed at datacenters located in areas where the supply-and-demand conditions of green energy are most favorable. This promotes local production for local consumption of green energy and improves the usage efficiency of renewable energy by dynamically placing workloads on the basis of supply-and-demand conditions of renewable energy.

(3) Distributed GPU cloud by using the APN

Consideration is now being given to whether the APN could also be used in AI machine learning, which has recently become a hot topic. This is because rack space at datacenters concentrated in urban centers is becoming scarce, making it difficult to extend graphics processing unit (GPU) clusters. For cases in which GPU expansion is desired but there is no space for it, we are conducting experiments on using GPUs at different datacenters as a single GPU cloud. In an experiment on the drop in performance when using distributed datacenters compared with a single datacenter, the results indicated that training would take 29 times as long over the Internet but only 1.006 times as long over the APN, suggesting that distributed GPUs could be used almost as if they were located in the same datacenter.

2.5 IOWN APN vs. dark fiber

The question of whether dark fiber is better has also come up, so we compare the APN and dark fiber to explain why the APN is superior.
Since APN network services are already being provided, only the access portion needs to be configured, which keeps the launch period short, and connection points can be changed on demand. Management cost is also very low, and for long-distance transmission, the APN is superior since it supplies the relay equipment that customers would otherwise have to prepare on their own. Reliability and redundancy are also high with the APN, and since a single fiber can be shared, the APN is also much more economical than dark fiber (Fig. 4).

2.6 APN step 3

APN step 3 increases transmission capacity 125-fold compared with that at the time the IOWN concept was announced, which is a dramatic increase from step 2. We aim to raise power efficiency even more by further promoting optical connections so that we can economically expand APN areas.

In step 3, we will also make more enhancements to the APN to further expand its use. One of these enhancements is on-demand optical path control. This will require that wavelength collisions and wavelength paths be controlled, and the technologies for doing so are “optical-path-design technology” and “wavelength-conversion and wavelength-band-conversion technology.” In the APN, a separate problem arises in that there are constraints on achieving low-latency, large-capacity on-demand services when connecting two points by an end-to-end optical path. For example, which wavelength bands can pass depends on the optical fiber, so a mechanism is needed to flexibly control a large-capacity optical path. A system for achieving such a mechanism is the Photonic Exchange (Ph-EX) (Fig. 5).
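
To make the notion of a wavelength collision concrete, the following Python sketch shows a generic first-fit wavelength assignment, a textbook heuristic in which an end-to-end path without conversion must use the same free wavelength on every link. It is an illustration only and does not represent NTT’s optical-path-design technology; the node names, link table, and function names are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]  # a fiber link between two (hypothetical) nodes


def first_fit_wavelength(
    path: List[Link], in_use: Dict[Link, Set[int]], num_wavelengths: int
) -> Optional[int]:
    """Return the lowest wavelength index that is free on every link of the path.

    Returns None when every wavelength collides on at least one link,
    i.e. no end-to-end path exists without wavelength conversion.
    """
    for w in range(num_wavelengths):
        if all(w not in in_use.get(link, set()) for link in path):
            return w
    return None


def assign_path(
    path: List[Link], in_use: Dict[Link, Set[int]], num_wavelengths: int
) -> Optional[int]:
    """Reserve the first-fit wavelength on every link of the path, if one exists."""
    w = first_fit_wavelength(path, in_use, num_wavelengths)
    if w is not None:
        for link in path:
            in_use.setdefault(link, set()).add(w)
    return w


# Hypothetical two-hop path A -> B -> C with some wavelengths already occupied.
path = [("A", "B"), ("B", "C")]
occupied = {("A", "B"): {0}, ("B", "C"): {1}}

blocked = first_fit_wavelength(path, occupied, num_wavelengths=2)
granted = assign_path(path, occupied, num_wavelengths=3)
print(blocked, granted)
```

With only two wavelengths the request is blocked, because wavelength 0 is taken on the first link and wavelength 1 on the second, even though each link individually has spare capacity. This is the kind of situation in which converting the wavelength at an intermediate node would allow the path to be set up.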
A wavelength-band-conversion function can use optical fiber already laid in the existing network by converting light to the optimal wavelength bands for transmitting signals along that fiber, achieving an end-to-end optical connection. NTT possesses technology for bundling wavelengths and converting them at a device, so wavelength-band conversion can be executed efficiently and without delay. NTT also developed a wavelength-conversion function that can execute conversion in units of wavelengths without delay, which reduces total delay time.

2.7 PEC-3/PEC-4

I will now talk about the third- and fourth-generation PEC devices. Our goal is to apply an optical engine to board-to-board connections as PEC-2 from FY2025, apply photonics to package-to-package connections as PEC-3 from 2028, and apply photonics to intra-chip, die-to-die connections as PEC-4 from 2032. For PEC-3 and PEC-4 devices at NTT, we are driving the evolution of silicon photonics and that of membranes (thin films).

In IOWN PEC, we would like to implement an ultra-small optical transceiver within a package. To this end, we have fabricated a very small 16-channel prototype transceiver measuring only 1.11 mm × 2.75 mm. To achieve a small, high-speed direct modulation laser, the key problems to be solved are how to make the laser smaller, confine light, and prevent heat generation. With the conventional fabrication method, however, the active layer is thick due to vertical stacking, and heat is easily generated as height increases. At NTT, to achieve a thin active layer, we radically changed the structure of the existing optical device, devised a horizontal-fabrication method, and applied indium phosphide in the form of a membrane on a silicon carbide substrate. With this technology, NTT laboratories are on a world-class level.

2.8 DCI-2

Finally, for IOWN 2.0, I would like to introduce DCI-2, which we are now developing with a target date for commercialization around 2026.
In DCI-2, we aim to increase power efficiency eightfold by connecting composable disaggregated infrastructure servers, which subdivide computer resources into units of boards, to optical switches using PEC devices and controlling them with a DCI controller.

2.9 IOWN Global Forum member status

The IOWN Global Forum was launched in 2019. Since then, the number of members has been increasing steadily and currently stands at 154 organizations and associations. Even Google has recently joined and begun participating in discussions.

3. Generative AI/tsuzumi

3.1 Evolution of tsuzumi

Since the announcement of tsuzumi in November 2023, we have provided consultation about its implementation to many companies, and after one year, this has come to more than 900 companies. In addition, tsuzumi was the first large language model (LLM) in Japan to be adopted in Microsoft’s Models-as-a-Service lineup, which was announced at Microsoft’s Ignite conference held in Chicago in the United States. There are also plans to adopt tsuzumi in the Salesforce LLM Open Connector for actual use in the future.

(1) Issues with LLM scaling up

LLMs now come in various sizes, and a trend toward large-scale models can also be seen, but training cost is huge. For example, the training cost of GPT-3 when ChatGPT first appeared was about 500 million yen per session, while the training cost of GPT-4 and Gemini is up to 15 to 20 billion yen per session. Power consumption is also massive. One training session on the scale of GPT-3 requires about 1300 MWh, roughly the output of one nuclear power plant for an hour. Going forward, the need for upgrading GPUs is expected to become particularly intense, so there is a need to consider environmental issues with the aim of achieving the United Nations’ Sustainable Development Goals.

(2) Features of tsuzumi

Against this background, we researched and developed tsuzumi with the aim of creating a small and lightweight LLM. The following are five main features of tsuzumi.

1) Lightweight: Can run on one GPU/one CPU
2) Flexible customization: Easy to incorporate specialized knowledge of industries and organizations
3) Multimodality: Supports reading comprehension of graphs, tables, etc. in addition to text
4) Proficiency in Japanese: World-class linguistic proficiency, especially in Japanese
5) Developed from scratch: Developed foundational models from scratch

The reasons for developing foundational models from scratch revolve around issues such as copyright problems, development freedom, and economic security. We are conducting R&D with the aim of achieving a detailed, well-thought-out model in Japan.

In 2023, we commercialized version 1.0 of tsuzumi with 7 billion (7B) parameters. Versions 1.1 and 1.2 represent an evolution toward more supported languages and multimodal support. In a version that is still in beta, we have raised accuracy by going from 7B to 13B parameters, achieving a level of accuracy comparable to world-class LLMs of the same scale, namely, Llama 2 and Llama 3. In summarization and Q&A, this beta version outperforms Llama (Fig. 6).

3.2 Extensions to tsuzumi

(1) AI agent: Operates a personal computer for the user

An AI agent operates a personal computer on behalf of the user and executes the target task. For example, if the user gives the instruction “purchase product A listed in this catalog,” the language model visits the product-purchasing site or creates an in-house purchasing site and automates all procedures up to the actual purchase of that product. In daily work, it is rare for one task to be completed on a single page. With tsuzumi, however, simply chatting with the system will automatically open the pages needed and even input the information required. Where there are many input fields, tsuzumi can refer to company manuals and use its language comprehension ability to determine what information should be entered where, and then enter that information. This series of operations can also be completely automated, but it is designed so that human checking can be carried out along the way to prevent errors.

(2) AI agent: Digital human that behaves naturally like a human

Unlike past digital humans that only make mechanical responses, we aim to develop a digital human capable of more human-like, smooth exchanges that we call “synlogue.” The idea is to have the speaker and listener create utterances together. In other words, one speaker would not necessarily have to complete an utterance before the other speaker begins to talk. We are thus researching and developing a new dialogue architecture that creates a series of utterances while multiple LLMs with different processing speeds and expertise collaborate in generating the conversation. We will achieve a digital human capable of more natural dialogue that can speak freely and is easy to speak to. Such a digital human will utter responses in agreement, create pauses by deliberately hesitating while generating utterances, and let the conversation partner talk if interrupted while talking.
This digital human makes abundant use of NTT technologies such as image recognition, situational awareness, and voice recognition. However, portions involved in slow-paced thinking and topic selection use ChatGPT.

(3) Multimodality: Understands voice features and content and replies in natural language

It is possible to extend the ability of LLMs so that they understand and analyze not only language but also the content of speech and information unique to speech such as intonation. If age, gender, or other attributes can be predicted from the pitch or intonation of a speaker’s voice, it should be possible to analyze what the speaker needs and the urgency of that need. As a direct application, this technology could be applied to automatic call distribution at a call center to reduce customer wait times. By handling voice not only in input but in output too, we aim to develop AI operators and AI automatic replies for call centers, actual shops, and other applications.

(4) Utterance-unit speech summary: Quickly summarizes spoken words

We have developed technology that provides the ease of reading of a full-text summary while maintaining the real-time characteristics of speech recognition. This technology enables real-time summarizing of a long meeting or presentation so that participants who join a meeting midway through can quickly grasp the main points made so far. It can also capture in real time information that could not be obtained by conventional speech recognition and full-text summaries, making work more efficient and speeding up decision making.

(5) Multimodality: Gives guidance on how to run in place of a sports trainer

Another application of multimodality is to reproduce the perspectives and judgments unique to a sports trainer using generative AI.
In running, for example, this extension simply observes a runner in action from the viewpoint of a sports trainer to identify key points in the runner’s running style and analyze differences between those movements and those of a role model. It can also provide easy-to-understand coaching tips just like those of a sports trainer and guide the runner to run in a way closer to that of the role model.

3.3 Applications of tsuzumi

(1) AI network operation × generative AI

Our goal is self-evolving zero-touch operation to prevent and minimize the impact of failures and quality drops in network services on customers. This means the ability to automatically detect and analyze any kind of failure that might occur and take appropriate measures without human intervention. Specifically, we will apply generative AI to the R&D of an AI/network technology group consisting of operation tasks and to the development of an AI/network training platform. We will simulate pseudo-failures using AI and network digital twins to learn diverse and unknown failures.

(2) Security operations × generative AI

Generative AI can be extremely effective not only in network operations but also in security operations. In preparing an in-house security report, for example, the conventional approach has been to have a report prepared by a new security head checked and brushed up by a veteran security head. However, the knowledge involved is tacit knowledge developed through years of experience that is not easily acquired or passed on. This creates a problem in that the quality of reports depends on the individual. NTT, however, has accumulated the results of these tasks up to the present, which means that it possesses a large quantity of reports prepared by new security heads together with the versions checked and brushed up by superiors.
This data can therefore be used to train an LLM and formalize this tacit knowledge so that high-quality security reports can be prepared by simply supplying information. Linking this technology with databases will also enable the creation of even more valuable security reports for in-house use.

3.4 AI constellation

We came up with the concept of the AI constellation by asking whether, instead of creating a large monolithic LLM, it would be possible to solve social problems by creating small, specialized, and diverse LLMs that can behave either in an autonomous, decentralized manner or in coordination with each other. As a use case of the AI constellation, NTT recently held a workshop in Omuta City, Fukuoka Prefecture in which AI agents discussed local social problems with each other. Specifically, the agents grasped local conditions, presented ideas from diverse perspectives, and discussed the issues amongst themselves. As a result of this activity, human ideas and opinions emerged, stimulating further discussions.

3.5 AI basic research

There are still many unknowns about why generative AI behaves the way it does. For example, there are questions such as “How is it that generative AI trained only in English can also handle Japanese?” Developing and controlling generative AI whose inner workings can be understood is said to be difficult. At NTT Research, research into understanding the inner workings of AI has begun with the launch of a new research field called “Physics of Intelligence” in collaboration with Harvard University’s Center for Brain Science. To give a typical research case, a relatively accurate picture can be produced when entering a prompt such as “Draw a lizard (or goldfish) with the color specified.” However, if the animal specified is a panda, a less than accurate picture will be produced.
These experiments concern the essence of imagination in AI and generative AI. The difference between the two cases, which can be stated as “a lizard can be imagined but a panda cannot,” is being mathematically proven, and research results are being presented.

4. Becoming a center of excellence

“Do research by drawing from the fountain of knowledge and provide specific benefits to society through commercial development.” Goro Yoshida, the first director of the Electrical Communication Laboratory, spoke these words on the founding of what was to become NTT laboratories. These words still live on as our DNA, and at NTT, we attach great importance to the flow of research, development, and social implementation. NTT aims to become an R&D center of excellence having responsibility for all of these steps, and to this end, we will repeat the cycle of research, development, and social implementation.

(1) Research

With regard to number of papers, NTT ranked 11th in the world in the 2017–2021 tabulation but moved up to 9th in the world in the 2019–2023 tabulation. We hope to become 5th in the world in the near future. When the fields are narrowed down, there are many in which NTT has been 1st or 2nd in the world. For example, in optical communications, the basis of IOWN, and in information security, neurological-function analysis, and quantum computers, NTT has ranked 1st or 2nd globally. We hope to expand our involvement in world-class research fields from here on.

With regard to the number of patent applications in generative AI, NTT ranks 13th in the world and 1st in Japan. However, the number of patent applications by countries such as the United States and China is increasing and is expected to keep increasing in the years to come, so at NTT, we plan to step on the accelerator and make every effort to increase our number of patent applications.
At the same time, NTT Research presented 110 research papers in FY2023, which accounted for 14% of the world’s most advanced papers in cryptography, some of which have received international awards.

(2) Development

In development, we will accelerate our R&D efforts in IOWN and tsuzumi that I previously introduced.

(3) Social implementation

In 2023, the Research and Development Planning Department, Market Planning and Analysis Department, and Alliance Department linked up under the Research and Development Market Strategy Division to form a new system with the goal of getting research results into society not only in terms of technology but also from a market perspective. In this new system, the Research and Development Planning Department works closely with the Market Planning and Analysis Department and Alliance Department to implement R&D results into society.

A number of companies have also been launched as spin-offs. These include NTT sonority, which develops and sells the open-ear headphones with no sound leakage that I introduced earlier, Space Compass, which aims to construct space datacenters, and NTT Green & Food, which is involved in land-based aquaculture. Another spin-off from NTT laboratories is NTT AI-CIX, which aims to contribute to further advances in AI. Its founding reflects the intensification of data use to promote the digital transformation of society and industry as part of new data-driven value creation promoted by NTT and mutual expansion of both domestic and global AI businesses (Fig. 7).
The original role of NTT AI-CIX in R&D was to develop AI models, but going forward, it is looking to provide end-to-end solutions, from consulting through AI model development, plus platform services, by focusing on two inseparable issues: what kinds of problems are present in the customer’s industry and how these problems can be solved.

5. In conclusion

By repeating the cycle of research, development, and social implementation, we aim to produce R&D results that are useful to everyone. In this endeavor, we look forward to your continued support.