Feature Articles: Media-processing Technologies for Artificial Intelligence to Support and Substitute Human Activities
Researching AI Technologies to Support and Substitute Human Activities
Interest in digital transformation has been growing, and its progress appears to be accelerating during the COVID-19 pandemic. Competition in artificial intelligence (AI) technologies is also intensifying, and platform operators are accumulating massive amounts of learning data. Amid these trends, NTT Media Intelligence Laboratories is promoting the research and development (R&D) of AI technologies to support and substitute human activities by taking advantage of the technologies and expertise that it has cultivated. This article introduces this R&D initiative.
Keywords: support of human activities, substitute for human activities, AI technology
1. Introduction
NTT Media Intelligence Laboratories has been involved in the research and development (R&D) of technologies for processing various forms of media, such as voice, audio, language, images, and video, and has put several of these technologies into practice. More recently, it has made a number of contributions to business such as operator support at contact centers, provision of artificial intelligence (AI) agents [2, 3], sound-collection technology for emergency call systems, and video-compression devices for 4K/8K broadcasting.
However, today’s market environment is going through major changes and competition is intensifying. Every industry is experiencing digital transformation (DX), which is revolutionizing business models through digitalization, and DX is expected to accelerate in response to the COVID-19 pandemic. The advent of deep learning has ignited a third AI boom, and the basic algorithms of AI technologies are now available to anyone. Moreover, learning data that contributes to the improved performance of AI systems is now being collected on a large scale by platform operators, such as Google, Apple, Facebook, and Amazon, so a world in which AI performance increases daily is becoming a reality.
In response to this external environment, the NTT Group is promoting initiatives such as B2B2X (business-to-business-to-X) and DX toward the creation of a smart world as part of its Medium-Term Management Strategy. It is also promoting the concept of the Innovative Optical and Wireless Network (IOWN) to build a smart world through innovative technologies.
Against this background, NTT Media Intelligence Laboratories is leveraging the technologies and expertise cultivated through R&D activities in media processing to pursue the R&D of AI technologies for supporting and substituting human activities, which are the source of all value, and the R&D of Digital Twin Computing (DTC) to create new value from a medium- to long-term perspective. In this article, we introduce the R&D of AI technologies toward an application domain in which AI supports and substitutes human activities.
2. Overview of AI technologies to support and substitute human activities
Several scenarios can be considered for an application domain in which AI supports and substitutes human activities. For example, in terms of increasing efficiency, AI can not only improve the productivity of contact-center operators and AI agents but also improve business processes and productivity in offices and, as new value, improve the quality of life. We expect the COVID-19 pandemic to drive the spread of teleworking and online meetings and to transform the way in which they are carried out.
However, applying current AI technologies to such scenarios is difficult. For example, supporting and substituting human activities on a more personal level, based on a person’s environment, requires data on that person and environment, but there are situations in which a large amount of data cannot be obtained. Under such conditions, it would be difficult to achieve the full potential of AI. Taking speech-recognition technology as an example, requirements differ across telephone calls, meetings, and other settings depending on whether speech-to-text processing alone is sufficient or whether speaker identification is also necessary. In the case of online meetings, there may be even more differences in expected performance and other requirements.
NTT Media Intelligence Laboratories has begun to address these issues by focusing its efforts on achieving efficient learning with a relatively small amount of data, producing new AI technologies, and achieving breakthroughs in the performance of current technologies.
3. Current initiatives in AI technologies to support and substitute human activities
The Feature Articles in this issue introduce a group of technologies that we are currently developing. First, the article “Media-processing Technologies for Ultimate Private Sound Space” focuses on sound as a major element in producing new AI technologies, such as those for understanding surrounding conditions from sound, listening only to the person you want to hear, and eliminating sound you do not want to hear. Our aim with these technologies is to achieve an ultimate private space for applications such as teleworking.
Next is “Saxe: Text-to-Speech Synthesis Engine Applicable to Diverse Use Cases”. Regarding speech-synthesis technology for generating speech for virtual announcers, AI agents, etc., this article introduces technology for high-accuracy reading of heteronyms according to context and deep-neural-network speech-synthesis technology for reproducing a variety of speaker characteristics at low cost. These technologies are aimed at achieving breakthroughs in the performance of current technologies and efficient learning with a relatively small amount of data.
In “Speech Recognition Technologies for Creating Knowledge Resources from Communications”, we envision speech recognition in scenarios such as meetings and face-to-face customer service. In addition to techniques for improving current speech-to-text technology, this article introduces new AI technologies such as those for extracting the gender, emotions, and other characteristics of the speaker from sound.
In “Knowledge and Language-processing Technology that Supports and Substitutes Customer-contact Work”, we introduce technologies for achieving breakthroughs in the performance of current technologies and producing new AI technologies. These technologies include summarizing documents to a specified length according to the application scenario and response analysis for improving the productivity of an operator involved in inside (remote) sales.
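To make the idea of summarizing to a specified length concrete, the following is a minimal sketch and not the method of the cited article. It assumes a simple extractive approach in which sentences are scored by word frequency and added greedily while a character budget allows. The function name `summarize` and the parameter `max_chars` are hypothetical names chosen for this illustration.

```python
import re
from collections import Counter


def summarize(text: str, max_chars: int) -> str:
    """Greedy extractive summary that never exceeds max_chars characters.

    Illustrative assumption: importance is approximated by word frequency.
    """
    # Split into sentences after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text act as a crude importance signal.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Visit sentences from highest to lowest score, keeping those that still fit.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        cost = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_chars:
            chosen.append(i)
            used += cost
    # Restore the original order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Calling `summarize(transcript, 200)` would return at most 200 characters. A different budget could be passed for each application scenario, for example a short one for an operator's screen and a longer one for a call report.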
Finally, to achieve a four-dimensional (4D) digital platform that integrates various types of sensing data in real time and enables a variety of future predictions, the article “Spatial-information Processing Technology for Establishing a 4D Digital Platform” introduces technology for digitalizing real-world space and point-cloud-coding technology for efficiently compressing 3D data that include temporal changes. These are technologies for efficient learning with a small amount of data and achieving breakthroughs in the performance of current technologies.
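As a rough illustration of why 3D data with temporal changes can be compressed efficiently, the sketch below quantizes each frame of points to a voxel grid and stores only the voxels that appear or disappear between consecutive frames. This is a toy scheme under stated assumptions, not the point-cloud-coding technology of the cited article. The names `voxelize`, `encode_delta`, and `decode_delta` are hypothetical.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Quantize points to integer voxel indices; points in the same voxel collapse."""
    return {
        (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        for x, y, z in points
    }


def encode_delta(prev: Set[Voxel], curr: Set[Voxel]) -> Tuple[Set[Voxel], Set[Voxel]]:
    """Describe the current frame relative to the previous one as (added, removed)."""
    return curr - prev, prev - curr


def decode_delta(prev: Set[Voxel], added: Set[Voxel], removed: Set[Voxel]) -> Set[Voxel]:
    """Rebuild the current frame from the previous frame and the stored delta."""
    return (prev - removed) | added
```

When most of a scene is static, the added and removed sets are much smaller than the full frame, which is the intuition behind exploiting temporal redundancy. The quantization step is lossy: detail finer than `voxel_size` is discarded.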
4. Anticipated use cases
We now introduce use cases of AI technologies for supporting and substituting human activities that take into account the recent COVID-19 pandemic (Fig. 1). In creating personal space, the aim is to construct a pseudo personal space for teleworking and a space that ensures privacy within one’s home. In online meetings, AI technologies will be used to support innovative collaborative work by converting the proceedings of a meeting to text, summarizing and translating the meeting, and easing the time/space constraints in people’s traditional work styles. Additionally, by letting AI do the work traditionally done by people, such as announcing, we anticipate quick work processes that require no contact among fellow workers. Finally, from the viewpoint of distribution reform, we envision the possibility of identifying goods in short supply and associated locations from social networking services (SNSs), etc. and automatically delivering those goods by recognizing 3D urban structures.
5. Future outlook
The surrounding environment is changing rapidly. To provide various means of supporting and substituting human activities, technology must adapt to these changes. To this end, we plan to promote R&D with a flexible frame of mind while understanding both the macro and micro aspects of these changes.