Pursuing Research While Being Somewhat Stolid to Outside Trends
As a world-renowned researcher in high-quality audio signal processing and coding, NTT Fellow Takehiro Moriya of the NTT Communication Science Laboratories is devoted to the international standardization of technologies in this field. We asked him about the significance of his research, his past research experiences, his future goals, and what he would like to say to today’s young researchers.
Significance of speech compression technology
—Dr. Moriya, could you first give us a brief description of your research?
Well, I have been researching methods for efficiently representing speech and music signals in digital form for many years. To give you an example, music that we hear from a portable music player or digital broadcast is not the original signal but rather the result of compressing that information to about one-tenth of its original amount. Our ears, however, cannot, for the most part, detect that such sound has actually been compressed. Similarly, the audio that we hear from our mobile phones has been compressed to about one-tenth of the amount of information in an ordinary landline call. In my research, we look for ways to compress signals as much as possible while maintaining sound quality and for ways to reproduce sound while cutting down on the amount of information used.
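To make the "about one-tenth" figure concrete, the following minimal sketch computes uncompressed bit rates and the rates left after a tenfold reduction. The interview does not state the source formats, so the CD parameters (44.1 kHz, 16 bits, stereo) and the landline PCM parameters (8 kHz, 8 bits, mono) are assumptions chosen for illustration, and the function name `pcm_bitrate_kbps` is hypothetical.

```python
# Illustrative figures only: standard CD and landline PCM parameters are
# assumed here; they are not values quoted in the interview.

def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

cd_kbps = pcm_bitrate_kbps(44_100, 16, 2)      # CD-quality stereo audio
landline_kbps = pcm_bitrate_kbps(8_000, 8, 1)  # telephone-band PCM

COMPRESSION_RATIO = 10  # "about one-tenth", as described above

print(f"CD audio: {cd_kbps:.1f} kbit/s, about {cd_kbps / COMPRESSION_RATIO:.0f} kbit/s after compression")
print(f"Landline: {landline_kbps:.1f} kbit/s, about {landline_kbps / COMPRESSION_RATIO:.1f} kbit/s after compression")
```

Under these assumptions, compressed music needs roughly 141 kbit/s and compressed mobile speech roughly 6.4 kbit/s, which is the order of magnitude such systems typically use.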
Unfortunately, our technology cannot, at present, be directly used for coding in music broadcasting such as for the popular MP3 format, but I think it’s fair to say that it’s finding widespread use in mobile phones on a global scale (Fig. 1).
What has made this possible is the relatively recent digitization of mobile phones and technology for making efficient use of the frequency spectrum. In this regard, it is instructive to consider that if the information in analog radio signals as used by the first generation of mobile phones was simply digitized and then carried as-is on radio waves, the amount of information to be carried would be huge, resulting in lower efficiency. In other words, because the radio spectrum is a limited resource, digitization without speech compression would cut the number of mobile-phone users that could be accommodated to about one-third or one-fourth that of the analog era.
Against this background, compression technology research moved forward rapidly. At that time, my fellow researchers and I were faced with a number of requirements, such as placing compression processing on the handset’s chip, no delay, and no drop in sound quality. After we researched the extent to which information could be reduced and how economical radio spectrum utilization could be achieved, some of our improved technology came to be used in NTT DOCOMO and other Japanese mobile phones (second generation). For third-generation mobile phones, the approach was a bit different. Instead of NTT directly creating compression schemes for other parties, technologies that we and our seniors had patented came to be adopted by standardization organizations in Europe and the USA. In this way, we have contributed to improved speech quality in almost all mobile phones throughout the world. These technologies are also being used in Japanese and overseas IP (Internet protocol) telephone equipment.
Researching the relationship between sound and people
—In the wired world, broadband is expanding and bandwidth limitations are disappearing. Is compression still necessary?
It’s already about 30 years since NTT adopted a policy supporting the conversion to broadband systems, and the opinion that compression technology research, which goes against the grain of that movement, is perhaps no longer necessary has only intensified since the time that I entered the company. As a result, compression technology research at NTT has been contracting steadily. At the same time, one can clearly see that compression technology, which today is still used in mobile phones, digital broadcasts, and other applications, has somehow continued to survive. Actually, I like to think about the novel services that the next generation of broadband systems will soon make possible, but for the time being, I know we must finish what we set out to do in compression research.
Even in the broadband era, the ability to encode an audio signal without causing any distortion and to restore the signal to its original form can have a variety of uses. This is called lossless coding. The lossless-coding standard for audio use was established by ISO/IEC MPEG (the Moving Picture Experts Group of the International Organization for Standardization and the International Electrotechnical Commission) five years ago, while the standard for telephone voice calls was established last year by ITU (International Telecommunication Union). In short, standards that can be used by customers for a variety of applications have been prepared.
For example, we are preparing a compression-software library and constructing prototype transmission equipment so that sound can be carried without any distortion on satellite broadcasts or IPTV (Internet protocol television). Using this equipment, NTT WEST is promoting a project for simultaneously relaying the stage performance of the famous Takarazuka Revue to other theaters without any sound degradation over a broadband connection. Furthermore, companies that provide systems for managing large volumes of music data can use lossless coding to compress that data with high accuracy and absolutely no distortion and archive the data as a package for a long time. As shown by these examples, lossless coding is expected to find use in a variety of fields (Fig. 2).
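The defining property of lossless coding, that the decoded signal matches the original bit for bit, can be illustrated with a short sketch. This is not the MPEG or ITU lossless codec mentioned above: it uses Python's general-purpose `zlib` compressor as a stand-in, and the test tone and the helper name `make_tone` are assumptions made purely for illustration.

```python
import math
import struct
import zlib

def make_tone(num_samples=8000, sample_rate=8000, freq_hz=440.0):
    """Generate a 16-bit PCM sine tone as raw little-endian bytes (hypothetical test signal)."""
    samples = [
        int(12000 * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(num_samples)
    ]
    return struct.pack(f"<{num_samples}h", *samples)

original = make_tone()

# zlib stands in for a real audio lossless coder; a dedicated codec would
# model the waveform and typically compress real audio far better.
encoded = zlib.compress(original, 9)
decoded = zlib.decompress(encoded)

print("original bytes:", len(original))
print("encoded bytes: ", len(encoded))
print("bit-exact round trip:", decoded == original)
```

The round-trip check is the point of the example: whatever the compression ratio, a lossless scheme must return exactly the bytes that went in, which is what makes it suitable for archiving and distortion-free relaying.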
Thus, in addition to working on ways to reduce the amount of information used in reproducing sound, I would also like to research ways of making the best use of broadband. To this end, I believe there is a need to research the relationship between sound and people from both the neurological and physiological perspectives to determine, for example, what techniques can be used to enable humans to hear pleasing sounds (Fig. 3).
Technologies contributing to the world at large
—Could you tell us how you became involved in compression research and what, if any, hardships you have experienced in your work?
What I was doing during my university days was completely different from my current research. At that time, I had a vague and worrisome feeling that whatever I took great pains to research might in the end have no use for anyone and make no contribution to the world. Thus, if I was to find employment, I had a strong desire to do something that would shake up the world.
Fortunately, upon graduating, I was able to enter NTT, where I soon heard a talk given by Yasushi Takahara, the director of NTT R&D Headquarters at that time. In his talk, he said: “In ten years’ time, I would like to create a mobile phone that can be slipped into the front pocket of a shirt.” I was very impressed with these words, and among the various technical problems that had to be solved to realize this dream, I selected digital speech compression as my research theme. In this regard, NTT could already boast world-class achievements in speech analysis technology, and lots of attention was being paid to this technology with the idea that it might be applicable to future digital mobile phones on a worldwide basis. Furthermore, as I consider myself to be a frugal character who likes to eliminate waste wherever I can, a research theme that compresses speech and uses only as much information as necessary was a perfect fit for me.
—Did you begin compression research with the idea of making it applicable to mobile phones?
Yes, of course. At that time, optical fiber was already on the scene, and as research and development efforts were focused on wired broadband, there was little room for the use of compression technology. I therefore thought that if compression technology were ever to be used, the only hope would be in the field of digital mobile phones. But successful application there would produce huge results. Thus, setting out to do research in this field was, if anything, a gamble, but I decided to take the risk and give it my best shot.
Under these circumstances, I encountered a strong countervailing wind within the company from the moment I began my research, and I was sometimes admonished with words like, “Don’t bother with such useless research as compression. Can’t you take on something more promising?” Even the development of practical digital mobile phones was not considered to be feasible at that time, and when I entered the company, there were still about five or six researchers pursuing compression. This number decreased every year, and there was even a time when I was the only one researching compression. Then, after being in this state for about half a year, I received the opportunity to spend some time at Bell Laboratories in the USA.
Winning an ARIB-sponsored competition
—How did compression research in the USA differ from that in Japan?
In contrast to Japan, there was a move in the USA to standardize digital formats for mobile phones, and research on compression schemes was quite active. Considering that everyone at NTT had given up on compression research, I was surprised to discover that many people in the USA were wholeheartedly engaged in it. Thus, I too, together with engineers in the USA, began researching compression as an extension of what I had been doing in Japan. Some time after this, the Association of Radio Industries and Businesses (ARIB: a radio-related standardization organization) in Japan decided to convene a compression-technology competition as had been done in the USA, and I was quickly called back to Japan. After a short period of preparation, my colleagues and I submitted a scheme to the first competition of this kind, but it lost out by a narrow margin to the American scheme. Three years later, however, in 1993, with the call for a high-compression-rate standard, I formed a new project team and developed a new compression scheme that won the ARIB competition and went on to be adopted as Japan’s standard system.
—As a researcher, do you adhere to any principles or have any mottos?
I see myself as a serious-minded company employee and always have this motto in mind (Dr. Moriya shows me a small card tucked inside his employee ID holder):
“Company Creed: Based on technology development with a global vision, we shall strive to provide advanced services and high reliability toward the creation of a prosperous and culturally rich society.”
This company creed goes back to the time that NTT became a public corporation. Its words fit my work perfectly: I want to use compression technology to provide services and high reliability on a global scale, and I’m constantly wondering how I can provide technical support and help make everyone’s life more prosperous and culturally enriched. And I would say that my final objective is to make people happy in some way through my research. I would of course be delighted if the company could make some profit in the process, and I would be happy to receive some indirect praise in the form of “great service” or “very interesting” even if I never find out who among all the people using our services are saying this. Whenever I am at a loss as to how to proceed with my work and am trying to determine which path would best fit this personal objective of mine, or when I just get a bit tired, I remember the words of our company creed and suddenly feel better.
—Earlier this year, you received Japan’s Medal with Purple Ribbon, didn’t you?
Yes, that’s right. Thanks to many people, I had the great honor of receiving the Medal with Purple Ribbon in May 2010. This medal recognized that the compression technology that we improved over the course of mobile-phone technology competitions has contributed at least in some way to mobile phones and IP telephony throughout the world. I would like to take this opportunity to express my deep appreciation to all those who have offered their invaluable guidance, to my colleagues and subordinates who undertook this joint research with me, and to those who have helped in spreading this compression technology throughout the world. I myself do not think that I have yet made enough technical contributions to merit this distinction, but I plan to redouble my efforts in bringing compression technology to even higher levels.
—In closing, what message would you like to leave for young researchers?
First of all, I would like to point out that the NTT research environment is a true blessing. There are many excellent researchers among one’s seniors and superiors, facilities are extensive, it’s relatively easy to obtain cooperation from the outside, and good budgetary support is provided. In such an environment, carrying out research to please our customers is of the utmost importance. At the same time, while this may sound a bit paradoxical, one must become somewhat stolid to outside trends. Why would I say this? Well, for researchers, meaningful and essential work can span 50 or 100 years, and if they think only about the immediate future, they will never be able to have a long-term vision. Of course, there are times in the research process when a short-term outlook is necessary, but a healthy balance needs to be achieved between the two. So all in all, I would like to tell young researchers at NTT to select a research theme that gives them unending pleasure and to devote their undivided efforts to pursuing their research objectives.
NTT Fellow, Moriya Research Laboratory, NTT Communication Science Laboratories.
He received the B.S., M.S., and Ph.D. degrees all in applied mathematics and instrumentation physics from the University of Tokyo in 1978, 1980, and 1989, respectively. Since joining the Musashino Electrical Communication Laboratories of Nippon Telegraph and Telephone Public Corporation (now NTT) in 1980, he has been engaged in research on and the standardization of speech and audio coding. In 1989, he stayed at AT&T Bell Laboratories as a guest researcher. He is a member of the Acoustical Society of Japan, the Information Processing Society of Japan, and the Institute of Electronics, Information and Communication Engineers of Japan and a fellow of IEEE.