Feature Articles: Collaborations with Universities Leading to Open Innovation in NTT's R&D

High-presence Audio Live Distribution Trial

Hiroshi Yamane, Akio Yamashita, Koji Kamatani,
Masashi Morisaki, Tomomi Mitsunari, and Akira Omoto


This article reports on a trial of live distribution of video featuring high-presence audio over optical networks conducted in September 2010 and presents an evaluation of the results. With the spread of such distribution, demand for high-presence replay is on the rise, and NTT WEST has been involved with Kyushu University in joint research on high-presence video and audio.

Chuo-ku, Osaka, 540-8511 Japan

1. Introduction

NTT WEST believes that one way of using optical networks in the future will be live distribution, so it has been involved in various undertakings to this end.

On September 8, 2007, NTT WEST conducted a live broadcast of the "TCA Special 2007" from the Takarazuka Grand Theater in Hyogo Prefecture to two theaters: one in Tokyo and one in Osaka (TCA: Takarazuka Creative Arts). On December 24 of the same year, it conducted a live broadcast of the "Closing day of the Takarazuka Review Hanagumi Performance" from the Tokyo Takarazuka Theater to seven cinemas: four in Tokyo, two in Osaka, and one in Nagoya. Unlike normal movie content, this advanced live broadcast featured high-quality content distributed via the network to a number of commercial cinemas [1].

Conducted jointly by TCA, NTT, and content holders, these trials aimed to provide an overall assessment of the potential of the future business of live broadcasting over networks; address audio-visual quality issues, management structures, technical issues, business models, and profitability; and gather feedback from the content viewers themselves.

The configuration used for the trial (Fig. 1) involved connections between NTT Communications' optical network (leased circuit services etc.) and NTT WEST's optical networks (local area network communications services), across which the multicasting took place. The distribution equipment included an IP (Internet protocol) interface (NA5000), an encoder (HE5000), and a decoder (HD5000) (all products of NTT Electronics Corporation); the distributed content was MPEG-2 (46 Mbit/s) video with stereo-quality audio. The equipment was continually monitored from a web console so that the entire distribution network could be watched and the sites could communicate for troubleshooting. To this end, an IRC (Internet relay chat) server and an Internet telephony system were set up as part of the platform; the IRC server allowed two-way chat-style communications from client personal computers at different sites.
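As an illustration of the receiving side of IP multicast distribution in general, the following minimal sketch shows how a host can join a multicast group with standard sockets. It is not the configuration of the NA5000 or of the trial network: the group address, port, and function names are assumptions made only for this example.

```python
import socket
import struct

# Hypothetical values for illustration only; the article does not state them.
GROUP = "239.1.1.1"  # administratively scoped multicast group (assumed)
PORT = 5004          # UDP port (assumed)


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Open a UDP socket and join the multicast group on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

Each receiving site would call something like `open_receiver()` once and then read datagrams with `recvfrom`; the network replicates the stream, so the sender transmits it only once regardless of the number of sites.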

Fig. 1. 2007 live distribution trial.

2. Technical issues with live distribution services

The 2007 trial showed that MPEG-2 video at 46 Mbit/s enabled clear viewing of details such as individual spangles on the Takarazuka Review costumes, indicating that there are no issues with resolution. However, when the camera panned sideways during line dancing and similar scenes, the image on the screen became very difficult to watch (it induced seasick-like feelings in those viewing from the cinema's front rows), so we concluded that conventionally shot TV content is not always suitable for the big screen. Furthermore, although these cinemas had 5.1 surround sound systems, distortion in the stereo audio signals that had been encoded, compressed, and transmitted led viewers to complain that the sound was thin or poor.

Regarding video quality, higher quality than that of MPEG-2 at 46 Mbit/s is possible, but it generally leads to higher network costs, which prompts content holders to ask whether, in terms of actual business, costs can be kept down without affecting video quality.

To resolve the issue of nausea induced by camera panning, we have introduced shooting methods that take into account the constraints of large-screen projection: shots are switched among multiple cameras and panning is eliminated as much as possible. To address the issue of network costs, we have been able to halve the video bandwidth while maintaining the same video quality as MPEG-2 (46 Mbit/s) by using MPEG-4 AVC/H.264 instead.

However, the audio quality problems still needed to be addressed. There were four issues that needed to be resolved: (1) audio quality at the time of recording, (2) audio quality during mixdown, (3) transmission quality suitable for 5.1 surround sound, and (4) an audio environment that can be replayed on 5.1 surround sound systems.

3. Joint research with Kyushu University

With a knowledge base in audio-visual engineering and staging technologies, Kyushu University runs a "Culture Hall Management Engineer Training Program" training unit. The aims are to (1) teach personnel the skills to make and implement plans as part of community measures to promote local arts and culture, while making efforts to promote effective use of local citizen halls and public centers using optical networks; (2) promote regional development; and (3) bridge the information gap by distributing arts and cultural events that are held predominantly in the big cities of Osaka and Tokyo, and thus expand opportunities to use content in local settings.

Since measures to distribute arts and cultural events to local public halls via optical networks match NTT WEST's approaches and ideas for live distribution, and since both NTT WEST and Kyushu University are working to spread live distribution via optical networks, we have embarked upon joint research.

As part of this joint research, NTT WEST and Kyushu University have considered ways to combine knowledge and technologies to address the audio quality problems identified in the 2007 trial and demonstrate solutions and have also considered ways to promote live distribution to public halls by using optical networks.

Specifically, research supervised by Associate Professor Akira Omoto of Kyushu University into the visualization of reflected sound in an enclosed space by means of sound intensity measurement [2] has been conducted for both the sender and receiver of the audio signals. Through an understanding of the characteristics of the acoustic fields at both ends, this research has provided us with new and optimized recording and mixing techniques. By using Kyushu University's knowledge in combination with NTT WEST's optical networks and encoding technology created through NTT's R&D, and with the cooperation of NTT Learning Systems, we were able to achieve mixing optimized for the replay venue. For the system demonstration, we were provided with content from TCA in the same way as in the 2007 trial, and we were also assisted by Kadokawa Cineplex as the receiver of the high-presence live audio (replay venue). We conducted joint research with the participation of TCA and Kadokawa Cineplex [3].

4. High-presence audio live distribution trial

On September 12, 2010, we transmitted the final performance of the "Natsuki Mizu Goodbye Show" by the Takarazuka Review Snow Group from the Tokyo Takarazuka Theater to test the high-presence audio distribution system (Fig. 2). Because our aim was to find out if we could reproduce the original audio from Tokyo Takarazuka Theater at the replay venue, we used only one receiving site: the Cineplex at Makuhari. The two sites were linked by a 40-Mbit/s network connection in asynchronous transfer mode (ATM). We used the NA5000 as the IP interface. We also used the new audio rate oriented adaptive bit-rate video encoder/decoder developed by NTT Network Innovation Laboratories; this codec can control the allocation of bit rate between video and audio in real time, and it achieves higher video quality by making use of the extra bits saved through lossless audio compression. Since any audio quality degradation should be avoided, we used MPEG-4 Audio Lossless Coding (ALS) [4], to whose standardization NTT Communication Science Laboratories is one of the contributors.
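The idea of passing bits saved by lossless audio compression on to the video can be sketched with simple arithmetic. This is not the codec's actual rate-control algorithm, which the article does not describe. The audio format (6 channels, 48 kHz, 24 bit), the combined stream budget of 24 Mbit/s, and the compression ratios are all assumptions chosen for illustration.

```python
def pcm_rate_mbps(channels: int, sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of uncompressed PCM audio in Mbit/s."""
    return channels * sample_rate_hz * bits_per_sample / 1e6


def video_budget_mbps(total_mbps: float, audio_mbps: float) -> float:
    """Bits left for video after the losslessly coded audio has been served."""
    budget = total_mbps - audio_mbps
    if budget <= 0:
        raise ValueError("audio rate exceeds the combined stream budget")
    return budget


# Assumed audio format for a 5.1 mix; not stated in the article.
RAW_AUDIO = pcm_rate_mbps(channels=6, sample_rate_hz=48_000, bits_per_sample=24)
TOTAL = 24.0  # hypothetical combined audio + video budget in Mbit/s

if __name__ == "__main__":
    # Lossless coding yields a variable rate: quiet passages compress better.
    for ratio in (1.0, 0.7, 0.5):  # hypothetical compressed/raw ratios
        audio = RAW_AUDIO * ratio
        video = video_budget_mbps(TOTAL, audio)
        print(f"audio {audio:5.2f} Mbit/s -> video {video:5.2f} Mbit/s")
```

Because the audio is never allowed to degrade, it is served first and the video takes whatever remains, so the video rate rises whenever the audio compresses well.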

Fig. 2. 2010 high-presence audio live distribution.

Moreover, we used a highly efficient live distribution system developed by NTT Network Innovation Laboratories, which served to ensure end-to-end reliability with error correction and IP packet encryption. For picture quality, we chose to use the MPEG-4 AVC/H.264 (average 20 Mbit/s) format after evaluating the 2007 trial results. To compare high-presence audio with a conventional system, we used TCA's commercial service to connect the Tokyo Takarazuka Theater with another cinema (a conventional stereo-sound cinema) via the Business Ether-Wide service and fed high-definition video with stereo audio to it. In addition, with the cooperation of NTT Cyber Space Laboratories, we created a questionnaire for viewers and statistically analyzed the results to assess audio quality.
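The article does not say which error correction scheme the distribution system uses. As a general illustration of how packet-level forward error correction can repair a loss without retransmission, the sketch below implements the simplest such scheme, a single XOR parity packet per group. The function names are hypothetical, and this scheme is not claimed to be the one used in the trial.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """Return the byte-wise XOR of a group of equal-length packets."""
    if not packets:
        raise ValueError("need at least one packet")
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("packets must have equal length")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost packet (marked None) from survivors and parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one loss per group")
    if not missing:
        return [p for p in received if p is not None]
    survivors = [p for p in received if p is not None]
    # XOR of the survivors and the parity cancels everything but the lost packet.
    rebuilt = xor_parity(survivors + [parity])
    return [rebuilt if p is None else p for p in received]
```

For example, if the sender transmits three media packets followed by `xor_parity` of the three, the receiver can pass the group with one `None` entry to `recover_missing` and obtain the original packets. Production systems generally use stronger codes that tolerate burst losses.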

Regarding source audio recording, we sent a total of 13 channels to the mixer: 9 channels of audio from the Takarazuka Review venue and 4 channels of independently added ambient audio. These were mixed for the 5.1 surround sound system at Cineplex Makuhari, and the output was fed to the lossless audio encoding equipment.
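Mixing 13 input channels down to the six channels of a 5.1 system can be viewed as applying a 6 x 13 gain matrix to each sample frame. The following minimal sketch shows that operation only. The gains are placeholders invented for illustration, since in the trial the mix was set by the engineers for the replay venue, and the channel ordering (L, R, C, LFE, Ls, Rs) is likewise an assumption.

```python
from typing import List

N_VENUE, N_AMBIENT = 9, 4                     # channel counts from the trial
OUTPUTS = ["L", "R", "C", "LFE", "Ls", "Rs"]  # assumed 5.1 channel order


def placeholder_gains() -> List[List[float]]:
    """Build an illustrative 6 x 13 gain matrix (NOT the trial's real mix)."""
    gains = [[0.0] * (N_VENUE + N_AMBIENT) for _ in OUTPUTS]
    for i in range(N_VENUE):
        gains[i % 3][i] = 1.0 / 3.0          # venue feeds spread over L, R, C
    for j in range(N_AMBIENT):
        gains[4 + j % 2][N_VENUE + j] = 0.5  # ambient feeds alternate Ls, Rs
    return gains


def downmix(frame: List[float], gains: List[List[float]]) -> List[float]:
    """Mix one sample frame: out[o] = sum over i of gains[o][i] * frame[i]."""
    if any(len(row) != len(frame) for row in gains):
        raise ValueError("each gain row must match the number of input channels")
    return [sum(g * x for g, x in zip(row, frame)) for row in gains]
```

Adjusting the mix in response to feedback from the replay venue then amounts to changing entries of the gain matrix while the performance is running.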

This setup enabled us to faithfully reproduce the mixed audio signal at the Cineplex Makuhari venue. A supervisor in the seating at Cineplex Makuhari was able to relay information about the replayed audio back to the mixer at the Tokyo Takarazuka Theater in real time to enable mixing adjustments as required.

As a result, viewers in the movie theater were able to experience an atmosphere similar to that of the source venue. They were naturally compelled to cheer and clap just as if they were really at the Takarazuka Theater, a reaction that was not observed during the 2007 trial. In this way, we were able to overcome the limitations of conventional live network distribution and create a unified feeling between the two venues.

5. Trial evaluation

We surveyed viewers about the high-presence audio replay at Cineplex Makuhari and compared the results with survey results collected from viewers at the conventional stereo-sound cinema.

We received 157 responses from the 275 viewers at Cineplex Makuhari (57.1%) and 103 responses from the 308 viewers at the conventional cinema (33.4%). When we compared these results, we found several noteworthy differences between the two groups (Fig. 3).
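The response rates quoted above follow directly from the response and audience counts, as the short check below shows. The function name is illustrative.

```python
def response_rate(responses: int, audience: int) -> float:
    """Return the questionnaire response rate as a percentage."""
    if audience <= 0:
        raise ValueError("audience must be positive")
    return 100.0 * responses / audience


makuhari = response_rate(157, 275)      # high-presence audio venue
conventional = response_rate(103, 308)  # conventional stereo-sound cinema

if __name__ == "__main__":
    print(f"Cineplex Makuhari:   {makuhari:.1f}%")
    print(f"Conventional cinema: {conventional:.1f}%")
```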

Fig. 3. Questionnaire results.

In response to the question about the impact of the sound on a scale of 1–7, 66.1% of respondents at Cineplex Makuhari rated the sound in the top 3, i.e., as very strong, strong, or moderately strong, compared with only 55.7% for the stereo-sound cinema. In response to the question, "How strongly did you feel as if you were actually in the Tokyo Takarazuka Theater?" 69.8% of respondents at Cineplex Makuhari selected very strongly, strongly, or moderately strongly, compared with 59.6% for the stereo-sound cinema.

The biggest difference between the two groups was for the question regarding entrance fees. At Cineplex Makuhari, 35.3% of respondents said that they were very satisfied, satisfied, or moderately satisfied; when those who gave a neutral answer are included, 69.6% found the entrance fee acceptable. By contrast, the corresponding figures for the stereo-sound cinema were 14% and 53%. These results indicate that the general level of satisfaction was significantly higher at Cineplex Makuhari.
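The gaps between the two venues can be tabulated in percentage points from the figures quoted in this section, which confirms that the entrance-fee question shows the largest difference. The question labels below are shorthand introduced for this sketch. Because the number of respondents who answered each individual question is not given, no significance test is attempted here.

```python
# (Cineplex Makuhari %, stereo-sound cinema %) as quoted in the text
RESULTS = {
    "impact of sound (top 3)": (66.1, 55.7),
    "feeling of being in the theater (top 3)": (69.8, 59.6),
    "entrance fee (satisfied)": (35.3, 14.0),
    "entrance fee (satisfied or neutral)": (69.6, 53.0),
}


def gaps(results: dict) -> dict:
    """Return the Makuhari-minus-stereo gap in percentage points per question."""
    return {q: round(a - b, 1) for q, (a, b) in results.items()}


if __name__ == "__main__":
    for question, gap in gaps(RESULTS).items():
        print(f"{question}: {gap:+.1f} points")
```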

6. Future plans

At NTT WEST, we plan to further analyze these results and establish business models for commercializing audio-visual lossless encoding and transmission and live distribution technologies with the aim of involving even more content holders to popularize live distribution services. Furthermore, we aim to increase business efficiency by using optical networks and increase content security by providing a safe and reliable network while promoting research into optical live distribution services.


[1] (in Japanese).
[2] Y. Fukushima, H. Suzuki, and A. Omoto, "Visualization of Reflected Sound in Enclosed Space by Sound Intensity Measurement," Acoustical Science and Technology, Vol. 27, No. 3, pp. 187–189, 2006.
[3] (in Japanese).
[4] Y. Kamamoto, T. Moriya, N. Harada, and C. Kos, "Enhancement of MPEG-4 ALS Lossless Audio Coding," NTT Technical Review, Vol. 5, No. 12, 2007.
Hiroshi Yamane
Manager, Solution Business Department, Corporate Marketing Headquarters, NTT WEST.
He received the B.S. degree in sociology and MBA degree from Momoyama Gakuin University (St. Andrew's University of Japan), Osaka, in 1989 and 1996, respectively, and the Ph.D. degree in engineering from Nara Institute of Science and Technology in 2001. He joined BRAINS R&D Center, NTT Business Communications Headquarters, Tokyo, in 1989. He moved to NTT WEST in 1999. He has developed many visual communication services such as the audio, visual, and communication systems design for Kyushu National Museum of Japan (as a licensed design engineer) and the 4K digital cinema international experiment. He received the 10th Telecommunications Advancement Foundation Award in 1995. He is a member of the Institute of Electronics, Information and Communication Engineers and the Institute of Image Information and Television Engineers of Japan.
Akio Yamashita
Assistant Manager, Services Creation Department, NTT WEST.
He graduated from the Faculty of Science, Konan University, Hyogo, in 1999. He joined NTT in 1999. Having gained experience in company system development in the Technology Innovation Department, he is currently engaged in optical distribution system (ODS) network distribution joint research related to high-presence audio.
Koji Kamatani
Broadband Service Department, NTT Media Supply.
He received the B.E. and M.E. degrees from Kyoto University in 1998 and 2000, respectively. He joined NTT WEST in 2000 and worked in the Corporate Marketing Headquarters on planning and building corporate, municipal, and university networks. After that he focused on live distribution service development for ODS and digital cinema. He moved to the Broadband Service Department in NTT Media Supply Co. in July 2011. He is currently engaged in designing and operating Internet systems as a system administrator of an Internet service provider.

Masashi Morisaki
Services Creation Department, NTT WEST.
He graduated from the Faculty of Economics, Shiga University in 1997. He joined NTT in 1997. While working in the Corporate Marketing Department at Hyogo Branch, he was involved in developing systems for local government offices. He is currently engaged in ODS network distribution joint research related to high-presence audio.
Tomomi Mitsunari
Assistant Manager, Osakaminami Division, NTT WEST-KANSAI.
She graduated from Ritsumeikan University College of Law in 1999 and joined NTT in the same year. Having gained experience in building government and public office systems at the Yamaguchi Branch, she worked on ODS network distribution joint research related to high-presence audio. She became an Assistant Manager in Osakaminami Division, NTT WEST-KANSAI in July 2007 and is currently engaged in the project of agency business.
Akira Omoto
Associate Professor, Faculty of Design, Kyushu University.
He graduated from Kyushu Institute of Design in 1987 and received the Ph.D. degree from the University of Tokyo in 1995 for a thesis on an active noise barrier. From 1987 to 1991, he worked as an R&D engineer at Nittobo Acoustic Engineering Co., Ltd. In 1991, he was appointed research associate at the Department of Acoustic Design, Kyushu Institute of Design; he was made Associate Professor in 1997. With the unification of universities in 2003, he became Associate Professor at the Faculty of Design, Kyushu University. His research interests include measurement, evaluation, and control of enclosed sound fields.