Feature Articles: Collaborations with Universities Leading to Open Innovation in NTT's R&D
High-presence Audio Live Distribution Trial
This article reports on a trial of live distribution of video featuring high-presence audio over optical networks conducted in September 2010 and presents an evaluation of the results. With the spread of such distribution, demand for high-presence replay is on the rise, and NTT WEST has been involved with Kyushu University in joint research on high-presence video and audio.
NTT WEST believes that one way of using optical networks in the future will be live distribution, so it has been involved in various undertakings to this end.
On September 8, 2007, NTT WEST conducted a live broadcast of the "TCA Special 2007" from the Takarazuka Grand Theater in Hyogo Prefecture to two theaters: one in Tokyo and one in Osaka (TCA: Takarazuka Creative Arts). On December 24 of the same year, NTT WEST conducted live broadcasts of the "Closing day of the Takarazuka Revue Hanagumi Performance" from the Tokyo Takarazuka Theater to seven cinemas: four in Tokyo, two in Osaka, and one in Nagoya. Unlike normal movie content, this advanced live broadcast featured high-quality content distributed via the network to a number of commercial cinemas.
Conducted jointly by TCA, NTT, and content holders, these trials aimed to provide an overall assessment of the potential of the future business of live broadcasting over networks; address audio-visual quality issues, management structures, technical issues, business models, and profitability; and gather feedback from the content viewers themselves.
The configuration used for the trial (Fig. 1) involved connections between NTT Communications' optical network (leased circuit services etc.) and NTT WEST's optical networks (local area network communications services), across which the multicasting took place. The distribution equipment included an IP (Internet protocol) interface (NA5000), an encoder (HE5000), and a decoder (HD5000) (all products of NTT Electronics Corporation); the distributed content was MPEG-2 (46 Mbit/s) video with stereo-quality audio. The equipment was continually monitored from a web console, which gave a view of the entire distribution network and supported troubleshooting communications between sites. An IRC (Internet relay chat) server and an Internet telephony system were also set up as part of the platform; the IRC server allowed two-way chat-style communications from client personal computers at different sites.
2. Technical issues with live distribution services
The 2007 trial showed that MPEG-2 video at 46 Mbit/s enabled clear viewing of details such as individual spangles on the Takarazuka Revue costumes, indicating that resolution was not an issue. However, when the camera panned sideways during line dancing and similar scenes, the image on the screen became very difficult to watch and induced seasick-like feelings in those viewing from the cinema's front rows, so we concluded that conventionally shot TV content is not always suitable for the big screen. Furthermore, although these cinemas had 5.1 surround sound systems, distortion in the stereo audio signals that had been encoded, compressed, and transmitted led to viewers complaining that the sound was thin or no good.
Regarding video quality, quality higher than that of MPEG-2 at 46 Mbit/s is possible, but it generally leads to higher network costs, which prompts content holders to ask whether it is possible, in terms of actual business, to keep costs down without affecting video quality.
To resolve the issue of nauseous feelings induced by camera panning, we have introduced shooting methods that take into account the constraints of large-screen projection by switching shots among multiple cameras while eliminating panning as much as possible. To address the issue of network costs, we have been able to halve the video bandwidth while maintaining the same video quality as MPEG-2 (46 Mbit/s) by using MPEG-4 AVC/H.264 instead.
However, the audio quality problems still needed to be addressed. There were four issues that needed to be resolved: (1) audio quality at the time of recording, (2) audio quality during mixdown, (3) transmission quality suitable for 5.1 surround sound, and (4) an audio environment that can be replayed on 5.1 surround sound systems.
3. Joint research with Kyushu University
With a knowledge base in audio-visual engineering and staging technologies, Kyushu University runs a "Culture Hall Management Engineer Training Program" training unit. The aims are to (1) teach personnel the skills to make and implement plans as part of community measures to promote local arts and culture, while making efforts to promote effective use of local citizen halls and public centers using optical networks; (2) promote regional development; and (3) bridge the information gap by distributing arts and cultural events that are held predominantly in the big cities of Osaka and Tokyo, and thus expand opportunities to use content in local settings.
Since measures to distribute arts and cultural events to local public halls via optical networks match NTT WEST's approaches and ideas for live distribution, and since both NTT WEST and Kyushu University are working to spread live distribution via optical networks, we have embarked upon joint research.
As part of this joint research, NTT WEST and Kyushu University have considered ways to combine knowledge and technologies to address the audio quality problems identified in the 2007 trial and demonstrate solutions and have also considered ways to promote live distribution to public halls by using optical networks.
Specifically, research supervised by Associate Professor Akira Omoto of Kyushu University into the visualization of reflected sound in an enclosed space by means of sound intensity measurement has been conducted at both the sending and receiving ends of the audio signals. By clarifying the characteristics of the acoustic fields at both ends, this research has provided us with new and optimized recording and mixing techniques. By using Kyushu University's knowledge in combination with NTT WEST's optical networks and encoding technology created through NTT's R&D, and with the cooperation of NTT Learning Systems, we were able to achieve mixing optimized for the replay venue. For the system demonstration, we were provided with content from TCA in the same way as in the 2007 trial, and we were also assisted by Kadokawa Cineplex as the receiver of the high-presence live audio (replay venue). We conducted joint research with the participation of TCA and Kadokawa Cineplex.
4. High-presence audio live distribution trial
On September 12, 2010, we transmitted the final performance of the "Natsuki Mizu Goodbye Show" by the Takarazuka Revue Snow Group from the Tokyo Takarazuka Theater to test the high-presence audio distribution system (Fig. 2). Because our aim was to find out whether we could reproduce the original audio from the Tokyo Takarazuka Theater at the replay venue, we used only one receiving site: the Cineplex at Makuhari. The two sites were linked by a 40-Mbit/s network connection in asynchronous transfer mode (ATM). We used the NA5000 as the IP interface. We also used the new audio-rate-oriented adaptive bit-rate video encoder/decoder developed by NTT Network Innovation Laboratories; this codec can control the allocation of bit rate between video and audio in real time, and higher video quality is achieved by making use of the extra bits saved by lossless audio compression. Since any audio quality degradation was to be avoided, we used MPEG-4 Audio Lossless Coding (ALS), to whose standardization NTT Communication Science Laboratories is one of the contributors.
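The idea of giving video whatever bits the lossless audio does not need can be illustrated with a rough budget calculation. The following is only a minimal sketch and not the codec's actual algorithm: the combined budget, the source audio format (6 channels, 48 kHz, 24 bit), and the compression ratios are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def pcm_bitrate_mbps(channels: int, sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of uncompressed PCM audio in Mbit/s."""
    return channels * sample_rate_hz * bits_per_sample / 1_000_000


def video_budget_mbps(total_mbps: float, audio_mbps: float) -> float:
    """Bits left for video once the (variable) lossless audio rate is served first."""
    if audio_mbps > total_mbps:
        raise ValueError("audio alone exceeds the stream budget")
    return total_mbps - audio_mbps


# All figures below are illustrative assumptions, not values from the trial.
TOTAL_MBPS = 27.0                           # hypothetical combined audio + video budget
PCM_MBPS = pcm_bitrate_mbps(6, 48_000, 24)  # assumed 5.1 source: 6 channels, 48 kHz, 24 bit

for ratio in (1.0, 0.7, 0.5):               # assumed lossless compression ratios
    audio = PCM_MBPS * ratio
    video = video_budget_mbps(TOTAL_MBPS, audio)
    print(f"audio {audio:.3f} Mbit/s -> video {video:.3f} Mbit/s")
```

Because a lossless coder's output rate depends on the signal, the audio term changes from moment to moment; serving audio first and handing the remainder to video keeps the audio bit-exact while letting video quality benefit whenever the audio compresses well.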
Moreover, we used a highly efficient live distribution system developed by NTT Network Innovation Laboratories, which served to ensure end-to-end reliability with error correction and IP packet encryption. For picture quality, we chose to use the MPEG-4 AVC/H.264 (average 20 Mbit/s) format after evaluating the 2007 trial results. Furthermore, to compare high audio presence with conventional systems, we used TCA's commercial service to connect the Tokyo Takarazuka Theater with another cinema (a conventional stereo-sound cinema) via the Business Ether-Wide service and feed high-definition video with stereo audio to it. In addition, with the cooperation of NTT Cyber Space Laboratories, we created a questionnaire for viewers and statistically analyzed the results to assess audio quality.
Regarding source audio recording, we sent a total of 13 channels to the mixer: 9 channels of venue audio from the Takarazuka Revue performance and 4 channels of independently added ambient audio. These were mixed for the 5.1 surround sound system at Cineplex Makuhari, and the output was fed to the lossless audio encoding equipment.
This setup enabled us to faithfully reproduce the mixed audio signal at the Cineplex Makuhari venue. A supervisor in the seating at Cineplex Makuhari was able to relay information about the replayed audio back to the mixer at the Tokyo Takarazuka Theater in real time to enable mixing adjustments as required.
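The 13-channel to 5.1 mixdown described above can be pictured as a gain matrix applied to each frame of input samples. The sketch below is a hypothetical illustration only: the routing (venue channels to the front speakers, ambient channels to the surrounds), the gain values, and the function names are assumptions and do not reflect the mix actually used in the trial.

```python
VENUE_CHANNELS = 9      # channels of venue audio (from the text)
AMBIENT_CHANNELS = 4    # channels of added ambient audio (from the text)
OUTPUTS = ["L", "R", "C", "LFE", "Ls", "Rs"]  # 5.1 layout; this ordering is an assumption


def build_gain_matrix(ambient_gain: float) -> list[list[float]]:
    """Return a 6 x 13 gain matrix using a hypothetical routing."""
    n_inputs = VENUE_CHANNELS + AMBIENT_CHANNELS
    matrix = [[0.0] * n_inputs for _ in OUTPUTS]
    # Assumed routing: venue channels are shared out over L, R, and C.
    for ch in range(VENUE_CHANNELS):
        matrix[ch % 3][ch] = 1.0 / 3
    # Assumed routing: ambient channels alternate between Ls and Rs.
    for i in range(AMBIENT_CHANNELS):
        matrix[4 + i % 2][VENUE_CHANNELS + i] = ambient_gain
    return matrix


def mix_frame(inputs: list[float], gains: list[list[float]]) -> list[float]:
    """Mix one frame of 13 input samples down to 6 output samples."""
    if any(len(row) != len(inputs) for row in gains):
        raise ValueError("gain matrix width must match the number of inputs")
    return [sum(g * x for g, x in zip(row, inputs)) for row in gains]


# The ambient gain is the kind of parameter the mixer could adjust
# in response to feedback from the replay venue.
frame = mix_frame([1.0] * 13, build_gain_matrix(ambient_gain=0.5))
print(dict(zip(OUTPUTS, frame)))
```

In this picture, feedback from the supervisor at the replay venue corresponds to changing entries of the matrix while the performance is in progress.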
As a result, viewers in the movie theater were able to experience a similar atmosphere to the source venue. They were naturally compelled to cheer and clap just as if they were really at the Takarazuka Theater, a reaction that was not observed during the 2007 trial. In this way, we were able to successfully overcome the limitations of conventional live network distribution and create a unified feeling between the two venues.
5. Trial evaluation
We surveyed viewers about the high-presence audio replay at Cineplex Makuhari and compared the results with survey results collected from viewers at the conventional stereo-sound cinema.
We received 157 responses from the 275 viewers at Cineplex Makuhari (57.1%) and 103 responses from the 308 viewers at the conventional cinema (33.4%). When we compared these results, we found several noteworthy differences between the two groups (Fig. 3).
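The response rates follow directly from the counts given above. As a quick check, a minimal sketch (the function name is hypothetical):

```python
def response_rate(responses: int, viewers: int) -> float:
    """Return the response rate as a percentage, rounded to one decimal place."""
    if viewers <= 0:
        raise ValueError("viewers must be positive")
    return round(100.0 * responses / viewers, 1)


makuhari = response_rate(157, 275)      # Cineplex Makuhari
conventional = response_rate(103, 308)  # conventional stereo-sound cinema
print(makuhari, conventional)           # 57.1 33.4
```

The response rate at Cineplex Makuhari was considerably higher than at the conventional cinema, which is worth keeping in mind when comparing the two groups.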
In response to the question about the impact of the sound on a scale of 1–7, 66.1% of respondents at Cineplex Makuhari rated the sound in the top 3, i.e., as very strong, strong, or moderately strong, compared with only 55.7% for the stereo-sound cinema. In response to the question, "How strongly did you feel as if you were actually in the Tokyo Takarazuka Theater?" 69.8% of respondents at Cineplex Makuhari selected very strongly, strongly, or moderately strongly, compared with 59.6% for the stereo-sound cinema.
The biggest difference between the two groups was for the question regarding entrance fees. Of the Cineplex Makuhari respondents, 35.3% said that they were very satisfied, satisfied, or moderately satisfied with the entrance fee, and when those who gave a neutral answer are included, 69.6% found the fee acceptable. By contrast, the corresponding figures for the stereo-sound cinema were 14% and 53%. These results indicate that the general level of satisfaction was significantly higher at Cineplex Makuhari.
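To see why the entrance-fee question stands out, the percentage-point gaps between the two venues can be tabulated from the figures quoted in the text. This is a simple arithmetic sketch; the question labels are shorthand introduced here, not the questionnaire's wording.

```python
# (Cineplex Makuhari %, conventional stereo-sound cinema %) as quoted in the text
results = {
    "impact of sound (top 3 of 7)": (66.1, 55.7),
    "felt present in the theater": (69.8, 59.6),
    "satisfied with entrance fee": (35.3, 14.0),
    "satisfied or neutral about fee": (69.6, 53.0),
}

# Gap in percentage points, rounded to one decimal place
gaps = {question: round(a - b, 1) for question, (a, b) in results.items()}
largest = max(gaps, key=gaps.get)

for question, gap in gaps.items():
    print(f"{question}: {gap} points")
print("largest gap:", largest)
```

The two audio-related questions differ by roughly 10 points, whereas satisfaction with the entrance fee differs by about 21 points, which is the largest gap of the four.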
6. Future plans
At NTT WEST, we plan to further analyze these results and establish business models for commercializing audio-visual lossless encoding and transmission and live distribution technologies with the aim of involving even more content holders to popularize live distribution services. Furthermore, we aim to increase business efficiency by using optical networks and increase content security by providing a safe and reliable network while promoting research into optical live distribution services.