Selected Papers: Research Activities in Laboratories of New NTT Fellows

MPEG-4 ALS: Performance, Applications, and Related Standardization Activities

Abstract

MPEG published a new lossless audio coding scheme called MPEG-4 Audio Lossless Coding (ALS) in March 2006. This paper overviews MPEG-4 ALS and its possible applications. In addition, it introduces an audio archiving tool that uses MPEG-4 ALS as the encoding engine. This archiving tool offers excellent compression performance and functionality for handling several audio files as one archived file. Test results show that its compression performance, in terms of compression ratio and required processing time, is much better than that of ZIP when the input data is audio files. The tool's application format has been proposed to MPEG-A and is now being discussed as one of the multimedia application formats (MAFs) under the name MPEG-A Professional Archival MAF. Other recent standardization activities are also briefly mentioned.
1. Introduction

With the publication of a new lossless audio coding scheme called MPEG-4 Audio Lossless Coding (ALS) in March 2006 [1], the MPEG standard now has efficient lossless encoding tools for audio data files. There is a strong demand for music in formats that provide superior audio quality, such as CD (compact disc) quality. If the audio signal is recorded with high resolution, compressing the data with a lossy encoding scheme defeats the purpose because some information at higher frequencies will be lost. A lossless encoding scheme must be used to compress high-resolution audio data. Such a lossless compression format can be used for Internet distribution of audio files (e.g., streaming or downloading of music from online music stores), as an audio format for portable music players, and for high-resolution disc formats such as Blu-ray and HD-DVD.

On the other hand, master tapes and original pressings of historical analog recordings are deteriorating, and it is becoming impossible to listen to them in their original form because very few of the players are still available and some no longer exist. Analog recorded audio signals should be digitized with the highest possible sampling rate and the highest possible quantization bit depth. Such signals should then be compressed by a standardized lossless compression scheme.

In addition, in recent years, it has become popular to store professional-level recordings as a set of files that contains not only audio tracks but also meta-information files and other non-audio files, such as plug-in binaries, notes, and picture images, in a hierarchical folder structure. Such a set of files is called a recording project. With the increasing popularity of high-resolution audio content, the total size of the intermediate files for an audio recording project is getting larger. For example, audio data sampled at 192 kHz with 24-bit resolution is four times the size of data sampled at 48 kHz with the same resolution.
The dependence of audio data size for three minutes of music on sampling frequency and bit resolution is shown in Fig. 1. A standardized professional archival format is strongly demanded for the storage, delivery, and preservation of recorded digital contents.
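The relationship between sampling frequency, bit resolution, and data size can be checked with simple arithmetic. The following is a minimal sketch, assuming two-channel (stereo) linear PCM without file headers and a bit depth that is a multiple of 8; the function name `pcm_size_bytes` and the choice of stereo are illustrative assumptions, not details taken from Fig. 1.

```python
def pcm_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size in bytes of raw (headerless) linear PCM audio.

    Assumes bits_per_sample is a multiple of 8 (e.g., 16 or 24).
    """
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds


THREE_MINUTES = 180  # seconds
CHANNELS = 2         # assumption: stereo

for rate in (48_000, 96_000, 192_000):
    for bits in (16, 24):
        size = pcm_size_bytes(rate, bits, CHANNELS, THREE_MINUTES)
        # Report in megabytes, taking 1 MB = 10**6 bytes.
        print(f"{rate // 1000:>3} kHz, {bits}-bit: {size / 1e6:7.1f} MB")
```

Because the size is linear in the sampling frequency, going from 48 kHz to 192 kHz at a fixed bit depth multiplies the size by four, which is the ratio quoted in the introduction.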
For archival applications, users often want to compress audio data files together with the whole folder structure. GNU Zip [2] or WinZip [3] is widely used for this purpose, but, unfortunately, their compression performance is not high enough for raw audio data files. Here, as a way to handle several audio files as one archived file, we describe an audio archiving tool that uses MPEG-4 ALS as the encoding engine. Section 2 overviews MPEG-4 ALS and its possible applications. Section 3 introduces our MPEG-4 ALS-based archiving tool, describes its possible applications, and presents experimental test results. Section 4 covers MPEG's latest standardization activities concerning the MPEG-A Professional Archival MAF, and section 5 mentions other standardization activities. Finally, section 6 makes some concluding remarks.

2. MPEG-4 ALS

2.1 Overview

MPEG-4 ALS is an extension of the MPEG-4 audio coding family for lossless compression of audio data [1], [4]–[14]. The ALS core codec is based on forward-adaptive linear prediction, which offers remarkable compression performance with low complexity. Additional features include long-term prediction, multichannel coding, and compression of floating-point audio material [15], [16]. ALS also offers good flexibility in terms of the compression-complexity tradeoff, ranging from very low-complexity implementations to maximum compression modes, so it is adaptable to different requirements. Its many other features include:

- General support for virtually any uncompressed digital audio format, including the Sony Wave64 file format and Broadcast Wave Format (BWF) [19]–[21]
- Support for linear PCM (pulse code modulation) resolutions of up to 32 bits at arbitrary sampling rates
- Multichannel/multitrack support for up to 65,536 channels, including 5.1-, 7.1-, and 22.2-channel surround sound
- Support for audio data in the IEEE 754 32-bit floating-point audio format
- Quick random access to encoded data
- Support for the MP4 file format, which allows multiplexing and synchronization with video [22]

Input formats supported by MPEG-4 ALS are listed in Table 1.
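The principle behind prediction-based lossless coding, as used in the ALS core codec, can be illustrated with a small example. The sketch below is a simplification under stated assumptions: it uses a fixed second-order integer predictor, whereas ALS estimates predictor coefficients adaptively for each block and entropy-codes the residual. The function names `encode` and `decode` are hypothetical and do not correspond to any MPEG-4 ALS software interface.

```python
def encode(samples):
    """Turn integer PCM samples into prediction residuals.

    Assumption: a fixed predictor 2*x[n-1] - x[n-2] (straight-line
    extrapolation), with the history initialized to zero.
    """
    residuals = []
    prev1 = prev2 = 0
    for x in samples:
        prediction = 2 * prev1 - prev2
        residuals.append(x - prediction)
        prev1, prev2 = x, prev1
    return residuals


def decode(residuals):
    """Rebuild the original samples exactly from the residuals."""
    samples = []
    prev1 = prev2 = 0
    for e in residuals:
        x = e + (2 * prev1 - prev2)  # same prediction as the encoder
        samples.append(x)
        prev1, prev2 = x, prev1
    return samples


pcm = [0, 3, 6, 9, 12, 14, 16, 17]
res = encode(pcm)
print(res)                 # residuals are mostly small values
print(decode(res) == pcm)  # reconstruction is bit-exact
```

Since the decoder forms the same prediction from already-reconstructed samples, adding the residual restores every sample exactly. Compression comes from the residuals being smaller in magnitude than the samples, so an entropy coder can represent them with fewer bits.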
2.2 Possible applications

Lossless audio coding in general and MPEG-4 ALS in particular have many applications, at both the professional and consumer levels. These include:

- Internet distribution of audio files (streaming, online music stores, and downloading)
- High-resolution disc formats
- Portable music players
- Archival systems (for broadcasting, studios, record labels, and digital transfer)
- Studio operations (for storage, collaborative working, digital backup, and digital transfer)

In these applications, MPEG-4 ALS is used as an encoding engine for the lossless compression of the audio data. For archival systems and studio operations, the archiving tool based on MPEG-4 ALS described in section 3 will be more suitable.

2.3 Performance

The specifications for the coded bitstream of MPEG-4 ALS and its decoding scheme have been established, but there is still room to improve compression performance and encoding and decoding times, provided that decoding compatibility with the standard is maintained. We have proposed several fast encoding algorithms for MPEG-4 ALS [23], [24] and developed an efficient encoding and decoding tool. This improved encoding and decoding software developed by NTT, called “MPEG-4 ALS fast”, is very efficient in terms of both processing speed and compression ratio. It has been shown to encode and decode much faster than the publicly available reference software for MPEG-4 ALS and faster than the other lossless coding software tested.

In tests, two implementations of an MPEG-4 ALS codec, MPEG-4 ALS RM18 [25] and MPEG-4 ALS fast, were compared with three of the most popular programs for lossless audio compression: FLAC (version 1.1.2) [26], Monkey's Audio (MAC 4.01) [27], and OptimFROG (v4.520b1) [28]. The tests were conducted on a 2.39-GHz AMD Opteron processor 250 with 2 GB of RAM (random access memory).
The data for the tests was taken from the standard set of audio sequences for MPEG-4 lossless coding originally donated by Panasonic Corporation. We used a total of 51 stereo waveform files: files with a sampling rate of 48 kHz and a resolution of 16 bits, and files with sampling rates of 48, 96, and 192 kHz and a resolution of 24 bits. The duration of each waveform was 30 seconds. The total size of the 51 files was 682,562,430 bytes. The results for the tested encoders and decoders are shown in Figs. 2 and 3. The vertical axis shows the compression ratio, defined as

\[ \text{compression ratio} = \frac{\text{compressed file size}}{\text{original file size}} \times 100\ [\%] \]
The results show that MPEG-4 ALS RM18 and MPEG-4 ALS fast offer a better balance of compression performance and encoding/decoding speed than the other lossless codecs, with MPEG-4 ALS fast offering the best performance. Its compression ratio was 7 to 8% better than that of FLAC even when its encoding/decoding speed was twice as fast. The results also show that the CPU (central processing unit) load for realtime encoding and decoding using MPEG-4 ALS fast was approximately 1 to 2% of the capacity of the Opteron processor 250 when the fastest operating mode was used.

3. Archiving tool based on MPEG-4 ALS

3.1 Possible applications

(1) Digital backup and direct delivery of recorded music projects

Nashville members of the P&E Wing of The Recording Academy have formed a Delivery Specifications Committee, which has created the Delivery Recommendations for Master Recordings document [29]. In this document, the committee states that it expects direct delivery (via a secure connection on the Internet) to be commonplace in the future and recommends uploading files to very-large-scale digital libraries. The preferred delivery of a recorded music project would include Broadcast Wave Files (BWF) [20]–[22] of every multitrack and two-track element, without processing or automation. All of the audio tracks should be flattened, which means converting small recorded segments into a contiguous sound file padded with silence between the segments, and migrated to the BWF format with a maximum of one channel per BWF. In addition to BWFs, digitized versions of the usual documentation (tracking sheets, engineer's notes, setup notes, sketches of microphone placement, recording map documents, lyrics, charts, orchestral arrangements, and parts-and-mix documentation) and any other data pertinent to the recording project should be included in the delivery files. Folder-structure-based archiving is suitable for the direct delivery of a recorded music project.
It is natural that all files related to the recorded music project should be kept in an appropriate folder. The files in the folder should be compressed and archived together in a single file (see Fig. 4). This would offer a high degree of safety because it can prevent users from miscopying a file during the delivery process. In addition, when uploading or downloading a project by FTP (file transfer protocol), it is much easier to transmit a single archive file than it is to transmit many individual BWF files (see Fig. 5).
(2) Compression and archiving of intermediate data generated by sound editing tools

Many professional sound editing tools, such as Pro Tools from Digidesign [30] or Nuendo from Steinberg [31], keep work files in folders. Raw waveform data from separate tracks is stored in separate files with a specific file format, such as .wav or .aiff, and those files are kept in one or more folders. The set of files in the folder for a song, including the waveform files of all tracks, is called a project. A project contains the project information file and individual audio track files. During editing operations, a compression and archiving tool that can compress all files in a folder, together with the folder structure, into a single file would be very advantageous. Users could keep a snapshot of a version of the edited files in an archive file, so that they could roll back to a previous version of the edits if desired. Sometimes, a target file may be a non-audio data file or may already be encoded with a lossy compression tool, such as MP3 or AAC. In that case, the file should simply be added to the archive file as is.

(3) Preservation/archiving

Lossless compression of files is becoming very popular because it reduces the demand on storage media for bitwise-exact copies of digitized masters (Fig. 6). All files related to the content are archived together in a single archive. The relationships among the files may sometimes be tight, but sometimes not. Folder-structure-based archiving is more suitable for archiving files of this kind.
(4) Packaging file format for non-audio waveforms

In addition to audio data, MPEG-4 ALS can also efficiently compress non-audio multichannel signals, such as biomedical data (e.g., electroencephalography (EEG) or magnetoencephalography (MEG)) and seismic data [15], [17]. For example, 512-channel MEG data can be compressed to about 15 to 40% of its original size. This archival application would be used for packaging such non-audio waveform data together with other related information.

3.2 Overview of the encoder and decoder for the archiving tool

Overviews of the encoder and decoder for our audio archiving tool are shown in Figs. 7 and 8. If the target file contains PCM audio data, it is encoded with the MPEG-4 ALS encoder for lossless compression. Audio data files, such as .wav and .aiff files, are compressed with MPEG-4 ALS, and the encoded bitstreams are added to the archive file with file attributes, such as file names, read/write attributes, and folder structure information. If the target file is an audio file that has already been encoded with a lossy compression tool, such as AAC (advanced audio coding), or if it is a non-audio file, it is simply added to the archive file as is without any further compression.
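The per-file decision described above can be sketched as follows. This is an illustrative outline only, not the tool's actual implementation: the function name `plan_archive`, the entry fields, and the use of file extensions to detect PCM audio are all assumptions. A real encoder would need to inspect file contents, since a .wav container can also hold non-PCM data. No ALS encoding is performed here; the sketch only records which path each file would take.

```python
import os
from pathlib import Path

# Assumption: files with these extensions are treated as PCM audio.
PCM_EXTENSIONS = {".wav", ".aiff", ".aif"}


def plan_archive(root):
    """List every file under root with its attributes and chosen action.

    "als"   -> would be losslessly compressed with an MPEG-4 ALS encoder
    "store" -> would be added to the archive as is (lossy audio, non-audio)
    """
    root = Path(root)
    plan = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        is_pcm = path.suffix.lower() in PCM_EXTENSIONS
        plan.append({
            # Relative path keeps the folder structure inside the archive.
            "name": path.relative_to(root).as_posix(),
            "size": path.stat().st_size,
            "writable": os.access(path, os.W_OK),
            "action": "als" if is_pcm else "store",
        })
    return plan
```

Calling `plan_archive("my_project")` on a hypothetical project folder would return one entry per file, with relative paths such as `tracks/vocal.wav` preserving the folder hierarchy so that a decoder could restore it.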
3.3 Experimental results

The compression performance of our archiving tool was tested with a set of files totaling 242,805,050 bytes. A folder containing the files was compressed and archived with our archiving tool and with WinZip. The folder contained 82 audio files, 40 image files, and two project information files (binary files) of the audio editing tool. The test was conducted on a personal computer with a Pentium M (1.6 GHz) CPU. The encoding and decoding times of our tool and those of WinZip for compressing and extracting the song files are shown in Fig. 9. The vertical axis shows the compression ratio, defined as

\[ \text{compression ratio} = \frac{\text{compressed file size}}{\text{original file size}} \times 100\ [\%] \]
4. Standardization of MPEG-A Professional Archival MAF

4.1 MPEG-A: MAFs

MPEG's multimedia application formats (MAFs) [32] provide a framework for integrating elements from several MPEG standards into a single specification that is suitable for specific but widely usable applications. For example, MAFs specify how to combine metadata with timed media information for a presentation in a well-defined format that facilitates interchange, management, editing, and presentation of the media. The presentation may be 'local' to the system or may be accessible via a network or other stream delivery mechanism. Selected MAFs are expected to become parts of the ISO/IEC 23000 specification. The 23000 series is also called MPEG-A (where “A” stands for application).

4.2 Professional Archival MAF

To archive files containing audio tracks along with meta-information files and picture images in a hierarchical folder structure, we have designed an archiving file format that makes use of MPEG-4 ALS as the encoding engine and proposed it to the standardization committee. We are now working to develop the proposed format into an MAF standard called the MPEG-A Professional Archival MAF. This is intended to be a standardized solution that enables sustainability, accessibility, and playability of digital content, making full use of MPEG tools for maximum interoperability. As such, it aims to preserve our digital heritage and also provide for improved preservation of pre-digital content.

4.3 Technical requirements

In order to generalize the requirements and expand the scope to include audio, video, picture images, and other multimedia data, MPEG issued a call for requirements for the Professional Archival MAF [33]. Examples of packaging and unpackaging tools for the Professional Archival MAF based on the current specification are shown in Fig. 10.
5. Other standardization activities related to MPEG-4 ALS

Several ongoing standardization activities related to MPEG-4 ALS are seeking to enhance its utility and appeal. An overview of these activities is shown in Fig. 11.
At IEC TC100, we proposed a new work item for IEC 61937, called “Digital audio - Interface for nonlinear PCM-encoded audio bitstreams applying IEC 60958 Part 10: Nonlinear PCM bitstreams according to the MPEG-4 ALS format”. The purpose of this specification is to transmit MPEG-4 ALS encoded bitstreams via the IEC 60958 interface, which is well known as the S/PDIF (Sony/Philips digital interconnect format) digital interface specification. There is a strong need for the capability to store, preserve, and transmit high-resolution audio signals in a cost-efficient way using a standardized lossless compression scheme. An interface standard for the transmission of MPEG-4 ALS encoded bitstreams among various kinds of professional and high-end consumer equipment is expected to be developed.

6. Conclusion

We have shown several possible applications of MPEG-4 ALS and described the performance of our latest encoder and decoder implementation for it. We described a new tool based on MPEG-4 ALS for archiving audio data. Test results show that its compression performance is much better than that of ZIP. The tool's application format has been proposed to MPEG-A and is now being discussed as one of the MAFs under development. MPEG-4 ALS and its related standards are expected to be very useful tools in both professional and consumer applications.

References