Auto-calibrating Echo Canceller Software for VoIP Applications on Smartphones
We have developed auto-calibrating echo canceller software for voice over Internet protocol (VoIP) applications on smartphones. Because the audio properties of smartphones depend on the model, the speech quality of a VoIP application can degrade, especially during hands-free conversations. We extended the calibration ability of our software to handle these variations in smartphone audio properties. As a result, our software exhibited better performance than most conventional software.
Keywords: echo canceller, smartphone, VoIP application
1. Introduction
As smartphones become more widespread, communication is becoming more flexible than was possible with ordinary fixed or mobile phones or personal computers. When making a voice call on a smartphone, some users choose a voice over Internet protocol (VoIP) application instead of the default voice call function embedded in the smartphone. However, to provide good speech quality, the different audio properties of the various smartphone models must be taken into consideration. The audio processing functions for the embedded voice call function can be tuned to the properties of each target smartphone model. In contrast, it is difficult to provide a VoIP application with such well-tuned audio processing functions because the application must be able to run on any model. Therefore, the speech quality of a VoIP application can degrade, especially during hands-free calls.
We developed auto-calibrating echo canceller software that can be implemented on VoIP applications. The following three functions are included in the software.
(1) Acoustic echo cancellation (AEC)
(2) Noise reduction (NR)
(3) Automatic gain control (AGC)
These three functions have internal parameters that are automatically calibrated according to the audio properties of each smartphone model, which means that VoIP applications with our software can support most smartphone models.
2. Problems with audio processing on VoIP applications for smartphones
When we make a voice call by using a VoIP application on a smartphone, some acoustic problems may disturb our communication with the other party. Typical problems are acoustic echo, noise, and unbalanced sound levels (Fig. 1).
Acoustic echo is caused by acoustic coupling between the loudspeaker and microphone, both of which are built into smartphones. The sound reproduced from the built-in loudspeaker is conducted to the built-in microphone. Thus, when the far-end talker’s voice is reproduced from the loudspeaker, it is picked up by the microphone and may be sent back to the talker through the smartphone. This feedback sound is called acoustic echo.
The built-in microphone picks up not only the talker’s voice and the above-mentioned acoustic echo, but also background noise. The noise level depends on the environment where the talker is located and on the acoustic and electrical properties of the microphone.
Even though a talker may speak into the microphones of various smartphone models at the same volume and from the same distance, the transmitted sound level sometimes differs depending on the model. This is because the microphone sensitivity and amplifier gain of each model are not always designed to be the same. Also, some smartphones have a built-in level-control device such as a compressor or limiter for processing the microphone signal. The loudness of the loudspeaker also depends on the model.
3. Auto-calibrating echo canceller software
To solve the acoustic echo, noise, and unbalanced sound level problems on any smartphone model, we developed automatically calibrating echo canceller software. A functional block diagram of the software is shown in Fig. 2. The software has three main functions, as follows.
3.1 Acoustic echo cancellation
The acoustic echo can be cancelled if the acoustic echo signal is properly predicted by the AEC function. The accuracy of the predicted echo signal depends on how the acoustic echo path is modeled and how the model parameters are adaptively estimated. Most conventional AEC functions adaptively estimate only the room reverberation part of the echo path. This is not sufficient, however, for voice calls carried out using a smartphone. Therefore, we extended the parts of the echo path that can be adaptively estimated by taking into account distortion in the small built-in loudspeaker, variation in the sound buffering delay, and abrupt level changes caused by the built-in level-control function. Because the new AEC function can track a wider range of variations in the echo path, it can predict the echo signal more accurately.
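The idea of predicting the echo and subtracting it can be illustrated with a textbook normalized least-mean-squares (NLMS) adaptive filter. This is only a minimal sketch of the conventional approach, which models a linear echo path; it does not reproduce the extensions for loudspeaker distortion, buffering delay, or level changes described above. The function names, the tap count, the step size `mu`, and the three-tap echo path are all assumptions made for illustration.

```python
import random


def simulate_echo(far_end, echo_path):
    """Convolve the far-end signal with a hypothetical linear echo path."""
    return [
        sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(far_end))
    ]


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Return (residual signal, estimated echo path) using plain NLMS."""
    w = [0.0] * taps   # adaptive estimate of the echo path
    x = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for ref, d in zip(far_end, mic):
        x = [ref] + x[:-1]
        predicted_echo = sum(wi * xi for wi, xi in zip(w, x))
        e = d - predicted_echo              # what is sent to the far end
        norm = eps + sum(xi * xi for xi in x)
        # Normalized update: step size is scaled by the input power.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual, w


if __name__ == "__main__":
    random.seed(0)
    far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    mic = simulate_echo(far, [0.6, 0.3, -0.1])  # single-talk: echo only
    residual, w = nlms_echo_canceller(far, mic)
    print("estimated path:", [round(v, 3) for v in w])
```

In this single-talk simulation the microphone signal contains only echo, so the filter coefficients converge toward the simulated path. A real smartphone echo path is time-varying and partly nonlinear, which is the gap the extended estimation is meant to cover.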
3.2 Noise reduction
The NR function transmits a clear voice signal to the far-end listener by suppressing background noise on the basis of the noise’s stationarity. The basic algorithm for the NR function is similar to that used for the supporting technologies for Hikari Living, which provides visual communication via a television set.
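One common way to exploit stationarity is to track a slowly varying noise floor in each frequency band and attenuate bands whose power stays close to that floor. The sketch below illustrates that general idea only; it is not the algorithm used in the software, and the function name, the `rise` factor, the minimum gain, and the assumption that the first frame contains noise only are all hypothetical. The input is assumed to be per-band power values that have already been computed for each frame.

```python
def noise_suppression_gains(frames, rise=1.05, min_gain=0.1):
    """Return per-frame, per-band gains from a list of band-power frames."""
    noise = list(frames[0])  # assumption: the first frame contains noise only
    all_gains = []
    for frame in frames:
        gains = []
        for band, power in enumerate(frame):
            # Stationary noise changes slowly: follow drops immediately,
            # but let the estimate rise by at most `rise` per frame.
            noise[band] = min(power, noise[band] * rise)
            speech = max(power - noise[band], 0.0)
            gain = speech / power if power > 0.0 else 0.0
            gains.append(max(min_gain, gain))  # keep a floor to limit artifacts
        all_gains.append(gains)
    return all_gains
```

Bands that carry only steady noise receive the minimum gain, while a band whose power jumps well above the tracked floor, as it would during speech, is passed almost unchanged.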
3.3 Automatic gain control
The AGC function adjusts both the transmitted and received sound levels so that they fall within an adequate range. Controlling both levels makes it possible to automatically amplify or reduce the sound level by up to 12 dB.
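The 12 dB limit can be pictured as a clamp on the gain that would otherwise be needed to reach a target level. The following is a minimal sketch under stated assumptions: the RMS-based level measure, the `target_rms` value, and the function names are illustrative and not taken from the software; only the 12 dB bound comes from the text.

```python
import math

MAX_GAIN_DB = 12.0  # adjustment limit stated in the text


def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def agc_gain_db(frame, target_rms=0.1, max_gain_db=MAX_GAIN_DB, floor=1e-9):
    """Gain in dB that moves the frame toward target_rms, clamped to +/-12 dB."""
    if not frame:
        return 0.0
    level = max(rms(frame), floor)  # avoid log of zero on silent frames
    wanted_db = 20.0 * math.log10(target_rms / level)
    return max(-max_gain_db, min(max_gain_db, wanted_db))


def apply_agc(frame, target_rms=0.1):
    """Scale the frame by the clamped gain."""
    gain = 10.0 ** (agc_gain_db(frame, target_rms) / 20.0)
    return [gain * x for x in frame]
```

A frame that is half the target level is raised by about 6 dB, whereas a frame that would need 40 dB of amplification is raised by only 12 dB. A practical AGC would also smooth the gain over time to avoid audible pumping.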
4. Performance evaluation
The performance of the developed software was evaluated on five different smartphone models. In a single-talk case, in which no one speaks into the near-end microphone and the far-end talker’s voice is reproduced from the near-end loudspeaker, the acoustic echo was reduced by more than 40 dB for all tested models. Even in the case of double-talk, in which both talkers spoke simultaneously, the acoustic echo was reduced by more than 20 dB for all tested models. With the conventional software, the acoustic echo was reduced by only 10 dB in the double-talk case for a certain model.
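Echo reduction figures such as these are usually expressed as the ratio of echo power before and after cancellation, in decibels. The sketch below shows that calculation; the function names are hypothetical, and the measurement procedure actually used in the evaluation is not described in this article.

```python
import math


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def echo_reduction_db(echo_before, echo_after, floor=1e-12):
    """Echo reduction in dB: 10 * log10(power before / power after)."""
    before = max(power(echo_before), floor)
    after = max(power(echo_after), floor)  # floor avoids division by zero
    return 10.0 * math.log10(before / after)
```

On this scale, a reduction of 40 dB means the residual echo power is 1/10,000 of the original, and 20 dB means it is 1/100.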
The speech quality of the near-end talker during double-talk was also evaluated using the perceptual evaluation of speech quality (PESQ), which is described in ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) P.862. The PESQ score for our software was 0.45 higher on average across the five models than that for the conventional software, and up to 1.15 higher for a certain model.
5. Conclusion
We developed auto-calibrating echo canceller software for VoIP applications on smartphones. The new software exhibited better performance than the conventional software because its extended calibration capability enables it to handle variations in smartphone audio properties. Although we focused on implementing the new software in VoIP applications, it can also be implemented in the embedded voice call function and exhibit good performance there.