New Generation of Codecs

For many years, the codecs developed for Voice over IP networks used static sampling and transmission rates. For instance, G.711 (A-law or µ-law), also known as PCM (pulse code modulation), has a bit rate of 64 kbps and generates a fixed payload of 160 bytes every 20 milliseconds. This codec has been the industry's leading choice for high-quality voice transmission and is used in high-speed network environments.

Over wide area network links, however, the PCM codec's bit rate consumes too much critical bandwidth, so a different type of codec was developed: G.729 or G.729a. This codec has a bit rate of 8 kbps and generates only 20 bytes of payload every 20 milliseconds. However, if these voice packets start to suffer excessive delay or drops, the quality suffers greatly.
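The payload figures for both codecs follow directly from the bit rate and the 20 millisecond packetization interval. Here is a minimal sketch of that arithmetic in Python. The function names are illustrative only, and the on-the-wire estimate assumes 40 bytes of uncompressed IPv4, UDP, and RTP headers per packet (an assumption not stated above, with layer-2 framing left out):

```python
def payload_bytes(bitrate_bps: int, frame_ms: int) -> int:
    """Bytes of codec payload produced in one packetization interval."""
    return bitrate_bps * frame_ms // (1000 * 8)


def wire_kbps(payload: int, frame_ms: int, header_bytes: int = 40) -> float:
    """Approximate IP-level bandwidth in kbps for one voice stream.

    header_bytes=40 assumes IPv4 (20) + UDP (8) + RTP (12), uncompressed.
    """
    bits_per_packet = (payload + header_bytes) * 8
    return bits_per_packet / frame_ms  # bits per millisecond equals kbps


for name, bitrate in [("G.711", 64_000), ("G.729", 8_000)]:
    size = payload_bytes(bitrate, 20)
    print(name, size, "bytes per packet,", wire_kbps(size, 20), "kbps on the wire")
```

Under these assumptions the headers outweigh the G.729 payload, so the saving on the wire is smaller than the 8:1 ratio of the codec bit rates suggests.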

Codec designers looking at the network congestion issue have now come up with a different approach altogether. Instead of running at a fixed rate, they came up with a codec that typically runs at the highest quality that current network conditions allow, but when voice transmission starts to experience undue delay or drops, it dynamically downgrades itself to a lower transmission rate. In other words, the codec auto-tunes itself to actual network load conditions, making it very “elastic”.
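To make the idea concrete, here is a rough sketch of such a rate controller in Python. It is not the algorithm of any particular codec: the bit-rate ladder, the loss and delay thresholds, and the name `next_rate` are all hypothetical values chosen for illustration.

```python
# Hypothetical bit-rate ladder in bits per second, lowest to highest.
RATES = [8_000, 12_000, 16_000, 24_000, 32_000]


def next_rate(current: int, loss_pct: float, delay_ms: float) -> int:
    """Pick the bit rate for the next interval from measured network conditions."""
    i = RATES.index(current)
    if loss_pct > 5.0 or delay_ms > 150.0:
        # Congestion detected (assumed thresholds): step down one rung.
        i = max(i - 1, 0)
    elif loss_pct < 1.0 and delay_ms < 100.0:
        # Network looks healthy (assumed thresholds): step up one rung.
        i = min(i + 1, len(RATES) - 1)
    return RATES[i]


# Start at the highest quality and react to a series of measurements.
rate = RATES[-1]
for loss, delay in [(0.0, 40), (8.0, 200), (6.0, 180), (0.5, 50)]:
    rate = next_rate(rate, loss, delay)
    print(f"loss={loss}% delay={delay}ms -> {rate} bps")
```

Stepping one rung at a time keeps the quality from swinging wildly on a single bad measurement. A real codec would likely smooth its measurements over a longer window as well.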

There are a couple of codecs that take this approach today and are rapidly gaining popularity. The first that comes to mind is RTAudio, a proprietary codec that Microsoft has been developing for over six years. It is incorporated into many Microsoft product lines, most notably Microsoft Office Communications Server (OCS). This codec works so well that Microsoft has claimed it delivers perfect clarity on networks with or without Quality of Service (QoS).

As a skeptic, I decided to put this codec to the test in a lab environment where I saturated a hub (not a switch) with generated traffic and loaded all Ethernet interfaces to their maximum. The G.711 sample was totally unrecognizable and G.729a was extremely rough, but RTAudio was as clear as Sprint’s pin drop commercial. This made me a believer that these new codecs can effectively maintain the highest quality under the most severe network conditions.

[Figure: High-level overview of the RTAudio encoder by Microsoft]

RTAudio uses two bands, narrowband at 8 kHz and wideband at 16 kHz, which result in packet sizes of 22 and 45 bytes respectively at a 20 millisecond frame interval. It is probably for this reason that Microsoft does not currently incorporate any CAC (call admission control) mechanisms within its OCS deployments.
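For a sense of scale, those packet sizes can be converted back into bit rates. The sketch below treats the quoted 22 and 45 bytes as codec payload per 20 millisecond frame, which is an assumption, since the figures may or may not include headers. The function name is illustrative.

```python
def payload_kbps(packet_bytes: int, frame_ms: int) -> float:
    """Codec payload rate in kbps (bits per millisecond equals kbps)."""
    return packet_bytes * 8 / frame_ms


print("RTAudio narrowband:", payload_kbps(22, 20), "kbps")
print("RTAudio wideband:", payload_kbps(45, 20), "kbps")
print("G.711 for comparison:", payload_kbps(160, 20), "kbps")
```

On this reading, even the wideband mode needs well under a third of the payload bandwidth of G.711.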

Another project taking the same type of approach is Speex. Speex is not a company but an open source codec, developed by a community working on the next generation of codecs, and its approach is very close to Microsoft’s. Unlike Microsoft’s codec, however, Speex operates in narrowband, wideband, and ultra-wideband modes, potentially giving greater voice quality. Additionally, like Microsoft, its developers have provisioned the following elements within the codec:

  • Narrow-band (8 kHz), wideband (16 kHz), and ultra-wideband (32 kHz) compression in the same bitstream
  • Intensity stereo encoding
  • Packet loss concealment
  • Variable bitrate operation (VBR)
  • Voice Activity Detection (VAD)
  • Discontinuous Transmission (DTX)
  • Fixed-point port
  • Acoustic echo canceller
  • Noise suppression

[Figure: Comparison of Speex and the leading codecs used in VoIP environments]

Both Microsoft’s RTAudio and Speex use a codec architecture based on CELP (code-excited linear prediction). In layman’s terms, the encoder keeps a code book of human speech patterns, compares the actual speech against it, looks up the closest code book value, and transmits that value to the receiver. Additionally, it compares the actual voice sample to the code book version and sends a difference value as well. This is a very crude explanation of a very complex process.
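The code book lookup described above can be illustrated with a toy example in Python. This is only a sketch of the lookup-plus-difference idea, not real CELP: it leaves out the linear prediction filter and perceptual weighting that actual codecs rely on, and the code book values and function names are made up for illustration.

```python
# Toy code book: each entry is a short waveform segment (hypothetical values).
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.5, -0.5, -1.0],
    [-1.0, -0.5, 0.5, 1.0],
    [0.5, 1.0, 0.5, 0.0],
]


def encode(frame):
    """Return (index of the closest code book entry, difference from it)."""
    def squared_error(entry):
        return sum((x - e) ** 2 for x, e in zip(frame, entry))

    index = min(range(len(CODEBOOK)), key=lambda i: squared_error(CODEBOOK[i]))
    difference = [x - e for x, e in zip(frame, CODEBOOK[index])]
    return index, difference


def decode(index, difference):
    """Rebuild the frame from the code book entry plus the difference."""
    return [e + d for e, d in zip(CODEBOOK[index], difference)]


frame = [0.9, 0.4, -0.6, -0.8]
index, difference = encode(frame)
print("code book index sent:", index)
print("reconstructed frame:", decode(index, difference))
```

The saving comes from sending a small index instead of the raw samples. A real codec would also quantize the difference coarsely rather than send it exactly as this sketch does.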

These are exciting times: not only are Voice over IP application sets being built to streamline corporate communications, but the actual transmission of real-time voice traffic is itself being improved.

References:
Overview of the Microsoft RTAudio Speech Codec (.doc file)

Author: Joe Parlas
