
Comparing AAC, MP3 and TwinVQ Lossy Compression of Audio

Robin Whittle.  Melbourne, Australia.  rw@firstpr.com.au  Last major update 13 September 1999.  (But important links added at the front and some other minor changes to links to other sites since then - 22 March 2001.)

Investigating the quality of lossy algorithms: Advanced Audio Coding (AAC), MPEG Audio Layer 3 (MP3) and Yamaha's SoundVQ, an implementation of TwinVQ.
 


 
 
14 - 19 December 2000  Please note:

There is a highly significant listening test report from the EBU in June 2000 on a variety of algorithms, including AAC and MP3. http://www.ebu.ch/trev_dolby_frm.html  Proper listening tests are very difficult and expensive to conduct.  I recommend you read this report in its entirety before bothering too much with what I wrote below, in December 1998.

The report and a separate file with the results in greater graphic detail are both .PDF files.  Current Acrobat plugins are a menace in terms of not caching the file when re-viewing it or printing it, and are often too dumb to save to disk with the original file name.  Here are the URLs of the main report and two sub-reports which contain graphs in a larger format.  If you shift click on them, you should be able to save them to disk and read them at your leisure.  They are about 1.3 Megs in total.

The EBU report tests the following codecs:
  • Microsoft Windows Media 4.
  • AAC - implementation by FhG-IIS.
  • MP3 - or close to it, by Opticom.
  • Q-Design Music Codec 2 - prototype version of that for Quicktime.
  • Real Networks 5.0.
  • Real Networks G2. Newer, widely used system based on "DolbyNet".
  • Yamaha Sound VQ.
These were tested at:
  • 16 kbps mono.  Q-Design gets special mention for music, but not for speech.
  • 20 kbps stereo. Lower subjective results than 16 kbps mono. Ditto the Q-Design special mention. 
  • 32 kbps stereo.  AAC leads.
  • 48 kbps stereo.  AAC leads with MP3 close behind.  Windows Media gets special mention for a folk music test for being indistinguishable from the reference.  Q-Design is not much better than at 20 kbps.
  • 64 kbps stereo.  AAC wins by a country mile averaging 80 points.  At this data rate, AAC was the only codec which evaluated in the "excellent" range for all items tested.
This report also discusses the codecs specifically.  The Microsoft and Q-Design codecs show highly variable results on different test material at 48 and 16 kbps respectively.

While the report does not give the complete breakdown of results, by codec, by test item, my interpretation of this is:

  1. Forget TwinVQ.
  2. The Windows and Q-Design codecs were very fussy about what material they encoded.  With some items they were better and others much worse.  Q-Design shows no significant improvement as the data rate increases.
  3. Real Audio G2 is solid at all rates, except 20kbps stereo where Real Audio 5 is better. G2 rates a fraction better than AAC at 16 kbps mono.
  4. MP3 trails slightly behind AAC as the data rate increases, except at 64 kbps, where AAC is very significantly better than both MP3 and Real Audio G2, which have about the same score.
It's horses for courses!

Unfortunately, while AAC is widely regarded as being better than MP3 (as good at 96 kbps as MP3 at 128 kbps), MP3 is good enough and is so established that the more tightly licensed AAC is unlikely to displace it for a while.  Think Beta vs. crappy, widely marketed VHS, except with VHS coming first - and as before, the average user not being fussy enough to care.  Fortunately, with decoders in software on PCs, we aren't stuck with the fixed hardware and media investments which make only one kind of video cassette system viable, even if it is not the best.  Portable MP3 players, including CD players, embed decoders which cannot be updated as PC software can.

I think Real Audio G2 is here to stay for a few years for streaming applications, and for archived files.  Its ability for a single file on disc to generate multiple streams, including via HTTP, for different players, is very snappy.

AAC licensing is apparently tied up with attempts to keep music "secure" - which I think is a waste of time.


Here are some other important new URLs:

http://www.commvergemag.com/commverge/extras/P178673.htm Extensive analysis and links regarding lossy (MP3 and WMA at least) compression and some lossless codecs.  Be sure to check this site!  When I looked at it, the page was corrupt and would only display properly on MS Internet Explorer.  There are many interesting things here, including a link to his listening tests of a watermarking system (Hiss!!) which was clearly audible and is apparently to be used on DVD audio discs.  Watermarks are a waste of time, for too many reasons to explain here, but see what I wrote in 1997 about them: http://www.cni.org/Hforums/cni-copyright/1997-02/1005.html .

http://CodecReview.com/ Dave Weekly's specialist site with many links, some tests of lossless codecs and plans for a much more extensive and interactive codec comparison. 


http://privatewww.essex.ac.uk/~djmrob/mp3decoders/ David J M Robinson tests 24 MP3 decoders with a variety of encoders, including VBR (variable bit rate) and finds that only five pass all his tests. Salute!


There is a freeware AAC encoder project: http://www.audiocoding.com .  The source code is available at: http://sourceforge.net/projects/faac/ .  There is a bit of patent cat-and-mouse going on here!


A Dolby AAC site is: http://www.aac-audio.com .  They announce that MusicMatch Jukebox will support AAC.  I had a suspicion that AAC or some related Dolby approach is used in Real Audio, which I think achieves remarkable results in stereo at only 20 kbps.  The music lacks top-end detail, and speech sounds a little odd, but the music is still well worth listening to, for instance, from the archives or real-time source at fab community music station WMNF in Florida.  However, the EBU report mentioned above distinguishes between AAC and Real Audio.  CodecReview.com states that Real Audio 3 to 5 is based on DolbyNet/AC-3 http://www.dolby.com/tech/ac3flex.html .  But what technology is behind Real Audio G2?


The MP3 Encoder's Mailing List is at:  http://geek.rcc.se/mp3encoder/ .
 

Scope

This page documents my own investigation of the audio quality provided by AAC (an early, unlicensed and non-optimised encoder/decoder), MP3 and TwinVQ/SoundVQ.  These are not full-blooded double-blind listening tests.  They are for my own interest and concentrate on finding musical sounds which are most likely to cause audible differences in the decoded signal.  These tests show the performance of particular encoders and decoders, and do not necessarily show the maximum possible performance of the algorithm.

This site also contains links to other sites regarding these three compression algorithms.

I am particularly interested in the applicability of these compression algorithms to music delivery - as part of my interest in music marketing, which is the subject of a separate page: musicmar .

Note that this is not an investigation of low bit-rate schemes suitable for streaming (real-time delivery) of music via 33.6 or 56 kbps modems. Although I tested some lower bit rates, I didn't really investigate them.  My question was: "What algorithm and bit rate can be relied upon to encode a very wide range of music so it is audibly indistinguishable from the original, including with demanding listeners and listening environments?"
 
 
 
6 July 2000  Please note:
  1. This work was done in late 1998 and I am not attempting to keep up with developments in this rapidly changing field.  I can't keep this as an up-to-date link farm for lossy compression either.
  2. See the following sites for more recent developments and links:

 
13 September 1999  Please note:
  1. This work was done in late 1998 and I am not attempting to keep up with developments in this rapidly changing field.  I can't keep this as an up-to-date link farm for lossy compression either.
  2. My aim was not to find the best MP3 encoder or decoder, but to find out roughly how good the various algorithms were, or could be.
  3. Most of the things I tested have now been superseded by later versions - for instance MusicMatch http://www.musicmatch.com/ is now (Sept 99) up to version 4.1 – totally different from the demo 2.50.005 version I used.
  4. I am currently using LAME http://www.sulaco.org/mp3/ on my Linux machines for MP3 encoding.

Summary

AAC is a most impressive compression algorithm.  According to carefully conducted listening tests, at 128 kbps, it seems to be superior to MP3 at 192 kbps.  This is reported by David Meares, Kaoru Watanabe and Eric Scheirer in their February 98 paper which is in a Word 6 file, zipped at: http://www.cselt.it/mpeg/public/w2006.zip .  I have quoted some of the results below, in the AAC section.

I found that the audio quality of the Yamaha SoundVQ encoder (2.54eb1) and decoder (2.51eb1) is noticeably inferior to MP3 or AAC at the available bit rates of 96 and 80 kbps for stereo.  Its performance on simple slowly swept-frequency sine-waves in the 3 to 6 kHz range is really bad.  Amongst TwinVQ users, these problems are generally well recognised and accepted - with the argument that TwinVQ's artefacts are not too unpleasant, that its lower bit rate (80 or 96 kbps) is attractive and that it copes well with a wide variety of music, including tracks which work badly with MP3 joint stereo (for instance those from analogue master tapes which have significant L - R phase differences).

Test sound files, and some of the decoded files are provided in .WAV format. I have included some graphic frequency analysis images as well.
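As an illustration only (this is not one of the test files offered on this page), a slowly swept sine wave of the kind described above can be generated with a few lines of Python.  The sweep duration, amplitude, linear sweep shape and output file name are all assumptions chosen for the sketch; only the 3 to 6 kHz range and the 16 bit 44.1 kHz format come from the text.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # CD sampling rate, as used throughout this page


def swept_sine(f_start=3000.0, f_end=6000.0, seconds=10.0, amplitude=0.5):
    """Return 16-bit sample values for a linear sine sweep from f_start to f_end (Hz)."""
    n = int(seconds * SAMPLE_RATE)
    samples = []
    phase = 0.0
    for i in range(n):
        # The instantaneous frequency rises linearly over the duration.
        freq = f_start + (f_end - f_start) * i / n
        # Accumulating phase keeps the waveform continuous as the frequency changes.
        phase += 2.0 * math.pi * freq / SAMPLE_RATE
        samples.append(int(round(amplitude * 32767 * math.sin(phase))))
    return samples


def write_mono_wav(path, samples):
    """Write 16-bit mono PCM samples to a .WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))


if __name__ == "__main__":
    # Hypothetical output name; encode and decode the result with the codec under test.
    write_mono_wav("sweep_3k_6k.wav", swept_sine())
```

A single steady tone moving slowly through the band makes any added noise or warbling from a codec easy to hear, which is why such a simple signal is a demanding test.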

I don't believe that the term "CD quality" should be applied to any lossy algorithm.  That said, I believe that for the majority of music and listening conditions, MP3 when properly implemented at 128 kbps (though it seems that joint stereo will fail with some out-of-phase material) and AAC when properly implemented at 128 and probably 96 kbps will reproduce virtually all music in such a way that the degradation is inaudible to virtually all listeners.

Personally, if I was buying music, I would want a delivery system that wasn't teetering on the edge of human perception.  My tests of lossless algorithms (See here.) suggest that for pop, rock and techno, music can only be compressed losslessly to about 55 to 75% of its normal size.

Until Internet bandwidth and costs improve, MP3 and soon AAC will play a vital role in the discovery and delivery of music for commercial and non-commercial purposes.
 
 

Caveats

I do not have a lot of experience with these algorithms.  This was an attempt to find whatever it took to trip MP3, AAC and TwinVQ up.  TwinVQ trips up on the most fundamental component of sound - the sine wave - and so I cannot take it seriously.  Nor do I think claims that "music does not contain sine waves" are valid. (Think of the Theremin in the Dr Who theme.)  Accepting its limitations, it does cope remarkably well with a wide range of music.  Lots of people like TwinVQ, and a lively discussion about it can be found at the VQF.COM discussion forum: http://www.vqf.com/bbs/?board=VQF.comForum , particularly starting with my post.

This field is changing rapidly.  I may not be able to keep this page up-to-date.  Be sure to check with the sites mentioned below for the latest developments.

There are many MP3 encoders and decoders, and it is evident that depending on the combination of encoder/decoder, the data rate, the type of music, the choice of stereo or joint-stereo for encoding (if you can choose), and characteristics of the original material which can cause joint-stereo encoding to sound bad,  the audible results may vary considerably.  To test all the combinations would be a mammoth task.  Please let me know if you find anyone doing this even partially.

Updates

I can't keep up with all the developments in lossy audio compression, but I will attempt to update this page - primarily by linking to more up-to-date sites.

One set of updates is flagged in the text as: up990424 for 24 April 1999.  If you search for this, you will see what has changed.

Another set is flagged in the text as: up990606 for 6 June 1999.
 


Preamble

I believe that if the Analogue to Digital Conversion (ADC) and Digital to Analogue Conversion (DAC) are performed properly, then the 44.1 kHz sampling rate and linear 16 bit resolution system established by Sony in the early 1980s for the audio CD is entirely adequate for reproducing stereo signals which are to be heard by humans in any "ordinary" listening environment. (This includes the highest quality headphones and speakers with the most exquisite music.  It does not involve hiding a safe distance from the speakers when the cannons in the 1812 overture go off, and then running up to the speaker to hear quantisation noise as the track fades out.)

Achieving the potential of 16 bit 44.1 kHz digital audio is a challenging task - it only became possible around 1990 as far as I am aware.  It can best be accomplished with oversampling ADCs followed by linear-phase digital decimation filters to bring the sampling rate down to 44.1 kHz, whilst rejecting frequencies outside the audio range without the need for high-Q analogue filters.  For instance see the Delta-Sigma ADCs of Crystal Semiconductor. (The mathematical and electronic principles of these delta-sigma ADCs are partially beyond me.)

With the existence of the CD, the DAT recorder and the CD-R, these extraordinary ADCs which Crystal and AKM pioneered have, as far as I can see, solved the problems of audio recording and storage.

So why do some people want 96 kHz sampling?  Maybe to keep their canine friends happy or to impress those, including themselves, who believe that 44.1 kHz is inadequate?  (There are some people who work professionally in audio who are very keen about 96 kHz sampling.  Check the Seneschal site for material on 96 kHz audio.)   I agree that 20 bit resolution is highly desirable for recording, mixing and editing, but I still think that a properly edited (with dither) recording in a form suitable for playback on headphones or loudspeakers can contain a perfectly adequate signal to noise+distortion ratio with a 16 bit signal resolution at 44.1 kHz.  (Dither extends the resolution in the most audible frequencies by several bits - to 18 or 19 bits or so.  The playback is probably best done with 4 or 8 times oversampling digital filters and 18 bit current switching DACs, using the extra bits which the filters output, so that only a very gentle analogue low-pass filter is required.)
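To illustrate why dither matters when reducing a signal to 16 bits, here is a minimal sketch.  It assumes plain triangular (TPDF) dither with no noise shaping, so it shows only that detail smaller than one quantisation step survives, not the full "several bits" gain mentioned above.  The function names, the 1 kHz tone and the 0.4 step amplitude are hypothetical choices for the example.

```python
import math
import random


def quantise(x):
    """Round a value, expressed in units of one 16-bit step (LSB), to the nearest step."""
    return round(x)


def quantise_with_tpdf_dither(x, rng):
    """Add triangular dither (sum of two uniform values, peak +/-1 LSB) before rounding."""
    dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return round(x + dither)


def correlation_with(signal, quantised):
    """Sum of products: clearly positive only if the output still follows the signal."""
    return sum(s * q for s, q in zip(signal, quantised))


rng = random.Random(1999)  # fixed seed so the run is repeatable

# One second of a 1 kHz sine whose peak is only 0.4 of one quantisation step.
quiet = [0.4 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]

plain = [quantise(x) for x in quiet]
dithered = [quantise_with_tpdf_dither(x, rng) for x in quiet]

print(any(plain))                              # False: every sample rounds to zero
print(correlation_with(quiet, dithered) > 0)   # True: the tone is still present in the noise
```

Without dither the quiet tone vanishes completely.  With dither it is preserved on average, at the cost of a low level of added noise.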

Lossless compression (compression is here used as a synonym for "data-reduction") algorithms for 16 bit 44.1 kHz stereo signals (1,411,200 bits per second) seem to reduce most music by only about 30% - so they are not very widely used.  It looks like a daunting task to do much better than this.

So why are people saying that MPEG Audio Layer 3 compression to 128,000 bits per second (128 kbps - a compression ratio of 11.025 to 1) is "CD Quality"?  Because they want to believe it is true, or they can't tell the difference.  (But see later - when I found it hard to tell the difference too.)  "CD Quality" should rightfully mean any lossless form of conveying the full 44.1 kHz 16 bit stereo bitstream - but the term has been so widely misused now that I think it is best avoided.
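The bit rate and compression ratio quoted in the two paragraphs above follow directly from the CD format.  The short sketch below just reproduces that arithmetic; the function names are made up for the example.

```python
SAMPLE_RATE_HZ = 44100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2             # stereo


def cd_bit_rate():
    """Bits per second of uncompressed CD audio."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def compression_ratio(compressed_bps):
    """How many times smaller a stream at compressed_bps is than CD audio."""
    return cd_bit_rate() / compressed_bps


def megabytes_per_minute(bps):
    """Storage needed for one minute of audio, in units of 1,000,000 bytes."""
    return bps * 60 / 8 / 1_000_000


print(cd_bit_rate())               # 1411200
print(compression_ratio(128_000))  # 11.025
print(megabytes_per_minute(cd_bit_rate()))
print(megabytes_per_minute(128_000))
```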

MPEG Audio Layer 3 (hereafter referred to as "MP3") and perhaps AAC (MPEG Advanced Audio Coding) are shaping up as the preferred form of distributing and storing music via the Internet.  In general the bit rate of 128 kbps is used at present - so I am concerned that we are taking a serious step backwards in audio quality from the potentially pristine and transparent 16 bit 44.1 kHz system established by Toshi Doi and his colleagues at Sony in the late 1970s.

These two algorithms - and TwinVQ (Yamaha calls it SoundVQ) - all work by breaking the sound into short time segments, filtering those segments into separate frequency bands, encoding the signal in each frequency band, and then, using a mathematical model of human hearing, sending the most audible parts of the signal to the output stream.  With enough bits in the output stream, the result may be lossless - the decoded file is bit-for-bit identical with the original.  However at the data-rates of interest to Internet users, these compression algorithms are certainly lossy.  With a lot of music, on the crappy speakers that many people listen to music on, in imperfect listening conditions (computer, car and other background noises), this loss in the compression system may not be audible at all.
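The general idea in the paragraph above can be sketched as a toy "transform coder".  This is emphatically not MP3, AAC or TwinVQ: it assumes non-overlapping frames, a naive discrete Fourier transform in place of a proper filter bank, and "keep the strongest few frequency components" as a crude stand-in for a model of human hearing.  All names and parameters are hypothetical.

```python
import cmath
import math


def half_spectrum(frame):
    """Naive DFT of a real frame, returning bins 0 .. n//2 only."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def encode_frame(frame, keep):
    """Keep only the `keep` strongest frequency bins of one frame."""
    spectrum = half_spectrum(frame)
    strongest = sorted(range(len(spectrum)),
                       key=lambda k: abs(spectrum[k]), reverse=True)[:keep]
    return {k: spectrum[k] for k in strongest}


def decode_frame(coeffs, n):
    """Rebuild n real samples from the kept bins (discarded bins are treated as zero)."""
    full = [0j] * n
    for k, c in coeffs.items():
        full[k] = c
        if 0 < k < n - k:
            full[n - k] = c.conjugate()  # mirror bin, so the output is real
    return [(sum(full[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def codec_round_trip(signal, frame_len=64, keep=8):
    """Split into short frames, encode each, then decode and join the frames."""
    out = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        out.extend(decode_frame(encode_frame(frame, keep), len(frame)))
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Three tones of decreasing strength, four frames long.
signal = [math.sin(2 * math.pi * 4 * t / 64)
          + 0.3 * math.sin(2 * math.pi * 11 * t / 64)
          + 0.05 * math.sin(2 * math.pi * 20 * t / 64)
          for t in range(256)]

for keep in (1, 2, 33):
    print(keep, rms_error(signal, codec_round_trip(signal, keep=keep)))
```

Keeping every bin (33 of them for a 64 sample frame) reproduces the input to within floating-point rounding, while keeping fewer bins discards the weaker tones first.  A real codec decides what to discard using masking models rather than simple magnitude.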

So for general use, with lots of boisterous music, I think these algorithms are likely to be fine at 128 kbps - assuming the encoding (compression) is performed optimally.  This may not always be the case, because not all encoders (or decoders) are perfectly written, and because of the CPU-intensive nature of the filtering, the analysis and the recursive approaches to figuring out the best way to pack the data into the output stream.

However this is not to say that the losses in the compression algorithms are insignificant or should be ignored.  Sound and human hearing involve very subtle processes - and having come all this way to the point where we can record and reproduce stereo audio without any significant degradation, I don't believe we should put up with lossy compression algorithms if we are purchasing music for keeps.

This page links to some sites of interest regarding compression, and then documents my attempt to find the weaknesses of MP3, AAC and VQ.

In the future I may have some links regarding "digital watermarking" or "fingerprinting".  For now, let me say that I think watermarking is doomed to failure for a number of technical and business reasons.
 

The three encoder-decoders I used

AAC: The AAC compression algorithm is documented at http://mp3tech.cjb.net and www.mp3.com has a list of AAC software.  From that list I found the site of the enigmatic Astrid/Quartex  (up990424 it was at http://www.geocities.com/ResearchTriangle/Facility/2141/ but see the AAC links section below on where to get it) - who has a Windows based AAC encoder and decoder. Thanks to astrid_quartex@hotmail.com for making this software available! The files I got were called aacdec01.zip and aacenc02.zip. These contain version 0.1 of the decoder and 0.2 of the encoder.  The encoder zip file contained an executable and an aacenc.txt file which were dated 12 October 1998.

Be sure to check at Astrid's site above, and at the AAC sites listed below for later versions - but here are the zip files in case you find them hard to get.  aacdec01.zip  aacenc02.zip

According to the Fraunhofer AAC FAQ, any software (such as Astrid/Quartex's) which is based on the MPEG source code will not be of the highest quality, and any AAC implementation must be licensed by the patent holders.  In case Astrid/Quartex's site disappears, you may wish to search AltaVista for "aacenc" or "aacdec" (or with "02" or "03" etc. after that name, such as "aacenc02"), or refer to some of the sites in the AAC links section below.  There is another AAC encoder/decoder from Homeboy as well. See the AAC links section below for more sites for the Astrid/Quartex encoder/decoder.

MP3: The Munich based Fraunhofer Institut for Integrated Circuits IIS-A is in many respects the home of MP3 - they did a lot of the work on developing the standard: http://www.iis.fhg.de/amm/techinf/layer3/ They are not so popular in MP3 circles at present (November 1998) because of claims they are making regarding patents and pressure they have successfully exerted on a number of authors of freely available and/or shareware MP3 programs.  I used their Windows demo-edition encoder and decoder for these experiments.  The versions I used are: WinPlay 3 Version 2.3 beta 5 from http://www.iis.fhg.de/amm/download/mp3player/index.html and the command-line encoder program "mp3encdemo31.exe" which identifies itself as "MPEG Layer-3 Encoder V3.1 Demo (build Sep 23 1998)" and which comes in the file: mp3encdemo_3_1_win32.zip. The encoder is available for various Unices - including x86 Linux - and Windows at http://www.iis.fhg.de/amm/download/mp3enc/index.html .
 

VQ: Yamaha has a freely available VQ (more properly TwinVQ) encoder and decoder for Windows - which I used in these tests: http://www.yamaha-xg.com/english/xg/SoundVQ/index.html . The versions I used are:  encoder 2.54eb1 and decoder 2.51eb1.
 

 

Links and information about the algorithms

AAC

AAC will be part of the forthcoming MPEG-4 standard, so "AAC", "MPEG-4" and "MP4" may be used interchangeably at some sites.

There are three "profiles" for AAC in the MPEG-2 data stream.  "Main" is the fully fledged AAC. "LC" (Low Complexity) and "SSR" (Scalable Sample Rate) are lower quality options for restricted CPU power implementations.  I think that none of the AAC software mentioned here is mucking around with the lower quality profiles.

MP3 and other algorithms

MPEG-4

TwinVQ

"TwinVQ" is the proper term.  But I use "VQ" at this site.  "SoundVQ" is Yamaha's term for this compression system, and files are normally stored with an extension of "VQF".

TwinVQ will also be a part of MPEG-4.

  • TwinVQ (Transform-domain Weighted Interleave Vector Quantization) was developed by NTT Human Interface Laboratories: http://www.hil.ntt.co.jp/top/index_e.html . The English version of the TwinVQ home page is:  http://music.jpn.net/ .
  • Yamaha's site is: http://www.yamaha-xg.com/english/xg/SoundVQ/index.html .
  • A big activist site for TwinVQ is VQF.COM: http://www.vqf.com . They have a discussion area, which I posted to regarding these tests.  My posting is: http://www.vqf.com/bbs/display.php3?board=VQF.comForum&DISP=2436 . Follow this link for alternative viewpoints to my negative assessment of TwinVQ!
  • Search for "twinvq" with AltaVista by clicking here!

    Other related algorithms
