I am not an expert in the inner workings of MP3, AAC or VQ - or on finding material which is tricky for them - so this is a first attempt at finding musical material which will make their limitations more obvious than more common kinds of music, such as rock, jazz or orchestral music.
I figure that these algorithms are not going to have much trouble with bass signals - and that the drama would occur with very high frequency signals, and in coding artefacts which are audible in mid-range sounds when other supposedly masking sounds occur. All test files are 44.1 kHz stereo.
Note: To download these files onto your hard disc, so you can listen to them, edit them etc., don't just click on the link. With Netscape, hold down the Shift key and then click on the link to the .wav file.
Here are the three files and a description:
si.wav (2,042,500 bytes, AKA sitar.wav) 11.578 seconds copied directly from a CD of Indian classical music. Up-front sitar gives way to drone and mridangam (I think it is a mridangam - a horizontal two-headed drum). Lots of exquisite high-frequency detail which we know the sound of exactly - and with drums which might tempt the algorithms to throw away a few things that we can in fact hear.
bi.wav (2,813,636 bytes, AKA binaural.wav) 15.95 seconds excerpted from my 1996 Csound piece Spare Luxury. This was generated entirely with Csound, plus my unreleased binaural unit generator. The piece itself is not published or on sale anywhere yet. Several flanged major-seventh chords move slowly. On headphones, there is a profound sense of movement and of the sound being quite close as it passes. This should show up any problems with the algorithms upsetting timing or phase. Also, there is a lot of very clearly structured, complex, high-frequency detail - and this should show up any inadequacies in the exact waveform reproduction of the compression algorithms. The waveforms in the piece are straightforward, and the flanging is smooth, so we know exactly in our ears what the sound should be like. Any glitches should be easy to hear.
tr-left.wav (1,917,974 bytes, AKA tr808.wav) Mono 21.745 seconds. This was a special test signal - not music for listening pleasure. I recorded it to both channels of the line-in of a Sony TCD-D7 DAT recorder, which records at 48 kHz. The D7 uses a Crystal AKM Delta Sigma ADC. This was played into my Zefiro ( http://www.zefiro.com ) ZA-2 digital audio sound card - which has a Crystal DSP programmed to down-sample to 44.1 kHz. I have not done rigorous tests, but I believe the Zefiro does a good job of down-sampling. The resulting stereo file was used for all the tests, but the WAV files of this test signal and its decoded versions at this site are mono - left channel only - for faster downloading.
A modified TR-808 drum machine Cymbal sound is playing constantly - but is based on a bright chord from a modified Casio M-10 keyboard, so it is a shimmering metallic chord-based sound. Later I change the signal level of the Casio so the Cymbal sounds become extremely spiky - for instance the following screen shot shows individual samples from around 14.57 seconds:
That should give these Fast Fourier Transform based algorithms some curry!
There are also Rimshot sounds in the midst of the Cymbal sounds, and in the middle some intense Snare sounds which are really gritty, because I wound up the level of the TR-808's internal white noise generator.
I used the AAC encoder at bit rates of 128, 96 and 64 kbps. There are no other options to this encoder - the Astrid/Quartex version 0.2.
The VQ encoder's highest bit rates, and the only ones for 44.1 kHz, were 96 and 80 kbps (48 and 40 kbps per channel) - and I used these, with the quality setting of the encoder set to high. A useful feature of the encoder is that it can play both the just encoded file and the original - making comparisons very easy. There are many lower bit-rates and sample-rates which I did not investigate - these might be fruitful for some purposes.
For MP3, I used data rates of 256, 192, 128, 96 and 64 kbps - with the -qual command line switch set to 9, which is the maximum. The first three encode at 44.1 kHz. The 96 kbps mode resamples the input file for encoding and it is reproduced by the decoder with a sample rate of 32 kHz. Similarly the 64 kbps does this and reproduces the audio at 22.05 kHz.
The documentation for the encoder explains three kinds of stereo (in addition to "dual channel", which is not used):
- Stereo Two independently encoded channels, but bits are allocated to them flexibly, so one channel can be more than half the available bit rate if it contains more detail than the other at a particular point in time.
- MS stereo (This, I understand is also referred to as Joint Stereo.) Sum [L + R] and difference [L-R], rather than two separate channels. This makes sense when most of the sound in each channel is the same as the other - then the difference signal contains less energy than either the left or right channels.
- MS/IS or Intensity stereo. High frequencies mixed to mono and then (in effect) panned. The Fraunhofer doco notes that this is not suitable for high quality applications.
But the doco which comes with the encoder, in node11.html, table 2.1, gives the following ambiguous table for which stereo modes will be used:
Table 2.1: different stereo modes
Bitrates            Stereo mode
8000 - 18000        mono only
18000 - 96000       MS/IS (intensity) stereo
96000 - 192000      MS stereo
192000 - 256000     stereo
It seems that only the 64 kbps rate was doing Intensity Stereo, since the player displayed "II" for those. On this basis, I would rewrite the table:
Bitrates            Stereo mode
8000 - 17999        mono only
18000 - 95999       MS/IS (intensity) stereo
96000 - 191999      MS stereo
192000 - 256000     stereo
If my understanding is correct, then this could be summarised as:
Bit rate (kbps)   Sample rate (kHz)   Stereo mode
256               44.1                Stereo
192               44.1                Stereo
128               44.1                MS (Joint) Stereo
96                32                  MS (Joint) Stereo
64                22.05               Intensity Stereo
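The sum-and-difference idea behind MS stereo can be illustrated with a few lines of Python. This is only a sketch of the principle, not what any encoder actually does internally: the function names are mine, the sample values are made up, and the halving is just one convenient choice of scaling.

```python
def ms_encode(left, right):
    """Turn left/right samples into mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


# Hypothetical samples: two channels that are nearly, but not quite, identical.
left = [1000, -2000, 1500, 300]
right = [990, -2010, 1490, 310]

mid, side = ms_encode(left, right)
print(energy(left), energy(right), energy(side))
```

When the channels are nearly the same, the side signal carries very little energy, so most of the bits can go to the mid signal. When the channels differ a lot (or differ in timing), that advantage disappears.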
(3 Dec 1998: Note I am not convinced that the Fraunhofer encoder is producing stereo. Perhaps it only does "MS (Joint) stereo". What's the best way to find out from the .MP3 file?)
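One possible way to find out is to read the frame headers in the .MP3 file directly. Each MPEG audio frame starts with a 32-bit header in which, as I understand the MPEG-1 audio specification, two bits give the channel mode and, for Layer III joint stereo, two more bits say whether MS stereo and/or intensity stereo are switched on. The sketch below decodes just those fields from one 4-byte header. The function name is my own, it does not locate frames or skip over any tag data, and the mode can change from frame to frame, so a real check would need to walk every frame in the file.

```python
CHANNEL_MODES = {0: "stereo", 1: "joint stereo", 2: "dual channel", 3: "mono"}


def parse_stereo_mode(header):
    """Decode the stereo-related fields of one 4-byte MPEG audio frame header."""
    # The header starts with 11 sync bits, all set to 1.
    if len(header) != 4 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("not an MPEG audio frame header")
    layer_bits = (header[1] >> 1) & 0x03   # 0b01 means Layer III
    mode = (header[3] >> 6) & 0x03         # channel mode
    mode_ext = (header[3] >> 4) & 0x03     # only meaningful for joint stereo
    joint_layer3 = mode == 1 and layer_bits == 1
    return {
        "channel_mode": CHANNEL_MODES[mode],
        "ms_stereo": joint_layer3 and bool(mode_ext & 0x02),
        "intensity_stereo": joint_layer3 and bool(mode_ext & 0x01),
    }


# A hand-built example header: Layer III, joint stereo, MS on, intensity off.
print(parse_stereo_mode(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

Counting how many frames report "stereo" versus "joint stereo" across a whole file would show whether the encoder ever produces plain stereo at a given bit rate.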
Decoding the AAC files to .WAV files was straightforward - that is all the decoder program does. The VQ and Fraunhofer WinPlay 3 MP3 decoders were players - they drive the sound card, but will not generate .WAV files. (Later I tried WinAmp, which can apparently write to a .WAV file - but I couldn't find the "Nullsoft disk writer" plugin. Apparently a player called Nad can write to a .WAV file too.)
My machine uses a Zefiro ( http://www.zefiro.com ) ZA-2 sound card with digital in and out (optical, coaxial - SPDIF - and balanced - AES/EBU). It also has a good Crystal DAC on board. To turn the decoded VQ and MP3's into .WAV files I recorded the output on DAT and played it back into the computer. This would involve no degradation in sound quality whatsoever. There is presumably some Windows program which can take whatever is going to the sound card and write it to a .WAV file - but what is it? (I read that Willow Media could do it - but it doesn't seem to for me.)
However, this method of recording to DAT did not work for the MP3 files at 96 and 64 kbps, because their sampling rates were 32 and 22.05 kHz respectively. I didn't pursue those rates in detailed listening tests. I could have used Music Match and Freeamp to convert those to .WAV files - but with some frequency-sweep files they crashed and I lost interest.
All listening tests were done by taking the digital audio from the ZA-2's AES/EBU output and sending it to a rather obscure and now obsolete rack-mount DAC from Yamaha - a DA202. This has an eight-times oversampling digital filter driving a Burr-Brown PCM-56 16 bit DAC for each channel. Except for very low-level signals, this is a superb Digital to Analogue Converter. The DA202 drives a Yamaha domestic amplifier receiver, and I listen with Sony MDR-65 headphones. These have the diaphragm very close to the ear. The bass response isn't great, but for mid and high frequencies, I think they are very good.
I did not test for measurable distortion, or test that the algorithms behaved properly with absolute maximum level signals.
While CPU time is not my major interest, I thought I might as well measure the encode and decode times. I certainly need to measure the file-sizes of the compressed output, since they do not always closely match the claimed bit rate.
For these tests I used my K6 (early chip, running at 180 MHz) 64 Mbyte Windows 98 machine with no other processes. I encoded a stereo .WAV file of white noise, separate noise for each channel, approximately 1,000,000 samples, or 22.675 seconds. The file is 4 Megabytes (actually 3,999,912 bytes, of which 3,999,892 are audio data).
The following table gives the nominal bit rates and compression ratios for the various algorithms, the encoding times, the decoding times for AAC (the VQ and MP3 software I used was real-time on my machine), the final encoded (compressed) file size, the actual bit-rate this represents, and the actual compression ratio. Compression ratio is relative to the raw 44.1 kHz 16 bit stereo rates of 176,400 bytes per second or 1,411,200 bits per second.
Note that the Fraunhofer MP3 encoder was working with the highest quality setting, and that the AAC encoder from Astrid/Quartex is an early version which is not at all optimised for speed. Faster MP3 encoders no-doubt exist.
Algorithm   Nominal    Nominal       Encoded    Actual     Actual    Encode   Decode
            bit rate   compression   size       bit rate   ratio     time     time
            (kbps)     ratio         (bytes)    (kbps)
AAC         128        11.025 : 1    365,277    128.87     10.95     205      25
AAC         96         14.7 : 1      274,554    96.86      14.57     103      25
AAC         64         22.05 : 1     183,836    64.86      21.76     71       25
MP3         256        5.5125 : 1    728,084    256.9      5.49      376
MP3         192        7.35 : 1      546,063    192.66     7.32      435
MP3         128        11.025 : 1    364,042    128.44     10.99     390
MP3         96         14.7 : 1      273,024    96.32      14.65     262
MP3         64         22.05 : 1     182,439    64.37      21.92     150
VQ          96         14.7 : 1      272,615    96.18      14.67     173
VQ          80         17.64 : 1     227,199    80.16      17.60     157
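The actual bit rates and compression ratios can be recomputed from the file sizes. The following is a small sketch of that arithmetic, using the white-noise file sizes given above. The function names are mine, and I am assuming the ratio is taken against the whole 3,999,912 byte file and the duration from the 3,999,892 bytes of audio data.

```python
RAW_BYTES_PER_SECOND = 44100 * 2 * 2        # 16 bit stereo: 176,400 bytes per second
ORIGINAL_FILE_BYTES = 3_999_912             # whole white-noise .WAV file
ORIGINAL_AUDIO_BYTES = 3_999_892            # audio data only
DURATION_S = ORIGINAL_AUDIO_BYTES / RAW_BYTES_PER_SECOND   # about 22.675 seconds


def nominal_ratio(kbps):
    """Compression ratio implied by a nominal bit rate in kbps."""
    return RAW_BYTES_PER_SECOND * 8 / 1000 / kbps


def actual_kbps(encoded_bytes):
    """Bit rate actually represented by an encoded file of this size."""
    return encoded_bytes * 8 / DURATION_S / 1000


def actual_ratio(encoded_bytes):
    """Compression ratio actually achieved for this encoded file size."""
    return ORIGINAL_FILE_BYTES / encoded_bytes


# The AAC 128 kbps result: 365,277 bytes.
print(round(nominal_ratio(128), 3), round(actual_kbps(365_277), 2), round(actual_ratio(365_277), 2))
```

For the AAC 128 kbps file this gives roughly 128.87 kbps and 10.95 : 1, so the encoded file is slightly larger than the nominal rate implies.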
These compression algorithms have a certain frequency response. I tested this by encoding and decoding three test signals:
- A white-noise file (separate noise for each channel) containing an even distribution of energy from 0 to 22 kHz with the peaks of the waveforms at about +/- 20,000 within the 16 bit +/- 32,768 range. On a frequency analysis, this shows up as a reasonably flat level of signal varying randomly within a range of -36 to -48 dB.
- A sine-wave sweep (identical in both channels) from 0 to 22 kHz over 22 seconds.
- A low frequency sine-wave sweep (identical in both channels) from 0 to 100 Hz over 10 seconds.
By encoding and decoding these, it is easy to see what the frequency limits of the algorithms are.
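For anyone wanting to make similar sweep files, here is a minimal sketch of a linear sine-wave sweep in Python, using only the standard library. It is not how I made my files: the function names are made up for this example, and the peak level of 20,000 is an assumption borrowed from the white-noise file.

```python
import math
import struct
import wave


def linear_sweep(f_start, f_end, seconds, rate=44100, peak=20000):
    """Return 16 bit samples of a sine wave whose frequency rises linearly."""
    slope = (f_end - f_start) / seconds          # Hz per second
    samples = []
    for n in range(int(seconds * rate)):
        t = n / rate
        # Phase is the integral of the instantaneous frequency f_start + slope * t.
        phase = 2 * math.pi * (f_start * t + 0.5 * slope * t * t)
        samples.append(int(round(peak * math.sin(phase))))
    return samples


def write_stereo_wav(path, samples, rate=44100):
    """Write the same samples to both channels of a 16 bit stereo WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"".join(struct.pack("<hh", s, s) for s in samples))


# Example use (file names are placeholders):
# write_stereo_wav("sweep-0-22k.wav", linear_sweep(0, 22000, 22))
# write_stereo_wav("sweep-0-100.wav", linear_sweep(0, 100, 10))
```

With a 0 to 22 kHz sweep over 22 seconds, the frequency rises by 1 kHz each second, which makes it easy to read the frequency limit of a codec straight off the time axis of the decoded file.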
Low frequency limits
All three algorithms at all sample rates I tested had excellent low frequency response - flat to below 5 Hz. This is not to say that they necessarily reproduced sine-waves in the 0 to 100 Hz range without any fluctuations. Although the fluctuations in the VQ reproduction were not audible to me, they certainly were visible looking at the waveform in an audio editing program.
High frequency limits - and some gain fluctuations and nasty tones!
The high frequency limits varied markedly - as expected with the different bit-rates and algorithms. The frequency quoted below is where the gain started to roll off. Generally the gain was 40 or more dB down just a small frequency above this.
Algorithm   Bit rate   Sample rate   High frequency   Notes
            (kbps)     (kHz)         limit
AAC         128        44.1          17.45 kHz        -3 dB at 17.45 kHz; more than 96 dB down at 18.06 kHz
AAC         96         44.1          17.45 kHz        As above.
AAC         64         44.1          11 kHz           -3 dB at about 11 kHz; more than 96 dB down at 11.6 kHz
MP3         256        44.1          20.05 kHz
MP3         192        44.1          19 kHz
MP3         128        44.1          14.25 kHz        "CD quality" indeed! Note that other encoders may reproduce higher frequencies.
MP3         96         32            12.9 kHz
MP3         64         22.05         ~ 10.5 kHz
VQ          96         44.1          ~ 20 kHz         Terrible artefacts above 2.5 kHz!!!!
VQ          80         44.1          Ditto.           Ditto.
AAC ripples or modulation
The ripple in the passband varied considerably. These figures are guesstimates based on just looking at the waveform in an audio editor. For AAC at 128 and 96 kbps, I estimate +/- 0.5 dB. For AAC at 64 kbps, I estimate +/- 1 dB.
I did a special test to investigate this "ripple": I created a 10 second file with a sine wave sweeping between 1000 and 1001 Hz. There's no fluctuation in amplitude in the original file, but after 128 kbps AAC encoding and decoding, I see a random-looking gain fluctuation of +/- one or two percent. This gain fluctuation seems to be in the 0 to 20 Hz range, and so constitutes a slight amplitude modulation which will produce diffuse, low level (-40 dB for a +/- 1% modulation) sidebands +/- 20 Hz from the component frequencies of the input signal. This is not ideal at all, but is unlikely to be very audible. I can in fact hear it with this 1 kHz tone. I don't see or hear any difference at 96 or 64 kbps.
Here is a short test file, mono, with two seconds of the original tone (1000 to 1000.2 Hz) with a 0.2 second gap, followed by the result of the AAC 128 kbps encode-decode. It is mono, 4.4 seconds, 387 kbytes: aac128-1khz-compare.wav Listen for a slightly rougher quality in the second section.
Note that this ripple or modulation, is not necessarily a problem with the AAC algorithm, but could be caused by the particular implementation in this encoder or decoder. There is absolutely no such visible or audible modulation in the MP3 128 kbps encode-decode process on this 1 kHz sine wave.
The MP3 files at 256, 192, 128 and 96 kbps had no visible ripple, except between 19.4 and 20 kHz for the 256 kbps. I couldn't decode or record the 22.05 kHz 64kbps MP3 file to a .WAV file with the software I had handy, so I can't say what the ripple was.
VQ - serious sidebands and other nasty artefacts
The VQ handling of the two sweep files - 0 to 100 Hz and 0 to 22 kHz - was atrocious. In the 0 to 100 Hz test, the frequencies below about 40 Hz had rough audible distortion products, and above that, there were "birdie" sounds in the midrange. In the 0 to 22 kHz sweep, a number of bad things happened. Firstly "birdie" sideband sounds start to appear between 1 and 3 kHz. Then the sideband tones become predominant and complex, making a real mess of all the other frequencies. Beyond about 10 kHz it seems to be mainly sideband tones and aliasing, and right up to 22 seconds, including all the time between 17 seconds and 22 seconds, when we shouldn't hear anything, there are all sorts of noises being produced.
Here is a mono version of the file. It is 22 seconds, 1.941 megabytes. sw-vq-96-mono.wav .
Here is the worst part of it, when the input sine wave is between 3 and 6 kHz. 3 seconds, 263 kbytes. sw-vq-96-mono-3-to-6khz.wav .
To visualise this, consider the following images. The first is a Cool Edit ( http://www.syntrillium.com ) spectral view of a mono version of the AAC 128 kbps version of the 0 to 22 kHz sine-wave sweep file. Frequency is vertical, and time is horizontal - so we expect a line going up to the right, since each second to the right means a 1 kHz rise in frequency. That's exactly what we get until we reach the limit of the AAC algorithm at about 17.6 kHz. (In the original file, the line continues right up to 22 kHz.)
The above image shows that all is well - there are no significant sidebands being generated.
Now take a look at the same image for the TwinVQ encoded and decoded file, at the 96 kbps rate (48 kbps per channel). It looks like it is on fire - all that extra colour (or at least that below 18 kHz which is the accepted limit of human hearing) is stuff we shouldn't be hearing!
At first I suspected there was something wrong with this particular Yamaha implementation of TwinVQ, but now I know this is an accepted aspect of TwinVQ - that it is no good with sine waves!
I don't think it sounds good with music, and it certainly sounds horrible with simple things like a slowly swept sine wave. Perhaps the TwinVQ algorithm has unique strengths at very low bit rates - but at the maximum of 96 kbps for stereo, I don't believe it is worth using at all. See http://www.vqf.com for a more positive perspective on TwinVQ.
I did not evaluate the phase response of these algorithms. Phase response caused by high Q analogue low-pass anti-aliasing filters in the Sony PCM-1610 CD mastering machine was, in my opinion, responsible for a great deal of the anti-CD sentiment amongst audiophiles in the 1980s and early 1990s - because most CDs were mastered on this until the PCM-1630 became more common in the late 80s.
TwinVQ
First, let's deal with TwinVQ. Some people believe this is a superior system to MP3. The results I obtained are contrary to this. However, according to discussions in the forum at http://www.vqf.com TwinVQ does have a distinctive "sound" (ie recognisable limitations and artefacts) but these are not regarded as serious, and the low data rates and the consistent performance of the software are preferred over MP3-128, which is acknowledged as sounding potentially better, but which can sound really bad on some tracks (probably those with phase problems if MP3 is using joint stereo).
TwinVQ seems to add all sorts of extraneous sounds (artefacts of the compression / decompression processes) at a low, but potentially perceivable level. See the section above for a discussion of how it handles a simple swept sine wave - with the sw-vq-96-mono-3-to-6khz.wav file to show you what it sounds like. I can't hear the artefacts on the si.wav sitar piece - because the music itself masks them.
On my binaural piece, it's not so much extra noises I notice, but a horrid warbling. Hear it for yourself: vq96-bi-compare.wav 1.443 Megabytes, 8.2 seconds stereo 44.1 kHz - four seconds of original and then four seconds of the TwinVQ 96 kbps result.
With tr.wav - the TR-808 drum machine test file - TwinVQ at 96 kbps results in significantly less detail in the high sibilant sounds, and most noticeably a spreading of the energy of the rim-shot sounds. Listen to a mono file, 44.1 kHz, 11.7 seconds, 1.035 Megabytes: vq96bicompare.wav . The rim-shot is preceded by clearly audible noise, which is visible in the diagram you can view here: vq96-tr-compare.gif . The original waveform is in green, above the TwinVQ result in mauve.
I don't like TwinVQ.
MP3
I used a wide range of bit-rates - and the most significant is 128 kbps. Various reports indicate that AAC can achieve the same audio quality as MP3 in about 70% of the bits.
The tr.wav and the 0 to 22 kHz frequency sweep test files were the same signal in left and right channels, so (as far as I can understand it) with Intensity Stereo (64 kbps) or MS stereo (96 or 128 kbps), the encoder would concentrate almost all its bits on the one signal. So at these rates, the TR-808 signals are not realistic tests of a musical situation. I can't hear any audible problems with the tr.wav test at MP3-128 - but since it is a mono signal, it is roughly equivalent to having 256 kbps for stereo. Perhaps I should have put the TR-808 sounds in one channel and some continual string sounds or pink noise in the other. Also, I don't have an easy way of saving the 64 and 96 kbps decoded signals as .WAV files, making comparisons difficult. The 64 kbps and 96 kbps MP3 files use a lower sample-rate anyway - so we don't expect miracles. They sound OK, considering the lower sample rate, but let's move on to the two music files and the data rates of 128, 192 and 256 kbps. I will call the decoded results of these MP3-128, MP3-192 and MP3-256 respectively.
I will start with the binaural piece. At 128 kbps, the most obvious difference is that the frequencies are limited to 14.25 kHz. It's not CD-quality, but assuming that human hearing goes to 17.5 kHz, this is a loss of the top 18.5% of the audible range on a linear frequency scale. It represents about 3.5 semitones - a little over a quarter of an octave - in a hearing range which, from 30 Hz to 17.5 kHz (a frequency ratio of 583), spans about 110 semitones. So on a logarithmic scale, this frequency limit represents the loss of about 3% of the audible range. It's a part of the sound to be sure - but it is never where musically significant things occur.
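As a quick check of the semitone arithmetic, here is the calculation in Python, under the same assumed limits of hearing (30 Hz to 17.5 kHz) and the measured 14.25 kHz cut-off. The exact figure for the lost band works out nearer 3.6 semitones, which still comes to about 3% of the range.

```python
import math


def semitones(f_low, f_high):
    """Number of equal-tempered semitones between two frequencies."""
    return 12 * math.log2(f_high / f_low)


lost = semitones(14250, 17500)      # band removed by the 14.25 kHz limit, about 3.6
audible = semitones(30, 17500)      # assumed full hearing range, about 110
fraction = lost / audible           # about 0.032

print(round(lost, 2), round(audible, 1), round(100 * fraction, 1))
```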
To compare the MP3-128 with the original, I first put the original through a sharp high-pass filter at 14.25 kHz so I could listen to what I knew was missing from the MP3-128. With both the binaural and the sitar pieces, I had to turn up the volume beyond normal listening levels to hear it at all - and it was not particularly strong or significant.
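The filtering step could be reproduced roughly as follows. This is a stand-in for whatever filter an audio editor provides, not the filter I actually used: a windowed-sinc FIR high-pass built by subtracting a Blackman-windowed low-pass from a unit impulse. The function names and the choice of 255 taps are assumptions, and the plain Python convolution is far too slow for anything but short excerpts.

```python
import math


def highpass_taps(cutoff_hz, rate=44100, num_taps=255):
    """Windowed-sinc high-pass FIR (num_taps must be odd)."""
    fc = cutoff_hz / rate                       # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) // 2
    low = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))   # Blackman
        low.append(ideal * window)
    total = sum(low)                            # normalise the low-pass to unity gain at DC
    taps = [-x / total for x in low]
    taps[mid] += 1.0                            # unit impulse minus low-pass = high-pass
    return taps


def gain_at(taps, freq_hz, rate=44100):
    """Magnitude of the filter's frequency response at one frequency."""
    w = 2 * math.pi * freq_hz / rate
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)


def fir_filter(samples, taps):
    """Direct convolution; the output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, h in enumerate(taps):
            if i - j < 0:
                break
            acc += h * samples[i - j]
        out.append(acc)
    return out


taps = highpass_taps(14250)
print(gain_at(taps, 1000), gain_at(taps, 18000))
```

The two printed gains should be close to 0 and close to 1 respectively, meaning the filtered file contains essentially only what lies above the MP3-128 frequency limit.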
So in these test files, which I purposely chose for their bright, high-end detail - for me, the signal lost by MP3-128's low pass filter at 14.25 kHz was not audible anyway.
However other MP3-128 artefacts are audible. Before looking at that, let's consider some other MP3 encoders and players . . .
In an effort to find whether these artefacts were a product of the particular encoder-decoder (Fraunhofer's) I tried some other encoders and decoders: Xing's demo Audio Catalyst 1.0 : http://www.xingtech.com/products/audiocatalyst/ and Music Match demo 2.50.005 http://www.musicmatch.com . (I had to write the files to CD-R before I could encode them.) I played these back with Fraunhofer's WinPlay 3 2.3 beta5, with FreeAmp 1.0.0 http://www.freeamp.org and with Music Match. In all cases I recorded the output to DAT and then played it back to the computer and saved it as a .WAV file. The results were confusing - and I don't have time to sort out all the combinations of these programs. This was all at 128 kbps. I found very objectionable "warbling" at certain points of my bi.wav test file and some "glitches".
Back to the artefacts of the Fraunhofer encoder and decoder, with the bi.wav test file encoded at 128 kbps. The only problem I can hear is a slight fluctuation in volume around 2.2 to 3.0 seconds into the test - as the sound moves (not pans - this is binaural!) from right to left. Here is the first five seconds of that - bi-mp3-128-first-5-secs.wav (885 kbytes, 5 seconds 44.1 kHz stereo). If you change the sample rate to 22.05 kHz, then you can hear the volume fluctuations more clearly. They are only slightly audible to me at 44.1 kHz. Click here to see a frequency-time plot of the first seven seconds of the original followed by the first seven seconds of the MP3-128 version. Some of the volume fluctuations are visible as vertical lines showing loss of high frequencies.
Frequency analysis of the playback of the Audio Catalyst encoded MP3, played back with Music Match, FreeAmp and WinPlay 3 shows visibly different outputs. There was highly objectionable warbling and sudden fraction-of-a-second shifts in volume between left and right channels. Click here to see a frequency (vertical) vs. time (about 3 seconds horizontal) analysis made with Cool Edit ( http://www.syntrillium.com ). The left channel is on top - this is part of the bi.wav test file, encoded with Audio Catalyst and decoded with Music Match. The highest visible frequencies are about 16 kHz. Clearly visible are two brief glitches in which most of the high frequencies of the right disappear for about 15 msec and appear more strongly on the left. This must be a decoder problem, since those glitches appear repeatably with the Music Match decoder, but not with FreeAmp or Fraunhofer's WinPlay. Those two decoders do however cause other audible problems which are not so visible on the frequency analysis and are not audible on the Music Match decoding. Click here to see a frequency-time display of the original bi.wav on the left, with the middle and right sections being the Music Match and WinPlay3 playbacks of the AudioCatalyst encoded 128 kbps file. Audio Catalyst reported it was encoding in "Joint Stereo", which shows up as an "I" on WinPlay and so which I assume is the same as "MS Stereo" according to the Fraunhofer documentation.
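Changing the sample rate for half-speed listening only means rewriting the rate in the WAV header while leaving the audio data untouched, which drops everything by an octave and doubles the duration. An audio editor will do this, but here is a minimal sketch with Python's standard wave module. The function name and the file names in the comment are placeholders of mine.

```python
import wave


def write_at_half_rate(src, dst):
    """Copy a WAV file, changing nothing except halving the sample rate in the header."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(params.nchannels)
        writer.setsampwidth(params.sampwidth)
        writer.setframerate(params.framerate // 2)   # 44100 becomes 22050
        writer.writeframes(frames)


# Example use:
# write_at_half_rate("bi-mp3-128-first-5-secs.wav", "bi-mp3-128-half-speed.wav")
```

At half speed a component at 18 kHz is heard at 9 kHz, which is why artefacts in the very top of the range become much easier to notice.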
It seems that some of this software has a long way to go!
One thing is for sure - I don't have time to muck around looking at every combination of MP3 encoder, data-rate, player and test signal! However, see below where I added (3 Dec 1998) material on the decoding of WinAmp, which in one instance at least was more accurate than WinPlay.
I will concentrate on the Fraunhofer encoder and decoder - because it seems to me that this should be the most highly evolved and stable pair of programs to use.
I have no idea what the explanation is for these glitches in high frequency response - nor do I have time to find out.
Such glitches would not be clearly audible with most popular music which has rapidly changing volume anyway - but it is still a degradation, like the dropouts of an audio cassette tape.
(4 December 1998. I tried playing the same file with WinAmp - and the glitches which were audible and visible on the above-mentioned frequency analysis were not there. So the problem is in the Fraunhofer WinPlay3 decoder! There were still some slight glitches earlier, which are barely audible and barely visible.)
At 192 and 256 kbps, I could not hear any degradation in the bi.wav test signals. Click here to see a frequency analysis of the original and MP3-192 version, between about 7.6 and 10.6 seconds into the test. The highest frequencies are about 19 kHz, and the "blockiness" of the allocation of bits to the high frequencies only affects those signals above 15 or 16 kHz. I have previously determined that in this test piece, I couldn't hear those signals anyway at normal listening levels.
So for my binaural test signal MP3-128 seems good apart from some glitches which are probably the result of faulty encoder and/or decoder design. MP3-192 and MP3-256 sound the same to me as the original. I played the MP3-192 back at half the normal sample-rate, and sure enough I could hear a very high tone at the start of the test which was continuous in the original and changing in volume in the MP3-192 version. Frequency analysis showed that this tone would have been about 18.5 kHz in the original - and that the broken nature of it in the MP3-192 version resulted from the encoder allocating bits to it only intermittently. The same thing happened to a lesser degree with MP3-256.
Now let's look at the TR-808 drum test signal at 128 kbps. I couldn't hear any audible degradation - even after halving the sample rate (other than the 14.25 kHz frequency limitation). So I won't bother with the 192 and 256 kbps results.
With MP3-128, I could not hear any audible degradation in the sitar test sample - si.wav. So it seems that other than finding nasty glitches in encoders and decoders other than Fraunhofer's, and other than Fraunhofer's encoder having a 14.25 kHz frequency limit, my quest to find a test signal which showed the weaknesses of MP3 at 128 kbps had ended almost completely in failure! The exception was a slight fluctuation in volume around 2.2 to 3.0 seconds into the bi.wav test.
So I set about inventing a 10 second test sample that would expose MP3's weaknesses . . test3.wav (1.764 Megabytes.) In the right, a set of continuous sawtooths at 441, 882, 1323, 1764 and 2205 Hz, plus a sine wave rising from 441 to 2205 Hz. On the left, a similar set of sine-waves, but instead of being the five lowest harmonics of 441 Hz, they are of 1385.4423 Hz (441 * pi), which should make it out-of-tune with the signals in the right channel.
Hmm - some artefacts are audible with 64 kbps, but I can't hear any problems at 96 or 128 kbps.
. . . I have been very reluctant to conclude that MP3 at 128 kbps could sound indistinguishable from the original, for these reasons:
- My experiences with MP3 a few years ago.
- My deep suspicion of anything which detracts from the purity of 16 bit 44.1 kHz stereo - which I believe was only truly achieved with the early 1990's development of excellent delta-sigma ADCs, and which is an extraordinary and immensely valuable achievement.
- My understanding, based on reading and my own attempts at algorithms, that lossless compression can only reduce the size of most music to about 60 to 80%.
- Understanding of how fine human hearing can be - able to hear audio signals which are less than 1 step in the +/- 32,768 range of 16 bit audio (hence the need for dither, if there isn't enough noise in the signal already).
- Consequently, my deep suspicion about reducing data to 10% of its normal size.
- Reports from people who say that MP3 at 128 kbps isn't quite right.
- Marketing bumpf about MP3 at 128 kbps being "CD quality".
- A deep distrust of breaking things up with Fast Fourier Transforms.
- A deep distrust of any system which has to throw away information, knowing that some of it in the most complex pieces may be audible . . . .
However, subject to the 14.25 kHz limitation of the Fraunhofer encoder I have been using - and one slight fluctuation at the start of the bi.wav test, I have to admit that it does a damn good job and I am yet to find a piece of music which it degrades in a way I can hear very clearly.
(6 December: However . . . . I understand from correspondents and discussion at http://www.vqf.com that most MP3-128 encoding is done with joint stereo, and that this causes all sorts of objectionable problems if the audio file has differing timings between the channels. This could happen with an analogue stereo master tape. In particular it would happen with audio cassettes - especially one that has been living in a motor car for some time!)
. . . .
. . . .
One more thing. Mr Dennis Bovell, AKA Blackbeard, AKA Dennis Matumbi was and hopefully still is a fabulous Dub artist. One of my most treasured records is "Strictly Dub Wize" (United Artists / Ballistic Records, pressed in London 1978 LBR 1013). It has a Sounds interview, by Vivien Goldman, with Dennis Bovell on the front cover . . .
Me start mix now, the tape start run. I'm keeping in tune with the music while I'm mixing. I rock and t'ing, and one, two time I fell off my chair 'cos the rhythm touch my head just after the drum and bass start to rock. The snare is under heavy manners, murder reverb! It start sound like gunshot, it have fire you understand?
So I put the platter on the mat (or is it "the matter on the platter"?), piped it into the delta-sigma Crystal/AKM ADC in my Sony D7 and from there optically into the Zefiro and the K6 computer. Auspicious signal and signal path!
This music has never been near an anti-alias filter! It's buxom and tweeter frying at the same time. Though most of the energy is below 12 kHz, it does have energy right up to the cut-off of the Zefiro's 48 to 44.1 kHz down-sampling filter. I took a section of the first track "Cut after cut" and passed it to Dr Fraunhofer to cook at 128 kbps. The music came back, limited to below 14.25 kHz as expected, and after careful listening, on headphones and speakers, including at half the sample rate, I have to admit that I cannot hear the difference between the original and that which Dr Fraunhofer squeezed down to an eleventh of its former size.
I am impressed. However it seems that some popular MP3 software is as crappy as the speakers and headphones that most people listen to music through. (And it seems that joint-stereo MP3 is a no-no for some tracks.)
My provisional conclusions on MPEG Audio Layer 3 are:
- When done properly, MP3-128 can do a very good job of reproducing all the musical signals I can think of (not counting joint stereo encoding of tracks with phase problems between the channels) - although the Fraunhofer encoder does limit frequencies to below 14.25 kHz.
- Some encoders, decoders or combinations have serious problems with dropouts in volume or sudden shifts of volume from one channel to the other - at 128 kbps. These must represent problems in the software rather than what is possible with MPEG Audio Layer 3 at 128 kbps.
- Based on this, I assume that MPEG Audio Layer 3 at 192 kbps represents a substantial further margin by which audible artefacts can be eliminated.
AAC
Before I do my listening tests, I will quote from the report on AAC vs. MP3 listening tests by David Meares, Kaoru Watanabe and Eric Scheirer (see the source of it in the AAC links section at the start of this page):
10.5. Differences between programme itemsFirst, “how does the performance of codecs differ by programme item?” We will consider each of the AAC codecs in turn, comparing the confidence interval of the diffscores for that codec for each item to the MP2 and MP3 results. If the confidence intervals do not overlap, we judge one coder to be better for that item.AAC Main 128:
Better than MP2 [192 kbps] for 3 items, worse for no items, equivalent for 7 items.
Better than MP3 [128 kbps] for 3 items, worse for no items, equivalent for 7 items.
AAC Main 96:
Better than MP2 [192 kbps] for 1 item, worse for 1 item, equivalent for 8 items.
Better than MP3 [128 kbps] for 1 item, worse for no items, equivalent for 9 items.
AAC LC 128:
Better than MP2 [192 kbps] for 3 items, worse for no items, equivalent for 7 items.
Better than MP3 [128 kbps] for 3 items, worse for no items, equivalent for 7 items.
AAC LC 96:
Better than MP2 [192 kbps] for no items, worse for no items, equivalent for 10 items.
Better than MP3 [128 kbps] for 1 item, worse for no items, equivalent for 9 items.
AAC SSR 128:
Better than MP2 [192 kbps] for 1 item, worse for no items, equivalent for 9 items.
Better than MP3 [128 kbps] for 2 items, worse for no items, equivalent for 9 items.
Thus, we see that only the Main 96 codec is outperformed by any MP2 or MP3 codec for any of these examples. For many programme items, an AAC coder gives statistically superior results. Note that for items Tracy Chapman, Ornette Coleman and Dire Straits there were no significant differences between codecs – all codecs performed the same on these examples.
10.6. Comparison with MPEG-1 codecs
“Is the performance of AAC codecs at the tested bitrate equal to or better than the performance of MPEG-1 Layer II and Layer III?” The accumulated results by codec are shown in Figure 5 (note the foreshortened vertical scale).
Figure 5. Overall results (averaged across programme items and position) for each coder.
We see from this figure that overall, AAC Main 128, AAC LC 128, and AAC SSR 128 give significantly better performance than do MP2 192 or MP3 128. In addition, AAC Main 96 gives better results than MP3 128.
There is no statistically significant improvement between AAC LC 96 and the MPEG-1 codecs.
Within the AAC codec group, AAC Main 128, AAC LC 128, and AAC SSR 128 are all superior to AAC LC 96. In addition, AAC Main 128 and AAC LC 128 are superior to AAC Main 96.
10.7. Statistical indistinguishability
“Is the performance of the coding of AAC codecs at the test bitrate distinguishable from the original signal?” In general, from Figure 5, we see that the performance of the AAC codecs is statistically distinguishable from the original signal. However, for certain items, the codecs give indistinguishable performance. The AAC Main 128 codec is indistinguishable from the original for 8 of 10 items, the AAC Main 96 for 3 items, the AAC LC 128 for 8 of the 10 items, AAC LC 96 for 4 items, and AAC SSR 128 for 8 items. For comparison, MPEG-1 Layer II was statistically indistinguishable for 4 items, and MPEG-1 Layer III for 3 items.
(End quote.)
This is impressive. The Astrid/Quartex encoder/decoder is presumably the "main profile" AAC, but it is believed not to contain everything needed for the best quality encoding.
If I can't perceive any problems with AAC-96, then I will be very happy about AAC-128.
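The per-item rule in the quoted report (one coder is judged better only when the two confidence intervals do not overlap) can be sketched as follows. This is only my illustration of the rule as quoted, not the report's actual analysis code. I am assuming that a higher diffscore means less audible degradation, and the interval values below are invented purely to exercise the logic.

```python
from collections import Counter


def compare_item(ci_a, ci_b):
    """Compare two (low, high) confidence intervals for one programme item.

    Assumption: a higher diffscore is better. Coder A is "better" only if
    its whole interval lies above coder B's, "worse" only if it lies wholly
    below, and "equivalent" whenever the intervals overlap.
    """
    a_low, a_high = ci_a
    b_low, b_high = ci_b
    if a_low > b_high:
        return "better"
    if a_high < b_low:
        return "worse"
    return "equivalent"


def tally(cis_a, cis_b):
    """Count better / worse / equivalent verdicts across all items."""
    return Counter(compare_item(a, b) for a, b in zip(cis_a, cis_b))


# Hypothetical intervals for three items (not data from the report).
codec_a = [(-0.3, -0.1), (-0.6, -0.2), (-1.2, -0.8)]
codec_b = [(-0.9, -0.5), (-0.5, -0.1), (-0.6, -0.3)]

if __name__ == "__main__":
    print(dict(tally(codec_a, codec_b)))
```

Note how conservative this rule is: any overlap at all counts as "equivalent", which is why so many of the items in the quoted lists land in that column.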
For the TR-808 test signal, I can't tell any difference at all between the original and the AAC-96 result - or the AAC-128. However this is effectively a mono signal. I can see differences in the waveforms - for instance the rimshot sound - if I look at them, but these are clearly not significant to my ears. There is no evidence that the AAC algorithm is adding noise before loud sounds the way Yamaha's VQ system did.
Click here to see a big image of a rimshot sound - original on top and AAC-96 below. This is the left channel only, and spans about 11 msec. Click here to see another comparison, with AAC-128 on the bottom.
Now I will listen to the bi.wav binaural sample. Trouble! The original has quite complex, very smoothly changing tones, but the AAC-128 and AAC-96 results have a very noticeable flutter to them. I can't hear any significant difference between the 128 and 96 kbps versions. Here is a frequency-time analysis of the right channel, between 1 and 2.5 seconds. The top image is the original and the bottom is the AAC-128. Frequencies are from 0 to about 5.5 kHz.
Clearly the result is not as smooth as the original. Even allowing for problems with the analysis and its conversion to a .gif file, it can be seen that some frequencies did not get any energy, apart from the puzzling low-level pulses of energy which are visible as purple vertical bars, which seem to be at about 45 Hz. I can't figure out if they are related to the fluctuation I hear. Perhaps the fluctuation is more diffuse and centred around 15 Hz or so.
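One way to put a number on a flutter like this, rather than squinting at an image, is to measure how fast the loudness envelope wobbles. The following is a minimal sketch using only the Python standard library. It is not the analysis I used for the images above, and it runs on a synthetic 1 kHz tone with a 15 Hz tremolo (my own stand-in signal), not on bi.wav itself. The frame length and sample rate are arbitrary choices to keep the example fast.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of each consecutive, non-overlapping frame."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def dominant_modulation_hz(envelope, envelope_rate_hz):
    """Frequency of the strongest fluctuation in the envelope (naive DFT)."""
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [e - mean for e in envelope]  # remove the steady (DC) level
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * envelope_rate_hz / n


# Synthetic test signal: 1 kHz tone, amplitude wobbling at 15 Hz, 1 second long.
RATE_HZ = 8000
FRAME_LEN = 40  # 5 ms frames, so the envelope is sampled at 200 Hz
signal = [
    (1 + 0.5 * math.sin(2 * math.pi * 15 * t / RATE_HZ))
    * math.sin(2 * math.pi * 1000 * t / RATE_HZ)
    for t in range(RATE_HZ)
]

if __name__ == "__main__":
    env = frame_rms(signal, FRAME_LEN)
    print(dominant_modulation_hz(env, RATE_HZ / FRAME_LEN), "Hz")
```

Running the same measurement on the original and on the decoded file, band by band, might show whether the flutter really sits near 15 Hz or is tied to the 45 Hz pulses. I have not done this.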
Here is a mono file of the right channel, first five seconds of bi.wav - with the original first and then the AAC-128 version. The fluttering is clearly audible. bi-aac-right-orig-then-128-first-5-secs.wav (9 seconds mono, 805 kbytes.)
Whatever the problem, clearly this particular test signal has tripped up this particular combination of AAC encoder and decoder. But remember, this is developed from code which was designed as a reference for testing decoders. The Astrid/Quartex version numbers on the encoder and decoder are 0.2 and 0.1. They are free - and as they say in the classics: "What do you want for nothing? Your money back?".
AAC specifies the data stream and the decoder. How you create a really good encoder which can cope with a vast range of sounds is another matter - and no doubt a very complex business. Popular and recommended MP3 encoders/decoders in late 1998 still exhibit highly audible problems. It's early days for AAC - except for the encoders and decoders in the labs.
Still, I will listen to the sitar test piece - si.wav.
I cannot hear any difference between the original, the AAC-128 and the AAC-96. The AAC-64 has a different sample rate, but still sounds excellent. Based on this, I would say that AAC at 30 kbps modem streaming speed would sound pretty good in mono - for streaming audio. I have not listened in detail to the various streaming audio protocols - such as Real Audio and Liquid Audio. Streaming audio is tricky - because it has to degrade gracefully when packets are lost.
Looking at a frequency analysis of part of the si.wav results, it is clear that AAC-128 is fudging some details. That's its job - it hasn't got enough bits to tell the full story of a sitar. The diagram below shows the original above the AAC-128 result. This is for the left channel only, between 5 and 6.5 seconds. The vertical frequency scale covers 0 at the bottom to about 11 kHz at the top. The horizontal lines are the harmonics of the beautiful bright sitar main string. Lots of harmonics - 25 or so! The red vertical lines are drum-beats. The droop in the harmonics is a note being bent down - there are two instances of this. The lines continuing during the drop in pitch would be the reverberation of the room, and perhaps the sympathetic strings. The original shows the harmonics as separate entities all bending together. Unfortunately AAC doesn't see it this clearly - at least in the highest frequencies - and so the resulting energy is smeared around rather than sitting where it should be. But can our ears hear the difference? Apparently not.
It seems that MP3 at 128 kbps, when properly done, is very impressive. However it may be that it is quite often not properly encoded and/or decoded - and/or that it does not cope in joint stereo mode with some music, due to otherwise unproblematic shifts in timing between the channels. Most of the time, with rock music, lousy speakers, lousy listening conditions and listeners who aren't particularly fussy - it doesn't matter. But it's better than a worn out audio-cassette, a scratched record or a CD which skips or loops!
The Astrid/Quartex encoder and decoder is not to be regarded as what AAC is really about - but nonetheless it is a valuable contribution which enables us to see the value of this promising compression scheme.
I think that making these encoders (and to a lesser extent the decoders) really swing with the vast variety of sound which people will feed them is very challenging work. Presumably this has already largely been achieved - but not yet in code which is in the public domain.
The question of software patents is a vexed one. I won't get into it now - I could argue it both ways.
It's clear that MP3 is sufficiently good to satisfy a lot of music listeners - but my question is whether it is good enough for purchasing music.
If I was paying significant money to an artist, I wouldn't want any question about the sound quality. If I could be convinced that a fully developed AAC system did not audibly degrade most music to most trained listeners at 128 kbps, then perhaps I would be happy to buy music encoded with AAC at 192 kbps.
I feel that we have made such excellent strides, such amazing progress with the CD and the delta-sigma ADC and oversampling filters driving 18 bit current-switching DACs . . . to have arrived at what I believe is audio heaven (at least for stereo in a recording with dynamics suitable for any conceivable listening situation) that we should not be content with compression schemes that take complex and questionable liberties with our music, even if we generally cannot hear them.
Roll on the broadband Internet with traffic costs of below a cent a megabyte (it's AUD$0.19 a megabyte in Australia in late 1998 - US 12 cents - due to the congested Pacific fibres) - then we won't mind purchasing 600 megabytes of music (an hour without compression) with a lossless compression algorithm which knocks it back to about 450 megabytes. Lossless is lossless: not a single bit changed. I would be much happier with this than the complex fudge of lossy compression.
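To show where these figures come from, here is a small sketch of the arithmetic. It assumes CD-format audio (44.1 kHz, 16 bit, stereo), and uses my round numbers of 600 and 450 megabytes and the late-1998 Australian price of AUD$0.19 a megabyte quoted above. The function names are just for illustration.

```python
# Size of an hour of uncompressed CD-format audio, and what it costs to move.
# Assumption: 44.1 kHz sample rate, 2 bytes per sample, 2 channels.


def pcm_bytes(seconds, sample_rate_hz=44_100, bytes_per_sample=2, channels=2):
    """Size in bytes of uncompressed PCM audio of the given duration."""
    return seconds * sample_rate_hz * bytes_per_sample * channels


def transfer_cost(megabytes, price_per_megabyte):
    """Traffic cost for a download, in whatever currency the price is in."""
    return megabytes * price_per_megabyte


if __name__ == "__main__":
    hour = pcm_bytes(3600)
    print(f"One hour of CD audio: {hour} bytes, about {hour / 2**20:.0f} megabytes")

    # Round figures from the text: 600 megabytes raw, 450 after lossless packing.
    for label, size_mb in (("uncompressed", 600), ("lossless", 450)):
        late_1998 = transfer_cost(size_mb, 0.19)  # AUD$0.19 a megabyte
        hoped_for = transfer_cost(size_mb, 0.01)  # a cent a megabyte
        print(f"{label}: AUD${late_1998:.2f} now, {hoped_for:.2f} at a cent a megabyte")
```

At today's traffic prices an hour of lossless music costs far more to download than the CD does in a shop, which is the whole reason lossy compression looks attractive for now.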
I have a page on lossless audio compression: lossless/.