Note! This is an archival page from 1998, with a few updates.
For the current version of this page, click here: index.html

Lossless Compression of Audio


Copyright Robin Whittle 1998 rw@firstpr.com.au. Originally written 8 December 1998 (links added on various dates; latest update 2 November 2000).


To the /audiocomp/ directory, which leads to material on lossy audio compression, in particular, comparing AAC, MP3 and TwinVQ.
 
 
Note:  I will soon be conducting a new series of tests of the updated versions of the programs tested previously, and of several new programs as listed below.

I will try to keep links to lossless sites up to date, since I don't know of any other web site which attempts to maintain a comprehensive list of lossless audio links.




 

Tests: what can be achieved with lossless compression?

On 2 December 1998, I ran some tests to see what compression ratios could be achieved with Shorten, MUSICompress (WaveZIP), Waveform Archiver and Pegasus SPS. In the days that followed I also tested Sonarc and LPAC (and its predecessor LTAC). See the links section below for details on where to obtain this software and which versions I tested. My main interest is in the maximum attainable compression rather than encode speed. Some of the compressors were very fast - faster than the music's playing time - and on a 180 MHz K6 they were not much slower than the limit imposed by reading and writing the large files involved. Generally the higher compression programs ran slower.
 
 
I have not tested the speed of these programs! A number of people - including the authors of the programs - are interested in this, but which of them is the fastest is not my primary concern at present. Generally the lower compression programs were faster - for instance WaveZIP was damn fast. Since there are only a few percentage points between the various compression ratios, and a larger variation in compression speed, many users will select a program based on its speed and ease of use rather than on its providing the best attainable compression ratio. (A friend who does a lot of hard-disk recording is very happy with WinZip.) I can't anticipate what someone else will find easiest to use. For some it will be a GUI program for X Windows (Unix) or MS Windows. For others a command-line driven program is ideal, since it can easily be driven from a script or .bat file.

Compression size, for the music I have chosen, I can reliably report - user interface and compression speed, I can't. 

The actual speed of the program depends on many things, including:

  • CPU type and clock speed.  I could measure compression times on my K6 machine, but the same programs could run at quite different speeds, relative to each other, on a Pentium II - due to their differing use of floating point instructions and interaction with the cache. (None of these programs seem to use MMX instructions.) 
  • Motherboard, cache and RAM.
  • Hard disk and hard disk controller. 
  • Other programs running on the computer. 
  • Operating system. 
  • Nature of the file to be compressed. 
All these programs are available for free, or on a trial basis, so please test the speeds on your system with your music. If anyone wants to do a proper speed test, I could include it here - but be sure to specify all the above factors and give details of the music you are using. I nominate Kylie Minogue's "I Should Be So Lucky" as a severe and easy-to-find test of a lossless algorithm.
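For anyone taking up that invitation, here is a minimal Python sketch of a timing harness. It assumes a command-line compressor; the command shown in the comment is a hypothetical placeholder, not any real program's syntax. It reports wall-clock time and a "real-time factor" (seconds of music per second of encoding), reading the playing time from the WAV header with Python's standard `wave` module. Wall-clock time includes every factor listed above (disk, other programs, operating system), so report those alongside the result.

```python
import subprocess
import time
import wave


def wav_duration_seconds(path):
    """Playing time of a PCM .WAV file, computed from its header."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def time_command(cmd):
    """Run a command (a list of arguments) and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def realtime_factor(wav_path, cmd):
    """Return (seconds of music per second of encoding, elapsed seconds).

    A factor above 1 means the program encodes faster than the music plays.
    """
    elapsed = time_command(cmd)
    return wav_duration_seconds(wav_path) / elapsed, elapsed


# Hypothetical usage - substitute your compressor and its real options:
#   factor, secs = realtime_factor("ky.wav", ["mycompressor", "ky.wav", "ky.out"])
```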

All tracks were .WAV files read directly from audio CD and are either electronic productions or microphone-based recordings - except for my Spare Luxury piece, which was generated entirely with software. The music constituted 775 Megabytes of data - 73 minutes of music. Tabulating these figures involved quite a lot of manual and pocket-calculator work, so there may be some errors.
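As a sanity check on the 775 Megabytes / 73 minutes figure: CD audio is 44,100 samples per second per channel, 16 bits (2 bytes) per sample and two channels, which is 176,400 bytes per second. This small sketch assumes "Megabyte" means 10^6 bytes (with 2^20-byte megabytes the same figure would come to about 77 minutes) and ignores the small WAV header.

```python
SAMPLE_RATE = 44_100    # samples per second per channel (CD audio)
BYTES_PER_SAMPLE = 2    # 16-bit PCM
CHANNELS = 2            # stereo

BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS  # 176,400


def playing_minutes(num_bytes):
    """Playing time, in minutes, of raw CD-format PCM data."""
    return num_bytes / BYTES_PER_SECOND / 60


total_minutes = playing_minutes(775e6)  # decimal Megabytes assumed
```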

My original test of Shorten was without options, but Tony Robinson pointed out that I neglected to use the "-p 8" option to get the best compression from Shorten.  I tried this, and while some resulting files were shorter, others were longer and in total the files were longer.  I tried again with "-p 3" and the sum total of all file sizes was still bigger than without options.  The results shown are from my initial tests with no options, but I suggest you experiment with the -p option to find which suits your music best.
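Programs like Shorten work by predicting each sample from the previous ones and storing only the prediction error (the residual). As I understand it, Shorten's `-p` option sets the maximum order of a fitted LPC (linear predictive) predictor, used in place of its default fixed polynomial predictors; a higher order can predict better but costs extra bits to describe, which is one plausible reason it did not win overall here. The minimal Python sketch below shows only the fixed polynomial predictors of orders 0 to 3 and a per-block choice between them. It is an illustration of the general technique, not Shorten's code; the names `COEFFS`, `polynomial_residuals` and `best_order` are hypothetical.

```python
# Fixed polynomial predictors: coefficients applied to x[n-1], x[n-2], x[n-3].
COEFFS = {
    0: [],          # predict 0
    1: [1],         # predict x[n-1]
    2: [2, -1],     # predict 2x[n-1] - x[n-2]
    3: [3, -3, 1],  # predict 3x[n-1] - 3x[n-2] + x[n-3]
}


def polynomial_residuals(samples, order):
    """Prediction error for each sample; the first `order` samples are stored verbatim."""
    coeffs = COEFFS[order]
    out = []
    for n, x in enumerate(samples):
        if n < order:
            out.append(x)  # warm-up: not enough history to predict from
        else:
            prediction = sum(c * samples[n - 1 - i] for i, c in enumerate(coeffs))
            out.append(x - prediction)
    return out


def best_order(samples):
    """Pick the order with the smallest total residual magnitude.

    Scoring starts at index 3 so every order is compared on the same samples.
    Ties go to the lowest order.
    """
    return min(COEFFS, key=lambda k: sum(abs(r) for r in polynomial_residuals(samples, k)[3:]))
```

A straight-line ramp is predicted perfectly from order 2 upward, and a quadratic only at order 3, which is why the best choice varies from block to block and from piece to piece.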

WaveZIP had no options regarding compression speed.  I tested Waveform Archiver (WaveArc) at its two highest compression settings, since the highest was very slow.  More on Waveform Archiver times below.  As noted below, I used the highest compression setting for Pegasus-SPS.  With Sonarc I used the "-X" option.

When I tested it, on 4 December, LPAC was a new program from Tilman Liebchen, still at the beta stage - a replacement for LTAC, which I had begun testing. I used LPAC with the "JS / AP 30 / AQ 7" settings which Tilman advised me were best for high compression. This is joint stereo, and it provided higher compression than LTAC in joint stereo mode for those files I had tested before he told me of LPAC, except for the software synthesis track, for which LTAC produced a remarkably good compression to 36.50%.
 

Audio files contain a certain amount of information - "entropy" - so they cannot be losslessly compressed to any size smaller than that. So it is not realistic to expect an ever-increasing improvement in lossless compression performance: the performance can only approach ever more closely whatever the basic entropy of the file is. No-one quite knows what that entropy is, of course . . . I think that would require understanding the datastream in a way which is exactly in tune with its true nature. For instance, a .jpg image of handwriting would appear to contain a lot of data, unless you could see and recognise the handwriting and record its characters in a suitably compressed format. The true nature of sound varies with its source, physical environment and recording method, and a lossless compression program cannot adapt itself entirely to the "true" nature of the sound in each piece of music. Therefore it is not surprising that different algorithms work best on different kinds of music.
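One rough way to put a number on this is the zeroth-order empirical entropy: bits per value, computed from the histogram of values actually seen. It ignores all dependence between samples, so it is not the "true" entropy discussed above, but it does show why predicting first helps: the residuals of a smoothly varying waveform cluster near zero and have a far smaller histogram entropy than the raw samples. This Python sketch uses a synthetic random walk as a stand-in for audio (an assumption, not real music), with a simple first-difference predictor.

```python
import math
import random
from collections import Counter


def entropy_bits_per_value(values):
    """Zeroth-order empirical entropy, in bits per value, of a sequence of integers."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def first_difference(values):
    """x[n] - x[n-1]; the first value is kept as-is."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


# Stand-in for a smooth waveform: a random walk whose next sample is always
# within +/-3 of the last one (synthetic data, not real audio).
rng = random.Random(1998)
signal = []
x = 0
for _ in range(4000):
    x += rng.randint(-3, 3)
    signal.append(x)

raw_bits = entropy_bits_per_value(signal)
residual_bits = entropy_bits_per_value(first_difference(signal))
```

Here the residuals can only take seven values, so they need at most log2(7), about 2.8 bits each, while the raw walk wanders over a much wider range of values.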


 
File sizes as % of original

| Description of audio track (size) | Shorten | WaveZIP: MUSICompress | WavArc, compr. level 4 | WavArc, compr. level 5 | Pegasus-SPS | Sonarc | LPAC (beta) |
|---|---|---|---|---|---|---|---|
| Choral - Gothic Voices: Hildegard von Bingen: Columba aspexit (hi.wav 55.9 MB) | 37.22% | 44.81% | 36.49% | 34.73% | 36.69% | 41.24% | 39.77% |
| Solo cello - Janos Starker: J.S. Bach: Suite 1 in G Major (ce.wav 173.2 MB) | 43.02% | 45.78% | 42.49% | 41.41% | 42.12% | 42.66% | 41.48% |
| Orchestra - Beethoven 3rd Symphony (be.wav 43.6 MB) | 55.67% | 57.99% | 42.00% | 40.72% | 42.42% | 53.28% | 52.00% |
| Ballet - Offenbach, Can Can (cc.wav 24.4 MB) | 58.28% | 60.29% | 57.32% | 54.57% | 56.51% | 56.42% | 54.73% |
| Software synthesis: my "Spare Luxury" Csound binaural piece (sl.wav 85.0 MB) | 42.54% | 45.23% | 42.20% | 39.64% | 40.40% | 41.44% | 39.98% (LTAC joint stereo: 36.50%) |
| Club techno - Bubbleman: Theme from Bubbleman (bm.wav 59.1 MB) | 74.07% | 75.43% | 69.51% | 68.45% | 70.70% | 73.10% | 71.42% |
| Rampant trance techno - ElBeano: Ventilator (eb.wav 44.0 MB) | 68.50% | 69.55% | 66.94% | 66.22% | 67.67% | 69.21% | 67.80% |
| Rock - Billy Idol, White Wedding (bi.wav 88.9 MB) | 65.04% | 66.54% | 62.07% | 58.79% | 62.48% | 60.54% | 58.13% |
| Pop - Kylie Minogue, I Should be so Lucky (ky.wav 35.9 MB) | 74.36% | 75.28% | 71.38% | 70.41% | 72.08% | 71.63% | 69.63% |
| Indian classical (mandolin and mridangam) - U. Srinivas: Sri Ganapathi (sr.wav 71.7 MB) | 53.54% | 56.11% | 46.70% | 44.63% | 52.39% | 52.24% | 50.75% |
| Indian classical (sitar and tabla) - Pt. Kartick Kumar & Niladri Kumar: Misra Piloo (si.wav 89.4 MB) | 58.60% | 61.50% | 56.17% | 50.99% | 53.46% | 51.31% | 49.98% |
| Average final file size | 57.34% | 59.86% | 53.93% | 51.87% | 54.27% | 55.73% | 54.15% |
| Therefore, average compression ratio | 1.744:1 | 1.670:1 | 1.854:1 * | 1.928:1 | 1.843:1 | 1.794:1 | 1.847:1 |

 
 


* Prior to 22 March 2000, I had this figure as 1.845 which must have been a clerical error on my part from working things out with a calculator and on paper.  Thanks to Lin Xiao for pointing this out!
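The ratio row is simply the reciprocal of the average file size: a file reduced to 53.93% of its original size corresponds to 100 / 53.93, or about 1.854:1. This small Python sketch recomputes the ratio row from the averages row, with figures copied from the table above, and checks that it agrees to within rounding. Note that this is the reciprocal of the mean size, which is not the same thing as the mean of each file's individual ratio.

```python
# Average final file sizes (% of original) and stated ratios, from the table above.
average_size_pct = {
    "Shorten": 57.34, "WaveZIP": 59.86, "WavArc level 4": 53.93,
    "WavArc level 5": 51.87, "Pegasus-SPS": 54.27, "Sonarc": 55.73,
    "LPAC (beta)": 54.15,
}
table_ratio = {
    "Shorten": 1.744, "WaveZIP": 1.670, "WavArc level 4": 1.854,
    "WavArc level 5": 1.928, "Pegasus-SPS": 1.843, "Sonarc": 1.794,
    "LPAC (beta)": 1.847,
}


def compression_ratio(size_pct):
    """Compression ratio X:1 for a file reduced to size_pct % of its original size."""
    return 100.0 / size_pct


# Difference between the recomputed and the tabulated ratio for each program.
discrepancy = {
    name: compression_ratio(pct) - table_ratio[name]
    for name, pct in average_size_pct.items()
}
```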
 

The pattern of compressed file sizes is as follows. There is a factor of about two between the resulting file sizes of the easiest and hardest cases - Hildegard von Bingen and Kylie Minogue! So the "average final file size" shown above depends entirely on the choice of music for this test. As it happens, I like both Hildegard and Kylie - and Bach, techno and Indian classical music - so it is a realistic mix for me. Death Metal buffs will probably find their music harder to compress than Kylie.

One experiment I tried was filtering the inaudible frequencies from a file. I took the Kylie track - which according to graphic frequency analysis had high frequencies extending to about 20.5 kHz - and filtered it with an 18 kHz, 20th-order low-pass filter. On the frequency analysis display, the high frequencies then ended at about 18.5 kHz. Then I compressed the resulting .WAV file with Waveform Archiver at compression levels 4 and 5. The results are as follows:
 
 
File sizes as % of original, with compression ratios

| Track | WavArc, comp. level 4 | WavArc, comp. level 5 | Notes |
|---|---|---|---|
| Pop - Kylie Minogue, I Should be so Lucky (35.9 MB) | 71.38% (1.401:1) | 70.41% (1.420:1) | Level 5 gives 1.36% improvement over level 4. |
| As above, but frequencies above 18 kHz removed | 71.11% (1.406:1) | 69.00% (1.449:1) | Filtering makes a 2% improvement on the level 5 result, but only a 0.36% improvement on the level 4 result. |
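The filtering step in this experiment is described only as an "18 kHz low-pass 20 order filter", without saying what kind. Purely as an illustration, here is a minimal Python sketch of one common way to build such a filter: a 21-tap (20th-order) windowed-sinc FIR using a Hamming window. This is an assumption, not the filter actually used; a 20th-order IIR design would behave quite differently, and with only 21 taps this FIR's transition band is several kHz wide. The filtered file is also a different, lossy version of the track, rounded back to 16-bit samples before compression.

```python
import cmath
import math


def lowpass_fir(cutoff_hz, fs_hz, order):
    """Hamming-windowed sinc low-pass FIR with order + 1 taps, unity gain at DC."""
    fc = cutoff_hz / fs_hz  # cutoff as a fraction of the sample rate
    centre = order / 2
    taps = []
    for n in range(order + 1):
        k = n - centre
        # Ideal (infinitely long) low-pass impulse response, sampled at offset k.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def filter_pcm(taps, samples):
    """Convolve 16-bit samples with the taps, rounding and clipping back to 16 bits."""
    out = []
    for i in range(len(samples)):
        acc = sum(t * samples[i - j] for j, t in enumerate(taps) if i - j >= 0)
        out.append(max(-32768, min(32767, round(acc))))
    return out


def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs_hz
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))


taps = lowpass_fir(18_000, 44_100, 20)  # hypothetical stand-in for the filter used
```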

I tried using WinZip to compress an output file of each algorithm. The Shorten file could be compressed to 0.9999746 of its original size, and the MUSICompress/WaveZIP file to 0.9890647 of its original size. WinZip could not compress the Waveform Archiver, Pegasus-SPS, Sonarc, LTAC or LPAC files which I tried.
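WinZip's .zip files normally use the general-purpose Deflate method, which Python's standard `zlib` module also implements. The minimal sketch below illustrates why a second pass gains nothing: the output of a good entropy coder looks close to random, and Deflate cannot shrink random-looking bytes. Here pseudo-random bytes stand in for a well-compressed audio file (an assumption for illustration, not real compressor output), with a stretch of digital silence for contrast.

```python
import random
import zlib


def deflate_ratio(data):
    """Deflated size as a fraction of the original size (zlib, maximum level)."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(1998)
noise_like = bytes(rng.getrandbits(8) for _ in range(100_000))  # stand-in for entropy-coded output
silence = bytes(100_000)                                         # 16-bit digital silence (all zero)

noise_ratio = deflate_ratio(noise_like)    # about 1.0: no saving
silence_ratio = deflate_ratio(silence)     # far below 1
```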
 

While there is a 2:1 range in the final file percentages attainable across the music I chose, the average ratio, for each file, between the sizes produced by the worst and best compressors was 1.17:1. So it seems that all the algorithms are doing a reasonable job of reaching towards the fundamental information content of each file.
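That spread can be recomputed from the results table. This Python sketch takes, for each track, the largest compressed size divided by the smallest and then averages over the eleven tracks. With all seven columns included it comes out at roughly 1.18:1, in line with the 1.17:1 quoted; the exact figure depends on which compressors are counted.

```python
# File sizes (% of original) from the results table, in column order:
# Shorten, WaveZIP, WavArc 4, WavArc 5, Pegasus-SPS, Sonarc, LPAC (beta)
RESULTS = {
    "hi.wav": [37.22, 44.81, 36.49, 34.73, 36.69, 41.24, 39.77],
    "ce.wav": [43.02, 45.78, 42.49, 41.41, 42.12, 42.66, 41.48],
    "be.wav": [55.67, 57.99, 42.00, 40.72, 42.42, 53.28, 52.00],
    "cc.wav": [58.28, 60.29, 57.32, 54.57, 56.51, 56.42, 54.73],
    "sl.wav": [42.54, 45.23, 42.20, 39.64, 40.40, 41.44, 39.98],
    "bm.wav": [74.07, 75.43, 69.51, 68.45, 70.70, 73.10, 71.42],
    "eb.wav": [68.50, 69.55, 66.94, 66.22, 67.67, 69.21, 67.80],
    "bi.wav": [65.04, 66.54, 62.07, 58.79, 62.48, 60.54, 58.13],
    "ky.wav": [74.36, 75.28, 71.38, 70.41, 72.08, 71.63, 69.63],
    "sr.wav": [53.54, 56.11, 46.70, 44.63, 52.39, 52.24, 50.75],
    "si.wav": [58.60, 61.50, 56.17, 50.99, 53.46, 51.31, 49.98],
}


def worst_to_best(sizes):
    """Ratio of the largest to the smallest compressed size for one track."""
    return max(sizes) / min(sizes)


spreads = {name: worst_to_best(sizes) for name, sizes in RESULTS.items()}
mean_spread = sum(spreads.values()) / len(spreads)
```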

Waveform Archiver (v1.1) has a compression level parameter, from 0 to 5, with a default of 1. At level 1, compression was fast - probably limited by disk access speed. The file sizes were rarely larger than those of Shorten, and sometimes a lot smaller. At level 5, the program descends into deep thought! On my K6 180 MHz machine it took 6.5 hours to compress the 775 Megabytes of music at compression level 5, whereas it took 34 minutes at level 4. The resulting total file sizes were 49.098% and 51.322% respectively. (Note that these figures are the total size of all compressed files. They differ from the average file size because some pieces of music in my test selection which were easier to compress were much longer.)
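The difference between the two measures is that the total-size figure is an average weighted by each file's length. This small Python sketch uses just two tracks from the results table at WavArc level 4, the long cello suite (173.2 MB, 42.49%) and the Kylie track (35.9 MB, 71.38%), to show how a long, easily compressed file pulls the total well below the simple average.

```python
def simple_average(pcts):
    """Unweighted mean of per-file percentages."""
    return sum(pcts) / len(pcts)


def total_size_pct(sizes_mb, pcts):
    """Total compressed size as % of total original size (a size-weighted average)."""
    compressed = sum(s * p / 100 for s, p in zip(sizes_mb, pcts))
    return 100 * compressed / sum(sizes_mb)


# Two tracks from the table, WavArc level 4: ce.wav and ky.wav.
sizes_mb = [173.2, 35.9]
pcts = [42.49, 71.38]

unweighted = simple_average(pcts)            # 56.935
weighted = total_size_pct(sizes_mb, pcts)    # about 47.45
```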

This table shows Waveform Archiver's encode times and resultant file sizes for the compression levels 1 to 5, using the ElBeano trance techno track.
 
| Compression level | Time (seconds) | File size (% of original) |
|---|---|---|
| 1 | 52 | 67.915% |
| 2 | 74 | 67.736% |
| 3 | 78 | 67.093% |
| 4 | 98 | 66.949% |
| 5 | 868 | 66.621% |

Remember that the results depicted here are entirely dependent on my choice of music. All these programs are freely available - on a trial or unrestricted basis - so please experiment with them yourself. The programmers may improve their software after I write this review, so please compare the current versions with those I used, as noted below.

At the time of writing - 5 December 1998 - a new version 2.0 of Waveform Archiver is in the works, and LPAC is moving beyond the beta stage. Be sure to check the developers' sites for the latest information.

Thanks to the programmers for tackling this field of lossless audio compression and for making their software available!

  • Al Wegener: MUSICompress.
  • Tony Robinson: Shorten.
  • Dennis Lee: Waveform Archiver.
  • The Pegasus team and Krishna Software: Pegasus-SPS.
  • Richard P. Sprague: Sonarc.
  • Tilman Liebchen: LTAC and LPAC.


Thanks also to Mat Hans for his thesis: Optimisation of Audio for Internet Transmission.

Links


Links specific to lossless audio compression

Sites of general interest regarding audio compression




 

My work

In November 1997 I spent some time pursuing an old interest - lossless compression of audio signals.  As far as I could tell, my approach had not been used.  It seemed simple and obvious to me - so maybe it was genuinely inspired, or maybe it was not in use because its performance was too low.

My interest is in lossless compression of music for electronic delivery by the Internet - now and in the future when more people at home have broadband ( > 2 Megabit/sec ) links.

I wrote some C code and ran part of the algorithm on some samples from CDs.  While I never did the final data-packing part of it, my system did not appear to be dramatically more efficient at reducing the data than existing systems. It did promise:

Assuming that my algorithm could have improved on the state of the art by 10% - which I doubt, and which would have been a major achievement - then all it would do is some combination of: However, assuming that bandwidth is doubling each year - for instance considering that the uptake of HFC cable modems and ADSL is growing rapidly, and these are 30 to 150 times faster than a telephone modem - then all this fab new algorithm would do is bring progress forward by a month or two. On the other hand, saving 10% of storage requirements forever more in the music field would be well worth doing!
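The "month or two" follows from simple arithmetic. Files 10% smaller are equivalent to 1 / 0.9, or about 1.11 times, the bandwidth; if bandwidth doubles every 12 months, that gain arrives after 12 × log2(1 / 0.9), which is about 1.8 months. A tiny Python sketch, with the doubling period as an adjustable assumption:

```python
import math


def months_of_progress(size_reduction, doubling_time_months=12):
    """Months of bandwidth growth equivalent to shrinking files by size_reduction (0-1)."""
    effective_speedup = 1 / (1 - size_reduction)  # 10% smaller files = 1.11x bandwidth
    return doubling_time_months * math.log2(effective_speedup)


ten_percent = months_of_progress(0.10)  # about 1.8 months
```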

While I still think my approach to packing the variable-length binary numbers (which are used in all these lossless algorithms to encode the difference between the predicted and the actual waveform) could be superior to the Huffman or Rice approaches generally used, I decided to pursue other things. That approach is not easily explained without a few pieces of paper, a pen or pencil, a cup of tea and some good chocolate biscuits!
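The packing approach mentioned above is not described here, so nothing below represents it. For reference, this is a minimal Python sketch of the Rice approach it is compared against. Signed residuals are first folded into non-negative integers (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4), then each value u is sent as u >> k in unary, followed by the low k bits of u in binary. The brute-force choice of k and the '0'/'1' string representation are simplifications for clarity; a real coder packs bits and chooses k per block from the residual statistics.

```python
def fold(x):
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * x if x >= 0 else -2 * x - 1


def unfold(u):
    """Inverse of fold()."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(residuals, k):
    """Rice-code signed residuals with parameter k; returns a string of '0'/'1'."""
    bits = []
    for x in residuals:
        u = fold(x)
        bits.append("1" * (u >> k) + "0")  # quotient in unary, terminated by 0
        if k:
            bits.append(format(u & ((1 << k) - 1), f"0{k}b"))  # remainder in k bits
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` residuals from a bit string produced by rice_encode()."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the terminating 0
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unfold((q << k) | r))
    return out


def best_k(residuals, max_k=15):
    """Choose the k giving the shortest encoding (brute force)."""
    return min(range(max_k + 1), key=lambda k: len(rice_encode(residuals, k)))
```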
 



 

Updates in reverse order:

  • 2000 July 15: Updated link to Monkey.
  • 2000 April 24: Added link to Monkey.
  • 2000 March 22: Added mention of AudioZip and fixed a crook compression ratio figure for WavArc level 4.
  • 1999 September 13: Added link to Malcolm Taylor's site.
  • 1999 June 9: Updated URL for Seneschal and added material about MLP - Meridian Lossless Packing for DVD-A.
  • 1999 June 6: Added link to DAKX.
  • 1999 January 9: Added link to Leonardo Maffi's site.
  • 1998 December 8: Added link to "Smacker".
  • 1998 December 3 - 6: Added Sonarc, LTAC / LPAC and Archive Compressor - and a link to Mat Hans' thesis.
  • 1998 December 3: Added link to RAR.
  • 1998 December 2: Updated with material relating to Waveform Archiver and Pegasus SPS.
  • 1998 December 2: Page established - testing Shorten and WaveZIP - and just as I finished, I found Waveform Archiver and later Pegasus SPS.


Robin Whittle  rw@firstpr.com.au
 
 