There’s been a lot of talk, both recently and in years past, about LZMA compression. With any compression algorithm come benchmarks, and these usually take the form of compression rate, decompression rate and compression ratio.
I remember the gradual shift from gzip to bzip2 which began at the end of the 1990s and continued into the early 2000s. Back then I had a Pentium-II 333MHz as my main workstation and a 10Mbit university-provided Internet connection. Sometimes it was quicker to download the larger gzip file than to wait for the much slower bzip2 algorithm to finish decompressing. Today the situation is similar: running Faelix means I have far more Internet bandwidth at my disposal than I need for fetching files, but CPUs are now so much quicker that I’d choose the bzip2 file over the gzip.
However, my position as a content provider (the owner of an ISP) and the asymmetric nature of LZMA’s CPU costs have changed my thinking on this subject. LZMA typically achieves better compression than bzip2 or gzip, but it is much slower than either at compressing data. Decompression is another matter: there LZMA is much faster than bzip2. This leads to the question, “Why do we compress data?”
Consider the benefits compression brings. An algorithm with a better compression ratio has lower transport and storage costs. And assuming that an algorithm which is fast at decompression executes fewer instructions per bit of decompressed output, faster decompression is also more “ecologically sound”. This is a bit of a hand-waving argument for now, but I’m curious enough that I may experiment and confirm the hypothesis with actual data. In the use case of one large, pre-compressed file being made available to a large number of users, LZMA looks like a strong contender to replace bzip2 or gzip when ranked on a benchmark of “energy efficiency”.
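As a starting point for that experiment, here’s a minimal sketch of the measurement, assuming Python’s standard-library gzip, bz2 and lzma modules; the input filename is just a placeholder, and wall-clock timings of a single run are only a rough proxy for CPU cost:

```python
# Rough timing sketch: compression ratio plus compression and decompression
# times for gzip, bzip2 and LZMA, using only the Python standard library.
# "sample.tar" is a placeholder for any reasonably large input file.
import bz2
import gzip
import lzma
import time

CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

with open("sample.tar", "rb") as f:
    data = f.read()

for name, (compress, decompress) in CODECS.items():
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    decompress(packed)
    t2 = time.perf_counter()
    print(f"{name:>5}: ratio {len(data) / len(packed):5.2f}  "
          f"compress {t1 - t0:6.2f}s  decompress {t2 - t1:6.2f}s")
```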
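And to see how the “compress once, decompress many times” scenario plays out, a second sketch models total CPU time as one compression plus one decompression per download. The per-algorithm timings below are made-up placeholders chosen only to show the shape of the calculation, not measurements:

```python
# Back-of-envelope model for "compress once, decompress by many users":
# total CPU time = compress_time + downloads * decompress_time.
# The timings are illustrative placeholders, not measurements; substitute
# real numbers from a benchmark such as the sketch above.
ILLUSTRATIVE_TIMINGS = {
    # name: (compress_seconds, decompress_seconds) for one hypothetical file
    "gzip": (10.0, 2.0),
    "bzip2": (40.0, 15.0),
    "lzma": (120.0, 4.0),
}

def total_cpu_seconds(compress_s: float, decompress_s: float, downloads: int) -> float:
    """One compression by the provider plus one decompression per download."""
    return compress_s + downloads * decompress_s

for downloads in (1, 100, 10_000):
    ranking = sorted(
        ILLUSTRATIVE_TIMINGS,
        key=lambda name: total_cpu_seconds(*ILLUSTRATIVE_TIMINGS[name], downloads),
    )
    print(f"{downloads:>6} downloads, cheapest total CPU first: {ranking}")
```

As the number of downloads grows, the one-off compression cost is amortised away and decompression speed dominates the total, which is exactly where LZMA should come out ahead.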
Research to follow, I think…