Compression Algorithms: an argument for energy efficiency

There’s been a lot of talk, both recently and in years past, about LZMA compression. With any compression algorithm come benchmarks, and these usually take the form of compression rate, decompression rate and compression ratio:

compression rate
Basically, “less time spent waiting for the compressed output to be generated”. Where data must be sent in real time, or where the compression process adds latency or reduces the effective bandwidth of a link, a higher compression rate is desirable.
decompression rate
Once a copy of the compressed data has been received, a higher decompression rate means “less time spent waiting for the original data to be reproduced”. This can be important where mobile devices are decompressing data (e.g. streamed video), so that less CPU power is used.
compression ratio
How much smaller is the compressed file? The higher the ratio the “better” the algorithm, but not all algorithms perform equally on all sorts of data. Specialist algorithms tailored to the kinds of data being crunched are common (for example, lossless audio compression algorithms tend to outperform general compression algorithms in their niche).
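
For concreteness, here’s a minimal sketch of how those three numbers might be measured with Python’s standard-library gzip, bz2 and lzma modules (the input filename is just a placeholder for any reasonably large test file):

    import bz2, gzip, lzma, time

    with open("sample.tar", "rb") as f:   # placeholder: any large test file
        original = f.read()

    for name, mod in (("gzip", gzip), ("bzip2", bz2), ("lzma", lzma)):
        t0 = time.perf_counter()
        compressed = mod.compress(original)     # compression rate: bytes in / (t1 - t0)
        t1 = time.perf_counter()
        restored = mod.decompress(compressed)   # decompression rate: bytes out / (t2 - t1)
        t2 = time.perf_counter()
        assert restored == original
        print(f"{name:6s} ratio {len(original) / len(compressed):5.2f}  "
              f"compress {t1 - t0:6.2f}s  decompress {t2 - t1:6.2f}s")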

I remember the gradual change from gzip compression to bzip2 which started at the end of the 1990s and continued into the early 2000s. Back then I had a Pentium-II 333MHz as my main workstation and a 10Mbit university-provided Internet connection. Sometimes it was quicker to download the larger gzip file than to wait for the much slower bzip2 algorithm to finish decompressing. Today things are much the same in one respect: running Faelix means I have far more Internet bandwidth at my disposal than I need for fetching files, but CPU speeds are now so much quicker that I’d choose the bzip2 file over the gzip.

However, my position as a content provider (the owner of an ISP) and the asymmetric nature of LZMA’s CPU costs have changed my thinking on this subject. LZMA typically achieves better compression than bzip2 or gzip, but is much slower than either at compressing data. When decompressing, however, it is much faster than bzip2. This leads to the question, “Why do we compress data?”

  • smaller storage requirements
  • smaller transport requirements

The benefits this brings:

  • lower cost of storage
  • possibly leads to lower energy consumption for storage
  • lower cost of transport
  • transport of data across communications links will take less time
  • better use of capacity of existing communications links (more customers served)
  • possibly leads to lower energy consumption for transmission of data

An algorithm with better compression brings lower transport and storage costs. Assuming that an algorithm which decompresses quickly also executes fewer instructions per bit of decompressed output, faster decompression is also more “ecologically sound”. This is a hand-waving argument for now, but I’m curious enough that I may experiment and confirm the hypothesis with actual data. For the use case of one large, pre-compressed file being made available to a large number of users, LZMA looks like a strong contender to replace bzip2 or gzip when ranked on a benchmark of “energy efficiency”:

  • compression, though slow (therefore assumed to be energy-costly), is a one-time operation
  • transmission is smaller, and is a per-user operation
  • decompression is fast (therefore assumed to be more energy-efficient), and is a per-user operation
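
To make the shape of that argument concrete, here’s a toy cost model (every number in it is an invented placeholder, not a measurement): the one-time compression cost is amortised across every download, while transmission and decompression costs are paid once per user.

    # Back-of-the-envelope model of total energy for serving one file to N users.
    # All energy figures are invented placeholders, purely to show the shape of
    # the calculation: one compression, then N transmissions and N decompressions.
    def total_energy_joules(compress_j, transmit_j_per_byte, decompress_j,
                            compressed_bytes, downloads):
        one_time = compress_j                                  # paid once, by the provider
        per_user = compressed_bytes * transmit_j_per_byte + decompress_j
        return one_time + downloads * per_user

    # Hypothetical: a 100 MB file served 10,000 times; gzip output ~35 MB,
    # LZMA output ~25 MB, LZMA costing far more to compress but less to decompress.
    print("gzip:", total_energy_joules(50, 2e-7, 5, 35e6, 10_000))
    print("lzma:", total_energy_joules(600, 2e-7, 3, 25e6, 10_000))

With these made-up figures LZMA comes out well ahead despite its expensive compression step, because that cost is paid only once while the smaller transfers and cheaper decompressions are multiplied by the number of downloads.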

Research to follow, I think…
