LZ77 and LZ78

lossless data compression algorithms

LZ77 and LZ78 are two lossless data compression algorithms. Abraham Lempel and Jacob Ziv published them in papers, in 1977[1] and 1978.[2] They are also known as LZ1 and LZ2.

Today, there are many variations of these algorithms. Examples of such variations are LZW, LZSS, or LZMA. Lempel-Ziv-Welch (LZW) is a variant that was published in 1983, and that was used for the GIF format used by Compuserve, or the the DEFLATE algorithm used in PNG and ZIP.

They are both theoretically dictionary coders. LZ77 uses a sliding window during compression. Later, this was shown to be equivalent to the explicit dictionary constructed by LZ78. There are limitations though: The two methods are only equivalent when all of the data is to be decompressed.

Since LZ77 encodes and decodes from a sliding window over previously seen characters, decompression must always start at the beginning of the input. Conceptually, LZ78 decompression could allow random access to the input if the entire dictionary were known in advance. However, in practice the dictionary is created during encoding and decoding by creating a new phrase whenever a token is output.[3]

The algorithms were named an IEEE Milestone in 2004.[4] In 2021 Jacob Ziv was awarded the IEEE Medal of Honor for his involvement in their development.[5]

Efficiency

change

In the second of the two papers that introduced these algorithms they are analyzed as encoders defined by finite-state machines. A measure analogous to information entropy is developed for individual sequences (as opposed to probabilistic ensembles). This measure gives a bound on the data compression ratio that can be achieved. It is then shown that there exists finite lossless encoders for every sequence that achieve this bound as the length of the sequence grows to infinity. In this sense an algorithm based on this scheme produces asymptotically optimal encodings. This result can be proven more directly, as for example in notes by Peter Shor.[6]

References

change
  1. Ziv, Jacob; Lempel, Abraham (May 1977). "A Universal Algorithm for Sequential Data Compression". IEEE Transactions on Information Theory. 23 (3): 337–343. CiteSeerX 10.1.1.118.8921. doi:10.1109/TIT.1977.1055714. S2CID 9267632.
  2. Ziv, Jacob; Lempel, Abraham (September 1978). "Compression of Individual Sequences via Variable-Rate Coding". IEEE Transactions on Information Theory. 24 (5): 530–536. CiteSeerX 10.1.1.14.2892. doi:10.1109/TIT.1978.1055934.
  3. "Lossless Data Compression: LZ78". cs.stanford.edu.
  4. "Milestones:Lempel-Ziv Data Compression Algorithm, 1977". IEEE Global History Network. Institute of Electrical and Electronics Engineers. 2014-07-22. Retrieved 2014-11-09.
  5. Joanna, Goodrich. "IEEE Medal of Honor Goes to Data Compression Pioneer Jacob Ziv". IEEE Spectrum: Technology, Engineering, and Science News. Retrieved 2021-01-18.
  6. Peter Shor (2005-10-14). "Lempel-Ziv notes" (PDF). Archived from the original (PDF) on 2021-05-28. Retrieved 2014-11-09.

Other websites

change