LZ77 and LZ78

lossless data compression algorithms

LZ77 and LZ78 are two different types of lossless data compression algorithms.[1][2] They were both published in papers by Abraham Lempel and Jacob Ziv in 1977[1] and 1978.[2] Both of these types of algorithm, LZ77 and LZ78, are mainly designed to compress data without destroying the contents of the data when decompressed. Like all data compression algorithms, LZ77 and LZ78 both use a process called "data compression", or just "compression". Additionally, LZ77 and LZ78 both have their own processes for decompression (a type of process which reverses compression).

LZW is an improved version of LZ78.

LZ77Edit

LZ78Edit

LZ78 is an algorithm that has two variants of itself, LZ78 compression and LZ78 decompression.

LZ78 compressionEdit

In LZ78 compression, this algorithm does compression by replacing repeated occurrences of data with references to a dictionary. The dictionary used for this algorithm is built based on the input data stream. The dictionary stores tuples that represent a set of characters that were part of the input data stream. Each index of a dictionary contains a tuple and its respective character. A tuple stored in an index of the dictionary contains a certain index of the dictionary and a character. Each dictionary item with its respective tuple will be represented as dictionary[...] = {index, character}. Within this type of tuple, the index refers to a point in the dictionary where there is an occurrence of a duplicated part of the dictionary.

If the algorithm finds a unique character not yet found in the dictionary, then a new dictionary entry is added and the index for that new dictionary entry is set to 0. Otherwise, if there is already a dictionary entry, the algorithm just lists a new dictionary entry with an index equal to an existing dictionary entry's index and the character becomes whatever the following character is. Afterwards, a value representing each recorded tuple is stored into a file. The entire process continues until all characters in the input data stream are read.

Example of using LZ78 compressionEdit

Suppose the string in the input data stream is "abbaaaabbbbccabcbaaa".

Whenever doing LZ78 compression, the dictionary always starts out empty. Any new entries will start at the first index. The "empty" index is actually the 0th index.

  1. The first index in the dictionary is D[1] = {0, a} because it is the first new index. The "0" index refers to the default "empty" index. The "a" refers to the letter "a" that is the first in the string.
  2. The second index in the dictionary is D[2] = {0, b} because it is not the same as the first index.
  3. The third index in the dictionary is D[3] = {2, a}. This is different. There was already a "b" in the dictionary. That was in index 2. Combining index 2's character "b" and the new index 3 makes the string "ba" in the dictionary.
  4. The fourth index in the dictionary is D[4] = {1, a}. There was already an "a" in the dictionary. That was in index 1. "aa" is the string value made from the fourth tuple inside the dictionary. Note that the algorithm currently is starting on the 5th letter on the input data stream because of how many letters were scanned so far.
  5. The fifth index in the dictionary is D[5] = {1, b}. Note that the algorithm currently is starting on the 7th letter on the input data stream because of how many letters were scanned so far.
  6. The sixth index in the dictionary is D[6] = {5, b}. In this case, there is already an "a" in the dictionary. As always, the program checks to see the rest of the dictionary if there is a value in the dictionary equal to the currently checked value. As of this step, there is also already an "ab". Therefore, a new index to the dictionary can be made with an even longer value. This The value of this tuple is equal to "abb" because it combines the value of the dictionary's index 5 with an extra character "b".
  7. The seventh index in the dictionary is D[6] = {2, b}. The dictionary checks to see if there's a letter "b" in the dictionary. As of this step, there is a letter "b". The program will then check the next letter of the input data stream. Is there a "bb" in the dictionary? In this step, no. A new index in the dictionary is made. It is a tuple with the value "bb". This tuple contains both the index "2" and character "b".
  8. ...
  9. The last letter "a" is treated differently. It is still added to the dictionary, but the last index will become D[12] = {1} because an dictionary index with the value "a" already exists.

Again, a dictionary contains a set of indexes. Its indexes each contain a tuple that represents a certain value. After LZ78 compression is done, the dictionary should end up like this:

Index Tuple Value
1 {0, a} a
2 {0, b} b
3 {2, a} ba
4 {1, a} aa
5 {1, b} ab
6 {5, b} abb
7 {2, b} bb
8 {0, c} c
9 {8, a} ca
10 {2, c} bc
11 {3, a} baa
12 {1} a

After each time a certain section of the input data stream has been read, a representation of the currently recorded tuple will be stored into the file. The information stored into the file should end up looking like this:

0a
0b
2a
1a
1b
5b
2b
0c
8a
2c
3a
1

LZ78 decompressionEdit

In LZ78 decompression, the process is reversed. Instead of taking a string of many characters, LZ78 decompression scans a file containing a series of stored values. These stored values represent tuples that were featured in LZ78 compression. Like LZ78 compression, LZ78 decompression uses a dictionary to record values. In LZ78 decompression, it will take information that was stored in a file and then record these values into a dictionary. Once all values are added into the dictionary, the program will refer to the dictionary. From there, the program will convert each line of the file into a certain value. That value is determined from converting each line of the file into a tuple. This tuple is stored into the dictionary and will contain a dictionary index and a letter, just as it does in LZ78 compression.

Example of using LZ78 decompressionEdit

ReferencesEdit

  1. 1.0 1.1 Ziv, Jacob; Lempel, Abraham (May 1977). "A Universal Algorithm for Sequential Data Compression". IEEE Transactions on Information Theory. 23 (3): 337–343. CiteSeerX 10.1.1.118.8921. doi:10.1109/TIT.1977.1055714.
  2. 2.0 2.1 Ziv, Jacob; Lempel, Abraham (September 1978). "Compression of Individual Sequences via Variable-Rate Coding". IEEE Transactions on Information Theory. 24 (5): 530–536. CiteSeerX 10.1.1.14.2892. doi:10.1109/TIT.1978.1055934.

Other websitesEdit