Data compression

From Citizendium

Data compression is the transformation of (digital) data so that it can be represented by fewer characters than in its original form. In computer science, data compression is mainly used to reduce the storage needed to hold information or to shorten the time needed to send it between two machines. There are two types of compression: lossless and lossy. Data compressed with a lossless technique can be restored completely, whereas data compressed with a lossy technique loses some information that cannot be recovered.
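
The difference between the two types can be illustrated with a minimal Python sketch. The byte string and the sample values are invented for illustration; the lossless part uses the standard `zlib` module, and the lossy part is a toy quantization step that is not taken from any particular codec.

```python
import zlib

# Hypothetical input: a highly repetitive byte string.
original = b"yyyyyyyyyybbbbbbbbbbrrrrrrrrrr" * 4

# Lossless: decompressing returns exactly the original bytes.
packed = zlib.compress(original)
unpacked = zlib.decompress(packed)
print(unpacked == original)

# Lossy (toy example): keep only the 4 high bits of each 8-bit sample.
samples = [17, 18, 130, 255]
quantized = [s >> 4 for s in samples]    # 4 bits per sample instead of 8
restored = [q << 4 for q in quantized]   # low bits are gone for good
print(restored)                          # [16, 16, 128, 240]
```

The restored samples are close to the originals but not identical, which is the trade-off a lossy technique accepts in exchange for a smaller representation.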

Consider the following simple coding scheme for images:

y = yellow, b = black, r = red

Compression example.png

With this coding scheme the above image can be encoded by the following string:


Each pixel in the image is represented by the character corresponding to the color of the pixel. The order of the pixels is assumed to be from upper left to lower right.

This coding scheme can be modified as follows to compress the data: each character representing a color is mentioned only once and is followed by a digit. The digit gives the number of consecutive pixels of that color. With this new coding scheme the image can be represented by the following string:


Using the first scheme, the image is represented by 45 characters. The second scheme uses only 22 characters to encode the same image. The compressed form is thus about 49% of the original size (22/45), a reduction of roughly 50%.
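
The character-plus-digit scheme described above can be sketched in Python as follows. The function names `rle_encode` and `rle_decode` and the nine-pixel sample are hypothetical and do not reproduce the article's image. Because a single digit can count at most 9 pixels, this sketch assumes that longer runs are split into several pairs; the article does not say how such runs should be handled.

```python
from itertools import groupby


def rle_encode(pixels: str) -> str:
    """Encode each run of equal characters as the character plus one digit."""
    out = []
    for color, run in groupby(pixels):
        n = len(list(run))
        # Assumption: a run longer than 9 is split so each count fits one digit.
        while n > 9:
            out.append(f"{color}9")
            n -= 9
        out.append(f"{color}{n}")
    return "".join(out)


def rle_decode(encoded: str) -> str:
    """Expand (character, digit) pairs back into the pixel string."""
    return "".join(
        encoded[i] * int(encoded[i + 1]) for i in range(0, len(encoded), 2)
    )


pixels = "yyyybbbrr"            # hypothetical 9-pixel row
encoded = rle_encode(pixels)
print(encoded)                  # y4b3r2
print(rle_decode(encoded) == pixels)
```

Since decoding restores the input exactly, this scheme is a lossless technique.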

This simple compression scheme is only effective for images that have large connected areas of the same color. If the color changes from pixel to pixel, the scheme does not compress the data but instead doubles the number of characters needed, because every pixel is then encoded as a character followed by the digit 1.
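
The best and worst cases can be compared with a short Python sketch. The helper `encoded_length` is a hypothetical name; it only counts the characters the character-plus-digit scheme would need, under the assumption that runs longer than 9 pixels are split so that each count fits in one digit.

```python
from itertools import groupby
from math import ceil


def encoded_length(pixels: str) -> int:
    """Characters needed when each run costs one character plus one digit."""
    # Assumption: a run of n pixels needs ceil(n / 9) pairs of 2 characters.
    return sum(2 * ceil(len(list(run)) / 9) for _, run in groupby(pixels))


uniform = "y" * 9            # one large single-color area
alternating = "ybybybyby"    # color changes at every pixel

print(len(uniform), encoded_length(uniform))          # 9 2
print(len(alternating), encoded_length(alternating))  # 9 18
```

The uniform row shrinks from 9 characters to 2, while the alternating row grows from 9 to 18.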