Types Of Data Compression Computer Science Essay

Data compression has come of age in the last 20 years. Both the quantity and the quality of the body of books in this field provide ample proof of this. There are many known methods for data compression. They are based on different ideas and are suitable for different types of data, and they produce different results; nevertheless, they are all based on the same principle: they compress data by removing redundancy from the original data in the source file. This essay discusses the different types of data compression, the benefits of data compression and the techniques of data compression.


Data compression is important in this age because of the amount of data that is transferred within a given network. It makes the copying of data relatively easy [1]. This section explains and compares lossy and lossless compression techniques.


Lossless data compression makes use of data compression algorithms that allow the exact original data to be reconstructed from the compressed data. This can be contrasted with lossy data compression, which does not permit the exact original data to be reconstructed from the compressed data. Lossless data compression is used in many applications [2].

Lossless compression is employed when it is vital that the original and the decompressed data be identical, or when no assumption can be made about whether a certain deviation is uncritical.
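As a minimal illustration of this identity requirement, the following sketch uses Python's standard zlib module purely as a convenient lossless codec (the references above do not prescribe any particular one); a lossless round trip recovers the input byte for byte:

```python
import zlib

original = b"AAAABBBCCD" * 100          # repetitive data compresses well
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)  # exact reconstruction, no deviation

assert restored == original             # lossless: identical to the source
assert len(compressed) < len(original)  # and smaller, thanks to redundancy
```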

Most lossless compression programs implement two kinds of algorithms in sequence: one that creates a statistical model for the input data, and another that maps the input data to bit strings using this model, so that "probable" (e.g. frequently encountered) data produces shorter output than "improbable" data. Often, only the former algorithm is named, while the second is implied (through common use, standardization, etc.) or unspecified [3].


A lossy data compression technique is one where compressing data and then decompressing it retrieves data that may differ from the original, but is "close enough" to be useful in some way.

There are two basic lossy compression techniques.

First is lossy transform codecs, where samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded [4].
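A rough sketch of just the quantization stage may make the "lossy" part concrete (the function names and step size here are illustrative, not taken from any particular codec): each value is snapped to the nearest multiple of a step size, producing small integers that are cheap to entropy-code, at the cost of an irrecoverable rounding error.

```python
def quantize(samples, step):
    """Map each sample to the nearest multiple of `step` (the lossy step)."""
    return [round(s / step) for s in samples]

def dequantize(codes, step):
    """Reverse the scaling; the rounding error cannot be undone."""
    return [c * step for c in codes]

samples = [0.13, 0.48, 0.52, 0.97]
codes = quantize(samples, 0.25)     # small integers: [1, 2, 2, 4]
restored = dequantize(codes, 0.25)  # close to, but not equal to, the input
```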

Second is lossy predictive codecs, where past and/or succeeding decoded data is used to predict the current sound sample or image frame.
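The simplest predictive scheme predicts each sample as the previous one and stores only the prediction error (delta coding); the following is a minimal illustrative sketch of that idea, not any specific codec (as written it is actually lossless, since the errors are kept exactly — a lossy codec would quantize them):

```python
def delta_encode(samples):
    """Predict each sample as the previous one; store only the error."""
    prev, errors = 0, []
    for s in samples:
        errors.append(s - prev)  # small residuals are easy to entropy-code
        prev = s
    return errors

def delta_decode(errors):
    """Rebuild each sample by adding the error to the running prediction."""
    prev, samples = 0, []
    for e in errors:
        prev += e
        samples.append(prev)
    return samples

signal = [10, 11, 12, 12, 13]
errors = delta_encode(signal)  # [10, 1, 1, 0, 1]
```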

In some systems the two techniques are combined, with transform codecs being used to compress the error signals produced by the predictive stage.

The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any known lossless method, while still meeting the requirements of the application [4].

Lossless compression schemes are reversible, so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.

In practice, lossy data compression will also reach a point where compressing again does not work, although an extremely lossy algorithm, which for example always removes the last byte of a file, will always compress a file further, until it is empty [5].


Lossless and lossy data compression are the two methods that are used to compress data. Each technique has its specific uses. A comparison between the two techniques can be summarised as follows [4-5]:

The lossless technique preserves the source exactly during compression, while in the lossy technique the output is expected to differ from the source, though it remains very close to it.

The lossless technique is a reversible process, which means that the original data can be reconstructed. The lossy technique, however, is irreversible because some data is lost during compression.

The lossless technique produces a larger compressed file compared with the lossy technique.

The lossy technique is mostly used for images and sound.


Data compression is defined as storing data in a way that requires less space than usual. Generally, it saves space by reducing the size of the data [6]. This section explains the Huffman coding and Lempel-Ziv-Welch (LZW) compression techniques.


Huffman coding is an entropy encoding method used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence of each possible value of the source symbol. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes" [4].

Huffman coding uses a specific method for choosing the representation of each symbol, resulting in a prefix code (sometimes called a "prefix-free code": the bit string representing one particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols [5].

The technique works by building a binary tree of nodes. These can be stored in a regular array, the size of which depends on the number of symbols, n. A node can be either a leaf node or an internal node. Initially, all nodes are leaf nodes, which contain the symbol itself, the weight (frequency of appearance) of the symbol and, optionally, a link to a parent node, which makes it easy to read off the code (in reverse) starting from a leaf node. Internal nodes contain a weight, links to two child nodes and an optional link to a parent node.

The process essentially starts with the leaf nodes containing the probabilities of the symbols they represent; a new node whose children are the two nodes with the smallest probabilities is then created, such that the new node's probability is equal to the sum of the children's probabilities. With the two nodes merged into one node (and thus no longer considered), and with the new node now under consideration, the procedure is repeated until only one node remains: the Huffman tree [4].

The simplest construction algorithm uses a priority queue, where the node with the lowest probability is given the highest priority [5]:

1. Create a leaf node for each symbol and add it to the priority queue.

2. While there is more than one node in the queue:

Remove the two nodes of highest priority (lowest probability) from the queue.

Create a new internal node with these two nodes as children and with a probability equal to the sum of the two nodes' probabilities.

Add the new node to the queue.

3. The remaining node is the root node, and the tree is complete [7].
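The three steps above can be sketched in Python using its standard priority queue (heapq). This is an illustrative implementation only; each heap entry carries a weight followed by the (symbol, code-so-far) pairs beneath that node, and merging two nodes prepends a 0 or 1 to their codes:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table following steps 1-3 above."""
    # Step 1: one leaf entry per symbol, weighted by its frequency.
    heap = [[weight, [sym, ""]] for sym, weight in Counter(data).items()]
    heapq.heapify(heap)
    # Step 2: repeatedly merge the two lowest-weight nodes.
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]   # left branch prepends a 0 bit
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]   # right branch prepends a 1 bit
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    # Step 3: the single remaining entry holds every (symbol, code) pair.
    return {sym: code for sym, code in heap[0][1:]}

codes = huffman_codes("aaaabbc")  # 'a' is most frequent, so it gets 1 bit
```

Note how the most frequent symbol receives the shortest code, and no code is a prefix of any other, exactly as the prefix-code property requires.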

Figure (1).


Lempel-Ziv-Welch (LZW) is a data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as a development of the LZ78 algorithm released by Lempel and Ziv in 1978. The algorithm is designed to be fast to implement, but it is not usually optimal because it performs only limited analysis of the data.

LZW can also be called a substitutional or dictionary-based encoding algorithm. The algorithm builds a data dictionary (also called a translation table or string table) of data occurring in an uncompressed data stream. Patterns of data (substrings) are identified in the data stream and are matched to entries in the dictionary. If a substring is not present in the dictionary, a code phrase is created based on the data content of the substring, and it is stored in the dictionary. The phrase is then written to the compressed output stream [8].

When a reoccurrence of a substring is found in the data, the phrase for the substring already stored in the dictionary is written to the output. Because the phrase value has a physical size that is smaller than the substring it represents, data compression is achieved.

Decoding LZW data is the reverse of encoding. The decompressor reads a code from the stream and adds it to the data dictionary if it is not already there. The code is then translated into the string it represents, which is written to the uncompressed output stream [8].
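A byte-oriented sketch of both directions follows; for simplicity it emits codes as plain integers rather than the packed variable-width bit strings that real formats such as GIF and TIFF use:

```python
def lzw_encode(data):
    """Classic LZW: grow a string table while emitting dictionary codes."""
    table = {bytes([i]): i for i in range(256)}  # all single bytes pre-defined
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            out.append(table[w])        # emit the longest known phrase
            table[wc] = len(table)      # new phrase gets the next free code
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes):
    """Rebuild the same table on the fly; no dictionary is transmitted."""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                           # code referenced before it was defined
            entry = w + w[:1]
        out.append(entry)
        table[len(table)] = w + entry[:1]  # mirror the encoder's new phrase
        w = entry
    return b"".join(out)

data = b"TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decode(lzw_encode(data)) == data
```

The one special case in the decoder handles a code that the encoder emitted in the very step that defined it; this is the only situation where the decoder's copy of the dictionary lags one entry behind.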

LZW goes beyond most dictionary-based compressors in that it is not necessary to store the dictionary in order to decode the LZW data stream. This can save quite a bit of space when storing the LZW-encoded data [9].

TIFF, among other file formats, applies the same method to graphic files. In TIFF, the pixel data is packed into bytes before being presented to LZW, so an LZW source byte might be a pixel value, part of a pixel value, or several pixel values, depending on the image's bit depth and number of colour channels.

GIF requires each LZW input symbol to be a pixel value. Because GIF allows 1- to 8-bit deep images, there are between 2 and 256 LZW input symbols in GIF, and the LZW dictionary is initialized accordingly. It does not matter how the pixels might have been packed into storage; LZW will deal with them as a sequence of symbols [9].

The TIFF approach does not work very well for odd-size pixels, because packing the pixels into bytes creates byte sequences that do not match the original pixel sequences, and any patterns in the pixels are obscured. If pixel boundaries and byte boundaries agree (e.g., two 4-bit pixels per byte, or one 16-bit pixel every two bytes), then TIFF's method works well [10].

The GIF approach works better for odd-size bit depths, but it is difficult to extend it to more than eight bits per pixel because the LZW dictionary must become very large to achieve useful compression on large input alphabets.

If variable-width codes are implemented, the encoder and decoder must be careful to change the width at the same points in the encoded data, or they will disagree about where the boundaries between individual codes fall in the stream [11].


In conclusion, because one cannot hope to compress everything, all compression algorithms must assume that there is some bias in the input data, so that some inputs are more likely than others - i.e., that there will always be some unbalanced probability distribution over the possible messages. Most compression algorithms base this "bias" on the structure of the messages - i.e., an assumption that repeated characters are more likely than random characters, or that large white areas occur in "typical" images. Compression is therefore about probability.
