Huffman coding based compression, Advanced Statistics

Huffman coding is used to compress a data file in which the data is represented as a sequence of characters. Huffman's greedy algorithm uses a table giving how often each character occurs, and from this table it builds an optimal way of representing each character as a binary string. We call this binary string the codeword for that character. A Huffman code is a prefix code, i.e., no codeword is a prefix of any other codeword. The advantage of a prefix code is that it makes decoding easier, because no delimiter is needed between two successive codewords. Given the frequency of each character, we can devise a greedy algorithm that finds the optimal Huffman codeword for each character. For details of the greedy algorithm,
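The greedy construction described above can be sketched as follows: repeatedly merge the two least frequent subtrees until one tree remains, then read each codeword off the path from the root. This is a minimal illustration, not the assignment's reference solution; the function name build_codes, the tuple-based tree, and the choice of "0" as the codeword for a single-symbol input are assumptions made for the example.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Return a {character: codeword} table for the characters in text."""
    freq = Counter(text)
    if not freq:
        return {}

    # Heap entries are (frequency, tie_breaker, tree). A tree is either a
    # single character (leaf) or a (left, right) tuple (internal node).
    # The unique tie_breaker keeps Python from ever comparing two trees.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)

    # Greedy step: merge the two least frequent trees until one is left.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left branch appends a 0
            walk(node[1], prefix + "1")  # right branch appends a 1
        else:
            # Assumption: a lone symbol still gets a one-bit codeword.
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


if __name__ == "__main__":
    for ch, code in sorted(build_codes("aaaabbc").items()):
        print(repr(ch), code)
```

Because every character sits at a leaf of the tree, no codeword can be a prefix of another, which is the prefix property the paragraph relies on.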

In this assignment, we will build a compression library that compresses text files using the Huffman coding scheme. The library will have two programs, compress and decompress: compress accepts a text file and produces a compressed representation of that file; decompress accepts a file that was produced by the compress program and recovers the original file.
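A rough sketch of how the compress and decompress pair might fit together is given below. It works on in-memory strings rather than files, and it passes the frequency table alongside the packed bytes so that decompress can rebuild the same code. The source does not specify a file format, so this (freq, payload) layout, the zero-bit padding, and the helper name huffman_codes are all assumptions made for illustration.

```python
import heapq
from collections import Counter


def huffman_codes(freq):
    """Map each symbol to a bit string, given a {symbol: count} table."""
    # Sorting makes the result depend only on the table's contents, so
    # compress and decompress derive exactly the same code.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # lone symbol gets one bit

    if heap:
        walk(heap[0][2], "")
    return codes


def compress(text):
    """Return (frequency table, packed bytes) for text."""
    freq = dict(Counter(text))
    codes = huffman_codes(freq)
    bits = "".join(codes[ch] for ch in text)
    bits += "0" * ((8 - len(bits) % 8) % 8)  # pad to a whole byte
    payload = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return freq, payload


def decompress(freq, payload):
    """Recover the original text from compress()'s output."""
    decode = {code: sym for sym, code in huffman_codes(freq).items()}
    total = sum(freq.values())  # number of symbols to emit
    bits = "".join(f"{byte:08b}" for byte in payload)
    out, current = [], ""
    for bit in bits:
        if len(out) == total:  # remaining bits are padding
            break
        current += bit
        if current in decode:  # prefix property: first match is correct
            out.append(decode[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    table, data = compress("abracadabra")
    print(len(data), decompress(table, data))
```

A real compress program would also need to serialize the frequency table (or the tree) into the output file; that step is left out here because the source does not describe it.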


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd