How text files can be compressed by using a coding

Assignment Help Programming Languages
Reference no: EM131426372

Programming Assignment

This program will give you practice with binary trees and priority queues. In this program you will explore how text files can be compressed by using a coding scheme based on the frequency of characters. We will use a coding scheme called Huffman coding.  The basic idea is to abandon the way that text files are usually stored. Instead of using the usual seven or eight bits per character, Huffman's method uses only a few bits for characters that are used often, more bits for those that are rarely used.

You will solve this problem using a structure known as a priority queue. In a priority queue each value inserted into the queue has a priority that determines when it will be removed.  There are many ways to specify the priorities.  For this program you will construct objects that implement the Comparable interface, with objects that are less being given a higher priority (to be removed first).

The first step is to compute the frequency of each character in the file you wish to encode. This allows you to determine which characters should have the fewest bits, etc. The next step is to build a coding tree from the bottom up according to the frequencies.  An example will help make this clear. To make the example easier, suppose we only want to encode the five letters (a, b, c, d, e) and they have frequencies 3, 3, 1, 1, and 2, respectively.

Part 1: Making a Code

For our purposes, we will encode what are known as bytes (8 bits).  This will allow us to encode standard text files that use ASCII and binary files as well.  From the point of view of your Huffman code, you can think about the individual bytes as simple integers in the range of 0 to 255, each representing the ASCII value of a particular character.  In part 1, you are working with a program called MakeCode. It prompts the user for a file to examine and it computes the frequency of each character in the file.  These counts are passed as an array to your HuffmanTree constructor.

The array passed to your constructor will have exactly 256 values in it, but your program should not depend on this.  Instead, you can use the length field of the array to know how many there are.  In your constructor, you should use a priority queue to build up the tree as described above. First you will add a leaf node for each character that has a frequency greater than 0 (we don't include the other characters in our tree). These should be added in increasing character order (character 0, character 1, and so on).

Then you build the tree.  Initially you have a bunch of leaf nodes.  Your goal is to get a single tree.  While you haven't gotten down to a single tree, you remove two values from the priority queue and combine them to make a new branch node which you put back into the queue, as described above.  You continue combining subtrees until you get down to one tree.  This becomes your Huffman tree.

You are to define a class called HuffmanTree with the following public methods (more methods will be added in part 2 of this assignment):

Method

Description

HuffmanTree(int[] count)

Constructs a Huffman tree using the given array of frequencies where count[i] is the number of occurrences of the character with ASCII value i.

void write(PrintStream output)

Writes the current tree to the given output stream in standard format.

In defining your class, you will also define a node class called HuffmanNode.  You may decide what data fields to include with this class.

Part 2: Encoding and Decoding a File

There are two new main programs that are used in this part of the assignment: Encode.java and Decode.java.  You will also need BitInputStream.java, BitOutputStream.java. Recall that MakeCode.java examined an input file and produced a code file for compressing it.  The program Encode.java uses this code and the original file to produce a binary file that is the compressed version of the original (Here is what you should get for the two text files that are given to you: short.short and hamlet.short).  The program Decode.java uses the code and the binary file from Encode to reconstruct the original file.  Encode is a complete program that doesn't need the Huffman tree.  Decode depends on your HuffmanTree class to do most of the work.

When complete, your Decode program should be able to take one of the short files (short.short or hamlet.short) and the corresponding code file (short.code or hamlet.code) to reconstruct the original file.  (Warning: different operating systems, particularly windows and OS X, have different ways of representing the breaks between lines in a text file.  The sample encoded files were created on a windows machine, and, when decoded properly, will produce a text file with correct windows line breaks.  If you look at the output file or run the program on OS X there may be slight differences - in particular the byte counts might not be the same - even if your program is working properly.  You can check by running your code on a windows box.)

To complete your Decode program, you will have to add two new methods to your HuffmanTree class:

Method

Description

HuffmanTree(Scanner input)

Constructs a Huffman tree from the Scanner.  Assumes the Scanner contains a tree description in standard format.

void decode(BitInputStream input, PrintStream output, int eof)

Reads bits from the given input stream and writes the corresponding characters to the output.  Stops reading when it encounters a character with value equal to eof.  This is a pseudo-eof character, so it should not be written to the output file.  Assumes the input stream contains a legal encoding of characters for this tree's Huffman code.

The first of these methods is an alternative constructor. In part 1 you wrote a constructor that took an array of frequencies and constructed an appropriate tree given those frequencies.  In this part you are given a Scanner that contains the file produced by your write method from part 1.  In other words, the input for this part is the output you produced in part 1.  You are using your own output to recreate the tree.  For this second part, the frequencies are irrelevant because the tree has already been constructed once, but you are using the same node class as before.  You can set all of the frequencies to some standard value like 0 or -1 for this part.

Attachment:- Assignment Files.rar

Reference no: EM131426372

Questions Cloud

Write a solution trail for given problem : Find the probability that a randomly selected baseball has a weight greater than 5.14 ounces. Write a Solution Trail for this problem.
Organizations effectively manage diversity : How does awareness about the layers of diversity help organizations effectively manage diversity?
Tirm focuses on information risks : TIRM focuses on information risks. What other sorts of risks exist that TIRM by itself cannot adequately address? What sort of assumptions are we making, or led to make because of this?
Find probability that candle burns at least seven hours : Find the mean burning time and the probability that the burning time of a randomly selected candle will be within v one standard deviation of the mean.
How text files can be compressed by using a coding : This program will give you practice with binary trees and priority queues. In this program you will explore how text files can be compressed by using a coding scheme based on the frequency of characters
What is the implied probability of default : The YTM on 5-year govt. bond is 5%. The YTM on AAA rated bond is 5.8% and B rated bond is 9.8%. Suppose the bond holder expects same rate of return on all these bonds , what is the implied probability of default?
Probability that both values are between 30 and 40 : Suppose X is a uniform random variable with a = 25 and b = 65.- Suppose two values of X are selected at random. What is the probability that both values are between 30 and 40?
Observe the difference between input and output : ENEE 3517- For the NOR gate, connect input B to 5V, and input A to a 0-5V, 100kHz, square wave. Display both input and output at the oscilloscope, and observe the difference between input and output.
What is the interest accrued on the bond : On Mar 1,2003 ABC Inc is selling a corporate bond with a face value $1000 and 7% coupon paid semi-annually. The next coupon payment after Mar 1,2003 is on June 30,2003.What is the interest accrued on the bond?

Reviews

len1426372

3/14/2017 2:20:46 AM

In the instructions page you can download the starter codes. You will have to turn in all of the java files that are in your project (even those that were given to you). In terms of correctness, your class must provide all of the functionality described above. In terms of style, we will be grading on your use of comments, good variable names, consistent indentation and good coding style to implement these operations. Remember that you will lose points if you declare variables as data fields that can instead be declared as local variables. You should also avoid extraneous cases (e.g., don’t make something into a special case if it doesn’t have to be).

Write a Review

Programming Languages Questions & Answers

  Display the property tax for n property owners

Write a java application that calculates and displays the property tax for N property owners. N should be declared as a constant and it should be equal to the largest digit of your student ID number

  Design a grade average program using raptor

Design a grade average program using raptor that will produce the numerical grade average of test scores input by a user

  Create program to keep track of game collection

You wish to create a program which will keep track of the game collection at home. The program must recognize the platform.

  Write pseudocode for determining grade of student

Write a pseudocode for determining the grade of a student given his/her total score. The program should display a grade of A if the score is above 85.

  Perform a paired t-test to verify your claim

Which type of t-test is most appropriate to investigate your co-investigator's claim? Why is it most appropriate t- test? Please explain your answer in 1-2 lines.

  Program asks for number of shares-whole dollar portion price

Program asks for number of shares held, whole dollar portion of price, and fraction portion. fraction portion is to be input as two integer values, one for numerator and one for the denominator.

  Write the code to display a menu and test it

Write a method to display instructions to the user, Write the code to display a menu and test it, Write the code to display the contents of the data file line by line

  Design-write program to enter score repetition structure

Design and write a program that asks the user to enter five test scores using a repetition structure. The program should display the letter grade for each score and the average test score at the end of the program.

  Reducing the average memory access time

Suppose that increasing the line size to 128 bytes increases the H to 0.97. Does this reduce the average memory access time?

  Write application that allows users to enter student id

Write an application that enables users to enter student ID and three exam scores. Provide a method to compute and return the overall exam average.

  Write a method to store the product of the two arrays

Design an application that includes three arrays of type int. Allow the user to enter values into the first two.Write a method to store the product of the two arrays in the third array

  Develop console application to generate random number

Develop C++ console application which generates the random number which should be guessed by user. Program will indicate to user if their guess is correct, too high or too low.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd