Write a function that load the contents of a given text file

Assignment Help Computer Engineering
Reference no: EM131917138

Programming Project Assignment: Data Compression using Huffman Codes

Introduction

This assignment focuses on building a lossless data compression algorithm using Huffman Codes. In 1952, David Huffman developed a way to find a set of optimal prefix codes for a given set of input symbols. Essentially, Huffman Codes are variable-length bit strings which uniquely represent a set of input symbols. The main benefit offered by this scheme is that the more common the input symbol, the fewer the number of bits that are required to represent it. Clearly, this is advantageous when compressing data.

Compressing ASCII Text using Huffman Codes

The algorithm always begins with the source set of symbols to be compressed. For this assignment, we will focus only on using ASCII text but the algorithm is applicable to any set of symbols. Below is an example source string to be compressed:

to be or not to be

The first step is find the frequency of each letter in the source text. These symbol frequencies are a fundamental aspect of the algorithm. Table 1 gives a summary of these frequencies for this example. One approach to compressing this data is to use a block encoding representation that uses the minimum number of bits that permit all the letters to be encoded, i.e. 3 bits per letter for this example. Since there are 18 characters in this string, 54bits (3 x 18) would be required to represent this sentence using block encoding. Using the Huffman Codes shown in Table 2, the same sentence can be represented with just 47bits - a 13% reduction in size.

Letter

Freq

Space

5

o

4

t

3

b

2

e

2

n

1

r

1

Table 1 - Letter Frequencies

The second step of the algorithm is to construct a binary tree using the data in Table 1. The pseudo-code below describes a way to build a binary tree. It makes use of a priority queue to construct the tree, i.e. in this kind of data structure items with higher priority are placed at the front of the queue. In this case the letter frequency is used as the priority indicator with lower letter frequencies having higher priority.

CREATE leaf node for each letter

ADD all leaf nodes to priority queue

LOOP WHILE more than one node in priority queue REMOVE two nodes from priority queue*

CREATE new node with two removed nodes as children

SET new node's priority to sum of two child node priorities ADD new node to priority queue

END LOOP

REMOVE remaining node from priority queue SET node as root

*when more than two items have the same priority, any node pair is permitted

The third step of the algorithm is to traverse the constructed binary tree to build Huffman Codes for each letter. To do this all links within the tree must be labelled 0 or 1. By convention, each left child link is a 0 and each right child link is a 1. Figure 1 gives an example binary tree for the example string.

1887_Figure-1–Example-Binary-Tree.jpg
Figure 1 - Example Binary Tree

Each Huffman Code is found by traversing the tree from the root node to the letter in question. Table 2 shows the result of traversing the tree for each letter in the example string.

Letter

Huffman Code

Space

00

o

11

t

011

b

010

e

100

n

1010

r

1011

Table 2 - Huffman Codes

Assignment Tasks

There are 8 tasks to complete for this assignment worth a total of 100 marks. Each task needs to be completed before the next one can be attempted (except the first task which can be completed at any time). You have been provided with a sign-off sheet to be completed during tutorial sessions by your tutor. Your tutor may give you partial marks for any incomplete tasks and so please try to complete as many as possible. It is therefore essential that you attend the remaining tutorial sessions. After you have completed a task you must ask a tutor to sign it off on your sheet.

NOTE: You should take extra care to keep your sign-off sheet safe because you will need to provide a scan or photograph of it when you submit your assignment to Blackboard.

Task 1

Huffman Coding is one example of a lossless compression algorithm. Write a short 500-750 word academic report describing the similarities and differences between lossless and lossy compression algorithms and under what circumstances you might favour one kind of compression algorithm over the other. Your report should include brief descriptions of one example algorithm that is used for each kind of compression (excluding Huffman Coding). You should try to use diagrams and illustrations where applicable to help your explanations of your chosen compression algorithms. Marks will be given for good UWE Harvard referencing, good grammar and spelling, and a good report structure.

Task 2

Write a function that loads the contents of a given text file into memory. Once loaded into memory you should output the text to the Console Window. You are expected to use the ‘ToCompress.txt' file for this part of the assignment (which is provided for you on Blackboard).

Task 3

The following code represents a letter-frequency pair structure:

struct LetterFrequencyPair
{
char character; int frequency;
};

Using this structure and the lecture slides to help you, develop a Linked List data structure that can be used to store a list of letters-frequency pairs. You should use the following test data to demonstrate to your tutor that you can store letter-frequency pairs in your Linked List:

Character

Frequency

A

2

a

3

{a space character}

5

;

1

1

2

Task 4

For this task, you need to write a function to find the frequency of each letter in the text supplied in the ‘ToCompress.txt' file. Remember this is a lossless algorithm and so your implementation must be case- sensitive and include all punctuation. You will need to make use of the code you produced for Task 2 and Task 3 to complete this task, i.e. a Linked List of letter-frequency pairs. Finally, you should print the letter- frequency pairs you have discovered to the Console Window.

Task 5

The following code represents a binary tree node that can be used to build a Binary Tree:

struct BinaryTreeNode
{
struct LetterFrequencyPair* letter_frequency_pair; struct BinaryTreeNode* leftChild;
struct BinaryTreeNode* rightChild;
};

Write a function that takes two BinaryTreeNode parameters, merges them according to the Huffman Coding Algorithm (i.e. it creates a new node, points the left and right child to the supplied BinaryTreeNode parameters, sets the frequency of the new node to be the sum of the parameter's frequencies and returns the new node). You should demonstrate to your tutor that this function performs correctly by using the following test data:

Character

Frequency

a

2

b

3

NOTE: you will need to initialise the new node objects leftChild and rightChild equal to NULL to indicate that they do not point to anything

Task 6

Write a function that takes a Linked List of BinaryTreeNode objects as a parameter and returns the BinaryTreeNode with the lowest frequency. This BinaryTreeNode should also be removed from the Linked List. You should demonstrate to your tutor that this function performs correctly by using the following test data:

Character

Frequency

x

4

y

2

z

5

w

3

NOTE: you will need to create a Linked List of BinaryTreeNode objects using the four letter-frequency pairs given above

Task 7

Write a function that builds a Huffman Tree making full use of the code produced in the previous 5 tasks (Tasks 2-6). You should make use of the algorithm given in the explanation above when building the Huffman Tree for the supplied ‘ToCompress.txt' file. You should demonstrate to your tutor that your program has correctly built a Huffman Tree for this file.

Task 8

The final task is to create a recursive function that prints out the Huffman Codes using the Huffman Tree created in Task 7. Your function needs to recursively traverse the Huffman Tree whilst building the Huffman Codes as it goes. All Huffman Codes should be output to the Console Window together with the letter they represent, e.g. A: 001

NOTE: you will need to use Dynamic Memory Allocation for your Linked Lists and Huffman Tree. To avoid Memory Leak, you must ensure that all allocated memory is deallocated before the program exits. Extra credit will be given if this is correctly done, i.e. there should be no examples of Memory Leak.

Reference no: EM131917138

Questions Cloud

What is the change in temperature of the solution : The temperature of the resulting solution increases from 22.5 degrees celcius to 32.3 degrees celcius. What is the change in temperature of the solution?
What has become a key competitive : Internal innovation and dialog had always been a big part of Coloplast's way of doing business. This has over time resulted in a number of improvements.
Initiating precipitation of chloride : If a precipitate will not form, what chloride ion concentration will cause a precipitate of silver chloride to form?
What would the minimum penalty cost have to be : Suppose that the company wants to crash the project a total of 4 days. Based on the information provided in the table and in part c) and the results.
Write a function that load the contents of a given text file : Write a function that loads the contents of a given text file into memory. Once loaded into memory you should output the text to the Console Window.
Trade credit-nominal cost and effective cost : The Thompson Corporation projects an increase in sales from $1 million to $3 million, What is the effective, or equivalent, annual cost of the trade credit?
The income tax expense and the income tax currently payable : what are the income tax expense and the income tax currently payable if Abbey and Benjamin file a consolidated tax return as an affiliated group?
What is the concentration of f : What is the concentration of F- in the solution when the reaction is complete?
What is the best use for these common-size : What is the best use for these common-size? when would you use them?

Reviews

Write a Review

Computer Engineering Questions & Answers

  Mathematics in computing

Binary search tree, and postorder and preorder traversal Determine the shortest path in Graph

  Ict governance

ICT is defined as the term of Information and communication technologies, it is diverse set of technical tools and resources used by the government agencies to communicate and produce, circulate, store, and manage all information.

  Implementation of memory management

Assignment covers the following eight topics and explore the implementation of memory management, processes and threads.

  Realize business and organizational data storage

Realize business and organizational data storage and fast access times are much more important than they have ever been. Compare and contrast magnetic tapes, magnetic disks, optical discs

  What is the protocol overhead

What are the advantages of using a compiled language over an interpreted one? Under what circumstances would you select to use an interpreted language?

  Implementation of memory management

Paper describes about memory management. How memory is used in executing programs and its critical support for applications.

  Define open and closed loop control systems

Define open and closed loop cotrol systems.Explain difference between time varying and time invariant control system wth suitable example.

  Prepare a proposal to deploy windows server

Prepare a proposal to deploy Windows Server onto an existing network based on the provided scenario.

  Security policy document project

Analyze security requirements and develop a security policy

  Write a procedure that produces independent stack objects

Write a procedure (make-stack) that produces independent stack objects, using a message-passing style, e.g.

  Define a suitable functional unit

Define a suitable functional unit for a comparative study between two different types of paint.

  Calculate yield to maturity and bond prices

Calculate yield to maturity (YTM) and bond prices

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd