Reference no: EM13712367 
                                                                               
                                       
You will be implementing a relatively simple text-compression scheme. your code should work as follows:
1.  It should run from the command line, using the  command:
java  Compress <FILE_NAME>
where  the <FILE_NAME> argument  is  the name of a  text-file  found  in  the same directory as the  executable program.  (This  name  is  passed  into your  Java program  as  an argument, and will  then be  accessible  from position  0  in  the  array  args  that  is  input  to  the main() method.
2.  It should read through the text-file, word-by-word (using  white-space  to  delineate words, as  usual,  so  "words"  may  contain  punctuation,  etc.).  For  each  word,  it  should  count occurrences  of  any  substring  of  3  or more  characters.  For  instance,  if  it  reads  in  the  word "googoo"  it  should  count  2  occurrences  of  the  substring  "goo",  1  occurrence  of  "oog",  and  1 occurrence each of  "goog",  "oogo",  "ogoo",  "googo",  and  "oogoo".  Finally,  it  should  count one occurrence of 6-letter substring,  "googoo".
This counting should occur over  the entire file, and it should be handle efficiently using a hash-map  structure  to map  substrings  to  the  number  of  times  they  occur. You  can  use  a built-in Java structure  here.
3.  Once the counting is complete, the hash-map will contain a collection of  substrings, each with a corresponding count value (1 or greater).  You should  then order  these substrings:
- Create a new class that can store a substring and its count value. 
- Make  that  class  properly  Comparable  to  other  things  of  its  type.  In  the  compareTo ordering,  each  object  should  be  ordered  by  its  impact  factor, which  is  the  product  of the substring's  length and  the number of  times  it occurs.  For  instance,  if  the substring "the" occurs 100 times, it has an impact factor of (3 × 100 = 300).
Two of your class objects should be ordered so  that  things with higher  impact factor come before things with a  lower impact factor. Thus, if "the" has impact factor 300, and "goo" has impact factor 100, then "the" comes before "goo" in comparison  ordering.
- Once you have completed the class, place each of your substring and count value  pairs into  a priority queue,  so  that  they will be  in order, with highest  impact  factor  at  the front of the queue. Again, you can use a built-in Java structure  here.
4.  Now, you can generate a compressed encoding of  the highest-impact substrings.  For each of  the  first 52  such  substrings  (or al l  of  them,  if your  input doesn't  contain 52 distinct substrings  of   length  3  or  greater),  you  will  encode  each  such  string  with  a  2-letter  coding, consisting  of  the  plus  sign  '+',  followed  by  a  single  character  (in  range  a-z  or  A-Z).
For instance,  if   "the"  is our highest-impact substring,  then  its encoding would be "+a".
Note 01: when generating codes, you should ignore all substrings that either contain,  or are contained by, one you have already encoded. For instance, if you have already encoded "ther",  then  you  should  ignore  any  future  substrings  like  "there",  which  contains  it,  or "the", which is contained by it. You should still attempt to generate 52 distinct codings,  if that is possible after discarding overlapping substrings.
Note 02: for this to work, it assumes that the text-file does not actually include the  special marker character, '+'; if this is not so, another special character would have to be  used.
5.  Your code should now read through the file again, writing out a compressed  version to  a  second  file. To  do  compression,  read  in  each word  again,  and  replace  each  substring  in the word with its encoding, if such an encoding exists, and write the result.
For instance, if we read in the word "teethes", and we have an encoding of "+a" for "the", then we should re-write the word as "tee+as".
Note  01:  if  a word  contains multiple  possible  substrings,  then  you  should  replace  them all. When there are multiple choices, the replacement should happen by order of length of substrings, so that we get maximum compression of each word. For instance, if we have  an encoding of  "+a"  for "the", "+b"  for "theo", and  "+c"  for "log",  then  the word "theology" would first be encoded as "+blogy", since "theo" is the longest substring; it would then  be encoded finally as "+b+cy", and written to the output file. If a word contains multiple possible substrings of  the same  length, remove  them  in  the order given by Java String sorting (this means that every word has a uniquely defined  coding).
Note 02: when writing to the output file, make sure that line-breaks and other  white-space are preserved from the original file.
Note 03: the output  file  should have a name  related, but not  identical,  to  that of  the original input. In particular,  if  the  input  is named  "FILE.txt",  then  the  compressed output  file  should be named  "FILE_comp.txt"  (where  "FILE"  can  be  any  file-name).
6.  The code above will somewhat compress a text-file, but we will not be able to de-compress it, unless we know the encoding used. Modify your code so that when it creates the output file, it first writes information about the compression into the first part  of the file (in a format of your own choosing), and then follows it with the compressed text. Then, write a second program, which can also be called from the command line:
java  Decompress <FILE_NAME>
When  called,  this  should  open  the  named  file,  if  possible,  and  (assuming  it  is  in  proper format), de-compress it, producing a new version of the file that is identical to the  original input to the compression program.