What is complexity of the gnome sort for the average case

Assignment Help Data Structure & Algorithms
Reference no: EM13138841

Part 1

Mining information from sales data is a common task in both traditional and online stores. Past sales information can guide purchasing and enable the company to make recommendations to customers about what they may like to buy.

Part 1 performs two data mining tasks. First,  it determines  the ranking of products according to how many of each product has been sold. Second, it also determines the five other products most commonly bought with each product.  

While these tasks are not particularly complex, performing these tasks on large data sets is computationally expensive and the use of data structures and algorithms that facilitate fast searching and sorting is desirable.

Requirements

Write a Java program called SalesInfoMiner  that reads in a file of products, stored in text format (2 lines per entry). The program must then read in information about past sales transactions and output all the products to a new file, sorted  by product description,  along with their sales rank and the five other products with which they were most commonly purchased.

The program accepts four command line arguments and must not expect any user interaction from the keyboard. The first parameter to the program is the name of the product list to read in; the second parameter is the file containing information about past transactions. The third parameter is the name of the file to which the ordered product list will be written, and the fourth parameter is the name of the file to which any error or warning messages will be output.  

Product File

This file contains data for one or more products
-   Each product occupies two lines
-   Line 1 contains the product id, which is a string of 13 digits
-   Product IDs are unique
-   Line 2 is the description of the product, which is not an empty string
-   You can assume that the file is not empty.
 
Thus, the data for a product has the format shown below: 
 
<ProductID> (A thirteen digit string) 
<Product Description> (A text string) 
 
For example: 
 
1234567890123 
Milk - full cream 600ml 
9832983102938 
Bread - wholemeal loaf 
1299128391283 
Eggs, Extra Large, Barn laid  

Sales Transaction File
 
This file contains data for one or more sale transactions
-   Each sale transaction has 2 + n lines, where n is the number of items in this sale transaction (as specified on line 2 of the sale transaction)
-   Line 1 has the format Receipt number: <receipt number> where <receipt number> is a positive integer
-   Receipt numbers are unique
-   Line 2 has the format  Number of items: <number of items> where <number of items> is a positive integer
-   Each sale item occupies a line, which contains the product id of the item
-   All product IDs in a sale transaction are distinct and must exist in the product file
-   You can assume that the file is not empty
 
Thus, data for a sale has the format shown below:
 
Receipt number: <ReceiptNumber>   (<ReceiptNumber> is an integer) 
Number of items: <NumberItems>   (<NumberItems> is an integer) 
<ProductID>            
... 
 
For example: 
 
Receipt number: 1 
Number of items: 3 
1234567890123 
9832983102938 
1299128391283 
Receipt number: 2 
Number of items: 4 
9832983102938 
1234567890123 
1299128391283 
1234567890123 

Product Output File 
-   For each product, the program determines its sale transaction count, i.e. number of transactions that the product appears in.
-   The program lists the products in an output file, sorted by sale transaction count, in decreasing order.
-   When there is a tie in the sale transaction count, the products are listed in alphabetical order of their descriptions (see example below)
-   For each product, the program also lists the top 5 products that are most frequently bought with this product.
-   The "top 5" list may have less than 5 products. It can also have for than 5 due to ties.
Whenever  there is a tie in the frequency, the products are listed in alphabetical order of their descriptions (see example below)
-   The format is as shown below
 
Product: <Product Description> 
ID: <ProductID> 
Sales Count: <SalesCount> 
This product was purchased most often with: 
1: (<ProductID>) - <Product Description> 
2: (<ProductID>) - <Product Description> 
3: (<ProductID>) - <Product Description> 
4: (<ProductID>) - <Product Description> 
5: (<ProductID>) - <Product Description> 
 
For example: 
 
Product: Milk - full cream 600ml 
ID: 1234567890123 
Sales Count: 82
This product was purchased most often with: 
1: (1299128391283) Eggs, Extra Large, Barn laid [8]
2: (9832983102938) Bread - wholemeal loaf [5]
3: (3289742398473) Apples - Golden Delicious [4]
4: (9348029348934) Chocolate, Dark, 200g [4] 
5: (3024823094839) Noodles, Udon - Fresh 250g [4] 
6:  (0002011106917) Nuts, almonds, dry roasted [4] 
Product: Pork - cured, ham, separable fat, boneless 
ID: 0000140956153
Sales Count: 82 
This product was purchased most often with:
. . .
 
"Sales count" is the number of transactions that a product appears in. The numbers in square brackets are the number of transactions a product appears with another product

Calculating the most  frequent items purchased with a product: With every product, we need to output the 5 other items with which it was most frequently purchased. These other items must be listed in order of frequency. The item listed at position 1 would have been bought more frequently with the item than the item at position 2, and so on. Though we are interested in the "top 5", the list may have more than five items due to ties. In that case, more than five items are to be included and items involved in a tie are to be ordered by product description. If a product has not been bought with 5 other items, then less than 5 items are output.

Error File

In the sales transaction file, a sales transaction "record"  is considered to include all the lines from the line starting with "Receipt number" 
?  Up to, but not included, the next line starting with "Receipt number:" 
?  Or up to the end of the file.
When a particular line does not conform, as expected, to the description given above, 
?  the error should be detected
?  an error message is written to the error file (the fourth command-line argument), which indicate 
o  the type of the error
o  the file in which the error occurs
o  and the line number of the line on which the error occurs
?  no result (i.e. listing of products as described in Section B) is output, i.e. the product output file (the third command-line-argument) is empty.
?  and the program terminates
 
Based on the specification in Section A, Based on the descriptions of the product file (the first command-line-argument) and the transaction file (the second command-line-argument), we can have the following types of data error (in addition to errors such as file
does not exists, etc.)

For product file:
 
  1.  Product ID is invalid - not consisting of 13 digits
  2.  Product description is empty/missing
  3.  Product ID is not unique
 
For sales transaction file:
  1.  Line 1 of a sale transaction does not start with "Receipt number:"
  2.  Receipt number (on line 1) is not a positive integer
  3.  Line 2 of a sale transaction does not start with "Number of items:"
  4.  Number of items (on line 2) is not a positive integer
  5.  Product ID does not consists of 13 digits
  6.  Product ID in a sale transaction has duplicates
  7.  Product ID does not exist in the product file
  8.  There are not enough product ID's listed for a particular transaction (i.e. less than the number listed on line 2)
 
The error messages will have the following format: 
 
ERROR: <Error Name> 
<Error Description> 
<Solution> 
 
For example: 
ERROR: Invalid Product ID 
Product 8763726273678 in receipt 4 is not in the product file 
Product discarded, processing of transaction continued 
The program is terminated
 
You can assume that in the sales transaction file
o  There are no blank lines
o  There are no duplicated receipt numbers 
 
Restriction on the Use of Classes in the Java Collection Framework
 
Except for ArrayList and Linkedlist, you are not permitted to use any other classes in the Java Collection Framework.
 
Testing Your Program
 
You will test your program against some large data sets, which will be posted on LMS. When testing your program against large data set, please  do not  run your program on  the  latcs6 server, use the  simula  server instead. For some large data sets you may need to request, especially when you test your program on Windows, more memory by using the -Xmx option of the java command.

Execution of the Program
 
The program will be run with the following command:
 
java SalesInfoMiner <products> <transactions> <output-file> 
<error-log>

If an incorrect number of arguments is supplied, then the program must output an error message explaining the arguments required before terminating. 

Part 2
 
(a) In terms of the big-oh notation, what is the complexity of the gnome sort for the average case? Justify your answer. The justification can be based on approximate calculations.

  Hint: Initially m (the "marker") is at position 1. After a number of comparisons (and swapping), m is at position 2 for the first time. Then after a number of comparisons, m is at position 3 for the first time. Now, suppose m is at position i, approximately how many comparisons would it take for it to advance to position i + 1 for the first time?

(b) Using the classes GnomeSort and RandomIntegerArray given below, write a program to generate some numerical data to verify your answer for the gnome sort's complexity.

Include your answers in the report. Part 2 is worth 10 marks.   
 
public class GnomeSort
{
  /*  Performs the gnome sort and returns the number of comparisons
  */
  public static long sort(int [] a)
  {
    int m = 1;  // m for 'marker', the place from which we 
              // compare the element with the one on the
              // left to see if we should move left or right
    
    long comparisons = 0;  // to count the number of comparisons
                      // for the sorting
    while(true)
    {
      if (m == a.length)  // m is past the last element, 
      {                // the sorting is done
        break;
      }
      else if (m == 0)  // m is at the beginning of the array,
      {                // move right
        m = 1;
      }
      else if (a[m] >= a[m-1]) // the element at m is greater than or
      {                   // equal to the element on the left, 
        m++;               // move right
        comparisons++;
      }
      else            // the element at m is < the element
      {              // on the left, swap the two elements
        int temp = a[m];    // and move to the left
        a[m] = a[m-1];
        a[m-1] = temp;
        m--;
 
        comparisons++;
      }
    }
 
    return comparisons;
  }


public class RandomIntegerArray
{
  /*  Return an array of size n, of random integers of value
    btween 0 and n, inclusively
  */
  public static int [] getRandomIntegerArray(int n)
  {
    int [] a = new int[n];
    for(int i = 0; i < n; i++)
    {
      a[i] = (int) (Math.random() * (n+1));
    }
    return a;
  }
}

Reference no: EM13138841

Questions Cloud

Council of chalcedon : What was the major question addressed by the Council of Nicea and what were the Nicene doctrine and its alternatives ? What was the major question addressed by the Council of Chalcedon,and what were the Chalcedonian doctrine and its alternatives?
Explain a production plant has a requirement for a counter : A production plant has a requirement for a counter that will count 4,000 times before recycling and starting over. How many flip flpos are required?
Why this not exist for that organization : addressing the extent of that organization's leadership's strategic tunnel vision, "all or nothing" thinking, fleixiblility, and focus on key factors. You will provide an argument for why you think each does or does not exist for that organizatio..
Test assumption that experienced consultant has satisfaction : Test the assumption that the more experienced consultant has higher satisfaction ratings at the 5% significance level. Show all steps.
What is complexity of the gnome sort for the average case : What is the complexity of the gnome sort for the average case? Justify your answer. The justification can be based on approximate calculations.
Define asynchronous and synchronous communication mode : What are the respective advantages and disadvantages of the asynchronous and synchronous communication modes from a reliability perspective?
Spiritual gifts-types and basis : An analysis of Spiritual Gifts, description of each gift and the importance of studying these gifts. What role do spiritual gifts play in Christianity?
Six beginning-of-the-year lease payments : The lease is for 6 years and the machine is estimated to have an unguaranteed residual value of $40,000. If the lessor's interest rate implicit in the lease is 12%, the six beginning-of-the-year lease payments would be ??
Language of title vii : When you read the language of Title VII, it prohibits discrimination based on race. It also bans discrimination based on color. Most people on reading the Act see this and think that Congress went a bit overboard in dealing with racial discrimination..

Reviews

Write a Review

Data Structure & Algorithms Questions & Answers

  Create a shell script to locate executable files

Create a shell script to locate executable documents? The script takes a list of document names from the command line and determines which would be executed had these names been given as commands.

  Question about communication recovery plan

Think about a natural or man made disaster, and explain how a communications network could be recovered from such a disaster.

  What is minimum number of nodes expanded for bfs and dfs

Consider the following graph representing the state space and operators of a navigation problem: What is the minimum number of nodes expanded and the storage needed for BFS and DFS?

  Online vs. face-to-face classes

Communication A significant distinction between online and face-to-face classes lies in the area of communication.

  Primitives-remove ambiguities in algorithm-s representation

Describe how the use of primitives helps remove ambiguities in an algorithm's representation.

  Create the algorithm to read information through file

Create the algorithm which will read through file and compute numbers of married men, single men, married women and single women.

  Explain how to modify knuth-morris-pratt algorithm

Explain how to modify Knuth-Morris-Pratt algorithm to support patterns with these wild cards, and analyze modified algorithm. Your algorithm must find first substring in text which matches the pattern.

  Explain an application level protocol

Create and explain an application level protocol to be used in an automatic teller machine and a bank's centralized computer. Your protocol should permit a user's card and password to be verified,

  Describe algorithm that finds maximum feasible flow in graph

Describe an algorithm that finds a maximum feasible flow in G. Denote by MF(|V|, |E|) the worst-case running time of an ordinary maximum flow algorithm.

  How output of leaky bucket policer can be fed in second

Illustrate how output of the leaky bucket policer can be fed into second leaky bucket policer so that two leaky buckets in series police average rate, peak rate, and burst size.

  Draw flowchart to print average for each student

Draw a flowchart to print the average for each student in a class. Input. Input consists of student records each containing a student's name(STUDENT-NAME), score for first test(TEST), score for second test(TEST2), and score for third test(TEST3)..

  Illustrate how b-tree will expand

Illustrate how tree will expand (after inserting each Part#), and what the final tree would like. (b) Repeat item (a), but use a B-tree of order p = 4 instead of a B+-tree.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd