Reference no: EM13375400

In this assignment, you will implement the compact representation of the compressed suffix trie ADT for DNA analyses.

A template of the compressed suffix trie class is shown as follows:

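```java
public class CompressedSuffixTrie
{
    /** You need to define your data structures for the compressed trie */

    /** Constructor: creates a compressed suffix trie from the DNA sequence stored in file f */
    public CompressedSuffixTrie( String f )
    { }

    /** Method for finding the first occurrence of a pattern s in the DNA sequence.
        Returns the starting index of the first occurrence, or -1 if s does not occur.
        Non-static, because it must search the trie built by the constructor. */
    public int findString( String s )
    {
        return -1; // placeholder so the template compiles
    }

    /** Method for computing the similarity of the two DNA sequences stored in the
        text files f1 and f2, based on the length of a longest common subsequence */
    public static float similarityAnalyser( String f1, String f2 )
    {
        return 0f; // placeholder so the template compiles
    }
}
```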

The data structures for the compressed suffix trie are not given in the above template. You need to define them yourself. You may introduce any helper methods to facilitate the implementation of these two methods.

The constructor creates a compact representation of the compressed suffix trie from an input text file f that stores a DNA sequence. Every character of the DNA sequence is one of A, C, G and T.

The findString(s) method has a single parameter, a pattern s. If s appears in the DNA sequence, findString(s) returns the starting index of the first occurrence of s in the DNA sequence; otherwise, it returns -1. For example, if the DNA sequence is AAACAACTTCGTAAGTATA, then findString("CAACT") returns 3 and findString("GAAG") returns -1. Note that the index of the first character of the DNA sequence is 0.
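The following is a minimal sketch of one possible approach. It is not the required solution. The sketch builds the compressed suffix trie naively by inserting every suffix in order of increasing start position, which costs O(n²) in the worst case for a sequence of length n. Ukkonen's algorithm would reach O(n) but is much longer to write.

The representation works as follows:

- Each edge label is stored compactly as a pair of indices into the sequence, not as a copied substring.
- Each node records firstIndex, the smallest starting position of any suffix passing through it. Because of this, findString can return the first occurrence directly.

The names Node, firstIndex, code, insertSuffix, readDna and the char[] constructor are hypothetical helpers added for illustration. findString is an instance method here, because a static method cannot reach the trie built by the constructor. Wrapping I/O errors in UncheckedIOException is also an assumption, since the template declares no exceptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch only: naive O(n^2) construction, O(|s|) search.
class CompressedSuffixTrie {

    /** Trie node; the edge entering it is labelled text[start, end) (indices, not copies). */
    private static final class Node {
        int start, end;
        final Node[] children = new Node[4];  // one slot per DNA letter A, C, G, T
        final int firstIndex;                 // smallest start of any suffix passing through this node

        Node(int start, int end, int firstIndex) {
            this.start = start;
            this.end = end;
            this.firstIndex = firstIndex;
        }
    }

    private final char[] text;
    private final Node root = new Node(0, 0, 0);

    /** Builds the trie from file f, ignoring non-DNA characters. */
    public CompressedSuffixTrie(String f) {
        this(readDna(f).toCharArray());
    }

    /** Hypothetical helper constructor: builds directly from a DNA sequence. */
    CompressedSuffixTrie(char[] dna) {
        text = dna;
        for (int i = 0; i < text.length; i++) insertSuffix(i);  // increasing start order
    }

    static String readDna(String f) {
        try {
            String raw = new String(Files.readAllBytes(Paths.get(f)), StandardCharsets.US_ASCII);
            StringBuilder sb = new StringBuilder();
            for (char ch : raw.toCharArray()) if (code(ch) >= 0) sb.append(ch);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    /** Inserts suffix text[i..]; O(n - i) character comparisons. */
    private void insertSuffix(int i) {
        Node node = root;
        int pos = i;
        while (pos < text.length) {
            int c = code(text[pos]);
            Node child = node.children[c];
            if (child == null) {                       // no edge starts with this letter: new leaf
                node.children[c] = new Node(pos, text.length, i);
                return;
            }
            int len = child.end - child.start, k = 0;  // match along the edge label
            while (k < len && pos + k < text.length && text[child.start + k] == text[pos + k]) k++;
            if (k == len) {                            // whole edge matched: descend
                node = child;
                pos += k;
            } else if (pos + k == text.length) {       // suffix ends inside the edge: already present
                return;
            } else {                                   // mismatch inside the edge: split it
                Node mid = new Node(child.start, child.start + k, child.firstIndex);
                child.start += k;
                mid.children[code(text[child.start])] = child;
                mid.children[code(text[pos + k])] = new Node(pos + k, text.length, i);
                node.children[c] = mid;
                return;
            }
        }
    }

    /** First occurrence of s, or -1. O(|s|): every step consumes at least one character of s. */
    public int findString(String s) {
        Node node = root;
        int k = 0;
        while (k < s.length()) {
            int c = code(s.charAt(k));
            if (c < 0 || node.children[c] == null) return -1;
            Node child = node.children[c];
            for (int j = child.start; j < child.end && k < s.length(); j++, k++) {
                if (text[j] != s.charAt(k)) return -1;
            }
            node = child;  // s is exhausted on this edge, or continues below child
        }
        return node.firstIndex;
    }
}
```

On the example sequence AAACAACTTCGTAAGTATA, this sketch gives findString("CAACT") = 3 and findString("GAAG") = -1.

The search stays within O(|s|) for two reasons:

- Child lookup is a constant-size array indexed by the four DNA letters.
- Each edge traversed matches at least one character of s.

Storing firstIndex is valid because suffixes are inserted in increasing start order. A node created later by splitting an edge inherits the smaller firstIndex of the node below it.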

Warning: If your findString(s) method runs slower than O(|s|), where |s| is the length of s, you will receive 0 marks for it.

The method similarityAnalyser(String f1, String f2) returns the similarity of the two DNA sequences stored in the text files f1 and f2. The similarity of two DNA sequences S1 and S2 is |lcs(S1, S2)| / max{|S1|, |S2|}. Here |lcs(S1, S2)| is the length of a longest common subsequence of S1 and S2, and |S1| and |S2| are the lengths of S1 and S2, respectively. For simplicity, you may assume that each file contains at most 1000 DNA characters. When your program reads a DNA sequence from a file, it must ignore all non-DNA characters, such as newline characters. Note that this method does not need to use a compressed suffix trie.

The running time of similarityAnalyser(f1, f2) must be at most O(mn), where m and n are the sizes of f1 and f2, respectively. Any method with a higher time complexity will receive 0 marks.
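The following is a minimal sketch of similarityAnalyser. It assumes the classic dynamic-programming algorithm for the length of a longest common subsequence, which is one standard way to meet the O(mn) bound and not necessarily the intended one. Only two rows of the DP table are kept, so the extra space is O(n) rather than O(mn).

Some names and behaviors are assumptions:

- The helper names keepDna, readDna, lcsLength and similarity are hypothetical.
- Returning 0 when both sequences are empty is an assumption, because the formula is undefined in that case.
- Only uppercase A, C, G and T are treated as DNA characters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class DnaSimilarity {

    /** Keeps only A, C, G and T (drops newlines, spaces, etc.). O(length of raw). */
    static String keepDna(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            if (ch == 'A' || ch == 'C' || ch == 'G' || ch == 'T') sb.append(ch);
        }
        return sb.toString();
    }

    /** Reads file f and filters it to DNA characters. O(file size). */
    static String readDna(String f) {
        try {
            return keepDna(new String(Files.readAllBytes(Paths.get(f)), StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** |lcs(a, b)| by the standard DP with two rolling rows: O(mn) time, O(n) extra space. */
    static int lcsLength(String a, String b) {
        int m = a.length(), n = b.length();
        int[] prev = new int[n + 1], cur = new int[n + 1];  // column 0 is never written, stays 0
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                cur[j] = (a.charAt(i - 1) == b.charAt(j - 1))
                        ? prev[j - 1] + 1
                        : Math.max(prev[j], cur[j - 1]);
            }
            int[] tmp = prev; prev = cur; cur = tmp;  // row i becomes the previous row
        }
        return prev[n];
    }

    /** |lcs(S1,S2)| / max{|S1|,|S2|}; 0 when both are empty (assumption). */
    static float similarity(String s1, String s2) {
        int longer = Math.max(s1.length(), s2.length());
        return longer == 0 ? 0f : (float) lcsLength(s1, s2) / longer;
    }

    public static float similarityAnalyser(String f1, String f2) {
        return similarity(readDna(f1), readDna(f2));
    }
}
```

For example, similarity("ACGT", "AGT") is 3/4 = 0.75, because AGT is a longest common subsequence of the two sequences.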

You must give a running-time analysis of every method in Big O notation. Include these analyses as comments in the source file of the CompressedSuffixTrie class.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd