INF 503 Large-scale Data Structures And Organization

Reference no: EM132839594

INF 503 Large-scale Data Structures And Organization - Northern Arizona University

Problem #1: Fun with direct access arrays

Create a class called FASTAreadset_DA. The purpose of the class is to contain a FASTA read set (similar to homework assignments #1 and #2) and all of the functions needed to operate on this set. Use a direct access hash table to store the genomic sequences of the given read data set (hint: use an array of Boolean values, bool[], for your hash table). You will need to read in the genomic sequence fragments (feel free to ignore or discard all headers), convert each one to a radix-notation number (hint: try using an unsigned int to store the radix value), and set the corresponding Boolean in the hash array to TRUE. If the Boolean is already "ON" (i.e., you are seeing a duplicate fragment), you will need to record this as a "collision".
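The radix conversion described above can be sketched as follows. This is a minimal illustration, not the required implementation: it assumes the encoding A=0, C=1, G=2, T=3 (any fixed mapping works) and that fragments containing other characters, such as N, are skipped. The helper names `baseToDigit` and `radix16` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: maps one nucleotide to a base-4 digit.
// Assumed encoding: A=0, C=1, G=2, T=3; anything else (e.g. 'N') is invalid.
inline int baseToDigit(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Hypothetical helper: converts a 16-mer to its radix (base-4) value.
// 16 digits * 2 bits = 32 bits, so the result always fits in uint32_t.
// Returns std::nullopt for wrong-length input or non-ACGT characters.
std::optional<uint32_t> radix16(const std::string& seq) {
    if (seq.size() != 16) return std::nullopt;
    uint32_t value = 0;
    for (char c : seq) {
        int d = baseToDigit(c);
        if (d < 0) return std::nullopt;
        value = value * 4u + static_cast<uint32_t>(d);  // append one base-4 digit
    }
    return value;
}
```

With this mapping, "AAAAAAAAAAAAAAAC" yields 1 and the all-T 16-mer yields 4294967295, the largest value an unsigned 32-bit integer can hold, which is why a (32-bit) unsigned int is enough.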

At minimum, the class must contain:
• A constructor
• A destructor
• A function to search the hash table for a given 16-mer sequence
• A function to insert a given 16-mer sequence into the hash table
• Private variables to store the total # of collisions and # of elements stored in the array
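A skeleton for FASTAreadset_DA with the required members might look like the sketch below. It is only one possible layout and rests on several assumptions: the same A=0, C=1, G=2, T=3 encoding, a hypothetical constructor parameter `k` (default 16) so the class can be tried on tiny tables, and a 64-bit `size_t`, since a 16-mer table needs 4^16 entries. Following the hint, it uses a raw `bool[]` allocated with `new[]` and released in the destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Sketch of a direct-access read set. k is a hypothetical parameter
// (default 16); the table has 4^k Boolean slots, one per possible k-mer.
class FASTAreadset_DA {
public:
    explicit FASTAreadset_DA(unsigned k = 16)
        : k_(k), size_(tableSize(k)), table_(new bool[size_]()) {}  // all false
    ~FASTAreadset_DA() { delete[] table_; }

    // Copying would double-free table_, so it is disabled in this sketch.
    FASTAreadset_DA(const FASTAreadset_DA&) = delete;
    FASTAreadset_DA& operator=(const FASTAreadset_DA&) = delete;

    // Inserts a k-mer; returns false (and skips it) if it has the wrong
    // length or contains non-ACGT characters.
    bool insert(const std::string& kmer) {
        uint32_t key;
        if (!encode(kmer, key)) return false;
        if (table_[key]) {
            ++collisions_;        // duplicate fragment: slot already ON
        } else {
            table_[key] = true;   // flip the slot ON
            ++elements_;
        }
        return true;
    }

    // Returns true if the k-mer's slot is ON.
    bool search(const std::string& kmer) const {
        uint32_t key;
        return encode(kmer, key) && table_[key];
    }

    std::size_t size() const { return size_; }
    uint64_t collisions() const { return collisions_; }
    uint64_t elements() const { return elements_; }

private:
    static std::size_t tableSize(unsigned k) {
        assert(k >= 1 && k <= 16);          // 4^16 slots assumes 64-bit size_t
        return std::size_t{1} << (2 * k);   // 4^k = 2^(2k)
    }

    // Radix (base-4) encoding with A=0, C=1, G=2, T=3.
    bool encode(const std::string& s, uint32_t& out) const {
        if (s.size() != k_) return false;
        uint32_t v = 0;
        for (char c : s) {
            uint32_t d;
            switch (c) {
                case 'A': d = 0; break;
                case 'C': d = 1; break;
                case 'G': d = 2; break;
                case 'T': d = 3; break;
                default: return false;
            }
            v = v * 4u + d;
        }
        out = v;
        return true;
    }

    unsigned k_;
    std::size_t size_;
    bool* table_;
    uint64_t collisions_ = 0;   // inserts that hit an already-ON slot
    uint64_t elements_ = 0;     // number of ON slots
};
```

On common platforms a `bool` occupies one byte, so a 4^16-entry `bool[]` takes about 4 GiB; `std::vector<bool>`, which is usually bit-packed, would need roughly 512 MiB if memory is tight.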

A. Getting started: read the read data set into your data structure.
• What is the size of your hash table?
• How many collisions did you observe?
• How many unique sequences did you observe (number of "ON" Boolean values)?
• What is the load factor (α_T) of your hash table?
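The load factor is the ratio of stored elements to available slots. Assuming a direct access table with one slot per possible 16-mer (four possible bases at each of 16 positions), it works out as:

```latex
\alpha_T = \frac{n}{m}, \qquad m = 4^{16} = 2^{32} = 4{,}294{,}967{,}296
```

where n is the number of unique sequences, i.e. the count of "ON" Boolean values.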
B. Search time in direct access arrays: read in the genome sequence provided above, iterate through all 16-mers found in the genome, and use them to query the read set (similar to what you did in HW#2, problem 2B).
• How many genome 16-mer fragments were found in your read set?
• How long did it take to complete the entire search process (all 16-mers)?
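One way to visit every genome 16-mer without re-encoding each window from scratch is a rolling radix value: shift in two bits per new base and let the oldest base fall off the top of the 32-bit integer. The sketch below assumes the same A=0, C=1, G=2, T=3 encoding, skips any window containing a non-ACGT base, and times the scan with `std::chrono::steady_clock`. `scanGenome`, `SearchResult`, and the `contains` callback (a stand-in for a lookup on the encoded key) are illustrative names, not part of the assignment.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>

struct SearchResult {
    uint64_t windows = 0;   // valid 16-mers examined
    uint64_t found = 0;     // 16-mers reported present by `contains`
    double seconds = 0.0;   // wall-clock time of the scan
};

// Slides a 16-base window across the genome, keeping the radix value of the
// current window up to date, and queries `contains` for every valid window.
SearchResult scanGenome(const std::string& genome,
                        const std::function<bool(uint32_t)>& contains) {
    const auto start = std::chrono::steady_clock::now();
    SearchResult r;
    uint32_t value = 0;   // radix value of the most recent bases
    int valid = 0;        // consecutive ACGT bases seen, capped at 16
    for (char c : genome) {
        uint32_t d;
        switch (c) {
            case 'A': d = 0; break;
            case 'C': d = 1; break;
            case 'G': d = 2; break;
            case 'T': d = 3; break;
            default: valid = 0; value = 0; continue;  // restart after e.g. 'N'
        }
        // 16 bases * 2 bits fill exactly 32 bits, so the shift drops the
        // oldest base automatically (unsigned arithmetic wraps mod 2^32).
        value = (value << 2) | d;
        if (valid < 16) ++valid;
        if (valid == 16) {
            ++r.windows;
            if (contains(value)) ++r.found;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    r.seconds = std::chrono::duration<double>(stop - start).count();
    return r;
}
```

For the direct access table, `contains` could simply index the Boolean array with the rolling value. Reading the genome file happens outside the timed region in this sketch; move the timer if you want I/O included in the reported time.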

Problem #2: The hash table with chaining
Create a class called FASTAreadset_Chain. Use a hash table to store the genomic sequences of the given read data set (hint: you will need to provide the size of the hash table). If you encounter a duplicate sequence fragment or a duplicate hash value, use chaining to resolve the collision. Resizing is optional: you can hard-code the proper hash table size through the constructor. Implement the hash function using the radix/division scheme.
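The radix/division scheme treats each 16-mer as a base-4 number k and maps it to slot h(k) = k mod m. A minimal sketch, assuming the A=0, C=1, G=2, T=3 encoding; `radixDivisionHash` is a hypothetical name. Reducing modulo m after every digit (Horner's rule) gives the same result as reducing the full radix value once, and keeps intermediate values small.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Radix/division hash: h(k) = k mod m, where k is the base-4 value of seq.
// Returns std::nullopt if seq contains a non-ACGT character.
std::optional<uint64_t> radixDivisionHash(const std::string& seq, uint64_t m) {
    assert(m > 0);
    uint64_t h = 0;
    for (char c : seq) {
        uint64_t d;
        switch (c) {
            case 'A': d = 0; break;
            case 'C': d = 1; break;
            case 'G': d = 2; break;
            case 'T': d = 3; break;
            default: return std::nullopt;   // e.g. 'N': no valid hash
        }
        h = (h * 4 + d) % m;   // Horner step, reduced mod m at every digit
    }
    return h;
}
```

For example, with m = 10,000,000 the all-T 16-mer (radix value 4,294,967,295) lands in slot 4,967,295.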
At minimum, the class must contain:
• A constructor
• A destructor
• A function to search the hash table for a given 16-mer sequence
• A function to insert a given 16-mer sequence into the hash table
• A private variable to store the hash table size
• A private variable to count the number of collisions during hash table creation
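A skeleton for FASTAreadset_Chain could chain colliding entries in singly linked lists hanging off an array of m bucket pointers. The sketch rests on assumptions the assignment leaves open: the A=0, C=1, G=2, T=3 encoding; every fragment is stored, duplicates included; and a collision is counted each time an insert lands in a bucket that is already non-empty. The bucket index is the radix value mod m, i.e. the radix/division hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

class FASTAreadset_Chain {
public:
    // m is the (fixed) hash table size; all buckets start empty (nullptr).
    explicit FASTAreadset_Chain(std::size_t m) : m_(m), buckets_(new Node*[m]()) {
        assert(m > 0);
    }
    ~FASTAreadset_Chain() {
        for (std::size_t i = 0; i < m_; ++i) {
            Node* n = buckets_[i];
            while (n) { Node* next = n->next; delete n; n = next; }
        }
        delete[] buckets_;
    }
    FASTAreadset_Chain(const FASTAreadset_Chain&) = delete;
    FASTAreadset_Chain& operator=(const FASTAreadset_Chain&) = delete;

    // Stores a 16-mer; returns false if it is malformed.
    bool insert(const std::string& kmer) {
        uint32_t key;
        if (!encode(kmer, key)) return false;
        Node*& head = buckets_[key % m_];   // division step of the hash
        if (head) ++collisions_;            // bucket already occupied
        head = new Node{key, head};         // prepend to the chain, O(1)
        return true;
    }

    // Walks the bucket's chain comparing full radix keys.
    bool search(const std::string& kmer) const {
        uint32_t key;
        if (!encode(kmer, key)) return false;
        for (const Node* n = buckets_[key % m_]; n; n = n->next)
            if (n->key == key) return true;
        return false;
    }

    uint64_t collisions() const { return collisions_; }

private:
    struct Node { uint32_t key; Node* next; };

    // Radix (base-4) encoding of a 16-mer with A=0, C=1, G=2, T=3.
    static bool encode(const std::string& s, uint32_t& out) {
        if (s.size() != 16) return false;
        uint32_t v = 0;
        for (char c : s) {
            uint32_t d;
            switch (c) {
                case 'A': d = 0; break;
                case 'C': d = 1; break;
                case 'G': d = 2; break;
                case 'T': d = 3; break;
                default: return false;
            }
            v = v * 4u + d;
        }
        out = v;
        return true;
    }

    std::size_t m_;
    Node** buckets_;
    uint64_t collisions_ = 0;
};
```

Because search compares full radix keys rather than bucket indices, two different 16-mers that share a bucket are never confused with each other.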

A. Assessing the impact of the hash table size. For this you will need to set the hash table size to a fixed value (m, see below) and read in the read set to populate the hash table. Set the size of your hash table (m) to 10 thousand, 100 thousand, 1 million, and 10 million elements.
• For each of your 4 hash table sizes, how many collisions did you observe while populating the hash?
• For each of your 4 hash table sizes, how long did it take you to read the sequence fragment file?
• Do the results make sense? Explain.

B. Searching in the chain-linked hash table. Set the hash table size (m) to 10,000,000 and populate it using the read set. Read in the genome, iterate through all 16-mers found in the genome, and use them to query the read set (similar to what you did in HW#2, problem 2B).
• How many genome 16-mer fragments were found in your read set?
• How long did it take to complete the entire search process (all 16-mers)?
• How does that compare to the direct access array search time you measured in problem 1B?

Attachment: Direct access arrays.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd