Implement the Smith-Waterman alignment

Reference no: EM132842759

Problem #1 - Implement Smith-Waterman algorithm

A. Write a function in C++ that would implement the Smith-Waterman alignment between two genomic sequences.
o The function must take a genomic similarity scoring matrix (+2 for match, -1 for mismatch) and a gap penalty (-3) as an input.
o The function must return:
• score for best alignment
• alignment in text format (hint: use a struct of 3 character arrays: 2 for the sequences and 1 for the alignment codes {x, |, ' '}; use whitespace for gaps). See below.
o Test your code using the SARS-COV2 viral genome found in appendix A at the bottom of this homework assignment and sequence fragments found in appendix B at the bottom of this homework.
o You will need to submit a screenshot of the output as part of the homework write up (it should look something like the text below)
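
One possible shape for the function in part A is sketched below. It is a minimal sketch under stated assumptions, not a reference solution: the names Alignment, Scoring, and smithWaterman are hypothetical, the "scoring matrix" is reduced to a single match/mismatch pair (+2 / -1), and the gap penalty is linear (-3 per gap position). It fills the full dynamic-programming table, so memory grows with the product of the two sequence lengths.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical result type: the two gapped sequences plus a row of codes.
struct Alignment {
    int score = 0;
    std::string query;    // gaps are shown as ' '
    std::string codes;    // '|' match, 'x' mismatch, ' ' gap
    std::string subject;  // gaps are shown as ' '
};

// Hypothetical scoring parameters; defaults follow the assignment text.
struct Scoring {
    int match = 2;
    int mismatch = -1;
    int gap = -3;  // linear penalty per gap position (an assumption)
};

Alignment smithWaterman(const std::string& q, const std::string& s,
                        const Scoring& sc = Scoring()) {
    const std::size_t n = q.size(), m = s.size();
    std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;
    std::size_t bi = 0, bj = 0;

    // Fill: each cell is the best local score ending at (i, j), floored at 0.
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            int sub = (q[i - 1] == s[j - 1]) ? sc.match : sc.mismatch;
            int diag = H[i - 1][j - 1] + sub;
            int up = H[i - 1][j] + sc.gap;
            int left = H[i][j - 1] + sc.gap;
            H[i][j] = std::max({0, diag, up, left});
            if (H[i][j] > best) { best = H[i][j]; bi = i; bj = j; }
        }
    }

    // Traceback from the best cell until a zero cell is reached.
    Alignment a;
    a.score = best;
    std::size_t i = bi, j = bj;
    while (i > 0 && j > 0 && H[i][j] > 0) {
        int sub = (q[i - 1] == s[j - 1]) ? sc.match : sc.mismatch;
        if (H[i][j] == H[i - 1][j - 1] + sub) {
            a.query += q[i - 1];
            a.subject += s[j - 1];
            a.codes += (q[i - 1] == s[j - 1]) ? '|' : 'x';
            --i; --j;
        } else if (H[i][j] == H[i - 1][j] + sc.gap) {
            a.query += q[i - 1]; a.subject += ' '; a.codes += ' ';
            --i;
        } else {
            a.query += ' '; a.subject += s[j - 1]; a.codes += ' ';
            --j;
        }
    }
    std::reverse(a.query.begin(), a.query.end());
    std::reverse(a.codes.begin(), a.codes.end());
    std::reverse(a.subject.begin(), a.subject.end());
    return a;
}
```

Printing the query, codes, and subject strings on three consecutive lines gives the text layout the hint describes. When several cells tie for the best score, this sketch keeps the first one it encounters.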

B. Generate sets of 1K, 10K, 100K, and 1M (million) completely random genomic sequences (50 nt each) to use as queries, with the SARS-COV2 genome as the subject. Align the queries to the subject sequence and record the time to completion (in seconds or minutes).
Special note: Depending on the speed of your alignment implementation, this homework may take hours or days to complete. The goal is to get a sense of how slow these 'optimal' alignment algorithms are, for the explicit purpose of establishing a baseline against which to compare the improved algorithms and data structures we will be discussing later in the course. Please be aware of this and 'give up' on the larger benchmarks as appropriate.
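
For the benchmark in part B, the random queries and the timing could be set up roughly as follows. This is only a sketch: randomReads and secondsTaken are hypothetical helper names, a uniform distribution over A, C, G, T is assumed for "completely random", and the fixed seed is there only to make runs repeatable.

```cpp
#include <chrono>
#include <random>
#include <string>
#include <vector>

// Generate `count` random sequences of `length` bases, uniform over A/C/G/T.
std::vector<std::string> randomReads(std::size_t count, std::size_t length,
                                     unsigned seed) {
    static const char bases[] = "ACGT";
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, 3);
    std::vector<std::string> reads;
    reads.reserve(count);
    for (std::size_t r = 0; r < count; ++r) {
        std::string read(length, 'A');
        for (char& c : read) c = bases[pick(rng)];
        reads.push_back(read);
    }
    return reads;
}

// Run `work` once and return the elapsed wall-clock time in seconds.
// steady_clock is used because it is not affected by system clock changes.
template <typename F>
double secondsTaken(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A run would generate one set, for example randomReads(1000, 50, 42), and pass a lambda that aligns every read against the genome to secondsTaken. Timing the 1K set first gives a rough per-read cost from which the larger sets can be extrapolated before committing to them.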

Problem #2 - Having a BLAST

A. Implement a seed-based Smith-Waterman. This means:
o Use the genome in Appendix A and break it down into seeds with word size = 11.
o Load your seeds (created in part A) into memory
o For each read, disassemble it into k-mers (of size 11). Compare your read k-mers to the SARS-COV2 seeds.
o If a seed match is found, extend the seed by cutting out the appropriate segment of the subject (genome of SARS-COV2) and running Smith-Waterman on the two sequences (the original read and the segment from SARS-COV2)
• Beware of edge cases
• It is OK to expand just one seed and stop (a read may contain multiple seed matches, which would typically require multiple seed expansions and a decision about which alignment is best)
o Test your code on the 50-mers I have provided below in Appendix B. You must report alignment for the 50-mers I've provided as part of the homework solution.
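
The seeding steps above can be sketched as follows. This is an illustration under assumptions, not the required design: SeedIndex, Window, buildSeedIndex, and firstSeedWindow are hypothetical names, the seeds are stored in a hash map from k-mer to genome positions, only the first matching seed is expanded, and the optional pad parameter (extra bases on each side of the window to leave room for gaps) is an addition that the assignment does not mention.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Maps each k-mer to every genome position where it starts.
using SeedIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

SeedIndex buildSeedIndex(const std::string& genome, std::size_t k = 11) {
    SeedIndex index;
    if (k == 0 || genome.size() < k) return index;
    for (std::size_t pos = 0; pos + k <= genome.size(); ++pos)
        index[genome.substr(pos, k)].push_back(pos);
    return index;
}

// Half-open interval [begin, end) of the genome to hand to Smith-Waterman.
struct Window {
    std::size_t begin = 0;
    std::size_t end = 0;
};

// Finds the first read k-mer present in the index and computes the genome
// window the whole read would cover if it aligned without gaps at that seed.
// Returns false when no k-mer of the read matches any seed.
bool firstSeedWindow(const SeedIndex& index, const std::string& read,
                     std::size_t genomeLen, std::size_t k, Window& out,
                     std::size_t pad = 0) {
    if (k == 0 || read.size() < k) return false;
    for (std::size_t off = 0; off + k <= read.size(); ++off) {
        auto hit = index.find(read.substr(off, k));
        if (hit == index.end()) continue;
        std::size_t seedPos = hit->second.front();  // first occurrence only
        // Edge cases: clamp the window at both ends of the genome.
        out.begin = (seedPos > off + pad) ? seedPos - off - pad : 0;
        out.end = std::min(genomeLen, seedPos + (read.size() - off) + pad);
        return true;
    }
    return false;
}
```

The segment genome.substr(w.begin, w.end - w.begin) and the original read are then the two inputs to the Smith-Waterman routine. The clamping on both sides is where the edge cases mentioned above show up: a seed near the start or end of the genome yields a window shorter than the read.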

B. Test your code on a set of 1K, 10K, 100K, and 1M (million) completely random 50-mers, aligning them to SARS-COV2 genome. How long did it take? Compare it to the results in problem 1B.

C. Test the algorithm's exhaustiveness. Randomly select 100,000 fragments from the SARS-COV2 genome and use these fragments to query the SARS-COV2 genome using the seed-based SW you implemented in part A. How many fragments were you able to find? Now introduce random errors into your 100,000 fragments at a 5% per-base error rate (every character has a 5% chance of being changed to some other random character). Use these error-filled 100,000 fragments to query the SARS-COV2 genome again. How many fragments were you able to find?
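
The sampling and error-injection steps in part C might look like the sketch below. The helper names sampleFragment and mutate are hypothetical. The fragment length is left as a parameter because the assignment does not state one (50 would match the reads used elsewhere), and "some other random character" is read here as "one of the other three bases", which is an interpretation rather than something the text spells out.

```cpp
#include <random>
#include <string>

// Pick a fragment of `length` bases starting at a uniformly random position.
// Assumes 0 < length <= genome.size().
std::string sampleFragment(const std::string& genome, std::size_t length,
                           std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> start(0, genome.size() - length);
    return genome.substr(start(rng), length);
}

// Return a copy of `fragment` in which each character is independently
// replaced, with probability `errorRate`, by a different base.
std::string mutate(const std::string& fragment, double errorRate,
                   std::mt19937& rng) {
    static const std::string bases = "ACGT";
    std::bernoulli_distribution flip(errorRate);
    std::uniform_int_distribution<int> pick(0, 2);
    std::string out = fragment;
    for (char& c : out) {
        if (!flip(rng)) continue;
        // Collect the bases that differ from the current character.
        std::string others;
        for (char b : bases)
            if (b != c) others += b;
        c = others[pick(rng)];  // others always holds at least three bases
    }
    return out;
}
```

With errorRate set to 0.05, a 50-base fragment receives 2.5 substitutions on average, so many mutated fragments will still contain at least one intact 11-mer and remain findable by the seed step, while others will not.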

Attachment:- algorithm_Homework.rar






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd