Solution: Implement the Smith-Waterman alignment

Reference no: EM132842759

Problem #1 - Implement the Smith-Waterman algorithm

A. Write a function in C++ that would implement the Smith-Waterman alignment between two genomic sequences.
o The function must take a genomic similarity scoring matrix (+2 for a match, -1 for a mismatch) and a gap penalty (-3) as inputs.
o The function must return:
• score for best alignment
• alignment in text format (hint: use a struct of 3 character arrays, 2 for the sequences and 1 for the alignment codes {x, |, space} - use whitespace for gaps). See below.
o Test your code using the SARS-COV2 viral genome found in appendix A at the bottom of this homework assignment and sequence fragments found in appendix B at the bottom of this homework.
o You will need to submit a screenshot of the output as part of the homework write up (it should look something like the text below)
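
The function described in part A can be sketched as follows. This is a minimal illustration, not the assignment's reference solution. The names `Alignment`, `Scoring`, and `smithWaterman` are hypothetical, and two simplifications are assumed: the "scoring matrix" is reduced to a single match value and a single mismatch value, and `std::string` replaces raw character arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: three parallel strings of equal length.
struct Alignment {
    int score = 0;
    std::string query;    // aligned query, ' ' marks a gap
    std::string codes;    // '|' match, 'x' mismatch, ' ' gap
    std::string subject;  // aligned subject, ' ' marks a gap
};

// Hypothetical stand-in for the scoring matrix and gap penalty.
struct Scoring {
    int match = 2;
    int mismatch = -1;
    int gap = -3;
};

Alignment smithWaterman(const std::string& q, const std::string& s,
                        const Scoring& sc = Scoring()) {
    assert(sc.match > 0 && sc.gap < 0);
    const std::size_t n = q.size(), m = s.size();
    // H[i][j] = best score of a local alignment ending at q[i-1], s[j-1].
    std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;
    std::size_t bi = 0, bj = 0;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            int sub = (q[i - 1] == s[j - 1]) ? sc.match : sc.mismatch;
            int diag = H[i - 1][j - 1] + sub;
            int up = H[i - 1][j] + sc.gap;    // gap in subject
            int left = H[i][j - 1] + sc.gap;  // gap in query
            H[i][j] = std::max({0, diag, up, left});
            if (H[i][j] > best) { best = H[i][j]; bi = i; bj = j; }
        }
    }
    // Trace back from the best cell until the score drops to zero.
    Alignment a;
    a.score = best;
    std::size_t i = bi, j = bj;
    while (i > 0 && j > 0 && H[i][j] > 0) {
        bool same = (q[i - 1] == s[j - 1]);
        int sub = same ? sc.match : sc.mismatch;
        if (H[i][j] == H[i - 1][j - 1] + sub) {
            a.query += q[i - 1]; a.subject += s[j - 1];
            a.codes += same ? '|' : 'x';
            --i; --j;
        } else if (H[i][j] == H[i - 1][j] + sc.gap) {
            a.query += q[i - 1]; a.subject += ' '; a.codes += ' ';
            --i;
        } else {
            a.query += ' '; a.subject += s[j - 1]; a.codes += ' ';
            --j;
        }
    }
    std::reverse(a.query.begin(), a.query.end());
    std::reverse(a.codes.begin(), a.codes.end());
    std::reverse(a.subject.begin(), a.subject.end());
    return a;
}
```

Printing `query`, `codes`, and `subject` on three consecutive lines gives a text alignment of the kind the assignment asks for. The full matrix uses (n+1) x (m+1) integers, which is manageable for a 50 nt read against a genome of roughly 30,000 bases but grows quickly for longer inputs.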

B. Generate sets of 1K, 10K, 100K, and 1M (million) completely random genomic sequences (50 nt each) to use as queries for alignment, with the SARS-COV2 genome as the subject. Align the queries to the subject sequence and record the time to completion (in seconds / minutes).
Special note: Depending on the speed of your alignment implementation, this homework may take hours or days to complete. The goal is to get a sense of how slow these 'optimal' alignment algorithms are, in order to establish a baseline for comparing the improved algorithms and data structures we will be discussing later in the course. Please be aware of this and 'give up' on the larger benchmarks as appropriate.
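
One possible way to set up the benchmark in part B is sketched below. The helper names `randomReads` and `secondsToRun` are hypothetical, and the choice of `std::mt19937` with a fixed seed is an assumption made here so that runs are repeatable; the assignment does not prescribe a generator.

```cpp
#include <cassert>
#include <chrono>
#include <random>
#include <string>
#include <vector>

// Generate `count` uniformly random reads over the alphabet A, C, G, T.
std::vector<std::string> randomReads(std::size_t count, std::size_t length,
                                     unsigned seed) {
    assert(length > 0);
    static const char kBases[] = {'A', 'C', 'G', 'T'};
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, 3);
    std::vector<std::string> reads;
    reads.reserve(count);
    for (std::size_t r = 0; r < count; ++r) {
        std::string read(length, 'A');
        for (char& c : read) c = kBases[pick(rng)];
        reads.push_back(read);
    }
    return reads;
}

// Run any callable once and return the elapsed wall-clock time in seconds.
template <typename Fn>
double secondsToRun(Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping the alignment loop for each set size (1K, 10K, 100K, 1M) in a lambda passed to `secondsToRun` yields one timing per benchmark. `steady_clock` is used because it is monotonic, so the measurement is not affected by system clock adjustments during a long run.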

Problem #2 - Having a BLAST

A. Implement a seed-based Smith-Waterman. This means:
o Use the genome in Appendix A and break it down into seeds with word size = 11.
o Load your seeds (created in the previous step) into memory.
o Disassemble each read into k-mers (of size 11). Compare your read k-mers to the SARS-COV2 seeds.
o If a seed match is found, extend the seed by cutting out the appropriate segment of the subject (the SARS-COV2 genome) and running Smith-Waterman on the two sequences (the original read and the segment from SARS-COV2).
• Beware of edge cases
• It is OK to expand just one seed and stop (a read can contain multiple seeds, which would normally require multiple seed expansions and a decision about which alignment is best)
o Test your code on the 50-mers I have provided below in Appendix B. You must report alignment for the 50-mers I've provided as part of the homework solution.
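
The seeding steps above can be sketched as follows. This is one possible layout, not a prescribed design: the names `SeedIndex`, `buildSeedIndex`, `Window`, and `firstSeedWindow` are hypothetical, the hash map keyed by k-mer string is an assumption, and so is the `pad` parameter, which widens the cut-out segment by a few bases on each side so that the subsequent Smith-Waterman run has room for gaps.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Maps each k-mer of the genome to every position where it starts.
using SeedIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

SeedIndex buildSeedIndex(const std::string& genome, std::size_t k = 11) {
    assert(k > 0);
    SeedIndex index;
    for (std::size_t i = 0; i + k <= genome.size(); ++i)
        index[genome.substr(i, k)].push_back(i);
    return index;
}

// Half-open range [begin, end) of the genome to hand to Smith-Waterman.
struct Window {
    std::size_t begin;
    std::size_t end;
    bool found;
};

// Expand only the first seed hit, as the assignment permits.
Window firstSeedWindow(const SeedIndex& index, const std::string& genome,
                       const std::string& read, std::size_t k = 11,
                       std::size_t pad = 5) {
    assert(k > 0);
    for (std::size_t off = 0; off + k <= read.size(); ++off) {
        auto it = index.find(read.substr(off, k));
        if (it == index.end()) continue;
        std::size_t hit = it->second.front();  // first genome position
        // Edge cases: clamp the window at both ends of the genome.
        std::size_t lead = off + pad;
        std::size_t begin = hit > lead ? hit - lead : 0;
        std::size_t end =
            std::min(genome.size(), hit + (read.size() - off) + pad);
        return {begin, end, true};
    }
    return {0, 0, false};
}
```

When `found` is true, `genome.substr(w.begin, w.end - w.begin)` gives the segment to align against the original read. The window is positioned so that the read's k-mer at offset `off` lines up with the seed hit, which is why the start is moved back by `off` bases before padding.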

B. Test your code on sets of 1K, 10K, 100K, and 1M (million) completely random 50-mers, aligning them to the SARS-COV2 genome. How long did it take? Compare it to the results in problem 1B.

C. Test the algorithm's exhaustiveness. Randomly select 100,000 fragments from the SARS-COV2 genome and use these fragments to query the SARS-COV2 genome using the seed-based SW you implemented in part A. How many fragments were you able to find? Now introduce random errors into your 100,000 fragments at a 5% per-base error rate (every character has a 5% chance of being changed to some other random character). Use these error-filled 100,000 fragments to query the SARS-COV2 genome again. How many fragments were you able to find?
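
The fragment sampling and error injection in part C might look like the sketch below. The function names `sampleFragment` and `mutateRead` are hypothetical. Two assumptions are made: the fragment length is left as a parameter because the assignment does not state it, and "some other random character" is read as one of the three other bases, so a selected position always changes.

```cpp
#include <cassert>
#include <random>
#include <string>

// Pick a fragment of the given length from a uniformly random position.
std::string sampleFragment(const std::string& genome, std::size_t length,
                           std::mt19937& rng) {
    assert(genome.size() >= length);
    std::uniform_int_distribution<std::size_t> start(0, genome.size() - length);
    return genome.substr(start(rng), length);
}

// Independently replace each base with probability `errorRate`.
std::string mutateRead(const std::string& read, double errorRate,
                       std::mt19937& rng) {
    assert(errorRate >= 0.0 && errorRate <= 1.0);
    static const std::string kBases = "ACGT";
    std::bernoulli_distribution flip(errorRate);
    std::uniform_int_distribution<int> pick(0, 2);
    std::string out = read;
    for (char& c : out) {
        if (!flip(rng)) continue;
        // Collect the bases that differ from the current one, then pick one.
        std::string others;
        for (char b : kBases)
            if (b != c) others += b;
        c = others[pick(rng)];
    }
    return out;
}
```

Calling `mutateRead(fragment, 0.05, rng)` on each sampled fragment applies the 5% per-base error rate. With k = 11, a mutated 50 nt fragment is still findable by the seed search only if at least one run of 11 consecutive bases survives unchanged, which is what the second count in part C measures.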

Attachment:- algorithm_Homework.rar
