Implement the Smith-Waterman alignment

Reference no: EM132842759

Problem #1 - Implement Smith-Waterman algorithm

A. Write a function in C++ that would implement the Smith-Waterman alignment between two genomic sequences.
o The function must take a genomic similarity scoring matrix (+2 for match, -1 for mismatch) and a gap penalty (-3) as inputs.
o The function must return:
• score for best alignment
• alignment in text format (hint: use a struct of 3 character arrays: 2 for the sequences and 1 for the alignment codes {x, |, space}; use whitespace for gaps). See below.
o Test your code using the SARS-COV2 viral genome found in appendix A at the bottom of this homework assignment and sequence fragments found in appendix B at the bottom of this homework.
o You will need to submit a screenshot of the output as part of the homework write up (it should look something like the text below)
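The requirements in part A can be sketched roughly as follows. This is a minimal sketch, not a reference solution: the names `Alignment`, `Scoring`, `makeScoring`, `baseIndex`, and `smithWaterman` are hypothetical, the scoring matrix is assumed to be 4x4 over A, C, G, T (any other character is scored as a mismatch), the gap penalty is assumed to be linear, and `std::string` stands in for the character arrays mentioned in the hint.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: three parallel text rows, as the hint suggests.
struct Alignment {
    int score = 0;
    std::string query;    // aligned query, ' ' marks a gap
    std::string codes;    // '|' match, 'x' mismatch, ' ' gap
    std::string subject;  // aligned subject, ' ' marks a gap
};

// Hypothetical scoring container: substitution matrix plus linear gap penalty.
struct Scoring {
    int sub[4][4];  // indexed by baseIndex() of each base
    int unknown;    // score used when either base is not A, C, G, or T
    int gap;        // e.g. -3
};

int baseIndex(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

Scoring makeScoring(int match, int mismatch, int gap) {
    Scoring sc{};
    for (int a = 0; a < 4; ++a)
        for (int b = 0; b < 4; ++b) sc.sub[a][b] = (a == b) ? match : mismatch;
    sc.unknown = mismatch;
    sc.gap = gap;
    return sc;
}

Alignment smithWaterman(const std::string& q, const std::string& s, const Scoring& sc) {
    assert(sc.gap <= 0);  // the gap value is a penalty
    const std::size_t m = q.size(), n = s.size(), W = n + 1;
    std::vector<int> H((m + 1) * W, 0);  // row 0 and column 0 stay at zero
    auto sub = [&](char a, char b) {
        const int x = baseIndex(a), y = baseIndex(b);
        return (x < 0 || y < 0) ? sc.unknown : sc.sub[x][y];
    };
    int best = 0;
    std::size_t bi = 0, bj = 0;
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            const int diag = H[(i - 1) * W + (j - 1)] + sub(q[i - 1], s[j - 1]);
            const int up = H[(i - 1) * W + j] + sc.gap;    // gap in subject
            const int left = H[i * W + (j - 1)] + sc.gap;  // gap in query
            const int h = std::max({0, diag, up, left});
            H[i * W + j] = h;
            if (h > best) { best = h; bi = i; bj = j; }
        }
    }
    // Trace back from the best cell until a zero cell is reached.
    Alignment out;
    out.score = best;
    std::size_t i = bi, j = bj;
    while (i > 0 && j > 0 && H[i * W + j] > 0) {
        const int h = H[i * W + j];
        if (h == H[(i - 1) * W + (j - 1)] + sub(q[i - 1], s[j - 1])) {
            out.query += q[i - 1];
            out.subject += s[j - 1];
            out.codes += (q[i - 1] == s[j - 1]) ? '|' : 'x';
            --i; --j;
        } else if (h == H[(i - 1) * W + j] + sc.gap) {
            out.query += q[i - 1]; out.subject += ' '; out.codes += ' ';
            --i;
        } else {
            out.query += ' '; out.subject += s[j - 1]; out.codes += ' ';
            --j;
        }
    }
    std::reverse(out.query.begin(), out.query.end());
    std::reverse(out.codes.begin(), out.codes.end());
    std::reverse(out.subject.begin(), out.subject.end());
    return out;
}
```

The full score matrix is kept in memory so the traceback can recompute which move produced each cell; for a 50 nt query against a genome of a few tens of thousands of bases this is affordable, but it is a design choice made for clarity rather than speed.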

B. Generate sets of 1K, 10K, 100K, and 1M (million) completely random genomic sequences (50 nt each) to use as queries, with the SARS-COV2 genome as the subject. Align each query to the subject sequence and record the time to completion (in seconds or minutes) for each set.
Special note: Depending on the speed of your alignment implementation, this homework may take hours or days to complete. The goal is to get a sense of how slow these 'optimal' alignment algorithms are, for the explicit purpose of establishing a baseline against which to compare the improved algorithms and data structures we will be discussing later in the course. Please be aware of this and 'give up' on the larger benchmarks as appropriate.
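One possible way to produce the random queries and time a benchmark run is sketched below. The helper names `randomReads` and `secondsTaken` are hypothetical, each base is assumed to be drawn uniformly from A, C, G, T, and the fixed-seed `std::mt19937` generator is just one reasonable choice for reproducible runs.

```cpp
#include <cassert>
#include <chrono>
#include <random>
#include <string>
#include <vector>

// Generates `count` random sequences of `length` bases each (uniform over ACGT).
std::vector<std::string> randomReads(std::size_t count, std::size_t length,
                                     std::mt19937& rng) {
    assert(length > 0);
    static const char kBases[4] = {'A', 'C', 'G', 'T'};
    std::uniform_int_distribution<int> pick(0, 3);
    std::vector<std::string> reads;
    reads.reserve(count);
    for (std::size_t r = 0; r < count; ++r) {
        std::string read(length, 'A');
        for (char& c : read) c = kBases[pick(rng)];
        reads.push_back(std::move(read));
    }
    return reads;
}

// Runs `work()` once and returns the elapsed wall-clock time in seconds.
template <class F>
double secondsTaken(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In use, the alignment loop over one set of reads would be wrapped in a lambda and passed to `secondsTaken`, once per set size, so that sequence generation is not counted in the measured time.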

Problem #2 - Having a BLAST

A. Implement a seed-based Smith-Waterman. This means:
o Use the genome in Appendix A and break it down into seeds with word size = 11.
o Load your seeds (created in the previous step) into memory.
o For each read, disassemble it into k-mers (of size 11). Compare your read k-mers to the SARS-COV2 seeds.
o If a seed match is found, extend the seed by cutting out the appropriate segment of the subject (the SARS-COV2 genome) and running Smith-Waterman on the two sequences (the original read and the segment from SARS-COV2).
• Beware of edge cases
• Ok to just expand one seed and be done (multiple seeds from a read can be found, typically necessitating multiple seed expansions and decision on what is the best alignment)
o Test your code on the 50-mers I have provided below in Appendix B. You must report alignment for the 50-mers I've provided as part of the homework solution.
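The seeding steps in part A might look something like the sketch below. Several things here are assumptions rather than part of the assignment: seeds are taken as all overlapping 11-mers of the genome, they are stored in a `std::unordered_map` from k-mer to genome positions, only the first seed hit is expanded, and the extracted window is padded by a caller-chosen `pad` on each side to leave room for gaps. The names `SeedIndex`, `Window`, `buildSeedIndex`, and `firstSeedWindow` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Maps each k-mer to every genome position where it starts.
using SeedIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

SeedIndex buildSeedIndex(const std::string& genome, std::size_t k) {
    assert(k > 0);
    SeedIndex index;
    for (std::size_t pos = 0; pos + k <= genome.size(); ++pos)
        index[genome.substr(pos, k)].push_back(pos);
    return index;
}

// Segment of the subject to hand to the full alignment.
struct Window {
    std::size_t start;
    std::size_t length;
    bool found;
};

// Scans the read's k-mers left to right; on the first seed hit, returns the
// genome window the read would cover if aligned without gaps, plus `pad`
// extra bases on each side, clamped to the genome boundaries.
Window firstSeedWindow(const std::string& read, const std::string& genome,
                       const SeedIndex& index, std::size_t k, std::size_t pad) {
    for (std::size_t off = 0; off + k <= read.size(); ++off) {
        const auto hit = index.find(read.substr(off, k));
        if (hit == index.end()) continue;
        const std::size_t gpos = hit->second.front();  // first occurrence only
        const std::size_t left = off + pad;            // bases wanted before the seed
        const std::size_t start = gpos > left ? gpos - left : 0;
        const std::size_t end =
            std::min(genome.size(), gpos + (read.size() - off) + pad);
        return {start, end - start, true};
    }
    return {0, 0, false};
}
```

The clamping at both ends is where the edge cases mentioned above tend to appear: a seed hit near position 0 or near the end of the genome would otherwise produce a negative start or a window running past the sequence. The window `genome.substr(w.start, w.length)` is then what gets aligned against the original read.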

B. Test your code on sets of 1K, 10K, 100K, and 1M (million) completely random 50-mers, aligning them to the SARS-COV2 genome. How long did it take? Compare it to the results in Problem 1B.

C. Test the algorithm's exhaustiveness. Randomly select 100,000 fragments from the SARS-COV2 genome and use these fragments to query the SARS-COV2 genome using the seed-based SW you implemented in part A. How many fragments were you able to find? Now introduce random errors into your 100,000 fragments at a 5% per-base error rate (every character has a 5% chance of being changed to some other random character). Use these error-filled 100,000 fragments to query the SARS-COV2 genome again. How many fragments were you able to find?
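Fragment sampling and error injection for part C could be sketched as below. The names `sampleFragment` and `mutate` are hypothetical, and since the assignment does not state a fragment length, it is left as a parameter here (50 nt would match the reads used elsewhere). A substituted base is assumed to be drawn uniformly from the bases that differ from the original, so a "change" never leaves the character as it was.

```cpp
#include <cassert>
#include <random>
#include <string>

// Picks a fragment of `length` bases at a uniformly random genome position.
std::string sampleFragment(const std::string& genome, std::size_t length,
                           std::mt19937& rng) {
    assert(length > 0 && length <= genome.size());
    std::uniform_int_distribution<std::size_t> start(0, genome.size() - length);
    return genome.substr(start(rng), length);
}

// Returns a copy of `fragment` in which each character is independently
// replaced, with probability `errorRate`, by a different base.
std::string mutate(const std::string& fragment, double errorRate,
                   std::mt19937& rng) {
    assert(errorRate >= 0.0 && errorRate <= 1.0);
    static const char kBases[4] = {'A', 'C', 'G', 'T'};
    std::bernoulli_distribution hit(errorRate);
    std::string out = fragment;
    for (char& c : out) {
        if (!hit(rng)) continue;
        char others[4];
        int n = 0;
        for (char b : kBases)
            if (b != c) others[n++] = b;  // candidates that differ from c
        c = others[std::uniform_int_distribution<int>(0, n - 1)(rng)];
    }
    return out;
}
```

With a 5% rate and a word size of 11, some fragments will have no error-free 11-mer left, which is one reason the second query round may find fewer fragments than the first.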

Attachment:- algorithm_Homework.rar






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd