Implement Smith-Waterman algorithm

Reference no: EM132846268

Problem #1 - Implement Smith-Waterman algorithm

A. Write a function in C++ that would implement the Smith-Waterman alignment between two genomic sequences.
o The function must take a genomic similarity scoring matrix (+2 for match, -1 for mismatch) and a gap penalty (-3) as inputs.
o The function must return:
• score for best alignment
• alignment in text format (hint: use a struct of 3 character arrays: 2 for the sequences and 1 for the alignment codes {x, |, ' '}, using whitespace for gaps). See below.
o Test your code using the SARS-COV2 viral genome found in appendix A at the bottom of this homework assignment and sequence fragments found in appendix B at the bottom of this homework.
o You will need to submit a screenshot of the output as part of the homework write-up (it should look something like the text below)
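The requirements in part A can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the names `Alignment`, `Scoring`, and `smithWaterman` are hypothetical, `std::string` stands in for the character arrays mentioned in the hint, and the use of '-' for gaps in the sequence rows (with a blank in the code row) is an assumption. It stores the full scoring matrix, so memory grows with query length times subject length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: two gapped sequence rows plus a code row
// ('|' = match, 'x' = mismatch, ' ' = gap).
struct Alignment {
    int score = 0;
    std::string query;    // query row, '-' marks a gap (assumption)
    std::string codes;    // alignment codes
    std::string subject;  // subject row, '-' marks a gap (assumption)
};

// Scoring parameters; the defaults are the values from the assignment.
struct Scoring {
    int match = 2;
    int mismatch = -1;
    int gap = -3;  // linear gap penalty
};

Alignment smithWaterman(const std::string& q, const std::string& s,
                        const Scoring& sc = Scoring{}) {
    const std::size_t n = q.size(), m = s.size();
    // H[i][j] = best score of a local alignment ending at q[i-1], s[j-1].
    std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;
    std::size_t bi = 0, bj = 0;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            int sub = (q[i - 1] == s[j - 1]) ? sc.match : sc.mismatch;
            H[i][j] = std::max({0, H[i - 1][j - 1] + sub,
                                H[i - 1][j] + sc.gap, H[i][j - 1] + sc.gap});
            if (H[i][j] > best) { best = H[i][j]; bi = i; bj = j; }
        }
    }
    // Trace back from the best cell until a zero cell is reached.
    Alignment a;
    a.score = best;
    std::size_t i = bi, j = bj;
    while (i > 0 && j > 0 && H[i][j] > 0) {
        bool same = (q[i - 1] == s[j - 1]);
        int sub = same ? sc.match : sc.mismatch;
        if (H[i][j] == H[i - 1][j - 1] + sub) {         // diagonal move
            a.query += q[i - 1]; a.subject += s[j - 1];
            a.codes += same ? '|' : 'x';
            --i; --j;
        } else if (H[i][j] == H[i - 1][j] + sc.gap) {   // gap in subject
            a.query += q[i - 1]; a.subject += '-'; a.codes += ' ';
            --i;
        } else {                                        // gap in query
            a.query += '-'; a.subject += s[j - 1]; a.codes += ' ';
            --j;
        }
    }
    // The rows were built back to front, so reverse them.
    std::reverse(a.query.begin(), a.query.end());
    std::reverse(a.codes.begin(), a.codes.end());
    std::reverse(a.subject.begin(), a.subject.end());
    return a;
}
```

Calling `smithWaterman(read, genome)` returns the score together with the three rows, which can be printed one under another to produce the text alignment. When several cells tie for the best score, this sketch keeps the first one found in row-major order.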

B. Generate 1K, 10K, 100K, and 1M (million) completely random genomic sequences (50 nt each) to use as queries, with the SARS-COV2 genome as the subject. Align the queries to the subject sequence and record the time to completion (in seconds / minutes).
Special note: Depending on the speed of your alignment implementation, this homework may take hours or days to complete. The goal is to get a sense for how slow these 'optimal' alignment algorithms are, for the explicit purpose of establishing a baseline against which to compare the improved algorithms and data structures we will be discussing later in the course. Please be aware of this and 'give up' on the larger benchmarks as appropriate.
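One possible way to set up the benchmark in part B is sketched below. The helper names `randomReads` and `secondsTaken` are hypothetical, the bases are assumed to be drawn uniformly from {A, C, G, T}, and a fixed seed is assumed so that runs are repeatable.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Draw `count` random sequences of length `len`, each base chosen
// uniformly from A, C, G, T (uniformity is an assumption).
std::vector<std::string> randomReads(std::size_t count, std::size_t len,
                                     std::mt19937& rng) {
    static const char kBases[] = {'A', 'C', 'G', 'T'};
    std::uniform_int_distribution<int> pick(0, 3);
    std::vector<std::string> reads(count, std::string(len, 'A'));
    for (auto& read : reads)
        for (char& base : read)
            base = kBases[pick(rng)];
    return reads;
}

// Run any callable once and return the elapsed wall-clock time in seconds.
template <typename Work>
double secondsTaken(Work&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A run for one benchmark size would then wrap the alignment loop in a lambda, for example `secondsTaken([&] { for (const auto& r : reads) { /* align r to the genome */ } })`, and repeat this for 1K, 10K, 100K, and 1M reads. `std::chrono::steady_clock` is used because it is monotonic, which makes it the usual choice for interval timing.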

Problem #2 - Having a BLAST

A. Implement a seed-based Smith Waterman. This means:
o Use the genome in Appendix A and break it down into seeds with word size = 11.
o Load your seeds (created in the previous step) into memory.
o Disassemble each read into k-mers (of size 11). Compare your read k-mers to the SARS-COV2 seeds.
o If a seed match is found, extend the seed by cutting out the appropriate segment of the subject (the SARS-COV2 genome) and running Smith-Waterman on the two sequences (the original read and the segment from SARS-COV2).
• Beware of edge cases
• It is OK to extend just one seed and stop (a read can yield multiple seed matches, which would typically require multiple seed extensions and a decision on which alignment is best)
o Test your code on the 50-mers I have provided below in Appendix B. You must report alignment for the 50-mers I've provided as part of the homework solution.
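The seeding steps in part A could look roughly like the sketch below. It assumes a hash map from each k-mer to its genome positions, takes the first matching seed only (which the assignment allows), and cuts out a subject window padded by a few bases on each side and clamped to the genome ends, which is one way to handle the edge cases. The names `SeedIndex`, `buildSeedIndex`, `Window`, and `firstSeedWindow`, and the `pad` parameter, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Maps each k-mer to every position where it starts in the genome.
using SeedIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

SeedIndex buildSeedIndex(const std::string& genome, std::size_t k = 11) {
    assert(k > 0);
    SeedIndex index;
    // The loop body never runs when the genome is shorter than k.
    for (std::size_t i = 0; i + k <= genome.size(); ++i)
        index[genome.substr(i, k)].push_back(i);
    return index;
}

// Segment of the subject to hand to Smith-Waterman.
struct Window {
    bool found = false;
    std::size_t start = 0;
    std::size_t length = 0;
};

// Scan the read's k-mers left to right; on the first seed hit, return the
// genome window the whole read would cover, widened by `pad` bases per side
// and clamped to [0, genome.size()).
Window firstSeedWindow(const std::string& read, const std::string& genome,
                       const SeedIndex& index, std::size_t k = 11,
                       std::size_t pad = 5) {
    Window w;
    for (std::size_t off = 0; off + k <= read.size(); ++off) {
        auto hit = index.find(read.substr(off, k));
        if (hit == index.end()) continue;
        std::size_t g = hit->second.front();   // first genome position only
        std::size_t lead = off + pad;          // bases wanted left of the seed
        std::size_t begin = (g > lead) ? g - lead : 0;
        std::size_t end = std::min(genome.size(),
                                   g + (read.size() - off) + pad);
        w.found = true;
        w.start = begin;
        w.length = end - begin;
        return w;
    }
    return w;  // no seed from this read occurs in the genome
}
```

When `found` is true, `genome.substr(w.start, w.length)` is the segment to align against the original read. Padding the window leaves room for gaps in the alignment; the value 5 is arbitrary and only for illustration.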

B. Test your code on a set of 1K, 10K, 100K, and 1M (million) completely random 50-mers, aligning them to SARS-COV2 genome. How long did it take? Compare it to the results in problem 1B.

C. Test the algorithm's exhaustiveness. Randomly select 100,000 fragments from the SARS-COV2 genome and use these fragments to query the SARS-COV2 genome using the seed-based SW you implemented in part A. How many fragments were you able to find? Now introduce random errors into your 100,000 fragments at a 5% per-base error rate (every character has a 5% chance of being changed to some other random character). Use these error-filled 100,000 fragments to query the SARS-COV2 genome again. How many fragments were you able to find?
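The fragment sampling and error injection in part C might be sketched as follows. The names `randomFragment` and `mutate` are hypothetical, the fragment length is left as a parameter because the assignment does not state it, and "some other random character" is read here as a uniform choice among the three other bases.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Cut a fragment of length `len` from a uniformly random genome position.
std::string randomFragment(const std::string& genome, std::size_t len,
                           std::mt19937& rng) {
    assert(len > 0 && len <= genome.size());
    std::uniform_int_distribution<std::size_t> start(0, genome.size() - len);
    return genome.substr(start(rng), len);
}

// Change each character independently with probability `errorRate`
// (0.05 in the assignment) to a different base.
std::string mutate(const std::string& seq, double errorRate,
                   std::mt19937& rng) {
    static const std::string kBases = "ACGT";
    std::bernoulli_distribution flip(errorRate);
    std::uniform_int_distribution<int> pick(0, 2);
    std::string out = seq;
    for (char& c : out) {
        if (!flip(rng)) continue;
        // Collect the bases that differ from the current one, then pick one.
        std::string others;
        for (char b : kBases)
            if (b != c) others += b;
        c = others[pick(rng)];
    }
    return out;
}
```

Because a substitution always picks a different base, the realized error rate equals `errorRate`; if the replacement were drawn from all four bases, a quarter of the "errors" would leave the character unchanged.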

Attachment: algorithm_Homework.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd