Define zero-order Markov model for sequence

Reference no: EM131008526

Question 1. Profile HMMs for sequence families

a) Define the match (M), insert (I), and delete (D) states for the multiple sequence alignment (MSA) shown in Figure 1.

b) Derive the parameters of the profile HMM for the MSA given in Figure 1:
I. Emission counts for match states
II. Emission counts for insert states
III. Counts of transitions between states
IV. Emission probabilities for match, insert, and hidden states

Figure 1. Multiple sequence alignment of five DNA sequences

T--CT-
-AA-TA
T--CTA
TC-G-A
C-CGAC

Feel free to use Durbin's Figure 5.7c format
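The counting steps in part (b) can be sketched in a few lines of Python. This is a minimal illustration and not the assignment's required solution. It assumes the common heuristic that a column is a match column when fewer than half of its entries are gaps; the state labels (B, M1, I1, D1, E) and variable names are illustrative choices, and no pseudocounts are added.

```python
from collections import Counter

# Figure 1 alignment, one row per sequence
msa = ["T--CT-", "-AA-TA", "T--CTA", "TC-G-A", "C-CGAC"]

# Assumed rule: a column is a match column when fewer than half of its
# entries are gaps; every other column is treated as an insert column.
n_cols = len(msa[0])
is_match = [sum(row[j] == "-" for row in msa) < len(msa) / 2
            for j in range(n_cols)]

match_emissions = {}     # match state index -> Counter of emitted residues
insert_emissions = {}    # insert state index -> Counter of emitted residues
transitions = Counter()  # (from_state, to_state) -> count

for row in msa:
    k = 0                # index of the last match column passed so far
    prev = "B"           # begin state
    for j, ch in enumerate(row):
        if is_match[j]:
            k += 1
            if ch == "-":
                state = f"D{k}"          # gap in a match column: delete
            else:
                state = f"M{k}"          # residue in a match column: match
                match_emissions.setdefault(k, Counter())[ch] += 1
        elif ch != "-":
            state = f"I{k}"              # residue in an insert column: insert
            insert_emissions.setdefault(k, Counter())[ch] += 1
        else:
            continue                     # gap in an insert column: no state
        transitions[(prev, state)] += 1
        prev = state
    transitions[(prev, "E")] += 1        # end state

print("match columns:", [j + 1 for j, m in enumerate(is_match) if m])
for k in sorted(match_emissions):
    print(f"M{k}", dict(match_emissions[k]))
for k in sorted(insert_emissions):
    print(f"I{k}", dict(insert_emissions[k]))
for (a, b), n in sorted(transitions.items()):
    print(f"{a}->{b}: {n}")
```

Emission and transition probabilities follow by dividing each count by the total for its state, optionally after adding pseudocounts so that unseen residues and transitions do not get probability zero.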

Question 2. Provide a 1-1.5 page review of the paper "Genome-wide genetic marker discovery and genotyping using next-generation sequencing", available under this week's course content.

Some guidelines:
- Underline the main points of the paper.
- Keep your work structured.
- While focusing on the big picture, keep in mind that our class is on statistical processes.

Question 3. Use the <HW3_solution_reviewN.pptx> file available in this week's course content to write and submit an R script which will:

a. Define the HMM for Q4 in Homework 3
b. Parse the Homework 3 Q4 sequence to show the sequence of hidden states using the Viterbi algorithm

Homework 3 Solution Question 4: (a) Define a zero-order Markov model for sequence2_A2, which represents a portion of the non-coding sequence of Mycobacterium tuberculosis (refer to course content).

Zero-order model for sequence2_A2 (548 bases in total):
Base   Count   Probability
P(A)   107     0.195255474
P(C)   156     0.284671533
P(G)   183     0.333941606
P(T)   102     0.186131387
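A zero-order Markov model is just the set of base frequencies, so the probabilities above can be reproduced from the counts. The short Python sketch below shows the arithmetic; the function name `log_likelihood` is an illustrative addition and not part of the homework.

```python
import math

# Base counts for sequence2_A2 as given in the homework solution
counts = {"A": 107, "C": 156, "G": 183, "T": 102}

total = sum(counts.values())
# Zero-order model: each base is emitted independently with its frequency
probs = {base: n / total for base, n in counts.items()}

def log_likelihood(seq, model):
    """Log-probability of seq when every base is drawn independently."""
    return sum(math.log(model[base]) for base in seq)

for base in "ACGT":
    print(f"P({base}) = {counts[base]}/{total} = {probs[base]:.9f}")
```

Working in log space, as `log_likelihood` does, avoids numerical underflow when the model is applied to long sequences.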

(b) Use the zero-order Markov models defined for sequence1_A2 and sequence2_A2 and apply the Viterbi algorithm to find the most likely path for the sequence CGCGTTACTTCAATG, without taking frame into consideration.

Assume:
Initial transition probabilities
a0c = a0n = 0.5
State transition probabilities
acc = 0.55
acn = 0.45
ann = 0.5
anc = 0.5

where aij is the transition probability from state i to state j, c = coding, n = non-coding

sequence:              CGCGTTACTTCAATG
path of hidden states: CCCCNNCCNNCCCCC
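The Viterbi recursion for this two-state model can be sketched as follows. This is a Python illustration of the algorithm, not the R script the assignment asks for. The non-coding emissions and all transition probabilities are taken from the text above, but the coding-state emissions (the sequence1_A2 model) are not given on this page, so the uniform 0.25 values below are a placeholder assumption. Replace them with the actual sequence1_A2 frequencies before comparing the result with the path shown above.

```python
import math

states = ["c", "n"]                      # c = coding, n = non-coding
start_p = {"c": 0.5, "n": 0.5}           # a0c, a0n
trans_p = {"c": {"c": 0.55, "n": 0.45},  # acc, acn
           "n": {"c": 0.5, "n": 0.5}}    # anc, ann
emit_p = {
    # PLACEHOLDER (assumption): sequence1_A2 frequencies are not in this text
    "c": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    # Zero-order model for sequence2_A2 from the homework solution
    "n": {"A": 107 / 548, "C": 156 / 548, "G": 183 / 548, "T": 102 / 548},
}

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log-probability)."""
    # Initialisation: begin-state transition plus first emission
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = []                            # back-pointers, one dict per step
    for x in seq[1:]:
        row, ptr = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this position
            best = max(states, key=lambda r: v[-1][r] + math.log(trans_p[r][s]))
            row[s] = v[-1][best] + math.log(trans_p[best][s]) + math.log(emit_p[s][x])
            ptr[s] = best
        v.append(row)
        back.append(ptr)
    # Traceback from the best final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path), v[-1][last]

seq = "CGCGTTACTTCAATG"
path, score = viterbi(seq, states, start_p, trans_p, emit_p)
print(seq)
print(path.upper())
```

Log probabilities are summed instead of multiplying raw probabilities, which keeps the scores numerically stable for longer sequences without changing which path is chosen.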

