Calculate the most likely hidden state sequence

Reference no: EM132252822

Machine Learning Homework - HMM for DNA Sequence

The goal of this assignment is for you to gain familiarity with the hidden Markov model (HMM). Specifically, you will use an HMM to decode a simple DNA sequence. A DNA sequence is a series of symbols drawn from the alphabet {A, C, G, T}. Assume there is one hidden variable S that controls the generation of the DNA sequence, and that S takes two possible states, S1 and S2. Assume the following transition probabilities for the HMM:

P(S1|S1) = 0.8, P(S2|S1) = 0.2, P(S1|S2) = 0.2, P(S2|S2) = 0.8

the following emission probabilities:

P(A|S1) = 0.3, P(C|S1) = 0.2, P(G|S1) = 0.3, P(T|S1) = 0.2

P(A|S2) = 0.1, P(C|S2) = 0.4, P(G|S2) = 0.1, P(T|S2) = 0.4

and the following initial probabilities:

P(S1) = 0.5, P(S2) = 0.5

The transition, emission, and initial probabilities are together referred to as θ. In the first part of this assignment, the observation sequence is O = CGTCA, and you will manually calculate the most likely hidden state sequence using the Viterbi algorithm.
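
A hand calculation of this kind can be cross-checked with a short script. The following is a minimal Python sketch of the Viterbi recursion, assuming the parameters θ given above; the names `STATES`, `INIT`, `TRANS`, `EMIT`, and `viterbi` are illustrative and not part of the assignment. It returns the V-matrix and the backtracking matrix so each entry can be compared with the manual work.

```python
# Model parameters (theta) copied from the problem statement.
STATES = ["S1", "S2"]
INIT = {"S1": 0.5, "S2": 0.5}
TRANS = {  # TRANS[r][s] = P(s | r): probability of moving from state r to state s
    "S1": {"S1": 0.8, "S2": 0.2},
    "S2": {"S1": 0.2, "S2": 0.8},
}
EMIT = {  # EMIT[s][x] = P(x | s): probability that state s emits symbol x
    "S1": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "S2": {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4},
}


def viterbi(obs):
    """Return (best_path, best_prob, V, back) for an observation string."""
    # Initialization: V[0][s] = P(s) * P(obs[0] | s); no predecessor yet.
    V = [{s: INIT[s] * EMIT[s][obs[0]] for s in STATES}]
    back = [{s: None for s in STATES}]

    # Recursion: keep only the best predecessor for each state at each step.
    for symbol in obs[1:]:
        prev = V[-1]
        row, ptr = {}, {}
        for s in STATES:
            # max() returns the first maximizer, so ties resolve to S1.
            best_prev = max(STATES, key=lambda r: prev[r] * TRANS[r][s])
            row[s] = prev[best_prev] * TRANS[best_prev][s] * EMIT[s][symbol]
            ptr[s] = best_prev
        V.append(row)
        back.append(ptr)

    # Termination and backtracking through the stored pointers.
    last = max(STATES, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, V[-1][last], V, back


if __name__ == "__main__":
    path, prob, V, back = viterbi("CGTCA")
    for t, (v_row, b_row) in enumerate(zip(V, back), start=1):
        print(t, v_row, b_row)
    print(path, prob)
```

For longer sequences the products underflow, so a practical version would add log-probabilities instead of multiplying; for five symbols plain multiplication is sufficient.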

In the second part of this assignment, you are provided with a new observation sequence, O = ATCG. Compute the probability of observing O, showing intermediate calculations. You may report the log-probability instead; if you do, use the natural logarithm.
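
The assignment does not name a method for this part. One standard choice, assumed here, is the forward algorithm, which sums over all hidden state paths instead of taking the maximum. The sketch below uses the same parameters θ; the names `forward` and `alpha` are illustrative. It returns the full table of forward variables so the intermediate values can be reported.

```python
import math

# Model parameters (theta) copied from the problem statement.
STATES = ["S1", "S2"]
INIT = {"S1": 0.5, "S2": 0.5}
TRANS = {  # TRANS[r][s] = P(s | r)
    "S1": {"S1": 0.8, "S2": 0.2},
    "S2": {"S1": 0.2, "S2": 0.8},
}
EMIT = {  # EMIT[s][x] = P(x | s)
    "S1": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "S2": {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4},
}


def forward(obs):
    """Return (P(O), ln P(O), alpha) for an observation string."""
    # alpha[0][s] = P(s) * P(obs[0] | s)
    alpha = [{s: INIT[s] * EMIT[s][obs[0]] for s in STATES}]

    # alpha[t][s] = P(obs[t] | s) * sum over r of alpha[t-1][r] * P(s | r)
    for symbol in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            s: EMIT[s][symbol] * sum(prev[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        })

    # P(O) is the sum of the final forward variables over all states.
    prob = sum(alpha[-1].values())
    return prob, math.log(prob), alpha  # math.log is the natural logarithm


if __name__ == "__main__":
    prob, log_prob, alpha = forward("ATCG")
    for t, row in enumerate(alpha, start=1):
        print(t, row)
    print(prob, log_prob)
```

The only change from the Viterbi recursion is that `sum` replaces `max`, which is why no backtracking matrix is needed here.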

Questions -

1. Manually calculate the most likely hidden state sequence for O = CGTCA using the Viterbi algorithm.

2. Report the decoded state sequence, together with intermediate calculations, including the V-matrix and backtracking matrix.

3. For the new observation sequence O = ATCG, compute the probability of observing O, together with intermediate calculations.

4. If you report the log-probability, use the natural logarithm.

Attachment:- Assignment File.rar



Note: Homework modified from Eric Xing at Carnegie Mellon. (100 points) Please submit a report named report first name lastname.pdf. Report the decoded state sequence (20 points), together with intermediate calculations, including the V-matrix (40 points) and backtracking matrix (40 points). (25 bonus points) In the second part of this assignment, you are provided with a new observation sequence, O = ATCG. Compute the probability of observing O, together with intermediate calculations. If you would like to report the log-probability, that also works; use the natural logarithm.

