Reference no: EM132252822
Machine Learning Homework - HMM for DNA Sequence
The goal of this assignment is for you to gain familiarity with the hidden Markov model (HMM). Specifically, you will use an HMM to decode a simple DNA sequence. A DNA sequence is a string over the four nucleotides A, C, G, T. Assume that a single hidden variable S controls the generation of the DNA sequence and that S takes one of two states, S1 or S2. Assume the following transition probabilities for the HMM:
P(S1|S1) = 0.8, P(S2|S1) = 0.2, P(S1|S2) = 0.2, P(S2|S2) = 0.8
the following emission probabilities:
P(A|S1) = 0.3, P(C|S1) = 0.2, P(G|S1) = 0.3, P(T|S1) = 0.2
P(A|S2) = 0.1, P(C|S2) = 0.4, P(G|S2) = 0.1, P(T|S2) = 0.4
and the following initial probabilities:
P(S1) = 0.5, P(S2) = 0.5
The transition, emission, and initial probabilities are together referred to as θ. In the first part of this assignment, the observation sequence is O = CGTCA, and you will manually calculate the most likely hidden state sequence using the Viterbi algorithm.
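A hand calculation of this kind can be cross-checked with a short script. The following is a minimal sketch of the standard Viterbi recursion, not a prescribed solution. The names `STATES`, `INIT`, `TRANS`, `EMIT`, and `viterbi` are illustrative choices, and the sketch assumes P(Sj|Si) denotes the probability of moving from state Si to state Sj.

```python
# Model parameters θ as given in the assignment (names are illustrative).
STATES = ["S1", "S2"]
INIT = {"S1": 0.5, "S2": 0.5}
TRANS = {  # TRANS[prev][nxt] = P(nxt | prev), an assumed reading of the notation
    "S1": {"S1": 0.8, "S2": 0.2},
    "S2": {"S1": 0.2, "S2": 0.8},
}
EMIT = {  # EMIT[state][symbol] = P(symbol | state)
    "S1": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "S2": {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4},
}


def viterbi(obs):
    """Return (best_path, best_prob, V, back) for an observation string."""
    # V[t][s]: probability of the best state path that ends in s at step t.
    V = [{s: INIT[s] * EMIT[s][obs[0]] for s in STATES}]
    # back[t][s]: predecessor of s on that best path (None at the first step).
    back = [{s: None for s in STATES}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: V[t - 1][p] * TRANS[p][s])
            V[t][s] = V[t - 1][prev] * TRANS[prev][s] * EMIT[s][obs[t]]
            back[t][s] = prev
    # Backtrack from the most probable final state.
    last = max(STATES, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, V[-1][last], V, back


if __name__ == "__main__":
    path, prob, V, back = viterbi("CGTCA")
    for t, (v_row, b_row) in enumerate(zip(V, back), start=1):
        print(t, v_row, b_row)  # one row of the V-matrix and backtracking matrix
    print("decoded:", path, "probability:", prob)
```

Each printed row corresponds to one column of the V-matrix and the backtracking matrix, so the output can be compared entry by entry with the manual work.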
In the second part of this assignment, you are provided with a new observation sequence, O = ATCG. Compute the probability of observing O and show the intermediate calculations. You may report the log-probability instead; if you do, use the natural logarithm.
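The probability of an observation sequence under an HMM is usually computed with the forward algorithm, which sums over all hidden state paths. The sketch below is one way to check the intermediate values. It assumes the forward algorithm is the intended method, and the names `forward` and `alpha` are illustrative.

```python
import math

# Model parameters θ as given in the assignment (names are illustrative).
STATES = ["S1", "S2"]
INIT = {"S1": 0.5, "S2": 0.5}
TRANS = {  # TRANS[prev][nxt] = P(nxt | prev)
    "S1": {"S1": 0.8, "S2": 0.2},
    "S2": {"S1": 0.2, "S2": 0.8},
}
EMIT = {  # EMIT[state][symbol] = P(symbol | state)
    "S1": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "S2": {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4},
}


def forward(obs):
    """Return (P(obs | θ), alpha), where alpha holds the forward variables."""
    # alpha[t][s] = P(first t + 1 symbols of obs, state at step t is s).
    alpha = [{s: INIT[s] * EMIT[s][obs[0]] for s in STATES}]
    for symbol in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            s: EMIT[s][symbol] * sum(prev[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        })
    # Summing the last row marginalizes over the final hidden state.
    return sum(alpha[-1].values()), alpha


if __name__ == "__main__":
    prob, alpha = forward("ATCG")
    for t, row in enumerate(alpha, start=1):
        print(t, row)  # intermediate forward variables
    print("P(O) =", prob, "ln P(O) =", math.log(prob))
```

The rows of `alpha` are the intermediate calculations; the final probability is the sum of the last row.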
Questions:
1. For O = CGTCA, manually calculate the most likely hidden state sequence using the Viterbi algorithm.
2. Report the decoded state sequence.
3. Show the intermediate calculations, including the V-matrix and the backtracking matrix.
4. For the new observation sequence O = ATCG, compute the probability of observing O.
5. Show the intermediate calculations for this probability.
6. If you report the log-probability instead, use the natural logarithm.
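For the log-probability, one option is to take the natural logarithm of the final probability. Another is to run the whole recursion in log space, which avoids underflow on longer sequences. The sketch below shows the second option; the helper names `logsumexp` and `log_forward` are illustrative, and for a four-symbol sequence either approach should give the same value up to rounding.

```python
import math

# Model parameters θ as given in the assignment (names are illustrative).
STATES = ["S1", "S2"]
INIT = {"S1": 0.5, "S2": 0.5}
TRANS = {  # TRANS[prev][nxt] = P(nxt | prev)
    "S1": {"S1": 0.8, "S2": 0.2},
    "S2": {"S1": 0.2, "S2": 0.8},
}
EMIT = {  # EMIT[state][symbol] = P(symbol | state)
    "S1": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "S2": {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4},
}


def logsumexp(values):
    """Compute ln(sum(exp(v))) stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(obs):
    """Return ln P(obs | θ) using the forward recursion in log space."""
    log_alpha = {
        s: math.log(INIT[s]) + math.log(EMIT[s][obs[0]]) for s in STATES
    }
    for symbol in obs[1:]:
        log_alpha = {
            s: math.log(EMIT[s][symbol])
            + logsumexp([log_alpha[p] + math.log(TRANS[p][s]) for p in STATES])
            for s in STATES
        }
    return logsumexp(list(log_alpha.values()))


if __name__ == "__main__":
    print("ln P(O) =", log_forward("ATCG"))
```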
Attachment:- Assignment File.rar