What is the full likelihood of observed and latent variables

Reference no: EM131096878

10-701 Machine Learning - Spring 2012 - Problem Set 5

Q1. Hidden Markov Model

A Hidden Markov Model (HMM) is an instance of the state space model in which the latent variables are discrete. Let $K$ be the number of hidden states. We use the following notation: $x$ are the observed variables and $z$ are the hidden state variables (we use a 1-of-$K$ representation: $z_k = 1$, $z_{j \neq k} = 0$ means the hidden state is $k$). The transition probabilities are given by a $K \times K$ matrix $A$, where $A_{jk} = p(z_{n,k} = 1 \mid z_{n-1,j} = 1)$, and the distribution of the initial state variable $z_1$ is given by a vector of probabilities $\pi$: $p(z_1 \mid \pi) = \prod_{k=1}^{K} \pi_k^{z_{1k}}$. Finally, the emission distribution for a hidden state $k$ is parametrized by $\phi_k$: $p(x_n \mid \phi_k)$. Let $\Theta = \{A, \pi, \phi\}$.

1.1 The full likelihood of a data set

If we have a data set X = {x₁, . . . , x_N}:

1. What is the full likelihood of observed and latent variables: p(X, Z|Θ)? Note Z = {z₁, . . . , z_N} are the hidden states of the corresponding observations.

2. What is the likelihood of the data set, i.e., p(X|Θ)?
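The two quantities can be compared on a toy model. The following is a minimal sketch, assuming the standard HMM factorization (initial probability, then one transition and one emission factor per step) and discrete emissions stored in a hypothetical table `phi[k][x]`. All parameter values are made up for illustration. The marginal is computed by brute-force enumeration of every hidden path, which is only feasible for very short sequences.

```python
from itertools import product

# Hypothetical toy HMM with K = 2 hidden states and binary observations.
pi = [0.6, 0.4]                  # pi[k] = p(z_1 = k)
A = [[0.7, 0.3], [0.2, 0.8]]     # A[j][k] = p(z_n = k | z_{n-1} = j)
phi = [[0.9, 0.1], [0.3, 0.7]]   # phi[k][x] = p(x_n = x | z_n = k)


def joint_likelihood(xs, zs):
    """p(X, Z | Theta) for one observation sequence xs and one hidden path zs."""
    p = pi[zs[0]] * phi[zs[0]][xs[0]]
    for n in range(1, len(xs)):
        p *= A[zs[n - 1]][zs[n]] * phi[zs[n]][xs[n]]
    return p


def marginal_likelihood_brute_force(xs):
    """p(X | Theta) obtained by summing the joint over all K**N hidden paths."""
    K = len(pi)
    return sum(joint_likelihood(xs, zs)
               for zs in product(range(K), repeat=len(xs)))


xs = [0, 1, 1]
print("p(X, Z) for Z = (0, 0, 1):", joint_likelihood(xs, (0, 0, 1)))
print("p(X):", marginal_likelihood_brute_force(xs))
```

The enumeration visits K**N paths, so its cost grows exponentially with the sequence length even when each joint evaluation is cheap.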

1.2 Expectation-Maximization (EM) for Maximum Likelihood Learning

We'd like to derive formulas for estimating A and φ to maximize the likelihood of the data set p(X|Θ).

1. Assuming we can compute p(X, Z|Θ) in O(1) time, what is the time complexity of computing p(X|Θ)?

We use the EM algorithm for this task:

- In the E step, we take the current parameter values and compute the posterior distribution of the latent variables p(Z|X, Θ^old).

- In the M step, we find the new parameter values by solving an optimization problem:

$$\Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^{old}) \tag{1}$$

where

$$Q(\Theta, \Theta^{old}) = \sum_{Z} p(Z \mid X, \Theta^{old}) \ln p(X, Z \mid \Theta) \tag{2}$$

2. Show that

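\begin{align}
Q(\Theta, \Theta^{old}) = {} & \sum_{k=1}^{K} \gamma(z_{1k}) \ln \pi_k + \sum_{n=2}^{N} \sum_{j=1}^{K} \sum_{k=1}^{K} \xi(z_{n-1,j}, z_{nk}) \ln A_{jk} \tag{3} \\
& + \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \ln p(x_n \mid \phi_k) \tag{4}
\end{align}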

where

$$\gamma(z_{nk}) = \mathbb{E}_{p(z_n \mid X, \Theta^{old})}[z_{nk}] \tag{5}$$

$$\xi(z_{n-1,j}, z_{nk}) = \mathbb{E}_{p(z_{n-1}, z_n \mid X, \Theta^{old})}[z_{n-1,j}\, z_{nk}] \tag{6}$$

Show your derivations.

3. Show that

$$p(X \mid z_{n-1}, z_n) = p(x_1, \dots, x_{n-1} \mid z_{n-1})\, p(x_n \mid z_n)\, p(x_{n+1}, \dots, x_N \mid z_n) \tag{7}$$

4. In class, we discussed how to compute:

$$\alpha(z_n) = p(x_1, \dots, x_n, z_n) \tag{8}$$

$$\beta(z_n) = p(x_{n+1}, \dots, x_N \mid z_n) \tag{9}$$

Show that

\begin{align}
\xi(z_{n-1}, z_n) &= p(z_{n-1}, z_n \mid X) \tag{10} \\
&= \frac{\alpha(z_{n-1})\, p(x_n \mid z_n)\, p(z_n \mid z_{n-1})\, \beta(z_n)}{p(X)} \tag{11}
\end{align}

How would you compute p(X)?

5. Show how to compute $\gamma(z_{nk})$ and $\xi(z_{n-1,j}, z_{nk})$ using $\alpha(z_n)$, $\beta(z_n)$ and $\xi(z_{n-1}, z_n)$.

6. Show that if any elements of the parameters π or A for a hidden Markov model are initially set to 0, then those elements will remain zero in all subsequent updates of the EM algorithm.
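Items 4 to 6 can be checked numerically on a small example. The following is a minimal sketch, not a reference solution. It assumes discrete emissions stored in a hypothetical table `phi[k][x]`, uses the unscaled α and β recursions (which underflow on long sequences), and assumes the usual M-step update in which A_jk is proportional to the sum over n of ξ(z_n-1,j, z_nk). The toy parameters are made up, and A[0][1] is deliberately set to 0 to illustrate item 6.

```python
def forward_backward(xs, pi, A, phi):
    """Return alpha, beta, p(X), gamma and xi for a discrete-emission HMM."""
    N, K = len(xs), len(pi)
    # alpha[n][k] = p(x_1..x_n, z_n = k)
    alpha = [[pi[k] * phi[k][xs[0]] for k in range(K)]]
    for n in range(1, N):
        alpha.append([phi[k][xs[n]] * sum(alpha[n - 1][j] * A[j][k] for j in range(K))
                      for k in range(K)])
    # beta[n][k] = p(x_{n+1}..x_N | z_n = k); beta at the last step is 1
    beta = [[1.0] * K for _ in range(N)]
    for n in range(N - 2, -1, -1):
        for j in range(K):
            beta[n][j] = sum(A[j][k] * phi[k][xs[n + 1]] * beta[n + 1][k]
                             for k in range(K))
    px = sum(alpha[-1])  # p(X) = sum over k of alpha(z_N = k)
    gamma = [[alpha[n][k] * beta[n][k] / px for k in range(K)] for n in range(N)]
    # xi[n-1][j][k] = p(z_{n-1} = j, z_n = k | X), following equation (11)
    xi = [[[alpha[n - 1][j] * phi[k][xs[n]] * A[j][k] * beta[n][k] / px
            for k in range(K)] for j in range(K)] for n in range(1, N)]
    return alpha, beta, px, gamma, xi


def update_A(xi, K):
    """Assumed M-step: A_jk = sum_n xi_n(j, k) / sum_n sum_l xi_n(j, l)."""
    A_new = []
    for j in range(K):
        counts = [sum(step[j][k] for step in xi) for k in range(K)]
        total = sum(counts)
        A_new.append([c / total for c in counts])
    return A_new


# Hypothetical toy model: 2 hidden states, binary observations, A[0][1] = 0.
pi = [0.5, 0.5]
A = [[1.0, 0.0], [0.4, 0.6]]
phi = [[0.8, 0.2], [0.3, 0.7]]
xs = [0, 1, 1, 0, 1]

alpha, beta, px, gamma, xi = forward_backward(xs, pi, A, phi)
A_new = update_A(xi, K=2)
print("p(X) =", px)
print("A_new =", A_new)
```

Because every ξ term carries the old A_jk as a factor, the re-estimated `A_new[0][1]` is again 0 in this run, which is the behaviour item 6 asks you to prove in general.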

1.3 A coin game

Two students X and Y from Cranberry Lemon University play a stochastic game with a fair coin. X and Y take turns, with X going first. All the coin flips are recorded, and the game finishes when the sequence THT first appears. The player who made the last flip is the winner. A player may flip the coin several times in one turn, as follows. On his turn, each time X flips the original coin, he also flips an extra biased coin (p(H) = 0.3). He stops only if the extra coin lands heads; otherwise he flips the original and extra coins again, and so on. (The flips of this extra coin are not recorded.) Y, on his turn, flips the coin until T appears. (All of his flips are recorded.)

You are given a sequence of recorded coin flips and would like to infer the winner as well as the flips of each player.

1. Describe an HMM to model this game.

2. How would you use this HMM to infer the (most probable) winner and the (most probable) flips of each player?
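Finding the most probable hidden sequence in an HMM is usually done with the Viterbi algorithm. The following is a minimal sketch of that technique on a made-up two-state model. It is not a model of the coin game: the states, the parameters, and the emission table `phi[k][x]` are all hypothetical, and the code works with raw probabilities rather than logs, so it is only suitable for short sequences.

```python
def viterbi(xs, pi, A, phi):
    """Return the most probable hidden path and its joint probability p(X, Z)."""
    N, K = len(xs), len(pi)
    # delta[k] = probability of the best path that ends in state k at the current step
    delta = [pi[k] * phi[k][xs[0]] for k in range(K)]
    backpointers = []
    for n in range(1, N):
        best_prev, new_delta = [], []
        for k in range(K):
            j_best = max(range(K), key=lambda j: delta[j] * A[j][k])
            best_prev.append(j_best)
            new_delta.append(delta[j_best] * A[j_best][k] * phi[k][xs[n]])
        backpointers.append(best_prev)
        delta = new_delta
    # Trace the best path backwards from the best final state.
    last = max(range(K), key=lambda k: delta[k])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical toy model with 2 hidden states and binary observations.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
phi = [[0.9, 0.1], [0.3, 0.7]]
xs = [0, 0, 1, 1, 0]

path, prob = viterbi(xs, pi, A, phi)
print("most probable path:", path)
print("joint probability:", prob)
```

In the coin game, the hidden state would have to carry whatever is needed to tell whose turn it is, so that the decoded path reveals both the owner of each flip and the player who made the last one.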

Q2. Dimensionality Reduction

2.1 Singular value decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real matrix X as:

X = USV^T (12)

If the dimension of X is m × n, where without loss of generality m ≥ n, then U is an m × n matrix, S is an n × n diagonal matrix, and V^T is also an n × n matrix. Furthermore, U and V have orthonormal columns: U^T U = I and V^T V = I.
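A short worked identity shows why the factors of an SVD are tied to eigenvectors. It assumes only that $U$ and $V$ have orthonormal columns ($U^T U = I$, $V^T V = I$) and that $S$ is diagonal, so $S^T = S$:

```latex
\begin{aligned}
X^T X &= (U S V^T)^T (U S V^T) = V S \, U^T U \, S V^T = V S^2 V^T, \\
X X^T &= (U S V^T)(U S V^T)^T = U S \, V^T V \, S U^T = U S^2 U^T .
\end{aligned}
```

Multiplying the second line on the right by $U$ gives $X X^T U = U S^2$, so each column of $U$ is an eigenvector of $X X^T$ whose eigenvalue is the corresponding squared singular value.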

2.2 PCA and SVD

Consider a dataset of observations $\{x_n\}$ where $n = 1, \dots, N$. We assume that the examples are zero-centered such that $\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n = 0$. The PCA algorithm computes the covariance matrix:

$$S = \frac{1}{N} \sum_{n=1}^{N} x_n x_n^T \tag{13}$$

The principal components {u_i} are eigenvectors of S.

Let X = [x₁, . . . , x_N] be a D × N matrix in which each column is one example x_n. If X = US'V^T is an SVD of X,

1. Show that the principal components {u_i} are columns of U. This shows the relationship between PCA and SVD.

2. When the number of dimensions is much larger than the number of data points (D >> N), is it better to do PCA by using the covariance matrix or using SVD?

3. Consider the following data set:

where ε is a tiny number. Each column is one example. First zero-center the data set and then do PCA using two techniques: 1) using the covariance matrix and 2) using SVD. What do you observe? Hint: What is the "dimension" of this data set? You can use Matlab; try ε = 1e-10. Which technique returns a sensible result?
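One plausible reason the two techniques can disagree is floating-point rounding: forming a covariance matrix squares the data, and ε² = 1e-20 is lost when added to 1 in double precision. The sketch below illustrates only this effect. The matrix `X` is a hypothetical example and not the assignment's data set, and centering and the 1/N factor are left out to keep the arithmetic visible.

```python
eps = 1e-10

# Hypothetical 2 x 3 matrix (NOT the assignment's data): each column is one example.
X = [[1.0, eps, 0.0],
     [1.0, 0.0, eps]]


def outer_product_sum(X):
    """Return X X^T, the D x D matrix a covariance-based PCA would diagonalize."""
    D, N = len(X), len(X[0])
    return [[sum(X[i][n] * X[j][n] for n in range(N)) for j in range(D)]
            for i in range(D)]


C = outer_product_sum(X)
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]

print("eps squared is still representable:", eps * eps)
print("but 1 + eps**2 == 1 in floating point:", 1.0 + eps * eps == 1.0)
print("X X^T as computed:", C)
print("determinant as computed:", det)
```

In exact arithmetic X X^T has eigenvalues 2 + ε² and ε², so the singular values of X are sqrt(2 + ε²) and ε. Once the products are rounded, the computed matrix is exactly singular and the ε direction is gone, whereas an SVD applied directly to X never forms ε².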

Q3. Markov Decision Process

1. A standard MDP is described by a set of states S, a set of actions A, a transition function T, and a reward function R, where T(s, a, s') gives the probability of transitioning to s' after taking action a in state s, and R(s) gives the immediate reward of being in state s. A k-order MDP is described in the same way with one exception: the transition function T depends on the current state s and also on the previous k - 1 states. That is, T(s_{k-1}, . . . , s_1, s, a, s') = p(s' | a, s, s_1, . . . , s_{k-1}) gives the probability of transitioning to state s' given that action a was taken in state s and the previous k - 1 states were (s_{k-1}, . . . , s_1).

Given a k-order MDP M = (S, A, T, R), describe how to construct a standard (first-order) MDP M' = (S', A', T', R') that is equivalent to M. Here, equivalent means that a solution to M' can be easily converted into a solution to M. Be sure to describe S', A', T', and R'. Give a brief justification for your construction.
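One common way to approach this kind of reduction is to fold the recent history into the state. The following is a minimal sketch of that idea, offered as an illustration under stated assumptions rather than as the expected answer: augmented states are tuples of the last k original states (oldest first), the action set is unchanged, and the function names and the toy 2-order MDP are hypothetical. It also ignores how to pad the history during the first k - 1 steps.

```python
from itertools import product


def lift_to_first_order(states, k, T, R):
    """Return (S', T', R') of a first-order MDP built from a k-order MDP.

    T(history, a, s_next) is the k-order transition probability, where
    history = (s_{k-1}, ..., s_1, s) lists the last k states, oldest first.
    """
    S_prime = list(product(states, repeat=k))  # one augmented state per history

    def T_prime(h, a, h_next):
        # A history can only move to its own one-step shift.
        if h_next[:-1] != h[1:]:
            return 0.0
        return T(h, a, h_next[-1])

    def R_prime(h):
        return R(h[-1])  # reward of the most recent original state

    return S_prime, T_prime, R_prime


# Hypothetical 2-order MDP on states {0, 1} with actions "stay" and "flip".
ACTIONS = ["stay", "flip"]


def T(history, a, s_next):
    prev, cur = history
    target = 1 - cur if a == "flip" else cur
    p = 0.9 if prev == cur else 0.6  # the previous state changes how reliable the action is
    return p if s_next == target else 1.0 - p


def R(s):
    return float(s)


S_prime, T_prime, R_prime = lift_to_first_order([0, 1], 2, T, R)
print("augmented states:", S_prime)
print("T'((0, 1), 'flip', (1, 0)) =", T_prime((0, 1), "flip", (1, 0)))
```

The augmented transition depends only on the current tuple and the action, which is the first-order Markov property; the price is a state space of size |S|^k.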

2. The Q-learning update rule for deterministic MDPs is as follows:

$$Q(s, a) \leftarrow R(s, a) + \gamma \max_{a'} Q(s', a') \tag{15}$$

where s' = f(s, a) is the state reached by taking action a in state s. Prove that Q-learning converges in deterministic MDPs.
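The update rule can be watched converging on a tiny example before attempting the proof. The following is a minimal sketch, assuming a hypothetical three-state chain with made-up rewards and γ = 0.5. It sweeps over every state-action pair in a fixed order instead of sampling experience, which is one simple way to ensure every pair keeps being updated.

```python
GAMMA = 0.5
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]


def f(s, a):
    """Deterministic transition on a 3-state chain (hypothetical example)."""
    return min(s + 1, 2) if a == "right" else max(s - 1, 0)


def R(s, a):
    """Reward 1 for any move that lands in state 2, otherwise 0."""
    return 1.0 if f(s, a) == 2 else 0.0


def q_learning(sweeps):
    """Apply the deterministic Q-learning update to every (s, a) pair, sweep after sweep."""
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s in STATES:
            for a in ACTIONS:
                s_next = f(s, a)
                Q[(s, a)] = R(s, a) + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    return Q


Q = q_learning(sweeps=60)
for key in sorted(Q):
    print(key, round(Q[key], 6))
```

Each full sweep shrinks the largest error in the table by at least a factor of γ, which is the contraction behaviour a convergence proof would need to establish in general.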
