Problem 1: Variational Autoencoders
Consider the following two joint distributions:
$$q_{\text{data},\Phi}(x, z) = p_{\text{data}}(x)\, q_\Phi(z \mid x)$$
and
$$p_{\eta,\theta}(x, z) = p_\eta(z)\, p_\theta(x \mid z)$$
where $p_{\text{data}}(x)$ is the data density, and $q_\Phi(z \mid x)$, $p_\eta(z)$, and $p_\theta(x \mid z)$ are respectively the densities of the variational posterior, the latent variable, and the decoder, with parameters $\Phi$, $\eta$, and $\theta$.
Question 1. Show that, up to an additive constant independent of $\Phi$, $\eta$, and $\theta$, the KL divergence between $q_{\text{data},\Phi}(x,z)$ and $p_{\eta,\theta}(x,z)$ is
$$\mathbb{E}_{p_{\text{data}}(x)}\!\left[\mathcal{L}_{\text{VAE}}(x;\Phi,\eta,\theta)\right],$$
where
$$\mathcal{L}_{\text{VAE}}(x;\Phi,\eta,\theta) = D_{\text{KL}}\!\left[q_\Phi(z \mid x)\,\|\,p_\eta(z)\right] - \mathbb{E}_{q_\Phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]$$
is the negative of the Evidence Lower Bound (ELBO).
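For orientation, a sketch of the expected argument, using only the two factorizations above:

$$\begin{aligned}
D_{\text{KL}}\!\left(q_{\text{data},\Phi}\,\|\,p_{\eta,\theta}\right)
&= \mathbb{E}_{q_{\text{data},\Phi}(x,z)}\!\left[\log\frac{p_{\text{data}}(x)\,q_\Phi(z\mid x)}{p_\eta(z)\,p_\theta(x\mid z)}\right] \\
&= \mathbb{E}_{p_{\text{data}}(x)}\!\left[\log p_{\text{data}}(x)\right] + \mathbb{E}_{p_{\text{data}}(x)}\!\left[D_{\text{KL}}\!\left[q_\Phi(z\mid x)\,\|\,p_\eta(z)\right] - \mathbb{E}_{q_\Phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right]\right] \\
&= \mathbb{E}_{p_{\text{data}}(x)}\!\left[\mathcal{L}_{\text{VAE}}(x;\Phi,\eta,\theta)\right] - H(p_{\text{data}}),
\end{aligned}$$

and $-H(p_{\text{data}})$, the negative entropy of the data distribution, is the additive constant, since it does not depend on $\Phi$, $\eta$, or $\theta$.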
Question 2. Next, consider the Fisher divergence between the two distributions $q_{\text{data},\Phi}(x,z)$ and $p_{\eta,\theta}(x,z)$, defined as
![1153_Variational Autoencoders.jpg](https://secure.expertsmind.com/CMSImages/1153_Variational Autoencoders.jpg)
For any distribution p(x), define
![37_Variational Autoencoders1.jpg](https://secure.expertsmind.com/CMSImages/37_Variational Autoencoders1.jpg)
where ∇ and Δ respectively denote the gradient and the Laplacian operators. Show that
![513_Variational Autoencoders2.jpg](https://secure.expertsmind.com/CMSImages/513_Variational Autoencoders2.jpg)
with
![2475_Variational Autoencoders3.jpg](https://secure.expertsmind.com/CMSImages/2475_Variational Autoencoders3.jpg)
and C is a constant independent of Φ, η, and θ.
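The defining formulas above appear only as images; for reference, the standard Fisher divergence between the two joints, and the integration-by-parts identity that usually yields results of this shape, are sketched below. These are assumed standard forms and may differ from the images in constants or notation:

$$F\!\left(q_{\text{data},\Phi}\,\|\,p_{\eta,\theta}\right) = \mathbb{E}_{q_{\text{data},\Phi}(x,z)}\!\left[\left\|\nabla_{x,z}\log q_{\text{data},\Phi}(x,z) - \nabla_{x,z}\log p_{\eta,\theta}(x,z)\right\|_2^2\right],$$

and, by integration by parts (Hyvärinen, 2005),

$$\mathbb{E}_{q}\!\left[\left\|\nabla\log q - \nabla\log p\right\|_2^2\right] = \mathbb{E}_{q}\!\left[\left\|\nabla\log p\right\|_2^2 + 2\,\Delta\log p\right] + C,$$

where $C$ depends only on $q$, hence is independent of the parameters of $p$.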
Problem 2: Slow Feature Analysis
Consider the following 4 signals:
$$\begin{aligned}
x_1(t) &= \cos(10t) - 1 \\
x_2(t) &= \sin^2(5t) + \sin(t/3)\cos(2t/3) - 1/2 \\
x_3(t) &= \sin(2t/3)\cos(t/3) \\
x_4(t) &= \sin(5t)
\end{aligned}$$
Question 1. In the provided Python notebook template, create 1000 datapoints, equally spaced by $\delta = 0.01$, in $[0, 10)$ from the aforementioned signals. Next, implement the function
$$f_\theta(t) = \sum_{i=1}^{4} a_i\, x_i^{p_i}(t) + a_0,$$
with $\theta = (a_0, a_1, a_2, a_3, a_4, p_1, p_2, p_3, p_4)$, where $p_1, p_2, p_3, p_4 \geq 1$, $a_1, a_2, a_3, a_4$ are nonzero real numbers, and $a_0 \in \mathbb{R}$.
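A minimal sketch, in PyTorch, of the datapoints and $f_\theta$ under the assumptions above ($\delta = 0.01$; taking the absolute value inside the power is an illustrative choice to keep fractional powers of negative signal values real, and may not match the intended formulation):

```python
import torch

delta = 0.01                                  # assumed spacing: 1000 points in [0, 10)
t = torch.arange(1000) * delta                # t = 0, 0.01, ..., 9.99

# The four source signals, stacked as rows of a (4, 1000) tensor.
x = torch.stack([
    torch.cos(10 * t) - 1,
    torch.sin(5 * t) ** 2 + torch.sin(t / 3) * torch.cos(2 * t / 3) - 0.5,
    torch.sin(2 * t / 3) * torch.cos(t / 3),
    torch.sin(5 * t),
])

# Learnable parameters theta = (a0, a1..a4, p1..p4).
a0 = torch.zeros(1, requires_grad=True)
a = torch.randn(4, requires_grad=True)        # kept away from zero in practice
p = torch.ones(4, requires_grad=True)         # constrained to p_i >= 1 during training

def f_theta(x):
    """f_theta(t) = sum_i a_i * x_i(t)**p_i + a0, evaluated at every time step."""
    # abs() keeps fractional powers of negative x_i(t) real -- an assumption here.
    return (a[:, None] * x.abs() ** p[:, None]).sum(dim=0) + a0
```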
Question 2. Implement gradient descent (from scratch) to minimize the following loss:
$$L(\theta) = \frac{1}{999} \sum_{t=1}^{998} \left\| f_\theta\big((t+1)\delta\big) - f_\theta(t\delta) \right\|^2$$
subject to the constraints
$$\frac{1}{1000} \sum_{t=0}^{999} f_\theta(\delta t) = 0$$
and
$$\frac{1}{1000} \sum_{t=0}^{999} f_\theta(\delta t)^2 = 1.$$
Plot $t$ vs. $f_\theta(t)$, for 1000 equally-spaced values of $t$ in $[0, 10)$.
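Continuing the sketch above, one simple way to respect the two constraints during gradient descent is to standardize the outputs inside the loss (zero empirical mean, unit empirical variance); the assignment may intend a different scheme (e.g., Lagrange multipliers), so treat this as one reasonable choice:

```python
lr = 1e-2
for step in range(5000):
    y = f_theta(x)                              # f_theta(t * delta) for t = 0..999
    y = (y - y.mean()) / y.std(unbiased=False)  # enforce mean-0 / variance-1 constraints
    loss = ((y[1:] - y[:-1]) ** 2).mean()       # slowness: mean squared successive difference
    loss.backward()
    with torch.no_grad():
        for param in (a0, a, p):
            param -= lr * param.grad
            param.grad = None
        p.clamp_(min=1.0)                       # keep the exponent constraint p_i >= 1
```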
Problem 3: Dictionary Learning with MCP and Group LASSO
In this problem, we will explore sparse coding for the MNIST dataset. Each MNIST image $x$ is in $\mathbb{R}^{784\times 1}$ after flattening. From this, we define a dictionary $D \in \mathbb{R}^{784\times 100}$, where each column of the dictionary is a dictionary element in $\mathbb{R}^{784\times 1}$. We also define the sparse representation of $x$ to be $h \in \mathbb{R}^{100\times 1}$.
1. Initialize the dictionary $D$ with torch.nn.init.normal_(tensor, mean=0.0, std=1.0), and normalize each column to have a norm of 1 by dividing each column by its $\ell_2$ norm. Specifically, after initializing, for each column D[:, i] of the dictionary, normalize via D[:, i] = D[:, i] / ||D[:, i]||_2.
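A minimal sketch of this initialization, assuming the shapes above:

```python
import torch

D = torch.empty(784, 100)
torch.nn.init.normal_(D, mean=0.0, std=1.0)   # i.i.d. standard normal entries
D = D / D.norm(dim=0, keepdim=True)           # each column D[:, i] now has unit l2 norm
```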
2. Implement code that solves for the sparse representation $h$, using each of the LASSO, MCP, and Group LASSO methods, by using gradient descent to minimize their respective loss functions ($L_{\text{LASSO}}$, $L_{\text{MCP}}$, and $L_{\text{Group LASSO}}$). In class we gave the exact formulation for LASSO:
![2475_Variational Autoencoders4.jpg](https://secure.expertsmind.com/CMSImages/2475_Variational Autoencoders4.jpg)
We provide the formulae for MCP and Group LASSO below.
MCP:
![195_Variational Autoencoders5.jpg](https://secure.expertsmind.com/CMSImages/195_Variational Autoencoders5.jpg)
where $a$ and $\lambda$ are hyperparameters; we set $a = 2$, $\lambda = 0.1$. Next,
Group LASSO:
![592_Variational Autoencoders6.jpg](https://secure.expertsmind.com/CMSImages/592_Variational Autoencoders6.jpg)
In Group LASSO, we separate $h$ into $m = 5$ groups, i.e., $h = (h_1, h_2, \ldots, h_5)$. Each $h_t$ for $1 \leq t \leq m$ represents a group of entries from $h$. The parameter $p_t$ represents the number of entries in the $t$-th group $h_t$. In this problem, each group has the same number of entries, i.e., $p_t = 20$ for all $t \in [1, m]$. (Separate the entries of $h$ by order, i.e., $h_1$ includes the first 20 entries, $h_2$ includes the following 20 entries, etc.)
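The exact losses are given in the images above; the sketch below uses the standard textbook forms of the three penalties (reconstruction term $\frac{1}{2}\|x - Dh\|_2^2$ plus penalty), with $\lambda = 0.1$ reused for all three as a placeholder, so verify the constants against the images:

```python
import torch

def lasso_penalty(h, lam=0.1):
    return lam * h.abs().sum()

def mcp_penalty(h, lam=0.1, a=2.0):
    # Standard MCP: lam*|h| - h^2/(2a) where |h| <= a*lam, else a*lam^2/2.
    absh = h.abs()
    return torch.where(absh <= a * lam,
                       lam * absh - h ** 2 / (2 * a),
                       torch.full_like(h, a * lam ** 2 / 2)).sum()

def group_lasso_penalty(h, lam=0.1, m=5):
    # Common group LASSO convention: lam * sum_t sqrt(p_t) * ||h_t||_2.
    groups = h.view(m, -1)                    # 5 consecutive groups of 20 entries
    p_t = groups.shape[1]
    return lam * (p_t ** 0.5) * groups.norm(dim=1).sum()

def solve_h(x, D, penalty, steps=200, lr=1e-2):
    # Plain gradient descent on 0.5*||x - D h||^2 + penalty(h).
    h = torch.zeros(D.shape[1], requires_grad=True)
    for _ in range(steps):
        loss = 0.5 * (x - D @ h).pow(2).sum() + penalty(h)
        loss.backward()
        with torch.no_grad():
            h -= lr * h.grad
            h.grad = None
    return h.detach()
```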
3. Implement code that uses the projected gradient descent algorithm to solve for the update of the dictionary $D$, given the sparse representations. Perform the update in three separate cases, using the sparse representations solved for in the previous part.
4. Iterate Step 2 and Step 3 until convergence; a combined sketch of Steps 3 and 4 follows below.
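A sketch of the projected gradient update for $D$ (Step 3) and the alternation (Step 4), building on solve_h and the penalties sketched above; X (flattened images as columns, shape 784 × N) and the step counts are illustrative:

```python
import torch

def update_D(D, X, H, lr=1e-3):
    # One gradient step on 0.5*||X - D H||_F^2 w.r.t. D, followed by
    # projection of each column back onto the unit l2 sphere.
    D = D.clone().requires_grad_(True)
    loss = 0.5 * (X - D @ H).pow(2).sum()
    loss.backward()
    with torch.no_grad():
        D = D - lr * D.grad
        D = D / D.norm(dim=0, keepdim=True)   # projection: renormalize columns
    return D

# Alternate Step 2 (solve every h given D) and Step 3 (update D given the h's),
# repeating until the loss stops improving (Step 4).
for epoch in range(50):
    H = torch.stack([solve_h(x, D, lasso_penalty) for x in X.T], dim=1)
    D = update_D(D, X, H)
```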
5. Plot the validation loss of training both the dictionary $D$ and the sparse representation $h$, with respect to epochs. You should have six plots:
(a) validation loss of $D$ with LASSO,
(b) average validation loss of testing $h$'s with LASSO,
(c) validation loss of $D$ with MCP,
(d) average validation loss of testing $h$'s with MCP,
(e) validation loss of $D$ with Group LASSO, and
(f) average validation loss of testing $h$'s with Group LASSO.
6. With a dictionary $D \in \mathbb{R}^{784\times 100}$, we learn 100 dictionary elements, each in $\mathbb{R}^{784\times 1}$. With each loss, we learn a different dictionary: $D_{\text{LASSO}}$, $D_{\text{MCP}}$, and $D_{\text{Group LASSO}}$. Plot the first 10 dictionary elements learned with each loss. Specifically, reshape each of the first ten dictionary elements ($D_{\text{LASSO}}$[:, i], $D_{\text{MCP}}$[:, i], and $D_{\text{Group LASSO}}$[:, i], for $i \in \{0, 1, \ldots, 9\}$) to 28 × 28 and plot
(a) the first 10 dictionary elements of $D_{\text{LASSO}}$,
(b) the first 10 dictionary elements of $D_{\text{Group LASSO}}$, and
(c) the first 10 dictionary elements of $D_{\text{MCP}}$.
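A sketch of the reshape-and-plot step with matplotlib; the variable D_lasso and the 1×10 grid layout are illustrative choices:

```python
import matplotlib.pyplot as plt

def plot_first_10(D, title):
    fig, axes = plt.subplots(1, 10, figsize=(20, 2))
    for i, ax in enumerate(axes):
        # Column i of the dictionary, reshaped to a 28x28 image.
        ax.imshow(D[:, i].detach().reshape(28, 28), cmap="gray")
        ax.axis("off")
    fig.suptitle(title)
    plt.show()

plot_first_10(D_lasso, "First 10 dictionary elements (LASSO)")
```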
Attachment: Term.rar