Reference no: EM133266153
Question 1.
What problems will you encounter when very large or very small gradients arise during the backpropagation process?
Question 2.
1. In deep learning, the global optimum of the training objective often lies in quite a narrow region. List three optimization strategies for reaching the global optimum rather than a local one.
2. Explain how each of the above strategies helps.
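As background for this question, a minimal PyTorch sketch (the model here is a hypothetical placeholder) showing how two commonly cited strategies, momentum and learning-rate annealing, are configured:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)  # hypothetical placeholder model

    # Momentum helps the optimizer roll through shallow local minima;
    # annealing the learning rate lets it settle into a better optimum.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)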
Question 3.
Given a mini-batch {x_i, y_i}, i = 1, ..., n, what is the normalized batch?
Why do most neural networks include a normalization layer?
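For reference, one standard convention for the normalized batch (batch normalization, Ioffe & Szegedy, 2015), with a small constant \epsilon for numerical stability and learnable scale/shift parameters \gamma and \beta:

    \mu_B = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma_B^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_B)^2

    \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta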
Question 4.
1. Suppose you have a dropout layer in a neural model. Explain what the dropout rate is.
2. How can you choose a good dropout rate?
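For reference, a minimal PyTorch sketch (the layer sizes are illustrative) showing where the dropout rate p enters a model:

    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(p=0.5),  # dropout rate: probability of zeroing each unit during training
        nn.Linear(128, 10),
    )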
Question 5.
1. Using a specific neural network as an example, explain what vanishing/exploding gradients are.
2. How can you avoid vanishing/exploding gradients?
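A toy numerical illustration (the 30-layer chain of identical scalar weights is an assumption made purely for demonstration): backpropagation multiplies the gradient by a per-layer factor, so it vanishes when that factor is below 1 and explodes when it is above 1.

    for w in (0.5, 1.5):
        grad = 1.0
        for _ in range(30):  # gradient scaled by w at each of 30 layers
            grad *= w
        print(f"w = {w}: gradient after 30 layers = {grad:.3e}")
    # w = 0.5 -> ~9.3e-10 (vanishing); w = 1.5 -> ~1.9e+05 (exploding)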
Question 6.
1. Suppose there is an input feature map of size (128, 128, 1). Now apply 64 filters, each of size (3, 3), with a stride of 1 and no padding. What is the output feature map size?
2. Ignoring the bias parameters, how many parameters are there in the above convolution module?
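For reference, the standard size and parameter-count formulas under the usual convolution conventions, with input side i, kernel side k, padding p, stride s, and input/output channel counts c_in and c_out:

    o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1, \qquad \#\text{params (no bias)} = k_h \cdot k_w \cdot c_{\mathrm{in}} \cdot c_{\mathrm{out}}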
Question 7.
1. Suppose you have a VGG model trained on ImageNet. Write the transfer-learning steps for reusing part of the model in a face recognition task.
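A minimal sketch of one common recipe, assuming torchvision's vgg16 and a hypothetical face dataset with num_classes identities (one reasonable sequence of steps, not the only one):

    import torch.nn as nn
    from torchvision import models

    num_classes = 100  # hypothetical number of face identities

    model = models.vgg16(weights="IMAGENET1K_V1")        # 1. load ImageNet weights
    for param in model.features.parameters():
        param.requires_grad = False                      # 2. freeze the conv backbone
    model.classifier[6] = nn.Linear(4096, num_classes)   # 3. replace the final classifier head
    # 4. fine-tune the new head (and optionally the last conv block) on the face dataset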
Question 8.
1. Use a formula to explain what a residual module is.
2. List three common modules used in popular CNN models.
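For reference on part 1, the residual formulation of He et al. (2016), where F is the residual mapping learned by the module's weighted layers:

    y = F(x, \{W_i\}) + x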
Question 9.
1. List two different strategies for learning word embeddings, and give details about the input/output of the training data and the loss functions.
2. Explain what the long-term and short-term memory are in an LSTM.
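For reference on part 2, a common formulation of the LSTM state updates, where f_t, i_t, o_t are the forget, input, and output gates; the cell state c_t is usually read as the long-term memory and the hidden state h_t as the short-term memory:

    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)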
Question 10.
1. What is the difference between a standard autoencoder, a variational autoencoder (VAE), and a generative adversarial network (GAN)?
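For reference, the standard training objectives that distinguish the latter two (Kingma & Welling, 2014; Goodfellow et al., 2014): the VAE maximizes the evidence lower bound, while the GAN plays a minimax game between a generator G and a discriminator D.

    \log p(x) \ge \mathbb{E}_{q(z|x)}[\log p(x|z)] - \mathrm{KL}\big(q(z|x)\,\|\,p(z)\big)

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]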