Reference no: EM133659291
Assignment: Machine Learning & Artificial Intelligence for Bioinformatics
Complete this assignment in Google Colaboratory and submit the results as screenshots in a .doc or .pdf file. For each question, include the completed run of the code the question refers to along with your written answer (you may include additional code if you wish). You will need to
You are also welcome to run the TensorFlow code outside Colaboratory if you have such a setup. Note, however, that the submission needs to follow the same format, meaning code cells followed by their output as shown in Colaboratory (for example, do not submit code from the interactive Python terminal).
In preparation for the assignment, you can review the Google Colaboratory notebook posted in the last lecture. Watch the YouTube videos "Getting Started with Google CoLab | How to use Google Colab" and "Google Colab Tutorial for Beginners | Get Started with Google Colab" to become familiar with Colaboratory (feel free to watch additional videos on YouTube).
Note: You need to run cells from top to bottom (since the top code cells generate dependencies for the lower cells), so copy, paste, and run the code cells in your own Google Colaboratory notebook in the same order as in the code each question points you to. Then, as the questions request (for example, adjusting the number of epochs), edit the code in the corresponding cells and re-run each cell. If you are still unsure how this works, re-watch the tutorial videos above, along with additional videos on Google Colaboratory.
Question A
NOTE: Instead of "from keras.layers.normalization import BatchNormalization", use "from keras.layers import BatchNormalization".
Run the following code in Colaboratory. Tip: If you are logged in to your Google account, click the "Copy to Drive" button at the top. This makes a full copy of the Google Colaboratory notebook under your own account and saves you a lot of typing and copy-pasting compared with starting a new notebook and transferring everything over manually.
I. How many different types of neural networks (and what kinds) are used to classify the digits? Show the part of the code where each network is implemented.
II. Run the code with both types of neural networks it contains. Based on the metrics, which one classifies the digits better? Explain your answer and define each metric (so that you understand what each one means).
III. Try a different activation function instead of softmax in the final layer. What happens to the model predictions and metrics?
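For sub-questions II and III above, it can help to see how the final-layer activation turns raw scores into predictions, and how a metric such as accuracy is then computed from those predictions. The following is a minimal pure-Python sketch, not code from the notebook: the scores, the labels, and the helper names (softmax, sigmoid, argmax, accuracy) are made up for illustration.

```python
import math


def softmax(logits):
    """Normalize scores so the outputs are positive and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Squash each score to (0, 1) independently of the others."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def argmax(values):
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical final-layer scores for three examples and three classes.
logit_rows = [[2.0, 1.0, 0.1], [0.2, 0.3, 3.0], [1.5, 1.4, -1.0]]
y_true = [0, 2, 1]  # hypothetical true labels

softmax_pred = [argmax(softmax(row)) for row in logit_rows]
sigmoid_pred = [argmax(sigmoid(row)) for row in logit_rows]

print("softmax row sum:", sum(softmax(logit_rows[0])))
print("sigmoid row sum:", sum(sigmoid(logit_rows[0])))
print("accuracy (softmax):", accuracy(y_true, softmax_pred))
print("accuracy (sigmoid):", accuracy(y_true, sigmoid_pred))
```

On fixed scores both activations pick the same class, since each preserves the ordering of the scores; what differs is that softmax outputs sum to 1 across classes while sigmoid outputs do not. In a trained model the results may still differ, because the activation also interacts with the loss function during training.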
Question B
Run the following code in Colaboratory (you can skip the part showing the images if you wish). You will need to copy this code into a new, clean Google Colaboratory notebook.
I. Modify the number of convolutional and max pooling layers; for example, add a pair or two, or remove a layer or two:
model = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.Rescaling(1./255),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    ....
Then re-run the training with the modifications:
model.compile (
...
and also
model.fit (
train_ds,
..
What changes do you observe in the metrics? (Run it for just 3 epochs, as in the original code.)
II. Gradually increase the number of epochs (you might reach a point where training becomes too slow in Google Colaboratory). What do you observe in the metrics as the epochs increase? Is there a point where the metrics plateau?
III. In which part of the code is the dataset split into training and validation sets, and in what proportions? What is the purpose of this split?
IV. Look at the structure of the Convolutional Neural Network as specified in the code for this image classification example. What are the differences? Make those adjustments to the code you modified in I - III above, and re-run the model (use about 5 epochs). What do you observe in the model metrics?
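When adding or removing Conv2D / MaxPooling2D pairs as sub-questions I and IV above ask, keep in mind that each pair shrinks the feature map, which limits how many pairs can be stacked. The sketch below traces that shrinkage in plain Python. It assumes the Keras defaults for these layers (a 3x3 kernel with 'valid' padding and stride 1 for Conv2D, and 2x2 pooling for MaxPooling2D) and a hypothetical 180x180 input; the notebook's actual image size should be read from its own code.

```python
def conv_out(size, kernel=3):
    """Output width of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling with 'valid' padding."""
    return size // pool


def trace_sizes(input_size, n_pairs):
    """Feature-map width after each Conv2D + MaxPooling2D pair."""
    sizes = [input_size]
    size = input_size
    for _ in range(n_pairs):
        size = pool_out(conv_out(size))
        sizes.append(size)
    return sizes


# Hypothetical 180-pixel-wide input passed through four conv/pool pairs.
sizes = trace_sizes(180, 4)
print(sizes)  # [180, 89, 43, 20, 9]
```

Each pair roughly halves the width and height, so deeper stacks leave fewer spatial positions for the dense layers that follow. This is one reason the metrics and training time can change when layers are added or removed.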
Question C
Run the following code on deep learning for genomics in Google Colaboratory:
I. Describe in a couple of sentences the overall function of this neural network for bioinformatics predictions: what predictions are being made, what data are used, and what type of neural network is used? In which parts of the code can you find the answer to each of these points?
II. How many prediction classes does this neural network have, and what are these classes? In addition to finding this in the text cells, point to the parts of the actual code that demonstrate the number of prediction classes (it should be one of the final layers in the network).
III. What portions of the data are used for training, validation, and testing? Where do you see that in the code?
IV. Run the code in your Google Colaboratory notebook up to the point where the model loss / accuracy graphs are produced (including printing these graphs). What do you observe in these graphs if you modify the testing and validation portions of the dataset? You will need to re-run the cells starting from where the training / validation portions are defined, up to and including the cells that generate the graphs. Similarly, what do you observe in those graphs if you significantly reduce the number of epochs?
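For sub-questions III and IV above, two small calculations are useful: the overall fractions that result when a test set is held out first and a validation set is then taken from the remainder, and a simple way to read training versus validation curves. The sketch below is plain Python with made-up numbers. The 0.2 / 0.1 fractions, the history values, and the helper names are hypothetical, and the dictionary keys mirror those Keras typically records when accuracy is tracked, which is an assumption about the notebook.

```python
def split_fractions(test_fraction, validation_fraction_of_remainder):
    """Overall train/validation/test shares for a two-stage split."""
    remainder = 1.0 - test_fraction
    validation = remainder * validation_fraction_of_remainder
    train = remainder - validation
    return {"train": train, "validation": validation, "test": test_fraction}


def summarize_history(history):
    """Epoch (1-based) with the lowest validation loss, and the final
    gap between training and validation accuracy."""
    val_loss = history["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
    final_gap = history["accuracy"][-1] - history["val_accuracy"][-1]
    return best_epoch, final_gap


# Hypothetical fractions: hold out 20% for testing, then 10% of the rest
# for validation.
fractions = split_fractions(0.2, 0.1)
print(fractions)

# Hypothetical per-epoch values, shaped like a Keras training history.
history = {
    "loss": [0.69, 0.55, 0.40, 0.28, 0.18],
    "val_loss": [0.68, 0.58, 0.50, 0.52, 0.57],
    "accuracy": [0.55, 0.72, 0.83, 0.90, 0.95],
    "val_accuracy": [0.56, 0.70, 0.78, 0.77, 0.75],
}
best_epoch, final_gap = summarize_history(history)
print("best epoch by validation loss:", best_epoch)
print("final train/validation accuracy gap:", round(final_gap, 2))
```

In this made-up history the validation loss is lowest at epoch 3 and rises afterwards while the training loss keeps falling, a pattern usually read as overfitting. With far fewer epochs the curves may stop before the two sets of values diverge at all.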