Reference no: EM132665697

Advanced Machine Learning Assignment

Minipong


Figure 1: Four frames from our data, using pixel value +1 for the + marker and -1 for the - paddle.

In this assignment we work with data from, and a simulation of, a simple version of "pong". Two objects appear on the field: a + object acting as the "ball", and a - paddle that can take different positions, but only in the bottom row. Pixels of the two objects are represented with the values +1 and -1 respectively, while background pixels have the value 0. The two markers at the top corners are fixed (-1 and +1, respectively) and appear in every frame.

Preparation: Download the minipong.py and sprites.py Python files. The Minipong class (in minipong.py) implements the pong game simulation. Running sprites.py will create datasets of pong screenshots for your first task.

A new pong game can be created like this:
from minipong import Minipong
pong = Minipong(level=1, size=5)
Here, level sets the information an RL agent gets from the environment, and size sets the size of the game (the number of different paddle positions). Both the paddle and the + are 3 pixels wide and cannot leave the field. A game of size 5 is 15 × 15 pixels, and the ball x- and y-coordinates can take values between 1 and 13. The paddle can be in 5 different locations (0 to 4).
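As a sanity check on the geometry just described, the field dimensions follow from the size parameter alone. The helper below is our own illustration (pong_geometry is not part of minipong.py); it only restates the arithmetic given above:

```python
def pong_geometry(size):
    """Derive field dimensions from the number of paddle slots.

    Both the paddle and the + sprite are 3 pixels wide, so a game of
    `size` paddle positions spans 3 * size pixels per side.
    """
    side = 3 * size                       # field is side x side pixels
    ball_range = (1, side - 2)            # a 3-pixel sprite's centre stays off the border
    paddle_positions = list(range(size))  # paddle slot indices
    return side, ball_range, paddle_positions
```

For size=5 this reproduces the numbers in the text: a 15 × 15 field, ball coordinates between 1 and 13, and paddle slots 0 to 4.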

Task 1: Train a CNN to predict object positions

The Python program sprites.py creates a training and test set of "minipong" scenes, trainingpix.csv (676 samples) and testingpix.csv (169 samples). Each row represents a 15 × 15 screenshot (flattened in row-major order). Labels appear in separate files, traininglabels.csv and testlabels.csv. They contain 3 labels for each example (x/y/z): the x- and y-coordinates of the + marker, with values between 1 and 13, and z, between 0 and 4, for the location of the - paddle.
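A possible loading step, sketched under the assumption that sprites.py writes comma-separated values (adjust the delimiter if it does not): each 225-value row is reshaped back into a single-channel 15 × 15 image for the CNN.

```python
import numpy as np

def load_pong_split(pixels_csv, labels_csv):
    """Load one data split: rows of 225 pixels -> (N, 1, 15, 15) images, (N, 3) labels."""
    X = np.loadtxt(pixels_csv, delimiter=",")
    y = np.loadtxt(labels_csv, delimiter=",")
    # Rows were flattened in row-major order, so reshape restores the screenshot;
    # the singleton axis is the channel dimension a CNN expects.
    X = X.reshape(-1, 1, 15, 15).astype(np.float32)
    return X, y.astype(np.int64)
```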

Steps
1. Create the datasets by running the sprites.py code.
2. Create a CNN that predicts the x-coordinate of the + marker.
You can (but don't have to) use an architecture similar to what we used for classifying MNIST, but be aware that the input dimensions and outputs are different, so you will have to make at least some changes.
• You can normalise/standardise the data if it helps improve the training.
3. Create a CNN that predicts all three outputs (x/y/z) from each input.
• Compute the accuracy on the test data set.
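One possible shape for the three-output model, sketched in PyTorch (an assumption; use whatever framework the course does). Treating each coordinate as a class makes the accuracy computation direct; the layer sizes here are illustrative, not prescribed:

```python
import torch
import torch.nn as nn

class PongCNN(nn.Module):
    """Small CNN with three classification heads: x, y in {1..13}, z in {0..4}."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                          # 15x15 -> 7x7
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(),                             # -> 32 * 7 * 7 features
        )
        self.head_x = nn.Linear(32 * 7 * 7, 13)       # 13 possible x coordinates
        self.head_y = nn.Linear(32 * 7 * 7, 13)       # 13 possible y coordinates
        self.head_z = nn.Linear(32 * 7 * 7, 5)        # 5 paddle slots

    def forward(self, img):
        h = self.trunk(img)
        return self.head_x(h), self.head_y(h), self.head_z(h)
```

With classification heads like these, one natural loss (relevant to the report question below) is the sum of three cross-entropies, one per head; a regression formulation with MSE is an equally valid alternative.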

What to submit:
• Submit the python code of your solutions (two versions).
For your report, write a brief description of your steps to create the models and your prediction. What did you do? Please also include answers to the following questions:
- What loss did you use, why? What is your loss for the second model?
- For how long did you train your model (number of epochs, time taken)? What is the performance on the test set?
For all solutions: the way you try to solve tasks and your description is more important than the absolute performance of your code. If things do not work as you hope, submit your steps and describe what the specific problem is.

Task 2: Train a convolutional autoencoder (10 points)

Instead of predicting positions, create a convolutional autoencoder that compresses the pong screenshots to a small number of bytes (the encoder), and transforms them back to the original (in the decoder part).

Steps
1. Create an (undercomplete) convolutional autoencoder and train it using the training data set from the first task.
2. You can choose the architecture of the network and size of the representation h = f (x). The goal is to learn a representation that is smaller than the original, and still leads to recognisable reconstructions of the original.
3. For the encoder you can use the same architecture that you used for the first task, but you cannot use the labels for training. You can also create a completely different architecture.
4. (No programming): In theory, what would be the absolute minimal size of the hidden-layer representation that allows perfect reconstruction of the original image?
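The steps above can be sketched as follows, again assuming PyTorch. The code size and the layer shapes are one possible choice, not the required one; the Tanh output matches the fact that pixel values lie in {-1, 0, 1}:

```python
import torch
import torch.nn as nn

class PongAE(nn.Module):
    """Undercomplete conv autoencoder: 15x15 image -> small code h -> 15x15 image."""
    def __init__(self, code_size=8):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),   # 15 -> 8
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),  # 8 -> 4
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, code_size),                     # the code h = f(x)
        )
        self.dec = nn.Sequential(
            nn.Linear(code_size, 16 * 4 * 4), nn.ReLU(),
            nn.Unflatten(1, (16, 4, 4)),
            nn.ConvTranspose2d(16, 8, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),      # 4 -> 8
            nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1),     # 8 -> 15
            nn.Tanh(),                                            # pixels lie in [-1, 1]
        )

    def forward(self, x):
        return self.dec(self.enc(x))
```

Training would minimise a reconstruction loss such as MSE between input and output, using only the images (no labels).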

What to submit:
• Submit the python code of your solution.
For your report, write a brief description of your steps to create the models and your prediction. What did you do (e.g., what loss function, how big is the encoded image in your architecture, how many steps did the learning take)?
Include screenshots of 1-2 output images next to the original inputs (e.g., select a good and a bad example).

Task 3: Create an RL agent for Minipong (level 1)

The code in minipong.py provides an environment to create an agent that can be trained with reinforcement learning (a complete description at the end of this sheet). It uses the objects as described above. The following is a description of the environment dynamics:

The + marker moves one diagonal step at each step of the environment. When it hits the paddle or a wall (on the top, left, or right) it reflects.
The agent controls the - paddle by moving it one 3-pixel slot every step. The agent has three actions available: it can choose to do nothing, or it can move the paddle to the left or right. The paddle cannot be moved outside the boundaries.
The agent receives a positive reward when the + reflects from the paddle. In this case, the + may also move by 1 or 2 random pixels to the left or right.
An episode is finished when the + reaches the bottom row without reflecting from the paddle.
In a level 1 version of the game, the observed state (the information made available to the agent after each step) consists of one number: dz, the position of the + relative to the centre of the paddle: a negative number if the + is on one side, a positive one on the other.
For this task, you can initialise pong like this:
pong = Minipong(level=1, size=5)
or like this:
pong = Minipong(level=1, size=5, normalise=False)
In the first version, step() returns normalised values of dz (values between -1...1) for the state, while in the second version it returns pixel differences (-13...13).

Steps

1. Manually create a policy (no RL) that successfully plays pong, just selecting actions based on the state information. The minipong.py code contains a template that you can use and modify.
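Since the level 1 state is just dz, such a policy only needs the sign of dz. The sketch below assumes a hypothetical action coding (0 = do nothing, 1 = left, 2 = right) and that negative dz means the ball is left of the paddle centre; check minipong.py for the actual conventions:

```python
def manual_policy(dz):
    """Hand-crafted controller: move the paddle toward the ball.

    Action codes and the sign convention for dz are assumptions;
    verify both against the template in minipong.py.
    """
    if dz < 0:
        return 1   # ball to the left of the paddle centre -> move left
    if dz > 0:
        return 2   # ball to the right -> move right
    return 0       # aligned -> do nothing
```

Note this works identically for normalised dz (-1...1) and pixel differences (-13...13), because only the sign matters.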

2. Create a (tabular or deep) TD agent that learns to play pong. For choosing actions with ε-greedy action selection, set ε = 1 initially, and reduce it during your training to a minimum of 0.1.
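The ε schedule can be as simple as a multiplicative decay floored at the required minimum; the decay rate below is an arbitrary illustration, to be tuned to your training length:

```python
def epsilon_schedule(episode, eps_start=1.0, eps_min=0.1, decay=0.995):
    """Epsilon for a given episode: starts at eps_start, decays toward eps_min."""
    return max(eps_min, eps_start * decay ** episode)
```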

3. Run your training, resetting after every episode. Store the sum of rewards. After or during the training, plot the total sum of rewards per episode. This plot - the Training Reward plot - indicates the extent to which your agent is learning to improve its cumulative reward. It is your decision when to stop training. It is not required to submit a perfectly performing agent, but show how it learns.

4. After you decide the training is complete, run 50 test episodes using your trained policy, but with ε = 0.0 for all 50 episodes. Again, reset the environment at the beginning of each episode. Calculate the average over sum-of-rewards-per-episode (call this the Test-Average), and the standard deviation (the Test-Standard-Deviation). These values indicate how your trained agent performs.
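The evaluation loop might look like the sketch below. It assumes a Gym-style interface where env.reset() returns a state and env.step(action) returns (state, reward, done); check minipong.py for the actual signatures before relying on this:

```python
def run_test_episodes(env, policy, n_episodes=50):
    """Greedy evaluation (epsilon = 0), resetting before each episode.

    Returns the Test-Average and Test-Standard-Deviation over the
    per-episode sums of rewards.
    """
    returns = []
    for _ in range(n_episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            state, reward, done = env.step(policy(state))
            total += reward
        returns.append(total)
    mean = sum(returns) / len(returns)
    std = (sum((r - mean) ** 2 for r in returns) / len(returns)) ** 0.5
    return mean, std
```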
5. If you had initialised pong with pong = Minipong(level=2, size=5), the observed state would consist of 2 values: the ball y-coordinate, and the relative +-position dz from level 1. Will this additional information help or hurt the learning? (No programming required.)

Task 4: Create an RL agent for Minipong (level 3)

In a level 3 version of the game, the observed state (the information made available to the agent after each step) consists of three numbers: y, dx, dz. These are y, the ball y-coordinate; dx, the change in ball x-coordinate from the last step to now; and dz (the same as in previous levels).
For this task, you can initialise pong in two ways:
pong = Minipong(level=3, size=5)
pong = Minipong(level=3, size=5, normalise=False)
In the first version, step() returns normalised values of y and dz (values between -1...1), while in the second version these values are unnormalised. The dx values are always unnormalised (but should be -1 or 1 in most cases, except after the paddle has been hit).

Steps

1. Create a (neural-network based) RL agent that finds a policy using (all) level 3 state information. Use a discount factor γ = 0.95.

2. You can choose the algorithm (deep TD or deep policy gradient).

3. Try to train an agent that achieves a running reward > 300 (the minipong.py file has an example for how to calculate this).

4. Don't go overboard with the number of hidden layers as this will significantly increase training time. Try one hidden layer.

5. Write a description explaining how your approach works, and how it performs. If some (or all) of your attempts are unsuccessful, also describe some of the things that did not work, and which changes made a difference.
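Following the advice in step 4, the network can stay very small. A deep TD (Q-learning) variant could use something like the sketch below, mapping the three-number state to one value per action; this is one option among several (a policy-gradient network would end in action probabilities instead), and the hidden size is a guess to tune:

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """One hidden layer mapping the level 3 state (y, dx, dz) to 3 action values."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),  # 3 state inputs: y, dx, dz
            nn.Linear(hidden, 3),             # one Q-value per action
        )

    def forward(self, state):
        return self.net(state)
```

TD targets for this network would use the required discount factor, e.g. target = r + 0.95 * max_a Q(s', a) for non-terminal steps.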

Attachment: Advanced Machine Learning Assignment.rar
