INM707 Deep Reinforcement Learning - City, University of London
This coursework builds on the material covered in the tutorials and lectures. On completing it, you should be able to implement and understand both classical tabular Reinforcement Learning (weeks 1-5) and Deep Reinforcement Learning (weeks 6-10) algorithms. You will make use of the different concepts learned in the module:
• How to define and implement a Reinforcement Learning (RL) problem
• How to implement solutions in Python
• How to evaluate different algorithms
The Tasks
In this coursework, you are expected to demonstrate what you have learned in the module in terms of tabular RL and Deep RL. Additionally, you have the opportunity to work on an RL-related problem of your choosing for additional marks.
The maximum number of marks that can be scored is 100. You can gain up to 20 additional marks in Task 4, but the total mark for the Report and Code is capped at 100.
In all tasks, you can use Python's built-in libraries (math, random, ...), numpy, and matplotlib. If you think that you might benefit from using another library, you can ask about it on Moodle. You will use PyTorch in Task 3, and you can use any library in Task 4.
Task 1:
You need to design and develop a tabular RL environment that follows a Markov Decision Process. In this task, the environment will provide the states used to train your RL algorithm. Your report should answer the following points:
- Description of the environment
- Description of the agent and its actions
- Description of the different dynamics of the environment (the rules of the game), as well as rewards
The environment should be non-trivial and:
- Include some stochasticity (for example, obstacles that move randomly)
- Terminate (define the termination conditions)
- Be different from the environments proposed in the labs.
In this first task, the number of different states that the agent can reach should be finite. The total number of states can be very large, but bear in mind that classical RL algorithms might not converge on very large state spaces. A good idea is therefore to parameterize the number of states (e.g. in the labs, the number of different states was dependent on the size of the environment), to make sure that your problem is solvable.
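As a purely illustrative sketch (not a template you are required to follow), the hypothetical environment below shows one way to parameterize the state count while keeping stochastic dynamics and clear termination conditions. The class name, grid layout, and reward values are assumptions made for the example only.

    import random

    class GridChase:
        """Hypothetical example: an agent on an n x n grid must reach a goal
        while a single obstacle moves randomly. The state is the pair of
        (agent, obstacle) positions, so there are (n*n)**2 states and the
        problem size is controlled by the parameter n."""

        def __init__(self, n=5):
            self.n = n
            self.reset()

        def reset(self):
            self.agent = (0, 0)
            self.goal = (self.n - 1, self.n - 1)
            self.obstacle = (self.n // 2, self.n // 2)
            return self._state()

        def _state(self):
            # Encode the (agent, obstacle) positions as a single integer index.
            a = self.agent[0] * self.n + self.agent[1]
            o = self.obstacle[0] * self.n + self.obstacle[1]
            return a * self.n * self.n + o

        def step(self, action):
            # Actions 0-3: up, down, left, right.
            moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
            dr, dc = moves[action]
            r = min(max(self.agent[0] + dr, 0), self.n - 1)
            c = min(max(self.agent[1] + dc, 0), self.n - 1)
            self.agent = (r, c)
            # Stochastic dynamics: the obstacle takes a random move each step.
            dr, dc = random.choice(moves)
            self.obstacle = (min(max(self.obstacle[0] + dr, 0), self.n - 1),
                             min(max(self.obstacle[1] + dc, 0), self.n - 1))
            # Termination: reaching the goal or colliding with the obstacle.
            if self.agent == self.goal:
                return self._state(), 10.0, True
            if self.agent == self.obstacle:
                return self._state(), -10.0, True
            return self._state(), -1.0, False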
An alternative is to use an external environment that you modify. However, the modifications should be substantial and demonstrate an effort on par with the development of a new environment.
Task 2:
The second task is about implementing classical RL algorithms. You have the choice between implementing Q-learning and Dyna-Q.
For this task, you should:
- Describe and explain the algorithm you chose.
- Conduct a case study of how this algorithm performs on the environment you implemented in Task 1.
- Evaluate the performance of your algorithm.
- You should evaluate the effect of the different hyper-parameters.
- If your environment is parameterizable, a good addition would be to evaluate how your RL algorithm performs as the environment becomes more and more complex.
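For orientation only, the following is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. It assumes an environment exposing reset() and step(action) -> (next_state, reward, done), as in the Task 1 sketch above, and the default hyper-parameter values are arbitrary placeholders.

    import random
    import numpy as np

    def q_learning(env, num_states, num_actions, episodes=5000,
                   alpha=0.1, gamma=0.99, epsilon=0.1):
        """Plain tabular Q-learning on an environment with integer states."""
        Q = np.zeros((num_states, num_actions))
        returns = []
        for _ in range(episodes):
            s, done, total = env.reset(), False, 0.0
            while not done:
                # Epsilon-greedy action selection.
                if random.random() < epsilon:
                    a = random.randrange(num_actions)
                else:
                    a = int(np.argmax(Q[s]))
                s_next, r, done = env.step(a)
                # Q-learning update: bootstrap from the greedy next-state value.
                target = r + (0.0 if done else gamma * np.max(Q[s_next]))
                Q[s, a] += alpha * (target - Q[s, a])
                s, total = s_next, total + r
            returns.append(total)
        return Q, returns

Sweeping alpha, gamma, and epsilon over several runs and plotting the per-episode returns with matplotlib is one straightforward way to address the hyper-parameter evaluation points above.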
Task 3:
The third task requires you to implement a Deep Reinforcement Learning algorithm. You cannot present DQN and its different improvements for this task, as we will cover them in the labs.
You can select any of the following algorithms to implement and evaluate:
- Policy Optimization (e.g. Proximal Policy Optimization, Asynchronous Advantage Actor-Critic, ...)
- Q-learning: Hindsight Experience Replay
- World Models
- Soft Actor Critic
You can use any Deep RL environment available online, or use a scaled-up version of the environment you developed in Task 1. For example, if the complexity of the Task 1 environment can be scaled up to the point where tabular approaches (e.g. Q-learning and Dyna-Q) can no longer perform well, then the scaled-up version is an appropriate choice. You might need to adapt your environment slightly to return observations instead of states; a minimal sketch of one such adaptation is shown below.
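The wrapper below assumes the tabular environment returns integer state indices, as in the Task 1 sketch; the wrapper name and the one-hot encoding are illustrative choices only.

    import numpy as np

    class OneHotObservation:
        """Hypothetical wrapper: turns the integer state index returned by a
        tabular environment into a fixed-length vector observation suitable
        as input to a neural network."""

        def __init__(self, env, num_states):
            self.env = env
            self.num_states = num_states

        def _obs(self, state):
            obs = np.zeros(self.num_states, dtype=np.float32)
            obs[state] = 1.0
            return obs

        def reset(self):
            return self._obs(self.env.reset())

        def step(self, action):
            state, reward, done = self.env.step(action)
            return self._obs(state), reward, done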
For this task, you should:
- Describe and explain the algorithm you chose.
- Describe the environment, if it is different from Task 1.
- Conduct a case study of how this algorithm performs on the environment you chose.
- Evaluate the performance of your algorithm.
- You should evaluate the effect of the different hyper-parameters.
- If your environment is parameterizable, a good addition would be to evaluate how your RL algorithm performs as the environment becomes more and more complex.
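To give a sense of the scale of this task, the sketch below shows the clipped surrogate loss of Proximal Policy Optimization (one of the listed options) in PyTorch, for discrete actions and vector observations. The network architecture, coefficient values, and input conventions are assumptions; a complete agent would additionally need rollout collection, advantage estimation, and an optimization loop.

    import torch
    import torch.nn as nn

    class ActorCritic(nn.Module):
        """Small shared-body actor-critic network for vector observations."""
        def __init__(self, obs_dim, num_actions, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
            self.policy = nn.Linear(hidden, num_actions)  # action logits
            self.value = nn.Linear(hidden, 1)             # state-value estimate

        def forward(self, obs):
            h = self.body(obs)
            return self.policy(h), self.value(h).squeeze(-1)

    def ppo_loss(model, obs, actions, old_log_probs, advantages, returns,
                 clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
        """Clipped surrogate objective; all inputs are assumed to be tensors
        collected from rollouts of the current policy."""
        logits, values = model(obs)
        dist = torch.distributions.Categorical(logits=logits)
        log_probs = dist.log_prob(actions)
        ratio = torch.exp(log_probs - old_log_probs)
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        value_loss = (returns - values).pow(2).mean()
        entropy = dist.entropy().mean()
        return policy_loss + value_coef * value_loss - entropy_coef * entropy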
Task 4 (optional):
For this last optional task, you can work on a topic of your choosing. It must be related to the Module, and different from Tasks 1 to 3. Note that it can constitute a continuation of Task 1, 2 or 3.
Reports
You must submit a Summary page that includes:
- Your name, student ID, and the name of your team member
- For each task, the percentage of borrowed code and references to the sources. You need to present a fair estimate, and you should state clearly if you wrote everything yourself.
- Your personal reflection
Your final report should cover each of the aspects described in this document (and any other element of your work that you believe should be reported). Graphical illustration of your results is expected, as well as numerical results.
Attachment: Deep Reinforcement Learning.rar