INM707 Deep Reinforcement Learning - City, University of London
This coursework builds on the material covered in the tutorials and lectures. On completing it, you should be able to implement and understand both classical tabular Reinforcement Learning (weeks 1-5) and Deep Reinforcement Learning (weeks 6-10) algorithms. You will make use of the different concepts learned in the module:
• How to define and implement a Reinforcement Learning (RL) problem
• How to implement solutions in Python
• How to evaluate different algorithms
The Tasks
In this coursework, you are expected to demonstrate what you have learned in the module in terms of tabular RL and Deep RL. Additionally, you have the opportunity to work on an RL-related problem of your choosing for additional marks.
The maximum number of marks that can be scored is 100. You can gain up to 20 additional marks in Task 4, but the total mark for the Report and Code is capped at 100.
In all tasks, you can use Python's built-in libraries (math, random, ...), numpy, and matplotlib. If you think that you might benefit from using another library, you can ask about it on Moodle. You will use PyTorch in Task 3, and you can use any library in Task 4.
Task 1:
You need to design and develop a tabular RL environment that follows a Markov Decision Process. In this task, the environment will provide the states used to train your RL algorithm. Your report should answer the following points:
- Description of the environment
- Description of the agent and its actions
- Description of the different dynamics of the environment (the rules of the game), as well as rewards
The environment should be non-trivial and:
- Include some stochasticity (for example, obstacles that move randomly)
- Terminate (define the termination conditions)
- Be different from the environments proposed in the labs.
In this first task, the number of different states that the agent can reach should be finite. The total number of states can be very large, but bear in mind that classical RL algorithms might not converge on very large state spaces. A good idea is therefore to parameterize the number of states (e.g. in the labs, the number of different states was dependent on the size of the environment), to make sure that your problem is solvable.
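As a purely illustrative sketch (not a template you are required to follow), the hypothetical environment below shows one way to parameterize the state count while keeping stochastic dynamics and clear termination conditions. The class name, grid layout, and reward values are assumptions made for the example only.

    import random

    class GridChase:
        """Hypothetical example: an agent on an n x n grid must reach a goal
        while a single obstacle moves randomly. The state is the pair of
        (agent, obstacle) positions, so there are (n*n)**2 states and the
        problem size is controlled by the parameter n."""

        def __init__(self, n=5):
            self.n = n
            self.reset()

        def reset(self):
            self.agent = (0, 0)
            self.goal = (self.n - 1, self.n - 1)
            self.obstacle = (self.n // 2, self.n // 2)
            return self._state()

        def _state(self):
            # Encode the (agent, obstacle) positions as a single integer index.
            a = self.agent[0] * self.n + self.agent[1]
            o = self.obstacle[0] * self.n + self.obstacle[1]
            return a * self.n * self.n + o

        def step(self, action):
            # Actions 0-3: up, down, left, right.
            moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
            dr, dc = moves[action]
            r = min(max(self.agent[0] + dr, 0), self.n - 1)
            c = min(max(self.agent[1] + dc, 0), self.n - 1)
            self.agent = (r, c)
            # Stochastic dynamics: the obstacle takes a random move each step.
            dr, dc = random.choice(moves)
            self.obstacle = (min(max(self.obstacle[0] + dr, 0), self.n - 1),
                             min(max(self.obstacle[1] + dc, 0), self.n - 1))
            # Termination: reaching the goal or colliding with the obstacle.
            if self.agent == self.goal:
                return self._state(), 10.0, True
            if self.agent == self.obstacle:
                return self._state(), -10.0, True
            return self._state(), -1.0, False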
An alternative is to use an external environment that you modify. However, the modifications should be substantial and demonstrate an effort on par with the development of a new environment.
Task 2:
The second task is about implementing classical RL algorithms. You have the choice between implementing Q-learning and Dyna-Q.
For this task, you should:
- Describe and explain the algorithm you chose.
- Conduct a case study of how this algorithm performs on the environment you implemented in Task 1.
- Evaluate the performance of your algorithm.
- You should evaluate the effect of the different hyper-parameters.
- If your environment is parameterizable, a good addition would be to evaluate how your RL algorithm performs as the environment becomes more and more complex.
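For orientation only, the following is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. It assumes an environment exposing reset() and step(action) -> (next_state, reward, done), as in the Task 1 sketch above, and the default hyper-parameter values are arbitrary placeholders.

    import random
    import numpy as np

    def q_learning(env, num_states, num_actions, episodes=5000,
                   alpha=0.1, gamma=0.99, epsilon=0.1):
        """Plain tabular Q-learning on an environment with integer states."""
        Q = np.zeros((num_states, num_actions))
        returns = []
        for _ in range(episodes):
            s, done, total = env.reset(), False, 0.0
            while not done:
                # Epsilon-greedy action selection.
                if random.random() < epsilon:
                    a = random.randrange(num_actions)
                else:
                    a = int(np.argmax(Q[s]))
                s_next, r, done = env.step(a)
                # Q-learning update: bootstrap from the greedy next-state value.
                target = r + (0.0 if done else gamma * np.max(Q[s_next]))
                Q[s, a] += alpha * (target - Q[s, a])
                s, total = s_next, total + r
            returns.append(total)
        return Q, returns

Sweeping alpha, gamma, and epsilon over several runs and plotting the per-episode returns with matplotlib is one straightforward way to address the hyper-parameter evaluation points above.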
Task 3:
The third task requires you to implement a Deep Reinforcement Learning algorithm. You cannot present DQN and its different improvements for this task, as we will cover them in the labs.
You can select any of the following algorithms to implement and evaluate:
- Policy Optimization (e.g. Proximal Policy Optimization, Asynchronous Advantage Actor-Critic, ...)
- Q-learning: Hindsight Experience Replay
- World Models
- Soft Actor Critic
You can use any Deep RL environment available online, or use a scaled-up version of the environment you developed in Task 1. For example, if the complexity of the Task 1 environment can be scaled up to the point where tabular approaches (e.g. Q-learning and Dyna-Q) can no longer perform well, then the scaled-up version is an appropriate choice. You might need to adapt your environment slightly to return observations instead of states; a minimal sketch of one such adaptation is shown below.
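The wrapper below assumes the tabular environment returns integer state indices, as in the Task 1 sketch; the wrapper name and the one-hot encoding are illustrative choices only.

    import numpy as np

    class OneHotObservation:
        """Hypothetical wrapper: turns the integer state index returned by a
        tabular environment into a fixed-length vector observation suitable
        as input to a neural network."""

        def __init__(self, env, num_states):
            self.env = env
            self.num_states = num_states

        def _obs(self, state):
            obs = np.zeros(self.num_states, dtype=np.float32)
            obs[state] = 1.0
            return obs

        def reset(self):
            return self._obs(self.env.reset())

        def step(self, action):
            state, reward, done = self.env.step(action)
            return self._obs(state), reward, done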
For this task, you should:
- Describe and explain the algorithm you chose.
- Describe the environment, if it is different from Task 1.
- Conduct a case study of how this algorithm performs on the environment you chose.
- Evaluate the performance of your algorithm.
- You should evaluate the effect of the different hyper-parameters.
- If your environment is parameterizable, a good addition would be to evaluate how your RL algorithm performs as the environment becomes more and more complex.
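To give a sense of the scale of this task, the sketch below shows the clipped surrogate loss of Proximal Policy Optimization (one of the listed options) in PyTorch, for discrete actions and vector observations. The network architecture, coefficient values, and input conventions are assumptions; a complete agent would additionally need rollout collection, advantage estimation, and an optimization loop.

    import torch
    import torch.nn as nn

    class ActorCritic(nn.Module):
        """Small shared-body actor-critic network for vector observations."""
        def __init__(self, obs_dim, num_actions, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
            self.policy = nn.Linear(hidden, num_actions)  # action logits
            self.value = nn.Linear(hidden, 1)             # state-value estimate

        def forward(self, obs):
            h = self.body(obs)
            return self.policy(h), self.value(h).squeeze(-1)

    def ppo_loss(model, obs, actions, old_log_probs, advantages, returns,
                 clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
        """Clipped surrogate objective; all inputs are assumed to be tensors
        collected from rollouts of the current policy."""
        logits, values = model(obs)
        dist = torch.distributions.Categorical(logits=logits)
        log_probs = dist.log_prob(actions)
        ratio = torch.exp(log_probs - old_log_probs)
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        value_loss = (returns - values).pow(2).mean()
        entropy = dist.entropy().mean()
        return policy_loss + value_coef * value_loss - entropy_coef * entropy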
Task 4 (optional):
For this last optional task, you can work on a topic of your choosing. It must be related to the Module, and different from Tasks 1 to 3. Note that it can constitute a continuation of Task 1, 2 or 3.
Reports
You must submit a Summary page that includes:
- Your name, student ID, and the name of your team member
- For each task, the percentage of borrowed code and references to the sources. You need to present a fair estimate, and you should state clearly if you wrote everything yourself.
- Your personal reflection
Your final report should cover each of the aspects described in this document (and any other element of your work that you believe should be reported). Graphical illustration of your results is expected, as well as numerical results.
Attachment: Deep Reinforcement Learning.rar