Reference no: EM132372275
Artificial and Computational Intelligence Assignment - Project: Investigating Reinforcement Learning
Overview - Within SIT215 you have been learning about a range of problems that can be solved using techniques from artificial and computational intelligence. This study has included coverage of both models and algorithms suitable for AI and CI solutions. A particular limitation of all the solutions we have considered is that they are either designed by hand or rely on the problem being formulated as an optimisation task.
In this project you are going to explore an advanced technique for solving many interesting and challenging real-world problems: one in which an agent learns a solution to a problem through interaction with its environment, and through perception of a reinforcement, or feedback, signal. This field is called, naturally, reinforcement learning (RL). RL can also be seen as an online method for solving Markov Decision Problems - as opposed to the offline methods of policy iteration, value iteration and dynamic programming presented in lectures (in weeks 9 and 10).
This project will require you to undertake self-directed study and learning of RL solution methods, building upon topics and content covered in the first 10 weeks of this course. While this might seem daunting (not being told how to solve the problem), you've been practising this approach throughout the unit in the group-based PBL tasks, and so this is your chance to demonstrate individually what you've learned about problem-solving methodology.
Learning Objectives - This project addresses:
- Design and implement software artefacts to demonstrate effectiveness and efficiency of solutions for intelligent systems development
- Apply theoretical concepts and models to explain and communicate the design of intelligent systems.
Specifically, these are addressed through achievement of the following task-specific learning objectives:
- Demonstrate ability to work with and extend software systems and frameworks for RL.
- Describe and model RL problems using specific concepts and models.
- Implement, evaluate and analyse the performance of different solutions on a range of RL problems.
- Effectively communicate the process and outcomes of your research and development project.
Preparatory Learning Activities -
In order to complete this assessment task you will need to have first developed an understanding of a range of topics covered in this unit in weeks 1 to 10. Given the assessment deadline, this may require you to complete independent study of these topics prior to their presentation in lectures. The topics that you will need to be familiar with are:
- Bayesian AI (working with probabilistic representations of uncertainty).
- State Space Search (understanding state space representations of systems).
- Normative Decision Theory (definitions of rational action, utility, intertemporal utility, payoff/reward).
- Markov Decision Problems (representing sequential decision problems for agents acting in complex domains, reward processes and finite horizon decision problems, optimal policies).
- Dynamic Programming (optimal solutions to sequential decision problems under specified constraints).
Strictly speaking, you will be able to complete this assessment task without a sound theoretical grounding in each of these areas. However, some knowledge of these areas, and an understanding of how they inter-relate, will make it far easier to understand learning materials on reinforcement learning, and far easier to explain and describe your investigations and outcomes in this project. Our advice is to use this project as a basis for further study of these underlying areas, helping to integrate the knowledge covered in this unit into a meaningful 'whole' that supports completing this assessment task.
Task Requirements - This project will require you to use the OpenAI Gym environment for experimenting with reinforcement learning tasks. You should start by reviewing the website for the Gym.
To complete this project, you need to complete the following requirements and sub-tasks.
1. Read the relevant documentation for installing OpenAI Gym.
2. Read and complete the tutorial ensuring that you can reproduce all steps discussed.
3. Write a brief report (2-3 pages at most) on the Taxi problem, including a mathematical description of the reinforcement learning problem and the Q-learning algorithm for its solution. To do this, you may want to refer to a good textbook on reinforcement learning. A good starting point is the "bible" of RL: "Reinforcement Learning: An Introduction" by Sutton & Barto. You can find this book online as a free PDF download. There's even a 2nd edition draft completed just this year. In your report you should contrast the quality of the solution found by a random policy with that of the "optimal" policy obtained by Q-learning.
4. Complete the tutorial (attached) to explore the Cart-Pole environment in the Gym. In this case, implement a random policy and Q-learning. It's not essential that you attempt the policy gradient method, but you might like to try it.
5. Extend your report to cover briefly the Cart-Pole problem, highlighting any differences with the Taxi problem. Compare performance of Q-learning on both of these problems, presenting evidence (such as graphs) to support your evaluation.
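As a concrete reference point for steps 3 and 5, the tabular Q-learning loop can be sketched as below. The toy corridor environment, reward values and hyperparameters are illustrative stand-ins chosen here, not part of the assignment; the environment simply mimics the classic Gym `reset`/`step` signature so the same loop applies unchanged to `gym.make("Taxi-v3")`.

```python
import random

class ToyChainEnv:
    """A 5-state corridor standing in for a Gym environment, using the
    classic Gym interface (reset() -> state; step(a) -> (state, reward,
    done, info)). Swap in gym.make("Taxi-v3") for the real problem."""
    n_states, n_actions = 5, 2   # actions: 0 = left, 1 = right

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else self.s + 1
        done = self.s == self.n_states - 1
        reward = 10.0 if done else -1.0   # step penalty, goal bonus
        return self.s, reward, done, {}

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q-table: one row per state, one column per action, initialised to 0
    Q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda x: Q[s][x])
            s2, r, done, _ = env.step(a)
            # Q-learning update: bootstrap on the greedy next-state value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning(ToyChainEnv())
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(greedy)  # -> [1, 1, 1, 1]: the learned policy heads right to the goal
```

Running the same loop with a purely random action choice (skip the greedy branch) gives the random-policy baseline the report asks you to contrast against; plotting per-episode return for both is a simple way to generate the supporting evidence.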
If you've reached this point and created a good report that details what you've learned, you've met the minimum requirements for this assessment task. Assuming a reasonable quality of report and evidence, you can expect to earn a credit grade. Continue on to achieve a higher grade.
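One practical point worth noting for step 4: unlike Taxi, Cart-Pole's observations are continuous (cart position and velocity, pole angle and angular velocity), so a tabular Q-learning agent must first discretise them. A minimal bucketing sketch follows; the bounds and bucket counts are illustrative choices, not prescribed values.

```python
# Cart-Pole observations are continuous 4-vectors. Tabular Q-learning
# needs discrete states, so bucket each dimension. Bounds and bucket
# counts below are illustrative, not prescribed by the assignment.
BOUNDS  = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.5, 3.5)]
BUCKETS = [3, 3, 6, 6]

def discretise(obs):
    """Map a continuous observation to a tuple of bucket indices."""
    state = []
    for x, (lo, hi), n in zip(obs, BOUNDS, BUCKETS):
        x = min(max(x, lo), hi)              # clip to the chosen bounds
        idx = int((x - lo) / (hi - lo) * n)  # linear bucketing
        state.append(min(idx, n - 1))        # keep x == hi in last bucket
    return tuple(state)

# The tuple can then index a dictionary-backed Q-table, e.g.
# Q[(state, action)], with the same update rule used for Taxi.
print(discretise([0.0, 0.0, 0.0, 0.0]))  # -> (1, 1, 3, 3)
```

How finely you bucket each dimension is itself an experimental choice worth discussing in the report, since it trades off table size against resolution.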
6. [Distinction] Select another environment from the OpenAI Gym, and implement Q-learning for this environment. Extend your report to describe this new environment, including a mathematical model. Evaluate the performance of Q-learning on this model, and identify any significant outcomes or limitations of this approach on the new problem compared to the previous problems. Attempt to explain any differences or limitations.
7. [High Distinction] Implement Temporal Difference learning on the new environment you selected for step 6, as well as on either the Taxi problem or the Cart-Pole problem. Contrast the performance of TD learning and Q-learning in your report, providing evidence such as graphs and performance data.
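For step 7, note that Q-learning is itself a temporal-difference method; the intended contrast is presumably with an on-policy one-step TD method such as SARSA, which differs only in its bootstrap target. The sketch below uses an illustrative toy corridor (same classic Gym `reset`/`step` signature) in place of a real Gym environment.

```python
import random

class ChainEnv:
    """Tiny 5-state corridor with the classic Gym step/reset signature;
    swap in a real Gym environment via gym.make(...)."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else self.s + 1
        done = self.s == self.n_states - 1
        return self.s, (10.0 if done else -1.0), done, {}

def sarsa(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """On-policy one-step TD control (SARSA)."""
    Q = [[0.0] * env.n_actions for _ in range(env.n_states)]

    def choose(s):
        if random.random() < epsilon:
            return random.randrange(env.n_actions)
        return max(range(env.n_actions), key=lambda a: Q[s][a])

    for _ in range(episodes):
        s, done = env.reset(), False
        a = choose(s)
        while not done:
            s2, r, done, _ = env.step(a)
            a2 = choose(s2)
            # On-policy TD target: value of the action actually taken next.
            # Q-learning would use max(Q[s2]) here instead.
            target = r if done else r + gamma * Q[s2][a2]
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
    return Q

Q = sarsa(ChainEnv())
learned = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(learned)  # both methods recover the "go right" policy here
```

Because SARSA evaluates the exploratory policy it actually follows while Q-learning evaluates the greedy policy, their learning curves can diverge noticeably on environments with penalties near the goal; that difference is good material for the report's comparison.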
Attachment:- Artificial and Computational Intelligence Assignment File.rar