Write a brief report on the Taxi problem

Reference no: EM132372275

Artificial and Computational Intelligence Assignment - Project: Investigating Reinforcement Learning

Overview - Within SIT215 you have been learning about a range of problems that can be solved using techniques from artificial and computational intelligence. This study has included coverage of both models and algorithms suitable for AI and CI solutions. A particular limitation of all of the solutions that we have considered is that they are designed by hand, or rely on the problem being formulated as an optimisation task.

In this project you are going to explore an advanced technique for solving many interesting and challenging real-world problems: one in which an agent learns a solution to a problem through interaction with the environment, and through perception of a reinforcement, or feedback, signal. This field is called, naturally, reinforcement learning (RL). RL can also be seen as an online method for solving Markov Decision Problems, as opposed to the offline methods of policy iteration, value iteration or dynamic programming, presented in lectures (in weeks 9 and 10).
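As a rough illustration of the online/offline distinction, the standard textbook forms of the two updates can be set side by side. The symbols below are assumptions introduced here, not notation from the unit materials: P is the transition model, R the reward function, \gamma the discount factor and \alpha the learning rate. Value iteration sweeps over all states using the known model, while Q-learning adjusts one entry from a single observed transition (s_t, a_t, r_{t+1}, s_{t+1}).

```latex
% Value iteration (offline): needs the model P and R for every state.
V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V_k(s') \bigr]

% Q-learning (online): needs only one sampled transition at a time.
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \bigl[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \bigr]
```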

This project will require you to undertake self-directed study and learning of RL solution methods, building upon topics and content covered in the first 10 weeks of this course. While this might seem daunting (not being told how to solve the problem), you've been practicing this approach throughout the unit in the group-based PBL tasks, and so this is your chance to demonstrate individually what you've learned about problem-solving methodology.

Learning Objectives - This project addresses:  

  • Design and implement software artefacts to demonstrate effectiveness and efficiency of solutions for intelligent systems development
  • Apply theoretical concepts and models to explain and communicate the design of intelligent systems.

Specifically, these are addressed through achievement of the following task-specific learning objectives:

  • Demonstrate ability to work with and extend software systems and frameworks for RL.
  • Describe and model RL problems using specific concepts and models.
  • Implement, evaluate and analyse the performance of different solutions on a range of RL problems.
  • Effectively communicate the process and outcomes of your research and development project.

Preparatory Learning Activities -

In order to complete this assessment task you will need to have first developed an understanding of a range of topics covered in this unit in weeks 1 to 10. Given the assessment deadline, this may require you to complete independent study of these topics prior to their presentation in lectures. The topics that you will need to be familiar with are:

  • Bayesian AI (working with probabilistic representations of uncertainty).
  • State Space Search (understanding state space representations of systems).
  • Normative Decision Theory (definitions of rational action, utility, intertemporal utility, payoff/reward).
  • Markov Decision Problems (representing sequential decision problems for agents acting in complex domains, reward processes and finite horizon decision problems, optimal policies).
  • Dynamic Programming (optimal solutions to sequential decision problems under specified constraints).

Ultimately you will be able to complete this assessment task without a sound theoretical grounding in each of these areas. However, having some knowledge of these areas and understanding of how they inter-relate will make it far easier to understand learning materials on reinforcement learning, and far easier to explain and describe your investigations and outcomes in this project. Our advice is that you use this project as a basis for further study of these underlying areas, to assist in integrating the knowledge covered in this unit into a meaningful 'whole', which supports completing this assessment task.

Task Requirements - This project will require you to use the OpenAI Gym environment for experimenting with reinforcement learning tasks. You should start by reviewing the website for the Gym.

To complete this project, you need to complete the following requirements and sub-tasks.

1. Read the relevant documentation for installing OpenAI Gym.

2. Read and complete the tutorial ensuring that you can reproduce all steps discussed.

3. Write a brief report (2-3 pages at most) on the Taxi problem, including a mathematical description of the reinforcement learning problem and the Q-learning algorithm for its solution. To do this, you may want to refer to a good textbook on reinforcement learning. A good starting point is the "bible" of RL: "Reinforcement Learning: An Introduction", by Sutton & Barto. You can find this book online as a free PDF download. There's even a 2nd edition draft completed just this year. In your report you should contrast the quality of solution of a random policy versus the "optimal" policy obtained by Q-learning.

4. Complete the tutorial (attached) to explore the Cart-Pole environment in the Gym. In this case, implement a random policy and Q-learning. It's not essential that you attempt the policy gradient method, but you might like to try it.

5. Extend your report to cover briefly the Cart-Pole problem, highlighting any differences with the Taxi problem. Compare performance of Q-learning on both of these problems, presenting evidence (such as graphs) to support your evaluation.
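The comparison asked for in steps 3 to 5 (a random policy versus a policy learned by Q-learning) can be sketched as follows. This is a minimal sketch, not the tutorial's code: so that it runs without Gym installed, it uses a hypothetical `CorridorEnv` class as a stand-in environment, which only imitates the older Gym convention of `reset()` returning a state and `step(action)` returning `(state, reward, done, info)`. The reward values, the hyperparameters (`alpha`, `gamma`, `epsilon`) and all function names are assumptions chosen for illustration.

```python
import random


class CorridorEnv:
    """Hypothetical stand-in for a Gym environment (not part of Gym).

    States are 0..n_states-1. The agent starts in state 0 and the episode
    ends on reaching the last state. Each step costs -1; the goal pays +20.
    """

    def __init__(self, n_states=6, max_steps=50):
        self.n_states = n_states
        self.n_actions = 2  # 0 = move left, 1 = move right
        self.max_steps = max_steps

    def reset(self):
        self.state, self.steps = 0, 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        self.steps += 1
        at_goal = self.state == self.n_states - 1
        reward = 20 if at_goal else -1
        done = at_goal or self.steps >= self.max_steps
        return self.state, reward, done, {}


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(env.n_actions)  # explore
            else:
                a = max(range(env.n_actions), key=lambda x: q[s][x])  # exploit
            s2, r, done, _ = env.step(a)
            # The goal state's row is never updated, so it stays at 0 and the
            # target reduces to the reward alone on the final transition.
            target = r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    s, done, total = env.reset(), False, 0
    while not done:
        s, r, done, _ = env.step(policy(s))
        total += r
    return total


env = CorridorEnv()
q = q_learning(env)

# Greedy policy read off the learned table.
greedy_return = run_episode(env, lambda s: max(range(env.n_actions), key=lambda a: q[s][a]))

# Random policy, averaged over 100 episodes.
rng = random.Random(1)
random_avg = sum(run_episode(env, lambda s: rng.randrange(env.n_actions)) for _ in range(100)) / 100

print("greedy return:", greedy_return, "random average:", random_avg)
```

With Gym installed, the same loop should carry over to the Taxi environment by creating the environment with `gym.make` and reading the table sizes from `env.observation_space.n` and `env.action_space.n`; the environment ID and the exact return values of `reset` and `step` differ between Gym versions, so check the version you install. Cart-Pole observations are continuous, so a tabular approach like this one needs the observations binned into discrete states first, which is one difference from Taxi worth discussing in the report.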

If you've gotten to this point and created a good report that details what you've learned, you've met the minimum requirements for this assessment task. Assuming a reasonable quality of report and evidence, you can expect to earn a credit grade. Continue on to achieve a higher grade.

6. [Distinction] Select another environment from the OpenAI Gym, and implement Q-learning for this environment. Extend your report to describe this new environment, including a mathematical model. Evaluate performance of Q-learning on this model, and identify any significant outcomes or limitations of this approach on this new problem, compared to previous problems. Attempt to explain any difference or limitations.

7. [High Distinction] Implement Temporal Difference learning on the new environment you completed for step 6, as well as on either the Taxi problem or the Cart-Pole problem. Contrast the performance of TD learning and Q-learning in your report, providing evidence such as graphs and performance data.
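The brief does not say which Temporal Difference method is intended for step 7. One common choice to contrast with Q-learning is SARSA, an on-policy TD control method; treating SARSA as the intended method is an assumption made here. The sketch below shows only the two update rules applied to the same transition, with made-up table values, to make the difference in bootstrap target concrete. The function names are hypothetical.

```python
def q_learning_update(q, s, a, r, s2, alpha, gamma):
    """Off-policy TD update: bootstrap from the best action available in s2."""
    target = r + gamma * max(q[s2])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s2, a2, alpha, gamma):
    """On-policy TD update: bootstrap from the action a2 actually taken in s2."""
    target = r + gamma * q[s2][a2]
    q[s][a] += alpha * (target - q[s][a])


# Two identical tables with illustrative (made-up) values for state 1.
q1 = [[0.0, 0.0], [1.0, 5.0]]
q2 = [[0.0, 0.0], [1.0, 5.0]]

# Same transition (s=0, a=1, r=-1, s2=1). SARSA also needs the next action;
# here the behaviour policy is assumed to have picked the worse action a2=0.
q_learning_update(q1, s=0, a=1, r=-1, s2=1, alpha=0.5, gamma=0.9)
sarsa_update(q2, s=0, a=1, r=-1, s2=1, a2=0, alpha=0.5, gamma=0.9)

print("Q-learning:", q1[0][1], "SARSA:", q2[0][1])
```

Q-learning moves the estimate towards the value of the best next action regardless of what the agent does next, while SARSA moves it towards the value of the action the exploring policy actually chose. That difference is one place to look when explaining any gap between the two learning curves.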

Attachment:- Artificial and Computational Intelligence Assignment File.rar


Write a brief report on the Taxi problem : SIT215 - Artificial and Computational Intelligence Assignment - Project: Investigating Reinforcement Learning, Deakin University, Australia


9/18/2019 3:38:03 AM

Submission Components & Due Dates - This is an individual assessment task and as such, each student will complete their own project and submission components. To be eligible for assessment in this task you must submit the following artefacts to the relevant submission folder on the Unit Site no later than the given deadline: The report detailing your models, experiments and outcomes of your reinforcement learning problems and solutions. Your report should provide adequate information to evidence your learning against the objectives stated above, and in line with the assessment rubric provided.


All code developed or used in this project. Your code must include appropriate documentation (internal comments are sufficient) that explains what the code does. You should also provide instructions on how to execute the code (for example, in a README file). You may assume that the assessment team has access to OpenAI Gym and can execute your code. If you rely on any third-party libraries or applications that are required to run your solution, you need to provide those, or make them accessible to the assessor (e.g., by providing a link to a download site, and instructions on how to install and use the library in your solution).


Assignment Marking - This assignment will be marked on the following scale: Does Not Meet Minimum Standards (N), Meets Minimum Standards (P or C), Exceeds Minimum Standards (D), Greatly Exceeds Minimum Standards (HD). A numeric mark will be assigned based on the assessor's determination as to where within the relevant grade category the standard of work sits. A rubric will be provided on the Unit Site, under the Resources>Assessment folder, to indicate the criteria upon which your submission components will be assessed and the standards that will be applied for these criteria. Please contact the teaching team if you have any concerns or questions regarding how you will be assessed.


Penalties - In accordance with Faculty assessment policies, late submissions to the submission folder will incur a penalty of 5% of the total available marks per day, up to five days total, after which the score for this part of the task is 0. Such penalties will be deducted from the awarded numeric mark to determine the final grade for this task. Submissions will not be accepted or marked more than five days after the final submission deadline, except in cases where an extension has been approved prior to the deadline.


Feedback - Students will receive verbal, written or recorded audio feedback on their project submission as part of their assessment. Due to the timing of assessment and scheduling of exams by DSA, it cannot be guaranteed that this feedback will be provided before the unit exam. Where a student requires specific feedback prior to the exam, they should contact the Unit Chair, allowing sufficient time prior to the exam for this feedback to be provided. Students are actively encouraged to seek formative feedback from peers and teaching staff, on their work completed before the submission deadline, to ensure they are on track with this task. Feedback may be obtained during weekly scheduled practical classes upon request. Talk to us and we'll support you!


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd