INM707 Deep Reinforcement Learning Assignment

Reference no: EM132970407

INM707 Deep Reinforcement Learning - City, University of London

This coursework builds on the material covered in the tutorials and lectures. On completing this coursework, you should be able to implement and understand classical tabular Reinforcement Learning (weeks 1-5) as well as Deep Reinforcement Learning (weeks 6-10) algorithms. You will make use of the different concepts learned in the module:
• How to define and implement a Reinforcement Learning (RL) problem
• How to implement solutions in Python
• How to evaluate different algorithms

The Tasks
In this coursework, you are expected to demonstrate what you have learned in the module in terms of Tabular RL and Deep RL. Additionally, you have the opportunity to work on a problem of your choosing (related to RL) for additional marks.

The maximum number of marks that can be scored is 100. You can gain up to 20 additional marks in Task 4, but the total mark for the Report and Code is capped at 100.

In all tasks, you can use Python's built-in libraries (math, random, ...), numpy, and matplotlib. If you think that you might benefit from using another library, you can ask about it on Moodle. You will use PyTorch in Task 3, and you can use any library in Task 4.

Task 1:

You need to design and develop a tabular RL environment that follows a Markov Decision Process. In this task, the environment will provide a state that is used to train your RL algorithm. Your report should answer the following points:

- Description of the environment
- Description of the agent and its actions

- Description of the different dynamics of the environment (the rules of the game), as well as rewards

The environment should be non-trivial and:

- Include some stochasticity (for example, obstacles that move randomly)
- Terminate (define the termination conditions)
- Be different from the environments proposed in the labs.

In this first task, the number of different states that the agent can reach should be finite. The total number of states can be very large, but bear in mind that classical RL algorithms might not converge in that case. A good idea is therefore to parameterize the number of states (e.g. in the labs, the number of different states depended on the size of the environment), to make sure that your problem is solvable.
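As an illustrative sketch only (not a prescribed design), a parameterizable environment with stochastic dynamics and explicit termination conditions might look like the following; all names here are hypothetical and the brief does not mandate any particular interface:

```python
import random

class GridWorld:
    """Illustrative N x N gridworld: the agent seeks a goal while a
    single obstacle moves randomly (the stochastic element). The size
    parameter controls the number of states, as suggested above."""

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=5, max_steps=100):
        self.size = size            # parameterizes the state space
        self.max_steps = max_steps  # guarantees termination
        self.reset()

    def reset(self):
        self.agent = (0, 0)
        self.goal = (self.size - 1, self.size - 1)
        self.obstacle = (self.size // 2, self.size // 2)
        self.steps = 0
        return self.state()

    def state(self):
        # Finite state space: agent position x obstacle position
        return (self.agent, self.obstacle)

    def _clip(self, r, c):
        return (min(max(r, 0), self.size - 1), min(max(c, 0), self.size - 1))

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        self.agent = self._clip(self.agent[0] + dr, self.agent[1] + dc)
        # Obstacle moves randomly: the stochastic part of the dynamics
        odr, odc = random.choice(self.ACTIONS)
        self.obstacle = self._clip(self.obstacle[0] + odr, self.obstacle[1] + odc)
        self.steps += 1
        # Termination conditions: goal reached, collision, or step limit
        if self.agent == self.goal:
            return self.state(), 10.0, True
        if self.agent == self.obstacle:
            return self.state(), -10.0, True
        if self.steps >= self.max_steps:
            return self.state(), 0.0, True
        return self.state(), -1.0, False  # step cost encourages short paths
```

Your own environment should differ from the lab gridworlds; this sketch only shows how stochasticity, termination, and a size parameter can be expressed in a small amount of code.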

An alternative is to use an external environment that you modify. However, the modifications should be substantial and demonstrate an effort on par with the development of a new environment.

Task 2:

The second task is about implementing classical RL algorithms. You have the choice between implementing Q-learning or Dyna-Q.

For this task, you should:

- Describe and explain the algorithm you chose.
- Conduct a case study of how this algorithm performs on the environment you implemented in Task 1.
- Evaluate the performance of your algorithm.
- Evaluate the effect of the different hyper-parameters.
- If your environment is parameterizable, a good addition would be to evaluate how your RL algorithm performs as the environment becomes more and more complex.
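For reference, the core of Q-learning (one of the two algorithms you may choose) can be sketched as below. The environment interface (`reset()` returning a state, `step(a)` returning `(state, reward, done)`) is an assumption for illustration, not something the brief specifies:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1,
               n_actions=4):
    """Minimal tabular Q-learning sketch with epsilon-greedy exploration.
    alpha, gamma, and epsilon are the hyper-parameters whose effect
    the task asks you to evaluate."""
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0
    returns = []
    for _ in range(episodes):
        s = env.reset()
        done, total = False, 0.0
        while not done:
            # Epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # Q-learning update: bootstrap off the greedy value of s2
            best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in range(n_actions))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s, total = s2, total + r
        returns.append(total)
    return Q, returns
```

Plotting the per-episode returns (e.g. with matplotlib) for several values of `alpha`, `gamma`, and `epsilon` is one straightforward way to address the hyper-parameter evaluation point.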

Task 3:

The third task requires you to implement a Deep Reinforcement Learning algorithm. You cannot present DQN or its different improvements for this task, as we will cover them in the labs.

You can select any of the following algorithms to implement and evaluate:

- Policy Optimization (e.g. Proximal Policy Optimization, Asynchronous Advantage Actor-Critic, ...)
- Q-learning: Hindsight Experience Replay
- World Models
- Soft Actor Critic

You can use any Deep RL environment available online, or use a scaled-up version of your environment developed in Task 1. For example, if the complexity of the environment in Task 1 can be scaled up to the point where tabular approaches (e.g. Q-learning and Dyna-Q) can't perform well, then it is appropriate to use it. You might need to adapt your environment slightly to return observations instead of states.
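The adaptation mentioned above can be as simple as a wrapper that converts a discrete tabular state into an observation vector a neural network can consume. The following is a hypothetical sketch; it assumes the wrapped environment exposes `reset()`/`step()` returning integer state indices, which your Task 1 design may or may not match:

```python
import numpy as np

class ObservationWrapper:
    """Illustrative wrapper: turns a discrete state index into a
    one-hot observation vector suitable for a neural-network policy."""

    def __init__(self, env, n_states):
        self.env = env
        self.n_states = n_states

    def _to_obs(self, state):
        # One-hot encoding of the state index
        obs = np.zeros(self.n_states, dtype=np.float32)
        obs[state] = 1.0
        return obs

    def reset(self):
        return self._to_obs(self.env.reset())

    def step(self, action):
        state, reward, done = self.env.step(action)
        return self._to_obs(state), reward, done
```

For richer environments you would typically return feature vectors or image arrays instead of one-hot encodings, but the wrapper pattern is the same.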

For this task, you should:

- Describe and explain the algorithm you chose.
- Describe the environment, if it is different from Task 1.
- Conduct a case study of how this algorithm performs on the environment you choose.
- Evaluate the performance of your algorithm.
- Evaluate the effect of the different hyper-parameters.
- If your environment is parameterizable, a good addition would be to evaluate how your RL algorithm performs as the environment becomes more and more complex.

Task 4 (optional):

For this last optional task, you can work on a topic of your choosing. It must be related to the module and different from Tasks 1 to 3. Note that it can be a continuation of Task 1, 2 or 3.

Reports
You must submit a Summary page that includes:
- Your name, student ID, and team members
- For each task, the percentage of borrowed code and reference to the sources. You need to present a fair estimate, and you should declare if you wrote everything yourself.
- Your personal reflection

Your final report should cover each of the aspects described in this document (and any other element of your work that you believe should be reported). Graphical illustration of your results is expected, as well as numerical results.

Attachment: Deep Reinforcement Learning.rar
