What can be determined qualitatively about the optimal policy

Reference no: EM131678091

Question: Consider an undiscounted MDP having three states, (1, 2, 3), with rewards -1, -2, 0 respectively. State 3 is a terminal state. In states 1 and 2 there are two possible actions: a and b. The transition model is as follows: In state 1, action a moves the agent to state 2 with probability 0.8 and makes the agent stay put with probability 0.2. In state 2, action a moves the agent to state 1 with probability 0.8 and makes the agent stay put with probability 0.2. In either state 1 or state 2, action b moves the agent to state 3 with probability 0.1 and makes the agent stay put with probability 0.9. Answer the following questions:
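The transition model above can be written down as a small table, which makes it easy to sanity-check. The following is a minimal sketch, assuming a dictionary layout and the names `P`, `R`, `TERMINAL`, `check_model` and `reaches_terminal`, none of which appear in the question itself.

```python
# Transition model transcribed from the question:
# P[(state, action)] maps each next state to its probability.
P = {
    (1, "a"): {2: 0.8, 1: 0.2},
    (2, "a"): {1: 0.8, 2: 0.2},
    (1, "b"): {3: 0.1, 1: 0.9},
    (2, "b"): {3: 0.1, 2: 0.9},
}
R = {1: -1, 2: -2, 3: 0}  # reward received in each state
TERMINAL = 3


def check_model(model):
    """Return True if every (state, action) row sums to 1."""
    return all(abs(sum(row.values()) - 1.0) < 1e-12 for row in model.values())


def reaches_terminal(action):
    """Return True if the action can move directly into the terminal state."""
    return any(TERMINAL in P[(s, action)] for s in (1, 2))


print(check_model(P))
print(reaches_terminal("a"), reaches_terminal("b"))
```

Laid out this way, it is visible that only action b has a transition into state 3, which is a useful starting point for part a.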

a. What can be determined qualitatively about the optimal policy in states 1 and 2?

b. Apply policy iteration, showing each step in full, to determine the optimal policy and the values of states 1 and 2. Assume that the initial policy has action b in both states.
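One way to check a hand calculation for part b is a short policy iteration sketch. It assumes the reward is received in the current state, so that U(s) = R(s) + gamma * sum over s' of P(s' | s, pi(s)) U(s'), and that the terminal state has value 0. The names `evaluate`, `q_value` and `policy_iteration` are illustrative, and the 2x2 linear system is solved with Cramer's rule rather than a library call.

```python
# Rewards and transition model transcribed from the question.
R = {1: -1.0, 2: -2.0}
P = {
    (1, "a"): {2: 0.8, 1: 0.2},
    (2, "a"): {1: 0.8, 2: 0.2},
    (1, "b"): {3: 0.1, 1: 0.9},
    (2, "b"): {3: 0.1, 2: 0.9},
}


def evaluate(policy, gamma=1.0):
    """Policy evaluation: solve (I - gamma * P_pi) U = R for states 1 and 2."""
    a11 = 1 - gamma * P[(1, policy[1])].get(1, 0.0)
    a12 = -gamma * P[(1, policy[1])].get(2, 0.0)
    a21 = -gamma * P[(2, policy[2])].get(1, 0.0)
    a22 = 1 - gamma * P[(2, policy[2])].get(2, 0.0)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("singular system: the policy has no finite values")
    u1 = (R[1] * a22 - a12 * R[2]) / det
    u2 = (a11 * R[2] - a21 * R[1]) / det
    return {1: u1, 2: u2, 3: 0.0}  # state 3 is terminal with value 0


def q_value(s, a, U, gamma=1.0):
    """Value of taking action a in state s, then following values U."""
    return R[s] + gamma * sum(p * U[s2] for s2, p in P[(s, a)].items())


def policy_iteration(policy, gamma=1.0):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = dict(policy)
    steps = []
    while True:
        U = evaluate(policy, gamma)
        steps.append((dict(policy), U))
        changed = False
        for s in (1, 2):
            best = max("ab", key=lambda a: q_value(s, a, U, gamma))
            # Switch only on a strict improvement, so ties cannot cause cycling.
            if q_value(s, best, U, gamma) > q_value(s, policy[s], U, gamma) + 1e-9:
                policy[s] = best
                changed = True
        if not changed:
            return policy, U, steps


policy, U, steps = policy_iteration({1: "b", 2: "b"})
for pi, u in steps:
    print(pi, round(u[1], 3), round(u[2], 3))
```

Working the same steps by hand under these assumptions, the initial policy (b, b) evaluates to U(1) = -10 and U(2) = -20, improvement switches state 2 to action a, and the policy (b, a) evaluates to U(1) = -10 and U(2) = -12.5 with no further changes.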

c. What happens to policy iteration if the initial policy has action a in both states? Does discounting help? Does the optimal policy depend on the discount factor?
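For part c, the difficulty with an all-a initial policy shows up in the policy evaluation step. The sketch below, using the hypothetical helper names `det_aa` and `evaluate_aa`, computes the determinant of (I - gamma * P_pi) for that policy and solves the system only when it is non-singular. It assumes the same reward convention as the question, with the reward received in the current state.

```python
def det_aa(gamma):
    """Determinant of (I - gamma * P_pi) when action a is chosen in states 1 and 2."""
    stay, move = 0.2, 0.8  # probabilities for action a, from the question
    return (1 - gamma * stay) ** 2 - (gamma * move) ** 2


def evaluate_aa(gamma):
    """Return (U1, U2) for the all-a policy, or None if the system is singular."""
    det = det_aa(gamma)
    if abs(det) < 1e-12:
        return None  # no finite solution: the agent never reaches state 3
    r1, r2 = -1.0, -2.0
    diag = 1 - gamma * 0.2
    off = -gamma * 0.8
    u1 = (r1 * diag - off * r2) / det
    u2 = (diag * r2 - off * r1) / det
    return u1, u2


for gamma in (1.0, 0.99, 0.9, 0.5):
    print(gamma, evaluate_aa(gamma))
```

Algebraically the determinant factors as (1 - gamma)(1 + 0.6 * gamma), so it vanishes only at gamma = 1. Any discount factor below 1 therefore gives finite values for the all-a policy, and the loop can be extended to other values of gamma to explore the remaining questions.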


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd