In this assignment, you will write pseudo-code for a Markov Decision Process.
A Markov Decision Process (MDP) model is defined by a set of states, a set of actions, a state transition function P(s'|a,s), and a reward function R(s,a,s').
Consider the following grid (3 by 3):
3 | Fire | | Diamond |
---|---|---|---|
2 | | | |
1 | Start | | Blocked |
 | 1 | 2 | 3 |
An agent lives in the grid. Cells are written as (row * column), with row 1 at the bottom. The agent starts at cell (1 * 1) and can move around the grid using the following actions:
UP, DOWN, LEFT, RIGHT
The goal of the agent is to reach the diamond at cell (3 * 3).
The agent must avoid the fire at cell (3 * 1) at all costs.
There is also a blocked cell at (1 * 3), which the agent cannot enter, so it must choose an alternate route.
The agent cannot pass through a wall. For example, in the starting cell (1 * 1), the agent can only go UP or RIGHT.
Based on the above information, write pseudo-code in the style of Java or Python that solves the problem as a Markov Decision Process.
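The movement rules above can be sketched in Python as follows. This is only an illustration: it assumes the (row, column) reading of the cell labels with row 1 at the bottom, and the names `is_open` and `legal_actions` are hypothetical helpers, not part of the assignment.

```python
# Coordinates are (row, column), with row 1 at the bottom of the grid.
ROWS, COLS = 3, 3
START, DIAMOND, FIRE, BLOCKED = (1, 1), (3, 3), (3, 1), (1, 3)

# Each action is a (row change, column change) pair.
ACTIONS = {"UP": (1, 0), "DOWN": (-1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}


def is_open(cell):
    """True if the cell lies inside the grid and is not the blocked cell."""
    row, col = cell
    return 1 <= row <= ROWS and 1 <= col <= COLS and cell != BLOCKED


def legal_actions(cell):
    """Actions that do not run into a wall or the blocked cell."""
    row, col = cell
    return [name for name, (dr, dc) in ACTIONS.items()
            if is_open((row + dr, col + dc))]


print(legal_actions(START))  # ['UP', 'RIGHT']
```

From the start cell only UP and RIGHT remain, which matches the wall example in the problem statement.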
Your pseudo-code must do the following
// Pseudo-code for solving the problem with value iteration on a Markov Decision Process.
procedure value_iteration(P, R, γ, θ)
inputs:
    P is the state transition function specifying P(s'|a,s)
    R is the reward function R(s,a,s')
    γ is the discount factor
    θ is a convergence threshold, θ > 0
returns:
    π[s] approximately optimal policy
    V[s] value function
data structures:
    Vk[s] a sequence of value functions
begin
    for each state s
        V0[s] = 0 (any initial value works)
    for k = 1 : ∞
        for each state s
            Vk[s] = max(a) Summation of s' P(s'|a,s)(R(s,a,s') + γ Vk−1[s'])
        if ∀s |Vk[s] − Vk−1[s]| < θ
            for each state s
                π[s] = arg max(a) Summation of s' P(s'|a,s)(R(s,a,s') + γ Vk−1[s'])
            return π, Vk
end
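For reference, here is a minimal runnable Python sketch of the same procedure applied to the 3 by 3 grid. Several things in it are assumptions that the question does not specify: moves are deterministic, entering the diamond gives a reward of +1 and entering the fire gives -1 (0 otherwise), the diamond and fire are terminal, and γ = 0.9 with θ = 1e-6. The helper names `step`, `reward`, and `q_values` are hypothetical.

```python
GAMMA, THETA = 0.9, 1e-6   # assumed discount factor and threshold
DIAMOND, FIRE, BLOCKED = (3, 3), (3, 1), (1, 3)   # (row, column), row 1 at bottom
MOVES = {"UP": (1, 0), "DOWN": (-1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
STATES = [(r, c) for r in range(1, 4) for c in range(1, 4) if (r, c) != BLOCKED]
TERMINAL = (DIAMOND, FIRE)


def step(state, action):
    """Assumed deterministic move; None if it hits a wall or the blocked cell."""
    dr, dc = MOVES[action]
    nxt = (state[0] + dr, state[1] + dc)
    return nxt if nxt in STATES else None


def reward(nxt):
    """Assumed rewards: +1 for entering the diamond, -1 for the fire, else 0."""
    return {DIAMOND: 1.0, FIRE: -1.0}.get(nxt, 0.0)


def q_values(state, V):
    """Backed-up value of every legal action in `state` under value table V."""
    return {a: reward(step(state, a)) + GAMMA * V[step(state, a)]
            for a in MOVES if step(state, a) is not None}


def value_iteration():
    V = {s: 0.0 for s in STATES}              # V0[s] = 0
    while True:
        V_new = dict(V)
        for s in STATES:
            if s not in TERMINAL:             # terminal states keep value 0
                V_new[s] = max(q_values(s, V).values())
        delta = max(abs(V_new[s] - V[s]) for s in STATES)
        V = V_new
        if delta < THETA:                     # all states changed by less than θ
            break
    policy = {}
    for s in STATES:
        if s not in TERMINAL:
            q = q_values(s, V)
            policy[s] = max(q, key=q.get)     # greedy action, i.e. arg max
    return policy, V


policy, V = value_iteration()
for s in sorted(policy):
    print(s, policy[s], round(V[s], 3))
```

Because the transitions are assumed deterministic, the summation over s' collapses to a single next state. A stochastic model would replace `step` with a distribution over next states and sum over it.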