In a coin game, you repeatedly toss a biased coin (probability 0.75 for heads, 0.25 for tails). Each head is worth 3 points and each tail is worth 1 point. You can either Toss or Stop if your total number of points is no more than 7; otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or zero if your total is 8 or higher. When you Toss, you receive no utility. There is no discount (γ = 1).
i. What are the states and the actions for this MDP?
State:
Action:
ii. What is the transition function and the reward function for this MDP?
Transition function:
A.
B.
C.
D.
E.
Reward function:
A.
B.
C.
iii. Give an intuitively good policy for this problem (you do not need to calculate the optimal policy).
(i). What are the states and the actions for this MDP?
State: the current point total (the utility you would receive if you stopped now), plus a terminal state; that is, 0, 1, 2, 3, 4, 5, 6, 7, DONE
Action: Toss, Stop
(ii). What is the transition function and the reward function for this MDP?
Transition function:
T(Si, TOSS, Si+3) = 0.75 if i ≤ 4
T(Si, TOSS, DONE) = 0.75 if i ≥ 5 (a head would bring the total to 8 or higher)
T(Si, TOSS, Si+1) = 0.25 if i < 7
T(Si, TOSS, DONE) = 0.25 if i = 7 (a tail would bring the total to 8)
T(Si, STOP, DONE) = 1
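As a sanity check, the transition rules can be written as a small Python function that follows the problem statement directly: a toss adds 3 points with probability 0.75 or 1 point with probability 0.25, and any total of 8 or higher ends the game. This is only a sketch; the names `transition`, `DONE`, and `MAX_TOTAL` are illustrative, and treating DONE as an absorbing state is an assumption.

```python
from typing import Dict, Union

State = Union[int, str]

DONE = "DONE"               # terminal state (illustrative name)
P_HEAD, P_TAIL = 0.75, 0.25
MAX_TOTAL = 7               # totals of 8 or higher end the game


def transition(state: State, action: str) -> Dict[State, float]:
    """Return {next_state: probability} for taking `action` in `state`."""
    # Assumption: DONE is absorbing, and STOP always leads to DONE.
    if state == DONE or action == "STOP":
        return {DONE: 1.0}
    dist: Dict[State, float] = {}
    for points, prob in ((3, P_HEAD), (1, P_TAIL)):
        total = state + points
        nxt = total if total <= MAX_TOTAL else DONE
        # Accumulate, because from 7 both outcomes lead to DONE.
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist


if __name__ == "__main__":
    for s in range(MAX_TOTAL + 1):
        print(s, transition(s, "TOSS"))
```

Each row printed by the loop should sum to 1, which is a quick way to catch a missing or duplicated case in a hand-written transition table.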
Reward function:
R(Si , TOSS, ANY ) = 0
R(Si , STOP, DONE) = i
R(DONE, STOP, DONE) = 0
(iii). Give an intuitively good policy for this problem
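A good policy: Toss for 0, 1, 2, 3, 4; Stop for 5, 6, 7. From a total of 4 or less, a toss can never exceed 7, so tossing carries no risk of losing everything. From 5 or more, a head (probability 0.75) brings the total to 8 or higher and the utility drops to zero.
This policy is in fact optimal. Starting from V0 = 0, value iteration converges at iteration 6. The result of iteration 6 is as follows:
V6:
0: 6.143 from Toss; 1: 6.195 from Toss; 2: 5.281 from Toss; 3: 6.125 from Toss; 4: 6.5 from Toss; 5: 5 from Stop; 6: 6 from Stop; 7: 7 from Stop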
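One way to check a policy for this game is to run value iteration directly on the rules in the problem statement. The sketch below assumes synchronous updates starting from all-zero values, γ = 1, and a value of 0 for any total of 8 or higher; the function names `toss_value` and `value_iteration` are illustrative.

```python
P_HEAD, P_TAIL = 0.75, 0.25
MAX_TOTAL = 7  # totals of 8 or higher are worth 0


def toss_value(total, values):
    """Expected value of tossing from `total` under the current estimates."""
    expected = 0.0
    for points, prob in ((3, P_HEAD), (1, P_TAIL)):
        nxt = total + points
        if nxt <= MAX_TOTAL:  # going over 7 contributes nothing
            expected += prob * values[nxt]
    return expected


def value_iteration(max_sweeps=100):
    """Return (values, policy) for totals 0..7, with no discounting."""
    values = [0.0] * (MAX_TOTAL + 1)
    for _ in range(max_sweeps):
        # Stopping at total s yields s; tossing yields the expected next value.
        new_values = [max(float(s), toss_value(s, values))
                      for s in range(MAX_TOTAL + 1)]
        if new_values == values:  # no change in a full sweep: converged
            break
        values = new_values
    policy = ["Toss" if toss_value(s, values) > s else "Stop"
              for s in range(MAX_TOTAL + 1)]
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for s, (v, a) in enumerate(zip(values, policy)):
        print(f"{s}: {v:.3f} from {a}")
```

Under these assumptions the script reports Toss for totals 0 to 4 and Stop for totals 5 to 7. The exact-equality convergence test is safe here only because every probability and value is an exactly representable binary fraction; a tolerance would be the more general choice.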