i. Epsilon-Greedy is a technique that is used
a. to improve model-free Monte Carlo algorithms
b. to tune Q-learning algorithms to enable exploitation from the very beginning
c. to tune Q-learning algorithms to enable exploration all the time
d. to tune Q-learning algorithms to balance exploration and exploitation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
ii. To obtain a high Q-value as well as a high average utility with the Epsilon-Greedy technique,
a. the best policy would be to set Epsilon to zero (0) to enable exploration
b. the best policy would be to set Epsilon to one (1) to enable exploitation
c. the best policy would be to set Epsilon to one (1) in the beginning to enable exploration, zero (0) at the end to enable exploitation, and some value in between to combine exploration and exploitation
d. the best policy would be to set Epsilon to a constant value like 0.5
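Answer 1:
The correct answer is (d): to tune Q-learning algorithms to balance exploration and exploitation.
Explanation: At each step, with probability Epsilon the agent picks a random action (exploration); otherwise it picks the action with the highest current Q-value (exploitation). The single parameter Epsilon therefore sets the balance between trying new actions and using what has already been learned.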
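A minimal Python sketch of the selection rule described in Answer 1. The function name `epsilon_greedy` and the list-of-Q-values input are illustrative assumptions, not taken from any particular library. It picks a uniformly random action with probability Epsilon and the greedy action otherwise.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Return an action index using epsilon-greedy selection (hypothetical helper).

    With probability epsilon a uniformly random action is chosen (exploration);
    otherwise the action with the highest Q-value is chosen (exploitation).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if rng.random() < epsilon:
        # Explore: any action, uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value (the first one wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3]
    # With epsilon = 0 the rule is purely greedy, so action 1 is always chosen.
    print(epsilon_greedy(q, 0.0, rng))
    # With epsilon = 1 every choice is random, so all actions show up.
    picks = [epsilon_greedy(q, 1.0, rng) for _ in range(1000)]
    print({a: picks.count(a) for a in range(len(q))})
```

Note that Epsilon = 0 gives pure exploitation and Epsilon = 1 gives pure exploration, which is why options (a) and (b) in question ii mislabel the two extremes.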
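Answer 2:
The correct answer is (c): the best policy would be to set Epsilon to one (1) in the beginning to enable exploration, zero (0) at the end to enable exploitation, and some value in between to combine exploration and exploitation.
Explanation: Early in training the Q-value estimates are unreliable, so a high Epsilon lets the agent explore the state-action space. As the estimates improve, Epsilon is decayed toward 0 so the agent increasingly exploits what it has learned, and intermediate values mix the two. A fixed Epsilon of 0 never explores and can lock in a poor policy, a fixed Epsilon of 1 acts randomly forever, and a constant like 0.5 keeps wasting half of its actions on random moves even after learning has converged.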
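A minimal sketch of the schedule described in Answer 2, assuming a linear decay from 1 to 0 over a fixed number of episodes. Linear annealing is only one common choice (exponential decay is another), and the function name `linear_epsilon` and its parameters are hypothetical.

```python
def linear_epsilon(episode: int, total_episodes: int,
                   eps_start: float = 1.0, eps_end: float = 0.0) -> float:
    """Linearly anneal epsilon from eps_start to eps_end (hypothetical helper).

    Episode 0 returns eps_start, episode total_episodes - 1 returns eps_end,
    and episodes past the end stay clamped at eps_end.
    """
    if total_episodes <= 1:
        return eps_end
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(episode / (total_episodes - 1), 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


if __name__ == "__main__":
    # Full exploration at the start, full exploitation at the end,
    # and a mix of the two in between.
    print([linear_epsilon(e, 5) for e in range(5)])  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

In a training loop, the value returned for the current episode would be passed as Epsilon to the action-selection step.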