Work out the details of gradient ascent on the preference function found on pages 38+. Justify each step.
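A quick numerical check of where the derivation should end up. It assumes the setting is the gradient bandit algorithm: a softmax policy $\pi(a) = e^{H(a)} / \sum_b e^{H(b)}$ over preferences $H$, and the update $H(a) \leftarrow H(a) + \alpha (R - \bar{R})(\mathbb{1}_{a=A} - \pi(a))$. The sketch compares the exact gradient $\partial \mathbb{E}[R] / \partial H(a) = \pi(a)\,(q(a) - \sum_b \pi(b) q(b))$ with a finite-difference estimate. It then shows that the expected update, with $\alpha$ omitted, equals that gradient for any baseline. The function names and the random test instance are illustrative only.

```python
import numpy as np

def softmax(h):
    # Subtracting the max is for numerical stability; it does not change the result.
    z = np.exp(h - h.max())
    return z / z.sum()

def expected_reward(h, q):
    # E[R] = sum_b pi(b) q(b), where q(b) is the true mean reward of arm b.
    return softmax(h) @ q

def exact_gradient(h, q):
    # dE[R]/dH(a) = pi(a) * (q(a) - E[R])
    pi = softmax(h)
    return pi * (q - pi @ q)

def finite_difference_gradient(h, q, eps=1e-6):
    # Central differences, one preference at a time.
    g = np.zeros_like(h)
    for a in range(len(h)):
        e = np.zeros_like(h)
        e[a] = eps
        g[a] = (expected_reward(h + e, q) - expected_reward(h - e, q)) / (2 * eps)
    return g

def expected_update(h, q, baseline):
    # E_{A ~ pi}[(q(A) - baseline) * (1{a = A} - pi(a))] for every a (step size omitted).
    pi = softmax(h)
    n = len(h)
    return sum(pi[b] * (q[b] - baseline) * (np.eye(n)[b] - pi) for b in range(n))

rng = np.random.default_rng(0)
h = rng.normal(size=5)
q = rng.normal(size=5)
print(exact_gradient(h, q))
print(expected_update(h, q, baseline=3.7))  # same vector: the baseline term averages to zero
```

The baseline drops out because $\sum_b \pi(b)(\mathbb{1}_{a=b} - \pi(a)) = \pi(a) - \pi(a) = 0$.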
HW2
Derive a formula for $v_\pi$ in terms of $q_\pi$ and $\pi$.
Derive a formula for $q_\pi$ in terms of $v_\pi$ and $p$.
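For reference, here is one standard form of the two relations. It assumes the intended quantities are the state-value and action-value functions $v_\pi$ and $q_\pi$, and it uses the book's notation: policy $\pi(a \mid s)$, four-argument dynamics $p(s', r \mid s, a)$, and discount $\gamma$.

```latex
\begin{align}
v_\pi(s)    &= \sum_{a} \pi(a \mid s)\, q_\pi(s, a) \\
q_\pi(s, a) &= \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]
\end{align}
```

The first relation follows by conditioning $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$ on the action $A_t$. The second follows by writing $G_t = R_{t+1} + \gamma G_{t+1}$ and conditioning on the next state and reward.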
Use these two relations to derive recursions for $v_\pi$ and $q_\pi$.
Substitute the greedy policy into these recursions to derive recursions for $v_*$ and $q_*$.
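The recursions can be sanity-checked numerically. The sketch below uses a small random MDP, which is an illustration and not an example from the book. It evaluates a random stochastic policy exactly by solving $v = r_\pi + \gamma P_\pi v$, builds $q_\pi$ from $v_\pi$, and checks both recursions. To keep it short, it replaces the four-argument $p$ with two simplifications: the expected reward $r(s,a) = \sum_{s',r} r\, p(s', r \mid s, a)$ and the transition probabilities $p(s' \mid s, a)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 4, 3, 0.9

# Hypothetical random MDP: P[s, a, s'] = p(s' | s, a), R[s, a] = expected reward r(s, a).
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.normal(size=(n_states, n_actions))

# Random stochastic policy: pi[s, a] = pi(a | s).
pi = rng.random((n_states, n_actions))
pi /= pi.sum(axis=1, keepdims=True)

# Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi.
P_pi = np.einsum("sa,sat->st", pi, P)
r_pi = (pi * R).sum(axis=1)
v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# q_pi from v_pi: q(s, a) = r(s, a) + gamma * sum_s' p(s' | s, a) v(s').
q = R + gamma * P @ v

# Recursion for v_pi: v(s) = sum_a pi(a|s) [r(s, a) + gamma * sum_s' p(s'|s,a) v(s')].
v_rhs = (pi * (R + gamma * P @ v)).sum(axis=1)
# Recursion for q_pi: q(s, a) = r(s, a) + gamma * sum_s' p(s'|s,a) sum_a' pi(a'|s') q(s', a').
q_rhs = R + gamma * P @ (pi * q).sum(axis=1)
print(np.max(np.abs(v - v_rhs)), np.max(np.abs(q - q_rhs)))  # both should be ~0
```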
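The same kind of check works for the optimality recursions. This sketch again uses a hypothetical random MDP with expected rewards $r(s,a)$. It iterates $q(s,a) \leftarrow r(s,a) + \gamma \sum_{s'} p(s' \mid s, a) \max_{a'} q(s', a')$ until the values stop changing. It then checks that $v_* = \max_a q_*$ satisfies its own recursion, and that the greedy policy, evaluated exactly, achieves $v_*$.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, gamma = 4, 3, 0.9

# Hypothetical random MDP: P[s, a, s'] = p(s' | s, a), R[s, a] = expected reward r(s, a).
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.normal(size=(n_states, n_actions))

# Iterate the q_* recursion; it is a gamma-contraction, so this converges for gamma < 1.
q = np.zeros((n_states, n_actions))
while True:
    q_new = R + gamma * P @ q.max(axis=1)
    done = np.max(np.abs(q_new - q)) < 1e-10
    q = q_new
    if done:
        break

v = q.max(axis=1)  # v_*(s) = max_a q_*(s, a)

# Recursion for v_*: v(s) = max_a [r(s, a) + gamma * sum_s' p(s'|s,a) v(s')].
v_rhs = (R + gamma * P @ v).max(axis=1)

# Exact value of the greedy policy: solve (I - gamma * P_g) v = r_g.
greedy = q.argmax(axis=1)
rows = np.arange(n_states)
P_g = P[rows, greedy]  # shape (n_states, n_states)
r_g = R[rows, greedy]
v_greedy = np.linalg.solve(np.eye(n_states) - gamma * P_g, r_g)
print(np.max(np.abs(v - v_rhs)), np.max(np.abs(v - v_greedy)))  # both should be ~0
```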
HW3
Work the gambler's problem in Example 4.3 on page 84. Use value iteration. Turn in your code and plots of the value function and the policy like those in the book.
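Below is a minimal value-iteration sketch for the gambler's problem. It assumes the book's setup:

- capital $s \in \{1, \dots, 99\}$
- stakes $a \in \{1, \dots, \min(s, 100 - s)\}$
- probability of heads $p_h = 0.4$
- $\gamma = 1$
- a reward of +1 only on reaching 100

Folding that reward into a fixed terminal value $V(100) = 1$, with $V(0) = 0$, is a common implementation choice, not something the assignment specifies. Stake 0, which the book's action set also includes, is omitted here for simplicity. The function name and the `sweeps` history are illustrative.

```python
import numpy as np

def gambler_value_iteration(p_h=0.4, goal=100, theta=1e-10):
    # V[0] = 0 (broke) and V[goal] = 1 (won): the +1 reward for reaching the goal
    # is folded into the terminal value, and every other transition pays 0.
    V = np.zeros(goal + 1)
    V[goal] = 1.0
    sweeps = []  # copy of V after each sweep, for plotting intermediate estimates
    while True:
        delta = 0.0
        for s in range(1, goal):
            stakes = np.arange(1, min(s, goal - s) + 1)
            # Expected value of each stake: win with prob p_h, lose with prob 1 - p_h.
            returns = p_h * V[s + stakes] + (1 - p_h) * V[s - stakes]
            best = returns.max()
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update within the sweep
        sweeps.append(V.copy())
        if delta < theta:
            break

    # Greedy policy w.r.t. the converged V; rounding makes near-ties resolve to the smallest stake.
    policy = np.zeros(goal + 1, dtype=int)
    for s in range(1, goal):
        stakes = np.arange(1, min(s, goal - s) + 1)
        returns = p_h * V[s + stakes] + (1 - p_h) * V[s - stakes]
        policy[s] = stakes[np.argmax(np.round(returns, 5))]
    return V, policy, sweeps

V, policy, sweeps = gambler_value_iteration()
print(len(sweeps), V[25], V[50], V[75])
```

Plotting `V[1:100]` and `policy[1:100]` against capital, for example with matplotlib, gives figures comparable to the book's. Many stakes are tied at the optimum, so the shape of the policy plot depends on the tie-breaking rule.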
Exercise 4.9 on page 84.
Exercise 4.10 on page 84.
HW4
Write code for policy iteration for the gridworld in Example 3.5 and compare your results with those in Example 3.8 on page 65. For this part, use the state-value function in your policy iteration.
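Below is a minimal policy-iteration sketch for the 5×5 gridworld. It assumes the book's dynamics:

- four moves: north, south, west, east
- any action from A moves to A′ with reward +10, and any action from B moves to B′ with reward +5
- an action that would leave the grid keeps the state unchanged with reward −1
- all other moves give 0
- $\gamma = 0.9$

Policy evaluation here solves the linear system for $v_\pi$ exactly, in place of the book's iterative policy evaluation. Improvement is a greedy one-step lookahead on $v_\pi$, as the assignment asks. Positions are (row, column) with row 0 at the top, and the coordinates of A, A′, B, B′ are read off the book's figure. All names are illustrative.

```python
import numpy as np

N, GAMMA = 5, 0.9
A, A_PRIME = (0, 1), (4, 1)  # (row, col), row 0 at the top
B, B_PRIME = (0, 3), (2, 3)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # north, south, west, east
STATES = [(r, c) for r in range(N) for c in range(N)]
INDEX = {s: i for i, s in enumerate(STATES)}

def step(state, move):
    """Deterministic dynamics: returns (next_state, reward)."""
    if state == A:
        return A_PRIME, 10.0
    if state == B:
        return B_PRIME, 5.0
    r, c = state[0] + move[0], state[1] + move[1]
    if 0 <= r < N and 0 <= c < N:
        return (r, c), 0.0
    return state, -1.0  # bumped into the edge

def evaluate(policy):
    """Exact v_pi for a deterministic policy: solve (I - gamma * P_pi) v = r_pi."""
    n = len(STATES)
    P, rewards = np.zeros((n, n)), np.zeros(n)
    for s in STATES:
        nxt, rew = step(s, MOVES[policy[INDEX[s]]])
        P[INDEX[s], INDEX[nxt]] = 1.0
        rewards[INDEX[s]] = rew
    return np.linalg.solve(np.eye(n) - GAMMA * P, rewards)

def lookahead(v, s):
    """One-step action values r + gamma * v(s') for each move from s."""
    values = []
    for m in MOVES:
        nxt, rew = step(s, m)
        values.append(rew + GAMMA * v[INDEX[nxt]])
    return np.array(values)

def policy_iteration():
    policy = np.zeros(len(STATES), dtype=int)  # arbitrary start: always move north
    while True:
        v = evaluate(policy)
        stable = True
        for s in STATES:
            i = INDEX[s]
            q = lookahead(v, s)
            best = int(np.argmax(q))
            # Switch only on a strict improvement, so ties cannot make the loop cycle.
            if q[best] > q[policy[i]] + 1e-9:
                policy[i] = best
                stable = False
        if stable:
            return v.reshape(N, N), policy.reshape(N, N)

v, policy = policy_iteration()
print(np.round(v, 1))
```

As a cross-check, the value at A must satisfy $v_*(A) = 10 + \gamma^5 v_*(A)$: the agent jumps to A′ and walks four steps back up to A. This gives $10 / (1 - 0.9^5) \approx 24.4$.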