Assignments

HW1

Work out the details of gradient ascent on the preference function found on pages 38+. Justify each step.
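As a starting point for checking the derivation numerically, here is a minimal sketch of one gradient-ascent step on action preferences. It assumes a softmax policy over the preferences and a scalar reward baseline; the names `softmax`, `gradient_bandit_step`, `H`, `baseline`, and `alpha` are illustrative choices, not notation taken from the assignment.

```python
import math

def softmax(H):
    """Turn preferences H into action probabilities (assumed softmax policy)."""
    m = max(H)  # subtract the max for numerical stability
    exps = [math.exp(h - m) for h in H]
    total = sum(exps)
    return [e / total for e in exps]

def gradient_bandit_step(H, action, reward, baseline, alpha=0.1):
    """One stochastic gradient-ascent update of the preferences.

    Assumed update: H[a] += alpha * (reward - baseline) * (1[a == action] - pi[a]).
    """
    pi = softmax(H)
    return [
        h + alpha * (reward - baseline) * ((1.0 if a == action else 0.0) - pi[a])
        for a, h in enumerate(H)
    ]

if __name__ == "__main__":
    H = [0.0, 0.0, 0.0]
    H = gradient_bandit_step(H, action=1, reward=1.0, baseline=0.0)
    print(H, softmax(H))
```

The written derivation still has to justify why this sample-based update equals the gradient of expected reward in expectation; the code only shows the form of the update being derived.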

HW2

Give an equation for $v_\pi$ in terms of $q_\pi$ and $\pi$.
Give an equation for $q_\pi$ in terms of $v_\pi$ and the four-argument $p$.
$v_\pi$ $q_\pi$ .
Give an equation for $v_\ast$ in terms of $q_\ast$.

HW3

Work the gambler's problem in Example 4.2 on page 84. Use value iteration. Turn in your code and plots of the value function and the policy like those in the book.
Exercise 4.9 on page 84.
Exercise 4.10 on page 84.
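For the gambler's problem, the following is a rough sketch of in-place value iteration, not a reference solution. It assumes an undiscounted episodic task with a goal of 100, a probability of heads of 0.4, and a reward of +1 only on reaching the goal (encoded here by fixing the value of the goal state at 1). The function name, its parameters, and the tie-breaking rule in the greedy policy are all assumptions made for illustration.

```python
def gambler_value_iteration(p_heads=0.4, goal=100, theta=1e-10):
    """In-place value iteration for a gambler's-problem-style MDP (sketch)."""
    # V[s] estimates the probability of reaching `goal` from capital s.
    V = [0.0] * (goal + 1)
    V[goal] = 1.0  # assumed encoding of the +1 reward at the goal; V[0] stays 0

    def stake_values(s):
        # Expected value of each legal stake a in 1..min(s, goal - s).
        return [
            (p_heads * V[s + a] + (1 - p_heads) * V[s - a], a)
            for a in range(1, min(s, goal - s) + 1)
        ]

    while True:
        delta = 0.0
        for s in range(1, goal):
            best = max(v for v, _ in stake_values(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once a full sweep changes no value by theta or more
            break

    # Greedy policy; values are rounded so floating-point noise does not
    # decide ties, and the smallest stake wins among tied actions (assumption).
    policy = [0] * (goal + 1)
    for s in range(1, goal):
        policy[s] = max(stake_values(s), key=lambda va: (round(va[0], 6), -va[1]))[1]
    return V, policy

if __name__ == "__main__":
    V, policy = gambler_value_iteration()
    print(V[25], V[50], V[75])
```

The returned `V` and `policy` lists can be passed directly to a plotting library to produce the two required figures. The shape of the policy plot depends on how ties between equally good stakes are broken, so a different tie-breaking rule will give a different but equally optimal policy.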

HW4

$v_\pi$ in your policy iteration.
$q_\pi$ in your policy iteration.