Chapter 3 Notes

$v_\pi$ in terms of $q_\pi$ and $\pi$

Q: Give an equation for $v_\pi$ in terms of $q_\pi$ and $\pi$.

A: Expand $v_\pi(s)$ from the definition by introducing the action $A_t$ and marginalizing over it, as follows,
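
The expansion described in the answer can be sketched as below. It assumes the standard definitions $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$ and $q_\pi(s,a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$ and a finite action set; it is a reconstruction, not necessarily the exact form used in the notes.

```latex
\begin{align*}
v_\pi(s) &= \mathbb{E}_\pi[G_t \mid S_t = s] \\
         &= \sum_a \Pr\{A_t = a \mid S_t = s\}\,
            \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] \\
         &= \sum_a \pi(a \mid s)\, q_\pi(s, a)
\end{align*}
```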

$q_\pi$ in terms of $v_\pi$ and $p$

Q: Give an equation for $q_\pi$ in terms of $v_\pi$ and the four-argument $p$.

A: Expand the return and evaluate the terms, finally applying the Markov property, as follows,
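
The steps named in the answer can be sketched as below. It assumes the return recursion $G_t = R_{t+1} + \gamma G_{t+1}$ and the four-argument dynamics $p(s', r \mid s, a)$ over finite state and reward sets; it is a reconstruction, not necessarily the exact form used in the notes.

```latex
\begin{align*}
q_\pi(s, a) &= \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] \\
            &= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a] \\
            &= \sum_{s', r} p(s', r \mid s, a)
               \Big[ r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_t = s, A_t = a,
                     S_{t+1} = s', R_{t+1} = r] \Big] \\
            &= \sum_{s', r} p(s', r \mid s, a)
               \Big[ r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s'] \Big]
               && \text{(Markov property)} \\
            &= \sum_{s', r} p(s', r \mid s, a)
               \big[ r + \gamma\, v_\pi(s') \big]
\end{align*}
```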

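Notice that we found

$\mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s'] = v_\pi(s')$

even though we have the definition $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$, which is stated at time $t$ rather than $t+1$.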

 

Gridworld Example

Observation: For a fixed policy, the state-value recursion (the Bellman equation for $v_\pi$) is a linear system of equations, with one equation and one unknown $v_\pi(s)$ per state.
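
To make the observation concrete, here is a sketch assuming a finite state set and a fixed policy $\pi$. The symbols $r_\pi$ and $P_\pi$ are introduced here only for illustration and do not come from the notes.

```latex
\begin{align*}
r_\pi(s)     &= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\, r
             && \text{(expected one-step reward)} \\
P_\pi(s, s') &= \sum_a \pi(a \mid s) \sum_{r} p(s', r \mid s, a)
             && \text{(state-to-state transition probability)} \\
v_\pi(s)     &= r_\pi(s) + \gamma \sum_{s'} P_\pi(s, s')\, v_\pi(s')
\quad\Longleftrightarrow\quad
(I - \gamma P_\pi)\, v_\pi = r_\pi
\end{align*}
```

In this form the coefficient matrix is $I - \gamma P_\pi$ and the right-hand side is $r_\pi$. For $0 \le \gamma < 1$ the matrix is invertible, so the system has a unique solution.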

Here is some Matlab code that does all of this. Note in particular the way that the four-argument function $p$ is set.

Here are the coefficient matrix and the right-hand-side vector of the linear system.

Here is the solution for the state-value function $v_\pi$.

          1         2         3         4         5
1    3.3090    8.7893    4.4276    5.3224    1.4922
2    1.5216    2.9923    2.2501    1.9076    0.5474
3    0.0508    0.7382    0.6731    0.3582   -0.4031
4   -0.9736   -0.4355   -0.3549   -0.5856   -1.1831
5   -1.8577   -1.3452   -1.2293   -1.4229   -1.9752
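
The table above can be reproduced with a short script. What follows is a minimal sketch in plain Python, not the Matlab listing these notes refer to. It assumes the layout of Sutton and Barto's Example 3.5: a 5x5 grid, an equiprobable random policy, $\gamma = 0.9$, reward -1 for stepping off the grid, state A at row 1, column 2 jumping to row 5, column 2 with reward 10, and state B at row 1, column 4 jumping to row 3, column 4 with reward 5. The helper names (`step`, `build_system`, `solve`, `state_values`) are hypothetical. Because the dynamics are deterministic, `step` plays the role of the four-argument $p$: it returns the single pair $(s', r)$ that has probability 1.

```python
# Assumed gridworld layout (Sutton and Barto, Example 3.5), 0-based (row, col).
N = 5
GAMMA = 0.9
A, A_PRIME, R_A = (0, 1), (4, 1), 10.0   # every action in A jumps to A'
B, B_PRIME, R_B = (0, 3), (2, 3), 5.0    # every action in B jumps to B'
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # north, south, west, east


def step(state, action):
    """Deterministic dynamics: the one (next_state, reward) with p = 1."""
    if state == A:
        return A_PRIME, R_A
    if state == B:
        return B_PRIME, R_B
    row, col = state[0] + action[0], state[1] + action[1]
    if 0 <= row < N and 0 <= col < N:
        return (row, col), 0.0
    return state, -1.0  # stepping off the grid: stay put, reward -1


def build_system():
    """Build M v = b with M = I - gamma * P_pi and b = expected reward."""
    n = N * N
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    prob = 1.0 / len(ACTIONS)  # equiprobable random policy
    for row in range(N):
        for col in range(N):
            i = row * N + col
            for action in ACTIONS:
                (nr, nc), reward = step((row, col), action)
                M[i][nr * N + nc] -= GAMMA * prob
                b[i] += prob * reward
    return M, b


def solve(M, b):
    """Gauss-Jordan elimination with partial pivoting; returns x, M x = b."""
    n = len(b)
    aug = [M[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                for k in range(c, n + 1):
                    aug[r][k] -= factor * aug[c][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def state_values():
    """Return the 5x5 grid of state values as a list of rows."""
    M, b = build_system()
    v = solve(M, b)
    return [v[r * N:(r + 1) * N] for r in range(N)]


if __name__ == "__main__":
    for grid_row in state_values():
        print(" ".join(f"{x:8.4f}" for x in grid_row))
```

The elimination routine is written out by hand only to keep the sketch free of dependencies; a library linear solver would serve the same purpose.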

As the numbers of states, actions, and rewards grow, solving the system directly becomes intractable: a direct solve of $n$ equations costs on the order of $n^3$ operations.

Q: Why is the value of state A (row 1, column 2) less than its reward of 10, while the value of state B (row 1, column 4) is greater than its reward of 5?
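
One way to approach the question is to write out the Bellman equation at the two special states. The sketch below assumes the layout of Sutton and Barto's Example 3.5: $\gamma = 0.9$, every action in A leads to A' (row 5, column 2), and every action in B leads to B' (row 3, column 4). The values of A' and B' are read from the table.

```latex
\begin{align*}
v_\pi(A) &= 10 + \gamma\, v_\pi(A') = 10 + 0.9\,(-1.3452) \approx 8.7893 < 10 \\
v_\pi(B) &= 5 + \gamma\, v_\pi(B') = 5 + 0.9\,(0.3582) \approx 5.3224 > 5
\end{align*}
```

Under these assumptions, A' lies on the bottom edge, where the random policy often steps off the grid and collects -1, so its value is negative and pulls the value of A below 10. B' lies in the interior and has a positive value, which lifts the value of B above 5.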