Jake Gunther
2020/15/1
Assume: static data
Perform: multiple passes over data
Need: online, incremental methods
Assume: stationary function
Need: adaptation to time-varying functions
\[ \mathbf{w}_{t+1} = \mathbf{w}_t + \alpha\left[ U_t - \hat{v}(S_t,\mathbf{w}_t)\right] \nabla \hat{v}(S_t,\mathbf{w}_t) \]
Convergence to local opt is guaranteed.
Exercise 9.1 page 209. How are the tabular methods special cases of linear function approximation. What is \(\mathbf{x}(s)\)?
In many problems, states naturally expressed as numbers:
Simple but don’t work as well as other bases.
Easy to use and perform well in many RL tasks.
Without overlap, coarse coding is just state aggregation.
Think about what kind of approximations would arise from linear combinations of binary vectors on these shapes.
Receptive field shape strong effect on generalization(first row) but little effect on asymptotic approximation quality (last row)
\[ x_i(s) = \exp\left(-\frac{\|s-c_i\|^2}{2\sigma_i^2}\right) \]
To learn in about \(\tau\) steps, let
\[ \alpha = \frac{1}{\tau E\|\mathbf{x}\|^2} \]
(You already know about this.)
\[ \begin{gather} \mathbf{b} = E[R_{t+1}\mathbf{x}_t] \quad \text{and} \quad \mathbf{A} = E[ \mathbf{x}_t(\mathbf{x}_t - \gamma \mathbf{x}_{t+1})^T] \\ \mathbf{w}_\text{TD} = \mathbf{A}^{-1} \mathbf{b} \end{gather} \]
\[ \begin{gather} \hat{\mathbf{A}}_t = \sum_{k=0}^{t-1} \mathbf{x}_k (\mathbf{x}_k-\gamma \mathbf{x}_{k+1})^T + \varepsilon \mathbf{I} \qquad \hat{\mathbf{b}}_t = \sum_{k=0}^{t-1} R_{t+1} \mathbf{x}_k \\ \mathbf{w}_t = \hat{\mathbf{A}}_t^{-1} \hat{\mathbf{b}}_t \end{gather} \]
\[ \begin{gather} \hat{\mathbf{A}}_{t}^{-1} = \hat{\mathbf{A}}_{t-1}^{-1} - \frac{\hat{\mathbf{A}}_{t-1}^{-1} \mathbf{x}_t (\mathbf{x}_t - \gamma\mathbf{x}_{t+1})^T \hat{\mathbf{A}}_{t-1}^{-1} }{1 + (\mathbf{x}_t - \gamma\mathbf{x}_{t+1})^T \hat{\mathbf{A}}_{t-1}^{-1} \mathbf{x}_t},\;\; \hat{\mathbf{A}}_0 = \varepsilon \mathbf{I} \end{gather} \]
Previous methods:
\[ \hat{v}(s,\mathcal{D}) = \sum_{s' \in \mathcal{D}} k(s,s') g(s') \]
\[ k(s,s') \]
\[ k(s,s') \]