Jake Gunther
2020/20/2
Sigmoid activation functions in the ANN
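A minimal sketch of a sigmoid unit and a network with one hidden layer of sigmoid units and a single sigmoid output. It assumes plain Python lists for the weights. The names `sigmoid` and `forward` are hypothetical illustrations, not taken from the notes. Because the output lies in (0, 1), it can be read as a probability.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^(-x)); maps any real x into (0, 1).

    In floating point, extreme inputs saturate to exactly 0.0 or 1.0.
    """
    # Branch on the sign so math.exp never gets a large positive argument
    # (math.exp(1000) would raise OverflowError).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(x, W1, b1, w2, b2):
    """Output of one hidden layer of sigmoid units feeding one sigmoid output.

    x  : input features (list of floats)
    W1 : one weight row per hidden unit; b1: hidden-unit biases
    w2 : output weights, one per hidden unit; b2: output bias
    """
    hidden = [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
              for row, bi in zip(W1, b1)]
    return sigmoid(sum(vk * hk for vk, hk in zip(w2, hidden)) + b2)
```

With all weights and biases at zero, every hidden unit outputs 0.5 and the output is sigmoid(0) = 0.5.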
Board representation (pg. 423): like a one-hot encoding
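The following is a sketch of TD-Gammon's per-point input. Each point uses four units per player, and the fourth unit takes the value (n − 3)/2 when a point holds more than three checkers. The sketch assumes the cumulative reading of the other three units: they switch on at one, two, and three checkers, respectively. Unlike a strict one-hot code, several units can therefore be on at once. The function names are hypothetical. The sketch also omits the extra units for the bar, borne-off checkers, and whose turn it is.

```python
def encode_point(n: int) -> list[float]:
    """Four input units for the n checkers one player has on a single point.

    Assumed cumulative reading: units 1-3 are on for n >= 1, n >= 2, n >= 3;
    unit 4 is (n - 3) / 2 when n > 3, else 0.
    """
    if n < 0:
        raise ValueError("checker count must be non-negative")
    return [
        1.0 if n >= 1 else 0.0,
        1.0 if n >= 2 else 0.0,
        1.0 if n >= 3 else 0.0,
        (n - 3) / 2 if n > 3 else 0.0,
    ]


def encode_side(counts: list[int]) -> list[float]:
    """Concatenate the 4-unit codes for all 24 points of one player (96 units)."""
    if len(counts) != 24:
        raise ValueError("expected 24 point counts")
    units: list[float] = []
    for n in counts:
        units.extend(encode_point(n))
    return units
```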
Reward is zero except for a win
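The sketch below assumes a win gives reward +1 and that there is no discounting (γ = 1). Under these assumptions, the return from any state is 1 if the game is eventually won and 0 otherwise, so the state value equals the probability of winning. The one-step TD error is shown below; `reward` and `td_error` are hypothetical helpers, not a full training loop.

```python
def reward(game_over: bool, won: bool) -> float:
    """Zero on every step except the final step of a won game (assumed +1)."""
    return 1.0 if game_over and won else 0.0


def td_error(v_s: float, v_next: float, game_over: bool, won: bool,
             gamma: float = 1.0) -> float:
    """One-step TD error R_{t+1} + gamma * V(S_{t+1}) - V(S_t).

    A terminal state has value 0, so at game end the target is just the
    final reward (1 for a win, 0 otherwise).
    """
    r = reward(game_over, won)
    bootstrap = 0.0 if game_over else gamma * v_next
    return r + bootstrap - v_s
```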
Atari DQN update, where \(\tilde{q}\) is the duplicate (target) network and the TD error is clipped to \([-1, +1]\):
\[
\begin{gathered}
\delta_t = R_{t+1} + \gamma \max_a \tilde{q}(S_{t+1},a,\mathbf{w}) - \hat{q}(S_t,A_t,\mathbf{w}_t) \\
\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \, \text{clip}(\delta_t) \, \nabla \hat{q}(S_t,A_t,\mathbf{w}_t) \\
\text{clip}(\delta_t) = \begin{cases} +1, & \delta_t > +1 \\ \delta_t, & -1 \leq \delta_t \leq +1 \\ -1, & \delta_t < -1 \end{cases}
\end{gathered}
\]
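Below is a sketch of one clipped update step. It assumes a linear \(\hat{q}\) with one weight vector per action over a feature vector x(s). Under this assumption, \(\nabla \hat{q}(S_t,A_t,\mathbf{w}_t)\) is x(S_t) in the row for the taken action and zero elsewhere; the Atari agent used a deep convolutional network instead. `w_target` stands in for the weights behind \(\tilde{q}\). All names are hypothetical. As in the formula for \(\delta_t\), there is no special case for terminal states.

```python
def clip(delta: float, bound: float = 1.0) -> float:
    """Clamp the TD error to [-bound, +bound]."""
    return max(-bound, min(bound, delta))


def q_value(w: list[list[float]], x: list[float], a: int) -> float:
    """Linear action value: dot product of action a's weight row with features x."""
    return sum(wi * xi for wi, xi in zip(w[a], x))


def clipped_q_update(w, w_target, x, a, r, x_next, alpha, gamma):
    """Return new weights after one step w + alpha * clip(delta) * grad q_hat.

    delta uses the max over the target weights w_target at the next state,
    minus the current estimate q_hat(x, a, w). Only row a changes, because
    the gradient of a linear q_hat w.r.t. the other rows is zero.
    """
    n_actions = len(w_target)
    target = r + gamma * max(q_value(w_target, x_next, b) for b in range(n_actions))
    delta = target - q_value(w, x, a)
    step = alpha * clip(delta)
    new_w = [row[:] for row in w]  # copy so the caller's weights are untouched
    new_w[a] = [wi + step * xi for wi, xi in zip(w[a], x)]
    return new_w
```

For example, with zero weights, reward 5, and α = 0.1, the raw error of 5 is clipped to 1. The taken action's row therefore moves by 0.1·x rather than 0.5·x.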