As I read papers on reinforcement learning, there are key points I like to look for; I've listed some of them below, along with questions to prompt thinking.
Features are mappings from the state space or state-action space into a domain where they can be combined with model weights to approximate value functions or policies, e.g., linear features for value function approximation.
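To make this concrete, here is a minimal sketch of linear value-function approximation with hand-crafted features; the featurizer, sizes, and step sizes are my own illustrative choices, not taken from any particular paper:

```python
import numpy as np

NUM_FEATURES = 8
w = np.zeros(NUM_FEATURES)  # model weights

def tile_features(state):
    """Hypothetical hand-crafted featurizer: map a scalar state in [0, 1]
    onto a one-hot 'tile' encoding (purely illustrative)."""
    phi = np.zeros(NUM_FEATURES)
    idx = int(np.clip(state * NUM_FEATURES, 0, NUM_FEATURES - 1))
    phi[idx] = 1.0
    return phi

def v_hat(state):
    """Linear value-function approximation: v(s) ~= w . phi(s)."""
    return w @ tile_features(state)

def td0_update(s, r, s_next, alpha=0.1, gamma=0.99):
    """Semi-gradient TD(0) step for a single transition (s, r, s')."""
    td_error = r + gamma * v_hat(s_next) - v_hat(s)
    w[:] = w + alpha * td_error * tile_features(s)
```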
Describe the features used in the paper.
Were they hand-crafted by human experts?
Did they encode knowledge of the problem domain?
Were they learned using neural networks?
If neural networks are used, what architecture was chosen and how were the features trained? (A minimal sketch follows this list.)
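By contrast, when the features are learned, the hand-crafted featurizer above is replaced by a network whose hidden layer produces the features. A minimal sketch follows; the architecture and sizes are my own assumptions, kept in plain NumPy so it stays self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NUM_FEATURES = 4, 16

# A tiny two-layer network: the hidden layer plays the role of learned
# features phi(s), and the output weights form the linear value head.
W1 = rng.normal(scale=0.1, size=(NUM_FEATURES, STATE_DIM))
b1 = np.zeros(NUM_FEATURES)
w2 = np.zeros(NUM_FEATURES)

def learned_features(state):
    return np.maximum(0.0, W1 @ state + b1)  # ReLU hidden layer = features

def v_hat(state):
    return w2 @ learned_features(state)

# In a full implementation the TD or policy-gradient loss would also be
# backpropagated into W1 and b1, so the features themselves are learned
# end-to-end rather than fixed in advance.
```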
Describe the learning algorithm used in the paper.
Describe the roots of the learning algorithm.
Is the algorithm on-policy or off-policy?
How does the algorithm balance exploration and exploitation? (A SARSA vs. Q-learning sketch with ε-greedy exploration follows this list.)
Describe the algorithm relative to the deadly triad (off-policy, bootstrapping, function approximation).
Does the algorithm learn a value function or a policy or both?
Where does the algorithm fall on the Monte Carlo vs. bootstrapping spectrum? (See the n-step return sketch after this list.)
Is the method applicable to episodic as well as continuing tasks?
What optimization methods were used: stochastic gradient descent, RMSProp, ADMM, etc.?
What is known about the bias, convergence, and asymptotic performance of the algorithm?
What advantages does the proposed algorithm have over previous methods?
What special considerations are made for learning/training?
What was learned through training and simulation?
How was the data used/reused in the training? (Experience replay seems to reappear in several papers; a minimal replay-buffer sketch follows this list.)
What tricks were used in learning to get the algorithm to converge, to reduce variance, and/or to accelerate convergence?
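To ground the on-policy/off-policy and exploration questions, here is a minimal tabular sketch (my own illustration, not taken from any specific paper): SARSA bootstraps from the action the behaviour policy actually takes, Q-learning bootstraps from the greedy action, and ε-greedy selection supplies the exploration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(s, epsilon=0.1):
    """Exploration/exploitation trade-off: act randomly with probability epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def sarsa_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: the target uses the action the behaviour policy actually took."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy: the target uses the greedy action, regardless of behaviour."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```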
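For the Monte Carlo vs. bootstrapping question, the difference lies in the update target; the n-step return below interpolates between the two extremes (again an illustrative sketch with hypothetical inputs):

```python
def mc_return(rewards, gamma=0.99):
    """Monte Carlo target: the full discounted return of an episode (no bootstrapping)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def n_step_target(rewards, v_bootstrap, gamma=0.99):
    """n-step target: n observed rewards, then bootstrap from V(s_{t+n}).
    With n = 1 this is the TD(0) target; as n grows it approaches Monte Carlo."""
    g = v_bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: a 3-step target for rewards [1.0, 0.0, 0.5] and V(s_{t+3}) = 2.0
# print(n_step_target([1.0, 0.0, 0.5], 2.0))
```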
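And for the data-reuse question, experience replay in its simplest form is just a buffer of past transitions sampled uniformly at random for updates; a minimal sketch, with the surrounding training loop and update rule assumed rather than shown:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform experience replay: store transitions, sample mini-batches."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)

# Usage: after each environment step, store the transition; periodically
# sample a batch and pass it to whatever update rule the paper uses.
# buffer.add(s, a, r, s_next, done)
# batch = buffer.sample()   # decorrelates and reuses experience
```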