We propose an efficient new method for training neural networks in reinforcement learning tasks. Our method combines the advantages of trust-region and natural gradient methods by using the natural gradient direction to approximately solve the trust-region subproblems. We show that our method performs favorably compared with other well-tuned reinforcement learning methods on the F16 model. In particular, our method reaches a higher reward in fewer iterations, and its best-case results stand out among those of the compared methods.
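The abstract does not spell out the subproblem, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. It assumes a quadratic model with gradient `g` and curvature matrix `B`, a Euclidean trust region of radius `radius`, and an explicitly supplied Fisher matrix `F`. The trust-region subproblem is restricted to the line spanned by the natural gradient direction `-F⁻¹g`, which reduces it to a one-dimensional problem with a closed-form solution. The natural direction is computed with conjugate gradient, a common choice in natural-gradient methods. All function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    """Dense matrix-vector product (M given as a list of rows)."""
    return [dot(row, v) for row in M]


def conjugate_gradient(A, b, max_iter=50, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive-definite A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x (x starts at zero)
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def natural_gradient_tr_step(g, B, F, radius):
    """Hypothetical sketch: approximately solve
        min_p  g.p + 0.5 p.B.p   s.t.  ||p|| <= radius
    by restricting p to t * d with d = -F^{-1} g (natural gradient direction)."""
    d = [-v for v in conjugate_gradient(F, g)]
    d_norm = math.sqrt(dot(d, d))
    if d_norm == 0.0:
        return [0.0] * len(g)            # zero gradient: no step
    gd = dot(g, d)                       # slope along d (negative for SPD F)
    dBd = dot(d, matvec(B, d))           # curvature along d
    t_max = radius / d_norm              # largest t that stays in the region
    if dBd > 0:
        t = min(-gd / dBd, t_max)        # 1-D minimizer, clipped to boundary
    else:
        t = t_max                        # nonpositive curvature: go to boundary
    return [t * v for v in d]
```

For example, with `g = [2.0, 2.0]`, `F = [[2.0, 0.0], [0.0, 1.0]]`, `B` the identity, and a large radius, the natural direction is `[-1, -2]` and the step is its unconstrained 1-D minimizer `[-1.2, -2.4]`. With a small radius, the same direction is simply truncated at the trust-region boundary. Restricting to one direction keeps each subproblem cheap, at the cost of ignoring the rest of the trust region.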
Keywords: high-order optimization, reinforcement learning
This work is supported in part by the DARPA Assured Autonomy program.
Sudhir Kylasa (Purdue University)
Bing Yuan (Purdue University)
Purdue University, West Lafayette, Indiana, USA