We propose a new, efficient method for training neural networks in reinforcement learning tasks. Our method combines the advantages of trust-region and natural gradient methods by using the natural gradient direction to approximately solve the trust-region subproblems. We show that our method compares favorably with other well-tuned reinforcement learning methods on the F16 model. In particular, it reaches a higher reward in fewer iterations, which makes it stand out when best-case results are compared.
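As a rough illustration of how a natural gradient direction can serve as an approximate trust-region step, the sketch below computes d = -F^{-1} g for a loss to be minimized (for example, negative expected reward) and then shrinks d onto the boundary of a trust region measured in the Fisher metric, sqrt(d^T F d) <= radius. This is a generic sketch, not the authors' algorithm: the function name `natural_gradient_tr_step`, the small dense positive-definite Fisher matrix `fisher`, and the simple rescaling rule are all assumptions made for illustration. For large networks one would typically replace `np.linalg.solve` with a matrix-free conjugate gradient solve.

```python
import numpy as np


def natural_gradient_tr_step(grad, fisher, radius):
    """Hypothetical helper: approximate trust-region step along the natural gradient.

    Assumes `fisher` is a small, dense, positive-definite Fisher matrix and that
    the trust region is measured in the Fisher metric: sqrt(d^T F d) <= radius.
    """
    # Natural gradient direction: solve F d = -g rather than inverting F explicitly.
    direction = -np.linalg.solve(fisher, grad)
    # Length of the direction in the Fisher metric.
    metric_norm = np.sqrt(direction @ fisher @ direction)
    # If the step leaves the trust region, scale it back onto the boundary.
    if metric_norm > radius:
        direction *= radius / metric_norm
    return direction


if __name__ == "__main__":
    g = np.array([1.0, 2.0])
    F = np.diag([2.0, 4.0])
    # Unconstrained natural gradient step is [-0.5, -0.5]; a small radius truncates it.
    print(natural_gradient_tr_step(g, F, radius=0.5))
```

Stepping along the natural gradient direction avoids solving the trust-region subproblem exactly; only its length is adjusted so that the step respects the radius.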
Keywords: high-order optimization, reinforcement learning
Sudhir Kylasa (Purdue University)
Bing Yuan (Purdue University)
Purdue University, West Lafayette, Indiana, USA