PROPEL | Assured Autonomy Tools Portal

PROPEL

PROPEL observes that neural policy representations are amenable to gradient-based learning but are hard to verify or interpret. On the other hand, symbolic/programmatic policy representations are relatively easy to verify and interpret but are more difficult to learn because of the combinatorial nature of program synthesis. Why not then simultaneously maintain two representations, one neural and one symbolic, during learning?

The PROPEL approach formalizes this insight by viewing program learning as constrained mirror ascent, a generalization of gradient ascent to constrained optimization settings. We consider two classes of policies: a highly expressive class H (implemented in practice using a mix of neural networks and symbolic functions) that admits approximate gradients, and a more constrained, and possibly non-differentiable, class F of “desirable” symbolic/programmatic policies.
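To make the idea of constrained ascent over two classes concrete, here is a minimal toy sketch in Python. It is not the PROPEL implementation. It assumes that policies are plain parameter vectors, that H is all real-valued vectors, that F is the integer-valued vectors (a discrete, non-differentiable stand-in for programmatic policies), and that the geometry is Euclidean, in which case mirror ascent reduces to projected gradient ascent. The names `objective`, `gradient`, `project_onto_F`, and `constrained_ascent` are hypothetical, and rounding stands in for whatever projection procedure the real method uses.

```python
def objective(h, target):
    """Toy reward (assumption): higher is better, maximized at h == target."""
    return -sum((hi - ti) ** 2 for hi, ti in zip(h, target))


def gradient(h, target):
    """Exact gradient of the toy objective with respect to h."""
    return [-2.0 * (hi - ti) for hi, ti in zip(h, target)]


def project_onto_F(h):
    """Project a point of H back onto F by rounding each coordinate.

    Rounding is the Euclidean projection onto the integer grid; it is a
    stand-in for the projection step, not the method's actual procedure.
    """
    return [round(hi) for hi in h]


def constrained_ascent(f0, target, step_size=0.4, iterations=10):
    """Alternate a gradient step in H with a projection onto F."""
    f = list(f0)
    for _ in range(iterations):
        g = gradient(f, target)
        # Update: treat f as a member of H and take a gradient step there.
        h = [fi + step_size * gi for fi, gi in zip(f, g)]
        # Project: return to the constrained class F.
        f = project_onto_F(h)
    return f


result = constrained_ascent([0, 0], target=[2.3, -1.7])
print(result)
```

Every iterate returned by the loop lies in F, even though the gradient information comes only from the larger class H. Because F is not convex in this toy setting, the point the loop settles on can depend on the step size, so the sketch should be read as an illustration of the alternation rather than as a convergence guarantee.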

Acknowledgements

This work is supported in part by the DARPA Assured Autonomy program.

Contacts

Swarat Chaudhuri (University of Texas, Austin)

Organization

University of Texas, Austin, TX, USA

Contributors

Swarat Chaudhuri

References

Verma, A., Le, H. M., Yue, Y., & Chaudhuri, S. (2019). Imitation-Projected Programmatic Reinforcement Learning. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada (pp. 15726–15737). Retrieved from http://papers.nips.cc/paper/9705-imitation-projected-programmatic-reinforcement-learning