In this work, we show how to represent and learn policies that are themselves programs, i.e., stateful procedures with learnable parameters. To learn the parameters of such policies, we develop connections between black-box variational inference and existing policy learning approaches. We then explain how such learning can be implemented in a probabilistic programming system. Using our own novel implementation of such a system, we demonstrate both concise policy representation and automatic policy parameter learning for a range of canonical reinforcement learning problems.
Full text: https://arxiv.org/pdf/1507.
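
As a rough illustration of the connection mentioned above: black-box variational inference and likelihood-ratio policy gradient methods both estimate a gradient by weighting the score, i.e. the gradient of the log-probability of a sample, by a scalar signal (a reward in the policy setting). The sketch below is not the authors' system and is written in plain Python rather than a probabilistic programming language. The two-armed bandit, its payoff probabilities, and the names `policy`, `reward`, and `learn` are all hypothetical, and the toy policy is stateless for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def policy(theta, rng):
    """A policy written as a small program with one learnable parameter.

    Samples an action from Bernoulli(sigmoid(theta)) and returns it together
    with the score, d/dtheta log p(action | theta).
    """
    p = sigmoid(theta)
    action = 1 if rng.random() < p else 0
    score = action - p  # score of a Bernoulli with logit theta
    return action, score


def reward(action, rng):
    # Hypothetical environment: arm 1 pays off more often than arm 0.
    pay_prob = 0.8 if action == 1 else 0.2
    return 1.0 if rng.random() < pay_prob else 0.0


def learn(steps=2000, batch=16, lr=0.5, seed=0):
    """Stochastic gradient ascent on expected reward via the score-function estimator."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        grad = 0.0
        for _ in range(batch):
            action, score = policy(theta, rng)
            grad += reward(action, rng) * score  # reward-weighted score
        theta += lr * grad / batch
    return theta


if __name__ == "__main__":
    theta = learn()
    print("learned probability of choosing arm 1:", sigmoid(theta))
```

The learner only needs to sample from the policy and evaluate the score of each sample, which is what makes the estimator "black box": it never differentiates through the environment.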