Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training each component separately? To this end, we develop a method that can be used to learn policies that map raw image observations directly to torques at the robot’s motors. The policies are represented by deep convolutional neural networks (CNNs) with 92,000 parameters, and are trained using a partially observed guided policy search method, which transforms policy search into supervised learning, with supervision provided by a simple trajectory-centric reinforcement learning method. We evaluate our method on a range of real-world manipulation tasks that require close coordination between vision and control, such as screwing a cap onto a bottle, and present simulated comparisons to a range of prior policy search methods.
Full text: https://arxiv.org/abs/1504.00702
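The core idea above, transforming policy search into supervised learning with supervision from a simple trajectory-centric controller, can be illustrated with a toy sketch. Here a hypothetical linear-Gaussian "teacher" controller supplies (observation, action) pairs, and the final policy is fit to them by regression; the paper's actual policy is a deep CNN on raw images trained with SGD, and all names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trajectory-centric teacher: a linear controller u = K x + k.
K_teacher = np.array([[0.5, -1.0], [0.2, 0.3]])
k_teacher = np.array([0.1, -0.2])

# Roll out the teacher to collect supervision (observation, action) pairs.
X = rng.normal(size=(200, 2))            # observations
U = X @ K_teacher.T + k_teacher          # teacher actions ("torques")

# Supervised phase: regress actions on observations by least squares
# (the linear analogue of training the CNN policy with SGD).
X_aug = np.hstack([X, np.ones((200, 1))])  # append a bias column
W, *_ = np.linalg.lstsq(X_aug, U, rcond=None)

K_learned, k_learned = W[:2].T, W[2]
print(np.allclose(K_learned, K_teacher))   # True
print(np.allclose(k_learned, k_teacher))   # True
```

In the noise-free linear case the regression recovers the teacher exactly; the point of guided policy search is that the same supervised step works when the policy is a deep network and the observations are raw pixels.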
We deal with linear approaches to the Markov Decision Process (MDP). In particular, we describe Policy Evaluation (PE) and Value Iteration (VI) methods as matrix multiplications. We then apply these algorithms to representations of MDPs that have been compressed using a linear operator. Subsequently, we use these methods in the context of the options framework, a way of employing temporal abstraction to speed up MDP solving. The main novel contributions are: an analysis of the convergence of the linear compression framework, a condition for when linear compression is optimal, an in-depth analysis of the LSTD algorithm, a formulation of value iteration with options in the linear framework, and the combination of linear state aggregation and options.
Download (PDF, 665KB)
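The claim that PE and VI reduce to matrix operations is easy to make concrete. The sketch below uses a small made-up 3-state, 2-action MDP: policy evaluation for a fixed policy is a linear system solved in closed form, and value iteration is a batch of matrix-vector products followed by a max over actions.

```python
import numpy as np

gamma = 0.9
# P[a] is the state-transition matrix under action a; r[a] the rewards.
P = np.array([[[0.8, 0.2, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.2, 0.8]],
              [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]]])
r = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])

# Policy evaluation for a fixed policy (here: always action 0) solves the
# linear Bellman equation v = r_pi + gamma * P_pi v in closed form.
v_pe = np.linalg.solve(np.eye(3) - gamma * P[0], r[0])

# Value iteration applies the Bellman optimality operator: one batched
# matrix-vector product per action, then a max over actions.
v = np.zeros(3)
for _ in range(1000):
    v = (r + gamma * P @ v).max(axis=0)

print(v_pe)
print(v)
```

A linear compression in the abstract's sense replaces `P` and `r` with `F @ P @ G` and `F @ r`-style aggregated quantities for some tall/wide matrices; the same two routines then run unchanged in the smaller space.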
Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy (“torques”) from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight, and is an important step toward fully autonomous end-to-end learning from pixels to torques.
Full text: http://arxiv.org/abs/1510.02173
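The structure of the deep dynamical model, a low-dimensional embedding of high-dimensional observations plus a predictive model in that latent space, can be sketched with a fully linear stand-in. Everything below is a toy assumption: synthetic 64-pixel observations generated from a 2-D latent state, an SVD "encoder" in place of the deep autoencoder, and least-squares latent dynamics; multi-step latent prediction is the quantity the paper's model predictive controller optimizes over.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth (hidden from the learner): 2-D latent state with stable
# linear dynamics and 1-D control, observed through 64 "pixels".
A_true = np.array([[0.9, 0.1], [-0.1, 0.9]])
B_true = np.array([[0.0], [1.0]])
C = rng.normal(size=(64, 2))          # latent state -> pixels

# Simulate a trajectory of pixel observations under random controls.
T = 500
z = np.zeros((T, 2))
u = rng.normal(size=(T, 1))
for t in range(T - 1):
    z[t + 1] = A_true @ z[t] + B_true @ u[t]
X = z @ C.T                           # pixel observations, shape (T, 64)

# "Encoder": a 2-D embedding of the pixels via SVD, a linear stand-in
# for the deep autoencoder.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
E = Vt[:2]                            # encoder matrix (2, 64)
Z = X @ E.T                           # embedded states (T, 2)

# Predictive model: fit latent dynamics Z[t+1] ~ A Z[t] + B u[t].
ZU = np.hstack([Z[:-1], u[:-1]])
W, *_ = np.linalg.lstsq(ZU, Z[1:], rcond=None)
A_hat, B_hat = W[:2].T, W[2:].T

# Multi-step prediction in latent space, as used inside MPC.
z_pred = Z[0]
for t in range(10):
    z_pred = A_hat @ z_pred + B_hat @ u[t]
print(np.abs(z_pred - Z[10]).max())   # prediction error after 10 steps
```

In this noise-free linear setting the 10-step prediction matches the embedded trajectory almost exactly; the paper's point is that learning the embedding and the dynamics jointly is what keeps such long-horizon predictions accurate when both maps are deep and nonlinear.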