We introduce the value iteration network: a fully differentiable neural network with a `planning module’ embedded within. Value iteration networks are suitable for making predictions about outcomes that involve planning-based reasoning, such as predicting a desired trajectory from an observation of a map. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate our value iteration networks on the task of predicting optimal obstacle-avoiding trajectories from an image of a landscape, both on synthetic data, and on challenging raw images of the Mars terrain.
Full text: http://arxiv.org/abs/1602.02867
This paper presents to the best of our knowledge the first end-to-end object tracking approach which directly maps from raw sensor input to object tracks in sensor space without requiring any feature engineering or system identification in the form of plant or sensor models. Specifically, our system accepts a stream of raw sensor data at one end and, in real-time, produces an estimate of the entire environment state at the output including even occluded objects. We achieve this by framing the problem as a deep learning task and exploit sequence models in the form of recurrent neural networks to learn a mapping from sensor measurements to object tracks. In particular, we propose a learning method based on a form of input dropout which allows learning in an unsupervised manner, only based on raw, occluded sensor data without access to ground-truth annotations. We demonstrate our approach using a synthetic dataset designed to mimic the task of tracking objects in 2D laser data — as commonly encountered in robotics applications — and show that it learns to track many dynamic objects despite occlusions and the presence of sensor noise.
Full text: http://arxiv.org/abs/1602.00991
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task involving finding rewards in random 3D mazes using a visual input.
Full text: http://arxiv.org/abs/1602.01783