Poster

Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic

Mikael Henaff · Alfredo Canziani · Yann LeCun

Great Hall BC #15

Keywords: [ model-based reinforcement learning ] [ stochastic video prediction ] [ autonomous driving ]


Abstract:

Learning a policy using only observational data is challenging because the distribution of states it induces at execution time may differ from the distribution observed during training. In this work, we propose to train a policy while explicitly penalizing the mismatch between these two distributions over a fixed time horizon. We do this by using a learned model of the environment dynamics which is unrolled for multiple time steps, and training a policy network to minimize a differentiable cost over this rolled-out trajectory. This cost contains two terms: a policy cost, which represents the objective the policy seeks to optimize, and an uncertainty cost, which represents the divergence of the induced states from the states the policy is trained on. We propose to measure this second cost using the uncertainty of the dynamics model about its own predictions, drawing on recent ideas from uncertainty estimation for deep networks. We evaluate our approach using a large-scale observational dataset of driving behavior recorded from traffic cameras, and show that we are able to learn effective driving policies from purely observational data, with no environment interaction.
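The two-term cost described above can be sketched as follows. This is a minimal illustrative example, not the paper's architecture: the linear dynamics, linear policy, dimensions, and cost functions are all assumptions; the paper uses learned neural dynamics with dropout-based uncertainty, whereas here model uncertainty is approximated by disagreement across a small ensemble of dynamics models.

```python
import numpy as np

def unroll_cost(policy_w, models, s0, horizon, lam):
    """Unroll each sampled dynamics model under the same policy and combine:
      - policy cost: mean squared distance of predicted states from the
        origin (a stand-in for the task objective),
      - uncertainty cost: variance of the predicted states across the
        sampled models (a stand-in for dropout-based model uncertainty).
    Returns (total_cost, uncertainty_cost)."""
    trajs = []
    for A in models:                  # each A is one sampled dynamics model
        s = s0.copy()
        traj = []
        for _ in range(horizon):
            a = policy_w @ s          # linear policy: action from state
            s = A @ s + 0.1 * a       # one step of the (assumed) dynamics
            traj.append(s.copy())
        trajs.append(np.stack(traj))
    trajs = np.stack(trajs)           # shape: (n_models, horizon, state_dim)
    policy_cost = np.mean(trajs ** 2)                   # task objective
    uncertainty_cost = np.mean(np.var(trajs, axis=0))   # model disagreement
    return policy_cost + lam * uncertainty_cost, uncertainty_cost
```

In the paper, both terms are differentiable in the policy parameters, so the combined cost is minimized by gradient descent through the unrolled model; the sketch above only evaluates the cost for fixed parameters.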
