Back to all publications...

Generalizing from a few environments in safety-critical reinforcement learning

Before deploying autonomous agents in the real world, we need to be confident they will perform safely in novel situations. Ideally, we would expose agents to a very wide range of situations during training (e.g. many simulated environments), allowing them to learn about every possible danger. But this is often impractical: simulations may fail to capture the full range of situations and may differ subtly from reality. This paper investigates generalizing from a limited number of training environments in deep reinforcement learning. Our experiments test whether agents can perform safely in novel environments, given varying numbers of environments at train time. Using a gridworld setting, we find that standard deep RL agents do not reliably avoid catastrophes on unseen environments – even after performing near optimally on 1000 training environments. However, we show that catastrophes can be significantly reduced (but not eliminated) with simple modifications, including Q-network ensembling to represent uncertainty and the use of a classifier trained to recognize dangerous actions.

Zac Kenton, Angelos Filos, Owain Evans, Yarin Gal
ICLR 2019 Workshop on Safe Machine Learning

Are you looking to do a PhD in machine learning? Did you do a PhD in another field and want to do a postdoc in machine learning? Would you like to visit the group?

How to apply


We are located at
Department of Computer Science, University of Oxford
Wolfson Building
Parks Road
Twitter: @OATML_Oxford
Github: OATML