
Angelos Filos

Angelos is a DPhil student in the Department of Computer Science at the University of Oxford, where he works in the Applied and Theoretical Machine Learning group (OATML) under the supervision of Yarin Gal. His research interests span multi-agent systems, meta-learning and reinforcement learning. He obtained undergraduate and master’s degrees from the Department of Electrical and Electronic Engineering at Imperial College London. He also contracts with the J.P. Morgan Artificial Intelligence Research group, working on generative models, distributional reinforcement learning and inverse reinforcement learning.

Publications

Benchmarking Bayesian Deep Learning with Diabetic Retinopathy Diagnosis

We propose a new Bayesian deep learning (BDL) benchmark, inspired by a real-world medical imaging application on diabetic retinopathy diagnosis. In contrast to popular toy regression experiments on the UCI datasets, our benchmark can be used to assess both the scalability and the effectiveness of different techniques for uncertainty estimation, going beyond RMSE and NLL. A binary classification task on visual inputs (512 × 512 RGB images of retinas) is considered, where model uncertainty is used for medical pre-screening, i.e. to refer patients to an expert when the model's diagnosis is uncertain. We provide a comprehensive comparison of well-tuned BDL techniques on the benchmark, including Monte Carlo dropout, mean-field variational inference, an ensemble of deep models, an ensemble of dropout models, as well as a deterministic (deep) model. Baselines are ranked according to metrics derived from expert domain knowledge to reflect real-world use of model uncertainty in automated diagnosis. We show that some current techniques which solve benchmarks such as UCI ‘overfit’ their uncertainty to UCI (when evaluated on our benchmark these underperform in comparison to simpler baselines), while other techniques that solve UCI do not scale or fail on the new benchmark. The code for the benchmark, its baselines, and a simple API for evaluating new models are made available at https://github.com/oatml/bdl-benchmarks.


Angelos Filos, Sebastian Farquhar, Aidan Gomez, Tim G. J. Rudner, Zac Kenton, Lewis Smith, Milad Alizadeh, Arnoud de Kroon, Yarin Gal
Preprint, 2019
[Preprint] [BibTex] [Code]
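
The pre-screening idea in the abstract above (refer the cases the model is least sure about to an expert, then score the model on what remains) can be illustrated with a small sketch. This is not the benchmark's API: the function names, the choice of predictive entropy as the uncertainty measure, and the toy numbers are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def predictive_entropy(prob_samples: Sequence[float]) -> float:
    """Entropy (in nats) of the mean predicted probability for a binary label.

    `prob_samples` holds P(disease) from several stochastic forward passes
    (e.g. Monte Carlo dropout) or from several ensemble members.
    """
    p = sum(prob_samples) / len(prob_samples)
    # Clip to avoid log(0) when the model is fully confident.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def accuracy_after_referral(
    prob_samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    retain_fraction: float,
) -> Tuple[float, List[int]]:
    """Refer the most uncertain cases to an expert; score the model on the rest.

    Returns the accuracy on the retained cases and the indices referred.
    """
    n = len(labels)
    n_retain = max(1, round(retain_fraction * n))
    # Sort case indices from most certain to least certain.
    order = sorted(range(n), key=lambda i: predictive_entropy(prob_samples[i]))
    retained, referred = order[:n_retain], order[n_retain:]
    correct = 0
    for i in retained:
        mean_p = sum(prob_samples[i]) / len(prob_samples[i])
        correct += int((mean_p >= 0.5) == bool(labels[i]))
    return correct / n_retain, sorted(referred)


# Toy data (made-up numbers): 4 patients, 3 stochastic forward passes each.
samples = [
    [0.95, 0.90, 0.97],  # confident positive
    [0.05, 0.10, 0.02],  # confident negative
    [0.60, 0.40, 0.55],  # uncertain, mean slightly above 0.5
    [0.10, 0.05, 0.08],  # confident negative
]
labels = [1, 0, 0, 0]

acc_all, referred_none = accuracy_after_referral(samples, labels, 1.0)
acc_75, referred_75 = accuracy_after_referral(samples, labels, 0.75)
```

In this toy example the only misclassified patient is also the most uncertain one, so referring a quarter of the cases raises accuracy on the retained cases. A model whose uncertainty is poorly calibrated would not show that improvement, which is the kind of behaviour a referral-based metric is meant to expose.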

Towards Inverse Reinforcement Learning for Limit Order Book Dynamics

We investigate whether Inverse Reinforcement Learning (IRL) can infer rewards from agents within real financial stochastic environments: limit order books (LOB). Our results illustrate that complex behaviours, induced by non-linear reward functions amid agent-based stochastic scenarios, can be deduced through inference, encouraging the use of inverse reinforcement learning for opponent-modelling in multi-agent systems.


Jacobo Roa-Vicens, Cyrine Chtourou, Angelos Filos, Francisco Rullan, Yarin Gal, Ricardo Silva
Oral Presentation, Multi-Agent Learning Workshop at the 36th International Conference on Machine Learning, 2019
[arXiv] [BibTex]

Generalizing from a few environments in safety-critical reinforcement learning

Before deploying autonomous agents in the real world, we need to be confident they will perform safely in novel situations. Ideally, we would expose agents to a very wide range of situations during training (e.g. many simulated environments), allowing them to learn about every possible danger. But this is often impractical: simulations may fail to capture the full range of situations and may differ subtly from reality. This paper investigates generalizing from a limited number of training environments in deep reinforcement learning. Our experiments test whether agents can perform safely in novel environments, given varying numbers of environments at train time. Using a gridworld setting, we find that standard deep RL agents do not reliably avoid catastrophes on unseen environments – even after performing near optimally on 1000 training environments. However, we show that catastrophes can be significantly reduced (but not eliminated) with simple modifications, including Q-network ensembling to represent uncertainty and the use of a classifier trained to recognize dangerous actions.


Zac Kenton, Angelos Filos, Owain Evans, Yarin Gal
ICLR 2019 Workshop on Safe Machine Learning
[paper]
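
The two modifications named in the abstract above, Q-network ensembling to represent uncertainty and a classifier that recognizes dangerous actions, might be combined at action-selection time roughly as follows. The source does not give the paper's exact rule, so everything here is an assumption: the `select_action` name, the mean-minus-standard-deviation penalty, the threshold, and the toy values are hypothetical.

```python
import statistics
from typing import Sequence


def select_action(
    q_ensemble: Sequence[Sequence[float]],
    danger_prob: Sequence[float],
    risk_penalty: float = 1.0,
    danger_threshold: float = 0.5,
) -> int:
    """Pick an action using ensemble disagreement and a danger classifier.

    q_ensemble[k][a] is ensemble member k's Q-value for action a.
    danger_prob[a] is a (hypothetical) classifier's P(action a is catastrophic).
    """
    n_actions = len(danger_prob)
    # Keep only actions the classifier does not flag as dangerous.
    allowed = [a for a in range(n_actions) if danger_prob[a] < danger_threshold]
    if not allowed:
        # Every action is flagged: fall back to the least dangerous one.
        allowed = [min(range(n_actions), key=lambda a: danger_prob[a])]

    def pessimistic_value(a: int) -> float:
        values = [member[a] for member in q_ensemble]
        # Assumed rule: ensemble mean minus a multiple of its standard deviation,
        # so actions the members disagree about are treated with caution.
        return statistics.mean(values) - risk_penalty * statistics.pstdev(values)

    return max(allowed, key=pessimistic_value)


# Toy values (made up): 3 ensemble members, 3 actions.
q = [
    [1.0, 2.0, 3.0],
    [1.0, 0.0, 3.2],
    [1.0, 4.0, 2.8],
]
danger = [0.1, 0.2, 0.9]  # the classifier flags action 2

chosen = select_action(q, danger)
```

Here action 2 has the highest Q-values but is blocked by the classifier, and action 1 has a higher mean than action 0 but the ensemble disagrees strongly about it, so the cautious choice is action 0. Setting `risk_penalty=0.0` recovers a plain ensemble-mean policy over the unflagged actions.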


Reproducibility and Code

Code for Bayesian Deep Learning Benchmarks

To make a real-world difference with Bayesian Deep Learning (BDL) tools, the tools must scale to real-world settings. For that, we, the research community, must be able to evaluate our inference tools (and iterate quickly) on real-world benchmark tasks. We should be able to do this without necessarily worrying about application-specific domain knowledge, such as the expertise often required in medical applications. We require benchmarks that test inference robustness, performance, and accuracy, in addition to the cost and effort of development. These benchmarks should come at a variety of scales, ranging from toy MNIST-scale benchmarks for fast development cycles to large-data benchmarks that are faithful to real-world applications and capture their constraints.

Code
Angelos Filos, Sebastian Farquhar, Aidan Gomez, Tim G. J. Rudner, Zac Kenton, Lewis Smith, Milad Alizadeh, Yarin Gal


Blog Posts

Poor generalization can be dangerous in RL!

We want to develop reinforcement learning (RL) agents that can be trusted to act in high-stakes situations in the real world. That means we need to generalize about common dangers that we might have experienced before, but in an unseen setting. For example, we know it is dangerous to touch a hot oven, even if it’s in a room we haven’t been in before. …



Zac Kenton, Angelos Filos, Yarin Gal, 02 Jul 2019

Bayesian Deep Learning Benchmarks

To make a real-world difference with Bayesian Deep Learning (BDL) tools, the tools must scale to real-world settings. For that, we, the research community, must be able to evaluate our inference tools (and iterate quickly) on real-world benchmark tasks. We should be able to do this without necessarily worrying about application-specific domain knowledge, such as the expertise often required in medical applications. We require benchmarks that test inference robustness, performance, and accuracy, in addition to the cost and effort of development. These benchmarks should come at a variety of scales, ranging from toy MNIST-scale benchmarks for fast development cycles to large-data benchmarks that are faithful to real-world applications and capture their constraints. …



Angelos Filos, Sebastian Farquhar, Aidan Gomez, Tim G. J. Rudner, Zac Kenton, Lewis Smith, Milad Alizadeh, Yarin Gal, 14 Jun 2019

Contact

We are located at
Department of Computer Science, University of Oxford
Wolfson Building
Parks Road
OXFORD
OX1 3QD
UK
Twitter: @OATML_Oxford
GitHub: OATML
Email: oatml@cs.ox.ac.uk


Are you looking to do a PhD in machine learning? Did you do a PhD in another field and want to do a postdoc in machine learning? Would you like to visit the group?

How to apply