#### Back to all members...

Yarin Gal

Yarin leads the *Oxford Applied and Theoretical Machine Learning (OATML)* group. He is an Associate Professor of Machine Learning at the Computer Science department, University of Oxford. He is also the Tutorial Fellow in Computer Science at Christ Church, Oxford, and a Turing Fellow at the Alan Turing Institute, the UK’s national institute for data science and artificial intelligence.
Prior to his move to Oxford he was a Research Fellow in Computer Science at St Catharine’s College at the University of Cambridge. He obtained his PhD from the Cambridge machine learning group, working with Prof Zoubin Ghahramani and funded by the Google Europe Doctoral Fellowship. Prior to that he studied at Oxford Computer Science department for a Master’s degree under the supervision of Prof Phil Blunsom. Before his MSc he worked as a software engineer for 3 years at IDesia Biometrics developing code and UI for mobile platforms, and did his undergraduate in mathematics and computer science at the Open University in Israel.

**Fields Yarin has published work in include:**
*Bayesian deep learning • deep learning • adversarial machine learning • approximate Bayesian inference • Gaussian processes • Bayesian modelling • Bayesian non-parametrics • scalable MCMC • generative modelling.*
**With applications including:**
*AI safety • ML interpretability • reinforcement learning • active learning • natural language processing • computer vision • medical analysis.*
**The arching theme** leading his research is *understanding empirically developed machine learning techniques*. A full list of publications is available here.

Publications

**Benchmarking Bayesian Deep Learning with Diabetic Retinopathy Diagnosis**

We propose a new Bayesian deep learning (BDL) benchmark, inspired by a realworld medical imaging application on diabetic retinopathy diagnosis. In contrast to popular toy regression experiments on the UCI datasets, our benchmark can be used to assess both the scalability and the effectiveness of different techniques for uncertainty estimation, going beyond RMSE and NLL. A binary classification task on visual inputs (512 × 512 RGB images of retinas) is considered, where model uncertainty is used for medical pre-screening—i.e. to refer patients to an expert when model diagnosis is uncertain. We provide a comprehensive comparison of well-tuned BDL techniques on the benchmark, including Monte Carlo dropout, mean-field variational inference, an ensemble of deep models, an ensemble of dropout models, as well as a deterministic (deep) model. Baselines are ranked according to metrics derived from expert-domain to reflect real-world use of model uncertainty in automated diagnosis. We show that some current techniques which solve benchmarks such as UCI ‘overfit’ their uncertainty to UCI—when evaluated on our benchmark these underperform in comparison to simpler baselines—while other techniques that solve UCI do not scale or fail on the new benchmark. The code for the benchmark, its baselines, and a simple API for evaluating new models are made available at https://github.com/oatml/bdl-benchmarks.

Angelos Filos, Sebastian Farquhar, Aidan Gomez, Tim G. J. Rudner, Zac Kenton, Lewis Smith, Milad Alizadeh, Arnoud de Kroon, Yarin Gal

*Preprint, 2019*

[Preprint] [BibTex] [Code]

**BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning**

We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning. BatchBALD is a greedy linear-time 1−1/e-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the current approach acquires similar and redundant points, sometimes performing worse than randomly acquiring data. We finish by showing that, using BatchBALD to consider dependencies within an acquisition batch, we achieve new state of the art performance on standard benchmarks, providing substantial data efficiency improvements in batch acquisition.

Andreas Kirsch, Joost van Amersfoort, Yarin Gal

*Under review*

[arXiv]

**Galaxy Zoo: Probabilistic Morphology through Bayesian CNNs and Active Learning**

We use Bayesian CNNs and a novel generative model of Galaxy Zoo volunteer responses to infer posteriors for the visual morphology of galaxies. Bayesian CNN can learn from galaxy images with uncertain labels and then, for previously unlabelled galaxies, predict the probability of each possible label. Using our posteriors, we apply the active learning strategy BALD to request volunteer responses for the subset of galaxies which, if labelled, would be most informative for training our network. By combining human and machine intelligence, Galaxy Zoo will be able to classify surveys of any conceivable scale on a timescale of weeks, providing massive and detailed morphology catalogues to support research into galaxy evolution.

Mike Walmsley, Lewis Smith, Chris Lintott, Yarin Gal, Steven Bamford, Hugh Dickinson, Lucy Fortson, Sandor Kruk, Karen Masters, Claudia Scarlata, Brooke Simmons, Rebecca Smethurst, Darryl Wright

*Under review*

[arXiv]

**An Ensemble of Bayesian Neural Networks for Exoplanetary Atmospheric Retrieval**

Recent work demonstrated the potential of using machine learning algorithms for atmospheric retrieval by implementing a random forest to perform retrievals in seconds that are consistent with the traditional, computationally-expensive nested-sampling retrieval method. We expand upon their approach by presenting a new machine learning model, exttt{plan-net}, based on an ensemble of Bayesian neural networks that yields more accurate inferences than the random forest for the same data set of synthetic transmission spectra.

Adam D. Cobb, Michael D. Himes, Frank Soboczenski, Simone Zorzan, Molly D. O'Beirne, Atılım Güneş Baydin, Yarin Gal, Shawn D. Domagal-Goldman, Giada N. Arney, Daniel Angerhausen

**The Astronomical Journal**[arXiv] [Code]

**Towards Inverse Reinforcement Learning for Limit Order Book Dynamics**

We investigate whether Inverse Reinforcement Learning (IRL) can infer rewards from agents within real financial stochastic environments: limit order books (LOB). Our results illustrate that complex behaviours, induced by non-linear reward functions amid agent-based stochastic scenarios, can be deduced through inference, encouraging the use of inverse reinforcement learning for opponent-modelling in multi-agent systems.

Jacobo Roa-Vicens, Cyrine Chtourou, Angelos Filos, Francisco Rullan, Yarin Gal, Ricardo Silva

**Oral Presentation, Multi-Agent Learning Workshop at the 36th International Conference on Machine Learning, 2019**[arXiv] [BibTex]

**Generalizing from a few environments in safety-critical reinforcement learning**

Before deploying autonomous agents in the real world, we need to be confident they will perform safely in novel situations. Ideally, we would expose agents to a very wide range of situations during training (e.g. many simulated environments), allowing them to learn about every possible danger. But this is often impractical: simulations may fail to capture the full range of situations and may differ subtly from reality. This paper investigates generalizing from a limited number of training environments in deep reinforcement learning. Our experiments test whether agents can perform safely in novel environments, given varying numbers of environments at train time. Using a gridworld setting, we find that standard deep RL agents do not reliably avoid catastrophes on unseen environments – even after performing near optimally on 1000 training environments. However, we show that catastrophes can be significantly reduced (but not eliminated) with simple modifications, including Q-network ensembling to represent uncertainty and the use of a classifier trained to recognize dangerous actions.

Zac Kenton, Angelos Filos, Owain Evans, Yarin Gal

**ICLR 2019 Workshop on Safe Machine Learning**[paper]

**Bayesian Deep Learning for Exoplanet Atmospheric Retrieval**

An ML-based retrieval framework called Intelligent exoplaNet Atmospheric RetrievAl (INARA) that consists of a Bayesian deep learning model for retrieval and a data set of 3,000,000 synthetic rocky exoplanetary spectra generated using the NASA Planetary Spectrum Generator.

Frank Soboczenski, Michael D. Himes, Molly D. O'Beirne, Simone Zorzan, Atilim Gunes Baydin, Adam D. Cobb, Yarin Gal, Daniel Angerhausen, Massimo Mascaro, Giada N. Arney, Shawn D. Domagal-Goldman

*Workshop on Bayesian Deep Learning, NeurIPS 2018*[arXiv]

**On the Connection between Neural Processes and Gaussian Processes with Deep Kernels**

Neural Processes (NPs) are a class of neural latent variable models that combine desirable properties of Gaussian Processes (GPs) and neural networks. Like GPs, NPs define distributions over functions and are able to estimate the uncertainty in their predictions. Like neural networks, NPs are computationally efficient during training and prediction time. We establish a simple and explicit connection between NPs and GPs. In particular, we show that, under certain conditions, NPs are mathematically equivalent to GPs with deep kernels. This result further elucidates the relationship between GPs and NPs and makes previously derived theoretical insights about GPs applicable to NPs. Furthermore, it suggests a novel approach to learning expressive GP covariance functions applicable across different prediction tasks by training a deep kernel GP on a set of datasets

Tim G. J. Rudner, Vincent Fortuin, Yee Whye Teh, Yarin Gal

**Workshop on Bayesian Deep Learning, NeurIPS 2018**[Paper] [BibTex]

**On the Importance of Strong Baselines in Bayesian Deep Learning**

Like all sub-fields of machine learning, Bayesian Deep Learning is driven by empirical validation of its theoretical proposals. Given the many aspects of an experiment, it is always possible that minor or even major experimental flaws can slip by both authors and reviewers. One of the most popular experiments used to evaluate approximate inference techniques is the regression experiment on UCI datasets. However, in this experiment, models which have been trained to convergence have often been compared with baselines trained only for a fixed number of iterations. What we find is that if we take a well-established baseline and evaluate it under the same experimental settings, it shows significant improvements in performance. In fact, it outperforms or performs competitively with numerous to several methods that when they were introduced claimed to be superior to the very same baseline method. Hence, by exposing this flaw in experimental procedure, we highlight the importance of using identical experimental setups to evaluate, compare and benchmark methods in Bayesian Deep Learning.

Jishnu Mukhoti, Pontus Stenetorp, Yarin Gal

**Workshop on Bayesian Deep Learning, NeurIPS 2018**[Paper] [arXiv] [BibTex]

**Evaluating Bayesian Deep Learning Methods for Semantic Segmentation**

Deep learning has been revolutionary for computer vision and semantic segmentation in particular, with Bayesian Deep Learning (BDL) used to obtain uncertainty maps from deep models when predicting semantic classes. This information is critical when using semantic segmentation for autonomous driving for example. Standard semantic segmentation systems have well-established evaluation metrics. However, with BDL's rising popularity in computer vision we require new metrics to evaluate whether a BDL method produces better uncertainty estimates than another method. In this work we propose three such metrics to evaluate BDL models designed specifically for the task of semantic segmentation. We modify DeepLab-v3+, one of the state-of-the-art deep neural networks, and create its Bayesian counterpart using MC dropout and Concrete dropout as inference techniques. We then compare and test these two inference techniques on the well-known Cityscapes dataset using our suggested metrics. Our results provide new benchmarks for researchers to compare and evaluate their improved uncertainty quantification in pursuit of safer semantic segmentation.

Jishnu Mukhoti, Yarin Gal

*arXiv*

[arXiv] [BibTex]

**Evaluating Uncertainty Quantification in End-to-End Autonomous Driving Control**

Self-driving has benefited from significant performance improvements with the rise of deep learning, with millions of miles having been driven with no human intervention. Despite this, crashes and erroneous behaviours still occur, in part due to the complexity of verifying the correctness of DNNs and a lack of safety guarantees. In this paper, we demonstrate how quantitative measures of uncertainty can be extracted in real-time, and their quality evaluated in end-to-end controllers for self-driving cars. We propose evaluation techniques for the uncertainty on two separate architectures which use the uncertainty to predict crashes up to five seconds in advance. We find that mutual information, a measure of uncertainty in classification networks, is a promising indicator of forthcoming crashes.

Rhiannon Michelmore, Marta Kwiatkowska, Yarin Gal

*In submission*

[arXiv] [BibTex]

**Targeted Dropout**

Neural networks are extremely flexible models due to their large number of parameters, which is beneficial for learning, but also highly redundant. This makes it possible to compress neural networks without having a drastic effect on performance. We introduce targeted dropout, a strategy for post hoc pruning of neural network weights and units that builds the pruning mechanism directly into learning. At each weight update, targeted dropout selects a candidate set for pruning using a simple selection criterion, and then stochastically prunes the network via dropout applied to this set. The resulting network learns to be explicitly robust to pruning, comparing favourably to more complicated regularization schemes while at the same time being extremely simple to implement, and easy to tune.

Aidan Gomez, Ivan Zhang, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

**Workshop on Compact Deep Neural Networks with industrial applications, NeurIPS 2018**[Paper] [BibTex]

**A Unifying Bayesian View of Continual Learning**

Some machine learning applications require continual learning—where data comes in a sequence of datasets, each is used for training and then permanently discarded. From a Bayesian perspective, continual learning seems straightforward: Given the model posterior one would simply use this as the prior for the next task. However, exact posterior evaluation is intractable with many models, especially with Bayesian neural networks (BNNs). Instead, posterior approximations are often sought. Unfortunately, when posterior approximations are used, prior-focused approaches do not succeed in evaluations designed to capture properties of realistic continual learning use cases. As an alternative to prior-focused methods, we introduce a new approximate Bayesian derivation of the continual learning loss. Our loss does not rely on the posterior from earlier tasks, and instead adapts the model itself by changing the likelihood term. We call these approaches likelihood-focused. We then combine prior- and likelihood-focused methods into one objective, tying the two views together under a single unifying framework of approximate Bayesian continual learning.

Sebastian Farquhar, Yarin Gal

**NeurIPS 2018 workshop on Bayesian Deep Learning**[Paper] [BibTex]

**Using Bayesian Optimization to Find Asteroids' Pole Directions**

Near-Earth asteroids (NEAs) are being discovered much faster than their shapes and other physical properties can be characterized in detail. One of the best ways to spatially resolve NEAs from the ground is with planetary radar observations. Radar echoes can be decoded in round-trip travel time and frequency to produce two-dimensional delay-Doppler images of the asteroid. Given a series of such images acquired over the course of the asteroid's rotation, one can search for the shape and other physical properties that best match the observations. However, reconstructing asteroid shapes from radar data is, like many inverse problems, a computationally intensive task. Shape modeling also requires extensive human oversight to ensure that the fitting process is finding physically reasonable results. In this paper we use Bayesian optimisation for this difficult task.

Marshall, Sean, Cobb, Adam, Raïssi, Chedy, Yarin Gal, Rozek, Agata, Busch, Michael W., Young, Grace, McGlasson, Riley

**American Astronomical Society (AAS), 2018**[Citation] [BibTex]

**An Empirical study of Binary Neural Networks' Optimisation**

Binary neural networks using the Straight-Through-Estimator (STE) have been shown to achieve state-of-the-art results, but their training process is not well-founded. This is due to the discrepancy between the evaluated function in the forward path, and the weight updates in the back-propagation, updates which do not correspond to gradients of the forward path. Efficient convergence and accuracy of binary models often rely on careful fine-tuning and various ad-hoc techniques. In this work, we empirically identify and study the effectiveness of the various ad-hoc techniques commonly used in the literature, providing best-practices for efficient training of binary models. We show that adapting learning rates using second moment methods is crucial for the successful use of the STE, and that other optimisers can easily get stuck in local minima. We also find that many of the commonly employed tricks are only effective towards the end of the training, with these methods making early stages of the training considerably slower. Our analysis disambiguates necessary from unnecessary ad-hoc techniques for training of binary neural networks, paving the way for future development of solid theoretical foundations for these. Our newly-found insights further lead to new procedures which make training of existing binary neural networks notably faster.

Milad Alizadeh, Javier Fernández-Marqués, Nicholas D. Lane, Yarin Gal

**International Conference on Learning Representations (ICLR), 2019**[Paper] [Code]

**BRUNO: A Deep Recurrent Model for Exchangeable Data**

We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations. Our model is provably exchangeable, meaning that the joint distribution over observations is invariant under permutation: this property lies at the heart of Bayesian inference. The model does not require variational approximations to train, and new samples can be generated conditional on previous samples, with cost linear in the size of the conditioning set. The advantages of our architecture are demonstrated on learning tasks that require generalisation from short observed sequences while modelling sequence variability, such as conditional image generation, few-shot learning, and anomaly detection.

Iryna Korshunova, Jonas Degrave, Ferenc Huszár, Yarin Gal, Arthur Gretton, Joni Dambre

*arXiv, 2018*

[arXiv] [BibTex]

**NIPS, 2018**[Paper] [BibTex]

**Sufficient Conditions for Idealised Models to Have No Adversarial Examples: a Theoretical and Empirical Study with Bayesian Neural Networks**

We prove, under two sufficient conditions, that idealised models can have no adversarial examples. We discuss which idealised models satisfy our conditions, and show that idealised Bayesian neural networks (BNNs) satisfy these. We continue by studying near-idealised BNNs using HMC inference, demonstrating the theoretical ideas in practice. We experiment with HMC on synthetic data derived from MNIST for which we know the ground-truth image density, showing that near-perfect epistemic uncertainty correlates to density under image manifold, and that adversarial images lie off the manifold in our setting. This suggests why MC dropout, which can be seen as performing approximate inference, has been observed to be an effective defence against adversarial examples in practice; We highlight failure-cases of non-idealised BNNs relying on dropout, suggesting a new attack for dropout models and a new defence as well. Lastly, we demonstrate the defence on a cats-vs-dogs image classification task with a VGG13 variant.

Lewis Smith, Yarin Gal

*arXiv, 2018*

[arXiv] [BibTex]

**Automating Asteroid Shape Modeling From Radar Images**

Characterizing the shapes and spin states of near-Earth asteroids is essential both for trajectory predictions to rule out potential future Earth impacts and for planning spacecraft missions. But reconstructing objects’ shapes and spins from delay-Doppler data is a computationally intensive inversion problem. We implement a Bayesian optimization routine that uses SHAPE to autonomously search the space of spin-state parameters, yielding spin state constraints within a factor of 3 less computer runtime and minimal human supervision. These routines are now being incorporated into radar data processing pipelines at Arecibo.

Michael W. Busch, Agata Rozek, Sean Marshall, Grace Young, Adam Cobb, Chedy Raissi, Yarin Gal, Lance Benner, Shantanu Naidu, Marina Brozovic, Patrick Taylor

**COSPAR (Committee on Space Research) Assembly, 2018**[Program] [Blog Post (Adam Cobb)] [BibTex]

**Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam**

Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented within the Adam optimizer by perturbing the network weights during gradient evaluations, and uncertainty estimates can be cheaply obtained by using the vector that adapts the learning rate. This requires lower memory, computation, and implementation effort than existing VI methods, while obtaining uncertainty estimates of comparable quality. Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization.

Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava

**ICML, 2018**[Paper] [arXiv] [BibTex]

**Differentially private continual learning**

Catastrophic forgetting can be a significant problem for institutions that must delete historic data for privacy reasons. For example, hospitals might not be able to retain patient data permanently. But neural networks trained on recent data alone will tend to forget lessons learned on old data. We present a differentially private continual learning framework based on variational inference. We estimate the likelihood of past data given the current model using differentially private generative models of old datasets. The differentially private training has no detrimental impact on our architecture's continual learning performance, and still outperforms the current state-of-the-art non-private continual learning.

Sebastian Farquhar, Yarin Gal

**Privacy in Machine Learning and Artificial Intelligence workshop, ICML, 2018**[Paper] [BibTex]

**Loss-Calibrated Approximate Inference in Bayesian Neural Networks**

Current approaches in approximate inference for Bayesian neural networks minimise the Kullback-Leibler divergence to approximate the true posterior over the weights. However, this approximation is without knowledge of the final application, and therefore cannot guarantee optimal predictions for a given task. To make more suitable task-specific approximations, we introduce a new loss-calibrated evidence lower bound for Bayesian neural networks in the context of supervised learning, informed by Bayesian decision theory. By introducing a lower bound that depends on a utility function, we ensure that our approximation achieves higher utility than traditional methods for applications that have asymmetric utility functions. Furthermore, in using dropout inference, we highlight that our new objective is identical to that of standard dropout neural networks, with an additional utility-dependent penalty term. We demonstrate our new loss-calibrated model with an illustrative medical example and a restricted model capacity experiment, and highlight failure modes of the comparable weighted cross entropy approach. Lastly, we demonstrate the scalability of our method to real world applications with per-pixel semantic segmentation on an autonomous driving data set.

Adam D. Cobb, Stephen J. Roberts, Yarin Gal

**Theory of deep learning workshop, ICML, 2018**[arXiv] [Code] [BibTex]

**Using Pre-trained Full-Precision Models to Speed Up Training Binary Networks For Mobile Devices**

Binary Neural Networks (BNNs) are well-suited for deploying Deep Neural Networks (DNNs) to small embedded devices but state-of-the-art BNNs need to be trained from scratch. We show how weights from a trained full-precision model can be used to speed-up training binary networks. We show that for CIFAR-10, accuracies within 1% of the full-precision model can be achieved in just 5 epochs.

Milad Alizadeh, Nicholas D. Lane, Yarin Gal

**16th ACM International Conference on Mobile Systems (MobiSys), 2018**[Abstract] [BibTex]

**Towards Robust Evaluations of Continual Learning**

Continual learning experiments used in current deep learning papers do not faithfully assess fundamental challenges of learning continually, masking weak-points of the suggested approaches instead. We study gaps in such existing evaluations, proposing essential experimental evaluations that are more representative of continual learning's challenges, and suggest a re-prioritization of research efforts in the field. We show that current approaches fail with our new evaluations and, to analyse these failures, we propose a variational loss which unifies many existing solutions to continual learning under a Bayesian framing, as either 'prior-focused' or 'likelihood-focused'. We show that while prior-focused approaches such as EWC and VCL perform well on existing evaluations, they perform dramatically worse when compared to likelihood-focused approaches on other simple tasks.

Sebastian Farquhar, Yarin Gal

**Lifelong Learning: A Reinforcement Learning Approach workshop, ICML, 2018**[arXiv] [BibTex]

**Understanding Measures of Uncertainty for Adversarial Example Detection**

Measuring uncertainty is a promising technique for detecting adversarial examples, crafted inputs on which the model predicts an incorrect class with high confidence. But many measures of uncertainty exist, including predictive entropy and mutual information, each capturing different types of uncertainty. We study these measures, and shed light on why mutual information seems to be effective at the task of adversarial example detection. We highlight failure modes for MC dropout, a widely used approach for estimating uncertainty in deep models. This leads to an improved understanding of the drawbacks of current methods, and a proposal to improve the quality of uncertainty estimates using probabilistic model ensembles. We give illustrative experiments using MNIST to demonstrate the intuition underlying the different measures of uncertainty, as well as experiments on a real world Kaggle dogs vs cats classification dataset.

Lewis Smith, Yarin Gal

**UAI, 2018**[Paper] [arXiv] [BibTex]

**Vprop: Variational Inference using RMSprop**

Many computationally-efficient methods for Bayesian deep learning rely on continuous optimization algorithms, but the implementation of these methods requires significant changes to existing code-bases. In this paper, we propose Vprop, a method for variational inference that can be implemented with two minor changes to the off-the-shelf RMSprop optimizer. Vprop also reduces the memory requirements of Black-Box Variational Inference by half. We derive Vprop using the conjugate-computation variational inference method, and establish its connections to Newton’s method, natural-gradient methods, and extended Kalman filters. Overall, this paper presents Vprop as a principled, computationally-efficient, and easy-to-implement method for Bayesian deep learning.

Mohammad Emtiyaz Khan, Zuozhu Liu, Voot Tangkaratt, Yarin Gal

**Bayesian Deep Learning workshop, NIPS, 2017**[Paper] [arXiv] [BibTex]

Reproducibility and Code

**Code for BatchBALD blog post and paper (active learning)**

This is the code for the blog post Human in the Loop: Deep Learning without Wasteful Labelling and the paper BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning.

Code, PublicationAndreas Kirsch, Joost van Amersfoort, Yarin Gal

**Code for Bayesian Deep Learning Benchmarks**

In order to make real-world difference with **Bayesian Deep Learning** (BDL) tools, the tools must scale to real-world settings. And for that we, the research community, must be able to evaluate our inference tools (and iterate quickly) with real-world benchmark tasks. We should be able to do this without necessarily worrying about application-specific domain knowledge, like the expertise often required in medical applications for example. We require benchmarks to test for inference robustness, performance, and accuracy, in addition to cost and effort of development. These benchmarks should be at a variety of scales, ranging from toy MNIST-scale benchmarks for fast development cycles, to large data benchmarks which are truthful to real-world applications, capturing their constraints.

CodeAngelos Filos, Sebastian Farquhar, Aidan Gomez, Tim G. J. Rudner, Zac Kenton, Lewis Smith, Milad Alizadeh, Yarin Gal

**Code for "An Ensemble of Bayesian Neural Networks for Exoplanetary Atmospheric Retrieval"**

Recent work demonstrated the potential of using machine learning algorithms for atmospheric retrieval by implementing a random forest to perform retrievals in seconds that are consistent with the traditional, computationally-expensive nested-sampling retrieval method. We expand upon their approach by presenting a new machine learning model, exttt{plan-net}, based on an ensemble of Bayesian neural networks that yields more accurate inferences than the random forest for the same data set of synthetic transmission spectra.

Code, PublicationAdam D. Cobb, Michael D. Himes, Frank Soboczenski, Simone Zorzan, Molly D. O'Beirne, Atılım Güneş Baydin, Yarin Gal, Shawn D. Domagal-Goldman, Giada N. Arney, Daniel Angerhausen

Blog Posts

**Poor generalization can be dangerous in RL!**

We want to develop reinforcement learning (RL) agents that can be trusted to act in high-stakes situations in the real world. That means we need to generalize about common dangers that we might have experienced before, but in an unseen setting. For example, we know it is dangerous to touch a hot oven, even if it’s in a room we haven’t been in before. …

Full post...Zac Kenton, Angelos Filos, Yarin Gal, 02 Jul 2019

**Human in the Loop: Deep Learning without Wasteful Labelling**

In *Active Learning* we use a “human in the loop” approach to data labelling,
reducing the amount of data that needs to be labelled drastically, and
making machine learning applicable when labelling costs would be too high otherwise.
In our paper
[1]
we present BatchBALD: a new *practical* method for choosing batches of informative points in Deep Active Learning which
avoids labelling redundancies that plague existing methods. Our approach is based on information theory and expands
on useful intuitions. We have also made our implementation available on GitHub at
https://github.com/BlackHC/BatchBALD.
…

Andreas Kirsch, Joost van Amersfoort, Yarin Gal, 24 Jun 2019

**Bayesian Deep Learning Benchmarks**

In order to make real-world difference with **Bayesian Deep Learning** (BDL) tools, the tools must scale to real-world settings. And for that we, the research community, must be able to evaluate our inference tools (and iterate quickly) with real-world benchmark tasks. We should be able to do this without necessarily worrying about application-specific domain knowledge, like the expertise often required in medical applications for example. We require benchmarks to test for inference robustness, performance, and accuracy, in addition to cost and effort of development. These benchmarks should be at a variety of scales, ranging from toy MNIST-scale benchmarks for fast development cycles, to large data benchmarks which are truthful to real-world applications, capturing their constraints. …

Angelos Filos, Sebastian Farquhar, Aidan Gomez, Tim G. J. Rudner, Zac Kenton, Lewis Smith, Milad Alizadeh, Yarin Gal, 14 Jun 2019