Sergio Calvo Ordoñez


Associate Member (PhD), started 2025

Sergio is a DPhil student in the Random Systems CDT at the University of Oxford, co-supervised by Álvaro Cartea, Yarin Gal, and José Miguel Hernández-Lobato (University of Cambridge). His research interests lie broadly in Probabilistic Machine Learning. In particular, he studies the connections between Gaussian Processes and Deep Learning and explores advances in Generative Modelling, including LLMs, diffusion models, and flow matching. His work has two main aims: to develop theoretically grounded, efficient techniques for quantifying model uncertainty, and to design methods that accelerate training, inference, or both in state-of-the-art generative models.

Before arriving in Oxford, Sergio completed an MPhil in Machine Learning and Machine Intelligence at the University of Cambridge and a BSc in Theoretical Physics at Queen Mary University of London (QMUL). He has industry experience in both quantitative finance and big tech. As a Research Scientist Intern at Spotify, he worked at the intersection of Gaussian Processes and Deep Learning theory; later, as a Quant Research Intern at a hedge fund, he developed an LLM-based systematic trading strategy. He is funded by Man Group through the Oxford-Man Institute Scholarship.


Publications while at OATML:

Richer Bayesian Last Layers with Subsampled NTK Features

Bayesian Last Layers (BLLs) provide a convenient and computationally efficient way to estimate uncertainty in neural networks. However, they underestimate epistemic uncertainty because they apply a Bayesian treatment only to the final layer, ignoring uncertainty induced by earlier layers. We propose a method that improves BLLs by leveraging a projection of Neural Tangent Kernel (NTK) features onto the space spanned by the last-layer features. This enables posterior inference that accounts for the variability of the full network while retaining the low inference cost of a standard BLL. We show that our method yields posterior variances that are provably greater than or equal to those of a standard BLL, correcting its tendency to underestimate epistemic uncertainty. To further reduce computational cost, we introduce a uniform subsampling scheme for estimating the projection matrix and for posterior inference. We derive approximation bounds for both types of subsampling. Empir...

Sergio Calvo Ordoñez, Jonathan Plenk, Richard Bergna, Álvaro Cartea, Yarin Gal, José Miguel Hernández-Lobato, Kamil Ciosek
arXiv
[paper]
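For readers new to Bayesian last layers, here is a minimal NumPy sketch of the standard BLL baseline the abstract starts from. It is not the paper's NTK-based method. Placing a Gaussian prior on the final linear layer over frozen features gives a closed-form posterior and predictive variance. The feature maps (`features`, `augmented`), the prior variance, and the noise variance are illustrative assumptions. The sketch also checks a generic fact related to the abstract's variance claim: appending extra features with an independent prior can only increase the predictive variance. The extra cosine features are hypothetical random features, not projected NTK features.

```python
import numpy as np


def bll_posterior(phi, y, prior_var=1.0, noise_var=0.1):
    """Gaussian posterior over last-layer weights w.

    Assumed model: y = phi @ w + eps, w ~ N(0, prior_var * I), eps ~ N(0, noise_var * I).
    """
    d = phi.shape[1]
    precision = phi.T @ phi / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ phi.T @ y / noise_var
    return mean, cov


def bll_predict(phi_test, mean, cov):
    """Predictive mean and epistemic variance of f(x) = phi(x) @ w."""
    mu = phi_test @ mean
    var = np.einsum("ij,jk,ik->i", phi_test, cov, phi_test)  # diag(phi_test @ cov @ phi_test.T)
    return mu, var


rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
X_test = rng.normal(size=(10, 3))
y = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=50)

# Hypothetical frozen "body" producing 16 last-layer features.
W_body = rng.normal(size=(3, 16))


def features(X):
    return np.tanh(X @ W_body)


mean, cov = bll_posterior(features(X_train), y)
mu_bll, var_bll = bll_predict(features(X_test), mean, cov)

# Hypothetical extra features with an independent prior (NOT the paper's projected NTK features).
W_extra = rng.normal(size=(3, 8))


def augmented(X):
    return np.hstack([features(X), np.cos(X @ W_extra)])


mean_aug, cov_aug = bll_posterior(augmented(X_train), y)
_, var_aug = bll_predict(augmented(X_test), mean_aug, cov_aug)
print(np.all(var_aug >= var_bll - 1e-9))  # extra independent features never shrink the variance
```

The matrix being inverted is only d × d, where d is the last-layer width. This is where the low cost of a BLL comes from.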

Activation-Space Uncertainty Quantification for Pretrained Networks

Reliable uncertainty estimates are crucial for deploying pretrained models; yet, many strong methods for quantifying uncertainty require retraining, Monte Carlo sampling, or expensive second-order computations and may alter a frozen backbone's predictions. To address this, we introduce Gaussian Process Activations (GAPA), a post-hoc method that shifts Bayesian modeling from weights to activations. GAPA replaces standard nonlinearities with Gaussian-process activations whose posterior mean exactly matches the original activation, preserving the backbone's point predictions by construction while providing closed-form epistemic variances in activation space. To scale to modern architectures, we use a sparse variational inducing-point approximation over cached training activations, combined with local k-nearest-neighbor subset conditioning, enabling deterministic single-pass uncertainty propagation without sampling, backpropagation, or second-order information. Across regression, class...

Richard Bergna, Stefan Depeweg, Sergio Calvo Ordoñez, Jonathan Plenk, Álvaro Cartea, José Miguel Hernández-Lobato
arXiv
[paper]
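The GAPA abstract describes a Gaussian-process activation whose posterior mean equals the original nonlinearity. One simple way to get this property is to use the activation itself as the GP prior mean and condition on its own values at cached pre-activations. The residuals are then zero, so the mean correction vanishes and only the variance is informative. The sketch below assumes a scalar RBF kernel, exact conditioning on the k nearest cached points, and `tanh` as the activation. It omits the sparse variational inducing-point approximation. The helper names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_activation(x, cached, act=np.tanh, k=8, lengthscale=1.0, variance=1.0, jitter=1e-6):
    """Posterior mean and variance of a scalar GP activation (hypothetical sketch).

    Prior: g ~ GP(act, rbf). Observations: g(z) = act(z) at the k cached
    pre-activations z nearest to each input (local subset conditioning).
    """
    mean = np.empty(len(x))
    var = np.empty(len(x))
    for i, xi in enumerate(x):
        z = cached[np.argsort(np.abs(cached - xi))[:k]]        # k nearest cached points
        K = rbf(z, z, lengthscale, variance) + jitter * np.eye(len(z))
        kx = rbf(np.array([xi]), z, lengthscale, variance)[0]
        residual = act(z) - act(z)                              # observations minus prior mean: all zero
        mean[i] = act(xi) + kx @ np.linalg.solve(K, residual)   # so the mean stays act(xi)
        var[i] = variance - kx @ np.linalg.solve(K, kx)         # closed-form epistemic variance
    return mean, var


rng = np.random.default_rng(0)
cached = rng.normal(size=500)      # hypothetical cached training pre-activations
x = np.array([0.0, 0.5, 6.0])      # the last input lies far from the cached data
mu, var = gp_activation(x, cached)
print(mu, var)
```

Inputs near the cached activations get a variance close to zero. The input at 6.0 lies outside the cached range and keeps a much larger share of the prior variance. The point prediction `mu` is unchanged either way.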

Weighted Conditional Flow Matching

Conditional flow matching (CFM) has emerged as a powerful framework for training continuous normalizing flows due to its computational efficiency and effectiveness. However, standard CFM often produces paths that deviate significantly from straight-line interpolations between prior and target distributions, making generation slower and less accurate due to the need for fine discretization at inference. Recent methods enhance CFM performance by inducing shorter and straighter trajectories but typically rely on computationally expensive mini-batch optimal transport (OT). Drawing insights from entropic optimal transport (EOT), we propose Weighted Conditional Flow Matching (W-CFM), a novel approach that modifies the classical CFM loss by weighting each training pair (x,y) with a Gibbs kernel. We show that this weighting recovers the entropic OT coupling up to some bias in the marginals, and we provide the conditions under which the marginals remain nearly unchanged. Moreover, we establ...

Sergio Calvo Ordoñez, Matthieu Meunier, Álvaro Cartea, Christoph Reisinger, Yarin Gal, José Miguel Hernández-Lobato
arXiv
[paper]
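The W-CFM abstract describes weighting each training pair with a Gibbs kernel. Below is a minimal NumPy sketch of that idea. Independently drawn prior samples `x0` and data samples `x1` (the pair (x, y) above) are interpolated along a straight line. Each pair's squared velocity error is then multiplied by exp(-‖x0 - x1‖²/ε). Several details are assumptions for illustration: the squared Euclidean cost, the plain weighted mean, the toy data, and the untrained zero velocity field. A real implementation would train a neural velocity model with an autodiff framework rather than only evaluate the objective.

```python
import numpy as np


def wcfm_loss(velocity, x0, x1, t, eps=1.0):
    """Gibbs-weighted conditional flow-matching loss (illustrative sketch).

    x0: prior samples, x1: data samples (independently paired), t: times in [0, 1].
    Each pair is weighted by the Gibbs kernel exp(-||x0 - x1||^2 / eps).
    """
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1   # straight-line interpolant
    target = x1 - x0                                  # conditional velocity along that line
    sq_err = np.sum((velocity(xt, t) - target) ** 2, axis=1)
    weights = np.exp(-np.sum((x0 - x1) ** 2, axis=1) / eps)
    return np.mean(weights * sq_err)


rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 2))                 # prior samples
x1 = rng.normal(loc=3.0, size=(256, 2))        # toy "data" samples
t = rng.uniform(size=256)


def zero_field(x, t):
    """Hypothetical untrained velocity model."""
    return np.zeros_like(x)


loss_weighted = wcfm_loss(zero_field, x0, x1, t, eps=5.0)
loss_plain = wcfm_loss(zero_field, x0, x1, t, eps=1e12)  # weights ~ 1: ordinary CFM
print(loss_weighted, loss_plain)
```

As ε grows, the weights tend to 1 and the objective reduces to ordinary CFM. Reweighting independent pairs by the Gibbs kernel targets a coupling proportional to exp(-c(x, y)/ε) p(x) q(y). The entropic OT coupling additionally carries Sinkhorn scaling factors in x and y, which is one way to see why the marginals can be biased.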

A Gaussian Process View on Observation Noise and Initialization in Wide Neural Networks

Performing gradient descent in a wide neural network is equivalent to computing the posterior mean of a Gaussian Process with the Neural Tangent Kernel (NTK-GP), for a specific prior mean and with zero observation noise. However, existing formulations have two limitations: (i) the NTK-GP assumes noiseless targets, leading to misspecification on noisy data; (ii) the equivalence does not extend to arbitrary prior means, which are essential for well-specified models. To address (i), we introduce a regularizer into the training objective, showing its correspondence to incorporating observation noise in the NTK-GP. To address (ii), we propose a shifted network that enables arbitrary prior means and allows obtaining the posterior mean with gradient descent on a single network, without ensembling or kernel inversion. We validate our results with experiments across datasets and architectures, showing that this approach removes key obstacles to the practical use of NTK-GP equivalen...

Sergio Calvo Ordoñez, Jonathan Plenk, Richard Bergna, Álvaro Cartea, José Miguel Hernández-Lobato, Konstantina Palla, Kamil Ciosek
arXiv
[paper]
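The correspondence between a training regularizer and observation noise has a familiar finite-dimensional analogue, sketched below under simplifying assumptions. Take a model that is linear in fixed features Φ (a stand-in for the NTK's Jacobian features) with a unit Gaussian prior on the weights. Gradient descent from zero on the squared loss plus σ²‖w‖² then converges to the ridge solution. Its predictions equal the GP posterior mean with kernel ΦΦᵀ and observation noise σ². The random features, step size, and noise level are hypothetical. The paper's actual regularizer for wide networks and its shifted network for non-zero prior means are not reproduced here.

```python
import numpy as np


def gd_regularized(Phi, y, noise_var, lr=1e-3, steps=5000):
    """Gradient descent from w = 0 on ||y - Phi w||^2 + noise_var * ||w||^2."""
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        grad = 2.0 * Phi.T @ (Phi @ w - y) + 2.0 * noise_var * w
        w -= lr * grad
    return w


def gp_posterior_mean(Phi, y, Phi_test, noise_var):
    """GP posterior mean: kernel k(x, x') = phi(x) . phi(x'), zero prior mean, noise noise_var."""
    K = Phi @ Phi.T
    alpha = np.linalg.solve(K + noise_var * np.eye(len(y)), y)
    return Phi_test @ Phi.T @ alpha


rng = np.random.default_rng(0)
Phi = rng.normal(size=(30, 5))        # hypothetical fixed features (stand-in for NTK features)
Phi_test = rng.normal(size=(4, 5))
y = Phi @ rng.normal(size=5) + 0.3 * rng.normal(size=30)   # noisy targets
noise_var = 0.09

pred_gd = Phi_test @ gd_regularized(Phi, y, noise_var)
pred_gp = gp_posterior_mean(Phi, y, Phi_test, noise_var)
print(np.max(np.abs(pred_gd - pred_gp)))
```

The two predictions agree because of the identity (ΦᵀΦ + σ²I)⁻¹Φᵀ = Φᵀ(ΦΦᵀ + σ²I)⁻¹. The regularization strength plays exactly the role of the GP noise variance.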

More publications on Google Scholar.