OATML conference papers at NeurIPS 2025
OATML group members and collaborators are proud to present 8 papers at NeurIPS 2025.
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Lukas Aichberger, Alasdair Paren, Guohao Li, Philip Torr, Yarin Gal, Adel Bibi
Temporal-Difference Variational Continual Learning
Luckeciano Carvalho Melo, Alessandro Abate, Yarin Gal
Scaling Up Active Testing to Large Language Models
Gabrielle Berrada, Jannik Kossen, Muhammed Razzak, Freddie Bickford Smith, Yarin Gal, Tom Rainforth
Spotlight! Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
Gunshi Gupta, Karmesh Yadav, Zsolt Kira, Yarin Gal, Rahaf Aljundi
Fundamental Limitations in Defending LLM Finetuning APIs
Xander Davies, Eric Winsor, Tomek Korbak, Alexandra Souly, Robert Kirk, Christian Schroeder de Witt, Yarin Gal
SECCODEPLT: A Unified Benchmark for Evaluating the Security Risks and Capabilities of Code GenAI
Yuzhou Nie, Zhun Wang, Yu Yang, Ruizhe Jiang, Yuheng Tang, Xander Davies, Yarin Gal, Bo Li, Wenbo Guo, Dawn Song
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
Andy Zou, Maxwell Lin, Eliot Krzysztof Jones, Micha V. Nowak, Mateusz Dziemian, Nick Winter, Valent Nathanael, Ayla Croft, Xander Davies, Jai Patel, Robert Kirk, Yarin Gal, Dan Hendrycks, J Zico Kolter, Matt Fredrikson
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou, Franziska Sofia Hafner, Harry Mayne, Jan Batzner, Negar Foroutan, Chris Schmitz, Karolina Korgul, Hunar Batra, Oishi Deb, Emma Beharry, Cornelius Emde, Thomas Foster, Anna Gausen, María Grandury, Simeng Han, Valentin Hofmann, Lujain Ibrahim, Hazel Kim, Hannah Rose Kirk, Fangru Lin, Gabrielle Kaili-May Liu, Lennart Luettgau, Jabez Magomere, Jonathan Rystrøm, Anna Sotnikova, Yushi Yang, Yilun Zhao, Adel Bibi, Antoine Bosselut, Ronald Clark, Arman Cohan, Jakob Nicolaus Foerster, Yarin Gal, Scott A. Hale, Inioluwa Deborah Raji, Christopher Summerfield, Philip Torr, Cozmin Ududec, Luc Rocher, Adam Mahdi