Back to all members...
Daniella (Zihuiwen) Ye
Postdoc, started 2024
Daniella (Zihuiwen) Ye is a DPhil alumna in Computer Science at the University of Oxford, supervised by Yarin Gal and Phil Blunsom. Her research interests lie in enhancing the reliability and robustness of Large Language Models (LLMs) for real-world applications, with a particular focus on text generation. She is also interested in exploring methods of controlled generation of language models, employing statistical and linguistic approaches.
Some of the topics she has researched in previous years include improving human preference learning for LLMs with synthetic critiques, a project she undertook during her internship at Cohere. She has also explored augmenting text-to-code generation processes through self-play, and applying diffusion models for non-autoregressive text planning. She is a recipient of the DeepMind scholarship.
Publications while at OATML • News items mentioning Daniella (Zihuiwen) Ye • Reproducibility and Code • Blog Posts
Publications while at OATML:
Likelihood hacking in probabilistic program synthesis
When language models are trained by reinforcement learning (RL) to write probabilistic programs, they can artificially inflate their marginal-likelihood reward by producing programs whose data distribution fails to normalise instead of fitting the data better. We call this failure likelihood hacking (LH). We formalise LH in a core probabilistic programming language (PPL) and give sufficient syntactic conditions for its prevention, proving that a safe language fragment safe satisfying these conditions cannot produce likelihood-hacking programs. Empirically, we show that GRPO-trained models generating PyMC code discover LH exploits within the first few training steps, driving violation rates well above the untrained-model baseline. We implement safe's conditions as 𝚂𝚊𝚏𝚎𝚂𝚝𝚊𝚗, a LH-resistant modification of Stan, and show empirically that it prevents LH under optimisation pressure. These results show that language-level safety constraints are both theoretically grounded and effective... [full abstract]
Jacek Karwowski, Younesse Kaddar, Daniella (Zihuiwen) Ye, Nikolay Malkin, Sam Staton
arxiv
[paper]
Uncertainty-Aware Step-wise Verification with Generative Reward Models
Complex multi-step reasoning tasks, such as solving mathematical problems, remain challenging for large language models (LLMs). While outcome supervision is commonly used, process supervision via process reward models (PRMs) provides intermediate rewards to verify step-wise correctness in solution traces. However, as proxies for human judgement, PRMs suffer from reliability issues, including susceptibility to reward hacking. In this work, we propose leveraging uncertainty quantification (UQ) to enhance the reliability of step-wise verification with generative reward models for mathematical reasoning tasks. We introduce CoT Entropy, a novel UQ method that outperforms existing approaches in quantifying a PRM's uncertainty in step-wise verification. Our results demonstrate that incorporating uncertainty estimates improves the robustness of judge-LM PRMs, leading to more reliable verification.
Daniella (Zihuiwen) Ye, Luckeciano Carvalho Melo, Younesse Kaddar, Phil Blunsom, Sam Staton, Yarin Gal
arXiv
[paper]