Nov 29, 2023
Event Description

Abstract: A phase shift in public interest, private investment, and research priorities in AI occurred in response to the release of ChatGPT in November 2022. In this talk, I will provide an overview of the “classical” approach to the key method behind ChatGPT’s capabilities, reinforcement learning from human feedback (RLHF), including the formal problem statement and typical implementation. I will then describe our recently proposed alternative algorithm, Direct Preference Optimization (DPO), which reduces the computational cost, implementation complexity, and potential instabilities of the classic RLHF pipeline (catalyzing strong open-source chat models such as HuggingFace Zephyr and AI2’s Tülu). Finally, I will discuss applications of DPO to current challenges in NLP, such as improving factuality and reducing “hallucinations” of large language models.
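For readers unfamiliar with DPO, the algorithm replaces the reward-model-plus-RL stages of classic RLHF with a single supervised objective on preference pairs. The sketch below is an illustrative, minimal implementation of the published DPO loss for one preference pair, assuming the four sequence log-probabilities (policy and frozen reference model, on the chosen and rejected responses) have already been computed; the function name and the `beta=0.1` default are this example's choices, not specifics from the talk.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from summed token log-probabilities.

    loss = -log sigmoid(beta * ((log pi(y_w|x) - log pi_ref(y_w|x))
                              - (log pi(y_l|x) - log pi_ref(y_l|x))))
    beta controls how far the policy may drift from the reference model;
    0.1 is an assumed illustrative default, not a value from the talk.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(x) == log(1 + exp(-x)); log1p keeps it numerically stable
    return math.log1p(math.exp(-logits))
```

When the policy favors the chosen response more strongly than the reference does, `logits` is positive and the loss drops below log 2; gradient descent on this quantity therefore pushes preference pairs apart without training a separate reward model or running RL.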

Bio: Eric Mitchell is a final-year PhD student at Stanford University, advised by Chelsea Finn and Christopher D. Manning. His research develops methods to improve the reliability of large language models, in particular by enabling them to better align with human values, respond to changes in the state of the world, and reason soundly.

Special Event