Date
Nov 4, 2024, 2:00 pm – 3:00 pm
Location
https://www.youtube.com/@PrincetonPLI

Details

Event Description

Tulu 3: Exploring Frontiers in Open Language Model Post-Training 

Reinforcement learning from human feedback (RLHF) and other post-training techniques are driving a growing share of the innovation in leading, primarily closed, language models. To date, RLHF's application to open, generalist language models has largely relied on small datasets and been restricted to so-called "vibes" evaluations such as AlpacaEval, MT-Bench, and Arena-Hard. In this talk, we cover the full life cycle of Tulu 3 to date, refreshing the entire stack of language model post-training on open resources. Tulu 3's training consists of new capability-focused synthetic datasets, scaled on-policy preference tuning, and reinforcement learning from ground-truth outputs. The Tulu 3 suite also comes with recommendations for how to evaluate modern chat models, along with tools to decontaminate popular datasets and benchmarks. The final Tulu 3 models surpass Llama 3.1 Instruct when trained from the same base models and show strong performance across many sizes (405B coming soon).
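For context, "reinforcement learning from ground-truth outputs" refers to scoring completions with a verifier against known answers rather than with a learned reward model. Below is a minimal sketch of such a binary verifiable-reward function; the prompt format and helper names are illustrative assumptions, not Tulu 3's actual code.

```python
# A minimal sketch of a verifiable reward: completions are scored by
# checking the extracted final answer against a verified ground-truth
# label, instead of using a learned reward model. All names and the
# "Answer: <value>" format are assumptions for illustration only.
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the last labeled answer out of a completion.

    Assumes prompts instruct the model to end with 'Answer: <value>'.
    """
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the final answer matches the label, else 0.0."""
    answer = extract_final_answer(completion)
    return 1.0 if answer == ground_truth.strip() else 0.0

# Usage: rewards like these would replace reward-model scores inside a
# standard policy-gradient loop (e.g., PPO) over prompts with known answers.
print(verifiable_reward("Reasoning...\nAnswer: 42", "42"))  # 1.0
print(verifiable_reward("Reasoning...\nAnswer: 41", "42"))  # 0.0
```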

https://www.natolambert.com/

Sponsor
Organized by PLI