Details

Abstract: Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a strong learner like GPT-4. We consider a student and a teacher that are random feature models, described by two-layer networks with a random, fixed bottom layer and a trained top layer. A "weak" teacher, with a small number of units (i.e. random features), is trained on the population, and a "strong" student, with a much larger number of units (i.e. random features), is trained only on labels generated by the weak teacher. We demonstrate, prove, and explain how the student can outperform the teacher, even though it is trained only on data labeled by the teacher. We also explain how such weak-to-strong generalization is enabled by early stopping. Importantly, we also show the quantitative limits of weak-to-strong generalization in this model.
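The setup described in the abstract can be illustrated with a minimal sketch (not the authors' code). A weak teacher random feature model is fit on (an approximation of) the population, a strong student random feature model is fit only on teacher-generated labels with early-stopped gradient descent, and both are compared against the true target. The input dimension, target function, feature counts, sample sizes, step size, and seed below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20            # input dimension (illustrative)
n_teacher = 50    # "weak" teacher: few random features
n_student = 1000  # "strong" student: many random features
n_train = 2000    # inputs labeled by the teacher for the student
n_test = 2000     # held-out inputs for evaluating against the true target

def random_features(X, W):
    """Two-layer net with a fixed random bottom layer: ReLU(X @ W) / sqrt(width)."""
    return np.maximum(X @ W, 0.0) / np.sqrt(W.shape[1])

# Ground-truth target (illustrative choice, not from the paper).
beta = rng.normal(size=d)
f_star = lambda X: np.tanh(X @ beta)

# Weak teacher: top layer fit on a large sample standing in for the population.
X_pop = rng.normal(size=(20_000, d))
W_t = rng.normal(size=(d, n_teacher))        # fixed random bottom layer
a_t, *_ = np.linalg.lstsq(random_features(X_pop, W_t), f_star(X_pop), rcond=None)
teacher = lambda X: random_features(X, W_t) @ a_t

# Strong student: sees only teacher-generated labels, fit by early-stopped GD.
X_tr = rng.normal(size=(n_train, d))
y_tr = teacher(X_tr)                         # labels come from the weak teacher
W_s = rng.normal(size=(d, n_student))        # fixed random bottom layer
Phi_tr = random_features(X_tr, W_s)
a_s = np.zeros(n_student)

X_te = rng.normal(size=(n_test, d))
Phi_te = random_features(X_te, W_s)
y_true = f_star(X_te)
teacher_err = np.mean((teacher(X_te) - y_true) ** 2)

lr = 0.1                                     # small step size (illustrative)
for step in range(1, 2001):
    grad = Phi_tr.T @ (Phi_tr @ a_s - y_tr) / n_train   # gradient of MSE on teacher labels
    a_s -= lr * grad
    if step % 200 == 0:
        student_err = np.mean((Phi_te @ a_s - y_true) ** 2)
        print(f"step {step:4d}: student vs. truth {student_err:.4f} | "
              f"teacher vs. truth {teacher_err:.4f}")
```

Tracking the student's error against the true target over gradient-descent steps is where early stopping enters: the regime of interest is before the student has fully fit the teacher's own mistakes.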
Short Bio: Zhiyuan Li is a tenure-track assistant professor at the Toyota Technological Institute at Chicago (TTIC). He received his PhD from Princeton University in 2022, advised by Prof. Sanjeev Arora. He then spent a year as a postdoc at Stanford University working with Prof. Tengyu Ma. His main research interests are in the theoretical foundations of machine learning, especially deep learning theory and large language models. He is a recipient of the Microsoft Research PhD Fellowship and has served as an Area Chair for major machine learning conferences such as NeurIPS and ICML.
Additional information: https://arxiv.org/abs/2503.02877