Details
Abstract: In this talk, I will discuss the important role that developmental analysis plays in understanding discontinuities (e.g., phase transitions and emergence) during LLM training. Although most interpretability research focuses only on the behavior and features of a fully trained model, certain insights into model behavior can only be accessed by observing the trajectory of the training process. I will present a case study of Syntactic Attention Structure (SAS), a naturally emerging property of masked language models (MLMs) wherein specific Transformer heads tend to focus on specific syntactic relations. I will demonstrate how SAS acquisition leads to the subsequent acquisition of complex linguistic abilities and co-occurs with a steep drop in the loss. I will also discuss how SAS competes with other beneficial traits during training, and how briefly suppressing SAS can even accelerate learning. These findings offer an interpretation of a real-world example of both simplicity bias and breakthrough training dynamics.