Speaker
Details
Whispers in the Weight: Unraveling the Mysteries of LLM Compression
Modern Large Language Models (LLMs) have revolutionized Natural Language Processing, yet their computational demands make compression a necessity. Through a series of studies, we delve into the intricacies of LLM compression and explore potential remedies. First, we challenge conventional compression evaluation metrics by introducing the Knowledge-Intensive Compressed LLM BenchmarK (LLM-KICK), a curated collection of tasks that provides nuanced insights into compression methods beyond perplexity. We illuminate pitfalls in existing pruning and quantization techniques, uncovering, for instance, the robustness of pruned LLMs on contextually demanding tasks. Next, we navigate the trade-offs of post-compression re-training and explore the promise of prompt-driven recovery: with carefully designed and learned prompting, prompts can be selected autonomously based on context, yielding a notable performance boost across a diverse range of tasks. Finally, drawing inspiration from genomics, we conduct a holistic scientific study of weight redundancy in LLMs, articulating our findings as the "Junk DNA Hypothesis" for LLMs. This challenges common assumptions about low-magnitude weights, revealing their pivotal role in complex tasks and showing that removing them risks irreversible knowledge loss.
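For readers unfamiliar with the pruning procedure the "Junk DNA Hypothesis" interrogates, the snippet below is a minimal sketch of unstructured magnitude pruning, i.e., zeroing out the smallest-magnitude entries of a weight matrix. It is an illustrative PyTorch example only, not the specific compression methods studied in the talk; the function name magnitude_prune and the toy tensor are assumptions made for demonstration.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the lowest-magnitude entries of a weight tensor.

    `sparsity` is the fraction of weights to remove, e.g. 0.5 keeps
    only the largest 50% of entries by absolute value.
    """
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    # Threshold = k-th smallest absolute value; entries at or below it are pruned.
    threshold = torch.kthvalue(weight.abs().flatten(), k).values
    mask = weight.abs() > threshold
    return weight * mask

# Toy usage: prune a random "layer" to 50% sparsity.
W = torch.randn(8, 8)
W_pruned = magnitude_prune(W, sparsity=0.5)
print(f"kept {int((W_pruned != 0).sum())} of {W.numel()} weights")
```

The hypothesis discussed in the talk questions the usual assumption behind this heuristic: that the pruned low-magnitude entries are expendable.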
Bio
Professor Zhangyang “Atlas” Wang is a tenured Associate Professor and holds the Temple Foundation Endowed Faculty Fellowship #7 in the Chandra Family Department of Electrical and Computer Engineering at The University of Texas at Austin. He is also a faculty member of UT Computer Science and the Oden Institute. In a part-time role, he serves as the Director of AI Research & Technology for Picsart, where he leads the development of cutting-edge, GenAI-powered tools for creative visual editing. Prof. Wang has broad research interests spanning the theory and application aspects of machine learning (ML). At present, his core research mission is to leverage, understand, and expand the role of low dimensionality in machine learning and optimization, with impacts on many important topics, including efficient scaling, training, and inference of large language models (LLMs); robustness and trustworthiness; learning to optimize (L2O); and generative vision. Prof. Wang has received many research awards and is fortunate enough to work with a sizable group of accomplished students.