Welcome to the PLI Blog! Our posts will cover topics such as new AI research, cross-disciplinary applications, societal implications, and more!
Yu Meng1*, Mengzhou Xia2*, Danqi Chen2
*Equal Contribution
1Computer Science Department,…
Jun-Jie Zhu, Meiqi Yang, Jinyue Jiang, Yiming Bai, Zhiyong Jason Ren
Department of Civil and Environmental Engineering and Andlinger Center for Energy and the Environment, Princeton University
Research Gap
Our discussion (Zhu et al.,…
A striking feature of large language models is their capacity for in-context learning. In-context learning (ICL) is the ability to predict the response to a query based on illustrative examples presented…
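To make the setup concrete, here is a minimal, hypothetical illustration (our own sketch, not code from the post): a few demonstration pairs are placed in the prompt, and the model must infer the pattern to answer a new query with no weight updates.

```python
# A tiny illustration of in-context learning: the model sees a few input-output
# demonstrations in its prompt and is asked to complete a new query the same way.

demonstrations = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
    ("A masterpiece of quiet storytelling.", "positive"),
]
query = "The plot dragged and the acting was flat."

prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # feed this prompt to any autoregressive LM; a capable model answers "negative"
```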
Chongyi Zheng, Benjamin Eysenbach
Unsupervised learning is really powerful. It lies at the heart of large language models (just predict the next token), generative image models (predict what…
Naman Agarwal, Daniel Suo, Xinyi Chen, Elad Hazan
One of the biggest challenges for the…
The theoretical framework of structured state space duality (SSD) (see Part 1 and Part 2 of this blog post series) connects SSMs and (linear) attention through structured matrices. As mentioned in Part 1, this connection allows us to derive new algorithms for selective SSMs that are faster than the parallel associative scan in Mamba-1 by…
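To give a flavor of this duality, here is a small NumPy sketch (our own illustration, not the Mamba-2 implementation): for a scalar SSM, the linear-time recurrence and a quadratic, attention-like matrix form produce identical outputs, because the sequence-to-sequence map is a 1-semiseparable (structured) lower-triangular matrix.

```python
# Sketch of SSD for a scalar SSM: the recurrence h_t = a_t*h_{t-1} + b_t*x_t, y_t = c_t*h_t
# equals y = M @ x, where M[t, s] = c_t * (a_{s+1} * ... * a_t) * b_s for s <= t.
import numpy as np

T = 8
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, T)   # decay terms
b, c, x = rng.normal(size=T), rng.normal(size=T), rng.normal(size=T)

# Recurrent (linear-time) form.
y_rec, h = np.zeros(T), 0.0
for t in range(T):
    h = a[t] * h + b[t] * x[t]
    y_rec[t] = c[t] * h

# Matrix ("attention-like", quadratic) form.
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        M[t, s] = c[t] * np.prod(a[s + 1:t + 1]) * b[s]
y_mat = M @ x

assert np.allclose(y_rec, y_mat)  # same outputs; the structure of M is what enables faster algorithms
```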
The International Conference on Learning Representations (ICLR) is the premier gathering of professionals dedicated to advancing the branch of artificial intelligence known as representation learning, generally referred to as deep learning. Taking place between May 7th…
Mengzhou Xia*
Sadhika Malladi*
This post is based on the following work:
LESS: Selecting Influential Data for Targeted Instruction Tuning
Mengzhou Xia*1, Sadhika Malladi*1, Suchin Gururangan2, Sanjeev Arora1, Danqi Chen1
*denotes equal…
James Liu1*
Guangxuan Xiao1
Kai Li2
Jason D. Lee2
Song Han1,3
Tri Dao2,4
Tianle Cai2,4*
*indicates equal contribution
1MIT, 2Princeton University, 3NVIDIA, 4Together AI
ALiBi is a simple and widely used method to improve Transformer quality and length extrapolation. With a hardware-efficient implementation, we speed up ALiBi by 3-5x, unlocking new use cases of ALiBi for large-scale training.
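As a rough illustration of what ALiBi computes (a sketch based on the ALiBi paper, not the hardware-efficient kernel described in the post): each attention head adds a fixed, non-learned linear penalty, proportional to the query-key distance, to the attention scores before the causal-masked softmax.

```python
# Sketch of the ALiBi bias: score[h, i, j] += -slope[h] * (i - j) for keys j <= i.
import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    # Geometric slopes 2^(-8/n), 2^(-16/n), ..., as in the ALiBi paper (power-of-2 head counts assumed).
    return np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    # bias[h, i, j] = -slope[h] * (i - j); shape (num_heads, seq_len, seq_len).
    distance = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]  # i - j
    return -alibi_slopes(num_heads)[:, None, None] * np.maximum(distance, 0)

scores = np.random.randn(8, 16, 16)   # raw attention scores: (heads, queries, keys)
scores = scores + alibi_bias(8, 16)   # add the bias, then apply the causal mask and softmax as usual
```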