Abstract
The Transformer is the backbone architecture of most recent prominent language models. In this talk, I will delve into approximation techniques for the attention and MLP modules in Transformers. First, I will discuss the connection between attention mechanisms and kernel estimators, and accordingly adapt Nyström techniques for fast kernel computation to attention approximation. I will then turn to the compression of the MLP layers in Transformers, which preserves their neural tangent kernel (NTK) and accelerates both fine-tuning and inference for large language models. Together, the two aspects showcase the statistical structures behind popular deep learning designs.
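As a rough illustration of the kernel view of attention, the following minimal NumPy sketch shows a Nyström-style approximation of softmax attention: the full n-by-n attention matrix is replaced by three small softmax blocks built from m landmark queries and keys (taken here as segment means, assuming n is divisible by m). The function name, the landmark choice, and the use of a pseudo-inverse are illustrative simplifications, not necessarily the exact construction presented in the talk.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def nystrom_attention(Q, K, V, m=8):
        """Nystrom-style approximation of softmax attention (a sketch).

        Q, K, V: (n, d) arrays; m: number of landmarks.
        Landmarks are segment means of Q and K, a simple common choice;
        assumes n is divisible by m.
        """
        n, d = Q.shape
        scale = 1.0 / np.sqrt(d)
        # Landmarks: average Q and K over m contiguous segments.
        Q_tilde = Q.reshape(m, n // m, d).mean(axis=1)
        K_tilde = K.reshape(m, n // m, d).mean(axis=1)
        # Three small softmax "kernel" blocks replacing the full n x n matrix.
        F = softmax(Q @ K_tilde.T * scale)        # (n, m)
        A = softmax(Q_tilde @ K_tilde.T * scale)  # (m, m)
        B = softmax(Q_tilde @ K.T * scale)        # (m, n)
        # Attention matrix is approximated by F A^+ B, applied to V
        # without ever materializing an n x n matrix.
        return F @ (np.linalg.pinv(A) @ (B @ V))  # (n, d)

    # Toy usage: compare against exact softmax attention on random inputs.
    rng = np.random.default_rng(0)
    n, d = 64, 16
    Q, K, V = rng.normal(size=(3, n, d))
    exact = softmax(Q @ K.T / np.sqrt(d)) @ V
    approx = nystrom_attention(Q, K, V, m=8)
    print(np.abs(exact - approx).mean())

The point of the sketch is the cost structure: the three blocks require O(nm) kernel evaluations rather than O(n^2), which is the same trade-off Nyström methods make for classical kernel estimators.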
About the speaker |
Yifan Chen is an assistant professor in Computer Science and Mathematics at HKBU. He is broadly interested in developing efficient algorithms for machine learning, encompassing both statistical and deep learning models. Before joining HKBU, Yifan earned his Ph.D. in Statistics from UIUC in 2023.
|