Dept of Statistics & Actuarial Science, HKU

Seminar by Prof. Yifan Chen from the Department of Computer Science, Hong Kong Baptist University


Date: Wednesday, 11 September 2024
Time: 2:30 p.m. – 3:30 p.m.
Venue: RR301, Run Run Shaw Building

Title: Statistical approaches towards efficient transformers
Abstract

The Transformer is the backbone architecture behind most recent high-profile language models. In this talk, I will delve into approximation techniques for the attention and MLP modules in Transformers. First, I will discuss the connection between attention mechanisms and kernel estimators, and accordingly adapt Nyström techniques from fast kernel computation to attention approximation. I will then turn to compressing the MLP layers in Transformers in a way that preserves their neural tangent kernel (NTK) and accelerates both fine-tuning and inference for large language models. The two aspects collectively showcase the statistical structures behind popular deep learning designs.
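For readers unfamiliar with the first connection: softmax attention outputs, for each query q_i, the weighted average sum_j k(q_i, k_j) v_j / sum_l k(q_i, k_l) with kernel k(q, k) = exp(q·k / sqrt(d)), i.e. a Nadaraya–Watson kernel estimator, which is what makes Nyström-style landmark approximations applicable. The sketch below is a minimal illustration in that spirit, following the segment-mean landmark construction popularized by Nyströmformer rather than the speaker's exact method; the function names and the landmark count m are illustrative choices, not from the talk.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def nystrom_attention(Q, K, V, m=32):
        # Approximate A = softmax(Q K^T / sqrt(d)) via m landmarks:
        # A_hat = softmax(Q Kt^T) pinv(softmax(Qt Kt^T)) softmax(Qt K^T),
        # where Qt, Kt are segment-mean rows of Q and K.
        n, d = Q.shape
        s = 1.0 / np.sqrt(d)
        segs = np.array_split(np.arange(n), m)
        Qt = np.stack([Q[i].mean(axis=0) for i in segs])  # (m, d)
        Kt = np.stack([K[i].mean(axis=0) for i in segs])  # (m, d)
        F = softmax(Q @ Kt.T * s)                         # (n, m)
        A = softmax(Qt @ Kt.T * s)                        # (m, m)
        B = softmax(Qt @ K.T * s) @ V                     # (m, d_v)
        return F @ (np.linalg.pinv(A) @ B)                # O(n m), not O(n^2)

    # toy comparison against exact attention
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 256, 32))
    exact = softmax(Q @ K.T / np.sqrt(32)) @ V
    print(np.abs(exact - nystrom_attention(Q, K, V)).mean())

The key saving is that no n-by-n attention matrix is ever formed: the three factors cost O(nm) memory and time, which is why such approximations scale to long sequences.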
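The second thread concerns shrinking MLP blocks while keeping their training dynamics (as captured by the NTK) intact. As a crude stand-in only, the following shows the kind of compute saving layer compression buys via truncated SVD of a weight matrix; the talk's NTK-preserving scheme is more principled than this, and the layer sizes and rank below are arbitrary.

    import numpy as np

    def low_rank_factors(W, r):
        # Rank-r factorization of W via truncated SVD: W ~= W1 @ W2.
        # A matvec then costs O(r (d_in + d_out)) instead of O(d_in d_out).
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        return U[:, :r] * S[:r], Vt[:r]   # W1: (d_out, r), W2: (r, d_in)

    rng = np.random.default_rng(1)
    W = rng.normal(size=(1024, 4096)) / 64.0   # a toy MLP weight matrix
    W1, W2 = low_rank_factors(W, r=128)
    x = rng.normal(size=4096)
    rel_err = np.linalg.norm(W @ x - W1 @ (W2 @ x)) / np.linalg.norm(W @ x)
    print(f"relative error at rank 128: {rel_err:.3f}")

On a random Gaussian matrix the truncation error is large; the premise of compression methods is that trained weights carry far more structure, so a small rank can be preserved with little loss.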

About the speaker

Yifan Chen is an assistant professor in computer science and mathematics at HKBU. He is broadly interested in developing efficient algorithms for machine learning, encompassing both statistical and deep learning models. Before joining HKBU, Yifan earned his Ph.D. in Statistics from UIUC in 2023.