Dept of Statistics & Actuarial Science, HKU
 
 

Seminar by Prof. Weidi XIE from Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory


Date: Tuesday, 30 April 2024
Time: 10:30 a.m. – 11:30 a.m.
Venue: RR101, Run Run Shaw Building
 
Title: Multimodal representation learning in videos and medicine
Abstract

Understanding multimodal signals is of great interest to the artificial intelligence community. In this talk, I intend to cover some of our recent research from two different domains, namely video understanding and AI4Medicine. On video understanding, compared with the analysis of static images, the extra time axis introduces both challenges and opportunities. I will discuss some recent works on long video understanding, for example, grounded visual question answering on egocentric videos, retrieval-augmented video understanding, etc. On AI4Medicine, I would like to present some of our recent efforts on developing foundation models towards generalist models, from the perspectives of dataset construction, model design, and benchmark evaluation.

About the speaker

Weidi XIE is an associate professor at Shanghai Jiao Tong University, a young research scientist at Shanghai AI Laboratory, and a visiting researcher at the Visual Geometry Group at the University of Oxford. He is a recipient of the Google-DeepMind Full Scholarship, the Excellence Award from Oxford University, the Science Fund Program for Excellent Young Scientists (Overseas), and the Shanghai High-level Talent Program (Overseas). His research interests are in computer vision and AI4Science, where he has published over 60 papers at various venues, including CVPR, ICCV, NeurIPS, IJCV, MedIA, and Nature research journals, and has been cited over 10,000 times.

Homepage: https://weidixie.github.io/