Seminar by Prof. Weidi XIE from Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory

Date: Tuesday, 30 April 2024
Time: 10:30 a.m. – 11:30 a.m.
Venue: RR101, Run Run Shaw Building

Title: Multimodal representation learning in videos and medicine

Abstract:
Understanding multimodal signals is of great interest to the artificial intelligence community. In this talk, I will cover some of our recent research from two different domains, namely video understanding and AI4Medicine. On video understanding, compared to the analysis of static images, the extra time axis introduces both challenges and opportunities. I will discuss some recent works on long video understanding, for example, grounded visual question answering on egocentric videos and retrieval-augmented video understanding. On AI4Medicine, I will present some of our recent efforts on developing foundation models towards generalist models, from the perspectives of dataset construction, model design, and benchmark evaluation.

About the speaker:
Weidi XIE is an associate professor at Shanghai Jiao Tong University, a young research scientist at the Shanghai AI Laboratory, and a visiting researcher at the Visual Geometry Group, University of Oxford. He is a recipient of the Google-DeepMind Full Scholarship, the Excellence Award from the University of Oxford, the Science Fund Program for Excellent Young Scientists (Overseas), and the Shanghai High-level Talent Program (Overseas). His research interests are in computer vision and AI4Science. He has published over 60 papers at venues including CVPR, ICCV, NeurIPS, IJCV, MedIA, and Nature research journals, and his work has been cited over 10,000 times.
Homepage: https://weidixie.github.io/