Agenda

26 Jan 2026 11:00

Few-shot and test-time adaptation of foundational vision-language models

Sala Riunioni B - Edificio ZETA B | Campus Scientifico

Speaker:
Ismail Ben Ayed, École de Technologie Supérieure, Montreal

Abstract:
Vision-language models (VLMs) are currently transforming computer vision, emerging as a promising route toward true generalization. These foundation models learn from web-scale collections of images, each paired with a noisy text caption, without specialized labels built through intensive human effort. Such models yield robust features, providing powerful alternatives to standard supervised-learning algorithms trained on orders of magnitude less data. Along with these very recent developments, there is substantial interest in adapting foundation models to downstream tasks given only a handful of labeled samples in the target conditions (few-shot adaptation) or only unlabeled test samples (test-time adaptation), and under limited computation and memory resources (parameter-efficient fine-tuning). Yet current adaptation methods rely on many ad hoc choices that are still not well understood and raise interesting questions: How can the models be adapted efficiently and effectively? Which model parameters should be updated? Which loss function should be optimized, and do the hyper-parameters depend on the target task? This presentation discusses recent trends and developments in the widely studied subject of fine-tuning large-scale VLMs under the practical constraints of limited supervision and computation/memory resources. Specifically, I will give an overview of state-of-the-art few-shot methods, which adapt VLMs using only a few labeled samples in the target conditions. I will also highlight very recent results that point to important limitations of current experimental evaluations in this setting and question the progress made by an abundant recent literature, mostly based on convoluted prompt-learning strategies. Furthermore, I will discuss a recent, highly competitive few-shot adaptation method developed by our team (Huang et al., LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP, CVPR 2024). Specifically, we propose a generalization of the linear-probe adapter in which the classifier weights are expressed as functions of the text embeddings, with class-wise multipliers integrating image and text features.
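To make the last idea concrete, the sketch below (not the authors' code) shows a linear-probe-style adapter in the spirit of LP++: the classifier weights are built from the frozen text embeddings plus a learned visual component, combined through class-wise multipliers, and only this small set of adapter parameters is trained on the few labeled shots. The dimensions, initialization, and exact blending rule are illustrative assumptions, not the published formulation.

```python
# Minimal sketch of a text-informed linear probe for few-shot CLIP adaptation.
# Assumptions (hypothetical, for illustration only): classifier weights are
# visual_weights + alpha * text_weights, with alpha a learnable class-wise multiplier.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextInformedLinearProbe(nn.Module):
    def __init__(self, text_embeddings: torch.Tensor):
        # text_embeddings: (num_classes, dim) frozen class-name embeddings from the
        # VLM text encoder, assumed already L2-normalized.
        super().__init__()
        num_classes, dim = text_embeddings.shape
        self.register_buffer("text_weights", text_embeddings)              # frozen text part
        self.visual_weights = nn.Parameter(torch.zeros(num_classes, dim))  # learned visual part
        self.alpha = nn.Parameter(torch.ones(num_classes, 1))              # class-wise multipliers

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # Classifier weights are a per-class blend of learned visual prototypes
        # and the scaled text embeddings; logits are cosine similarities.
        weights = F.normalize(self.visual_weights + self.alpha * self.text_weights, dim=-1)
        image_features = F.normalize(image_features, dim=-1)
        return image_features @ weights.t()


# Hypothetical few-shot usage with frozen encoders: only the adapter is trained.
if __name__ == "__main__":
    num_classes, dim, shots = 10, 512, 4
    text_emb = F.normalize(torch.randn(num_classes, dim), dim=-1)   # stand-in text embeddings
    feats = torch.randn(num_classes * shots, dim)                   # stand-in image features
    labels = torch.arange(num_classes).repeat_interleave(shots)
    probe = TextInformedLinearProbe(text_emb)
    optim = torch.optim.AdamW(probe.parameters(), lr=1e-3)
    for _ in range(50):
        loss = F.cross_entropy(probe(feats), labels)
        optim.zero_grad()
        loss.backward()
        optim.step()
```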

Bio sketch:
Ismail Ben Ayed is a Full Professor at ETS Montreal, where he holds a research chair on Artificial Intelligence in Medical Imaging. He is also a Principal Investigator affiliated with the University of Montreal Hospital Research Centre. His interests are in machine learning, computer vision, optimization, and medical image computing. Ismail has authored over 180 fully peer-reviewed papers, mostly published in the top venues of those areas, along with 2 books and 7 US patents. In the last 5 years, he has given over 30 invited talks, including several tutorials on foundation models at flagship conferences (such as MICCAI’25 and ICASSP’23). His research has been covered by several visible media outlets, such as Radio Canada (CBC) and Quebec Science Magazine. His team has received several international distinctions, such as the best paper award at the Medical Imaging with Deep Learning (MIDL) conference in 2021, as well as several top-ranking positions in internationally visible contests. Ismail has served regularly on the Program Committee of the MICCAI conference. He also served as Program Chair for IPMI’25 and MIDL’20, and currently serves as a board member of the MIDL Foundation.

Language

The event will be held in English

Organized by

M. Pelillo
