Agenda

13 Dec 2023 10:30

Diffusion Models for Image Editing and Novel View Synthesis

Room B, ZETA building - Scientific Campus, via Torino

Speaker: Loris Bazzani, Amazon Research

Abstract:
In this talk, we present two lines of work that show how to adapt diffusion models to perform image editing and novel view synthesis. In the first part, we present a novel method for text-guided image editing, named iEdit, that generates images conditioned on a source image and a textual edit prompt. Because no fully annotated dataset with target images exists, we propose to automatically construct a dataset derived from LAION-5B that contains pseudo-target images with their descriptive edit prompts, given input image-caption pairs. This enables us to introduce a weakly supervised loss to generate the pseudo-target image from the latent noise of the source image, conditioned on the edit prompt. In the second part, we show how to generate novel views of an object by presenting ViewFusion, a training-free algorithm that can be integrated into existing pre-trained diffusion models. Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for next-view generation, ensuring robust multi-view consistency during the novel-view generation process. Through a diffusion process that fuses known-view information via interpolated denoising, our framework extends single-view conditioned models to work in multiple-view conditional settings without any additional fine-tuning.

Bio Sketch:
Loris is a Principal Scientist at Amazon in Berlin. He has experience in prototyping and developing video understanding models for Amazon Video and multimodal product understanding models for recommendations, and in creating novel interactive shopping experiences. He obtained his Ph.D. in Computer Science from the University of Verona (Italy) in 2012 and visited the University of British Columbia, working on tracking, re-identification, and attentional models. Before his current position, Loris was a postdoctoral fellow at Dartmouth College and a postdoctoral fellow at the Italian Institute of Technology, working on object recognition, localization, and temporal saliency prediction for videos.

Language

The event will be held in English

Organizer

Department of Environmental Sciences, Informatics and Statistics - Sebastiano Vascon
