I will give a short introduction to generative diffusion models and compare their performance to that of other generative models, including GANs. In addition, I will present two recent works:
(i) Denoising Diffusion Gamma Models: Generative diffusion processes are an emerging and effective tool for image and speech generation. In existing methods, the underlying noise distribution of the diffusion process is Gaussian. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion process. Specifically, we introduce the Denoising Diffusion Gamma Model (DDGM) and show that noise drawn from a Gamma distribution improves results for image and speech generation. Our approach preserves the ability to efficiently sample states during training while using Gamma noise.
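As a rough illustration of the efficient-sampling property, the sketch below draws a noised state x_t directly from x_0 in one shot using centered Gamma noise. The fixed shape parameter k and the choice of scale so that the noise variance matches the usual 1 − ᾱ_t are assumptions for illustration, not the exact parameterization from the paper.

```python
import numpy as np

def gamma_diffusion_sample(x0, alpha_bar_t, k=2.0, rng=None):
    """Sample x_t directly from x_0 using zero-mean Gamma noise.

    Hypothetical sketch: the shape k is fixed and the scale theta is set
    so the centered Gamma noise has variance 1 - alpha_bar_t, mirroring
    the noise variance of the standard Gaussian diffusion step.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.sqrt((1.0 - alpha_bar_t) / k)   # k * theta^2 = 1 - alpha_bar_t
    g = rng.gamma(shape=k, scale=theta, size=x0.shape)
    noise = g - k * theta                       # subtract the mean k*theta
    return np.sqrt(alpha_bar_t) * x0 + noise
```

As in the Gaussian case, this closed-form jump to any timestep is what keeps training efficient: no sequential simulation of the forward chain is needed.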
(ii) Noise Estimation for Generative Diffusion Models: Generative diffusion models have emerged as leading models in speech and image generation. However, to perform well with a small number of denoising steps, a costly tuning of the set of noise parameters is needed. In this work, we present a simple and versatile learning scheme that adjusts those noise parameters step by step for any given number of steps, whereas previous work requires retuning for each step count separately. Furthermore, without modifying the weights of the diffusion model, our method significantly improves synthesis results when only a few denoising steps are used, at negligible computational cost.
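To make the setting concrete, the sketch below shows the standard baseline the work improves upon: respacing a long training noise schedule down to an arbitrary number of sampling steps by subsampling the cumulative products ᾱ_t and recovering per-step betas. This is a common respacing heuristic, shown only to illustrate the problem; it is not the learning scheme from the talk.

```python
import numpy as np

def respaced_schedule(alpha_bar_full, num_steps):
    """Derive per-step betas for a shortened sampling schedule.

    Illustrative baseline: pick num_steps evenly spaced entries from the
    full schedule's cumulative products alpha_bar, then recover each
    step's alpha via the telescoping ratio abar_t / abar_{t-1}.
    """
    T = len(alpha_bar_full)
    idx = np.linspace(0, T - 1, num_steps).round().astype(int)
    ab = alpha_bar_full[idx]
    prev = np.concatenate(([1.0], ab[:-1]))
    alphas = ab / prev          # alpha_t = abar_t / abar_{t-1}
    return 1.0 - alphas         # beta_t for each of the num_steps steps
```

A fixed subsampling rule like this is exactly the kind of hand-tuned choice that degrades quality at very few steps; the talk's scheme instead learns the noise parameters for the chosen step count.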
Eliya Nachmani is a researcher at Facebook AI Research (FAIR) and a Ph.D. student at Tel-Aviv University. His research focuses on machine learning for speech processing and information theory.