
DDPM (ELBO vs Stochastic DE)

Hugging Face Forums [Unofficial] June 1, 2026

Hello,

I decided to ask here because I didn't know which forum could help. I just took extensive notes after watching this course on diffusion models (Flow Matching and Diffusion Models). Before that, I went through other material that dealt with VAEs and presented the math using the evidence lower bound (ELBO). There is also the Tutorial on Diffusion Models for Imaging and Vision (arXiv:2403.18103).

The following is from the MIT course notes. I need some mathematical support or code examples to understand why the ELBO is not favoured by the instructors. Am I understanding this wrongly?

> Discrete time vs. continuous time. The first denoising diffusion model papers [41, 42, 17] did not use SDEs but constructed Markov chains in discrete time, i.e. with time steps t = 0, 1, 2, 3, ... To this date, you will find a lot of works in the literature working with this discrete-time formulation. While this construction is appealing due to its simplicity, the disadvantage of the time-discrete approach is that it forces you to choose a time discretization before training. Further, the loss function needs to be approximated via an evidence lower bound (ELBO) - which is, as the name suggests, only a lower bound to the loss we actually want to minimize. Later, Song et al. [45] showed that these constructions were essentially an approximation of a time-continuous SDEs. Further, the ELBO loss becomes tight (i.e. it is not a lower bound anymore) in the continuous time case (e.g. note that Theorem 12 and Theorem 22 are equalities and not lower bounds - this would be different in the discrete time case). This made the SDE construction popular because it was considered mathematically “cleaner” and that one could control the simulation error via ODE/SDE samplers post training. It is important to note however that both models employ the same loss and are not fundamentally different.

Thanks,
Mohan
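For anyone who wants something concrete to poke at, here is a minimal sketch of the "only a lower bound" remark in the quote. It does not come from the course notes and it is not a diffusion model. It assumes a hypothetical one-step latent model, z ~ N(0, 1) and x | z ~ N(z, sigma2), chosen only because both the exact log-likelihood and the ELBO have closed forms. The function names (`log_marginal`, `elbo`, `true_posterior`) and the numbers are illustrative assumptions.

```python
import math


def log_marginal(x, sigma2):
    """Exact log p(x) for the toy model z ~ N(0, 1), x | z ~ N(z, sigma2)."""
    var = 1.0 + sigma2  # marginal variance of x
    return -0.5 * math.log(2 * math.pi * var) - x**2 / (2 * var)


def elbo(x, sigma2, m, v):
    """Closed-form ELBO when the variational posterior is q(z | x) = N(m, v)."""
    # E_q[log p(x | z)]
    expected_log_lik = (
        -0.5 * math.log(2 * math.pi * sigma2) - ((x - m) ** 2 + v) / (2 * sigma2)
    )
    # E_q[log p(z)]
    expected_log_prior = -0.5 * math.log(2 * math.pi) - (m**2 + v) / 2
    # Entropy of q
    entropy = 0.5 * math.log(2 * math.pi * math.e * v)
    return expected_log_lik + expected_log_prior + entropy


def true_posterior(x, sigma2):
    """Mean and variance of the exact posterior p(z | x) in the toy model."""
    return x / (1.0 + sigma2), sigma2 / (1.0 + sigma2)


# Illustrative values, not taken from any source.
x, sigma2 = 1.5, 0.5
exact = log_marginal(x, sigma2)

# A mismatched q (here: the prior) gives a strictly smaller value.
loose = elbo(x, sigma2, m=0.0, v=1.0)

# Plugging in the exact posterior closes the gap.
m_star, v_star = true_posterior(x, sigma2)
tight = elbo(x, sigma2, m_star, v_star)

print(f"log p(x)               = {exact:.6f}")
print(f"ELBO, mismatched q     = {loose:.6f}")
print(f"ELBO, exact posterior  = {tight:.6f}")
```

The gap between `exact` and `loose` is the KL divergence between q and the true posterior, which is why the ELBO sits below the log-likelihood unless q matches the posterior. This toy case only illustrates the general ELBO inequality. It does not reproduce the continuous-time tightness result that the quote attributes to Theorem 12 and Theorem 22.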
