{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiewxf52fqnudfpgl7unzxu5vzw4obpz22mao6yify2cnawiv6wuyy",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mnbhlen3brz2"
  },
  "path": "/t/ddpm-elbo-vs-stochastic-de/176444#post_2",
  "publishedAt": "2026-06-02T01:37:27.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Denoising Diffusion Probabilistic Models",
    "DDPM project page",
    "Understanding Diffusion Models: A Unified Perspective",
    "Score-Based Generative Modeling through Stochastic Differential Equations",
    "Generative Modeling by Estimating Gradients of the Data Distribution",
    "score_sde",
    "A Variational Perspective on Diffusion-Based Generative Models and Score Matching",
    "Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation",
    "OpenReview: Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation",
    "The Annotated Diffusion Model",
    "Diffusers DDPM docs",
    "Diffusion Meets Flow Matching: Two Sides of the Same Coin",
    "An Introduction to Flow Matching and Diffusion Models",
    "Flow Matching and Diffusion Models",
    "Flow Matching Guide and Code",
    "DDPM",
    "Yang Song’s score-based generative modeling blog",
    "Score-Based Generative Modeling through SDEs",
    "Diffusion Meets Flow Matching"
  ],
  "textContent": "Hmm… This relationship is apparently quite easy to get tangled up in:\n\n* * *\n\nA useful way to read the “DDPM ELBO vs stochastic differential equation” issue is that these are not really two competing theories of diffusion models. They are more like different coordinate systems for describing almost the same family of generative models.\n\nMy current understanding is:\n\n  * **DDPM / ELBO view** : natural if you start from a discrete-time latent-variable model.\n  * **Denoising score matching view** : natural if you want to explain why the practical noise-prediction loss works.\n  * **Continuous-time SDE view** : natural if you want a unified forward/reverse process and sampler story.\n  * **Flow matching view** : natural if you want an even more direct vector-field regression story.\n\n\n\nSo I would be careful with the phrasing “ELBO is not favored”. A more precise statement is probably:\n\n> The ELBO is the natural objective when DDPM is presented as a discrete-time latent-variable model, but the score/SDE/flow views often give a cleaner explanation of the practical training objective and the sampling dynamics. That does not mean the ELBO view is wrong, obsolete, or unrelated.\n\n## 1. The short version\n\nQuestion | My answer\n---|---\nIs the DDPM ELBO wrong? | No. It is the natural variational objective for a discrete-time latent-variable Markov chain.\nIs the simplified DDPM loss “just” an ELBO? | Not exactly. It can be derived from a simplified / reweighted variational bound, but it is also very naturally interpreted as denoising score matching or noise prediction.\nIs the SDE view cleaner? | Often yes, especially conceptually. It directly says: learn the score of noisy marginals, then run a reverse-time SDE or probability-flow ODE.\nDoes the SDE view eliminate ELBOs? | No. Continuous-time diffusion also has variational / likelihood lower-bound interpretations.\nWhat does “equality” usually mean in score matching or flow matching notes? 
| Usually equality up to a parameter-independent constant between an intractable marginal objective and a tractable conditional objective. It does **not** mean the whole generative modeling problem becomes exact.\nAre DDPM, score-based diffusion, SDE diffusion, and flow matching different things? | They can be different parameterizations / discretizations / objectives, but many common cases are deeply equivalent or transformable.\n\n## 2. DDPM was not “ELBO only” even in the original paper\n\nThe original Denoising Diffusion Probabilistic Models paper presents DDPMs as latent-variable models trained through a variational bound, but the paper already emphasizes a connection to denoising score matching. The project page also summarizes the method as using “a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics”:\nDDPM project page.\n\nSo I would not frame DDPM as:\n\n> DDPM = ELBO, while score/SDE = something completely different.\n\nA better framing is:\n\n> DDPM starts from a discrete-time variational latent-variable model, but its most useful simplified objective is already closely related to denoising score matching.\n\nThis is also why DDPM implementations often look like simple noise prediction rather than like a textbook VAE objective.\n\n## 3. 
Why the ELBO derivation feels indirect\n\nIn the discrete DDPM story, the forward process adds noise step by step:\n\n  * start with data;\n  * define a fixed noising Markov chain;\n  * learn a reverse Markov chain;\n  * introduce latent variables for all intermediate noisy states;\n  * optimize a variational lower bound on the data likelihood;\n  * simplify / reweight the resulting terms;\n  * end up with a denoising-style objective, usually noise prediction.\n\n\n\nThis is legitimate, but it can feel roundabout because the implemented objective often looks much simpler:\n\n> sample a timestep, corrupt the data with Gaussian noise, and train a network to predict the noise / clean data / score / velocity.\n\nThat is why tutorials like Calvin Luo’s Understanding Diffusion Models: A Unified Perspective are useful: they explicitly connect the variational perspective and the score-based perspective. The tutorial derives variational diffusion models as a special case of a Markovian hierarchical VAE, then shows that optimization can be viewed as predicting one of several equivalent targets, such as the clean input, the injected noise, or the score.\n\nThe blog version is also readable:\nUnderstanding Diffusion Models: A Unified Perspective.\n\n## 4. Why the SDE view feels cleaner\n\nThe SDE view, especially from Score-Based Generative Modeling through Stochastic Differential Equations, says something like this:\n\n  1. Define a continuous-time forward process that gradually turns data into noise.\n  2. The reverse-time process exists and depends on the time-dependent score of the perturbed data distribution.\n  3. Learn that score with a neural network.\n  4. 
Generate samples by solving the reverse-time SDE, or a related probability-flow ODE.\n\n\n\nThis is conceptually clean because it separates several ideas that are somewhat entangled in the discrete DDPM presentation:\n\nComponent | DDPM / discrete view | SDE view\n---|---|---\nForward process | finite noising chain | continuous-time noising SDE\nReverse process | learned denoising Markov chain | reverse-time SDE using the score\nTraining target | variational bound, often simplified to noise prediction | score / denoising score matching\nSampling | fixed or chosen denoising schedule | numerical SDE / ODE solver\nDiscretization | built into the model description | often treated as a solver choice\n\nThis is why the SDE formulation often feels more “modern” or more “principled”. It gives a unified language for DDPM-like variance-preserving processes, score-based / variance-exploding processes, reverse SDEs, probability-flow ODEs, predictor-corrector samplers, etc.\n\nYang Song’s blog post Generative Modeling by Estimating Gradients of the Data Distribution is a very good intuitive bridge here. The official code repository is also useful for orientation:\nscore_sde.\n\n## 5. But “SDE is exact, ELBO is only a lower bound” is too compressed\n\nThere is a subtle but important distinction here.\n\nIn score matching / denoising score matching, one often proves that an intractable score-matching objective and a tractable denoising score-matching objective differ only by a parameter-independent constant. 
In flow matching, a similar pattern appears: an intractable marginal flow matching objective can be replaced by a tractable conditional flow matching objective. The two objectives differ only by terms independent of the model, so they have the same gradients and the same minimizer.\n\nThis is probably the kind of “equality” that many modern lecture notes are referring to.\n\nBut that does **not** mean:\n\n> continuous-time SDE diffusion gives exact maximum likelihood with no variational or approximation issue.\n\nThe approximations just move to different places:\n\n  * the score network is approximate;\n  * the SDE / ODE solver is numerical;\n  * likelihood computation, if needed, has its own assumptions and costs;\n  * the training objective may still be a surrogate for the downstream metric one cares about;\n  * weighting across noise levels matters a lot.\n\n\n\nSo the safe version is:\n\n> The score/SDE/flow formulations often make the training target cleaner, but they do not magically remove all approximation from generative modeling.\n\n## 6. Continuous-time SDEs also have a variational interpretation\n\nA key paper here is A Variational Perspective on Diffusion-Based Generative Models and Score Matching. It develops a variational framework for continuous-time generative diffusion and connects score matching to likelihood lower bounds for the plug-in reverse SDE.\n\nThis is important because it prevents the mistaken dichotomy:\n\n> discrete DDPM = ELBO\n> continuous SDE = no ELBO\n\nThe relationship is more like:\n\n> discrete DDPM has a natural ELBO derivation; continuous-time diffusion also admits variational / likelihood lower-bound interpretations; score matching and ELBO are connected rather than opposed.\n\nAnother relevant paper is Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation, which argues that commonly used diffusion objectives can be read as weighted combinations of ELBOs over different noise levels. 
The OpenReview page is here:\nOpenReview: Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation.\n\nThis makes the statement “ELBO is not favored” look too strong. A better statement is:\n\n> In practice, people often do not optimize the original discrete DDPM ELBO literally. They optimize reweighted denoising objectives that are easier to train and often better for perceptual quality. But these objectives remain closely connected to ELBO-like interpretations.\n\n## 7. The practical DDPM loss and score matching are very close\n\nIn many Gaussian diffusion setups, predicting the injected noise is equivalent to predicting the score up to a known scaling. This is why people can talk about:\n\n  * `epsilon` prediction;\n  * `x0` prediction;\n  * `v` prediction;\n  * score prediction;\n\n\n\nas different parameterizations of closely related training targets.\n\nThe exact equivalence depends on the noise schedule, parameterization, and loss weighting. So I would avoid saying “they are exactly the same loss” without qualifications. But conceptually:\n\n> DDPM noise prediction is one parameterization of denoising score estimation.\n\nThat is also why the Hugging Face implementation-oriented materials often present DDPM mostly as noise prediction and scheduling rather than as an explicit ELBO optimizer. See:\nThe Annotated Diffusion Model and Diffusers DDPM docs.\n\n## 8. 
Where flow matching fits\n\nFlow matching adds one more viewpoint.\n\nThe blog/paper Diffusion Meets Flow Matching: Two Sides of the Same Coin argues that diffusion models and Gaussian flow matching are deeply connected; different model specifications can lead to different network outputs, schedules, and loss weightings, while describing essentially the same generative model in many common cases.\n\nThe MIT notes An Introduction to Flow Matching and Diffusion Models are also useful because they put ODEs, SDEs, flow matching, score matching, and modern diffusion models in one framework. The course page is here:\nFlow Matching and Diffusion Models.\n\nFor a more implementation-oriented flow matching reference, see Meta’s Flow Matching Guide and Code and the associated arXiv paper:\nFlow Matching Guide and Code.\n\nThis reinforces the same point:\n\n> These frameworks often differ less in the underlying generative family and more in the chosen probability path, parameterization, objective weighting, and numerical sampler.\n\n## 9. 
A useful mental model\n\nI would summarize the relationship like this:\n\nView | Natural starting point | Main object learned | What it explains well | What can be confusing\n---|---|---|---|---\nDDPM / ELBO | discrete latent-variable model | reverse denoising kernels, often parameterized by noise prediction | why diffusion can be trained as a variational model | the practical simple loss can look disconnected from the ELBO\nDenoising score matching | noisy data distributions | score or equivalent noise target | why denoising regression works | likelihood interpretation needs extra work\nContinuous-time SDE | forward and reverse stochastic processes | time-dependent score | unification of DDPM, score models, reverse SDE, probability-flow ODE | SDE notation can hide discretization and approximation issues\nFlow matching | probability paths and vector fields | velocity / vector field | direct regression objective and ODE sampling | relationship to diffusion depends on path, parameterization, and weighting\n\n## 10. My answer to the original confusion\n\nIf the question is:\n\n> Is the DDPM ELBO less favored because the SDE formulation gives an equality instead of a lower bound?\n\nI would answer:\n\nNot exactly.\n\nThe DDPM ELBO is the natural derivation when DDPM is treated as a discrete-time latent-variable model. It is not wrong, and it is not merely historical baggage. However, the simplified objective used in practice is more directly understood as denoising score matching / noise prediction. The continuous-time SDE formulation makes this score-based interpretation cleaner and unifies the sampling dynamics.\n\nThe “equality” in denoising score matching or flow matching usually refers to equality up to a parameter-independent constant between a hard marginal objective and an easier conditional objective. 
It should not be interpreted as “the whole likelihood problem is now exact and ELBOs are obsolete.”\n\nIn fact, continuous-time diffusion also has variational interpretations, and modern diffusion objectives can often be understood as ELBO-related objectives with particular noise-level weightings. So the best mental model is not:\n\n> ELBO versus SDE\n\nbut rather:\n\n> ELBO, score matching, SDEs, and flow matching are different but tightly connected views of the same underlying denoising / transport problem.\n\n## 11. Suggested reading order\n\nIf someone wants to untangle this efficiently, I would read in this order:\n\nOrder | Resource | Why read it\n---|---|---\n1 | Understanding Diffusion Models: A Unified Perspective | Best bridge between variational and score-based views.\n2 | DDPM | Original DDPM paper; shows the ELBO and denoising score matching connection.\n3 | Yang Song’s score-based generative modeling blog | Intuitive score matching and SDE explanation.\n4 | Score-Based Generative Modeling through SDEs | Main SDE unification paper.\n5 | A Variational Perspective on Diffusion-Based Generative Models and Score Matching | Shows continuous-time diffusion still has a variational / likelihood-bound interpretation.\n6 | Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation | Explains why practical diffusion objectives remain closely related to ELBOs.\n7 | Diffusion Meets Flow Matching | Explains the diffusion / flow matching connection in a readable way.\n8 | An Introduction to Flow Matching and Diffusion Models | Comprehensive modern notes covering ODEs, SDEs, flow matching, score matching, and guidance.\n\n## 12. 
One-sentence version\n\nDDPM’s ELBO, denoising score matching, continuous-time SDE diffusion, and flow matching are not mutually exclusive explanations; the ELBO view gives a latent-variable / likelihood-bound story, while the score/SDE/flow views often give a cleaner training-and-sampling story, and modern theory connects these views rather than replacing one with another.",
  "title": "DDPM ( ELBO vs Stochastic DE )"
}