Tutorial on Diffusion Models for Imaging and Vision
Tutorial on diffusion models
Skipping the first two sections for now
Section 3: score-matching Langevin diffusion

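- The Langevin equation is just gradient ascent on the log-likelihood log p(x), taken with respect to the sample x (not model parameters), plus Gaussian noise at every step: x_{t+1} = x_t + τ ∇_x log p(x_t) + sqrt(2τ) z_t, with z_t ~ N(0, I). The gradient ∇_x log p(x) is the score function.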
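To make the update concrete, here is a minimal sketch of Langevin sampling in 1D. It assumes the score is known in closed form, using a Gaussian N(2, 0.5^2) as a toy target; the names `gaussian_score` and `langevin_sample`, the step size, and the number of steps are all illustrative choices, not something from the tutorial.

```python
import math
import random

def gaussian_score(x, mu=2.0, sigma=0.5):
    """Score d/dx log p(x) of the toy target N(mu, sigma^2)."""
    return -(x - mu) / sigma**2

def langevin_sample(score, n_steps=300, step=0.01, x0=0.0, rng=None):
    """Run one Langevin chain: gradient ascent on log p plus injected noise."""
    rng = rng or random.Random()
    x = x0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        x = x + step * score(x) + math.sqrt(2.0 * step) * z
    return x

rng = random.Random(0)
samples = [langevin_sample(gaussian_score, rng=rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"sample mean {mean:.3f}, sample std {std:.3f}")
```

Without the noise term every chain would collapse onto the mode at x = 2; the noise is what makes the chains spread out into (approximately) the target distribution.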
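How do you estimate the score function? Turns out optimizing it directly is quite hard: the explicit score matching loss E_{q(x)} ||s_θ(x) - ∇_x log q(x)||² needs the true score, which you don't have. But using the log-derivative identity ∇q = q ∇ log q, you can show that this L2 loss on the noise-perturbed distribution q(x) = ∫ q(x|x') p(x') dx' is, up to a constant that does not depend on θ, the same as the denoising score matching loss E_{x', x ~ q(x|x')} ||s_θ(x) - ∇_x log q(x|x')||², which only needs the conditional score.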

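In the denoising score matching objective, choosing Gaussian noise gives you something tractable: with q(x|x') = N(x; x', σ²I), i.e. x = x' + σz with z ~ N(0, I), the conditional score is ∇_x log q(x|x') = -(x - x')/σ² = -z/σ, so the loss becomes E ||s_θ(x' + σz) + z/σ||².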
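As a sanity check on the Gaussian denoising score matching loss E ||s_θ(x' + σz) + z/σ||², here is a small sketch that fits a linear score model s(x) = a*x + b by least squares on noisy samples. The setup is an assumption chosen so the answer is known: clean data x' ~ N(mu, s^2), so the noisy data is N(mu, s^2 + σ^2) and its true score is -(x - mu)/(s^2 + σ^2). The function name `dsm_fit_linear_score` is hypothetical.

```python
import random

def dsm_fit_linear_score(clean, sigma, rng):
    """Fit s(x) = a*x + b by minimizing the empirical DSM loss
    mean((s(x' + sigma*z) + z/sigma)^2), solved in closed form."""
    noisy, targets = [], []
    for x_clean in clean:
        z = rng.gauss(0.0, 1.0)
        noisy.append(x_clean + sigma * z)
        targets.append(-z / sigma)  # conditional score at x = x' + sigma*z
    n = len(noisy)
    mean_x = sum(noisy) / n
    mean_t = sum(targets) / n
    cov = sum((x - mean_x) * (t - mean_t) for x, t in zip(noisy, targets)) / n
    var = sum((x - mean_x) ** 2 for x in noisy) / n
    a = cov / var           # ordinary least-squares slope
    b = mean_t - a * mean_x  # ordinary least-squares intercept
    return a, b

rng = random.Random(0)
mu, s, sigma = 1.0, 1.0, 0.5
clean = [rng.gauss(mu, s) for _ in range(50000)]
a, b = dsm_fit_linear_score(clean, sigma, rng)

# Score of the noisy distribution N(mu, s^2 + sigma^2) is a_true*x + b_true.
a_true = -1.0 / (s**2 + sigma**2)
b_true = mu / (s**2 + sigma**2)
print(f"fitted a={a:.3f}, b={b:.3f}; true a={a_true:.3f}, b={b_true:.3f}")
```

Note that the regression targets are pure noise (-z/σ), yet the fit recovers the score of the noisy data distribution. That is the point of the equivalence: the model never sees the true score.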
So then you just run Langevin diffusion with the learned score at various noise levels, going from the largest σ down to the smallest (annealed Langevin dynamics).
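A rough sketch of running Langevin at several noise levels, under some loud assumptions: instead of a learned network, the score of the noised data is computed analytically for a toy two-mode mixture (modes at -2 and +2, equal weight), and the step size is scaled as eps * (σ_i / σ_min)^2, which is one common heuristic rather than something these notes specify. All names and constants are illustrative.

```python
import math
import random

def noised_mixture_score(x, sigma, means=(-2.0, 2.0), s=0.3):
    """Score of an equal-weight Gaussian mixture after adding N(0, sigma^2) noise.
    Each component N(m, s^2) becomes N(m, s^2 + sigma^2)."""
    v = s**2 + sigma**2
    logits = [-(x - m) ** 2 / (2.0 * v) for m in means]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return sum(w * (-(x - m) / v) for w, m in zip(weights, means)) / total

def annealed_langevin(score, sigmas, eps=5e-4, steps_per_level=100, rng=None):
    """Run Langevin updates at each noise level, from largest sigma to smallest."""
    rng = rng or random.Random()
    x = rng.gauss(0.0, sigmas[0])  # broad initialization
    for sigma in sigmas:
        step = eps * (sigma / sigmas[-1]) ** 2  # larger steps at larger noise
        for _ in range(steps_per_level):
            z = rng.gauss(0.0, 1.0)
            x = x + step * score(x, sigma) + math.sqrt(2.0 * step) * z
    return x

rng = random.Random(0)
sigmas = [3.0, 1.0, 0.3, 0.1]  # decreasing noise schedule
samples = [annealed_langevin(noised_mixture_score, sigmas, rng=rng) for _ in range(500)]
frac_positive = sum(1 for x in samples if x > 0) / len(samples)
print(f"fraction of samples in the positive mode: {frac_positive:.2f}")
```

The large noise levels smooth the two modes together so chains can move between them; the small levels then sharpen the samples around each mode.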
Section 4: SDEs
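An iterative update of the form x_i = x_{i-1} + (step size) × (direction) can be converted to an ODE by letting x_i = x(t + ∆t) and x_{i-1} = x(t), making the learning rate a continuous function of time scaled by the step, β_{i-1} = β(t)∆t, and taking ∆t → 0. E.g. gradient descent x_i = x_{i-1} - β_{i-1} ∇f(x_{i-1}) becomes dx/dt = -β(t) ∇f(x).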
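A quick numerical check of the discrete-to-continuous correspondence, as a sketch under assumed choices: f(x) = x^2/2 (so ∇f(x) = x) and a made-up schedule β(t) = 1 + t. The ODE dx/dt = -β(t) x then has the exact solution x(t) = x0 * exp(-(t + t^2/2)), and gradient descent with step β(t)∆t should approach it as ∆t shrinks.

```python
import math

def beta(t):
    """Hypothetical continuous learning-rate schedule."""
    return 1.0 + t

def gradient_descent(x0, dt, t_end):
    """Iterate x_i = x_{i-1} - beta(t) * dt * f'(x_{i-1}) with f(x) = x^2 / 2."""
    x = x0
    n_steps = round(t_end / dt)
    for i in range(n_steps):
        t = i * dt
        x = x - beta(t) * dt * x
    return x

x0, t_end = 1.0, 1.0
exact = x0 * math.exp(-(t_end + t_end**2 / 2.0))  # solution of dx/dt = -beta(t) x
errors = {dt: abs(gradient_descent(x0, dt, t_end) - exact) for dt in (0.1, 0.01, 0.001)}
for dt, err in errors.items():
    print(f"dt={dt}: |x_discrete - x_ode| = {err:.5f}")
```

Read the other way, the iteration is just the forward Euler discretization of the ODE, which is why the gap shrinks roughly in proportion to ∆t.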