| Graduate student: | 許展榮 CHAN-JUNG HSU |
|---|---|
| Thesis title: | 基於擴散模型的生成式 AI A Mathematical Study of Generative AI Models |
| Advisor: | 胡偉帆 Wei-Fan Hu |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Science - Department of Mathematics |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 |
| Language: | English |
| Number of pages: | 33 |
| Keywords (Chinese): | 生成式模型, 擴散模型, 一致性模型 |
| Keywords (English): | generative AI model, Diffusion model, Consistency model |
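With the rapid progress of deep learning in image generation, diffusion models have become a leading approach that combines high sample quality with flexibility. This thesis systematically reviews three families of diffusion-based generative models: first, Denoising Diffusion Probabilistic Models (DDPM), built on discrete Markov chains; second, Score-Based Diffusion Models, formulated through stochastic differential equations (SDEs) and score matching; and third, Consistency Models, centered on consistency training and designed to produce high-quality samples in a single step or a few steps. In the experiments, we use three datasets, MNIST, CIFAR-10, and Cat Faces, evaluate the realism of generated images with the Fréchet Inception Distance (FID), and count the number of function evaluations (NFE) in the reverse sampling process to measure efficiency. The results show that DDPM achieves the lowest FID for the same number of steps. Consistency Models speed up sampling several-fold, but their sample quality approaches that of conventional diffusion models only when the LPIPS perceptual loss is used.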
Diffusion models have recently emerged as a powerful class of generative techniques, offering state-of-the-art sample quality in image generation tasks. This thesis provides a comprehensive review of three representative diffusion-based frameworks: (1) Denoising Diffusion Probabilistic Models (DDPM), which employ discrete Markov chains to model forward noising and reverse denoising; (2) Score-Based Diffusion Models, which generalize this approach via continuous-time stochastic differential equations (SDEs) and score matching; and (3) Consistency Models, which aim to collapse iterative sampling into a single or few learned steps through consistency training. We conduct experiments on MNIST, CIFAR-10, and a Cat Faces dataset, evaluating generation fidelity using the Fréchet Inception Distance (FID) and measuring sampling efficiency by the number of function evaluations (NFE). Our findings reveal that DDPM consistently achieves the lowest FID under equal step counts. Consistency Models dramatically accelerate sampling, reducing NFE by orders of magnitude, but attain comparable fidelity only when trained with a perceptual LPIPS loss.
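As a rough illustration of the two evaluation quantities named in the abstract, the sketch below computes the Fréchet distance between Gaussians fitted to two feature sets, d² = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), and counts model calls to obtain an NFE. It is not taken from the thesis: the names `frechet_distance` and `CountingModel` are hypothetical, and an actual FID would feed in Inception-v3 activations rather than arbitrary feature vectors.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an (n_samples, n_features) array. For FID these would be
    Inception-v3 activations; here any feature vectors are accepted.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((cov_r cov_g)^{1/2}) is the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g. For positive semi-definite inputs these
    # are real and non-negative; .real and clip only absorb round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g)
                 - 2.0 * trace_sqrt)


class CountingModel:
    """Wraps a callable and counts how many times it is evaluated (NFE)."""

    def __init__(self, fn):
        self.fn = fn
        self.nfe = 0

    def __call__(self, x, t):
        self.nfe += 1
        return self.fn(x, t)
```

Under this sketch, a sampler that calls the wrapped network once per step reports an NFE equal to its step count, so a single-step generator would report 1. That is the sense in which the abstract compares sampling cost across models.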