
Graduate student: 許展榮 (Chan-Jung Hsu)
Thesis title (Chinese): 基於擴散模型的生成式 AI
Thesis title (English): A Mathematical Study of Generative AI Models
Advisor: 胡偉帆 (Wei-Fan Hu)
Oral defense committee:
Degree: Master
Department: College of Science, Department of Mathematics
Year of publication: 2025
Academic year of graduation: 113 (ROC calendar, 2024–2025)
Language: English
Number of pages: 33
Keywords (Chinese): 生成式模型, 擴散模型, 一致性模型
Keywords (English): generative AI model, diffusion model, consistency model
  • (Chinese abstract, translated) With the rapid progress of deep learning in image generation, diffusion models have become a leading approach that combines high sample quality with flexibility. This thesis systematically reviews three families of diffusion-based generative models: first, Denoising Diffusion Probabilistic Models (DDPM), built on discrete Markov chains; second, score-based diffusion models, formulated through stochastic differential equations (SDEs) and score matching; and third, consistency models, centered on consistency training and aiming to produce high-quality samples in a single step or a few steps. In the experiments, three datasets are used (MNIST, CIFAR-10, and Cat Faces); the Fréchet Inception Distance (FID) assesses the realism of generated images, and the number of function evaluations (NFE) of the reverse sampling process measures efficiency. The results show that DDPM attains the lowest FID at equal step counts, while consistency models speed up sampling severalfold but approach the quality of conventional diffusion models only when the LPIPS perceptual loss is used.
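The FID score used above is the Fréchet distance between two Gaussians fitted to feature statistics of real and generated images [10]. A rough NumPy sketch, assuming the feature vectors are already extracted (the random toy features and the helper name `fid_from_stats` are placeholders; in practice Inception-v3 activations are used):

```python
import numpy as np

def fid_from_stats(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    # Tr((sigma1 sigma2)^{1/2}) via eigenvalues: the product of two PSD
    # covariance matrices has real, non-negative eigenvalues.
    eig = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sum(np.sqrt(np.maximum(eig.real, 0.0)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(5000, 8))   # stand-in for real-image features
fake = rng.normal(0.5, 1.2, size=(5000, 8))   # stand-in for generated features
mu_r, sig_r = real.mean(axis=0), np.cov(real, rowvar=False)
mu_f, sig_f = fake.mean(axis=0), np.cov(fake, rowvar=False)
fid = fid_from_stats(mu_r, sig_r, mu_f, sig_f)  # lower is better
```

Identical statistics give a distance of (numerically) zero, which is why FID rewards generators whose feature distribution matches the data.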


    Diffusion models have recently emerged as a powerful class of generative techniques, offering state-of-the-art sample quality in image generation tasks. This thesis provides a comprehensive review of three representative diffusion-based frameworks: (1) Denoising Diffusion Probabilistic Models (DDPM), which employ discrete Markov chains to model forward noising and reverse denoising; (2) Score-Based Diffusion Models, which generalize this approach via continuous-time stochastic differential equations (SDEs) and score matching; and (3) Consistency Models, which aim to collapse iterative sampling into a single or a few learned steps through consistency training. We conduct experiments on MNIST, CIFAR-10, and a Cat Faces dataset, evaluating generation fidelity with the Fréchet Inception Distance (FID) and measuring sampling efficiency by the number of function evaluations (NFE). Our findings reveal that DDPM consistently achieves the lowest FID under equal step counts. Consistency Models dramatically accelerate sampling, reducing NFE by orders of magnitude, but only attain comparable fidelity when trained with a perceptual LPIPS loss.
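The forward noising that the abstract summarizes admits a closed form, x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, so any timestep can be sampled directly from x_0. A minimal NumPy sketch, assuming the linear beta-schedule of Ho et al. [1] (the schedule endpoints and batch shape here are illustrative, not taken from the thesis):

```python
import numpy as np

# Linear variance schedule beta_1..beta_T and its cumulative product
# alpha_bar_t = prod_{s<=t} (1 - beta_s), as in DDPM [1].
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form (no need to iterate t steps)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 32 * 32))   # toy batch of flattened images
xt, eps = q_sample(x0, t=T - 1, rng=rng)  # near t = T the sample is almost pure noise
```

At t = T - 1, alpha_bar_t is nearly zero, so x_t is essentially unit-variance Gaussian noise; the reverse (learned) process denoises these steps one by one, which is exactly why DDPM's NFE per sample is high.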

    Contents
    摘要
    Abstract
    Contents
    1 Introduction
    2 Diffusion Model
    2.1 Diffusion Models
    2.2 Denoising Diffusion Probabilistic Models (DDPM)
    2.3 Score-Based Diffusion Models
    2.4 Denoising Score Matching with Langevin Dynamics (SMLD)
    2.5 Consistency Models
    3 Loss Function of Diffusion Models
    4 Experiments
    4.1 MNIST
    4.2 CIFAR-10
    4.3 Cat Faces
    5 Conclusion
    6 Bibliography

    Bibliography
    [1] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Curran Associates, Inc., 2020, pp. 6840–6851.
    [2] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, "Score-based generative modeling through stochastic differential equations," in International Conference on Learning Representations, 2021.
    [3] Y. Song, P. Dhariwal, M. Chen, and I. Sutskever, "Consistency models," in International Conference on Machine Learning, PMLR, 2023, pp. 32211–32252.
    [4] Y. Song and S. Ermon, "Generative modeling by estimating gradients of the data distribution," in Advances in Neural Information Processing Systems, vol. 32, 2019.
    [5] A. Hyvärinen, "Estimation of non-normalized statistical models by score matching," Journal of Machine Learning Research, vol. 6, pp. 695–709, Dec. 2005, ISSN: 1532-4435.
    [6] P. Vincent, "A connection between score matching and denoising autoencoders," Neural Computation, vol. 23, no. 7, pp. 1661–1674, 2011. DOI: 10.1162/NECO_a_00142.
    [7] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 586–595. DOI: 10.1109/CVPR.2018.00068.
    [8] F. Ferlito, Cat-faces-dataset, 2019. [Online]. Available: https://github.com/fferlito/Cat-faces-dataset.
    [9] T. Salimans et al., "Improved techniques for training GANs," in Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, Eds., vol. 29, Curran Associates, Inc., 2016. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2016/file/8a3363abe792db2d8761d6403605aeb7-Paper.pdf.
    [10] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Advances in Neural Information Processing Systems, vol. 30, 2017.
    [11] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer, 2015, pp. 234–241.
