
Author: Chun-Yu Chen (陳俊宇)
Thesis Title: An Improved CNN-Based Interpretable Deep Learning Model
(以卷積神經網路為基礎之改良型可解釋性深度學習模型)
Advisor: Mu-Chun Su (蘇木春)
Oral Examination Committee:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Academic Year of Graduation: 113 (ROC calendar)
Language: Chinese
Number of Pages: 120
Keywords (Chinese): 可解釋人工智慧、深度學習、色彩感知、彩色影像
Keywords (English): Explainable Artificial Intelligence, Deep Learning, Color Perception, Color Images


    As deep learning models achieve outstanding performance in high-stakes application domains such as medical imaging and computer vision, their "black-box" nature has drawn increasing attention. To enhance model transparency and interpretability, this study builds on the RGB CNN-based Interpretable Model (RGBCIM) proposed in 2024 and improves several of its key modules.

    First, in the color convolution module, we replace the traditional PCCS color circle with 30 uniformly distributed filters obtained through K-means clustering in the CIELAB color space. This design better aligns with human color perception and enhances the filters’ ability to represent color features.
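    The filter-construction step above can be sketched as follows. This is an illustrative re-implementation, not the thesis's code: it runs a plain K-means (written out in NumPy) over samples assumed to already be in CIELAB coordinates, and takes the 30 cluster centers as the fixed color filters; the sampling scheme and data are hypothetical.

```python
import numpy as np

def kmeans_color_filters(lab_pixels, k=30, iters=50, seed=0):
    """Cluster CIELAB color samples into k centers and return the
    centers, which serve as the k fixed color filters.
    Illustrative re-implementation; the thesis's procedure may differ."""
    rng = np.random.default_rng(seed)
    centers = lab_pixels[rng.choice(len(lab_pixels), size=k, replace=False)]
    for _ in range(iters):
        # Euclidean distance in CIELAB approximates perceptual
        # color difference (CIE76 Delta-E).
        d = np.linalg.norm(lab_pixels[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            lab_pixels[labels == j].mean(axis=0) if np.any(labels == j)
            else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

# Hypothetical input: 10,000 random Lab samples
# (L in [0, 100], a and b in [-128, 127]).
rng = np.random.default_rng(1)
samples = np.column_stack([rng.uniform(0, 100, 10_000),
                           rng.uniform(-128, 127, 10_000),
                           rng.uniform(-128, 127, 10_000)])
filters = kmeans_color_filters(samples, k=30)
print(filters.shape)  # (30, 3): one Lab center per color filter
```

    Clustering in CIELAB rather than RGB matters because Euclidean distances there track perceived color differences, so the 30 centers spread out perceptually rather than merely numerically.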
    Second, in the Gaussian convolution module, we adopt cosine similarity as the basis for convolution operations. This not only preserves the semantic meaning of similarity and improves accuracy but also significantly reduces the sensitivity to hyperparameter tuning.
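    A minimal sketch of the cosine-similarity convolution idea, assuming a single-channel image and valid padding (the thesis's Gaussian convolution module may differ in windowing and normalization): each output value is the cosine similarity between the filter and the image patch beneath it.

```python
import numpy as np

def cosine_conv2d(image, kernel, eps=1e-8):
    """Valid-mode 2D sliding-window operation where each output value
    is the cosine similarity between the kernel and the image patch
    beneath it, rather than a raw dot product."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    k_flat = kernel.ravel()
    k_norm = np.linalg.norm(k_flat) + eps
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw].ravel()
            # Bounded in [-1, 1] and invariant to patch scaling, so the
            # response needs no extra scaling hyperparameter.
            out[i, j] = patch @ k_flat / ((np.linalg.norm(patch) + eps) * k_norm)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
kern = np.ones((3, 3))
resp = cosine_conv2d(img, kern)
print(resp.shape)  # (3, 3)
```

    Because the response is normalized into [-1, 1] regardless of patch magnitude, it retains its meaning as a similarity score, which is the property behind the reduced sensitivity to hyperparameter tuning.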
    Lastly, in the integration of the interpretability pipeline, we extend the original RGBCIM’s visualization process by proposing a filter monitoring metric and introducing a Grad-CAM-based filtering mechanism to produce clearer and more focused explanation maps.
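    The Grad-CAM-based screening step can be illustrated as follows. The channel weights are the standard Grad-CAM global-average-pooled gradients (Selvaraju et al., 2017); the per-filter importance score and the median threshold are hypothetical stand-ins for the thesis's actual screening rule, which is not reproduced here.

```python
import numpy as np

def grad_cam_screen(activations, gradients):
    """Compute a Grad-CAM heat map from a layer's activations A (C, H, W)
    and the gradients of the class score w.r.t. A, plus a per-filter
    importance score used to screen out unimportant filters.
    The importance score below is a hypothetical stand-in."""
    # Standard Grad-CAM channel weights: global-average-pooled gradients.
    weights = gradients.mean(axis=(1, 2))                            # (C,)
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Hypothetical per-filter importance: |weight| times peak activation.
    importance = np.abs(weights) * activations.max(axis=(1, 2))      # (C,)
    return cam, importance

rng = np.random.default_rng(0)
A = rng.random((8, 7, 7))            # activations of 8 filters
G = rng.standard_normal((8, 7, 7))   # gradients w.r.t. those activations
cam, imp = grad_cam_screen(A, G)
keep = imp >= np.median(imp)         # e.g. keep the most important half
print(cam.shape)  # (7, 7)
```

    Dropping filters whose importance falls below the threshold before rendering the explanation map is what keeps the final visualization focused on class-relevant responses.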

    Experiments conducted on seven datasets — Colored MNIST, Colored Fashion MNIST, Colored Shape, PathMNIST, BloodMNIST, CIFAR-10, and RetinalMNIST — demonstrate that the improved model achieves higher classification accuracy across all datasets compared to the original RGBCIM, with significant average gains. Additionally, it generates more distinguishable interpretability maps when faced with complex backgrounds, confirming that the proposed improvements successfully strike a balance between model accuracy and interpretability.

    Table of Contents

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    1. Introduction
       1.1 Research Motivation
       1.2 Research Objectives
       1.3 Thesis Organization
    2. Literature Review
       2.1 A Taxonomy of Explainable AI
       2.2 Feature-Attribution Explanations
           2.2.1 Perturbation-Based Methods
           2.2.2 Saliency Methods
       2.3 Example-Based Explanations
       2.4 CNN-based Interpretable Model
           2.4.1 Model Architecture
           2.4.2 Interpretability Design
       2.5 RGB CNN-based Interpretable Model
    3. Methodology
       3.1 The Proposed Model
           3.1.1 Model Architecture
           3.1.2 Notation
       3.2 Color Convolution Module: Design and Improvements
           3.2.1 Design Overview
           3.2.2 Color-Difference Computation and the CIELAB Color Space
           3.2.3 Problems in the Original Design and Our Improvements
       3.3 Convolution Module: Design and Improvements
           3.3.1 Definitions of FM and RM
           3.3.2 Original Module Design
           3.3.3 Analysis of Problems in the Original Design
           3.3.4 Our Improvement: Switching to Cosine Similarity
       3.4 Modules Shared with RGBCIM
           3.4.1 Grayscale Preprocessing
           3.4.2 Response Filtering Module
           3.4.3 Spatial Merging Module
       3.5 Interpretability
           3.5.1 The Meaning of CI
           3.5.2 Interpretability Visualization
           3.5.3 Interpretability of RM-CI
       3.6 Causes of Explanation-Map Failures and Filter Distributions
           3.6.1 Finding Explanation Maps: The Ideal Case
           3.6.2 Finding Explanation Maps: Causes of Failure
       3.7 Ideal Distribution of Filter Responses
           3.7.1 Failure Cases: Color and Contour, Layers 1 and 2
           3.7.2 Success Case: Color, Layer 0
       3.8 Improving Explanation Maps
           3.8.1 Filter Monitoring Metric
           3.8.2 Using Grad-CAM to Screen Out Unimportant Features
    4. Experimental Design and Results
       4.1 Datasets
           4.1.1 Red-Green-Blue Datasets
           4.1.2 Small Color-Image Datasets
           4.1.3 Large Color-Image Datasets
       4.2 Experimental Design
           4.2.1 Red-Green-Blue Datasets: Settings and Model Architecture
           4.2.2 Small Color-Image Datasets: Settings and Model Architecture
           4.2.3 Large Color-Image Datasets: Settings and Model Architecture
       4.3 Experimental Results
           4.3.1 Result Data
           4.3.2 Interpretability Images
       4.4 Analysis
           4.4.1 Comparison with RGBCIM
           4.4.2 Comparison of Filter Sizes
           4.4.3 Comparison of Color Spaces
           4.4.4 Comparison of Convolution Computation Methods
           4.4.5 Comparison of Numbers of Color Filters
           4.4.6 Comparison of Numbers of Layers
           4.4.7 Comparison of Grad-CAM Variants
           4.4.8 Interpretability Comparison with Grad-CAM
           4.4.9 Model Limitations and Failure Cases
    5. Conclusion
    References

