| Graduate Student: | 陳俊宇 Chun-Yu Chen |
|---|---|
| Thesis Title: | 以卷積神經網路為基礎之改良型可解釋性深度學習模型 (An Improved CNN-Based Interpretable Deep Learning Model) |
| Advisor: | 蘇木春 Mu-Chun Su |
| Committee Members: | |
| Degree: | Master |
| Department: | Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2025 |
| Academic Year: | 113 |
| Language: | Chinese |
| Pages: | 120 |
| Chinese Keywords: | 可解釋人工智慧, 深度學習, 色彩感知, 彩色影像 |
| English Keywords: | Explainable Artificial Intelligence, Deep Learning, Color Perception, Color Images |
As deep learning models achieve outstanding performance in high-risk application domains such as medical imaging and computer vision, their "black-box" nature has drawn increasing attention. To enhance model transparency and interpretability, this study builds upon the RGB CNN-based Interpretable Model (RGBCIM) proposed in 2024 and introduces several improvements to its key modules.
First, in the color convolution module, we replace the traditional PCCS color circle with 30 uniformly distributed filters obtained through K-means clustering in the CIELAB color space. This design better aligns with human color perception and enhances the filters’ ability to represent color features.
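The CIELAB clustering step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis implementation: the sampling grid, the iteration count, and the helper names (`srgb_to_lab`, `kmeans`) are all assumptions. The idea it demonstrates is that clustering RGB samples *after* converting them to CIELAB spreads the cluster centres out in a perceptually near-uniform space, so the resulting 30 reference colours track human colour perception better than an evenly spaced hue circle.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1], shape (N, 3), to CIELAB (D65 white)."""
    # Undo the sRGB gamma to obtain linear RGB.
    c = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # Linear RGB -> XYZ (standard sRGB matrix), normalised by the D65 white.
    M = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = c @ M.T / np.array([0.95047, 1.0, 1.08883])
    f = np.where(xyz > (6 / 29) ** 3,
                 np.cbrt(xyz),
                 xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[:, 1] - 16
    a = 500 * (f[:, 0] - f[:, 1])
    b = 200 * (f[:, 1] - f[:, 2])
    return np.stack([L, a, b], axis=1)

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; Euclidean distance in Lab approximates
    perceived colour difference, so centres spread out perceptually."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest centre, then recompute means.
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

# Sample the RGB cube, move to Lab, and cluster into 30 reference colours.
grid = np.linspace(0.0, 1.0, 16)
rgb_samples = np.stack(np.meshgrid(grid, grid, grid), axis=-1).reshape(-1, 3)
color_filters_lab = kmeans(srgb_to_lab(rgb_samples), k=30)
```

Each of the 30 centres would then seed one filter of the colour convolution module.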
Second, in the Gaussian convolution module, we adopt cosine similarity as the basis for convolution operations. This not only preserves the semantic meaning of similarity and improves accuracy but also significantly reduces the sensitivity to hyperparameter tuning.
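A cosine-similarity convolution can be illustrated with a small single-channel, stride-1 NumPy sketch (the function name and the `eps` guard are assumptions, not the thesis code). Because each response is the cosine of the angle between patch and kernel, outputs are bounded in [-1, 1] regardless of patch magnitude, which is one plausible reading of why such a layer is less sensitive to scale-related hyperparameter tuning:

```python
import numpy as np

def cosine_conv2d(image, kernel, eps=1e-8):
    """Slide `kernel` over `image` and return, at each position, the cosine
    similarity between the kernel and the image patch instead of the raw
    dot product.  Responses are bounded in [-1, 1]."""
    kh, kw = kernel.shape
    H, W = image.shape
    k_flat = kernel.ravel()
    k_norm = np.linalg.norm(k_flat) + eps
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw].ravel()
            out[i, j] = patch @ k_flat / ((np.linalg.norm(patch) + eps) * k_norm)
    return out

# A patch that is any positive multiple of the kernel scores ~1.0,
# so the response keeps its "how similar?" meaning at every scale.
kernel = np.arange(1.0, 10.0).reshape(3, 3)
image = np.zeros((5, 5))
image[1:4, 1:4] = 5.0 * kernel
response = cosine_conv2d(image, kernel)
```

A plain dot-product convolution would score the scaled patch five times higher; the normalised response stays at ~1.0, which is the "preserved similarity semantics" the paragraph above refers to.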
Lastly, in the integration of the interpretability pipeline, we extend the original RGBCIM’s visualization process by proposing a filter monitoring metric and introducing a Grad-CAM-based filtering mechanism to produce clearer and more focused explanation maps.
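One way to realise such a Grad-CAM screening step is to keep only the explanation pixels that fall in the top fraction of a normalised Grad-CAM heatmap. The sketch below is an assumption about the mechanism, not the thesis code: it presumes the heatmap has already been computed and upsampled to the explanation map's resolution, and `keep_ratio` is an illustrative parameter.

```python
import numpy as np

def gradcam_screen(explanation, cam, keep_ratio=0.2, eps=1e-8):
    """Zero out explanation pixels lying outside the top `keep_ratio`
    fraction of the min-max normalised Grad-CAM heatmap `cam`."""
    cam = (cam - cam.min()) / (cam.max() - cam.min() + eps)
    threshold = np.quantile(cam, 1.0 - keep_ratio)
    mask = cam >= threshold
    # Everything outside the class-discriminative region is suppressed,
    # leaving a sparser, more focused explanation map.
    return explanation * mask, mask

# Toy example: a random explanation map screened by a random heatmap.
rng = np.random.default_rng(0)
explanation = rng.random((8, 8))
cam = rng.random((8, 8))
focused, mask = gradcam_screen(explanation, cam, keep_ratio=0.25)
```

Suppressing responses outside the Grad-CAM hot region is what would keep a complex background from cluttering the explanation map.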
Experiments conducted on seven datasets — Colored MNIST, Colored Fashion MNIST, Colored Shape, PathMNIST, BloodMNIST, CIFAR-10, and RetinalMNIST — demonstrate that the improved model achieves higher classification accuracy across all datasets compared to the original RGBCIM, with significant average gains. Additionally, it generates more distinguishable interpretability maps when faced with complex backgrounds, confirming that the proposed improvements successfully strike a balance between model accuracy and interpretability.