
Author: 吳宥俞 (You-Yu Wu)
Title: 結合影像增強與 Transformer 架構於胰臟分割之研究
A Study on Pancreas Segmentation by Integrating Image Enhancement and Transformer Architecture
Advisor: 蘇木春 (Mu-Chun Su)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication: 2025
Graduation academic year: 113 (ROC calendar; 2024-2025)
Language: Chinese
Pages: 64
Chinese keywords: 胰臟分割 (pancreas segmentation), 深度學習 (deep learning), 醫學影像 (medical imaging), 電腦視覺 (computer vision), Transformer, 影像處理 (image processing)
English keywords: Pancreas Segmentation, Deep Learning, Medical Image Processing, Computer Vision, Transformer, Image Processing


    According to statistics from the Ministry of Health and Welfare of Taiwan in 2024, pancreatic cancer ranked seventh among the top ten causes of cancer death, making it a particularly concerning disease. This study addresses the challenge of low segmentation accuracy for the pancreas in medical imaging by proposing a deep learning-based approach to improve automatic segmentation of the pancreas and multiple organs. Due to its small size, variable shape, and indistinct boundaries, the pancreas is difficult to segment accurately using traditional methods. With the advancement of computed tomography (CT) and artificial intelligence technologies, combining image enhancement with deep learning models has become an important direction for improving segmentation performance. Therefore, this research is based on a Transformer architecture, modifies the SAM2 model, and applies preprocessing and feature enhancement to non-contrast CT images, aiming to improve the segmentation accuracy and stability for the pancreas and other organs.

    In the experimental design, multiple publicly available abdominal CT datasets were collected. Preprocessing steps such as HU value adjustment and CLAHE contrast enhancement were applied, and each CT slice was treated as a video frame for model input. The core model is a modified SAM2 architecture, and the effects of different feature fusion strategies, prompt encoder, and memory encoder designs on segmentation performance were explored. Experimental results show that the proposed method achieved a Dice Score of 92.3% for pancreas segmentation on the NIH TCIA dataset, outperforming existing methods. The model also demonstrated good generalization in multi-organ segmentation tasks, such as for the liver, kidneys, and spleen. Further analysis using Grad-CAM and related techniques verified the model's interpretability in recognizing small organs, confirming the feasibility and potential of the proposed approach in the field of medical image segmentation.
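    The HU windowing and Dice-score evaluation mentioned above can be sketched as follows. This is a minimal NumPy illustration with placeholder window bounds, not necessarily the values used in the thesis; the CLAHE step (e.g. OpenCV's `cv2.createCLAHE`) would typically follow the windowing and is noted only in a comment here:

```python
import numpy as np

def window_hu(hu_slice, hu_min=-100.0, hu_max=240.0):
    """Clip a CT slice to a Hounsfield-unit window and rescale to 8-bit.

    The bounds are illustrative abdominal soft-tissue values, not
    necessarily those chosen in the thesis. CLAHE (e.g. OpenCV's
    cv2.createCLAHE) would typically be applied to the returned image.
    """
    clipped = np.clip(hu_slice, hu_min, hu_max)
    scaled = (clipped - hu_min) / (hu_max - hu_min) * 255.0
    return scaled.astype(np.uint8)

def dice_score(pred, target, eps=1e-6):
    """Dice coefficient between two binary masks (1.0 = perfect overlap)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

    Each windowed slice would then be fed to the model in sequence as a video frame, mirroring the slice-as-frame input scheme described in the abstract.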

    Table of Contents

    Abstract (Chinese); Abstract (English); Acknowledgements; Table of Contents
    1. Introduction
       1.1 Research Motivation
       1.2 Research Objectives
       1.3 Thesis Organization
    2. Literature Review and Background
       2.1 Literature Review
           2.1.1 Studies on multi-organ segmentation in medical imaging
           2.1.2 Studies on the NIH TCIA Pancreas-CT dataset
           2.1.3 Applications of the SAM model to medical imaging
           2.1.4 Studies on Grad-CAM
       2.2 Background
           2.2.1 Computed Tomography
           2.2.2 Hounsfield Units
           2.2.3 Anatomy of the Human Pancreas
           2.2.4 The Vision Transformer Model
           2.2.5 The Segment Anything Model
    3. Methodology
       3.1 Research Hypotheses and Objectives
       3.2 System Architecture
       3.3 Image Preprocessing Stage
           3.3.1 CT slice concatenation and resizing
           3.3.2 Background cropping
           3.3.3 HU windowing and contrast-limited adaptive histogram equalization (CLAHE)
           3.3.4 Converting CT slices into video
       3.4 convmem-SAM2 Model Stage
           3.4.1 Image encoder block
           3.4.2 Prompt encoder block and mask decoder block
           3.4.3 Memory encoder block
           3.4.4 Memory bank block and memory attention block
       3.5 Loss Function
       3.6 Evaluation Metrics
    4. Experimental Design and Results
       4.1 Overview
       4.2 Datasets
           4.2.1 Primary comparison dataset
           4.2.2 Other datasets
       4.3 Experimental Equipment
       4.4 Experimental Design and Results
           4.4.1 Model training parameters
           4.4.2 Comparison of feature fusion methods for the prompt encoder
           4.4.3 Comparison of enhancement strategies for the memory encoder block
           4.4.4 Model training results
           4.4.5 Comparison with baseline segmentation results
       4.5 Comparison with Related Work
       4.6 Segmentation Results for Other Organs
       4.7 Grad-CAM Visualization Analysis
    5. Conclusion
       5.1 Conclusions
       5.2 Future Work
    References

