
Author: 邱柏愷 (Po-Kai Chiu)
Title (Chinese): 改進腹部電腦斷層掃描多任務多類別問題: 使用 2D VoCo 預訓練
Title (English): Leveraging 2D VoCo-Based Pretraining to Enhance Multi-Task Multi-Class Classification of Abdominal CT Scan Medical Images
Advisor: 陳弘軒 (Hung-Hsuan Chen)
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2025
Graduation Academic Year: 113
Language: Chinese
Pages: 63
Chinese Keywords: 自監督學習, 對比學習, 腹部CT, 醫學影像
English Keywords: Self-supervised Learning, Contrastive Learning, Abdominal CT, Medical Imaging
Hits: 106; Downloads: 0
Abstract (translated from Chinese):

In the field of medical image analysis, the performance of deep learning models depends heavily on large amounts of high-quality annotated data, yet medical annotation faces the challenges of high cost and high expertise requirements. To reduce reliance on manual annotation, this study proposes a self-supervised contrastive learning framework for 2D medical images, adapted from the 3D VoCo (Volume Contrastive Learning Framework), and integrates sequence modeling techniques to improve recognition performance on abdominal trauma classification tasks.

This study examines the application of the 2D self-supervised contrastive learning method adapted from 3D VoCo to abdominal CT image classification. Slice-level contrastive pretraining on a public abdominal dataset learns the semantic structure across slices; the pretrained backbone is then transferred to the RSNA 2023 dataset for multi-organ and single-organ injury classification. The downstream model combines a CNN-LSTM architecture to capture inter-slice sequence correlations, and multiple ablation experiments verify the benefit of the contrastive strategy.

Experimental results show good performance on the multi-organ classification task as well. In medical scenarios with limited annotations, the method effectively captures spatial-semantic correlations and improves classification performance, confirming the practical feasibility and application potential of the VoCo framework for abdominal CT analysis. Directions for future improvement are also proposed to further enhance the model's utility and generalization ability. These results indicate that the 2D VoCo method has broad application prospects and strong extensibility at the intersection of medical imaging and deep learning.


    In the field of medical image analysis, the performance of deep learning
    models heavily depends on large-scale, high-quality annotated datasets.
    However, medical annotations often face high costs and require specialized
    expertise. To reduce reliance on manual labeling, this study proposes a
    self-supervised contrastive learning framework tailored for 2D medical imaging,
    adapted from the 3D Volume Contrastive Learning Framework (VoCo),
    and integrates sequence modeling techniques to enhance performance in
    abdominal trauma classification tasks.
    This study explores the application of the improved 2D VoCo method
    on abdominal CT image classification. By conducting slice-level contrastive
    pretraining on publicly available abdominal datasets, the model learns semantic
    structures across slices and transfers the pretrained backbone to
    the RSNA 2023 dataset for downstream multi-organ and single-organ injury
    classification tasks. The downstream model adopts a CNN-LSTM
    architecture to capture spatial-temporal correlations across slices, and a
    series of ablation studies are conducted to validate the effectiveness of the
    proposed contrastive strategy.
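    VoCo's pretext task supervises a random crop by how much it overlaps each
    member of a grid of non-overlapping base crops, and those overlap ratios
    act as soft position labels. The following minimal 2D sketch illustrates
    that label generation; the function names, the 4x4 grid, and the 64-pixel
    cell size are illustrative assumptions, not the thesis's actual
    implementation:

    ```python
    def overlap_ratio(crop, base):
        """Fraction of `crop`'s area covered by `base`; boxes are (x0, y0, x1, y1)."""
        ix = max(0, min(crop[2], base[2]) - max(crop[0], base[0]))
        iy = max(0, min(crop[3], base[3]) - max(crop[1], base[1]))
        return ix * iy / ((crop[2] - crop[0]) * (crop[3] - crop[1]))

    def position_labels(crop, grid=4, cell=64):
        """Soft position label for `crop`: its overlap ratio with each cell of a
        grid x grid tiling of non-overlapping base crops (each cell x cell px)."""
        return [
            overlap_ratio(crop, (c * cell, r * cell, (c + 1) * cell, (r + 1) * cell))
            for r in range(grid)
            for c in range(grid)
        ]

    # A crop aligned with the top-left base crop gets label 1.0 there and 0
    # elsewhere; a crop shifted halfway right splits its label 0.5 / 0.5
    # between the two neighboring base crops.
    ```

    During pretraining, the model predicts these ratios from feature
    similarities between the random crop and the base crops, which forces the
    encoder to learn position-aware semantics without any manual labels.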
    Experimental results show that the proposed approach achieves promising
    performance even in multi-organ classification settings. Under limited
    annotation scenarios, the method effectively captures spatial-semantic dependencies
    and improves classification accuracy. These findings demonstrate
    the practical feasibility and application potential of the VoCo framework
    for abdominal CT analysis, and suggest directions for further improvement
    to enhance model generalizability and utility. Overall, the 2D
    VoCo method exhibits strong potential and scalability for medical image
    analysis in combination with deep learning.
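    The CNN-LSTM design described above encodes each CT slice independently
    with a CNN and then runs an LSTM over the slice axis to capture inter-slice
    context. This PyTorch toy shows the shape flow; the layer sizes, class
    count, and names are illustrative assumptions, not the thesis's actual
    architecture:

    ```python
    import torch
    import torch.nn as nn

    class SliceSequenceClassifier(nn.Module):
        """Hypothetical sketch: per-slice CNN encoder + LSTM over slice order."""

        def __init__(self, feat_dim=32, hidden=64, num_classes=3):
            super().__init__()
            self.encoder = nn.Sequential(          # encodes one slice -> feat_dim
                nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, feat_dim),
            )
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, x):                      # x: (B, T, 1, H, W)
            b, t = x.shape[:2]
            feats = self.encoder(x.flatten(0, 1))  # (B*T, feat_dim)
            feats = feats.view(b, t, -1)           # (B, T, feat_dim)
            out, _ = self.lstm(feats)              # (B, T, hidden)
            return self.head(out[:, -1])           # classify from last slice state

    model = SliceSequenceClassifier()
    logits = model(torch.randn(2, 12, 1, 64, 64))  # 2 studies, 12 slices each
    ```

    In the thesis's setting, the CNN weights would be initialized from the 2D
    VoCo pretrained backbone rather than trained from scratch.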

    Table of Contents

    Abstract (Chinese) / Abstract (English) / Acknowledgments / Contents

    1. Introduction
       1.1 Research Background and Motivation
       1.2 Challenges of Abdominal CT Image Analysis
       1.3 Research Questions and Contributions
    2. Related Work
       2.1 Self-Supervised Learning
           2.1.1 Instance-level Contrastive Learning
           2.1.2 Prototype-level Contrastive Learning
       2.2 Applications of Self-Supervised Learning in Medical Image Analysis
    3. Model and Methods
       3.1 Overall Framework
       3.2 Upstream: 2D VoCo Pretraining Architecture
           3.2.1 Contrastive Learning Architecture and Pipeline Design
           3.2.2 Position Label Generation
           3.2.3 Loss Function Design
       3.3 Downstream Task: Classification Architecture Design
    4. Experimental Results and Analysis
       4.1 Datasets and Preprocessing
       4.2 Data Processing and Augmentation Strategies
           4.2.1 Organ Region Extraction (TotalSegmentator)
       4.3 Evaluation Methods
       4.4 Experimental Parameters
       4.5 Results and Analysis
           4.5.1 Model Complexity and Inference Efficiency Analysis
           4.5.2 Experiment 1: VoCo v1 with Different Numbers of Base Crops
           4.5.3 Experiment 2: Comparison of VoCo v1 and VoCo v2
           4.5.4 Experiment 3: Effect of VoCo on Single-Organ Classification
           4.5.5 Experiment 4: Comparison of Multi-Organ and Single-Organ Classification
           4.5.6 Experiment 5: Effect of Pretraining Data Source on Classification Performance
           4.5.7 Visualization of Feature Representations: t-SNE Analysis
    5. Conclusion
       5.1 Conclusions
       5.2 Future Work
    References
    Appendix A: Code
    Appendix B: Experimental Data

