
Author: Mo-Qian Cheng (程謀謙)
Thesis Title: Leveraging Self-Supervised Learning and Supervised Contrastive Learning in Enhancing Multi-Task Multi-Class Classification Problem for CT Scan Medical Images (改進電腦斷層掃描醫療影像多任務多類別分類問題:使用自監督式學習與監督式對比學習)
Advisor: Hung-Hsuan Chen (陳弘軒)
Committee Members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2025
Academic Year of Graduation: 113
Language: Chinese
Pages: 74
Keywords: Medical Imaging, Multi-Class, Multi-Output, Classification, Contrastive Learning, Self-Supervised Learning, Model Robustness
Abstract:

    Supervised learning has been shown to achieve excellent performance when abundant labeled data are available, but in medical image classification the scarcity of accurate labels often leads to overfitting, hindering real-world deployment.

    To address this problem, this thesis investigates whether integrating a multi-task learning scheme with self-supervised learning (SSL) pretraining can mitigate overfitting. The core idea is that SSL pretraining encourages the feature extractor to capture subtle yet crucial features in medical images, thereby reducing overfitting and improving performance during fine-tuning on the downstream task. Specifically, we pretrain feature extractors with a variety of SSL methods and then train them under supervision on a downstream multi-class multi-output abdominal trauma detection task.

    Our experimental results show that SSL and supervised contrastive learning (SCL) pretraining substantially alleviate the severe overfitting observed under purely supervised training, and even further improve the model on several metrics. By comparing experiments in which different feature-extractor components participate in pretraining, we find that the image feature extractor is the main contributor to these gains. Finally, for the SCL method we additionally swap the backbone model of the feature extractor, revealing the potential of supervised contrastive learning to strengthen model robustness and offering insight into breaking the robustness bottleneck.

    In conclusion, our research suggests that SSL pretraining can improve both the classification performance and the robustness of a model on complex medical image classification tasks.
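The abstract centers on supervised contrastive learning (SCL); as a concrete illustration, here is a minimal NumPy sketch of the SupCon loss (Khosla et al., 2020), one of the SCL objectives the thesis covers. This is an illustrative re-implementation under stated assumptions (the function name and the default temperature of 0.1 are choices made here), not the thesis's code.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Minimal SupCon loss: pull same-label embeddings together and
    push different-label embeddings apart (Khosla et al., 2020)."""
    # L2-normalize rows so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                    # (N, N) scaled similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # numerically stable log-softmax over all non-anchor entries per row
    masked = np.where(self_mask, -np.inf, sim)
    row_max = masked.max(axis=1, keepdims=True)
    exp_sim = np.where(self_mask, 0.0, np.exp(sim - row_max))
    log_prob = (sim - row_max) - np.log(exp_sim.sum(axis=1, keepdims=True))
    # positives: same label, excluding the anchor itself
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = pos.sum(axis=1) > 0                  # skip anchors with no positive
    per_anchor = (-np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos]
                  / pos.sum(axis=1)[has_pos])
    return per_anchor.mean()
```

With tight same-class clusters the loss is near zero; mislabeling cross-class pairs as positives drives it up, which is the behavior SCL pretraining exploits.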

    Table of Contents

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    1. Introduction
    2. Related Work
       2.1 Medical Image Classification
       2.2 Multi-Instance Learning (MIL)
       2.3 Self-Supervised Learning
           2.3.1 Contrastive Methods
           2.3.2 Self-Distillation Methods
           2.3.3 Redundancy Reduction Methods
       2.4 Supervised Contrastive Learning
           2.4.1 Model Robustness
       2.5 Summary
    3. Models and Methods
       3.1 Semantic Segmentation Model Training
       3.2 Classification Model Training
           3.2.1 Classification Model Architecture
           3.2.2 2D Pretraining
           3.2.3 3D Pretraining
           3.2.4 SSL Methods Used
           3.2.5 SCL Methods Used
           3.2.6 Downstream Task Training
       3.3 SCL Pretraining and Model Robustness
    4. Experimental Results and Analysis
       4.1 Dataset and Preprocessing
           4.1.1 Dataset
           4.1.2 Preprocessing
           4.1.3 Experimental Details
       4.2 Experimental Results
           4.2.1 Transfer Learning with 2D and 3D Pretrained Weights
           4.2.2 Linear Probing of 2D Pretrained Weights
           4.2.3 SCL Pretraining and Model Robustness
           4.2.4 Adjustments to the UniMoCo Loss Function
    5. Conclusion
       5.1 Conclusions
       5.2 Limitations
       5.3 Future Work
    References
    Appendix A
       A.1 Loss Functions of the Pretraining Strategies: MoCo, BYOL, SimSiam, Barlow Twins, VICReg, TiCo, SupCon, UniMoCo
       A.2 Line Charts of Pretraining Metric Scores: 2D Pretraining, 3D Pretraining, 2D Linear Probing, EfficientNetB0 Robustness Experiments, ConvNeXt-Tiny Robustness Experiments
       A.3 Per-Task, Per-Class Metric Scores for SupCon
    Appendix B: Experiment Code
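Section 3.2 of the outline covers a multi-class multi-output classification architecture. As a hedged sketch of what such a head looks like: one softmax classifier per task reads a shared feature vector, and the per-task cross-entropies are summed. The task layout below follows the RSNA-2023-style abdominal-trauma labels and is an assumption for illustration, not the thesis's exact head design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed task layout (per-organ heads, RSNA-2023 style); the thesis's
# actual classification heads may differ.
TASKS = {"bowel": 2, "extravasation": 2, "kidney": 3, "liver": 3, "spleen": 3}
FEAT_DIM = 16  # illustrative size of the shared feature vector

# one independent linear head per task, all reading the same features
heads = {name: rng.normal(scale=0.1, size=(FEAT_DIM, k))
         for name, k in TASKS.items()}

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_output_loss(features, targets):
    """Sum of per-task cross-entropies over one shared feature batch.

    features: (B, FEAT_DIM) output of the shared feature extractor.
    targets:  dict mapping task name -> (B,) integer class labels.
    """
    total = 0.0
    for name in TASKS:
        probs = softmax(features @ heads[name])        # (B, k) per-task softmax
        total += -np.log(probs[np.arange(len(features)), targets[name]]).mean()
    return total
```

Because the tasks share one extractor but keep separate heads, pretraining the extractor (the SSL/SCL step studied in the thesis) benefits every task at once.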

