| Field | Value |
|---|---|
| Graduate Student | 程謀謙 Mo-Qian Cheng |
| Thesis Title | 改進電腦斷層掃描醫療影像多任務多類別分類問題:使用自監督式學習與監督式對比學習 (Leveraging Self-Supervised Learning and Supervised Contrastive Learning in Enhancing Multi-Task Multi-Class Classification Problem for CT Scan Medical Images) |
| Advisor | 陳弘軒 Hung-Hsuan Chen |
| Oral Defense Committee | |
| Degree | Master |
| Department | 資訊電機學院 - Department of Computer Science & Information Engineering |
| Year of Publication | 2025 |
| Academic Year of Graduation | 113 |
| Language | Chinese |
| Pages | 74 |
| Keywords | Medical Imaging, Multi-Class, Multi-Output, Classification, Contrastive Learning, Self-Supervised Learning, Model Robustness |
Supervised learning has been shown to achieve excellent performance when labeled data are abundant. In medical image classification, however, the scarcity and inaccuracy of labels often cause models to overfit, hindering real-world deployment.
To address this problem, this thesis investigates the integration of a multi-task learning scheme with self-supervised learning (SSL) pretraining. The core idea is to use SSL pretraining to encourage the feature extractor to capture subtle yet crucial features within medical images, thereby mitigating overfitting and improving performance during fine-tuning on the downstream task. Specifically, we pretrain feature extractors with a variety of SSL methods and then fine-tune them with supervised learning on a downstream multi-class multi-output abdominal trauma detection task.
Our experimental results demonstrate that SSL and supervised contrastive learning (SCL) pretraining substantially alleviate the severe overfitting that commonly occurs in purely supervised training, and further yield improvements across several metrics. Through further analysis, by varying which feature-extractor components take part in pretraining, we find that the image feature extractor is the main contributor to these gains. Finally, for the SCL method, we additionally experiment with swapping the backbone model of the feature extractor, revealing the potential of supervised contrastive learning to strengthen model robustness and offering insights into breaking through the robustness bottleneck.
In conclusion, our research suggests that SSL pretraining can improve both the classification performance and the robustness of a model on complex medical image classification tasks.
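To make the task setting concrete: "multi-class multi-output" means a single shared feature vector feeds several independent classification heads, each with its own softmax over its own class set. The sketch below is a minimal NumPy illustration under assumed, hypothetical outputs and class counts loosely modeled on abdominal-trauma organ labels; it is not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical outputs: each organ is an independent multi-class output
# (class counts here are illustrative assumptions, not the thesis's).
OUTPUTS = {"liver": 3, "spleen": 3, "kidney": 3, "bowel": 2, "extravasation": 2}
FEAT_DIM = 16

# One linear head per output, all reading the same shared feature vector.
heads = {name: rng.normal(scale=0.1, size=(FEAT_DIM, k))
         for name, k in OUTPUTS.items()}

def predict(feat):
    """Map a shared feature vector to one probability vector per output."""
    return {name: softmax(feat @ W) for name, W in heads.items()}

probs = predict(rng.normal(size=FEAT_DIM))
```

Because the heads are independent, each output can be trained with its own cross-entropy term while the feature extractor is shared across all of them, which is the multi-task aspect the abstract refers to.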
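For readers unfamiliar with the SCL objective mentioned above, here is a minimal NumPy sketch of a supervised contrastive loss in the style of SupCon (Khosla et al., 2020): each anchor is pulled toward samples sharing its label and pushed from the rest. This is a rough illustration, not the thesis's implementation, and the temperature value is an assumption.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss, NumPy sketch.

    features: (N, D) L2-normalized embeddings
    labels:   (N,) integer class labels
    """
    n = features.shape[0]
    sim = features @ features.T / temperature        # pairwise similarities
    mask_self = np.eye(n, dtype=bool)
    sim_max = sim.max(axis=1, keepdims=True)         # numerical stability
    exp_sim = np.exp(sim - sim_max)
    exp_sim[mask_self] = 0.0                         # exclude self from denominator
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    # positives: same label, different sample
    pos_mask = (labels[:, None] == labels[None, :]) & ~mask_self
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                           # anchors with >= 1 positive
    mean_log_prob_pos = (log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

When same-class embeddings collapse onto well-separated directions, the loss approaches zero, which is the geometry SCL pretraining pushes the feature extractor toward before fine-tuning.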