| Author: | Po-Kai Chiu (邱柏愷) |
|---|---|
| Thesis Title: | Leveraging 2D VoCo-Based Pretraining to Enhance Multi-Task Multi-Class Classification of Abdominal CT Scan Medical Images |
| Advisor: | Hung-Hsuan Chen (陳弘軒) |
| Oral Examination Committee: | |
| Degree: | Master |
| Department: | Department of Computer Science & Information Engineering |
| Year of Publication: | 2025 |
| Graduation Academic Year: | 113 (ROC calendar) |
| Language: | Chinese |
| Pages: | 63 |
| Keywords (Chinese): | 自監督學習、對比學習、腹部CT、醫學影像 |
| Keywords (English): | Self-supervised Learning, Contrastive Learning, Abdominal CT, Medical Imaging |
In the field of medical image analysis, the performance of deep learning models depends heavily on large-scale, high-quality annotated datasets, yet medical annotation is costly and requires specialized expertise. To reduce reliance on manual labeling, this study proposes a self-supervised contrastive learning framework tailored to 2D medical imaging, adapted from the 3D Volume Contrastive Learning Framework (VoCo), and integrates sequence modeling techniques to improve performance on abdominal trauma classification tasks.

The study examines the adapted 2D VoCo method on abdominal CT image classification. Slice-level contrastive pretraining on publicly available abdominal datasets teaches the model semantic structure across slices; the pretrained backbone is then transferred to the RSNA 2023 dataset for downstream multi-organ and single-organ injury classification. The downstream model adopts a CNN-LSTM architecture to capture correlations across slice sequences, and a series of ablation studies validates the effectiveness of the proposed contrastive strategy.

Experimental results show that the approach achieves promising performance even in the multi-organ classification setting. Under the limited-annotation scenarios common in medicine, the method effectively captures spatial-semantic dependencies and improves classification accuracy. These findings demonstrate the practical feasibility and application potential of the VoCo framework for abdominal CT analysis, and the thesis suggests directions for further improvement to enhance model generalizability and utility. Overall, the 2D VoCo method shows strong potential and scalability for deep-learning-based medical image analysis.
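The slice-level contrastive pretraining described above follows VoCo's overlap-prediction idea: a slice is partitioned into non-overlapping base crops, and a random query crop is supervised by how much of its area falls inside each base crop. The abstract gives no code, so the following is only a minimal sketch of how that supervision target could be computed in 2D; the function names, the 4×4 grid, and the `(y0, x0, y1, x1)` box convention are all assumptions, not the thesis's actual implementation.

```python
def base_crop_boxes(img_size=256, grid=4):
    # Partition a CT slice into a grid x grid set of non-overlapping
    # base crops; each box is (y0, x0, y1, x1) in pixel coordinates.
    s = img_size // grid
    return [(r * s, c * s, (r + 1) * s, (c + 1) * s)
            for r in range(grid) for c in range(grid)]

def overlap_proportions(query_box, base_boxes):
    # VoCo-style soft targets: the fraction of the query crop's area
    # that lies inside each base crop. Predicted query-to-base
    # similarities are regressed toward these proportions.
    qy0, qx0, qy1, qx1 = query_box
    q_area = (qy1 - qy0) * (qx1 - qx0)
    targets = []
    for by0, bx0, by1, bx1 in base_boxes:
        dy = max(0, min(qy1, by1) - max(qy0, by0))
        dx = max(0, min(qx1, bx1) - max(qx0, bx0))
        targets.append(dy * dx / q_area)
    return targets

# A 64x64 query crop centered on a 256x256 slice straddles four base
# crops equally, so four targets are 0.25 and the rest are 0.
bases = base_crop_boxes(img_size=256, grid=4)
targets = overlap_proportions((96, 96, 160, 160), bases)
```

In the published 3D VoCo formulation these proportions supervise a similarity-prediction loss on crop embeddings; the 2D adaptation presumably applies the same per-slice signal before the backbone is transferred to the RSNA 2023 downstream tasks.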