
Author: 邱柏愷 (Po-Kai Chiu)
Title (Chinese): 改進腹部電腦斷層掃描多任務多類別問題: 使用 2D VoCo 預訓練
Title (English): Leveraging 2D VoCo-Based Pretraining to Enhance Multi-Task Multi-Class Classification of Abdominal CT Scan Medical Images
Advisor: 陳弘軒 (Hung-Hsuan Chen)
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2025
Graduation Academic Year: 113
Language: Chinese
Pages: 63
Chinese Keywords: 自監督學習, 對比學習, 腹部CT, 醫學影像
English Keywords: Self-supervised Learning, Contrastive Learning, Abdominal CT, Medical Imaging
Hits: 106; Downloads: 0
Abstract (translated from Chinese):

In the field of medical image analysis, the performance of deep learning models depends heavily on large amounts of high-quality annotated data, yet medical annotation faces the challenges of high cost and high expertise requirements. To reduce reliance on manual annotation, this study proposes a self-supervised contrastive learning framework for 2D medical images, adapted from the 3D VoCo (Volume Contrastive Learning Framework), and integrates sequence modeling techniques to improve recognition performance on abdominal trauma classification tasks.

This study examines the application of the 2D self-supervised contrastive learning method adapted from 3D VoCo to abdominal CT image classification. Slice-level contrastive pretraining on a public abdominal dataset learns the semantic structure across slices; the pretrained backbone is then transferred to the RSNA 2023 dataset for multi-organ and single-organ injury classification. The downstream model combines a CNN-LSTM architecture to capture inter-slice sequence correlations, and multiple ablation experiments verify the benefit of the contrastive strategy.

Experimental results show good performance on the multi-organ classification task as well. In medical scenarios with limited annotations, the method effectively captures spatial-semantic correlations and improves classification performance, confirming the practical feasibility and application potential of the VoCo framework for abdominal CT analysis. Directions for future improvement are also proposed to further enhance the model's utility and generalization ability. These results indicate that the 2D VoCo method has broad application prospects and strong extensibility at the intersection of medical imaging and deep learning.


    In the field of medical image analysis, the performance of deep learning
    models heavily depends on large-scale, high-quality annotated datasets.
    However, medical annotations often face high costs and require specialized
    expertise. To reduce reliance on manual labeling, this study proposes a
    self-supervised contrastive learning framework tailored for 2D medical imaging,
    adapted from the 3D Volume Contrastive Learning Framework (VoCo),
    and integrates sequence modeling techniques to enhance performance in
    abdominal trauma classification tasks.
    This study explores the application of the improved 2D VoCo method
    on abdominal CT image classification. By conducting slice-level contrastive
    pretraining on publicly available abdominal datasets, the model learns semantic
    structures across slices and transfers the pretrained backbone to
    the RSNA 2023 dataset for downstream multi-organ and single-organ injury
    classification tasks. The downstream model adopts a CNN-LSTM
    architecture to capture spatial-temporal correlations across slices, and a
    series of ablation studies are conducted to validate the effectiveness of the
    proposed contrastive strategy.
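    VoCo's pretext task supervises a random crop by how much it overlaps each
    member of a grid of non-overlapping base crops, and those overlap ratios
    act as soft position labels. The following minimal 2D sketch illustrates
    that label generation; the function names, the 4x4 grid, and the 64-pixel
    cell size are illustrative assumptions, not the thesis's actual
    implementation:

    ```python
    def overlap_ratio(crop, base):
        """Fraction of `crop`'s area covered by `base`; boxes are (x0, y0, x1, y1)."""
        ix = max(0, min(crop[2], base[2]) - max(crop[0], base[0]))
        iy = max(0, min(crop[3], base[3]) - max(crop[1], base[1]))
        return ix * iy / ((crop[2] - crop[0]) * (crop[3] - crop[1]))

    def position_labels(crop, grid=4, cell=64):
        """Soft position label for `crop`: its overlap ratio with each cell of a
        grid x grid tiling of non-overlapping base crops (each cell x cell px)."""
        return [
            overlap_ratio(crop, (c * cell, r * cell, (c + 1) * cell, (r + 1) * cell))
            for r in range(grid)
            for c in range(grid)
        ]

    # A crop aligned with the top-left base crop gets label 1.0 there and 0
    # elsewhere; a crop shifted halfway right splits its label 0.5 / 0.5
    # between the two neighboring base crops.
    ```

    During pretraining, the model predicts these ratios from feature
    similarities between the random crop and the base crops, which forces the
    encoder to learn position-aware semantics without any manual labels.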
    Experimental results show that the proposed approach achieves promising
    performance even in multi-organ classification settings. Under limited
    annotation scenarios, the method effectively captures spatial-semantic dependencies
    and improves classification accuracy. These findings demonstrate
    the practical feasibility and application potential of the VoCo framework
    for abdominal CT analysis, and suggest directions for further improvement
    to enhance model generalizability and utility. Overall, the 2D
    VoCo method exhibits strong potential and scalability for medical image
    analysis in combination with deep learning.
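    The CNN-LSTM design described above encodes each CT slice independently
    with a CNN and then runs an LSTM over the slice axis to capture inter-slice
    context. This PyTorch toy shows the shape flow; the layer sizes, class
    count, and names are illustrative assumptions, not the thesis's actual
    architecture:

    ```python
    import torch
    import torch.nn as nn

    class SliceSequenceClassifier(nn.Module):
        """Hypothetical sketch: per-slice CNN encoder + LSTM over slice order."""

        def __init__(self, feat_dim=32, hidden=64, num_classes=3):
            super().__init__()
            self.encoder = nn.Sequential(          # encodes one slice -> feat_dim
                nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, feat_dim),
            )
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, x):                      # x: (B, T, 1, H, W)
            b, t = x.shape[:2]
            feats = self.encoder(x.flatten(0, 1))  # (B*T, feat_dim)
            feats = feats.view(b, t, -1)           # (B, T, feat_dim)
            out, _ = self.lstm(feats)              # (B, T, hidden)
            return self.head(out[:, -1])           # classify from last slice state

    model = SliceSequenceClassifier()
    logits = model(torch.randn(2, 12, 1, 64, 64))  # 2 studies, 12 slices each
    ```

    In the thesis's setting, the CNN weights would be initialized from the 2D
    VoCo pretrained backbone rather than trained from scratch.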

    Table of Contents

    Abstract (Chinese) / Abstract (English) / Acknowledgments / Contents

    1. Introduction
       1.1 Research Background and Motivation
       1.2 Challenges of Abdominal CT Image Analysis
       1.3 Research Questions and Contributions
    2. Related Work
       2.1 Self-Supervised Learning
           2.1.1 Instance-level Contrastive Learning
           2.1.2 Prototype-level Contrastive Learning
       2.2 Applications of Self-Supervised Learning in Medical Image Analysis
    3. Model and Methods
       3.1 Overall Framework
       3.2 Upstream: 2D VoCo Pretraining Architecture
           3.2.1 Contrastive Learning Architecture and Pipeline Design
           3.2.2 Position Label Generation
           3.2.3 Loss Function Design
       3.3 Downstream Task: Classification Architecture Design
    4. Experimental Results and Analysis
       4.1 Datasets and Preprocessing
       4.2 Data Processing and Augmentation Strategies
           4.2.1 Organ Region Extraction (TotalSegmentator)
       4.3 Evaluation Methods
       4.4 Experimental Parameters
       4.5 Results and Analysis
           4.5.1 Model Complexity and Inference Efficiency Analysis
           4.5.2 Experiment 1: VoCo v1 with Different Numbers of Base Crops
           4.5.3 Experiment 2: Comparison of VoCo v1 and VoCo v2
           4.5.4 Experiment 3: Effect of VoCo on Single-Organ Classification
           4.5.5 Experiment 4: Comparison of Multi-Organ and Single-Organ Classification
           4.5.6 Experiment 5: Effect of Pretraining Data Source on Classification Performance
           4.5.7 Visualization of Feature Representations: t-SNE Analysis
    5. Conclusion
       5.1 Conclusions
       5.2 Future Work
    References
    Appendix A: Code
    Appendix B: Experimental Data

