| Graduate student: | 許俊偉 Chun-Wei Hsu |
|---|---|
| Thesis title: | 基於多模型共識實現類別增量式語意分割問題之研究 M2CB: Incremental Semantic Segmentation via Multiple Model Consensus Building |
| Advisors: | 鄭旭詠 Hsu-Yung Cheng, 余執彰 Chih-Chang Yu |
| Oral defense committee: | |
| Degree: | Master |
| Department: | 資訊電機學院 - 資訊工程學系 Department of Computer Science & Information Engineering |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 |
| Language: | Chinese |
| Number of pages: | 46 |
| Chinese keywords: | 增量式學習、語義分割、模型共識 |
| English keywords: | Incremental Learning, Semantic Segmentation, Model Consensus |
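With the rapid development of deep learning, conventional models must cope with ever-growing volumes of data. Retraining the entire model whenever the data are updated is not only resource-intensive but may also be constrained by hardware and privacy concerns. Incremental learning, by contrast, absorbs new knowledge quickly without retraining from scratch, which greatly reduces training cost and makes it better suited to continually changing data. Among its many applications, semantic segmentation, a pixel-level classification task, is widely used in autonomous driving, medical imaging, and smart surveillance. As task requirements evolve, however, models must learn new classes, making class-incremental semantic segmentation (CISS) an important research topic.
CISS faces severe challenges, chiefly catastrophic forgetting and background shift. Most existing methods rely on labeled data and fine-tune the model step by step to learn new tasks; they are hard to apply in practice when only several already-trained models are available and labeled data can be obtained for neither the old nor the new tasks.
To address this problem, we propose Multiple Model Consensus Building (M2CB). The method takes the outputs of multiple pre-trained models on unlabeled data and integrates the models by building a prediction consensus among them and selectively distilling trustworthy knowledge. The experiments use Pascal VOC 2012 as the training dataset and MS COCO 2017 as the source of unlabeled auxiliary data. The results show that, over three consecutive tasks, M2CB improves mean Intersection over Union (mIoU) by 6.31% and 16.15%, respectively, over existing methods, confirming the effectiveness of the proposed method in the unlabeled setting.
Keywords: Incremental Learning, Semantic Segmentation, Model Consensus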
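The mean Intersection over Union (mIoU) quoted above is the usual segmentation metric: the IoU of each class, averaged over the classes that occur. The following is a minimal sketch only. The function name `mean_iou`, the ignore label 255, and the single-image scope are assumptions made for illustration and are not taken from the thesis; benchmark evaluations normally accumulate the counts over the whole validation set before averaging.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the target.

    pred and target are H x W nested lists of integer class labels.
    Pixels whose target equals ignore_index (an assumed convention)
    are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue
            if p == t:
                # Correct pixel: counts once in both intersection and union.
                intersection[t] += 1
                union[t] += 1
            else:
                # Wrong pixel: enlarges the union of both classes involved.
                union[t] += 1
                if 0 <= p < num_classes:
                    union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        return float("nan")  # no valid pixels at all
    return sum(ious) / len(ious)


# Hypothetical 2 x 2 example with two classes.
pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]
score = mean_iou(pred, target, num_classes=2)
print(round(score, 4))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.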
As deep learning advances, conventional models face growing challenges from ever-increasing data volumes. Retraining the entire model for every update is computationally expensive and may be limited by hardware constraints or data privacy concerns. In contrast, incremental learning acquires knowledge from new data without retraining from scratch, significantly reducing training costs and better accommodating dynamic data environments.
Among various applications, semantic segmentation, a pixel-level classification task, is widely used in domains such as autonomous driving, medical imaging, and smart surveillance. However, as task demands evolve, models must learn new classes over time, which gives rise to class-incremental semantic segmentation (CISS). CISS is difficult mainly because of catastrophic forgetting and background shift. Existing approaches typically rely on labeled data to progressively fine-tune models for new tasks, so they become impractical when only multiple pre-trained models are available and labeled data cannot be accessed for either the old or the new tasks.
To address this issue, we propose Multiple Model Consensus Building (M2CB), which leverages the predictions of multiple pre-trained models on unlabeled data. M2CB integrates the models by identifying consensus among their predictions and selectively distilling trustworthy knowledge. We evaluate M2CB using Pascal VOC 2012 as the training dataset and MS COCO 2017 as the source of unlabeled auxiliary data. Experimental results show that, after incorporating three consecutive tasks, M2CB improves mean Intersection over Union (mIoU) by 6.31% and 16.15%, respectively, over existing methods, demonstrating its effectiveness in the label-free setting.
Keywords: Incremental Learning, Semantic Segmentation, Model Consensus
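The abstract does not spell out how the consensus among models is built, so the following is only an illustrative sketch of one possible consensus rule for pseudo-labeling unlabeled images, not the M2CB algorithm itself. It assumes that each model's per-pixel output has already been reduced to a `(label, confidence)` pair in a shared class index, that label 0 is background, that 255 marks pixels to be excluded from distillation, and that 0.8 is a reasonable confidence threshold; all of these names and values are hypothetical.

```python
IGNORE = 255     # assumed label for pixels left out of distillation
BACKGROUND = 0   # assumed shared background label


def consensus_label(votes, threshold=0.8):
    """Fuse one pixel's (label, confidence) votes, one vote per model."""
    foreground = [(label, conf) for label, conf in votes if label != BACKGROUND]
    if not foreground:
        # Every model says background: trust it only if all are confident.
        if all(conf >= threshold for _, conf in votes):
            return BACKGROUND
        return IGNORE
    labels = {label for label, _ in foreground}
    best_conf = max(conf for _, conf in foreground)
    if len(labels) == 1 and best_conf >= threshold:
        # Exactly one foreground class is claimed, and claimed confidently.
        return labels.pop()
    # Conflicting or low-confidence claims: do not distill this pixel.
    return IGNORE


def consensus_map(model_maps, threshold=0.8):
    """Apply consensus_label to every pixel of K aligned H x W vote maps."""
    return [
        [consensus_label(votes, threshold) for votes in zip(*rows)]
        for rows in zip(*model_maps)
    ]


# Two hypothetical models on a 2 x 2 image, as (label, confidence) pairs.
model_a = [[(0, 0.90), (1, 0.95)], [(1, 0.50), (0, 0.90)]]
model_b = [[(0, 0.95), (0, 0.90)], [(0, 0.90), (16, 0.90)]]
pseudo_labels = consensus_map([model_a, model_b])
print(pseudo_labels)  # [[0, 1], [255, 16]]
```

Under this rule, a pixel that one model assigns to a foreground class with low confidence, or that two models assign to different foreground classes, receives the ignore label, so a distillation loss that skips label 255 would learn only from pixels on which the models agree.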