| Graduate Student: | 柯浩飛 Juan Felipe Giraldo Cardenas |
|---|---|
| Thesis Title: | 監督性變換器模型對變遷偵測應用的預訓練與微調策略 Supervised transformer-based models pre-training and fine-tuning strategies for change detection |
| Advisor: | 任玄 Hsuan Ren |
| Committee Members: | |
| Degree: | 碩士 Master |
| Department: | 太空及遙測研究中心 (Center for Space and Remote Sensing Research) - Master of Science Program in Remote Sensing Science and Technology |
| Year of Publication: | 2024 |
| Academic Year of Graduation: | 112 |
| Language: | English |
| Number of Pages: | 80 |
| Keywords (Chinese): | 變換器 (Transformer) |
| Keywords (English): | Transformer |
Image change detection is an important task in remote sensing, aiming to automatically detect changes between two or more images of the same scene taken at different times. However, most of the available datasets are small, which leads models to overfit. To address this challenge, we apply transfer learning strategies that carry the knowledge obtained on one dataset into a new training run (fine-tuning), so that the new model learns from both datasets and generalizes across them. State-of-the-art approaches rely on deep learning and, in particular, transformer architectures. This research investigates the effectiveness of transformer-based models, specifically the Bitemporal Image Transformer (BIT) and ChangeFormer, in detecting changes across different datasets using transfer learning. The study aims to exploit the ability of transformers to model global context in order to enhance change detection accuracy. By evaluating these models on the LEVIR-CD, WHU-CD, and DSIFN-CD datasets, we assess their adaptability and robustness in various scenarios. The metrics used to evaluate our pipelines are Overall Accuracy (OA), Intersection-over-Union (IoU), F1-score, Precision, and Recall. By transferring knowledge from one dataset to a model fine-tuned on another, these metrics improve, demonstrating that transformers and transfer learning pipelines can help address change detection tasks.
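The evaluation metrics listed above (OA, IoU, F1-score, Precision, Recall) can all be derived from the binary confusion matrix of a predicted change map against its ground truth. A minimal sketch in plain Python — the function name and interface are illustrative, not the thesis code:

```python
def change_detection_metrics(pred, target):
    """Compute OA, IoU, F1, Precision, and Recall for binary change maps.

    pred, target: flat sequences of 0 (no change) / 1 (change), same length.
    """
    # Accumulate the four confusion-matrix cells over all pixels.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0  # IoU of the "change" class
    oa = (tp + tn) / len(target)  # fraction of correctly labeled pixels

    return {"OA": oa, "IoU": iou, "F1": f1, "Precision": precision, "Recall": recall}
```

In the change detection setting, changed pixels are usually a small minority, so IoU and F1 of the change class are more informative than OA, which can be high even for a model that predicts "no change" everywhere.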