| Graduate Student: | YING-TZU CHEN (陳映慈) |
|---|---|
| Thesis Title: | Comparison of Novel Transformer-based Deep Learning Architectures for Hyperspectral Image Classification: A Case Study of CTMixer, MAEST and SSTN |
| Advisor: | HSUAN REN (任玄) |
| Committee Members: | |
| Degree: | Master |
| Department: | Center for Space and Remote Sensing Research, Master of Science Program in Remote Sensing Science and Technology |
| Year of Publication: | 2025 |
| Academic Year: | 113 |
| Language: | English |
| Pages: | 76 |
| Keywords: | Deep Learning, Hyperspectral Image, Transformer Architecture, Image Classification |
With the rapid development of remote sensing (RS) and deep learning (DL) technologies, methods for classifying and identifying surface materials have evolved quickly in both research and practical application. Hyperspectral images (HSI), which carry rich spectral and spatial information, are widely used in agricultural monitoring, geological exploration, and land-cover classification. However, their high dimensionality and the spectral similarity among land-cover classes pose significant challenges for classification. In recent years, Transformer-based deep learning models in particular have demonstrated high potential in HSI classification. Given that these methods are not only applied to different datasets but also adopt varying training parameters (such as training epochs and learning rates), a more comprehensive analysis and comparison of their classification performance and computational efficiency is necessary.

This study compares three recently proposed Transformer-based architectures: CTMixer, which integrates convolutional neural network (CNN) and Transformer structures; MAEST, which employs a masked-autoencoder design; and SSTN, which adopts a modified Swin Transformer. Using a unified experimental setup and two training parameter configurations (CFG1 and CFG2), classification experiments are conducted on three publicly available datasets: Indian Pines, Pavia University, and Houston 2013. Evaluation metrics include overall accuracy, average accuracy, the kappa coefficient, class-wise classification performance, and model inference time. In addition, classification maps are visualized to examine each model's ability to delineate spatial boundaries and maintain consistent spatial distributions.

The experimental results show that SSTN achieves the best classification accuracy and stability. CTMixer is more sensitive to parameter settings and is better suited to scenes with clear spatial structure. MAEST offers good inference efficiency but classifies less accurately in scenarios with high spectral similarity and sample imbalance. Visualization of the classification maps further reveals differences in boundary clarity and regional continuity among the models: SSTN effectively maintains spatial consistency, whereas MAEST tends to produce fragmented regions when classifying large, homogeneous areas. In summary, each architecture has its own strengths and suitable application scenarios, and the results can serve as a useful reference for model selection and architecture design in future HSI classification tasks.
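The evaluation metrics above (overall accuracy, average accuracy, and the kappa coefficient) are the standard trio in HSI classification benchmarks. As a minimal sketch of how they follow from a confusion matrix — the function name and the toy labels below are illustrative, not the thesis's actual evaluation code:

```python
import numpy as np

def hsi_metrics(y_true, y_pred, num_classes):
    """Compute OA, AA, and Cohen's kappa from true vs. predicted labels."""
    # Build the confusion matrix: rows = true class, cols = predicted class.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    # Overall accuracy (OA): fraction of all samples classified correctly.
    oa = np.trace(cm) / total
    # Average accuracy (AA): mean of per-class recalls, which weights every
    # class equally and so is more informative under class imbalance.
    per_class = np.diag(cm) / cm.sum(axis=1)
    aa = per_class.mean()
    # Kappa: observed agreement corrected for the chance agreement p_e
    # implied by the row (true) and column (predicted) marginals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# Toy example with 3 classes (labels are hypothetical, not thesis data).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0]
oa, aa, kappa = hsi_metrics(y_true, y_pred, 3)
```

Because OA can be inflated by dominant classes while AA and kappa penalize chance-level and imbalanced agreement, reporting all three (as this study does) gives a more complete picture of classifier behavior.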
[1] Alexander F. H. Goetz, Gregg Vane, Jerry E. Solomon and Barrett N. Rock, “Imaging spectrometry for earth remote sensing”, SCIENCE, Vol 228(4704), pp. 1147-1153, June 1985.
[2] Luca Giannoni, Frédéric Lange and Ilias Tachtsidis, “Hyperspectral imaging solutions for brain tissue metabolic and hemodynamic monitoring: past, current and future developments”, Journal of Optics, Vol 20(4), 2018
[3] Chanseok Ryu, Masahiko Suguri and Mikio Umeda, “Multivariate analysis of nitrogen content for rice at the heading stage using reflectance of airborne hyperspectral remote sensing”, Field Crops Research, Vol 122(3), pp.214-224, 2011.
[4] Naoto Yokoya, Jonathan Cheung-Wai Chan and Karl Segl, “Potential of Resolution-Enhanced Hyperspectral Data for Mineral Mapping Using Simulated EnMAP and Sentinel-2 Images”, Remote Sensing, Vol 8(3), 2016.
[5] Shutao Li, Weiwei Song, Leyuan Fang, Yushi Chen, Pedram Ghamisi and Jón Atli Benediktsson, “Deep learning for hyperspectral image classification: An overview”, IEEE transactions on geoscience and remote sensing, Vol 57(9), pp. 6690-6709, 2019.
[6] Vincent V. Salomonson, “Remote sensing, historical perspective”, Encyclopedia of Remote Sensing, pp. 684-691. Springer, New York, 2014.
[7] Alexander F. H. Goetz, “Three decades of hyperspectral remote sensing of the Earth: A personal view”, Remote sensing of environment, Vol 113, pp. S5-S16, 2009.
[8] Liangpei Zhang and Bo Du, “Recent advances in hyperspectral image processing”, Geo-Spatial Information Science, Vol 15(3), pp. 143-156, September 2012.
[9] Linmi Tao and Atif Mughees, Deep Learning for Hyperspectral Image Analysis and Classification, Springer, Singapore, 2021.
[10] Maider Vidal and José Manuel Amigo, “Pre-processing of hyperspectral images. Essential steps before image analysis”, Chemometrics and Intelligent Laboratory Systems, Vol 117, pp. 138-148, 2012.
[11] Craig Rodarmel and Jie Shan, “Principal Component Analysis for Hyperspectral Image Classification”, Surveying and Land Information Science, Vol 62(2), pp.115-122, 2002.
[12] Hongyan Zhang, Wei He, Liangpei Zhang, Huanfeng Shen and Qiangqiang Yuan, “Hyperspectral Image Restoration Using Low-Rank Matrix Recovery”, IEEE Transactions on Geoscience and Remote Sensing, Vol 52(8), pp. 4729-4743, 2013.
[13] Edwin Raczko and Bogdan Zagajewski, “Comparison of support vector machine, random forest and neural network classifiers for tree species classification on airborne hyperspectral APEX images”, European Journal of Remote Sensing, Vol 50(1), pp. 144-154, 2017.
[14] Rajee George, Hitendra Padalia and S.P.S. Kushwaha, “Forest tree species discrimination in western Himalaya using EO-1 Hyperion”, International Journal of Applied Earth Observation and Geoinformation, Vol 28, pp. 140-149, 2014.
[15] Leo Breiman, “Random Forests”, Machine learning, Vol 45, pp. 5-32, 2001.
[16] Daniel Doktor, Angela Lausch, Daniel Spengler and Martin Thurner, “Extraction of plant physiological status from hyperspectral signatures using machine learning methods”, Remote Sensing, Vol 6(12), pp. 12247-12274, 2014.
[17] Elhadi Adam, Onisimo Mutanga, Elfatih M. Abdel-Rahman and Riyad Ismail, “Estimating standing biomass in papyrus (Cyperus papyrus L.) swamp: Exploratory of in situ hyperspectral indices and random forest regression”, International Journal of Remote Sensing, Vol 35(2), pp. 693-714, 2014.
[18] Péter Burai, Balázs Deák, Orsolya Valkó and Tamás Tomor, “Classification of herbaceous vegetation using airborne hyperspectral imagery”, Remote Sensing, Vol 7(2), pp. 2046-2066, 2015.
[19] Lin He, Jun Li, Chenying Liu and Shutao Li, “Recent Advances on Spectral-Spatial Hyperspectral Image Classification: An Overview and New Guidelines”, IEEE Transactions on Geoscience and Remote Sensing, Vol 56(3), pp. 1579-1597, 2017.
[20] Muhammad Ahmad, Sidrah Shabbir, Swalpa Kumar Roy, Danfeng Hong, Xin Wu and Jing Yao, “Hyperspectral Image Classification—Traditional to Deep Models: A Survey for Future Prospects”, IEEE journal of selected topics in applied earth observations and remote sensing, Vol 15, pp. 968-999, 2021.
[21] Liangpei Zhang, Lefei Zhang and Bo Du, “Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art”, IEEE Geoscience and remote sensing magazine, Vol 4(2), pp. 22-40, 2016.
[22] Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, “Gradient-Based Learning Applied to Document Recognition”, Proceedings of the IEEE, Vol 86 (11), pp.2278-2324, 1998.
[23] Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, Advances in neural information processing systems, Vol 25, 2012.
[24] Yanfei Zhong, Xin Hu, Chang Luo, Xinyu Wang, Ji Zhao and Liangpei Zhang, “WHU-Hi: UAV-borne hyperspectral with high spatial resolution (H2) benchmark datasets and classifier for precise crop identification based on deep convolutional neural network with CRF”, Remote Sensing of Environment, Vol 250, 2020.
[25] David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, “Learning representations by back-propagating errors”, Nature, Vol 323, pp. 533-536, 1986.
[26] Jeffrey L. Elman, “Finding Structure in Time”, Cognitive science, Vol 14(2), pp. 179-211, 1990.
[27] Sepp Hochreiter and Jürgen Schmidhuber, “Long Short-Term Memory”, Neural computation, Vol 9(8), pp. 1735-1780, 1997.
[28] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk and Yoshua Bengio, “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation”, arXiv preprint arXiv:1406.1078, 2014. Available: https://arxiv.org/abs/1406.1078
[29] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho and Yoshua Bengio, “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling”, arXiv preprint arXiv:1412.3555, 2014. Available: https://arxiv.org/abs/1412.3555
[30] Shaohui Mei, Xingang Li, Xiao Liu, Huimin Cai and Qian Du, “Hyperspectral Image Classification Using Attention-Based Bidirectional Long Short-Term Memory Network”, IEEE Transactions on Geoscience and Remote Sensing, Vol 60, pp. 1-12, 2021.
[31] Weilian Zhou, Sei-Ichiro Kamata, Haipeng Wang and Xi Xue, “Multiscanning-Based RNN-Transformer for Hyperspectral Image Classification”, IEEE Transactions on Geoscience and Remote Sensing, Vol 61, pp. 1-19, 2023.
[32] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville and Yoshua Bengio, “Generative Adversarial Nets”, Advances in neural information processing systems, Vol 27, 2014.
[33] Martin Arjovsky, Soumith Chintala and Léon Bottou, “Wasserstein Generative Adversarial Networks”, 34th International Conference on Machine Learning, pp. 214-223, Sydney, Australia, August 2017.
[34] He Zhang, Vishwanath Sindagi and Vishal M. Patel, “Image De-Raining Using a Conditional Generative Adversarial Network”, IEEE Transactions on Circuits and Systems for Video Technology, Vol 30(11), pp. 3943-3956, 2020.
[35] Jie Feng, Haipeng Yu, Lin Wang, Xianghai Cao, Xiangrong Zhang and Licheng Jiao, “Classification of Hyperspectral Images Based on Multiclass Spatial-Spectral Generative Adversarial Networks”, IEEE Transactions on Geoscience and Remote Sensing, Vol 57(8), pp. 5329-5343, 2019.
[36] Vincent Dumoulin and Francesco Visin, “A guide to convolution arithmetic for deep learning” arXiv preprint arXiv:1603.07285, 2016. Available: https://arxiv.org/abs/1603.07285
[37] Wenjie Luo, Yujia Li, Raquel Urtasun, Richard Zemel, “Understanding the effective receptive field in deep convolutional neural networks”, Advances in neural information processing systems, Vol 29, 2016.
[38] Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do and Kaori Togashi, “Convolutional neural networks: an overview and application in radiology”, Insights into imaging, Vol 9, pp. 611-629, 2018.
[39] Junru Yin, Changsheng Qi, Qiqiang Chen and Jiantao Qu, “Spatial-Spectral Network for Hyperspectral Image Classification: A 3-D CNN and Bi-LSTM Framework”, Remote Sensing, Vol 13(12), 2021.
[40] Xu Kang, Bin Song and Fengyao Sun, “A deep similarity metric method based on incomplete data for traffic anomaly detection in IoT.” Applied Sciences, Vol 9(1), 135, 2019.
[41] Yushi Chen, Hanlu Jiang, Chunyang Li, Xiuping Jia and Pedram Ghamisi, “Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks”, IEEE Transactions on Geoscience and Remote Sensing, Vol 54(10), pp. 6232-6251, 2016.
[42] Zilong Zhong, Jonathan Li, Zhiming Luo and Michael Chapman, “Spectral-Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework”, IEEE Transactions on Geoscience and Remote Sensing, Vol 56(2), pp.847-858, 2018.
[43] Mingyi He, Bo Li and Huahui Chen, “Multi-scale 3D deep convolutional neural network for hyperspectral image classification”, 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, September 2018.
[44] Hao Sun, Xiangtao Zheng, Xiaoqiang Lu and Siyuan Wu, “Spectral-Spatial Attention Network for Hyperspectral Image Classification”, IEEE Transactions on Geoscience and Remote Sensing, Vol 58(5), pp. 3232-3245, 2020.
[45] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser and Illia Polosukhin, “Attention is all you need”, Advances in neural information processing systems (NIPS 2017), Long Beach, USA, December 2017.
[46] Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate”, arXiv preprint arXiv:1409.0473, 2014. Available: https://arxiv.org/abs/1409.0473
[47] Minh-Thang Luong, Hieu Pham and Christopher D. Manning, “Effective Approaches to Attention-based Neural Machine Translation”, arXiv preprint arXiv:1508.04025, 2015. Available: https://arxiv.org/abs/1508.04025
[48] Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan and Mubarak Shah, “Transformers in Vision: A Survey”, ACM computing surveys (CSUR), Vol 54(10s), pp. 1-41, 2022.
[49] Tianyang Lin, Yuxin Wang, Xiangyang Liu and Xipeng Qiu, “A survey of transformers”, AI open, Vol 3, pp. 111-132, 2022.
[50] Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang and Hao Ma, “Linformer: Self-Attention with Linear Complexity,” arXiv preprint arXiv:2006.04768, 2020. Available: https://arxiv.org/abs/2006.04768
[51] Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell and Adrian Weller, “Rethinking Attention with Performers”, arXiv preprint arXiv:2009.14794, 2020. Available: https://arxiv.org/abs/2009.14794
[52] Peter Shaw, Jakob Uszkoreit and Ashish Vaswani, “Self-Attention with Relative Position Representations”, arXiv preprint arXiv:1803.02155, 2018. Available: https://arxiv.org/abs/1803.02155
[53] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le and Ruslan Salakhutdinov, “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, arXiv preprint arXiv:1901.02860, 2019. Available: https://arxiv.org/abs/1901.02860
[54] Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Vol 1, pp. 4171-4186, June 2019.
[55] Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever, “Improving Language Understanding by Generative Pre-Training”, OpenAI Technical Report, 2018, Available: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
[56] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit and Neil Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale”, arXiv preprint arXiv:2010.11929, 2020. Available: https://arxiv.org/abs/2010.11929
[57] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin and Baining Guo, “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows”, Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012-10022, 2021.
[58] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov and Sergey Zagoruyko, “End-to-End Object Detection with Transformers”, European conference on computer vision, pp. 213-229, Glasgow, United Kingdom, August 2020.
[59] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo and Ling Shao, “Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions”, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 568-578, 2021.
[60] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles and Herve Jegou, “Training data-efficient image transformers & distillation through attention”, Proceedings of the 38th International Conference on Machine Learning, pp. 10347-10357, 2021.
[61] Danfeng Hong, Zhu Han, Jing Yao, Lianru Gao, Bing Zhang and Antonio Plaza, “SpectralFormer: Rethinking Hyperspectral Image Classification With Transformers”, IEEE Transactions on Geoscience and Remote Sensing, Vol 60, pp. 1-15, 2021.
[62] Ji He, Lina Zhao, Hongwei Yang, Mengmeng Zhang and Wei Li, “HSI-BERT: Hyperspectral Image Classification Using the Bidirectional Encoder Representation From Transformers”, IEEE Transactions on Geoscience and Remote Sensing, Vol 58(1), 165-178, 2019.
[63] Ping Zhang, Haiyang Yu, Pengao Li and Ruili Wang, “TransHSI: A Hybrid CNN-Transformer Method for Disjoint Sample-Based Hyperspectral Image Classification”, Remote Sensing, Vol 15(22), 2023.
[64] Xihong Guo, Quan Feng and Faxu Guo, “CMTNet: a hybrid CNN-transformer network for UAV-based hyperspectral crop classification in precision agriculture”, Scientific Reports, Vol 15(1), 2025.
[65] Chongxuan Tian, Yuzhuo Chen, Yelin Liu, Xin Wang, Qize Lv, Yunze Li, Jinlin Deng, Yifei Liu and Wei Li, “Accurate classification of glomerular diseases by hyperspectral imaging and transformer”, Computer Methods and Programs in Biomedicine, Vol 254, 2024.
[66] Junjie Zhang, Zhe Meng, Feng Zhao, Hanqiang Liu and Zhenhui Chang, “Convolution Transformer Mixer for Hyperspectral Image Classification”, IEEE Geoscience and Remote Sensing Letters, Vol 19, 2022.
[67] Damian Ibañez, Ruben Fernandez-Beltran, Filiberto Pla and Naoto Yokoya, “Masked Auto-Encoding Spectral-Spatial Transformer for Hyperspectral Image Classification”, IEEE Transactions on Geoscience and Remote Sensing, Vol 60, 2022.
[68] Baisen Liu, Yuanjia Liu, Wulin Zhang, Yiran Tian and Weili Kong, “Spectral swin transformer network for hyperspectral image classification”, Remote Sensing, Vol 15(15), 2023.
[69] Kaichao You, Mingsheng Long, Jianmin Wang and Michael I. Jordan, “How does learning rate decay help modern neural networks?”, arXiv preprint arXiv:1908.01878, 2019.
[70] Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep learning, Vol 1, MIT Press., Cambridge, 2016.
[71] Marion F. Baumgardner, Larry L. Biehl and David A. Landgrebe, “220 Band AVIRIS Hyperspectral Image Data Set: June 12, 1992 Indian Pine Test Site 3”, Purdue University Research Repository, 2015. Available: https://purr.purdue.edu/publications/1947/1
[72] Grupo de Inteligencia Computacional (GIC), “Hyperspectral Remote Sensing Scenes”, University of the Basque Country. Available:
https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes
[73] Christian Debes, Andreas Merentitis, Roel Heremans, Jürgen Hahn, Nikolaos Frangiadakis and Tim van Kasteren, “Hyperspectral and LiDAR Data Fusion: Outcome of the 2013 GRSS Data Fusion Contest”, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol 7, pp. 2405-2418, March 2014.