| Graduate Student: | Hsing-Wei Chang (常興唯) |
|---|---|
| Thesis Title: | Establishment and Evaluation of a Semantic Segmentation Dataset for Infrared and Visible Image Fusion (紅外線與可見光影像融合之語意分割資料集建立及其對融合效果的影響評估) |
| Advisor: | Po-Chyi Su (蘇柏齊) |
| Committee Members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of Publication: | 2023 |
| Graduation Academic Year: | 111 |
| Language: | Chinese |
| Number of Pages: | 56 |
| Keywords (Chinese): | 影像融合, 影像對齊, 語意分割, 深度學習, 風格轉換 |
| Keywords (English): | Image fusion, Image alignment, Semantic segmentation, Deep learning, Style transfer |
The purpose of image fusion is to integrate different types of input images, exploiting their complementary information to generate an image with a more complete scene representation and better visual perception, thereby supporting advanced vision tasks such as object detection and semantic segmentation. Infrared and visible image fusion is a widely studied research area, but training fusion models with deep learning methods typically requires a large amount of annotated data. Existing infrared and visible image fusion datasets provide only the images, without precise object annotations or semantic segmentation labels, which degrades the presentation of fusion results and limits further development of the field. In this study, we propose a method for creating an infrared and visible image fusion dataset with semantic segmentation information. We take ordinary images from existing semantic segmentation datasets and generate corresponding infrared images via style transfer, thereby building a labeled fusion dataset in which each pair of infrared and visible images carries its own semantic segmentation labels. This dataset construction approach improves image fusion performance. It also addresses the differing resolutions and content misalignment that often occur between infrared and visible images captured in the real world: an alignment method based on semantic segmentation masks resamples the infrared and visible images into registration, saving significant time and labor in the alignment preprocessing step common to this line of research.
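The thesis record gives no implementation details of the mask-based alignment step, so the following is only a minimal illustrative sketch of the idea, not the author's method: it estimates a per-axis scale and translation from the bounding boxes of a shared segmentation mask in each modality, then resamples the infrared image onto the visible-image grid with nearest-neighbour lookup. All function names here are invented for illustration.

```python
import numpy as np

def mask_bbox(mask):
    """Return (row_min, row_max, col_min, col_max) of a binary mask."""
    rows = np.where(np.any(mask, axis=1))[0]
    cols = np.where(np.any(mask, axis=0))[0]
    return rows[0], rows[-1], cols[0], cols[-1]

def align_ir_to_visible(ir, ir_mask, vis_mask, out_shape):
    """Resample the infrared image onto the visible-image grid so that
    the segmentation masks of a shared object coincide.

    A hypothetical stand-in for segmentation-mask-based alignment:
    per-axis scale/translation are estimated from the two mask bounding
    boxes, and the inverse mapping is applied with nearest-neighbour
    sampling. Real IR/visible pairs would need a more robust transform.
    """
    ir_r0, ir_r1, ir_c0, ir_c1 = mask_bbox(ir_mask)
    vr0, vr1, vc0, vc1 = mask_bbox(vis_mask)
    # Scale factors mapping visible-grid coordinates back to IR coordinates.
    sy = (ir_r1 - ir_r0) / max(vr1 - vr0, 1)
    sx = (ir_c1 - ir_c0) / max(vc1 - vc0, 1)
    H, W = out_shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Inverse map each output pixel to its source pixel in the IR image.
    src_y = np.clip(np.round((yy - vr0) * sy + ir_r0).astype(int), 0, ir.shape[0] - 1)
    src_x = np.clip(np.round((xx - vc0) * sx + ir_c0).astype(int), 0, ir.shape[1] - 1)
    return ir[src_y, src_x]

# Synthetic example: a low-resolution IR image whose object mask must be
# scaled up and shifted to match the visible image's segmentation mask.
ir = np.zeros((40, 40))
ir[10:20, 10:20] = 1.0          # "hot" object in the IR image
ir_mask = ir > 0
vis_mask = np.zeros((80, 80), dtype=bool)
vis_mask[20:40, 20:40] = True   # same object in the visible image's mask
aligned = align_ir_to_visible(ir, ir_mask, vis_mask, (80, 80))
```

After resampling, the bright region of `aligned` overlaps the visible mask, so the pair can be cropped or fused on a common grid without manual registration.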