| Author: | 葉凌瑋 Ling-Wei Yeh |
|---|---|
| Thesis title: | 運用正負影像進行監督式訓練以實現紅外光與可見光之畫面亮度自適應融合 (Using Positive and Negative Images for Supervised Training to Achieve Luminance-Adaptive Fusion of Infrared and Visible Light Images) |
| Advisor: | 蘇柏齊 Po-Chyi Su |
| Committee members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering & Computer Science, Department of Computer Science & Information Engineering |
| Year of publication: | 2023 |
| Graduating academic year: | 111 |
| Language: | Chinese |
| Pages: | 60 |
| Chinese keywords: | 影像融合、深度學習、卷積神經網路、監督式學習、亮度自適應 |
| English keywords: | Image Fusion, Deep Learning, Convolutional Neural Network, Supervised Learning, Luminance-Adaptive |
The goal of infrared and visible image fusion is to preserve, in a single frame, the information from different spectral images of the same scene. However, a large brightness difference between the two inputs can cause their contents to interfere with each other, degrading the presentation of complementary information in the fused image. Existing image fusion methods usually perform better on lower-brightness images, but when one input contains high-brightness content, we find that the texture contrast of the fused image often decreases. To avoid the unsatisfactory fusion caused by extremely bright and extremely dark images, we propose a new training scheme for deep learning models that uses positive and negative images for self-supervised training. Since edges in the scene are the focus of the fused image, we compute image gradients to extract texture, preserving the details of the source images as references for supervised learning. The textures of different regions in the frame help generate the guide maps used to train the fusion network, and we additionally apply edge enhancement as the reference for the gradients of the fused image, reducing the impact of extreme brightness on scene details. We introduce a channel attention module to strengthen or weaken individual channels of the feature maps and to speed up model training. The supervised training measures the similarity between the positive/negative fused images and the guide maps, the similarity between the inverted negative fused image and the positive fused image, and the similarity of the fused-image gradients, so as to retain as much scene detail as possible and achieve luminance-adaptive image fusion. Experimental results show that the proposed method achieves good performance.
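The abstract describes computing image gradients to extract texture and using the textures of different regions to build a guide map. A minimal sketch of that idea, assuming grayscale inputs in [0, 1] and simple forward differences standing in for whatever gradient operator the thesis actually uses (the function names `gradient_magnitude` and `texture_guide` are illustrative, not from the thesis):

```python
import numpy as np

def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    """Per-pixel gradient magnitude via forward differences.

    `img` is a 2-D grayscale array in [0, 1]. The thesis computes image
    gradients to extract texture; the exact operator is not specified
    here, so plain finite differences stand in for it.
    """
    gx = np.zeros_like(img, dtype=np.float64)
    gy = np.zeros_like(img, dtype=np.float64)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # horizontal difference
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # vertical difference
    return np.hypot(gx, gy)

def texture_guide(ir: np.ndarray, vis: np.ndarray) -> np.ndarray:
    """Binary guide map selecting the source with stronger local texture.

    1.0 where the visible image has the larger gradient magnitude,
    0.0 where the infrared image does; a map of this kind can steer
    the fusion loss toward the more detailed source in each region.
    """
    return (gradient_magnitude(vis) >= gradient_magnitude(ir)).astype(np.float64)
```

A soft (weighted) guide map could replace the hard comparison; the binary form is used here only to keep the sketch short.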
The main task of fusing infrared and visible light images is to preserve the spectral information of the same scene in a single frame. However, a large brightness difference between the two input images can lead to content interference and degrade the presentation of complementary information in the fused image. Existing image fusion methods often perform well on lower-brightness images, but when one of the inputs contains high-brightness content, we observe a decrease in texture contrast in the fused image. To overcome the poor fusion results caused by extremely bright and extremely dark images, we propose a new training method that applies self-supervised learning with positive and negative images to a deep neural network. We extract image gradients to generate ground truth that preserves the details of the original images for reference during supervised learning. Additionally, we use edge enhancement as the ground truth for the gradients of the fused image, mitigating the adverse effects of extreme brightness on the preservation of fused details. We also introduce a channel attention module to strengthen or weaken individual channels of the feature maps. The training process measures the similarity between the positive/negative fused images and the designed ground truth, as well as the similarity between the inverted negative fused image and the positive fused image. This encourages the network to preserve detailed features and achieve luminance-adaptive image fusion. Experimental results demonstrate the effectiveness of the proposed method, confirming that the generated ground truth can guide the preservation of information in the fusion of infrared and visible light images.
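The training objective described above combines three similarity terms: positive fusion vs. the guide-map ground truth, the inverted negative fusion vs. the positive fusion, and fused gradients vs. an edge-enhanced reference. The following is a hedged sketch of that structure, not the thesis's actual loss: the weights `w`, the L1 similarity, and the element-wise maximum of source gradients used as the edge-enhanced reference are all assumptions made for illustration.

```python
import numpy as np

def invert(img: np.ndarray) -> np.ndarray:
    """Negative counterpart of a [0, 1] image: intensity inversion."""
    return 1.0 - img

def grad_mag(img: np.ndarray) -> np.ndarray:
    """Forward-difference gradient magnitude (stand-in for the thesis operator)."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return np.hypot(gx, gy)

def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference, used here as the similarity measure."""
    return float(np.mean(np.abs(a - b)))

def training_loss(fuse, ir, vis, guide, w=(1.0, 1.0, 1.0)) -> float:
    """Sketch of the three-term objective from the abstract.

    `fuse` is any fusion function mapping (ir, vis) -> fused image.
    Term 1: positive fusion vs. a guide-map supervision target.
    Term 2: inverted negative fusion vs. positive fusion (consistency).
    Term 3: fused gradients vs. an edge-enhanced reference (assumed here
    to be the element-wise maximum of the source gradient magnitudes).
    """
    pos = fuse(ir, vis)                         # fusion of positive inputs
    neg = fuse(invert(ir), invert(vis))         # fusion of negative inputs
    target = guide * vis + (1.0 - guide) * ir   # guide-map supervision target
    edge_ref = np.maximum(grad_mag(ir), grad_mag(vis))
    return (w[0] * l1(pos, target)
            + w[1] * l1(invert(neg), pos)
            + w[2] * l1(grad_mag(pos), edge_ref))
```

With a trivial averaging "network" such as `fuse = lambda a, b: 0.5 * (a + b)`, the second term vanishes, since averaging commutes with inversion; a learned network would generally incur a nonzero consistency penalty, which is what drives the luminance-adaptive behavior.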