| Graduate Student: | 許位祥 Wei-Hsiang Hsu |
|---|---|
| Thesis Title: | 基於尺度遞迴網路的生成對抗網路之影像去模糊 (Scale-recurrent Network Based Generative Adversarial Network for Image Deblurring) |
| Advisor: | 唐之瑋 Chih-Wei Tang |
| Oral Defense Committee: | |
| Degree: | Master |
| Department: | Department of Communication Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2021 |
| Academic Year: | 109 |
| Language: | Chinese |
| Pages: | 68 |
| Keywords: | single image deblurring, generative adversarial network, scale-recurrent network, pseudo label |
Camera shake or moving objects during capture easily introduce motion blur into the captured image, which severely degrades the viewing experience and lowers the performance of tasks such as visual tracking and object detection. Existing deep-learning-based approaches usually trade a high amount of network parameters or memory for high-quality deblurred images. Among existing schemes, SRN^+ is a deep-learning-based single image deblurring network with a low amount of network parameters and good performance. Therefore, this thesis adopts the SRN^+ architecture as the generator and, at the training stage, adds a discriminator assisted by pseudo labels to improve the quality of the deblurred images from the generator. Different from a standard GAN (generative adversarial network), the proposed generative adversarial network with pseudo labels provides the discriminator with both the deblurred image and the corresponding sharp image, so the discriminator yields a more accurate loss to guide the optimization of the generator and improves the restoration of image details. Funnel soft labelling replaces binary labels to reduce the learning ability of the discriminator, so that the generator is less likely to suffer from vanishing gradients and the training of the generative adversarial network is stabilized. In addition, this thesis assigns different weights to the loss functions of different scales, giving a larger weight to the loss of the deblurred image at the large-scale stage, and the loss function of the largest scale adopts the mean squared error (MSE) instead of the mean absolute error (MAE) to make the deblurred image sharper. At the test stage, only the generator is needed to output the deblurred image, so the amount of network parameters and the computational complexity of the proposed scheme are the same as those of SRN^+. On the GoPro dataset, the proposed scheme outperforms SRN^+ by 0.51 dB in peak signal-to-noise ratio (PSNR) and by 0.005 in structural similarity index measure (SSIM). Compared with the lightest version (i.e., 1-stage) of the state-of-the-art deblurring network MPRNet, the proposed scheme is more than 1 dB higher in PSNR while using only 7/10 of the network parameters of MPRNet (1-stage).
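The abstract describes funnel soft labelling only at a high level. The sketch below is one hypothetical interpretation, in which the discriminator's targets are sampled from a band around the hard labels 1 (real) and 0 (fake), and the band narrows linearly over training; the function name, the linear schedule, and the 0.3 starting width are illustrative assumptions, not the thesis's actual design:

```python
import random

def funnel_soft_label(is_real, epoch, total_epochs, start_width=0.3):
    """Hypothetical funnel soft labelling (illustrative sketch).

    Instead of the hard targets 1 (real) and 0 (fake), the discriminator
    is trained against labels sampled uniformly from a band around the
    hard target.  The band starts start_width wide and narrows linearly
    to zero over training, tracing a funnel shape.
    """
    width = start_width * (1.0 - epoch / total_epochs)
    if is_real:
        return random.uniform(1.0 - width, 1.0)  # e.g. [0.7, 1.0] at epoch 0
    return random.uniform(0.0, width)            # e.g. [0.0, 0.3] at epoch 0
```

Early in training the discriminator sees noisy targets (a "real" label can land anywhere in [0.7, 1.0]), which caps how confident it can become and keeps useful gradients flowing to the generator; by the final epoch the labels collapse back to the hard 1/0 targets.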
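The per-scale loss weighting can be sketched as follows. This is a minimal illustration operating on flat lists of pixel values; the particular weight values are assumptions, since the abstract only states that larger scales receive larger weights and that the largest scale uses MSE while the others use MAE:

```python
def multiscale_loss(preds, targets, weights):
    """Weighted multi-scale content loss (illustrative sketch).

    preds / targets: lists of images, one per scale, ordered from the
    smallest scale to the largest; each image is a flat list of floats.
    weights: per-scale loss weights (larger for larger scales).
    The largest scale uses mean squared error (MSE); every other scale
    uses mean absolute error (MAE).
    """
    total = 0.0
    last = len(preds) - 1  # index of the largest (full-resolution) scale
    for i, (p, t, w) in enumerate(zip(preds, targets, weights)):
        diffs = [pv - tv for pv, tv in zip(p, t)]
        if i == last:
            scale_loss = sum(d * d for d in diffs) / len(diffs)   # MSE
        else:
            scale_loss = sum(abs(d) for d in diffs) / len(diffs)  # MAE
        total += w * scale_loss
    return total
```

For example, with two scales and weights [0.5, 1.0], a small-scale MAE of 1.5 and a full-scale MSE of 1.0 combine to 0.5 * 1.5 + 1.0 * 1.0 = 1.75.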
[1] P. Hsu and B. Y. Chen, “Blurred image detection and classification,” in Proc. International Conference on Multimedia Modeling, pp. 277-286, Jan. 2008.
[2] S. Lee and S. Cho, “Recent advances in image deblurring,” in Proc. SIGGRAPH Asia 2013 Courses, pp. 1-108, Nov. 2013.
[3] E. O. Brigham and R. E. Morrow, “The fast Fourier transform,” IEEE Spectrum, Vol. 4, No. 12, pp. 63-70, Dec. 1967.
[4] L. Lucy, “An iterative technique for the rectification of observed distributions,” Astronomical Journal, Vol. 79, pp. 745-754, 1974.
[5] J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 769-777, June 2015.
[6] A. Chakrabarti, “A neural approach to blind motion deblurring,” in Proc. European Conference on Computer Vision, pp. 221-235, Oct. 2016.
[7] X. Tao, H. Gao, X. Shen, J. Wang, and J. Jia, “Scale-recurrent network for deep image deblurring,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 8174-8182, June 2018.
[8] X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, “Convolutional LSTM network: a machine learning approach for precipitation nowcasting,” in Proc. 28th International Conference on Neural Information Processing Systems, Vol. 1, pp. 802-810, Dec. 2015.
[9] H. Gao, X. Tao, X. Shen, and J. Jia, “Dynamic scene deblurring with parameter selective sharing and nested skip connections,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3848-3856, June 2019.
[10] O. Kupyn, T. Martyniuk, J. Wu, and Z. Wang, “DeblurGAN-v2: deblurring (orders-of-magnitude) faster and better,” in Proc. IEEE International Conference on Computer Vision, pp. 8878-8887, Aug. 2019.
[11] T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117-2125, July 2017.
[12] J. U. Yun, B. Jo, and I. K. Park, “Joint face super-resolution and deblurring using generative adversarial network,” IEEE Access, Vol. 8, pp. 159661-159671, Aug. 2020.
[13] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M. H. Yang, and L. Shao, “Multi-stage progressive image restoration,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2021.
[14] S. Nah, T. H. Kim, and K. M. Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3883-3891, July 2017.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, Vol. 60, pp. 84-90, June 2017.
[16] M. Suin, K. Purohit, and A. N. Rajagopalan, “Spatially-attentive patch-hierarchical network for adaptive motion deblurring,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3606-3615, June 2020.
[17] H. Zhang, Y. Dai, H. Li, and P. Koniusz, “Deep stacked hierarchical multi-patch network for image deblurring,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5978-5986, June 2019.
[18] K. Purohit and A. N. Rajagopalan, “Region-adaptive dense network for efficient motion deblurring,” in Proc. AAAI Conference on Artificial Intelligence, Vol. 34, No. 07, pp. 11882-11889, Apr. 2020.
[19] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700-4708, July 2017.
[20] Y. Yuan, W. Su, and D. Ma, “Efficient dynamic scene deblurring using spatially variant deconvolution network with optical flow guided training,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3555-3564, June 2020.
[21] F. J. Tsai, Y. T. Peng, Y. Y. Lin, C. C. Tsai, and C. W. Lin, “BANet: Blur-aware attention networks for dynamic scene deblurring,” arXiv preprint arXiv:2101.07518, Jan. 2021.
[22] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Neural Information Processing Systems, pp. 2672-2680, Dec. 2014.
[23] H. Thanh-Tung and T. Tran, “Catastrophic forgetting and mode collapse in GANs,” in Proc. International Joint Conference on Neural Networks (IJCNN), pp. 1-10, July 2020.
[24] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. International Conference on Machine Learning (ICML), pp. 214-223, July 2017.
[25] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of Wasserstein GANs,” in Proc. International Conference on Neural Information Processing Systems (NIPS), pp. 5769-5779, Dec. 2017.
[26] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in Proc. International Conference on Learning Representations, Feb. 2018.
[27] J. Heinonen, “Lectures on Lipschitz analysis,” Lecture notes, University of Jyväskylä, 2005.
[28] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in Proc. IEEE International Conference on Computer Vision (ICCV), pp. 2794-2802, Oct. 2017.
[29] S. Ramakrishnan, S. Pachori, A. Gangopadhyay, and S. Raman, “Deep generative filter for motion deblurring,” in Proc. IEEE International Conference on Computer Vision Workshops, pp. 2993-3000, Sep. 2017.
[30] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, Nov. 2014.
[31] S. Zheng, Z. Zhu, J. Cheng, Y. Guo, and Y. Zhao, “Edge heuristic GAN for non-uniform blind deblurring,” IEEE Signal Processing Letters, pp. 1546-1550, July 2019.
[32] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Proc. European Conference on Computer Vision, pp. 694-711, March 2016.
[33] J. Pan, D. Sun, H. Pfister, and M. H. Yang, “Blind image deblurring using dark channel prior,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1628-1636, June 2016.
[34] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. International Conference on Learning Representations (ICLR), pp. 1-14, May 2015.
[35] H. Tomosada, T. Kudo, T. Fujisawa, and M. Ikehara, “GAN-based image deblurring using DCT discriminator,” in Proc. 25th IEEE International Conference on Pattern Recognition, pp. 3675-3681, Jan. 2021.
[36] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in Proc. 31st AAAI Conference on Artificial Intelligence, pp. 4278-4284, Feb. 2017.
[37] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. C. Chen, “MobileNetV2: inverted residuals and linear bottlenecks,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520, June 2018.
[38] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251-1258, July 2017.
[39] J. Rim, H. Lee, J. Won, and S. Cho, “Real-world blur dataset for learning and benchmarking deblurring algorithms,” in Proc. European Conference on Computer Vision, pp. 184-201, Aug. 2020.
[40] A. Hore and D. Ziou, “Image quality metrics: PSNR vs. SSIM,” in Proc. International Conference on Pattern Recognition, pp. 2366-2369, Aug. 2010.
[41] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, Vol. 13, No. 4, pp. 600-612, April 2004.