| Student: | 邱亦成 Yi-Cheng Chiu |
|---|---|
| Thesis title: | Global and Local Context and Coarse-to-Fine Semantic Segmentation (基於深度學習之結合全局及局部資訊和修復分割細節的語義分割方法) |
| Advisor: | 施國琛 Timothy K. Shih |
| Committee members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering & Computer Science - Department of Computer Science & Information Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | English |
| Pages: | 71 |
| Keywords (Chinese): | 深度學習, 語義分割, 卷積神經網路 |
| Keywords (English): | deep learning, semantic segmentation, convolutional neural network |
| Access count: | Views: 18, Downloads: 0 |
The problem of image semantic segmentation is a very popular topic in computer vision and artificial intelligence. Producing training datasets for image segmentation is very time-consuming and labor-intensive, so training models that yield high-accuracy segmentation results and thereby reduce the cost of data production is also a goal of this thesis. In recent research on deep-learning-based semantic segmentation, in order to run in real time on the road and to fit within the memory limits of GPU cards, downsampling operations are usually applied, causing a loss of detail in the scene. In this thesis we examine the methods proposed by several well-known semantic segmentation architectures, from autoencoders to attention models, and analyze their contributions, strengths, and weaknesses. In addition, we modify their network architectures and propose JCF, an architecture composed of two modules, one of which extracts detail information from the high-resolution image; through final channel weights, the two feature maps are combined to refine the original segmentation result.

Our final proposed architecture, GLNet, combines global attention information with local multi-scale context information, helping the model understand the relationships among objects across various scenes and reducing classification errors. Through a channel-weighting module, it incorporates information from earlier layers of the convolutional neural network to repair the boundaries and details of segmented objects. Compared with several current well-known methods, our proposed architecture achieves improvements.
Image semantic segmentation is a prominent problem in computer vision and artificial intelligence. Ground truth for image segmentation is hard to produce, being both time- and resource-intensive, so producing high-precision segmentation results that reduce the cost of ground-truth annotation is also a goal of this thesis. In recent research on deep-learning-based semantic segmentation, in order to run in real time and fit within the memory limits of GPU cards, image resolution is commonly reduced through downsampling operations, resulting in a loss of detail in the scene. In this thesis, we explore well-known semantic segmentation architectures, from autoencoders to attention models, and analyze their contributions, advantages, and disadvantages. We also modify these network architectures and propose JCF, an architecture consisting of two modules: one module obtains detailed information from high-resolution images and combines the two feature maps via channel weights to refine the segmentation result from coarse to fine.

Our final proposed architecture, GLNet, combines global attention information with local multi-scale context information, helping the model understand the relationships between objects across various scenes and reducing misclassification, and it repairs the boundaries and details of segmented objects through channel attention modules that reintroduce information from earlier convolutional layers. In our experiments, the proposed architecture shows improvements over several state-of-the-art methods.
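The channel-weighted fusion of a coarse feature map with a high-resolution detail map can be sketched roughly as below. This is a minimal NumPy illustration assuming SE-style gating (global average pooling followed by a sigmoid), not the thesis's exact JCF module; the function name `channel_attention_fuse` is hypothetical.

```python
import numpy as np

def channel_attention_fuse(coarse, detail):
    """Fuse a coarse segmentation feature map with a high-resolution
    detail feature map (both shaped (C, H, W)) using SE-style channel
    weights: globally average-pool each channel, squash the descriptor
    with a sigmoid, rescale the channels, then sum the two branches."""
    stacked = np.concatenate([coarse, detail], axis=0)   # (2C, H, W)
    descriptor = stacked.mean(axis=(1, 2))               # one value per channel
    gate = 1.0 / (1.0 + np.exp(-descriptor))             # sigmoid gate in (0, 1)
    weighted = stacked * gate[:, None, None]             # rescale each channel
    C = coarse.shape[0]
    return weighted[:C] + weighted[C:]                   # fused map, (C, H, W)
```

In an actual network the gate would come from small learned layers rather than the raw pooled descriptor; the sketch only shows the data flow of channel-weighted fusion.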
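The local multi-scale context aggregation described above can be sketched in the spirit of PSPNet-style pyramid pooling; the following NumPy example is an illustrative assumption, not GLNet's exact module, and `pyramid_context` is a hypothetical name.

```python
import numpy as np

def pyramid_context(feat, bins=(1, 2, 4)):
    """Pyramid-pooling sketch: average-pool a (C, H, W) feature map
    into each b x b grid of bins, nearest-upsample back to (H, W),
    and concatenate the branches along the channel axis.
    Assumes H and W are divisible by every bin size."""
    C, H, W = feat.shape
    outs = [feat]
    for b in bins:
        # average over each of the b x b spatial regions -> (C, b, b)
        pooled = feat.reshape(C, b, H // b, b, W // b).mean(axis=(2, 4))
        # nearest-neighbour upsample back to (C, H, W)
        up = pooled.repeat(H // b, axis=1).repeat(W // b, axis=2)
        outs.append(up)
    return np.concatenate(outs, axis=0)  # (C * (1 + len(bins)), H, W)
```

Each coarser branch summarizes a wider spatial context, so the concatenated result gives every pixel both its local features and multi-scale scene statistics.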