Graduate Student: Phanuvich Hirunsirisombut
Thesis Title: Attention Based Semantic Segmentation for Object Localization
Advisor: Prof. Timothy K. Shih
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering & Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 53
Keywords: Semantic Segmentation, Deep Learning, Dilated Convolution, Attention Network

    Many recent studies solve computer vision problems with deep learning algorithms, and semantic segmentation is among the most popular of these problems: its goal is to label every pixel in an image with the category it belongs to. A well-known approach, U-Net, was introduced in 2015 for biomedical image segmentation. Unfortunately, U-Net suffers from small receptive fields, which degrades its results, and it segments small objects poorly. Moreover, previous work that used self-attention to enhance semantic segmentation exposed a problem with the Rectified Linear Unit (ReLU): that activation function maps every negative value to zero.
    To address these problems, this thesis proposes a dilated-attention-based semantic segmentation method for object localization. First, a standard U-Net serves as the main network for extracting features from the input. An attention module is then placed on each skip connection of the U-Net to prevent relevant information from being lost as the network grows deeper. Each attention module uses atrous (dilated) convolution instead of ordinary convolution, enlarging its receptive field so that features collected in coarse layers can be passed to fine layers. In real scenarios, objects such as cars may sit very close to one another; segmenting two or more overlapping objects then produces a problem known as "merging regions." To solve it, the watershed transform is applied as a post-processing step that separates the objects. Experimental results show that, measured by the Dice similarity coefficient (DSC), the proposed method outperforms the baseline model combined with several well-known loss functions for the semantic segmentation task.
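    The receptive-field enlargement that motivates replacing ordinary convolution with atrous convolution can be illustrated with a minimal 1-D sketch. This is plain NumPy for illustration only, not code from the thesis (whose attention modules use 2-D convolutions): a kernel with k taps and dilation d spans (k − 1)·d + 1 input positions.

    ```python
    import numpy as np

    def dilated_conv1d(x, kernel, dilation=1):
        """Valid-mode 1-D convolution whose kernel taps are `dilation` apart."""
        x = np.asarray(x, dtype=float)
        k = len(kernel)
        span = (k - 1) * dilation + 1      # effective receptive field per output
        out_len = len(x) - span + 1
        return np.array([
            sum(kernel[j] * x[i + j * dilation] for j in range(k))
            for i in range(out_len)
        ])

    x = np.arange(10.0)
    # Same 3-tap kernel, but dilation 2 covers a 5-wide input window per output.
    print(dilated_conv1d(x, [1, 1, 1], dilation=1))  # windows of width 3
    print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # windows of width 5
    ```

    With the same number of parameters, the dilated kernel aggregates context from a wider window, which is why the attention modules above can pass coarse-layer context to fine layers without extra weights.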

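    The evaluation metric named in the abstract, the Dice similarity coefficient, is a standard overlap measure, 2|A∩B| / (|A| + |B|). A minimal NumPy sketch for binary masks (again illustrative, not the thesis implementation) is:

    ```python
    import numpy as np

    def dice_score(pred, target, eps=1e-7):
        """Dice similarity coefficient between two binary masks."""
        pred = np.asarray(pred, dtype=bool)
        target = np.asarray(target, dtype=bool)
        intersection = np.logical_and(pred, target).sum()
        # eps keeps the score defined when both masks are empty.
        return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

    print(dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]]))  # ≈ 0.667
    ```

    A perfect prediction scores 1 and disjoint masks score near 0, so higher DSC in the experiments indicates better pixel-level agreement with the ground truth.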
    Table of Contents
    Abstract
    Abstract (Chinese)
    Acknowledgement
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Background
        1.2 Dissertation Organization
    Chapter 2 Related Works
        2.1 Segmentation Task
        2.2 Attention Networks
        2.3 Differential of Activation Functions
        2.4 Watershed Transform for Image Segmentation
    Chapter 3 Methodology
        3.1 Proposed Model
        3.2 Attention Module for Semantic Segmentation
        3.3 Watershed Transform for Post-Processing
    Chapter 4 Experimental Result
        4.1 Dataset and Annotation
        4.2 Experimental Setup
        4.3 Comparison to Baseline Model
    Chapter 5 Discussion
    Chapter 6 Conclusion and Future Works
        6.1 Conclusion
        6.2 Future Works
    Reference

