Indoor Scene Image Classification System combining Semantic Segmentation Features and Attention Module

Graduate student:	黃健銘 Jian-Ming Huang
Thesis title:	結合語義分割特徵與注意力模型之室內場景分類系統 Indoor Scene Image Classification System combining Semantic Segmentation Features and Attention Module
Advisor:	鄭旭詠
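Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication:	2020
Academic year of graduation:	108 (ROC calendar, 2019-2020)
Language:	English
Number of pages:	70
Keywords (translated from Chinese):	scene recognition, semantic segmentation, attention model, feature fusion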

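Scene recognition is an important part of computer vision. Machine learning methods now far outperform traditional processing approaches. However, classifying directly with a neural network often loses information about the relationships among objects, spatial layout, and background, which leads to poor classification results. Extracting this relational information, and finding an effective way to combine it and the resulting features with the original image for classification, is therefore an important challenge in scene classification today.
The method proposed in this thesis performs semantic segmentation on an image, uses neural network models to extract features from the semantic segmentation image and the original image separately, fuses the semantic segmentation features with the original-image features through an attention model, and finally performs classification and recognition.
Experimental results show that the method achieves the best accuracy on the hotel indoor scene dataset we collected. On the public 15-Scene dataset, it achieves better classification accuracy than the methods from other papers that were compared. Using semantic segmentation thus captures information about the relationships among objects, spatial layout, and background, and fusing features with an attention model yields better results in scene recognition.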

Scene recognition is an important part of computer vision, and current machine learning methods perform much better than traditional processing methods. However, using a neural network directly for classification often loses information about objects, spatial layout, and background, which results in poor classification. Capturing this information and merging the resulting features effectively to classify scenes is therefore an important challenge in scene classification.
The method proposed in this thesis performs semantic segmentation on the image, then uses neural network models to extract features from the semantic segmentation image and the original image separately. An attention module fuses the semantic segmentation features with the original-image features, and the image is classified from the fused features.
The experimental results show that our method achieves the best result on the Hotel Indoor Scene dataset. Furthermore, on the public 15-Scene dataset, our method outperforms the existing methods it was compared with. Semantic segmentation therefore captures information about objects, spatial layout, and background, and using the attention module for feature fusion achieves better accuracy in scene recognition.
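The two-branch pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the form of the attention module, so the sketch assumes a simple sigmoid gate computed from the segmentation feature and applied channel-wise to the original-image feature. The feature vectors stand in for the outputs of the two feature-extraction networks, and all names (`attention_fuse`, `classify`, `random_params`), dimensions, and random weights are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(image_feat, seg_feat, gate_w, gate_b):
    """Assumed attention form: the segmentation feature produces a gate
    in (0, 1) for each channel of the original-image feature."""
    gate = [sigmoid(z) for z in linear(gate_w, gate_b, seg_feat)]
    return [g * f for g, f in zip(gate, image_feat)]


def classify(image_feat, seg_feat, params):
    """Fuse the two branch features, then apply a linear softmax classifier."""
    fused = attention_fuse(image_feat, seg_feat,
                           params["gate_w"], params["gate_b"])
    return softmax(linear(params["cls_w"], params["cls_b"], fused))


def random_params(image_dim, seg_dim, num_classes, seed=0):
    """Hypothetical untrained weights, only to make the sketch runnable."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "gate_w": mat(image_dim, seg_dim),
        "gate_b": [0.0] * image_dim,
        "cls_w": mat(num_classes, image_dim),
        "cls_b": [0.0] * num_classes,
    }


# Toy usage: 4-d image feature, 3-d segmentation feature, 15 scene classes.
params = random_params(image_dim=4, seg_dim=3, num_classes=15)
probs = classify([0.2, 1.3, 0.0, 0.7], [1.0, 0.0, 0.5], params)
print(len(probs), max(probs))
```

In this sketch the segmentation branch never contributes features to the classifier directly; it only re-weights the original-image channels, which is one common way to let semantic context emphasize or suppress appearance features. Other fusion forms, such as concatenation or spatial attention maps, would fit the abstract's description equally well.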

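Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1 Research Background and Motivation
2 Thesis Organization
Chapter 2 Related Work
1 Image Semantic Segmentation
1.1 UPerNet
1.2 Mask R-CNN
2 Scene Object Extraction
3 Feature Extraction Neural Network Architectures
Chapter 3 Methodology
1 Hotel Indoor Scene Database Collection
2 System Architecture and Workflow
3 Original-Image Feature Branch
4 Segmentation Feature Branch
4.1 Semantic Segmentation Preprocessing
4.1.1 Semantic Segmentation with Mask R-CNN
4.1.2 Semantic Segmentation with UPerNet
4.2 Object Segmentation Preprocessing
4.2.1 Object Segmentation with Mask R-CNN
4.2.2 Object Segmentation with UPerNet
5 Feature Fusion
5.1 Fusion of Original-Image and Semantic Segmentation Features
5.2 Fusion of Original-Image and Object Segmentation Features
6 System Interface and Functions
Chapter 4 Experimental Results
1 Datasets
2 Experimental Environment and Parameter Settings
3 Experimental Data and Analysis
3.1 Without Feature Fusion
3.2 Feature Fusion on the Hotel Indoor Scene Dataset
3.2.1 Semantic Segmentation Feature Fusion
3.2.2 Object Segmentation Feature Fusion
3.2.3 Combined Semantic and Object Segmentation Feature Fusion
3.3 Feature Fusion on the 15-Scene Dataset
3.3.1 Semantic Segmentation Feature Fusion
3.3.2 Object Segmentation Feature Fusion
3.3.3 Combined Semantic and Object Segmentation Feature Fusion
3.4 Experimental Results
3.4.1 Results on the Hotel Indoor Scene Dataset
3.4.2 Results on the 15-Scene Dataset
3.4.3 Program Execution Time
Chapter 5 Conclusions and Future Work
References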


                                

