| Graduate student: | 黃健銘 Jian-Ming Huang |
|---|---|
| Thesis title: | 結合語義分割特徵與注意力模型之室內場景分類系統 Indoor Scene Image Classification System combining Semantic Segmentation Features and Attention Module |
| Advisor: | 鄭旭詠 |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 (ROC calendar) |
| Language: | English |
| Pages: | 70 |
| Keywords: | scene recognition, semantic segmentation, attention model, feature fusion |
Scene recognition is an important task in computer vision, and current machine learning methods perform far better than traditional processing methods. However, applying a neural network directly to classification often loses information about the relationships among objects, spatial layout, and background, which leads to poor classification. An important challenge in scene classification is therefore to capture these relationships and to combine the resulting features effectively with the original image for classification.
The proposed method performs semantic segmentation on the input image and uses neural network models to extract features from the semantic segmentation image and from the original image separately. An attention module then fuses the semantic segmentation features with the original image features, and the fused features are used to classify the image.
Experimental results show that our method achieves the best accuracy on the Hotel Indoor Scene dataset that we collected. On the public 15-Scene dataset, it achieves higher classification accuracy than the methods from other papers it was compared with. These results indicate that semantic segmentation captures the relationships among objects, spatial layout, and background, and that fusing features with an attention module improves scene recognition accuracy.
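The abstract does not specify how the attention module is built. As a minimal sketch, assuming both branches have already produced globally pooled feature vectors (the backbone CNNs and the segmentation network are not shown), the Python below illustrates one common way such a fusion could work: the segmentation features are mapped to per-channel weights between 0 and 1 that rescale the RGB features, the segmentation features are then concatenated, and a linear softmax classifier scores the scene classes. The function names (`attention_fuse`, `classify`, `affine`) and all weights are hypothetical illustrations, not the thesis's implementation; in a real model the weights would be learned.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W x + b, where each row of W produces one output value."""
    if any(len(row) != len(x) for row in w) or len(w) != len(b):
        raise ValueError("shape mismatch in affine map")
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(z: Vector) -> Vector:
    """Convert logits to probabilities (max subtracted for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(rgb: Vector, seg: Vector, w_att: Matrix, b_att: Vector) -> Vector:
    """Hypothetical fusion: segmentation features gate the RGB channels.

    w_att needs one row per RGB channel and one column per segmentation
    feature, so the gate has the same length as `rgb`.
    """
    gate = [sigmoid(z) for z in affine(w_att, seg, b_att)]  # weights between 0 and 1
    if len(gate) != len(rgb):
        raise ValueError("attention weights must match the RGB feature length")
    attended = [g * r for g, r in zip(gate, rgb)]  # element-wise reweighting
    return attended + seg  # concatenate gated RGB and segmentation features


def classify(fused: Vector, w_cls: Matrix, b_cls: Vector) -> Vector:
    """Linear classifier over the fused vector, returning class probabilities."""
    return softmax(affine(w_cls, fused, b_cls))


if __name__ == "__main__":
    rgb_feat = [0.8, 0.1, 0.5]  # hypothetical pooled RGB-branch features
    seg_feat = [1.0, 0.0]       # hypothetical pooled segmentation-branch features
    w_att = [[2.0, -1.0], [-2.0, 1.0], [0.5, 0.5]]  # 3 RGB channels x 2 seg features
    b_att = [0.0, 0.0, 0.0]
    fused = attention_fuse(rgb_feat, seg_feat, w_att, b_att)
    w_cls = [[1.0] * 5, [-1.0] * 5]  # 2 hypothetical classes x 5 fused features
    b_cls = [0.0, 0.0]
    print(classify(fused, w_cls, b_cls))
```

A sigmoid gate lets each RGB channel be kept or suppressed independently; a softmax over channels would instead make the channel weights compete, which is another plausible design for an attention-based fusion.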