| Graduate student: | 林翰廷 Han-Ting Lin |
|---|---|
| Thesis title: | Automatic Hotel Scene Image Classification System (旅館場景影像自動分類系統) |
| Advisor: | 鄭旭詠 |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Publication year: | 2019 |
| Academic year of graduation: | 107 |
| Language: | Chinese |
| Pages: | 40 |
| Keywords: | indoor scene, VGG16, Mask R-CNN, feature fusion |
Scene recognition is an important part of image semantic segmentation, and correctly and efficiently locating the informative regions of a scene remains one of the hard problems in the field. A scene is composed of objects, spatial layout, and the relationships among these and the background, and the types of objects present strongly influence the classification result. A scene can therefore be classified from the objects recognized in it, for example a bathtub or toilet in a bathroom, or a bed or desk in a bedroom.
This thesis proposes a method that uses object recognition as a preprocessing step and then classifies the scene based on its result. The Mask R-CNN algorithm segments specific indoor objects in the input image; the segmented objects serve as object-level features, which are then combined with features of the whole scene for classification. The experimental results show that this object-based preprocessing improves scene classification accuracy.
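The pipeline described in the abstract (object segmentation, then fusion of object and scene features, then classification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the object vocabulary `OBJECT_CLASSES`, the detection dictionaries standing in for Mask R-CNN output, the two-dimensional `scene_features` standing in for a global scene descriptor such as one from VGG16, fusion by plain concatenation, and a nearest-centroid rule in place of a trained classifier.

```python
from typing import Dict, List, Sequence

# Hypothetical object vocabulary; the thesis does not list its object classes.
OBJECT_CLASSES = ["bathtub", "toilet", "bed", "desk"]


def object_histogram(detections: Sequence[Dict], min_score: float = 0.5) -> List[float]:
    """Count confident detections per object class, normalised to sum to 1."""
    counts = [0.0] * len(OBJECT_CLASSES)
    for det in detections:
        if det["score"] >= min_score and det["label"] in OBJECT_CLASSES:
            counts[OBJECT_CLASSES.index(det["label"])] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def fuse(scene_features: Sequence[float], object_features: Sequence[float]) -> List[float]:
    """Fuse the two feature vectors by concatenation (an assumed fusion rule)."""
    return list(scene_features) + list(object_features)


def classify(fused: Sequence[float], centroids: Dict[str, Sequence[float]]) -> str:
    """Return the scene whose centroid is nearest in squared Euclidean distance."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda name: dist(fused, centroids[name]))


# Toy inputs standing in for segmentation output and a global scene descriptor.
detections = [
    {"label": "bathtub", "score": 0.93},
    {"label": "toilet", "score": 0.88},
    {"label": "bed", "score": 0.20},  # below the score threshold, so ignored
]
scene_features = [0.1, 0.9]
fused = fuse(scene_features, object_histogram(detections))

# Made-up per-scene centroids in the fused feature space.
centroids = {
    "bathroom": [0.0, 1.0, 0.5, 0.5, 0.0, 0.0],
    "bedroom": [1.0, 0.0, 0.0, 0.0, 0.5, 0.5],
}
print(classify(fused, centroids))
```

In a real system the detections would come from a trained segmentation model and the final decision from a learned classifier; the sketch only shows how object evidence and whole-image evidence can share one feature vector.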