Graduate Student: 羅昌威 (Chanwit Loakhajorn)
Thesis Title: Automatic Door Detection based on Graph Convolution Network (基於圖卷積網路的自動門檢測)
Advisor: Prof. Timothy K. Shih (施國琛)
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2020
Graduating Academic Year: 108 (ROC calendar)
Language: English
Number of Pages: 56
Chinese Keywords: 深度學習 (Deep Learning), 電腦視覺 (Computer Vision), 圖卷積網路 (Graph Convolutional Network), 自動門偵測 (Automatic Door Detection)
English Keywords: Deep Learning, Computer Vision, Graph Convolution Network, Door Detection
Chinese Abstract

This research can guide a robot to find an entrance or automatic door, and can also guide blind people from a distance to the vicinity of an automatic door. Nowadays, object detection models are already very powerful, with high accuracy and high frame rates, but the main problem is that it is difficult to distinguish the automatic doors of convenience stores from the surrounding glass, so we use a Graph Convolutional Network (GCN) to improve accuracy. Our idea is to use a GCN model to recognize automatic doors through the surrounding objects. The system is divided into two parts: object detection and object association. For the object detection part, we use an existing model with high accuracy and a high frame rate. YOLOv4 is the best object detection model proposed this year, but using YOLOv4 alone still produces a large number of false detections, so our proposed method is needed to improve the results. For the object association part, we propose a fully connected layer combined with a GCN. Before starting object association, we first need to convert the object detection results into a graph structure. We train on a GTX 1080 and test the model on an AGX. Our dataset is self-made: we collected imagery from Google Street View and recorded street videos in Taiwan. The dataset consists of more than 100 convenience stores, covering both indoor and outdoor environments, and with the GCN we can reduce false detections. We obtained 86% accuracy on our test set. When tested on videos or in real environments, our model runs at around 5 FPS, demonstrating that the proposed model can find automatic doors. To confirm that our model solves the problem, we present the results in the experiment section and show how the model works.
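To make the "convert the object detection results into a graph structure" step concrete, the sketch below builds node features and an adjacency matrix from detector output. It is a minimal illustration only: the class ids, example boxes, feature layout (one-hot class plus normalized box geometry), and the distance threshold for connecting nearby objects are assumptions, not the exact scheme used in the thesis.

    import numpy as np

    # Hypothetical detector output: (class_id, x_center, y_center, width, height),
    # with box coordinates normalized to [0, 1].
    detections = [
        (0, 0.50, 0.55, 0.20, 0.60),
        (1, 0.48, 0.15, 0.30, 0.10),
        (2, 0.80, 0.50, 0.15, 0.40),
    ]
    NUM_CLASSES = 4        # illustrative number of object classes
    DIST_THRESHOLD = 0.5   # illustrative: connect objects whose centers are close

    def detections_to_graph(dets, num_classes=NUM_CLASSES, thr=DIST_THRESHOLD):
        """Node features = one-hot class + normalized box geometry;
        edges link detections whose box centers lie within `thr` of each other."""
        n = len(dets)
        feats = np.zeros((n, num_classes + 4))
        adj = np.zeros((n, n))
        for i, (cls, cx, cy, w, h) in enumerate(dets):
            feats[i, cls] = 1.0
            feats[i, num_classes:] = [cx, cy, w, h]
        for i in range(n):
            for j in range(i + 1, n):
                dist = np.hypot(dets[i][1] - dets[j][1], dets[i][2] - dets[j][2])
                if dist < thr:
                    adj[i, j] = adj[j, i] = 1.0
        return feats, adj

    features, adjacency = detections_to_graph(detections)
    print(features.shape)   # (3, 8): 3 nodes, one-hot class + box geometry
    print(adjacency)        # symmetric adjacency over the detected objects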


Abstract

The purpose of this research is to navigate robots to doors and to help blind people find entrances and exits, even from across a room. Nowadays, object detection models are very powerful, with high accuracy and high frame rates, but the main problem is that it is hard to distinguish between glass doors and glass walls in convenience stores. To solve this problem, we use a Graph Convolutional Network (GCN) to improve accuracy. The idea is to use the GCN model to identify the entrance from the surrounding objects. The system consists of two parts: an object detector and an association part. For the object detector, we take advantage of public models with high accuracy and high frames per second. YOLOv4 is the newest model this year and is state of the art compared with previous models, but YOLOv4 alone still produces wrong detections, so our proposed model is needed to correct them. The association part is our proposed model, a Fully Connected layer (FC) combined with a GCN. The output of the object detector must first be converted into a graph structure before it is fed to the association part. We train on a GTX 1080 and test the real-time models on an AGX board. Our dataset is custom, collected from Google Street View and from videos recorded in Taiwan. The dataset consists of more than 100 convenience stores, including indoor and outdoor environments, and with this design we can reduce some of the wrong detections from the object detector. We obtain 86 percent accuracy on our test set. When testing on video for the demo, the model runs at around 5 FPS, which shows that our proposed model can find the doors. To confirm that our model solves the problem, we demonstrate how the model works in the experiment section.
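As a rough illustration of the association part described above, the sketch below combines one GCN propagation step (in the style of Kipf and Welling, using a symmetrically normalized adjacency with self-loops) with a fully connected layer that scores each node, i.e. each detected object. The layer sizes, two-class output, and random inputs are assumptions for illustration, not the thesis's actual architecture.

    import torch
    import torch.nn as nn

    class GCNAssociation(nn.Module):
        """Toy association head: one GCN layer followed by a fully connected
        classifier that scores each node (detected object) as door / not-door."""
        def __init__(self, in_dim, hidden_dim, num_classes=2):
            super().__init__()
            self.gcn_weight = nn.Linear(in_dim, hidden_dim, bias=False)
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, x, adj):
            # Standard GCN propagation rule: add self-loops, then apply
            # symmetric degree normalization before mixing node features.
            a_hat = adj + torch.eye(adj.size(0))
            deg = a_hat.sum(dim=1)
            d_inv_sqrt = torch.diag(deg.pow(-0.5))
            norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt
            h = torch.relu(self.gcn_weight(norm_adj @ x))
            return self.fc(h)              # per-node logits

    # Example: 4 detected objects with 8-dimensional node features.
    x = torch.randn(4, 8)
    adj = torch.tensor([[0., 1., 0., 1.],
                        [1., 0., 1., 0.],
                        [0., 1., 0., 0.],
                        [1., 0., 0., 0.]])
    model = GCNAssociation(in_dim=8, hidden_dim=16)
    print(model(x, adj).shape)             # torch.Size([4, 2])

The node features and adjacency here would come from a detections-to-graph step like the one sketched after the Chinese abstract.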

Table of Contents
Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Problem Definition
  1.3 Scope and Limitation
  1.4 Thesis Structure
Chapter 2 Related Work
  2.1 Deep Learning
    2.1.1 Convolutional Neural Networks (CNN)
    2.1.2 Fully Connected Layer (FC)
    2.1.3 Activation Function
    2.1.4 Dropout Layer
  2.2 Object Detection Model
    2.2.1 YOLOv4: Optimal Speed and Accuracy of Object Detection
    2.2.2 YOLOv3 Model
    2.2.3 Spatial Relation Recognition (SRR)
  2.3 Finding Relations Using a Graph Convolutional Network (GCN)
Chapter 3 Proposed Approach
  3.1 Object Detection Part
  3.2 Association Part
  3.3 Dataset
Chapter 4 Experimental Results
  4.1 Analysis of the Object Detector Part
  4.2 Analysis of the Association Part
  4.3 Analysis of the Combined System
  4.4 Analysis of Results
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References

