Graduate student: Hung-Chun Liao (廖浤鈞)
Thesis title: A novel relational deep network for object tracking (Chinese title: 基於深度學習之關聯式追蹤網路)
Advisor: Kuo-Chen Shih (施國琛)
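Oral defense committee:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar; the 2019-2020 academic year)
Language: English
Number of pages: 61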
Keywords: deep learning, object tracking
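    Object tracking is a popular topic in computer vision and deep learning. Its goal is to locate a target object across a sequence of consecutive frames, and most current methods use deep learning to improve accuracy. In deep learning, object tracking can be divided into single-object tracking and multi-object tracking: the former determines the position of the target object in consecutive frames, while the latter matches the same object across different time steps. This thesis focuses on single-object tracking.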
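    Most current deep-learning-based single-object trackers adopt a Siamese network architecture and use a correlation operation to measure how strongly each position of the feature map correlates with the target. This thesis attempts to address some problems that remain in current tracking methods: we add a variance loss to strengthen the model's ability to separate foreground from background, and we add a graph convolutional network that exploits the relationships between the target and surrounding objects to improve the model's accuracy.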
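    The correlation step described above can be illustrated with a minimal NumPy sketch. It assumes SiamFC-style "valid" cross-correlation (Bertinetto et al., 2016): template features of shape (C, h, w) slide over search-region features of shape (C, H, W), and the channel-summed inner product at each offset forms the response map; the backbone that produces the features is omitted. The abstract does not define the variance loss, so `variance_loss` below is a hypothetical reading, not the thesis's formula: it penalizes the score variance within foreground and within background positions and adds a hinge term asking the foreground mean to exceed the background mean by `margin`. The function names, the `fg_mask` input, and `margin` are illustrative assumptions.

```python
import numpy as np


def cross_correlate(template: np.ndarray, search: np.ndarray) -> np.ndarray:
    """Slide template features (C, h, w) over search features (C, H, W).

    Entry (i, j) of the returned (H - h + 1, W - w + 1) response map is the
    inner product of the template with the search window whose top-left
    corner is (i, j), summed over channels ("valid" cross-correlation).
    """
    c, h, w = template.shape
    c_s, H, W = search.shape
    if c != c_s or h > H or w > W:
        raise ValueError("template must match search channels and fit inside it")
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            response[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return response


def variance_loss(response: np.ndarray, fg_mask: np.ndarray, margin: float = 1.0) -> float:
    """HYPOTHETICAL variance-style loss; not the thesis's published definition.

    Penalizes the variance of scores inside the foreground and inside the
    background, plus a hinge term that is zero once the foreground mean
    exceeds the background mean by at least `margin`.
    """
    fg = response[fg_mask]
    bg = response[~fg_mask]
    if fg.size == 0 or bg.size == 0:
        raise ValueError("fg_mask must select foreground and background positions")
    within_class = fg.var() + bg.var()
    separation = max(0.0, margin - (fg.mean() - bg.mean()))
    return float(within_class + separation)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    search = rng.standard_normal((8, 9, 9))   # toy search-region features
    template = search[:, 3:6, 2:5].copy()    # target patch cut from the search region
    response = cross_correlate(template, search)
    print(response.shape)  # (7, 7)
    fg_mask = np.zeros_like(response, dtype=bool)
    fg_mask[3, 2] = True                      # offset where the target was cut out
    print(variance_loss(response, fg_mask))
```

In a trained tracker the response peak marks the predicted target offset; the explicit double loop is kept for readability, whereas deep-learning frameworks compute the same map with a convolution call.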
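    An object detection model decides, independently for each input image, whether the target is present. In a sequence of consecutive frames, however, each frame differs slightly, so the detector may miss the target in some frames. We therefore add an object tracking model to compensate for the detector's instability across consecutive frames: once the detector finds the target, the tracker follows the target's position in the subsequent frames. We combine the tracking model with a signboard detection model to improve detection stability and accuracy.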


    Visual object tracking is a popular task in computer vision and deep learning. Its purpose is to find the location of a target object in a sequence of consecutive images. In recent years, most object tracking methods have used deep learning to improve accuracy. In deep learning, object tracking can be divided into single-object tracking and multi-object tracking: the former aims to find the location of the target object in each frame, while the latter performs object association, matching objects across different time steps. This thesis focuses on single-object tracking.
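    The association step that defines multi-object tracking can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` and uses greedy matching on intersection-over-union (IoU), one simple strategy among several; `iou`, `greedy_associate`, and the `min_iou` threshold are hypothetical names chosen for illustration and are not part of the thesis, which focuses on single-object tracking.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(prev: List[Box], curr: List[Box], min_iou: float = 0.3) -> List[Tuple[int, int]]:
    """Match boxes from frame t-1 to frame t, highest-IoU pairs first.

    Returns sorted (prev_index, curr_index) pairs; each box is used at most
    once, and pairs below `min_iou` stay unmatched (new or lost objects).
    """
    candidates = sorted(
        ((iou(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True,
    )
    used_prev, used_curr = set(), set()
    matches = []
    for score, i, j in candidates:
        if score < min_iou:
            break  # all remaining candidates score even lower
        if i in used_prev or j in used_curr:
            continue
        used_prev.add(i)
        used_curr.add(j)
        matches.append((i, j))
    return sorted(matches)
```

Greedy matching is not globally optimal; when many objects overlap, an optimal assignment such as SciPy's `scipy.optimize.linear_sum_assignment` on a `1 - IoU` cost matrix is a common alternative.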
    Most current deep-learning-based single-object tracking methods use a Siamese network architecture and then apply a cross-correlation operation to measure the similarity between the target (template) image and the search image. This thesis tries to address some existing problems of Siamese-based visual object tracking. We add a variance loss to improve the model's ability to distinguish foreground from background. In addition, we add a graph convolutional network that improves accuracy by relating the target object to the surrounding objects.
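    The idea of relating the target to surrounding objects with a graph convolutional network can be sketched with the layer-wise propagation rule of Kipf and Welling (2017): H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, D is the degree matrix of A + I, H holds one feature row per node, and W is a learned weight matrix. The graph construction here is an assumption, since the abstract does not describe it: node 0 is the target, the other nodes are surrounding objects, and the hypothetical `star_graph` helper links the target to every neighbor. Features and weights are random placeholders.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2, the renormalized adjacency of Kipf & Welling."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(normalized_adjacency(adj) @ features @ weight, 0.0)


def star_graph(num_nodes: int) -> np.ndarray:
    """HYPOTHETICAL graph: node 0 is the target, linked to every surrounding object."""
    adj = np.zeros((num_nodes, num_nodes))
    adj[0, 1:] = 1.0
    adj[1:, 0] = 1.0
    return adj


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    node_feats = rng.standard_normal((4, 16))  # target + 3 surrounding objects, 16-dim each
    weight = rng.standard_normal((16, 8))      # would be learned in practice
    refined = gcn_layer(node_feats, star_graph(4), weight)
    print(refined.shape)  # (4, 8)
    target_feat = refined[0]                   # target embedding now mixes in its neighbors
```

After one layer, the target's row combines its own features with those of its neighbors, weighted by the normalized adjacency; stacking layers widens that context.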
    An object detection model determines, for each input image, whether the target object is present. In a continuous series of frames, however, each frame is slightly different, so some objects may be missed in some frames. We therefore use a tracking model to address this problem: once the detection model detects the target object, the tracking model tracks it in the following frames. In this way, the visual object tracking model improves the stability and accuracy of the object detection model.
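    The detect-then-track scheme described above can be sketched as a per-frame loop. The interface is hypothetical: `detect(frame)` returns a box or `None`, and `init_tracker(frame, box)` returns an update function that estimates the box in later frames (for example, a Siamese tracker initialized on the detected box). The policy of running the detector on every frame, re-initializing the tracker on each detection, and trusting the tracker for at most `max_track_frames` consecutive frames is an assumption, not the thesis's documented procedure.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

Box = Tuple[float, float, float, float]               # (x1, y1, x2, y2)
Detector = Callable[[Any], Optional[Box]]             # hypothetical: frame -> box or None
TrackerUpdate = Callable[[Any], Optional[Box]]        # hypothetical: frame -> box or None
TrackerFactory = Callable[[Any, Box], TrackerUpdate]  # hypothetical: (frame, box) -> update fn


def detect_with_tracking(
    frames: Iterable[Any],
    detect: Detector,
    init_tracker: TrackerFactory,
    max_track_frames: int = 5,
) -> List[Optional[Box]]:
    """Return one box (or None) per frame, filling detector misses with a tracker.

    Assumed policy: run the detector on every frame; on a hit, (re)initialize
    the tracker on that box; on a miss, fall back to the tracker for at most
    `max_track_frames` consecutive frames to limit drift.
    """
    results: List[Optional[Box]] = []
    update: Optional[TrackerUpdate] = None
    tracked_since_hit = 0
    for frame in frames:
        box = detect(frame)
        if box is not None:
            update = init_tracker(frame, box)
            tracked_since_hit = 0
        elif update is not None and tracked_since_hit < max_track_frames:
            box = update(frame)
            tracked_since_hit += 1
        results.append(box)  # None when neither detector nor tracker gives a box
    return results
```

Capping the number of tracker-only frames trades recall for safety: a tracker left running without detector confirmation can drift onto a background region.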

    1. Introduction
    2. Related Work
        2.1 Feature extractor networks
        2.2 Visual Object Tracking model
        2.3 Graph Convolutional Network
    3. Proposed Method
        3.1 Proposed tracking method pipeline
        3.2 Region proposal
        3.3 Similarity calculation
        3.4 Attention-plugin in Siamese Tracker
        3.5 Tracking compared with Detection
    4. Experimental Results
        4.1 Training
        4.2 Testing
    5. Conclusion
    6. References

