| Graduate Student (研究生): | Kuan-Chen Tai (戴鸛臻) |
|---|---|
| Thesis Title (論文名稱): | People Tracking Based on Siamese Network with Template Update for EAC Format of 360-degree Videos |
| Advisor (指導教授): | Chih-Wei Tang (唐之瑋) |
| Oral Examination Committee (口試委員): | |
| Degree (學位類別): | Master |
| Department (系所名稱): | College of Electrical Engineering and Computer Science, Department of Communication Engineering |
| Year of Publication (論文出版年): | 2020 |
| Academic Year of Graduation (畢業學年度): | 108 (ROC calendar) |
| Language (語文別): | Chinese |
| Pages (論文頁數): | 101 |
| Keywords, Chinese (中文關鍵詞, translated): | people tracking, 360-degree videos, equi-angular cubemap projection, Siamese network, FLD, Bayes classifier |
| Keywords, English (外文關鍵詞): | people tracking, 360-degree videos, equi-angular cubemap (EAC), Siamese neural network, Fisher linear discriminant (FLD), Bayes classifier |
| Record Statistics (相關次數): | Hits: 11, Downloads: 0 |
Chinese Abstract (translated):
In 360-degree videos, the equi-angular cubemap projection (EAC) is a variant of the cubemap projection (CMP). Compared with CMP, EAC exhibits less geometric deformation and is therefore less prone to tracking errors. However, EAC images still suffer from content discontinuity between adjacent faces and from non-uniform geometric distortion, which severely degrades the accuracy of existing tracking schemes on EAC images. This thesis therefore proposes a Siamese-network-based people tracking scheme for 360-degree videos in the EAC format: a convolutional neural network extracts features from the target template and from the search window of the current frame, and the features are matched to track the target. To handle content discontinuity, a face stitching step lets the tracker operate on continuous image content while avoiding additional geometric deformation. To cope with non-uniform geometric distortion, the timing of template updates is predicted from the score map that the Siamese network computes for the current frame: the score map is reduced in dimension with Fisher's linear discriminant (FLD), and, together with the mean and the standard deviation of the score map, yields three features from which a Bayes classifier decides whether to update the template. Experimental results show that the proposed face stitching and template update schemes effectively improve the tracking accuracy of SiamFC.
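The score map at the heart of this pipeline is produced, SiamFC-style, by cross-correlating the template's feature map with the search window's feature map. A minimal NumPy sketch of that matching step; the CNN feature extractor is abstracted away, and all array shapes here are illustrative assumptions, not the thesis's actual configuration:

```python
import numpy as np

def cross_correlation_score_map(template_feat, search_feat):
    """Slide the template feature map over the search feature map and
    take the inner product at every offset (SiamFC-style matching).
    Both inputs are (channels, height, width) arrays."""
    _, th, tw = template_feat.shape
    _, sh, sw = search_feat.shape
    out_h, out_w = sh - th + 1, sw - tw + 1
    score = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = search_feat[:, i:i + th, j:j + tw]
            score[i, j] = np.sum(window * template_feat)
    return score

# Toy example: 4-channel features, 6x6 template, 22x22 search window.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 6, 6))    # template features
x = rng.standard_normal((4, 22, 22))  # search-window features
s = cross_correlation_score_map(z, x)
print(s.shape)  # (17, 17) score map
```

The peak of the score map marks the most likely target position in the search window; the thesis's template update mechanism then reads this map to decide whether the template is still reliable.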
English Abstract:
Variants of the cubemap projection (CMP) format, such as the equi-angular cubemap (EAC) of 360-degree videos, exhibit less geometric deformation, which may reduce tracking errors. However, the accuracy and speed of most existing trackers degrade severely in the face of the content discontinuity and non-uniform geometric deformation of the EAC format of 360-degree videos. Thus, this thesis proposes a Siamese-network-based people tracking scheme for 360-degree videos in the EAC format. The tracker extracts features from the target template and from the search window of the current frame with a convolutional neural network, and compares these features to predict the bounding box of the target. To be robust against the content discontinuity between inconsistent adjacent faces of EAC images, this thesis proposes an efficient face stitching scheme that lets the tracker keep tracking across adjacent faces without introducing additional geometric deformation. Referring to the score map generated by the Siamese network, the proposed template update mechanism, built on a pre-trained Bayes classifier, determines the right timing for an update. The input feature vector of the Bayes classifier consists of the score map reduced to one dimension by Fisher linear discriminant (FLD), together with the mean and the standard deviation of the score map. Experimental results show that the proposed face stitching scheme and template update mechanism effectively improve the tracking accuracy of SiamFC.
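The update-decision pipeline described above — project the score map to one dimension with FLD, append the map's mean and standard deviation, and feed the three-element vector to a Bayes classifier — can be sketched as below. This is a reconstruction under stated assumptions: the two-class FLD fit, the diagonal-Gaussian class likelihoods, the 17×17 score-map size, and all function names are illustrative, not the thesis's exact implementation.

```python
import numpy as np

def fit_fld(X0, X1):
    """Fisher linear discriminant direction for two classes;
    rows of X0/X1 are flattened training score maps."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    Sw += 1e-3 * np.eye(Sw.shape[0])      # regularize: dim may exceed samples
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def features(score_map, w):
    """3-D feature vector: FLD projection, mean, std of the score map."""
    v = score_map.ravel()
    return np.array([v @ w, v.mean(), v.std()])

class GaussianBayes:
    """Two-class Bayes classifier with diagonal Gaussian likelihoods."""
    def fit(self, F0, F1):
        self.mu = [F0.mean(axis=0), F1.mean(axis=0)]
        self.var = [F0.var(axis=0) + 1e-9, F1.var(axis=0) + 1e-9]
        n0, n1 = len(F0), len(F1)
        self.logprior = np.log([n0, n1]) - np.log(n0 + n1)
        return self

    def predict(self, f):
        # Posterior (up to a constant) per class: log prior + log likelihood.
        logp = [self.logprior[c]
                - 0.5 * np.sum(np.log(2 * np.pi * self.var[c])
                               + (f - self.mu[c]) ** 2 / self.var[c])
                for c in (0, 1)]
        return int(np.argmax(logp))       # 1 => update the template

# Toy training data: class 0 = keep the template, class 1 = update it.
rng = np.random.default_rng(1)
maps0 = rng.normal(0.0, 1.0, (50, 17 * 17))
maps1 = rng.normal(1.0, 1.5, (50, 17 * 17))
w = fit_fld(maps0, maps1)
F0 = np.array([features(m.reshape(17, 17), w) for m in maps0])
F1 = np.array([features(m.reshape(17, 17), w) for m in maps1])
clf = GaussianBayes().fit(F0, F1)
decision = clf.predict(features(rng.normal(1.0, 1.5, (17, 17)), w))
```

Training the classifier offline, then evaluating only a three-element feature per frame, keeps the per-frame overhead of the update decision negligible next to the Siamese network's forward pass.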