| Graduate Student (研究生): | Kuan-Chen Tai (戴鸛臻) |
|---|---|
| Thesis Title (論文名稱): | People Tracking Based on Siamese Network with Template Update for EAC Format of 360-degree Videos |
| Advisor (指導教授): | Chih-Wei Tang (唐之瑋) |
| Oral Examination Committee (口試委員): | |
| Degree (學位類別): | Master |
| Department (系所名稱): | College of Electrical Engineering and Computer Science, Department of Communication Engineering |
| Year of Publication (論文出版年): | 2020 |
| Academic Year of Graduation (畢業學年度): | 108 (ROC calendar) |
| Language (語文別): | Chinese |
| Pages (論文頁數): | 101 |
| Keywords, Chinese (中文關鍵詞, translated): | people tracking, 360-degree videos, equi-angular cubemap projection, Siamese network, FLD, Bayes classifier |
| Keywords, English (外文關鍵詞): | people tracking, 360-degree videos, equi-angular cubemap (EAC), Siamese neural network, Fisher linear discriminant (FLD), Bayes classifier |
| Record Statistics (相關次數): | Hits: 11, Downloads: 0 |
Chinese Abstract (translated):
In 360-degree videos, the equi-angular cubemap projection (EAC) is a variant of the cubemap projection (CMP). Compared with CMP, EAC exhibits less geometric deformation and is therefore less prone to tracking errors. However, EAC images still suffer from content discontinuity between adjacent faces and from non-uniform geometric distortion, which severely degrades the accuracy of existing tracking schemes on EAC images. This thesis therefore proposes a Siamese-network-based people tracking scheme for 360-degree videos in the EAC format: a convolutional neural network extracts features from the target template and from the search window of the current frame, and the features are matched to track the target. To handle content discontinuity, a face stitching step lets the tracker operate on continuous image content while avoiding additional geometric deformation. To cope with non-uniform geometric distortion, the timing of template updates is predicted from the score map that the Siamese network computes for the current frame: the score map is reduced in dimension with Fisher's linear discriminant (FLD), and, together with the mean and the standard deviation of the score map, yields three features from which a Bayes classifier decides whether to update the template. Experimental results show that the proposed face stitching and template update schemes effectively improve the tracking accuracy of SiamFC.
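The score map at the heart of this pipeline is produced, SiamFC-style, by cross-correlating the template's feature map with the search window's feature map. A minimal NumPy sketch of that matching step; the CNN feature extractor is abstracted away, and all array shapes here are illustrative assumptions, not the thesis's actual configuration:

```python
import numpy as np

def cross_correlation_score_map(template_feat, search_feat):
    """Slide the template feature map over the search feature map and
    take the inner product at every offset (SiamFC-style matching).
    Both inputs are (channels, height, width) arrays."""
    _, th, tw = template_feat.shape
    _, sh, sw = search_feat.shape
    out_h, out_w = sh - th + 1, sw - tw + 1
    score = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = search_feat[:, i:i + th, j:j + tw]
            score[i, j] = np.sum(window * template_feat)
    return score

# Toy example: 4-channel features, 6x6 template, 22x22 search window.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 6, 6))    # template features
x = rng.standard_normal((4, 22, 22))  # search-window features
s = cross_correlation_score_map(z, x)
print(s.shape)  # (17, 17) score map
```

The peak of the score map marks the most likely target position in the search window; the thesis's template update mechanism then reads this map to decide whether the template is still reliable.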
English Abstract:
Variants of the cubemap projection (CMP) format, such as the equi-angular cubemap (EAC) of 360-degree videos, exhibit less geometric deformation, which may reduce tracking errors. However, the accuracy and speed of most existing trackers degrade severely in the face of the content discontinuity and non-uniform geometric deformation of the EAC format of 360-degree videos. Thus, this thesis proposes a Siamese-network-based people tracking scheme for 360-degree videos in the EAC format. The tracker extracts features from the target template and from the search window of the current frame with a convolutional neural network, and compares these features to predict the bounding box of the target. To be robust against the content discontinuity between inconsistent adjacent faces of EAC images, this thesis proposes an efficient face stitching scheme that lets the tracker keep tracking across adjacent faces without introducing additional geometric deformation. Referring to the score map generated by the Siamese network, the proposed template update mechanism, built on a pre-trained Bayes classifier, determines the right timing for an update. The input feature vector of the Bayes classifier consists of the score map reduced to one dimension by Fisher linear discriminant (FLD), together with the mean and the standard deviation of the score map. Experimental results show that the proposed face stitching scheme and template update mechanism effectively improve the tracking accuracy of SiamFC.
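The update-decision pipeline described above — project the score map to one dimension with FLD, append the map's mean and standard deviation, and feed the three-element vector to a Bayes classifier — can be sketched as below. This is a reconstruction under stated assumptions: the two-class FLD fit, the diagonal-Gaussian class likelihoods, the 17×17 score-map size, and all function names are illustrative, not the thesis's exact implementation.

```python
import numpy as np

def fit_fld(X0, X1):
    """Fisher linear discriminant direction for two classes;
    rows of X0/X1 are flattened training score maps."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    Sw += 1e-3 * np.eye(Sw.shape[0])      # regularize: dim may exceed samples
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def features(score_map, w):
    """3-D feature vector: FLD projection, mean, std of the score map."""
    v = score_map.ravel()
    return np.array([v @ w, v.mean(), v.std()])

class GaussianBayes:
    """Two-class Bayes classifier with diagonal Gaussian likelihoods."""
    def fit(self, F0, F1):
        self.mu = [F0.mean(axis=0), F1.mean(axis=0)]
        self.var = [F0.var(axis=0) + 1e-9, F1.var(axis=0) + 1e-9]
        n0, n1 = len(F0), len(F1)
        self.logprior = np.log([n0, n1]) - np.log(n0 + n1)
        return self

    def predict(self, f):
        # Posterior (up to a constant) per class: log prior + log likelihood.
        logp = [self.logprior[c]
                - 0.5 * np.sum(np.log(2 * np.pi * self.var[c])
                               + (f - self.mu[c]) ** 2 / self.var[c])
                for c in (0, 1)]
        return int(np.argmax(logp))       # 1 => update the template

# Toy training data: class 0 = keep the template, class 1 = update it.
rng = np.random.default_rng(1)
maps0 = rng.normal(0.0, 1.0, (50, 17 * 17))
maps1 = rng.normal(1.0, 1.5, (50, 17 * 17))
w = fit_fld(maps0, maps1)
F0 = np.array([features(m.reshape(17, 17), w) for m in maps0])
F1 = np.array([features(m.reshape(17, 17), w) for m in maps1])
clf = GaussianBayes().fit(F0, F1)
decision = clf.predict(features(rng.normal(1.0, 1.5, (17, 17)), w))
```

Training the classifier offline, then evaluating only a three-element feature per frame, keeps the per-frame overhead of the update decision negligible next to the Siamese network's forward pass.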