| Graduate student: | 莊英瑋 Ying-Wei Chuang |
|---|---|
| Thesis title: | 基於遞迴神經網路於多重深度攝影機架構下之駕駛動作辨識 Driver Behavior Recognition based on Multiple Depth Cameras using Recurrent Neural Network |
| Advisor: | 張寶基 Pao-Chi Chang |
| Oral defense committee: | |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science, Department of Communication Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | Chinese |
| Number of pages: | 94 |
| Keywords (Chinese): | driver behavior recognition, depth camera, deep learning, multi-view capture |
| Keywords (English): | driver behavior recognition, depth camera, deep learning, RNN |
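This thesis addresses action recognition for in-car drivers. Driver action recognition serves two purposes: it is closely tied to driving safety, since the system can issue a reminder when the driver is inattentive or in danger, and it can also be applied to controlling in-car entertainment. We propose using two Kinect cameras to capture images from different viewpoints, preprocessing them, and then training and recognizing with a recurrent neural network architecture from deep learning. Using images from different viewpoints reduces the self-occlusion problem caused by relying on a single viewpoint, and the long short-term memory architecture lets the network learn information that changes over time. Applied to the VAP multi-view driver behavior dataset that we recorded ourselves, the system achieves good recognition accuracy.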
This thesis addresses in-car driver behavior recognition. One purpose is driving safety, because it is dangerous when a driver does not concentrate while driving. The other is in-car entertainment applications. We propose a multi-view driver behavior recognition system (MDBR system). Point clouds are captured from different views, and the raw data are preprocessed by rotation, calibration, merging, and sampling. We then use a long short-term memory (LSTM) network, a type of recurrent neural network, as the classifier. The dataset used is the VAP multi-view driver behavior dataset, which we propose and which contains 10 driver behaviors. Using multi-view data effectively reduces the influence of the occlusion problem, and the MDBR system achieves good recognition accuracy.
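The preprocessing steps named in the abstract (rotation, merging, and sampling of point clouds from two views) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the restriction to a rotation about the y axis, the `angle_rad` and `offset` extrinsics, and the random sampling strategy are all assumptions made for illustration, and the calibration step that would estimate those extrinsics is left out.

```python
import math
import random

def rotate_y(points, angle_rad):
    """Rotate (x, y, z) points about the y axis by angle_rad (assumed axis)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def translate(points, offset):
    """Shift every point by a fixed (ox, oy, oz) offset."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in points]

def merge_views(view_a, view_b, angle_rad, offset):
    """Bring view_b into view_a's coordinate frame, then concatenate both clouds.

    angle_rad and offset stand in for extrinsics that a calibration step
    would provide; the values used here are hypothetical.
    """
    aligned_b = translate(rotate_y(view_b, angle_rad), offset)
    return list(view_a) + aligned_b

def sample_fixed(points, n, seed=0):
    """Return exactly n points so every frame has the same input size.

    Too many points: pick a random subset. Too few: pad with random repeats.
    """
    if not points:
        raise ValueError("cannot sample from an empty point cloud")
    rng = random.Random(seed)
    if len(points) >= n:
        return rng.sample(points, n)
    return list(points) + [rng.choice(points) for _ in range(n - len(points))]

# Toy example: two tiny clouds standing in for the two camera views.
view_a = [(0.0, 0.0, 1.0), (0.1, 0.2, 1.1)]
view_b = [(1.0, 0.0, 0.0)]
merged = merge_views(view_a, view_b, math.pi / 2, (0.0, 0.0, 0.0))
frame = sample_fixed(merged, 4)
```

A fixed number of points per frame is one simple way to give a sequence model such as an LSTM inputs of constant size; the abstract does not say how the thesis performs its sampling.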