Driver Behavior Recognition based on Multiple Depth Cameras using Recurrent Neural Network

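Graduate student:	莊英瑋 Ying-Wei Chuang
Thesis title:	基於遞迴神經網路於多重深度攝影機架構下之駕駛動作辨識 Driver Behavior Recognition based on Multiple Depth Cameras using Recurrent Neural Network
Advisor:	張寶基 Pao-Chi Chang
Oral defense committee:
Degree:	Doctor
Department:	College of Electrical Engineering and Computer Science - Department of Communication Engineering
Year of publication:	2018
Academic year of graduation:	106
Language:	Chinese
Number of pages:	94
Keywords (Chinese):	driver behavior recognition, depth camera, deep learning, multi-view capture
Keywords (English):	driver behavior recognition, depth camera, deep learning, RNN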

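This thesis addresses action recognition for in-car drivers. Driver action recognition serves two purposes. First, it is closely tied to driving safety: the system can issue a reminder when the driver is found to be distracted or in danger. Second, it can be applied to the control of in-car entertainment. We propose using two Kinect cameras to capture images from different viewpoints, preprocessing them, and then training and recognizing with a recurrent neural network architecture from deep learning. Using images from different viewpoints reduces the self-occlusion problem that arises when only a single viewpoint is used, and the long short-term memory architecture lets the network learn information that changes over time. Applied to the VAP multi-view driver action dataset that we recorded ourselves, the system achieves good recognition accuracy.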
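To illustrate how a long short-term memory cell carries information across time steps, the following is a minimal pure-Python sketch of an LSTM forward pass. It is not the thesis implementation: the function names `init_lstm`, `lstm_step`, and `run_lstm`, the random weight initialization, and the layer sizes are all assumptions made for illustration, and no training is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_lstm(input_size, hidden_size, seed=0):
    """Small random weights for the four gates (hypothetical initialization)."""
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "g", "o"):  # input, forget, candidate, output
        params["W" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
            for _ in range(hidden_size)
        ]
        params["b" + gate] = [0.0] * hidden_size
    return params


def lstm_step(params, x, h, c):
    """One time step: update cell state c and hidden state h from input x."""
    z = list(x) + list(h)  # concatenate current input and previous hidden state

    def gate(name, activation):
        pre = matvec(params["W" + name], z)
        return [activation(p + b) for p, b in zip(pre, params["b" + name])]

    i = gate("i", sigmoid)    # how much new information to write
    f = gate("f", sigmoid)    # how much of the old cell state to keep
    g = gate("g", math.tanh)  # candidate values to write
    o = gate("o", sigmoid)    # how much of the cell state to expose
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, sequence, hidden_size):
    """Feed a sequence of feature vectors through the cell; return the last h."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h
```

In a classifier, the final hidden state returned by `run_lstm` would typically be passed to a softmax layer over the action classes. Because the cell state is carried from step to step, feeding the same frames in a different order generally yields a different final state, which is what lets the network model temporal change.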

This thesis addresses in-car driver behavior recognition. One purpose is driving safety, because it is dangerous when the driver does not concentrate while driving. The other is application to in-car entertainment. We propose a multi-view driver behavior recognition system (MDBR system). Point clouds are captured from different views, and we preprocess the original data by rotation, calibration, merging, and sampling. We then use a long short-term memory (LSTM) network, a type of recurrent neural network, as the classifier. The dataset we use is the VAP multi-view driver behavior dataset, which we propose and which contains 10 driver behaviors. Using multi-view data effectively reduces the influence of the occlusion problem. The MDBR system achieves good recognition accuracy.
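The preprocessing steps named in the abstract (rotation, calibration, merging, and sampling) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis pipeline: calibration is modeled here as a known rigid transform (rotation `R_b` plus translation `t_b`) from the second camera's frame into the first camera's frame, sampling is uniform random sampling to a fixed point count, and all function names are hypothetical.

```python
import math
import random


def rotation_y(angle_rad):
    """3x3 rotation matrix about the y axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def transform(points, R, t):
    """Apply p' = R p + t to every (x, y, z) point."""
    return [
        tuple(sum(R[r][k] * p[k] for k in range(3)) + t[r] for r in range(3))
        for p in points
    ]


def merge_and_sample(cloud_a, cloud_b, R_b, t_b, n_points, seed=0):
    """Bring cloud_b into cloud_a's frame, merge, and return exactly n_points.

    R_b and t_b stand in for the calibration result (assumed to be known).
    """
    merged = list(cloud_a) + transform(cloud_b, R_b, t_b)
    if not merged:
        raise ValueError("both point clouds are empty")
    rng = random.Random(seed)
    if len(merged) >= n_points:
        # Downsample without replacement to a fixed-size input.
        return rng.sample(merged, n_points)
    # Too few points: pad by repeating randomly chosen points.
    padding = [rng.choice(merged) for _ in range(n_points - len(merged))]
    return merged + padding
```

Sampling to a fixed number of points gives every frame the same input dimension, which is convenient when a sequence of frames is fed to a recurrent classifier.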

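Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Introduction to Depth Cameras and Action Recognition
2.1 Depth Cameras
2.1.1 Kinect Depth Camera
2.1.2 Hardware Specifications
2.1.3 Technology and Features
2.1.4 Development Tools: Kinect SDK
2.2 Action Recognition
2.2.1 Literature on Action Recognition
2.2.2 In-Car Behavior Recognition
Chapter 3 Basic Introduction to Deep Learning
3.1 Neural Networks
3.1.1 Biological Neurons
3.1.2 Artificial Neurons
3.1.3 Artificial Neural Networks
3.2 Deep Learning
3.2.1 Deep Neural Networks
3.2.2 Recurrent Neural Networks
3.2.3 Long Short-Term Memory (LSTM)
Chapter 4 Proposed In-Car Driver Action Recognition System
4.1 System Architecture
4.2 Driver Action Recognition Using Skeletons as Features
4.3 Driver Action Recognition Using Multi-View Point Clouds as Features
4.4 VAP Multi-View Driver Action Dataset
Chapter 5 Experimental Results and Discussion
5.1 Experimental Environment
5.2 Experimental Results
5.2.1 Results with Skeleton Feature Input
5.2.2 Results with Multi-View Point Cloud Feature Input
5.3 Comparison and Discussion
Chapter 6 Conclusion and Future Work
References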


                                

