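| Graduate student: | 牟庭辰 Ting-Chen Mou |
|---|---|
| Thesis title: | 基於遞迴神經網路於多重裝置下之硬舉動作辨識及應用 Deadlift Recognition and Application based on Multiple Modalities using Recurrent Neural Network |
| Advisor: | 張寶基 Pao-Chi Chang |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Communication Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | Chinese |
| Number of pages: | 86 |
| Chinese keywords: | action recognition, inertial measurement, deep learning, multi-device fusion, depth camera |
| Foreign-language keywords: | smart gym |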
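With the recent rise of artificial intelligence and improvements in computer hardware performance, human action recognition has steadily gained popularity and has developed especially quickly in computer vision and pattern recognition. Related applications include games, tracking and surveillance, smart environments, and the medical field.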
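This thesis focuses on recognizing weight-training movements, taking the deadlift as an example. On the one hand, this type of movement is a multi-joint exercise and is highly efficient among weight-training movements, yet beginners are often injured because misconceptions lead to incorrect posture. The system we propose can not only distinguish different types of deadlift movements but also, in its follow-up applications, give users relevant professional advice and identify where the problem lies, thereby helping to avoid and prevent injury.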
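On the other hand, the system can share the workload of fitness coaches, letting them focus on more specialized teaching content. We propose running the whole system on a multi-device architecture that includes a Kinect camera and x-OSC inertial measurement devices. Using the time-variant projection method and feature concatenation, we feed the feature data into a multi-layer long short-term memory (LSTM) architecture from deep learning for training and recognition. Using multiple devices compensates for the shortcomings of any single device and gives follow-up applications more ways to process the data from different perspectives. The system is applied to the VAP multi-device weight-training movement dataset that we recorded ourselves, and the follow-up applications can also analyze the movements effectively.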
With the rise of artificial intelligence and improvements in computer hardware performance in recent years, human action recognition (HAR) has steadily gained popularity, especially in computer vision and pattern recognition. Its applications span many fields, including games, tracking and surveillance systems, smart environments, and medicine.
This thesis addresses fitness behavior recognition, taking the deadlift as an example. The deadlift is a multi-joint movement, but beginners often injure themselves because of misconceptions and incorrect posture. One purpose of the system is to give users professional fitness advice. The other is to share the workload of gym fitness instructors, so that coaches can focus on more specialized teaching content for their students.
We propose a multi-modality deadlift recognition and application system that combines a Kinect camera with inertial sensors (x-OSC). We apply the Time-Variant Skeleton Vector Projection method and feature concatenation before feeding the features to our network. We then use a long short-term memory (LSTM) network, a type of recurrent neural network, as the classifier. The system is evaluated on the VAP multi-modality fitness behavior dataset, which we collected ourselves and which contains 6 fitness behaviors. Using multi-modality data achieves good recognition accuracy, and the applications of our system can also analyze users' results effectively.
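The feature concatenation step can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how the Kinect and x-OSC streams are synchronized, so the nearest-neighbour alignment, the function names `resample` and `concatenate_features`, and the toy feature dimensions are all assumptions made for illustration.

```python
from typing import List, Sequence

Frame = List[float]


def resample(seq: Sequence[Frame], target_len: int) -> List[Frame]:
    """Nearest-neighbour resampling of a frame sequence to target_len frames.

    Assumption: both streams cover the same time span, so aligning them
    only requires matching their frame counts.
    """
    if not seq:
        raise ValueError("sequence must contain at least one frame")
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    n = len(seq)
    if target_len == 1:
        return [list(seq[0])]
    # Map each output index onto the closest input index.
    return [
        list(seq[round(i * (n - 1) / (target_len - 1))])
        for i in range(target_len)
    ]


def concatenate_features(
    skeleton: Sequence[Frame], imu: Sequence[Frame]
) -> List[Frame]:
    """Align the IMU stream to the skeleton frame count, then join per frame."""
    imu_aligned = resample(imu, len(skeleton))
    return [list(s) + m for s, m in zip(skeleton, imu_aligned)]


# Toy data: 3 skeleton frames with 2 features each, 5 IMU frames with 1 feature.
skeleton_frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
imu_frames = [[1.0], [2.0], [3.0], [4.0], [5.0]]
fused = concatenate_features(skeleton_frames, imu_frames)
```

Each fused frame holds the skeleton features followed by the IMU features for the same time step, so the result is a single sequence that a recurrent classifier such as an LSTM could consume frame by frame.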