Violin Bowing Action Recognition based on Multiple Modalities by Deep Learning-Based Sensing Fusion


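Graduate student:	劉寶云 Bao-Yun Liu
Thesis title:	Violin Bowing Action Recognition based on Multiple Modalities by Deep Learning-Based Sensing Fusion (利用深度學習模型融合多重感測器之小提琴弓法動作辨識)
Advisor:	張寶基 Pao-Chi Chang
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Communication Engineering
Year of publication:	2020
Academic year of graduation:	108 (ROC calendar)
Language:	Chinese
Number of pages:	99
Chinese keywords:	action recognition, Kinect, depth camera, inertial sensor, deep learning, multi-device fusion
English keywords:	Multiple modalities, violin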

With the rise of artificial intelligence, human action recognition (HAR) based on deep learning has become one of the most important research topics today; action recognition is a popular problem in computer vision and pattern recognition.
This thesis addresses the recognition of violin bowing actions. Multimedia art performances often require a great deal of manpower and time, with repeated tests and rehearsals, before the sound and light effects of the venue match the performers. If action recognition lets a machine recognize the actions a performer makes during the performance, the system can then be used to trigger sound and light effects and for other applications. We propose to use multiple devices for action recognition: a Kinect camera and a Myo armband inertial measurement unit (IMU), which capture depth images and inertial data, respectively. After preprocessing and data augmentation, the depth images are fed to a 3D convolutional network and the inertial data to a long short-term memory (LSTM) network for training. Finally, decision fusion fuses the features from the separately trained models and outputs the final classification result. Data recorded by each device has its own strengths and weaknesses, so a suitable combination of devices can compensate for the shortcomings of any single device. Applied to the VAP multi-device violin action database that we recorded ourselves, the system achieves good recognition accuracy.
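The decision fusion step in the abstract can be illustrated with a small sketch. The abstract does not state which fusion rule the thesis uses, so the weighted average of per-class probabilities below is an assumption, chosen only because it is a common decision-level rule. The function names `softmax` and `fuse_decisions`, the `depth_weight` parameter, and the example scores are all hypothetical.

```python
from math import exp


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(depth_probs, inertial_probs, depth_weight=0.5):
    """Weighted average of two per-class probability vectors.

    Assumption: this is one common decision-level fusion rule; the
    abstract does not say which rule the thesis actually uses.
    Returns (index of the winning class, fused probabilities).
    """
    if len(depth_probs) != len(inertial_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= depth_weight <= 1.0:
        raise ValueError("depth_weight must lie in [0, 1]")
    fused = [
        depth_weight * d + (1.0 - depth_weight) * i
        for d, i in zip(depth_probs, inertial_probs)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Hypothetical raw scores for three placeholder classes.
    depth = softmax([2.0, 1.0, 0.1])     # stand-in for the 3D convolution branch
    inertial = softmax([0.5, 2.5, 0.3])  # stand-in for the LSTM branch
    label, fused = fuse_decisions(depth, inertial)
    print(label, [round(p, 3) for p in fused])
```

Averaging probabilities rather than raw scores keeps the two branches on a comparable scale, and `depth_weight` makes it possible to favor whichever device is more reliable for a given recording setup.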

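Abstract (Chinese)
Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1. Introduction
1 Research Background
2 Research Motivation and Objectives
3 Thesis Organization
Chapter 2. Depth Cameras, Inertial Sensors, and Action Recognition
1 Depth Cameras
1.1 Kinect Depth Camera
1.2 Hardware Specifications
1.3 Technology and Features
1.4 Development Tools: Kinect SDK
2 Inertial Sensors
2.1 MYO Armband Smart Armband Inertial Sensor
2.2 Hardware Specifications
3 Action Recognition
3.1 Related Work
3.2 Violin Action Recognition
Chapter 3. Fundamentals of Deep Learning
1 Artificial Neural Networks
1.1 Learning Mechanisms of Neural Networks
1.2 History of Neural Networks
2 Deep Learning
2.1 Convolutional Neural Networks
2.2 3D Convolutional Neural Networks
2.3 Recurrent Neural Networks
2.4 Long Short-Term Memory Models
Chapter 4. Proposed Violin Action Recognition System and Decision Fusion
1 System Architecture
2 Violin Action Recognition
2.1 Preprocessing
2.2 Deep Learning Models
3 Multi-Model Decision-Level Fusion
4 VAP Violin Action Database
4.1 Bowing Techniques for Violin Action Recognition
4.2 Recording Environment Setup
Chapter 5. Experimental Results, Analysis, and Discussion
1 Experimental Environment
2 Comparison and Discussion of Experimental Results
Chapter 6. Conclusions and Future Work
References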

                                

