| Author: | 查忠敏 (Chung-Min Cha) |
|---|---|
| Title: | 基於Kinect之互動實驗室 (A Kinect-based Interactive Laboratory) |
| Advisor: | 蘇木春 (Mu-Chun Su) |
| Degree: | Master |
| Department: | 資訊電機學院 - 資訊工程學系 (Department of Computer Science & Information Engineering) |
| Year of publication: | 2013 |
| Graduation academic year: | 101 (ROC calendar) |
| Language: | Chinese |
| Pages: | 105 |
| Chinese keywords: | hand gesture recognition, hand feature extraction, radial basis function network, human-computer interaction, virtual experiment |
| Foreign keywords: | gesture recognition, action features, radial basis function network, human-computer interaction, virtual laboratory |
This thesis presents a gesture-based interactive laboratory simulation system built around two core modules: a "gesture recognition module" and an "interactive laboratory simulation module." The system lets users conduct simulated experiments through body movements, giving the simulation a realistic sense of hands-on operation. The proposed "Kinect-based interactive laboratory" currently targets chemistry experiments for senior high school students: ten hand gestures that recur across many elementary chemistry experiments were identified and serve as the system's action units. These ten gestures fall into two categories by range of motion: large-scale gestures, which involve wide hand movements while operating experiment equipment, and small-scale gestures, whose motion is confined to the palm and fingers.

A Kinect sensor serves as the motion-capture device, supplying hand information to the gesture recognition module, which recognizes gestures in two stages. First, large-scale gestures are classified from the user's skeleton information. Then, for small-scale gestures, the module extracts the gesture features proposed in this thesis and classifies them with a radial basis function network (RBFN). The interactive laboratory simulation module provides a simple interactive interface that combines the recognition results to carry out the simulated chemistry experiments. In addition, it includes an authoring interface so that teachers can easily add and modify experiment lessons in the future.

Experiments from several perspectives were designed to verify the system's performance: first its generalization to non-specific users, and then its robustness to changes in the operating environment. For the generalization test, users were divided into a specific-user group of three subjects and a non-specific-user group of eight. Data from the three specific users were used to train the RBF network, and the trained network was then tested on the eight non-specific users. For the robustness test, the experiments examined whether different standing distances and facing angles change the recognition rate. Under the same environment, the recognition rates were 97% for specific users and 96% for non-specific users; under changed environments, the rates ranged from 90% to 96% for specific users and from 86% to 91% for non-specific users. The results indicate that generalization across users in the same environment is sufficient (at least 96%), whereas environmental changes have a larger effect, with the recognition rate dropping to 86% at worst.
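The RBFN classifier used for the small-scale gestures can be illustrated with a minimal generic sketch: a hidden layer of Gaussian kernels followed by a linear output layer fitted by least squares. This is a standard RBFN formulation, not the thesis's actual implementation; the center-selection strategy, kernel width, and feature layout here are illustrative assumptions, and in the thesis the inputs would be the extracted hand-motion features.

```python
import numpy as np

class RBFN:
    """Minimal radial basis function network classifier (a generic sketch)."""

    def __init__(self, n_centers=10, sigma=1.0):
        self.n_centers = n_centers  # number of Gaussian hidden units (assumed)
        self.sigma = sigma          # shared kernel width (assumed)

    def _hidden(self, X):
        # Gaussian activations: phi_j(x) = exp(-||x - c_j||^2 / (2 * sigma^2))
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def fit(self, X, y, n_classes):
        # Illustrative center selection: a random subset of training samples
        rng = np.random.default_rng(0)
        idx = rng.choice(len(X), size=min(self.n_centers, len(X)), replace=False)
        self.centers = X[idx]
        # One-hot targets; output weights solved in closed form by least squares
        Phi = self._hidden(X)
        T = np.eye(n_classes)[y]
        self.W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
        return self

    def predict(self, X):
        # Class = output unit with the largest response
        return np.argmax(self._hidden(X) @ self.W, axis=1)
```

Training amounts to choosing kernel centers and solving one linear system, which is why RBFNs are attractive for small, well-separated gesture feature sets: there is no iterative backpropagation and inference is a single matrix product over the hidden activations.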