
Author: Chong Sheng Tee (鄭棕升)
Thesis Title: An Interactive System for Metaverse Virtual Conference Based on Deep Learning Gesture Recognition (基於深度學習手勢辨識之元宇宙虛擬會議互動系統)
Advisor: Timothy K. Shih (施國琛)
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2022
Graduation Academic Year: 110 (2021-2022)
Language: English
Number of Pages: 56
Keywords: gesture recognition, Metaverse, virtual conference, virtual reality, hand tracking, deep learning
    The technology of deep learning is now widely known around the world, and it has been developed into numerous projects and applied in many fields. Increasingly, virtual reality (VR) technology is combined with deep learning to build practical tools and systems that let users perform specific tasks. Virtual reality is introduced to reduce the burden on users, provide a good user experience, and overcome the usage limitations of certain scenarios. In this paper, we propose an interactive system for Metaverse virtual conferences that uses only a monocular RGB camera as its input device. The system applies deep-learning-based gesture recognition and uses the MediaPipe framework as the core architecture of its hand tracking. The hand joint data obtained from hand tracking are transferred to the virtual environment, where a 3D hand model is mapped to the processed joint data. This mapping uses the concept of inverse kinematics (IK) together with several algorithms to drive the corresponding pose and transform of the 3D hand model. Because the predicted z-axis values express depth only relative to the hand's own keypoints, we add conditions that relate the 3D hand keypoints to the virtual environment. To suppress the frame-to-frame jitter of the 3D hand model, we apply a smoothing algorithm that improves its stability in the virtual environment. The system provides two main functions, a virtual keyboard and virtual handwriting; the text or drawings entered through them can be recorded on post-it notes in the 3D virtual space. Interaction with virtual objects is based on the concept of object collision. Finally, the system evaluation shows that users can interact with virtual objects without a VR controller: simply moving the hand or bending the fingers makes the 3D hand model perform the corresponding actions. This technique of delivering a VR experience with a monocular RGB camera proves practical and stable in our system.
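As an illustration of the smoothing step described in the abstract, the sketch below applies a simple moving average (SMA, the technique named in the thesis contents) to per-frame hand-landmark coordinates. The class name, window size, and `(x, y, z)` landmark format are hypothetical conveniences for this example; the MediaPipe hand tracking and the Unity side of the pipeline are omitted.

```python
from collections import deque


class LandmarkSmoother:
    """Simple moving average (SMA) over the last `window` frames,
    applied per keypoint, to damp frame-to-frame jitter of the
    predicted hand landmarks before they drive the 3D hand model."""

    def __init__(self, window=5):
        # deque with maxlen automatically evicts the oldest frame
        self.history = deque(maxlen=window)

    def update(self, landmarks):
        """landmarks: list of (x, y, z) tuples, one per hand keypoint.
        Returns the per-keypoint mean over the buffered frames."""
        self.history.append(landmarks)
        n = len(self.history)
        return [
            tuple(
                sum(frame[i][axis] for frame in self.history) / n
                for axis in range(3)
            )
            for i in range(len(landmarks))
        ]


# Example: one keypoint jittering around (0.5, 0.5, 0.0)
smoother = LandmarkSmoother(window=3)
frames = [[(0.5, 0.5, 0.0)], [(0.75, 0.25, 0.0)], [(0.25, 0.75, 0.0)]]
for f in frames:
    smoothed = smoother.update(f)
print(smoothed[0])  # (0.5, 0.5, 0.0)
```

A larger window yields a steadier model at the cost of added latency, which is the usual trade-off when smoothing per-frame pose predictions.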

    摘要 (Chinese Abstract)
    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Works
      2.1 Unity
      2.2 Intel Realsense SDK 2.0
        2.2.1 Hand Tracking Module
      2.3 OpenPose
    3 Primary Research
      3.1 MediaPipe
        3.1.1 MediaPipe Hands
        3.1.2 Palm Detection Model
        3.1.3 Hand Landmark Model
      3.2 Inverse Kinematics (IK)
    4 Methodology
      4.1 The Integration of MediaPipe and Unity
        4.1.1 Open Sound Control (OSC)
        4.1.2 Hand Keypoints Data Preprocessing
      4.2 3D Hand Model
      4.3 IK in 3D Hand Model
        4.3.1 Motion of the 3D Fingers
      4.4 3D Hand Model Transform Processing
        4.4.1 Transform Rotation
        4.4.2 Moving Forward and Backward in Unity
      4.5 3D Object Collision
      4.6 Simple Moving Average (SMA)
    5 Experiments
      5.1 Environment Setup
        5.1.1 Equipment Setup
      5.2 System Evaluation
        5.2.1 Virtual Keyboard Scene
        5.2.2 Virtual Handwriting Scene
        5.2.3 Post-it Note in Virtual Space
    6 Conclusion
    7 Reference

