
Author: LI-YUAN-SI (李元熙)
Thesis title: Sign Language Display System Based on 3D Body Tracking and Virtual Try-on (基於3D全身人體追蹤及虛擬試衣之手語展示系統)
Advisor: SHIH-GUO-CHEN (施國琛)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication: 2023
Academic year of graduation: 111
Language: English
Number of pages: 44
Keywords: virtual try-on, human body modeling, sign language

    Sign language is a form of visual communication that relies on a combination of hand gestures, facial expressions, and body language to convey meaning. Millions of individuals worldwide who are deaf or hard of hearing, as well as those who communicate with them, use it on a daily basis. However, despite its importance, sign language recognition and translation remains a challenging task due to the complexity and variability of sign language.

    In recent years, computer vision techniques have been increasingly applied to sign language recognition and translation, with promising results. In this work, we introduce a sign language display system based on three-dimensional body modeling [1] and virtual try-on [2]. Our approach uses body mesh estimation to generate a 3D human model of the signer, which is then fed into a multi-garment network [2] to simulate the appearance of clothing on the signer.

    We collected a dataset of 100 sign language videos, each featuring a different signer performing a range of signs. To use these videos, we first apply YOLOv5 [17] to crop out the signer, creating a cleaner input for human mesh estimation. We then run a body mesh estimation algorithm designed to improve the accuracy of wrist rotation to extract the signer's body model from each video, and apply a virtual try-on method to simulate different types of clothing on the signer. The result is a virtual human model whose pose and shape match those of the original signer, dressed in clothes selected from a garment dataset. Finally, we combine these models frame by frame to generate a video showing a virtual human model in virtual clothes performing sign language.
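    The per-frame pipeline described above (detect and crop the signer, estimate the body mesh, dress it, then reassemble the frames in order) can be sketched as follows. This is a minimal illustration only: `detect_signer`, `estimate_body_mesh`, and `dress_mesh` are hypothetical stubs standing in for YOLOv5, the whole-body mesh estimator, and the Multi-Garment Network respectively; only the control flow mirrors the method described in the abstract.

    ```python
    import numpy as np

    def detect_signer(frame):
        """Stand-in for YOLOv5 person detection: return an (x1, y1, x2, y2) box.
        Here we simply assume the signer occupies the centre of the frame."""
        h, w = frame.shape[:2]
        return (w // 4, 0, 3 * w // 4, h)

    def estimate_body_mesh(crop):
        """Stand-in for whole-body mesh estimation: return SMPL-X-style
        pose (21 body joints x 3) and shape parameters."""
        return {"pose": np.zeros(63), "shape": np.zeros(10)}

    def dress_mesh(mesh_params, garment="shirt"):
        """Stand-in for the Multi-Garment Network: attach a garment choice."""
        return {**mesh_params, "garment": garment}

    def process_video(frames, garment="shirt"):
        """Crop each frame to the signer, estimate the mesh, dress it,
        and collect the dressed per-frame models in order."""
        dressed = []
        for frame in frames:
            x1, y1, x2, y2 = detect_signer(frame)
            crop = frame[y1:y2, x1:x2]        # cleaner input for mesh estimation
            mesh = estimate_body_mesh(crop)
            dressed.append(dress_mesh(mesh, garment))
        return dressed

    video = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(5)]
    models = process_video(video)
    print(len(models), models[0]["garment"])  # → 5 shirt
    ```

    The dressed per-frame models would then be rendered and concatenated into the output video; rendering is outside the scope of this sketch.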

    1. Introduction
    2. Related Work
       2.1 YOLO
       2.2 Deep Learning
       2.3 Convolutional Neural Network
       2.4 Human Body Estimation
           2.4.1 2D Pose Estimation
           2.4.2 3D Pose Estimation
           2.4.3 Mesh Estimation
       2.5 3D Human Model
       2.6 Virtual Try-on
       2.7 Internet Information Services
    3. Methodology
       3.1 Introduction
       3.2 YOLO Detection
       3.3 3D Whole-Body Estimation
       3.4 Virtual Try-on
       3.5 Deploy Website to IIS
       3.6 Conclusion
    4. Experiments
       4.1 Data Collection
       4.2 Experimental Setup
       4.3 Experimental Steps
    5. Conclusion
    6. References

    [1] Gyeongsik Moon, Hongsuk Choi, and Kyoung Mu Lee. Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022.
    [2] Bharat Lal Bhatnagar, Garvita Tiwari, Christian Theobalt, and Gerard Pons-Moll. Multi-Garment Net: Learning to Dress 3D People from Images. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
    [3] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
    [4] https://brohrer.mcknote.com/zh-Hant/how_machine_learning_works/how_convolutional_neural_networks_work.html
    [5] Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh. Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
    [6] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C. Chang, M. G. Yong, J. Lee, W. Chang, W. Hua, M. Georg, and M. Grundmann. MediaPipe: A Framework for Building Perception Pipelines. arXiv preprint arXiv:1906.08172, 2019.
    [7] Arindam Sengupta, Feng Jin, Renyuan Zhang, and Siyang Cao. mm-Pose: Real-Time Human Skeletal Posture Estimation Using mmWave Radars and CNNs. IEEE Sensors Journal, 2020.
    [8] https://mmpose.readthedocs.io/zh_CN/latest/demos.html
    [9] Yu Rong, Takaaki Shiratori, and Hanbyul Joo. FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021.
    [10] Angjoo Kanazawa, Michael J. Black, David W. Jacobs, and Jitendra Malik. End-to-End Recovery of Human Shape and Pose. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    [11] Muhammed Kocabas, Nikos Athanasiou, and Michael J. Black. VIBE: Video Inference for Human Body Pose and Shape Estimation. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
    [12] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A Skinned Multi-Person Linear Model. ACM Transactions on Graphics, 2015.
    [13] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive Body Capture: 3D Hands, Face, and Body from a Single Image. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
    [14] Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S. Davis. VITON: An Image-Based Virtual Try-on Network. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    [15] Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, and Meng Yang. Toward Characteristic-Preserving Image-Based Virtual Try-On Network. Computer Vision – ECCV 2018, 2018.
    [16] http://home.ustc.edu.cn/~pjh/openresources/cslr-dataset-2015/index.html
    [17] https://github.com/ultralytics/yolov5
    [18] https://medium.com/@_Xing_Chen_/yolov5-%E8%A9%B3%E7%B4%B0%E8%A7%A3%E8%AE%80-724d55ec774
    [19] Yao Feng, Vasileios Choutas, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Collaborative Regression of Expressive Bodies Using Moderation. 2021 International Conference on 3D Vision (3DV), 2021.
    [20] Vasileios Choutas, Georgios Pavlakos, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Monocular Expressive Body Regression Through Body-Driven Attention. Computer Vision – ECCV 2020, 2020.
    [21] Yuxiao Zhou, Marc Habermann, Ikhsanul Habibie, Ayush Tewari, Christian Theobalt, and Feng Xu. Monocular Real-time Full Body Capture with Inter-part Correlations. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
    [22] https://paperswithcode.com/sota/3d-human-pose-estimation-on-3dpw
    [23] https://github.com/vchoutas/smplx/tree/main/transfer_model
    [24] https://github.com/bharat-b7/MultiGarmentNetwork
    [25] https://zhuanlan.zhihu.com/p/256358005
    [26] https://github.com/facebookresearch/frankmocap/issues/91
    [27] https://github.com/mks0601/Hand4Whole_RELEASE
    [28] https://140.115.51.243/sign-language/list
