| Graduate student: | 劉起華 (Liu-Chi Hua) |
|---|---|
| Thesis title: | 樂器表演虛擬換衣系統:以吉他為例 (Instruments perform Virtual try-on system: using Guitar as an Example) |
| Advisor: | 施國琛 (Timothy K. Shih) |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Year of publication: | 2023 |
| Academic year of graduation: | 111 (ROC calendar) |
| Language: | English |
| Number of pages: | 48 |
| Keywords: | Virtual try-on, Human Parsing, Deep learning |
Deep learning has been applied across many domains and, in image processing, has superseded a number of traditional techniques. Virtual try-on is an important sub-field of this area. In 2D applications, virtual try-on is commonly used for online fitting of commercial garments, sparing consumers a trip to a physical store. In 3D applications, a human body model is generated and the garment is fitted onto it; the results are more stable than those of 2D methods, but the approach requires complex preprocessing such as 3D scanning. In this thesis, we propose a virtual try-on system for instrument performances that lets users change the outer garment worn in an instrument performance video without physically changing clothes, so the result can be shared with other users as short-form entertainment video. The system relies on deep learning-based human parsing, using SCHP and DensePose as the main segmentation models, and passes the segmented body and clothing regions to the try-on model. Because human parsing cannot represent occluded body parts, OpenPose is used to estimate body and hand skeletons that fill in the missing body information. HR-VITON serves as the main try-on model; it combines the parsing and skeleton information to generate plausible try-on results. To improve generalization and smooth the handling of boundaries, HR-VITON masks out ("digs holes in") part of the input image, but this also removes parts of the instrument and degrades how faithfully it is restored. We modify this hole-digging algorithm so that the instrument is still restored well, enabling users to create many different variations from performance videos they already have.
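The hole-digging change is described above only in prose. As a rough illustration of the idea, the sketch below builds a clothing-agnostic mask from a human-parsing label map, widens it slightly, and then removes instrument pixels from it so the instrument is never hollowed out. This is not the thesis's algorithm or HR-VITON's actual agnostic-image construction: the LIP-style label IDs, the square dilation, the gray fill value 128, and the helper names (`dilate`, `agnostic_mask`, `make_agnostic_image`) are all assumptions, and the binary instrument mask is assumed to come from some separate detector or segmenter.

```python
import numpy as np

# ASSUMPTION: LIP-style label IDs (the label set SCHP's LIP model uses);
# verify against the parsing model actually deployed.
UPPER_CLOTHES, DRESS, COAT = 5, 6, 7
LEFT_ARM, RIGHT_ARM = 14, 15
TRY_ON_LABELS = (UPPER_CLOTHES, DRESS, COAT, LEFT_ARM, RIGHT_ARM)


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2r+1)x(2r+1) square, built from padded shifts."""
    mask = mask.astype(bool)
    if radius <= 0:
        return mask.copy()
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def agnostic_mask(parsing: np.ndarray, instrument: np.ndarray, radius: int = 2) -> np.ndarray:
    """Hypothetical helper: region to hollow out, never touching instrument pixels."""
    hole = np.isin(parsing, TRY_ON_LABELS)   # garment and arm pixels
    hole = dilate(hole, radius)              # widen the hole for smoother boundaries
    return hole & ~instrument.astype(bool)   # protect the instrument


def make_agnostic_image(image, parsing, instrument, radius=2, fill=128):
    """Hypothetical helper: gray out the hole (fill=128 is an assumption)."""
    mask = agnostic_mask(parsing, instrument, radius)
    out = image.copy()
    out[mask] = fill                         # sets every channel of masked pixels
    return out, mask


# Usage: a 4x4 shirt region with a 2x2 "instrument" inside it.
parsing = np.zeros((8, 8), dtype=np.int64)
parsing[2:6, 2:6] = UPPER_CLOTHES
instrument = np.zeros((8, 8), dtype=bool)
instrument[3:5, 3:5] = True
image = np.full((8, 8, 3), 200, dtype=np.uint8)
agn, mask = make_agnostic_image(image, parsing, instrument, radius=1)
print(int(mask.sum()))  # 32
```

The order of operations is the design choice that matters here: dilating first widens the hole for smoother garment boundaries, and subtracting the instrument mask afterwards guarantees that no instrument pixel is overwritten, however large the dilation radius.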