| Graduate student: | 陳思頴 Sih-Ying Chen |
|---|---|
| Thesis title: | 多尺度區域強化之姿態遷移用於自動人像生成 Multi-Scale Region Reinforcement on Pose Transfer for Automatic Person Image Generation |
| Advisor: | 鄭旭詠 |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 (ROC calendar) |
| Language: | Chinese |
| Pages: | 64 |
| Keywords (Chinese): | 生成對抗網路, 姿態轉換, OpenPose |
| Keywords (English): | Generative Adversarial Network, Pose Transfer, OpenPose |
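As artificial intelligence and deep learning have flourished, they have been
widely applied across many fields and have made notable contributions in areas
such as semantic analysis and image recognition. Today the goal of artificial
intelligence is no longer only to give computers intelligence, but also to give
them creativity, as in writing poetry, composing music, or generating images:
creating something from nothing and opening up unlimited potential. This thesis
presents a pose transfer system that, given a person image and a target pose,
lets the computer automatically generate an image of that person in the target
pose.
This thesis uses a progressive pose transfer generative model architecture that
converts the pose of the person image to the target pose step by step. For this
conversion process, we propose the Multi-Scale Region Extractor, which extracts
the feature maps of specific regions in the person image to alleviate the loss
of information in the autoencoder and to lower the likelihood of broken limbs
in pose transfer. For the Multi-Scale Region Extractor, we also design a Region
Style Loss to optimize the training of the generative model. Finally, with this
architecture, a single person image is sufficient to generate videos in
different dance styles according to preference.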
With the rapid development of artificial intelligence and deep learning, these
techniques have been widely applied in many fields and have contributed
significantly to areas such as semantic analysis and image recognition. The goal
of artificial intelligence is no longer only to make computers intelligent but
also to make them creative: writing poems, composing music, or generating
images, making something out of nothing. This thesis proposes DanceGAN, which
lets a computer automatically generate person images that match a target pose.
In this thesis, we use a progressive pose transfer generative model
architecture, which transforms the pose of the person image into the target pose
progressively. For the transfer process, we propose the Multi-Scale Region
Extractor, which captures specific regions of the person image to mitigate the
information loss of the autoencoder. We also design the Region Style Loss for
the Multi-Scale Region Extractor to improve the training of the generative
model. Finally, based on this architecture, we can generate different dance
styles according to the user's preference from only one person image.
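The abstract does not define the Multi-Scale Region Extractor or the Region Style Loss, so the following is only a minimal sketch of one plausible reading, not the thesis's implementation. It assumes that "multi-scale" means feature maps taken at several resolutions, that a region is a square window around a body keypoint, and that the style term compares Gram matrices in the manner of perceptual style losses. The names `crop_region`, `gram_matrix`, and `region_style_loss` are hypothetical.

```python
# Illustrative sketch only; every function name and design choice here is an
# assumption, since the abstract gives no formula for the extractor or the loss.

def crop_region(feature_map, center, half_size):
    """Crop a square window from a C x H x W feature map given as nested lists.

    center is (row, col) in feature-map coordinates; the window is clipped
    at the borders of the map.
    """
    row, col = center
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    top, bottom = max(0, row - half_size), min(height, row + half_size + 1)
    left, right = max(0, col - half_size), min(width, col + half_size + 1)
    return [[r[left:right] for r in channel[top:bottom]] for channel in feature_map]


def gram_matrix(region):
    """Channel-by-channel Gram matrix of a C x h x w region, divided by C*h*w."""
    flat = [[v for r in channel for v in r] for channel in region]
    norm = len(flat) * len(flat[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / norm for fj in flat] for fi in flat]


def region_style_loss(generated_maps, target_maps, centers, half_size):
    """Sum of squared Gram-matrix differences for one region over all scales.

    generated_maps and target_maps hold one feature map per scale, with matching
    shapes at each scale; centers holds the region center at each scale.
    """
    loss = 0.0
    for gen, tgt, center in zip(generated_maps, target_maps, centers):
        g = gram_matrix(crop_region(gen, center, half_size))
        t = gram_matrix(crop_region(tgt, center, half_size))
        loss += sum((a - b) ** 2 for rg, rt in zip(g, t) for a, b in zip(rg, rt))
    return loss


# Toy usage: one channel at two scales (2x2 and 4x4), one region per scale.
coarse = [[[1.0, 2.0], [3.0, 4.0]]]
fine = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
print(region_style_loss([coarse, fine], [coarse, fine], [(0, 0), (1, 1)], 1))
```

In a real model the feature maps would be tensors from the generator's encoder or a pretrained network, and the region centers would come from the pose keypoints (the keywords name OpenPose); both are left abstract here.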