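Multi-Scale Region Reinforcement on Pose Transfer for Automatic Person Image Generation (多尺度區域強化之姿態遷移用於自動人像生成) | National Central University Master's and Doctoral Thesis System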


Graduate student:	Sih-Ying Chen (陳思頴)
Thesis title:	Multi-Scale Region Reinforcement on Pose Transfer for Automatic Person Image Generation (多尺度區域強化之姿態遷移用於自動人像生成)
Advisor:	鄭旭詠
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication:	2020
Academic year of graduation:	108 (ROC calendar)
Language:	Chinese
Number of pages:	64
Keywords:	Generative Adversarial Network, Pose Transfer, OpenPose

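With the rapid growth of artificial intelligence and deep learning, these techniques
have been applied widely across many fields and have contributed notably to tasks such
as semantic analysis and image recognition. Today the goal of artificial intelligence is
no longer only to give computers intelligence but also to give them creativity, as in
writing poetry, composing music, or generating images: creating something out of nothing
and opening up unlimited potential. This thesis presents a pose transfer system that,
given a person image and a target pose, lets the computer automatically generate a
person image matching the target pose.
This thesis uses a progressive pose transfer generative model architecture that moves
the pose of the person image toward the target pose step by step. For the transfer
process we propose the Multi-Scale Region Extractor, which extracts feature maps from
specific regions of the person image to mitigate the information loss of the autoencoder
and to reduce the likelihood of broken limbs in the transferred pose. For the Multi-Scale
Region Extractor we also design a Region Style Loss to improve the training of the
generative model. Finally, with the architecture of this system, a single person image is
enough to generate videos in different dance styles according to the user's preference.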

With the vigorous development of artificial intelligence and deep learning, these
techniques have been widely used in different fields and have made significant
contributions to tasks such as semantic analysis and image recognition. The goal of
artificial intelligence is no longer merely to make computers intelligent but also to
make them creative, as in writing poems, composing music, or generating images, creating
something out of nothing. This thesis proposes DanceGAN, which enables a computer to
automatically generate person images that match a target pose.
In this thesis, we use a progressive pose transfer generative model architecture,
which transforms the pose of the person image to the target pose step by step. In the
transfer process, we propose the Multi-Scale Region Extractor, which captures features
from specific regions of the person image to mitigate the information loss of the
autoencoder. We also design the Region Style Loss for the Multi-Scale Region Extractor
to improve the training of the generative model. Finally, based on the architecture of
this system, we can generate videos in different dance styles according to the user's
preference using only one person image.
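
The abstract names the Multi-Scale Region Extractor and the Region Style Loss but does not define them. As a rough illustration only, the sketch below assumes that a region is an axis-aligned box in image coordinates, that the same box is cropped from feature maps at several resolutions, and that the loss compares Gram matrices of the cropped features, in the spirit of common style losses. None of this is confirmed by the record: the function names `crop_region`, `gram_matrix`, and `region_style_loss`, the box format, and the stride values are all hypothetical.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]
Box = Tuple[int, int, int, int]       # (top, left, bottom, right) in image pixels


def crop_region(feat: FeatureMap, box: Box, stride: int) -> FeatureMap:
    """Crop one box from a feature map whose resolution is 1/stride of the image."""
    top, left, bottom, right = box
    # Map image coordinates to feature-map cells; always keep at least one cell.
    t, l = top // stride, left // stride
    b = max(t + 1, -(-bottom // stride))  # ceiling division
    r = max(l + 1, -(-right // stride))
    return [[row[l:r] for row in channel[t:b]] for channel in feat]


def gram_matrix(feat: FeatureMap) -> List[List[float]]:
    """Channel-by-channel Gram matrix, divided by the number of spatial cells.

    Assumes the feature map has at least one channel.
    """
    flat = [[v for row in channel for v in row] for channel in feat]
    n = max(1, len(flat[0]))
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in flat] for ci in flat]


def region_style_loss(
    gen_feats: Sequence[FeatureMap],
    ref_feats: Sequence[FeatureMap],
    box: Box,
    strides: Sequence[int],
) -> float:
    """Mean squared Gram difference for one region, averaged over all scales."""
    total = 0.0
    for gen, ref, stride in zip(gen_feats, ref_feats, strides):
        g = gram_matrix(crop_region(gen, box, stride))
        r = gram_matrix(crop_region(ref, box, stride))
        c = len(g)
        total += sum(
            (g[i][j] - r[i][j]) ** 2 for i in range(c) for j in range(c)
        ) / (c * c)
    return total / len(strides)
```

Calling `region_style_loss` with identical generated and reference features gives zero, and the value grows as the feature statistics inside the box diverge. A practical model would compute this on tensors from a pretrained network rather than on nested lists.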

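Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1 Research Motivation
2 Related Work
3 System Architecture
4 Thesis Organization
Chapter 2 Literature Review
1 DeepFashion Dataset
2 Feature Extractor of the VGG-19 Network Model
3 Image Semantic Segmentation
4 Generative Network Models
4.1 AutoEncoder
4.2 Generative Adversarial Network
Chapter 3 Research Method and System Implementation
1 Dataset
2 Transfer-Based Generative Model
2.1 Encoder & Decoder
2.1.1 Multiple Scale Region Extractor
2.1.2 Learnable Region Normalization
2.2 Pose-Attentional Transfer Network
3 Discriminator
4 Loss Functions
Chapter 4 Experimental Results
1 Hardware and Environment
2 Dataset
3 Evaluation Metrics
3.1 Inception Score
3.2 SSIM (Structural Similarity)
4 Method Comparison
4.1 Pose-Transfer GAN vs. DanceGAN
4.2 Different Region Extractions for MSRE
4.3 Effect of Different Normalizations
5 Speed Evaluation
Chapter 5 Conclusions and Future Work
References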


                                

