
Graduate Student: 陳書恆 (Shu-Heng Chen)
Thesis Title: 使用生成對抗學習的全卷積網路移除影像中的外嵌文字
Removing Embedded Text in Images via Fully Convolutional Networks with Generative Adversarial Learning
Advisor: 曾定章
Oral Defense Committee:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2017
Graduation Academic Year: 105
Language: English
Number of Pages: 61
Chinese Keywords: 影像修復、深度學習、生成對抗網路
Foreign Keywords: image inpainting, deep learning, generative adversarial network
  • Images with embedded text are among the most widely used media on the Internet. For example, netizens produce large numbers of memes for many purposes. In some situations, however, the added text spoils the appearance of an image and complicates other applications, such as scene recognition and object classification. The main goal of this study is therefore to propose a system that automatically removes embedded text from an image and completes the image.
    With the development of a new generation of computer technology, deep learning can be applied to image processing and outperforms traditional image processing methods. To obtain better results, the proposed system uses recent deep learning frameworks to build two modules: a text mask generation module and an image completion module. The mask generation module automatically detects the embedded text in a given image and outputs the corresponding mask. The image completion module takes the corrupted image and the corresponding mask as input and produces the restored image.
    In our experiments, we compare the proposed method with two mature non-deep-learning inpainting techniques. The results show that our method produces more natural and less flawed restorations than the traditional inpainting techniques.


    An image with embedded text is one of the most common 2D media on the web; for example, netizens produce large numbers of such pictures, or memes, for different purposes. In some situations, the added text spoils an otherwise attractive picture and prevents its use for other purposes, such as scene recognition and object classification. Therefore, in this study, we aim to propose a system that automatically removes the text from a given image and inpaints, or restores, the image.
    With a new generation of computer technology, deep learning architectures can be applied to the inpainting problem and produce better results than several traditional methods. The proposed system consists of two modules built on recent deep learning frameworks. The first module, the mask generation module, automatically detects the embedded text in a given image and produces the corresponding bitmap image mask. The second module, the image completion module, inpaints the corrupted image based on the given mask.
    In the experiments, we compare our results with those of two mature methods that do not use deep learning. We show that the proposed method provides more natural and less flawed results than these classic image inpainting methods.
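    The two-module pipeline described above can be sketched as follows. This is a minimal illustration of the data flow only (image → mask → completed image): the functions here are hypothetical stand-ins, using a simple intensity threshold in place of the thesis's mask-generation FCN and a mean fill in place of its GAN-trained completion network.

    ```python
    import numpy as np

    def generate_text_mask(image, text_value=255):
        """Mask generation module (stub): flag pixels matching the assumed
        text intensity. The thesis trains a fully convolutional network
        for this step instead."""
        return (image == text_value).astype(np.uint8)

    def complete_image(image, mask):
        """Image completion module (stub): fill masked pixels with the mean
        of the unmasked region. The thesis trains a GAN-based fully
        convolutional network for this step instead."""
        restored = image.astype(np.float64).copy()
        known = mask == 0
        restored[mask == 1] = restored[known].mean()
        return restored.astype(image.dtype)

    def remove_embedded_text(image):
        # Overall pipeline: detect the embedded text, then inpaint it.
        mask = generate_text_mask(image)
        return complete_image(image, mask)
    ```

    With the learned modules substituted in, the pipeline shape stays the same: the corrupted image goes through the mask generator, and the image plus mask go through the completion network.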

    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 System overview
      1.3 Thesis organization
    Chapter 2 Related Works
      2.1 Image inpainting
        2.1.1 Diffusion-based methods
        2.1.2 Exemplar-based methods
        2.1.3 Others
      2.2 Deep learning
        2.2.1 Convolutional neural networks
        2.2.2 Fully convolutional networks
        2.2.3 Generative adversarial nets
    Chapter 3 Methods
      3.1 System overview
        3.1.1 Mask generation module
        3.1.2 Image completion module
        3.1.3 Overall architecture
      3.2 Training
        3.2.1 Loss functions
        3.2.2 Learning algorithm
    Chapter 4 Experiments
      4.1 Dataset
        4.1.1 Build training dataset
        4.1.2 Preprocessing
      4.2 Environment setting
      4.3 Results
        4.3.1 Results on mask generation module
        4.3.2 Results on image completion module
    Chapter 5 Evaluation and Comparison
    Chapter 6 Conclusion and Future Works
    References

