
Author: Po-Wei Hsieh (謝柏維)
Thesis Title: Chinese Character Segmentation via Fully Convolutional Neural Network (基於全卷積神經網路之中文字分割機制)
Advisor: Po-Chyi Su (蘇柏齊)
Committee Members:
Degree: Master
Department: College of Electrical Engineering & Computer Science - Department of Computer Science & Information Engineering
Publication Year: 2019
Graduation Academic Year: 107 (ROC calendar)
Language: Chinese
Pages: 79
Chinese Keywords: 文字偵測 (text detection), 自然場景 (natural scenes), 全卷積神經網路 (fully convolutional neural networks)
Foreign Keywords: text detection, natural scenes, fully convolutional neural networks
Usage: Views: 11; Downloads: 0
    Text and man-made symbols in natural scenes convey important information, so extracting text from images has many potential uses. However, most existing methods are built around alphabetic (phonetic) scripts, and there remains room for improvement for logographic scripts such as Chinese. This study takes the individual Chinese character as the unit of annotation and proposes a text detection mechanism for Chinese in natural scenes that incorporates semantic segmentation. The proposed method consists of two stages. In the first stage, a Fully Convolutional Network (FCN) is trained as a Chinese text detection model for natural scenes; in addition to real-scene training data, synthetic data are added to compensate for gaps in the dataset and strengthen the model's detection ability. The second stage separates the character regions and groups the character boxes according to their spatial distribution, so that the assembled text strings remain valid across different writing directions and layouts, increasing the method's practical value. Experimental results show that the proposed method detects Chinese text effectively, and the influence of each step on the detection results is examined.
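The second stage described above must first turn the FCN's per-pixel character predictions into candidate character boxes. As a rough, hypothetical sketch of that extraction step (not the thesis's actual implementation), a 4-connected flood fill over a thresholded binary mask yields one bounding box per connected character region:

```python
from collections import deque

def extract_boxes(mask):
    """Extract bounding boxes of connected foreground regions from a
    binary mask (list of lists of 0/1) via 4-connected flood fill.
    Returns boxes as (min_row, min_col, max_row, max_col), in scan order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                # BFS over this connected component, tracking its extent
                q = deque([(r, c)])
                seen[r][c] = True
                r0, c0, r1, c1 = r, c, r, c
                while q:
                    y, x = q.popleft()
                    r0, c0 = min(r0, y), min(c0, x)
                    r1, c1 = max(r1, y), max(c1, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((r0, c0, r1, c1))
    return boxes

mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print(extract_boxes(mask))  # → [(0, 0, 1, 1), (1, 3, 2, 3)]
```

In practice the thesis's mask processing (Section 3.2.1) would operate on real probability maps; this toy mask only illustrates the region-to-box idea.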


    Text and artificial symbols in natural scenes convey important information, so capturing textual content from images has many potential applications. However, current methods are mostly designed for phonetic scripts, and methods for logographic scripts such as Chinese still leave room for improvement. This study proposes a semantic-segmentation-based Chinese character detection mechanism for natural scene images, in which each individual Chinese character is labeled. The proposed method is divided into two stages. In the first stage, we train a Fully Convolutional Network (FCN) as the Chinese text detection model for natural scenes; in addition to real natural scene images, synthetic data are added to the training set to enhance the model's detection ability. In the second stage, the text areas are separated and the character boxes are grouped by their spatial distribution, combining character information across different writing directions and layouts to improve the method's applicability. The experimental results show that the proposed method can effectively detect Chinese text in natural scenes, and we explore the impact of each step on the detection results.
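The arrangement step in the second stage — grouping character boxes by spatial distribution so that text strings survive different layouts — can be illustrated with a minimal, hypothetical grouping rule (the thesis's actual criteria in Section 3.2.3 are more elaborate and also handle vertical writing): two boxes join the same horizontal line when their vertical extents overlap and the gap between them is small relative to character height:

```python
def group_boxes(boxes, gap_ratio=1.0):
    """Group character boxes (x0, y0, x1, y1) into horizontal text lines.
    A box joins an existing line when its vertical extent overlaps that of
    the line's rightmost box and the horizontal gap is at most gap_ratio
    times the mean character height. Returns lines as lists of boxes,
    each sorted left to right."""
    lines = []
    for box in sorted(boxes):  # process left to right by x0
        x0, y0, x1, y1 = box
        placed = False
        for line in lines:
            px0, py0, px1, py1 = line[-1]  # rightmost box in this line
            v_overlap = min(y1, py1) - max(y0, py0)
            gap = x0 - px1
            height = ((y1 - y0) + (py1 - py0)) / 2
            if v_overlap > 0 and gap <= gap_ratio * height:
                line.append(box)
                placed = True
                break
        if not placed:
            lines.append([box])
    return lines

# Two characters side by side form one line; a third below starts a new line.
chars = [(0, 0, 10, 10), (12, 1, 22, 11), (0, 30, 10, 40)]
print(group_boxes(chars))  # → [[(0, 0, 10, 10), (12, 1, 22, 11)], [(0, 30, 10, 40)]]
```

A greedy rule like this is order-dependent; a production system would typically score all pairwise links before merging, but the sketch conveys the spatial-distribution idea.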

    Chinese Abstract · Abstract · Acknowledgments · Table of Contents · List of Figures · List of Tables
    Chapter 1  Introduction
        1.1 Research Background and Motivation
        1.2 Research Contributions
        1.3 Thesis Organization
    Chapter 2  Related Work
        2.1 Text Detection with Deep Learning
        2.2 Use of Synthetic Data
    Chapter 3  Proposed Method
        3.1 Chinese Character Localization
            3.1.1 Introduction to DeepLab v3+
            3.1.2 Real-Scene Dataset
            3.1.3 Synthetic Data Generation
            3.1.4 Data Labeling and Class Balancing
        3.2 Post-Processing of Candidate Text Regions
            3.2.1 Mask Processing
            3.2.2 Candidate Text Region Extraction
            3.2.3 Candidate Text Region Arrangement
            3.2.4 Character Shape Correction
    Chapter 4  Experimental Results
        4.1 Development Environment and Training Method
        4.2 Arrangement Comparison
        4.3 Related Application Workflows
    Chapter 5  Conclusions and Future Work
    References

