| Field | Value |
|---|---|
| Graduate Student: | 謝柏維 (Po-Wei Hsieh) |
| Thesis Title: | 基於全卷積神經網路之中文字分割機制 (Chinese Character Segmentation via Fully Convolutional Neural Network) |
| Advisor: | 蘇柏齊 |
| Committee Members: | |
| Degree: | Master |
| Department: | Department of Computer Science & Information Engineering, College of Information and Electrical Engineering |
| Year of Publication: | 2019 |
| Academic Year of Graduation: | 107 |
| Language: | Chinese |
| Pages: | 79 |
| Keywords (Chinese): | 文字偵測、自然場景、全卷積神經網路 |
| Keywords (English): | text detection, natural scenes, fully convolutional neural networks |
Texts and artificial symbols in natural scenes convey important information, so extracting text from images has many potential applications. However, most current methods are built on the processing of phonetic scripts, leaving room for improvement on logographic scripts such as Chinese. This study takes the individual Chinese character as the unit of annotation and proposes a Chinese character detection mechanism for natural scene images that incorporates semantic segmentation. The proposed method consists of two stages. In the first stage, a Fully Convolutional Network (FCN) is trained as a Chinese text detection model for natural scenes; besides real-scene training data, synthetic data are added to compensate for gaps in the dataset and strengthen the model's detection ability. The second stage separates the character regions and groups the character boxes according to their spatial distribution, so that the combined text strings remain valid across different writing directions and layouts, increasing the method's practical value. Experimental results show that the proposed method detects Chinese text effectively, and the impact of each step on the detection results is examined.
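The abstract does not spell out how character boxes are grouped by their spatial distribution. As a minimal sketch of one plausible approach (not the thesis's actual algorithm), the following hypothetical function greedily merges axis-aligned character boxes into horizontal text lines; the function name, the `(x, y, w, h)` box format, and the `gap_factor` threshold are all assumptions made for illustration:

```python
def group_characters(boxes, gap_factor=1.5):
    """Greedily group axis-aligned character boxes (x, y, w, h) into
    horizontal text lines: a box joins an existing line when its
    vertical center is close to that of the line's last box and the
    horizontal gap between them is small relative to the box widths.
    Returns a list of lines, each a list of boxes ordered left to right."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):  # scan left to right
        x, y, w, h = box
        cy = y + h / 2
        for line in lines:
            lx, ly, lw, lh = line[-1]  # rightmost box already in this line
            same_row = abs(cy - (ly + lh / 2)) < max(h, lh) / 2
            close = (x - (lx + lw)) <= gap_factor * (w + lw) / 2
            if same_row and close:
                line.append(box)
                break
        else:
            lines.append([box])  # start a new line for this box
    return lines


# Three characters in a row plus one isolated character below:
boxes = [(0, 0, 10, 10), (12, 1, 10, 10), (25, 0, 10, 10), (0, 50, 10, 10)]
lines = group_characters(boxes)
# → two lines: one with three boxes, one with the isolated box
```

For vertically written Chinese text, the same routine could be run on transposed coordinates (swapping x/y and w/h) and the result with the fewer, longer lines kept, which is one simple way the abstract's claim of handling different writing directions might be realized.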