
Graduate Student: 許馨文 (Sin-Wun Syu)
Thesis Title: 基於深度學習網路之繁體中文場景文字辨識策略 (Traditional Chinese Scene Text Recognition Strategies based on Deep Learning Networks)
Advisor: 蘇柏齊 (Po-Chyi Su)
Oral Examination Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 54
Chinese Keywords: 深度學習、光學字元辨識、繁體中文字辨識、字串校正
Foreign Keywords: Deep Learning, Optical Character Recognition, Scene Text Recognition, Text-line Correction
    Text recognition is an image recognition task that extracts textual features from images, with many application scenarios such as printed-document recognition, handwriting recognition, and license plate recognition. Compared with text recognition on scanned documents, text in natural scenes is more challenging to recognize because of diverse fonts, viewing angles, lighting changes, and occlusion by obstacles. Research on Traditional Chinese scene text recognition remains scarce, mainly because Traditional Chinese is widely used only in Taiwan, and, compared with alphanumeric characters, the number of Chinese character classes is enormous, making it very difficult to collect enough street-view text images; image labeling is also extremely time-consuming. This study uses multiple font files to generate a synthetic dataset and designs a variety of data augmentation methods for street-view text scenes, including adjustments to text size, skew angle, background texture, and character outlines, which are strategically and randomly applied during training so that the synthetic dataset simulates real street-view images. This not only strengthens the reliability of the data but also resolves class imbalance and possible labeling errors. This study proposes Traditional Chinese character recognition strategies based on deep learning networks and designs a text-line correction mechanism: when a few characters in a string are misrecognized, the correction method improves the overall recognition accuracy of the text-line. Experimental results show that the proposed approach effectively recognizes Traditional Chinese characters in natural scenes and achieves better accuracy than existing methods.
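The augmentation strategy described above (text size, skew angle, background texture, and character outline, drawn at random during training) can be sketched as a per-sample parameter plan. This is a minimal illustrative sketch: the function name, parameter ranges, and background categories below are assumptions, not the thesis's actual settings.

```python
import random

def sample_augmentation_plan(seed=None):
    """Draw one randomized augmentation plan for a synthetic sample,
    so repeated renderings of the same character yield varied images.
    Ranges and category names are illustrative, not from the thesis."""
    rng = random.Random(seed)
    return {
        "font_px": rng.randint(24, 96),          # rendered text size in pixels
        "skew_deg": rng.uniform(-20.0, 20.0),    # skew/rotation angle
        "background": rng.choice(["plain", "texture", "photo_crop"]),
        "outline": rng.random() < 0.3,           # draw a contour around glyphs
    }
```

Sampling a fresh plan per rendered character is what lets a fixed set of font files expand into a dataset that covers many street-view conditions.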


    Text recognition is an important task for extracting information from imagery data. Scene text recognition is one of its most challenging scenarios, since text appearing in natural scenes may have diversified fonts and sizes, be occluded by other objects, or be captured from varying angles and under different lighting conditions. In contrast to alphanumeric characters, Traditional Chinese Characters (TCC) have received less attention, and the large number of TCC classes makes it difficult to collect and label enough scene-text images. This research aims at developing a set of strategies for TCC recognition. We build a synthetic dataset using a variety of data augmentation methods, including text deformations, noise addition, and background changes, which often appear in natural scenes. A segmentation-based text spotting scheme locates the areas of text-lines and characters so that the characters can be recognized by the trained model and then linked into meaningful text-lines. The text-lines can be corrected via network search, which further boosts model performance after re-training. Experimental results show that the proposed strategies recognize TCC in natural scenes more accurately than existing publicly available tools.
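The thesis corrects recognized text-lines via network search; the same principle can be shown with a small lexicon lookup that replaces a recognized string with its closest known candidate. This is an illustrative stand-in, not the thesis's method: the function name, the sample lexicon, and the 0.6 similarity threshold are assumptions.

```python
from difflib import SequenceMatcher

def correct_text_line(recognized, lexicon):
    """Return the lexicon entry most similar to the recognized string
    when the match is strong enough; otherwise keep the raw recognition.
    Threshold 0.6 is an illustrative choice."""
    best, best_score = recognized, 0.0
    for candidate in lexicon:
        score = SequenceMatcher(None, recognized, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= 0.6 else recognized
```

For example, a shop-sign reading misrecognized by one character can snap back to the correct entry, while a string with no close candidate is left untouched, so correction never degrades a confident recognition.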

    1. Introduction
    2. Related Work
       2.1 Text Detection and Recognition based on Deep Learning
       2.2 Datasets of Text in Natural Scenes
       2.3 Synthetic Dataset
    3. Proposed Method
       3.1 Architecture
       3.2 Scene Text Detector
       3.3 Rectified Character Recognizer: Synthetic Dataset
       3.4 Rectified Character Recognizer: Data Augmentation Methods
       3.5 Rectified Character Recognizer: STR Network Design
           3.5.1 Feature Normalization
           3.5.2 Feature Extraction Network
           3.5.3 Pre-trained Model
           3.5.4 Label Smoothing Loss Function
           3.5.5 Learning Rate Decay
       3.6 Text-line Correction Mechanism
    4. Experimental Results
       4.1 Development Environment
       4.2 Evaluation of Character Recognition Accuracy
       4.3 Fine-tuning with Real Street View Data
       4.4 Text-line Accuracy Evaluation
       4.5 Display of Recognition Results
       4.6 Comparison with Other Commercial Software
           4.6.1 Test Case of Special Font
           4.6.2 Test Case of Night Scene
           4.6.3 Test Case of Text Skew
    5. Conclusion and Future Work
    References

