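Exploiting Distance to Boundary for Segmentation-based Scene-Text Spotting (利用邊界距離改進裁切式場景文字偵測) | National Central University Theses and Dissertations System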


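Graduate student:	侯昱宏 Yu-Hong Hou
Thesis title:	利用邊界距離改進裁切式場景文字偵測 Exploiting Distance to Boundary for Segmentation-based Scene-Text Spotting
Advisor:	蘇柏齊 Po-Chyi Su
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Year of publication:	2021
Academic year of graduation:	109 (ROC calendar, i.e., 2020–2021)
Language:	Chinese
Number of pages:	51
Keywords (Chinese):	深度學習、街景文字定位、語義分割 (deep learning, street-scene text localization, semantic segmentation)
Keywords (English):	Deep learning, scene text spotting, semantic segmentation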

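Because text in images carries rich information, scene text spotting helps to extract regions of interest from images. Many current scene text spotting methods adopt segmentation-based pixel prediction: each pixel is classified into a specific category, usually text or background, and the text pixels are then grouped into the text regions to be detected. The advantages of pixel prediction include ease of implementation, good performance, and flexibility of application. However, text in natural scenes varies in size, shape, and color, so correctly separating individual text instances remains a challenging issue. This study proposes using the distance to the boundary to help partition text pixels and thereby achieve more accurate scene text spotting. Our method can extract single characters, words, text lines, or patterns with similar textures, and it is also applicable to detecting text enclosed by rectangles, quadrilaterals, or arbitrarily shaped boxes. In addition, the text labeling process is simpler than that of other methods. We examine network architecture, class imbalance, and post-processing. Experimental results show the feasibility of this design and confirm that it helps improve segmentation-based scene text spotting.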
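
The abstract does not state how the distance-to-boundary label is defined. As a minimal sketch, assuming each text instance is given as a binary mask rasterized from its annotated rectangle, quadrilateral, or polygon, the code below assigns every text pixel its city-block (4-neighbor) distance to the nearest background pixel or image border, then normalizes each instance by its own maximum so the label lies in (0, 1]. The distance metric, the per-instance normalization, and the names `boundary_distance` and `distance_label` are illustrative assumptions, not the thesis's definition.

```python
from collections import deque

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def boundary_distance(mask):
    """City-block distance from each foreground pixel to the nearest
    background pixel or image border, via multi-source BFS (0 = background)."""
    h, w = len(mask), len(mask[0])
    dist = [[0] * w for _ in range(h)]
    queue = deque()
    # Seed: foreground pixels touching background or the border get distance 1.
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    dist[y][x] = 1
                    queue.append((y, x))
                    break
    # Propagate inward: each step away from the boundary adds 1.
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and dist[ny][nx] == 0:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def distance_label(instance_masks, h, w):
    """Merge per-instance distances, each normalized by its own maximum
    (hypothetical label definition used only for illustration)."""
    label = [[0.0] * w for _ in range(h)]
    for mask in instance_masks:
        dist = boundary_distance(mask)
        peak = max(max(row) for row in dist)
        if peak == 0:
            continue  # empty instance mask
        for y in range(h):
            for x in range(w):
                if dist[y][x]:
                    label[y][x] = max(label[y][x], dist[y][x] / peak)
    return label


# A 3x3 text instance inside a 5x5 image.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
print(distance_label([mask], 5, 5)[2])  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

Because this label depends only on each instance's mask, the same routine works whether the annotation is a rectangle, a quadrilateral, or an arbitrary polygon.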

Scene text spotting helps to locate regions of interest in images, as the text inside pictures often provides abundant information. Many existing schemes adopt a segmentation-based methodology, which classifies each pixel into a specific type, usually text or background. The major advantages of pixel prediction include ease of implementation, good performance, and flexibility. However, appropriately separating words in such schemes remains a challenging issue.
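
To make the separation problem concrete, the sketch below shows a generic segmentation-based baseline, not the thesis's method: binarize a per-pixel text probability map and group text pixels into 4-connected regions. The threshold of 0.5 and the name `group_text_pixels` are assumptions for illustration.

```python
from collections import deque


def group_text_pixels(prob, threshold=0.5):
    """Binarize a text-probability map and group text pixels into
    4-connected regions. Returns (label map with 0 = background, region count)."""
    h, w = len(prob), len(prob[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if prob[y][x] < threshold or labels[y][x]:
                continue
            count += 1  # start a new region at an unvisited text pixel
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and prob[ny][nx] >= threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Two 2-pixel "words" with a 1-pixel gap between them.
separated = [[0.9, 0.9, 0.2, 0.9, 0.9]]
touching = [[0.9, 0.9, 0.6, 0.9, 0.9]]  # blurred prediction across the gap
print(group_text_pixels(separated)[1])  # 2
print(group_text_pixels(touching)[1])   # 1
```

When the predicted probability in the gap between closely spaced words stays above the threshold, the two words collapse into a single region, which is the failure mode the abstract refers to.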

This research investigates the use of the distance to the boundary for partitioning text pixels to achieve more accurate scene text spotting. The proposed scheme can be used to extract single characters, words, text lines, or objects with similar textures. It is also applicable to detecting text bounded by rectangles, quadrilaterals, or boxes with arbitrary shapes. The labeling process is also simpler than that of other methods. The issues of network architecture, class imbalance, and post-processing are discussed. The experimental results demonstrate the feasibility of the proposed design, which can help to improve segmentation-based scene text spotting approaches.
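
One common way to turn a distance-to-boundary prediction into separate instances is core-and-expand decoding, shown here as a hedged sketch rather than the thesis's actual post-processing. Pixels whose predicted normalized distance exceeds a high threshold form instance cores, which stay apart even when words touch because the distance dips near each word's edge; the cores are then grown breadth-first into the remaining text pixels. The thresholds `core_thr=0.5` and `text_thr=0.1` and the name `split_by_distance` are assumptions.

```python
from collections import deque

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def split_by_distance(dist_map, core_thr=0.5, text_thr=0.1):
    """Separate touching text instances using a predicted normalized
    distance-to-boundary map with values in [0, 1]."""
    h, w = len(dist_map), len(dist_map[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    # 1) Label 4-connected cores: pixels far from any boundary.
    for y in range(h):
        for x in range(w):
            if dist_map[y][x] < core_thr or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBORS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and dist_map[ny][nx] >= core_thr):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    # 2) Grow all cores together (multi-source BFS) into the remaining text
    #    pixels; each pixel joins whichever core reaches it first.
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = cy + dy, cx + dx
            if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                    and dist_map[ny][nx] >= text_thr):
                labels[ny][nx] = labels[cy][cx]
                queue.append((ny, nx))
    return labels, count


# Two touching 3-pixel words: every pixel is text, but the distance dips at the seam.
labels, n = split_by_distance([[0.4, 1.0, 0.4, 0.4, 1.0, 0.4]])
print(n)       # 2
print(labels)  # [[1, 1, 1, 2, 2, 2]]
```

Thresholding the same six pixels as plain text would yield one region; the distance map supplies the cue that splits them. Growing all cores in a single breadth-first pass assigns each remaining pixel to its nearest core within the text region, with ties going to the core discovered first in scan order.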

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation and Contributions
  1.2 Thesis Organization
Chapter 2 Related Work
  2.1 Traditional Image Processing Methods
      Stroke Width Transform
      Maximally Stable Extremal Regions
      Sliding-Window Text Detection
  2.2 Deep Learning Methods
      Semantic Segmentation
      Object Detection
Chapter 3 Proposed Method
  3.1 Data Labeling
      Datasets
      Comparison of Labeling Schemes
      Label Generation
  3.2 Network Architecture
      HRNet
      ResNeXt
      Pipeline
      Loss Function
  3.3 Training Details
  3.4 Post-Processing
Chapter 4 Experimental Results
  4.1 Evaluation Method
  4.2 Ablation Study
  4.3 Post-Processing Experiments
  4.4 ICDAR Tests
      ICDAR2013
      ICDAR2017
      ICDAR2019_ArT
      Comparison of Different Models
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References

