| Graduate Student: | 呂信德 Hsin-Te Lu |
|---|---|
| Thesis Title: | 一個應用於攝影機擷取文字影像之光學文字辨識前處理系統 (An OCR Preprocessing System for Text Images Captured by Camera) |
| Advisors: | 范國清 Kuo-Chin Fan; 溫敏淦 Ming-Gang Wen |
| Committee Members: | |
| Degree: | Doctor |
| Department: | College of Electrical Engineering & Computer Science - Department of Computer Science & Information Engineering |
| Graduation Academic Year: | 99 (ROC calendar, i.e. AY 2010-2011) |
| Language: | English |
| Pages: | 185 |
| Keywords: | periphery feature, shapelet feature, typographical structure, camera-based OCR, text detection, text line construction, curved text line correction, multilingual document analysis, language identification of partial characters, character segmentation, touched character filter |
With advances in technology, personal mobile devices are now commonly equipped with camera lenses, giving phones photographic capability and letting users photograph objects of interest anywhere. If a mobile device can run an optical character recognition (OCR) system, it gains real-time text translation capability. However, text images captured by mobile devices often contain external interference that hinders character recognition. Since current commercial OCR software already achieves recognition rates above 99% on accurately segmented single-character images, the primary goal of this dissertation is to overcome these external influences in the preprocessing stage and segment the characters accurately.
Before character recognition, text detection, text-line construction, and language identification must first be performed. This dissertation adopts a bottom-up method to construct text lines, then applies k-means and least-mean-square-error analysis to each text line to extract its typographical structure. To handle multilingual documents, we propose a language identification scheme that combines a feature selector with a language identifier, using partial characters and radicals as recognition units: the feature selector automatically extracts shapelet features from a character image, which are then passed to the language identifier to determine the language of that image. For character segmentation, we propose a method that integrates a touched-character filter; periphery features extracted from the character images are fed to support vector machines to obtain confidence values. In the experiments, the overall character segmentation accuracy reaches 94.90%, which demonstrates the feasibility of the proposed system.
Due to the rapid development of mobile devices equipped with cameras, the realization of "what you see is what you get" no longer seems to be a dream. Mobile devices together with the proposed technique can thus serve as a translation tool, translating from one language to another by recognizing the texts present in the captured scenes. Images captured by cameras embed many external or unwanted environmental effects that need not be considered in traditional optical character recognition (OCR). In this dissertation, we segment a text image captured by a mobile device into individual single characters to facilitate later OCR kernel processing. Before proceeding with character segmentation, text detection, text-line construction, and language identification need to be performed in advance. In our work, we construct text lines from text blocks using a bottom-up method. After text-line construction, the typographical structure is analyzed by the proposed k-means and least-mean-square-error method. The extracted typographical structure features are incorporated to facilitate the later text-line completion, local binarization, and character segmentation tasks. To cope with multilingual documents, a combined language identifier, consisting of a feature selector and a language identifier, is proposed that can successfully identify the language of partial characters. Shapelet features extracted by the feature selector are utilized to identify the language of text blocks. A novel character segmentation method that integrates touched-character filters is then applied to text images captured by cameras. In addition, periphery features are extracted from the segmented images of touched characters and fed as inputs to support vector machines (SVM) to calculate confidence values. In our experiments, the accuracy rate of the proposed character segmentation system is 94.90%, which demonstrates the feasibility and effectiveness of the proposed method.
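The typographical-structure step described above pairs k-means with a least-mean-square-error fit to recover the characteristic horizontal lines of a text line (ascender line, x-line, baseline, descender line). As a rough illustration of the clustering half only, the sketch below runs a tiny 1-D 2-means over the top and bottom edges of component bounding boxes; the function names and toy coordinates are hypothetical and this is not the dissertation's implementation.

```python
# Minimal sketch (assumed, not the dissertation's code): estimate the
# typographical structure of a text line by 2-means clustering the top and
# bottom y-coordinates of component bounding boxes (y grows downward).

def kmeans_1d(values, iters=50):
    """Tiny 1-D 2-means; returns the two cluster centers, sorted ascending."""
    centers = [min(values), max(values)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            i = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            groups[i].append(v)
        # Recompute centers; keep the old center if a cluster went empty.
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)

def typographic_lines(boxes):
    """boxes: list of (top, bottom) y-coordinates of character components.
    Returns (x-line, ascender line, baseline, descender line) estimates."""
    tops = [t for t, _ in boxes]
    bottoms = [b for _, b in boxes]
    ascender, xline = kmeans_1d(tops)        # smaller tops belong to ascenders
    baseline, descender = kmeans_1d(bottoms)  # larger bottoms to descenders
    return xline, ascender, baseline, descender
```

For a toy line resembling "hallo" with ascenders at y=0, x-height tops at y=4, baseline at y=10, and one descender reaching y=14, `typographic_lines([(0, 10), (4, 10), (0, 10), (4, 10), (4, 14)])` recovers the four lines at 4, 0, 10, and 14. A real system would follow this with the least-mean-square-error line fit across the page and handle lines containing no ascenders or descenders.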