| Graduate student: | 王培儀 Pei-Yi Wang |
|---|---|
| Thesis title: | 利用欄位群聚特徵和四個方向相鄰樹作表格文件分類 Table-Form Classification Using Field Clustering Features and Four Directional Adjacency Trees |
| Advisor: | 范國清 Kuo-Chin Fan |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Academic year of graduation: | 88 |
| Language: | Chinese |
| Number of pages: | 86 |
| Chinese keywords: | 方向相鄰樹, 欄位抽取, 線條抽取, 表格文件分類, 群聚 |
| Foreign-language keywords: | four directional adjacency trees, field extraction, line extraction, table-form classification, clustering |
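In recent years, office automation has become the trend of the times, and automatic document processing systems play an indispensable part in it. Offices handle a wide variety of documents, of which table-form documents make up the vast majority and are widely used. Table-form classification therefore plays an important role in automatic document processing systems.
This thesis proposes a new method for classifying table-form documents and describes it in depth. The method uses the fields of a table-form document as its basic features. We therefore first extract all table lines, and then use the intersection relations between table lines together with a top-left/bottom-right corner pairing algorithm to extract every field in the table. Once all fields have been extracted, we derive from them two kinds of features that capture the relationships between fields and serve as the basis for matching: the field clustering feature and the directional adjacency tree feature. Table-form documents are classified by matching these two features against the sample documents already in the database. Experimental results confirm that the proposed classification method is feasible.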
Office automation has become a trend in recent years, and many techniques have been proposed to achieve it. Among them, automatic document processing is one of the most important. Offices process many kinds of documents; most are table-form documents, which are used extensively in different applications. Table-form classification therefore plays an important role in automatic document processing systems.
In this thesis, we present a novel method for recognizing table-form documents. The method adopts the fields in a table-form document as the primary feature for table-form classification. The system first extracts all table lines and then uses the line-crossing relation matrix and the corner-pair searching algorithm to extract all fields embedded in the document. From these fields it extracts two features that represent the interrelationships between fields, namely the field clustering feature and the four directional adjacency trees (FDAT), which serve as the matching basis of the classification system. Finally, the table-form is recognized by using these two features to compare it against a stored table-form library. Experimental results demonstrate the feasibility and validity of the proposed system in recognizing table-form documents.
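The field-extraction step described in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's data structures, so everything below is an assumption made for illustration: table lines are taken as already extracted, axis-aligned segments on an integer grid, the function names (`crossing_matrix`, `extract_fields`, `directional_adjacency`) are hypothetical, and the adjacency lists only approximate the idea behind FDAT. They are not the tree construction or the clustering feature used in the thesis.

```python
from itertools import product

# Assumed input format (not from the thesis):
#   horizontal line = (y, x_start, x_end), vertical line = (x, y_start, y_end)

def crossing_matrix(h_lines, v_lines):
    """C[i][j] is True when horizontal line i and vertical line j intersect."""
    return [[x1 <= x <= x2 and y1 <= y <= y2 for (x, y1, y2) in v_lines]
            for (y, x1, x2) in h_lines]

def extract_fields(h_lines, v_lines):
    """Pair each top-left corner with its nearest closing bottom-right corner.

    Returns fields as (x0, y0, x1, y1) rectangles.
    """
    C = crossing_matrix(h_lines, v_lines)
    h_order = sorted(range(len(h_lines)), key=lambda i: h_lines[i][0])
    v_order = sorted(range(len(v_lines)), key=lambda j: v_lines[j][0])

    def find_field(top, left):
        y0, x0 = h_lines[top][0], v_lines[left][0]
        for right in v_order:
            x1 = v_lines[right][0]
            # Right edge: lies to the right, meets the top line, runs downward.
            if x1 <= x0 or not C[top][right] or v_lines[right][2] <= y0:
                continue
            for bottom in h_order:
                y1 = h_lines[bottom][0]
                # Bottom edge: lies below, meets the left line, runs rightward.
                if y1 <= y0 or not C[bottom][left] or h_lines[bottom][2] <= x0:
                    continue
                if C[bottom][right]:  # the bottom-right corner closes the box
                    return (x0, y0, x1, y1)
        return None

    fields = []
    for top, left in product(h_order, v_order):
        if C[top][left]:
            field = find_field(top, left)
            if field is not None:
                fields.append(field)
    return fields

def directional_adjacency(fields):
    """For every field, list the fields sharing its left/right/up/down edge."""
    adj = {f: {"left": [], "right": [], "up": [], "down": []} for f in fields}
    for a, b in product(fields, repeat=2):
        if a == b:
            continue
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        overlap_x = min(ax1, bx1) > max(ax0, bx0)
        overlap_y = min(ay1, by1) > max(ay0, by0)
        if bx0 == ax1 and overlap_y:
            adj[a]["right"].append(b)
        if bx1 == ax0 and overlap_y:
            adj[a]["left"].append(b)
        if by0 == ay1 and overlap_x:
            adj[a]["down"].append(b)
        if by1 == ay0 and overlap_x:
            adj[a]["up"].append(b)
    return adj

# A 2x2 form whose bottom row is one merged field (y grows downward).
h_lines = [(0, 0, 2), (1, 0, 2), (2, 0, 2)]
v_lines = [(0, 0, 2), (1, 0, 1), (2, 0, 2)]
fields = extract_fields(h_lines, v_lines)
adj = directional_adjacency(fields)
```

On this toy form, the short vertical line at `x = 1` stops at the middle horizontal line, so the sketch yields two fields in the top row and one merged field below them. A classifier along the lines of the abstract would then compare such per-field neighbor structures against those of stored sample forms; how that comparison is scored is not stated in the abstract and is left out here.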