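A Study of an Integrated Extraction Tool for Multiple XML Documents | National Central University Theses and Dissertations System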

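Graduate student:	林昌正 Chang-cheng Lin
Thesis title:	多XML文件整合萃取工具之研究 An Integrated extraction tool for multi-XML Documents
Advisor:	許秉瑜 Ping-yu Hsu
Oral defense committee:
Degree:	Master's
Department:	College of Management - Department of Business Administration
Academic year of graduation:	96 (ROC calendar, i.e., 2007-2008)
Language:	Chinese
Number of pages:	38
Keywords (Chinese):	Extensible Markup Language, extraction tool
Keywords (English):	Extraction Tool, XML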

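In recent years, applications of XML technology have become increasingly widespread. The mainstream office suites on the market, such as OpenOffice.org and Microsoft Office, have both switched to XML as their document storage format, and in e-commerce XML is gradually becoming an important format for exchanging data between parties. As more applications adopt XML, research on the technology has become increasingly active. However, previous XML research has not systematically performed integrated extraction of structure and text content across multiple documents. Given how active research on XML document data is, this topic is important and merits further research and effort.
This study builds a tool that performs integrated data extraction on multiple XML documents at once. The extracted data comprise document structure data, document text content, and document fragments, so that future research no longer needs to process the original documents and can instead work directly with the data produced by this extraction tool.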

XML has become increasingly popular in recent years. Mainstream office suites such as OpenOffice.org and Microsoft Office now use XML as their file storage format, and XML is gradually becoming a major format for data exchange in e-commerce. As XML is used more widely, research on XML has become more prevalent. However, no existing work systematically extracts structures and contents from multiple documents in an integrated way. Given the popularity of XML, this problem is important and worth studying.
This study builds a tool that extracts data from multiple XML documents; the extracted data consist of the documents' structures, contents, and fragments. Researchers therefore no longer need to process the original documents and can conduct their research directly on the extracted data.
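To make the three kinds of extracted data more concrete, here is a minimal Python sketch using only the standard library. It is not the thesis's own structure-analysis, keyword-matrix, or document-splitting algorithm, which the abstract does not describe. It assumes that "structure" means the set of root-to-element tag paths (counted by how many documents contain each path), that "text content" is summarized as a document-by-keyword count matrix with no stemming, and that "fragments" are subtrees cut out at a caller-chosen tag. All function names (`element_paths`, `structure_summary`, `keyword_matrix`, `split_fragments`) are hypothetical.

```python
# Hypothetical sketch of multi-document XML extraction; NOT the thesis's own
# structure-analysis, keyword-matrix, or document-splitting algorithms.
import copy
import re
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(elem, prefix=""):
    """Yield the root-to-element tag path of every element in pre-order,
    e.g. '/doc', '/doc/sec', '/doc/sec/p'."""
    path = f"{prefix}/{elem.tag}"
    yield path
    for child in elem:
        yield from element_paths(child, path)


def structure_summary(roots):
    """Structure data (assumed form): for each tag path, the number of
    documents in which it occurs at least once."""
    summary = Counter()
    for root in roots:
        summary.update(set(element_paths(root)))
    return summary


def keyword_matrix(roots):
    """Text content (assumed form): a document-by-keyword count matrix.

    Tokens are lowercase runs of ASCII letters taken from all text nodes;
    no stemming or stop-word removal is applied.
    Returns (sorted vocabulary, one row of counts per document).
    """
    counts = [
        Counter(re.findall(r"[a-z]+", " ".join(root.itertext()).lower()))
        for root in roots
    ]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


def split_fragments(roots, tag):
    """Fragments (assumed form): every element with the given tag, serialized
    as a standalone XML string and paired with its document index."""
    fragments = []
    for doc_id, root in enumerate(roots):
        for elem in root.iter(tag):
            frag = copy.deepcopy(elem)
            frag.tail = None  # drop the parent's text that follows this element
            fragments.append((doc_id, ET.tostring(frag, encoding="unicode")))
    return fragments


if __name__ == "__main__":
    docs = [
        "<doc><title>XML tools</title><sec><p>Extract XML structure</p></sec></doc>",
        "<doc><title>Office files</title><sec><p>XML storage</p></sec>"
        "<sec><p>ODF</p></sec></doc>",
    ]
    roots = [ET.fromstring(d) for d in docs]
    print(structure_summary(roots))
    vocab, matrix = keyword_matrix(roots)
    print(vocab)   # ['extract', 'files', 'odf', 'office', 'storage', 'structure', 'tools', 'xml']
    print(matrix)  # [[1, 0, 0, 0, 0, 1, 1, 2], [0, 1, 1, 1, 1, 0, 0, 1]]
    print(split_fragments(roots, "sec"))
```

Because each document is parsed once and all three outputs are derived from the parsed trees, later analyses can start from these outputs instead of re-reading the original files, which is the workflow the abstract describes.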

Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Motivation
1-2 Research Objectives
2. Literature Review
2-1 Extensible Markup Language (XML)
2-1-1 Characteristics and Advantages
2-1-2 Basic Contents of an XML Document
2-1-3 Document Type Definition
2-2 Related XML Research
2-2-1 XML Retrieval
2-2-2 XML Data Mining
3. Research Methodology
3-1 Structure Analysis Algorithm
3-2 Keyword Matrix Algorithm
3-3 Document Splitting
3-3-1 Automatic Splitting Algorithm
3-3-2 Conditional Splitting Algorithm
4. Empirical Analysis
4-1 Interface of the XML Data Integration and Extraction Tool
4-2 Execution Results
4-2-1 Structure Analysis
4-2-2 Keyword Matrix
4-2-3 Document Splitting
5. Conclusions and Suggestions for Future Research
5-1 Conclusions
5-2 Suggestions for Future Research
References

                                

