
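Author: Liang-Lu Shih (施亮如)
Thesis title: Applying Ontology to Relevant Document Discovery (引用本體論至相關文件檢索之研究)
Advisor: Eric Y. Cheng (鄭裕勤)
Oral examination committee:
Degree: Master
Department: College of Management - Department of Information Management
Academic year of graduation: 94 (ROC calendar, i.e. 2005-2006)
Language: English
Number of pages: 56
Chinese keywords: relevant document discovery (相關文件檢索), ontology extraction (本體論萃取), ontology mapping (本體論對應), ontology (本體論)
English keywords: Relevant Document Discovery, Ontology Extraction, Ontology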
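  • The problem of relevant document discovery has been widely discussed, and a variety
    of methods and techniques have been proposed or deployed in production document
    retrieval systems. Most approaches let the user enter a query, preprocess the query
    string, and then perform full-text matching to find relevant documents. Alternatively,
    they let the user search specific fields, such as title, abstract, keywords, and
    references, and convert those fields into a particular model for similarity
    calculation, for example a vector model combined with TF/IDF to compute the similarity
    between articles. On the whole, these methods come mainly from the field of
    information retrieval.
    The Semantic Web is an emerging research area that has been combined with other areas
    to produce a range of applications, including knowledge management, agent
    communication, and web services. The core concept of the Semantic Web is the ontology:
    using a markup language, an ontology fully expresses the semantics of specific content,
    so that the content is not only human-readable but can also be processed further by
    computer systems. Most relevant document discovery methods proposed so far handle the
    semantic properties of document content only to a limited extent, and few publications
    discuss applying ontologies to relevant document discovery. These two observations
    motivated this study.
    In this study, an architecture for applying ontologies to relevant document discovery
    is designed and a prototype system is implemented. The system takes a document as
    input and outputs the documents related to it. Processing is divided into the
    following steps: (1) convert the input document into an ontology format; (2) if the
    input document already exists in the system, output the related documents directly;
    (3) if the input document does not exist in the system, calculate the similarity
    between the input document and the documents already in the system. Two similarity
    calculation methods are designed for this purpose, and a genetic algorithm is used to
    compute the weight assigned to each method's result, yielding the final similarity
    value.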
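
    The weighting step mentioned at the end of this abstract can be illustrated with a minimal Python sketch. The record does not give the thesis's actual fitness function, encoding, or genetic operators, so everything here is an assumption made for illustration: the names `combined_similarity`, `fitness`, and `evolve_weight` are hypothetical, the two weights are assumed to sum to 1, and the fitness is assumed to be the negative squared error against hand-labelled relevance targets.

```python
import random


def combined_similarity(concept_sim, instance_sim, w_concept):
    """Weighted sum of two similarity scores (assumption: weights sum to 1)."""
    return w_concept * concept_sim + (1.0 - w_concept) * instance_sim


def fitness(w_concept, labelled_pairs):
    """Hypothetical fitness: negative squared error against target relevance."""
    return -sum(
        (combined_similarity(c, i, w_concept) - target) ** 2
        for c, i, target in labelled_pairs
    )


def evolve_weight(labelled_pairs, pop_size=20, generations=40, seed=0):
    """Toy genetic search for the concept-similarity weight in [0, 1]."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        population.sort(key=lambda w: fitness(w, labelled_pairs), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0            # crossover: average two parents
            child += rng.gauss(0.0, 0.05)    # mutation: small Gaussian noise
            children.append(min(1.0, max(0.0, child)))  # clamp to [0, 1]
        population = parents + children
    return max(population, key=lambda w: fitness(w, labelled_pairs))
```

    Each element of `labelled_pairs` is a `(concept_sim, instance_sim, target)` tuple. Keeping the parents in every generation means the best weight found so far is never lost, which is one simple design choice among many that a genetic algorithm could make.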


    Relevant document discovery is a practical problem that has attracted many
    researchers, and a variety of solutions have been proposed. Some have been adopted in
    real-world environments, such as electronic article publishers, which offer search
    options including keyword, full-text, phrase, and Boolean expression search for users
    to retrieve documents. Most relevant document discovery techniques originate in the
    field of information retrieval. The core concept of the Semantic Web is the ontology,
    which has been applied in various domains such as web services, agent communication,
    and knowledge management. However, few papers have applied ontologies to relevant
    document discovery. In this study, therefore, ontologies are applied to relevant
    document discovery, and a prototype system is built to implement the proposed method.
    Given a user-selected input document, the prototype system returns a number of closely
    related documents already stored in its repository. Processing is divided into the
    following steps: (1) transform the input text document into OWL format; (2) determine
    whether the input document already exists in the system's ontology repository; (3) if
    it does not, calculate the similarity between the input ontology and the documents
    stored in the ontology repository, and retrieve the documents with the highest
    similarity values. Ontology extraction and similarity calculation are the core
    components through which the concept of ontology is applied in the prototype system.
    The objective of ontology extraction is to transform documents in TXT format into OWL
    format according to the characteristics of ontologies. Similarity calculation
    comprises two methods, concept similarity and instance similarity, both of which are
    proposed and implemented in the prototype system.
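
    The control flow of steps (2) and (3) above can be sketched in Python. This is a sketch under stated assumptions, not the thesis's implementation: the OWL extraction of step (1) is assumed to have already produced a simple concept-to-instances mapping, the class `DocOntology` and the function `discover` are hypothetical names, and Jaccard overlap is swapped in as a stand-in because this record does not define the thesis's concept and instance similarity measures.

```python
from dataclasses import dataclass, field


@dataclass
class DocOntology:
    """Hypothetical stand-in for an extracted ontology: concept -> instances."""
    doc_id: str
    concepts: dict = field(default_factory=dict)


def jaccard(a, b):
    """Jaccard overlap of two collections (stand-in measure, an assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def concept_similarity(x, y):
    # Compare which concepts (schema elements) the two ontologies contain.
    return jaccard(x.concepts.keys(), y.concepts.keys())


def instance_similarity(x, y):
    # Compare the instances under each concept the two ontologies share.
    shared = x.concepts.keys() & y.concepts.keys()
    if not shared:
        return 0.0
    return sum(jaccard(x.concepts[c], y.concepts[c]) for c in shared) / len(shared)


def discover(query, repository, related_cache, w_concept=0.5, top_k=3):
    """Return up to top_k (doc_id, score) pairs related to the query."""
    # Step 2: a known document returns its stored related documents directly.
    if query.doc_id in related_cache:
        return related_cache[query.doc_id]
    # Step 3: otherwise score the query against every stored ontology.
    scored = []
    for doc in repository.values():
        score = (w_concept * concept_similarity(query, doc)
                 + (1.0 - w_concept) * instance_similarity(query, doc))
        scored.append((doc.doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

    Here `repository` maps document identifiers to stored ontologies and `related_cache` maps known identifiers to precomputed results. Both structures, and the default weight of 0.5, are illustrative choices.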

    1. Introduction
    1.1 Research Background
    1.2 Research Motivation
    1.3 Purpose
    2. Literature Review
    2.1 OWL Ontology
    2.2 Ontology Extraction
    2.3 Similarity Calculation
    3. Method of Relevant Document Discovery
    3.1 System Architecture
    3.2 Ontology Extraction
    3.2.1 Preprocess
    3.2.2 Find the Associated Content of Schema
    3.2.3 Extract Instances from Content
    3.2.4 Constructing Ontology
    3.3 Similarity Calculation
    3.3.1 Definition of Similarity Calculation
    3.3.2 Similarity Method 1: Concept Similarity
    3.3.3 Similarity Method 2: Instance Similarity
    3.3.4 Operational Definition of Instance Similarity
    3.3.5 Weights of Similarity Measures
    4. Implementation and Evaluation
    4.1 Implementation Tools and Environment
    4.2 Evaluation of Ontology Extraction
    4.2.1 Implement Sentences as Instances
    4.2.2 Implement Multi-words as Instances
    4.3 Evaluation of the Prototype System
    4.3.1 Evaluation Method
    4.3.2 Experiment 1: Only Concept Similarity
    4.3.3 Experiment 2: Only Instance Similarity
    4.3.4 Experiment 3: Concept and Instance Similarity
    5. Conclusion and Future Direction
    5.1 Conclusion
    5.2 Contribution
    5.3 Limitation
    5.4 Future Direction
    References

