
Student: 何承道 (Cheng-Tao Ho)
Thesis title (Chinese): A Study of Mining Strategies for Frequent Isomorphic Graphs
Thesis title (English): HybridGMiner: The Mining Strategy on Frequent Isomorphism Graph Structure
Advisor: 張嘉惠 (Chia-Hui Chang)
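Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Academic year of graduation: 94 (ROC calendar, i.e., 2005/06)
Language: Chinese
Pages: 38
Chinese keywords: graph mining, pattern mining, graph isomorphism
English keywords: pattern mining, graph isomorphism, graph structures, graph mining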
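  • As mining techniques for frequent itemsets and sequential patterns have matured, it is natural to go one step further and study graph mining, a form of pattern mining that captures a broader range of relationships in data. Graph mining has very wide applications. Well-known domains include chemistry, biology, and computer networks, along with any other real-world data that can be represented as graph patterns; all of these need graph-pattern mining techniques to support data analysis and prediction. The main challenge in graph mining is the subgraph/graph isomorphism problem. In this thesis we propose an algorithm that combines graph canonical forms with an embedded data structure to mine graph databases efficiently. Its key ideas are to use canonical forms to eliminate duplicate enumeration, and to record where each graph pattern occurs in the database (an embedding list) so that subgraph isomorphism checks are avoided entirely. Experiments show that the proposed algorithm outperforms gSpan on both synthetic and real data.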
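The canonical-form idea in the abstract can be illustrated with a small sketch. This is not the thesis's encoding, which this record does not describe; it is a brute-force variant with hypothetical names that takes the lexicographically smallest (vertex labels, upper-triangle edge labels) code over every vertex ordering. Two labeled graphs are isomorphic exactly when their codes match, so keying a dictionary on the code discards duplicate enumerations. Trying all n! orderings limits it to tiny patterns; gSpan, for comparison, uses a minimum DFS code instead.

```python
from itertools import permutations


def canonical_code(vlabels: list[str], edges: dict[frozenset[int], str]) -> tuple:
    """Hypothetical brute-force canonical code for a small labeled undirected graph.

    vlabels[i] is the label of vertex i; edges maps frozenset({u, v}) to an edge label.
    The code is the lexicographically smallest (vertex labels, upper-triangle
    edge labels) pair over all n! vertex orderings.
    """
    n = len(vlabels)
    best = None
    for order in permutations(range(n)):
        # order[k] is the original vertex placed at position k
        labels = tuple(vlabels[v] for v in order)
        # "" stands for "no edge" (assumed never to be a real edge label)
        adj = tuple(
            edges.get(frozenset((order[i], order[j])), "")
            for i in range(n)
            for j in range(i + 1, n)
        )
        code = (labels, adj)
        if best is None or code < best:
            best = code
    return best


def dedupe(patterns):
    """Keep one representative per isomorphism class (the first one seen)."""
    seen = {}
    for vlabels, edges in patterns:
        seen.setdefault(canonical_code(vlabels, edges), (vlabels, edges))
    return list(seen.values())


# Usage: the two B-centered paths are the same pattern; the A-centered path is not.
g1 = (["A", "B", "C"], {frozenset((0, 1)): "x", frozenset((1, 2)): "x"})
g2 = (["C", "B", "A"], {frozenset((0, 1)): "x", frozenset((1, 2)): "x"})
g3 = (["B", "A", "C"], {frozenset((0, 1)): "x", frozenset((1, 2)): "x"})
print(len(dedupe([g1, g2, g3])))  # 2
```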


    As the mining of frequent itemsets and sequential patterns has matured, it is natural to explore other kinds of patterns, such as graph structures. Graph mining has wide applications in fields such as chemistry, biology, and computer networks. The main challenge in graph mining is solving the graph/subgraph isomorphism problem. We therefore propose an algorithm, HybridGMiner, that combines earlier pattern mining techniques with graph mining techniques to mine all frequent subgraph patterns efficiently. It adopts a canonical form to avoid duplicate enumeration and uses an efficient embedding list structure to avoid subgraph isomorphism checking entirely. Our empirical study on synthetic and real datasets shows that HybridGMiner achieves a substantial performance gain over gSpan.
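The embedding-list idea can be sketched in the same spirit. The record does not describe the thesis's actual structure, so this sketch rests on its own assumptions: an undirected labeled database, growth by one edge at a time, and hypothetical helpers (`make_graph`, `initial_embeddings`, `extend_embeddings`, `support`). Each pattern keeps its occurrences as (graph id, mapping) pairs. Extending a pattern only walks the neighbors of already-matched data vertices, so each child's embedding list and support come straight from the parent's list, with no subgraph isomorphism test.

```python
from collections import defaultdict


def make_graph(vlabels, edge_list):
    """Build an undirected labeled graph as (vertex_labels, adjacency lists)."""
    adj = [[] for _ in vlabels]
    for u, v, elabel in edge_list:
        adj[u].append((v, elabel))
        adj[v].append((u, elabel))
    return (vlabels, adj)


def initial_embeddings(db, label):
    """Embedding list of the one-vertex pattern with the given vertex label."""
    return [
        (gid, (v,))
        for gid, (vlabels, _) in enumerate(db)
        for v, vl in enumerate(vlabels)
        if vl == label
    ]


def extend_embeddings(db, pattern_edges, embeddings):
    """Group every one-edge extension of a pattern by an extension key.

    pattern_edges: set of frozenset({i, j}) edges already in the pattern.
    Keys:
      ("fwd", i, edge_label, new_vertex_label): new vertex attached to pattern vertex i
      ("bwd", i, j, edge_label): new edge between existing pattern vertices i < j
    Child embeddings are built directly from the parent's occurrences.
    """
    children = defaultdict(list)
    for gid, mapping in embeddings:
        vlabels, adj = db[gid]
        # inverse map: data vertex -> pattern vertex index
        pos = {v: i for i, v in enumerate(mapping)}
        for i, v in enumerate(mapping):
            for w, elabel in adj[v]:
                if w not in pos:
                    key = ("fwd", i, elabel, vlabels[w])
                    children[key].append((gid, mapping + (w,)))
                else:
                    j = pos[w]
                    # report each new backward edge once, from its smaller endpoint
                    if j > i and frozenset((i, j)) not in pattern_edges:
                        children[("bwd", i, j, elabel)].append((gid, mapping))
    return dict(children)


def support(embedding_list):
    """Support = number of distinct database graphs containing the pattern."""
    return len({gid for gid, _ in embedding_list})


# Usage: the pattern C-O occurs three times but in only two graphs.
db = [
    make_graph(["C", "O", "C"], [(0, 1, "s"), (1, 2, "s")]),
    make_graph(["C", "O"], [(0, 1, "s")]),
]
level1 = extend_embeddings(db, set(), initial_embeddings(db, "C"))
co = level1[("fwd", 0, "s", "O")]
print(support(co))  # 2
```

Different extension keys can still describe isomorphic patterns (for example, growing C-O-C from either carbon), so a miner would pair this with a canonical-form check to merge them. Storing every occurrence also trades memory for the isomorphism tests it avoids.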

    Chapter 1 Introduction
      1.1 Research Motivation and Objectives
      1.2 Thesis Organization
    Chapter 2 Problem Definition
    Chapter 3 Related Work
      3.1 Breadth-First Search (BFS) Algorithms
        3.1.1 The AGM Algorithm
        3.1.2 The FSG Algorithm
      3.2 Depth-First Search (DFS) Algorithms
        3.2.1 The gSpan Algorithm
        3.2.2 The MoFa Algorithm
        3.2.3 The FFSM Algorithm
        3.2.4 The Gaston Algorithm
      3.3 Overall Comparison of the Algorithms
    Chapter 4 The HybridGMiner Algorithm
      4.1 Challenges in Graph Mining
      4.2 Architecture of HybridGMiner
        4.2.1 Pattern Enumeration
        4.2.2 Search-Space Pruning
        4.2.3 Pattern Growth (Extension)
      4.3 Pseudocode
    Chapter 5 Experimental Results
      5.1 Synthetic Data
        5.1.1 Data Generator
        5.1.2 Results and Analysis
      5.2 Real-World Data
    Chapter 6 Conclusion
    References

    [1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. International Conference on Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, September 1994.
    [2] C. Borgelt. On Canonical Forms for Frequent Graph Mining. In Proc. Workshop on Mining Graphs, Trees, and Sequences (MGTS'05 at ECML/PKDD'05), pages 1-12, Porto, Portugal, 2005.
    [3] C. Borgelt and M. R. Berthold. Mining Molecular Fragments: Finding Relevant Substructures of Molecules. In Proc. IEEE International Conference on Data Mining (ICDM'02), pages 51-58, 2002.
    [4] L. Dehaspe, H. Toivonen, and R. D. King. Finding frequent substructures in chemical compounds. In Proc. 4th International Conference on Knowledge Discovery and Data Mining, pages 30-36, AAAI Press, August 1998.
    [5] A. Deutsch, M. F. Fernandez, and D. Suciu. Storing semistructured data with STORED. In Proc. 1999 ACM SIGMOD International Conference on Management of Data, pages 431-442, 1999.
    [6] R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proc. 23rd International Conference on Very Large Data Bases (VLDB'97), pages 436-445, 1997.
    [7] L. B. Holder, D. J. Cook, and S. Djoko. Substructure discovery in the SUBDUE system. In Proc. AAAI Workshop on Knowledge Discovery in Databases, pages 169-180, 1994.
    [8] K. Y. Huang, C. H. Chang, and K. Z. Lin. PROWL: An efficient frequent continuity mining algorithm on event sequences. In Proc. 6th International Conference on Data Warehousing and Knowledge Discovery (DaWaK), 2004.
    [9] J. Huan, W. Wang, and J. Prins. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In Proc. 3rd IEEE International Conference on Data Mining (ICDM'03), 2003.
    [10] J. Huan, W. Wang, J. Prins, and J. Yang. SPIN: Mining Maximal Frequent Subgraphs from Graph Databases. In Proc. 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04), pages 581-586, 2004.
    [11] A. Inokuchi, T. Washio, and H. Motoda. An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data. In Proc. 4th European Conference on Principles and Practice of Knowledge Discovery in Data Mining (PKDD 2000), pages 13-23, 2000.
    [12] M. Kuramochi and G. Karypis. Frequent Subgraph Discovery. In Proc. 2001 IEEE International Conference on Data Mining (ICDM'01), pages 313-320, 2001.
    [13] B. D. McKay. Practical graph isomorphism. In Proc. 10th Manitoba Conference on Numerical Mathematics and Computing (Winnipeg, 1980); Congressus Numerantium, 30 (1981), pages 45-87.
    [14] M. Wörlein, T. Meinl, I. Fischer, and M. Philippsen. A Quantitative Comparison of the Subgraph Miners MoFa, gSpan, FFSM, and Gaston. In A. M. Jorge, L. Torgo, P. B. Brazdil, R. Camacho, and J. Gama (eds.), Proc. 9th European Conference on Principles and Practice of Knowledge Discovery in Data Mining (PKDD 2005), pages 392-403, 2005.
    [15] S. Nijssen and J. N. Kok. Frequent Graph Mining and its Application to Molecular Databases. In Proc. IEEE International Conference on Systems, Man and Cybernetics (SMC 2004), Den Haag, Netherlands, October 10-13, 2004. IEEE Press, 2004.
    [16] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In Proc. 2001 International Conference on Data Engineering (ICDE'01), pages 215-224, Heidelberg, Germany, April 2001.
    [17] K. Shearer, H. Bunke, and S. Venkatesh. Video Indexing and Similarity Retrieval by Largest Common Subgraph Detection using Decision Trees. Pattern Recognition, 34 (2001), pages 1075-1091.
    [18] X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining. In Proc. 2002 IEEE International Conference on Data Mining (ICDM'02), pages 721-724, 2002.
    [19] X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining. Technical Report UIUCDCS-R-2002-2296, Department of Computer Science, University of Illinois at Urbana-Champaign, 2002.
