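| Graduate student: | 何承道 Cheng-Tao Ho |
|---|---|
| Thesis title: | 頻繁同構圖形探勘策略之研究 HybridGMiner: The Mining Strategy on Frequent Isomorphism Graph Structure |
| Advisor: | 張嘉惠 Chia-Hui Chang |
| Oral examination committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Academic year of graduation: | 94 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 38 |
| Chinese keywords: | 圖形探勘 (graph mining), 型樣探勘 (pattern mining), 圖形同構 (graph isomorphism) |
| Foreign-language keywords: | pattern mining, graph isomorphism, graph structures, graph mining |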
As techniques for mining frequent itemsets and sequential patterns have matured, it is natural to turn to a form of pattern mining that captures a broader range of data relationships: graph mining. Graph mining has wide applications, notably in chemistry, biology, and computer networks, as well as in any other domain whose data can be modeled as graph patterns; all of these fields need graph pattern mining to support data analysis and prediction. The main challenge in graph mining is solving the graph and subgraph isomorphism problems. In this thesis we propose HybridGMiner, an algorithm that draws on earlier pattern mining techniques and combines a canonical form for graphs with an embedding structure to mine all frequent subgraph patterns in graph databases efficiently. The key ideas are to use the canonical form to avoid duplicate enumeration, and to record the positions at which each graph pattern occurs in the database (the embedding list) so that subgraph isomorphism checking is avoided entirely. Experiments on both synthetic and real datasets show that HybridGMiner achieves a substantial performance gain over gSpan.
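The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`canonical_code`, `single_vertex_embeddings`, `extend_embeddings`, `support`), the data layout, and the brute-force canonical form are all assumptions chosen for brevity, and the sketch handles only vertex-labeled, undirected graphs. The canonical code gives isomorphic patterns the same key, so a miner can recognize a pattern it has already enumerated; the embedding list stores where a pattern occurs, so a pattern can be grown by one edge by scanning neighbors of the stored occurrences instead of re-running a subgraph isomorphism test.

```python
from itertools import permutations


def canonical_code(vertex_labels, edges):
    """Brute-force canonical code of a small vertex-labeled undirected graph.

    Tries every vertex ordering and keeps the smallest (labels, adjacency)
    pair, so isomorphic graphs get the same code. Exponential in the number
    of vertices; only meant to illustrate the idea.
    """
    n = len(vertex_labels)
    edge_set = {frozenset(e) for e in edges}
    best = None
    for perm in permutations(range(n)):
        # perm[i] is the original vertex placed at position i.
        labels = tuple(vertex_labels[v] for v in perm)
        adjacency = tuple(
            1 if frozenset((perm[i], perm[j])) in edge_set else 0
            for i in range(n)
            for j in range(i + 1, n)
        )
        code = (labels, adjacency)
        if best is None or code < best:
            best = code
    return best


def single_vertex_embeddings(db, label):
    """Embedding list of the one-vertex pattern `label`: (graph id, (vertex,))."""
    return [
        (gid, (v,))
        for gid, (labels, _adj) in enumerate(db)
        for v, l in enumerate(labels)
        if l == label
    ]


def extend_embeddings(db, embeddings, from_pos, new_label):
    """Grow each stored occurrence by one edge to a new vertex with `new_label`.

    `from_pos` is the pattern vertex the new edge starts from. Only the
    neighbors of already-recorded occurrences are inspected.
    """
    extended = []
    for gid, mapping in embeddings:
        labels, adj = db[gid]
        for w in adj[mapping[from_pos]]:
            if labels[w] == new_label and w not in mapping:
                extended.append((gid, mapping + (w,)))
    return extended


def support(embeddings):
    """Number of distinct database graphs that contain the pattern."""
    return len({gid for gid, _ in embeddings})


# Hypothetical toy database: each graph is (vertex labels, adjacency lists).
db = [
    (["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]}),  # C-C-O
    (["C", "O"], {0: [1], 1: [0]}),                  # C-O
    (["C", "C"], {0: [1], 1: [0]}),                  # C-C
]
seeds = single_vertex_embeddings(db, "C")
c_o = extend_embeddings(db, seeds, from_pos=0, new_label="O")
```

In this toy database the one-vertex pattern `C` occurs in all three graphs, while the extended pattern `C-O` keeps only the occurrences in the first two, so `support(c_o)` is 2. A practical miner would replace the brute-force canonical form with a far cheaper one, such as the canonical forms surveyed in [2] or the DFS code used by gSpan [18].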
[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 International Conference on Very Large Data Bases (VLDB’94), pages 487-499, Santiago, Chile, Sept. 1994.
[2] C. Borgelt. On Canonical Forms for Frequent Graph Mining. Workshop on Mining Graphs, Trees, and Sequences (MGTS’05 at PKDD’05, Porto, Portugal), pages 1-12. ECML/PKDD’05 Organization Committee, Porto, Portugal, 2005.
[3] C. Borgelt, M.R. Berthold. Mining Molecular Fragments: Finding Relevant Substructures of Molecules. In Proceedings of the International Conference on Data Mining (ICDM), pages 51-58, 2002.
[4] L. Dehaspe, H. Toivonen, and R.D. King. Finding frequent substructures in chemical compounds. Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, pages 30-36. AAAI Press. August 1998.
[5] A. Deutsch, M. F. Fernandez, D. Suciu. Storing semistructured data with STORED. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pages 431-442, 1999.
[6] R. Goldman, J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB’97), pages 436-445, 1997.
[7] L. B. Holder, D. J. Cook, and S. Djoko. Substructure discovery in the SUBDUE system. In Proceedings of the AAAI Workshop on Knowledge Discovery in Databases, pages 169-180, 1994.
[8] K. Y. Huang, C. H. Chang and K. Z. Lin. PROWL: An efficient frequent continuity mining algorithm on event sequences. In Proc. of the 6th International Conference on Data Warehousing and Knowledge Discovery (DaWaK), 2004.
[9] J. Huan, W. Wang, J. Prins. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM’03), 2003.
[10] J. Huan, W. Wang, J. Prins, J. Yang. SPIN: Mining Maximal Frequent Subgraphs from Graph Databases. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’04), pages 581-586, 2004.
[11] A. Inokuchi, T. Washio, H. Motoda. An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data. In Proc. of the 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2000), pages 13-23, 2000.
[12] M. Kuramochi, G. Karypis. Frequent Subgraph Discovery. In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM’01), pages 313-320, 2001.
[13] B. D. McKay. Practical graph isomorphism. 10th Manitoba Conference on Numerical Mathematics and Computing (Winnipeg, 1980); Congressus Numerantium, 30 (1981) 45-87.
[14] M. Wörlein, T. Meinl, I. Fischer, M. Philippsen. A Quantitative Comparison of the Subgraph Miners MoFa, gSpan, FFSM, and Gaston. In Proc. of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005), pages 392-403, 2005.
[15] S. Nijssen, J.N. Kok. Frequent Graph Mining and its Application to Molecular Databases. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, SMC 2004, Den Haag, Netherlands, October 10-13, 2004. IEEE Press, 2004.
[16] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In Proc. 2001 International Conference on Data Engineering (ICDE’01), pages 215-224, Heidelberg, Germany, April 2001.
[17] K. Shearer, H. Bunke, S. Venkatesh. Video Indexing and Similarity Retrieval by Largest Common Subgraph Detection using Decision Trees. Pattern Recognition 34 (2001) 1075-1091.
[18] X. Yan, J. Han. gSpan: Graph-based Substructure Pattern Mining. In Proc. 2002 International Conference on Data Mining (ICDM’02), pages 721-724, 2002.
[19] X. Yan, J. Han. gSpan: Graph-based Substructure Pattern Mining. Technical Report UIUCDCS-R-2002-2296, Department of Computer Science, University of Illinois at Urbana-Champaign, 2002.