| Graduate Student: | 劉信義 Shin-Yat Liu |
|---|---|
| Thesis Title: | 使用群聚壓縮樹之高效率關聯法則挖掘法 (An Efficiency Incremental Mining with Grouping Compress Tree) |
| Advisors: | 張瑞益 Ray-I Chang; 陳彥良 Yen-Liang Chen |
| Oral Defense Committee: | |
| Degree: | 碩士 Master |
| Department: | College of Management, Department of Information Management |
| Graduation Academic Year: | 92 |
| Language: | Chinese |
| Number of Pages: | 93 |
| Chinese Keywords: | incremental association rules, data mining, grouping compression, pseudo projection, adaptive minimum support threshold |
| Foreign Keywords: | incremental mining, association rule, data mining, grouping compress, pseudo projection |
Data mining has attracted considerable attention and effort from researchers in recent years, and association rules are among its most widely applied techniques. With association rules, decision makers can discover characteristics of consumers' purchasing behavior and use them for marketing planning, sales analysis, and purchase-behavior analysis. Traditional association-rule algorithms require a fixed minimum support (minisup) value in order to derive the large itemsets. In practice, however, users rarely know the best minisup value in advance, so they must adjust it repeatedly before obtaining a satisfactory set of large itemsets; rerunning a traditional algorithm for every adjustment is highly inefficient. On the other hand, many real-world data-mining applications can afford extra memory space and preprocessing time. The CATS Tree (Compressed and Arranged Transaction Sequences Tree), published by William Cheung et al. in 2003, pre-compresses the transaction data into a tree representation so that association rules can be mined without setting a minisup value beforehand. Unfortunately, its data structure is overly complex, which makes tree construction slow and the mining process lengthy. This thesis improves on the CATS Tree: we first apply suitable preprocessing to the data, convert the preprocessed data into our own Grouping Compress Tree (GC Tree) structure, and finally propose an efficient algorithm that extracts the large itemsets from it, thereby reducing the complexity of both tree construction and mining. Experimental results show that the GC Tree outperforms the CATS Tree in both construction and mining time, and that its total run-time memory requirement can also be lower than that of the CATS Tree. It is thus an efficient association-rule mining method that improves system performance for real-world applications.
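The central idea above (compress the transaction database into a shared-prefix tree first, without fixing minisup, so that large itemsets can later be mined under any threshold the user chooses) can be sketched as follows. This is a minimal, generic prefix-tree illustration under our own assumptions, not the actual CATS Tree or GC Tree structure from the thesis; the names `build_tree`, `item_supports`, and `frequent_items` are hypothetical.

```python
from collections import defaultdict

class Node:
    """One node of a compressed transaction prefix tree."""
    def __init__(self, item=None):
        self.item = item       # item stored at this node (None for the root)
        self.count = 0         # number of transactions passing through this node
        self.children = {}     # item -> child Node

def build_tree(transactions):
    """Insert every transaction into a prefix tree, merging shared prefixes.

    No minimum support is applied during construction, so the same tree
    can afterwards be mined with any minisup value (the property the
    abstract attributes to CATS/GC-style trees)."""
    root = Node()
    for t in transactions:
        node = root
        for item in sorted(t):          # a canonical item order maximizes prefix sharing
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
            child.count += 1
            node = child
    return root

def item_supports(root):
    """Aggregate per-item support counts in a single walk over the tree."""
    support = defaultdict(int)
    stack = [root]
    while stack:
        node = stack.pop()
        if node.item is not None:
            support[node.item] += node.count
        stack.extend(node.children.values())
    return dict(support)

def frequent_items(root, minisup):
    """Return the large 1-itemsets for a threshold chosen *after* building."""
    return {i for i, s in item_supports(root).items() if s >= minisup}
```

For brevity the sketch only extracts large 1-itemsets; the thesis's algorithm additionally mines multi-item large itemsets from the compressed tree, which is where the structural differences between CATS Tree and GC Tree matter.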
[1] Y. Sucahyo and P. Gopalan, “CT-ITL: Efficient Frequent Item Set Mining Using a Compressed Prefix Tree with Pattern Growth,” Proc. of ADC, 2003.
[2] J. Liu, Y. Pan, K. Wang, and J. Han, “Mining Frequent Item Sets by Opportunistic Projection,” Proc. of 2002 Int. Conf. on Knowledge Discovery in Databases, 2002.
[3] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. of 2000 ACM-SIGMOD Int. Conf. on Management of Data, Dallas, 2000.
[4] W. Cheung and R. Zaiane, “Incremental Mining of Frequent Patterns without Candidate Generation or Support Constraint,” Proc. of the Seventh International Database Engineering and Applications Symposium, 2003.
[5] M. Lin and S. Lee, “Improving the Efficiency of Interactive Sequential Pattern Mining by Incremental Pattern Discovery,” Proc. of the 36th Hawaii International Conference on System Sciences, 2002.
[6] Y. Woon, W. Ng, and A. Das, “Fast Online Dynamic Association Rule Mining,” Proc. of the Second International Conference on Web Information Systems Engineering, Vol. 1, 2001.
[7] R. Agrawal, C. Faloutsos, and A. Swami, “Efficient similarity search in sequence databases,” Proc. of the Fourth International Conference on Foundations of Data Organization and Algorithms, 1993.
[8] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases,” Proc. of the IEEE ICDM, 2001.
[9] R. Agarwal, C. Aggarwal, and V. Prasad, “A Tree Projection Algorithm for Generation of Frequent Itemsets,” Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), 2000.
[10] W. Cheung, “Frequent Pattern Mining without Candidate Generation or Support Constraint,” Master's Thesis, University of Alberta, 2002.
[11] H. Huang, X. Wu, and R. Relue, “Association Analysis with One Scan of Databases,” Proc. of the 2002 IEEE International Conference on Data Mining, 2002.
[12] K. Wang, L. Tang, J. Han, and J. Liu, “Top Down FP-Growth for Association Rule Mining,” Proc. of Pacific-Asia Conference, 2002.
[13] M. Zaki, and C. Hsiao, “CHARM: An Efficient Algorithm for Closed Itemset Mining,” Proc. of SIAM International Conference on Data Mining, 2002.
[14] J. Pei, J. Han, S. Nishio, S. Tang, and D. Yang, “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases,” Proc. of 2001 Int. Conf. on Data Mining, 2001.
[15] J. Pei, J. Han, and R. Mao, “CLOSET: An efficient algorithm for mining frequent closed itemsets,” Proc. of SIGMOD, 2000.
[16] A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. of the VLDB Conference, 1995.
[17] R. Agrawal, and R. Srikant, “Fast algorithms for mining association rules,” Proc. of VLDB, 1994.
[18] S. Brin, R. Motwani, J. Ullman, and S. Tsur, “Dynamic itemset counting and implication rules for market basket data,” Proc. of SIGMOD, 1997.
[19] IBM, QUEST Data Mining Project, http://www.almaden.ibm.com/cs/quest
[20] http://www.ecn.purdue.edu/KDDCUP/