| Graduate Student: | 劉信義 Shin-Yat Liu |
|---|---|
| Thesis Title: | 使用群聚壓縮樹之高效率關聯法則挖掘法 (An Efficiency Incremental Mining with Grouping Compress Tree) |
| Advisors: | 張瑞益 Ray-I Chang; 陳彥良 Yen-Liang Chen |
| Oral Defense Committee: | |
| Degree: | 碩士 Master |
| Department: | College of Management, Department of Information Management |
| Graduation Academic Year: | 92 |
| Language: | Chinese |
| Number of Pages: | 93 |
| Chinese Keywords: | incremental association rules, data mining, grouping compression, pseudo projection, adaptive minimum support threshold |
| Foreign Keywords: | incremental mining, association rule, data mining, grouping compress, pseudo projection |
Data mining has attracted considerable attention and effort from researchers in recent years, and association rules are among its most widely applied techniques. With association rules, decision makers can discover characteristics of consumers' purchasing behavior and use them for marketing planning, sales analysis, and purchase-behavior analysis. Traditional association-rule algorithms require a fixed minimum support (minisup) value in order to derive the large itemsets. In practice, however, users rarely know the best minisup value in advance, so they must adjust it repeatedly before obtaining a satisfactory set of large itemsets; rerunning a traditional algorithm for every adjustment is highly inefficient. On the other hand, many real-world data-mining applications can afford extra memory space and preprocessing time. The CATS Tree (Compressed and Arranged Transaction Sequences Tree), published by William Cheung et al. in 2003, pre-compresses the transaction data into a tree representation so that association rules can be mined without setting a minisup value beforehand. Unfortunately, its data structure is overly complex, which makes tree construction slow and the mining process lengthy. This thesis improves on the CATS Tree: we first apply suitable preprocessing to the data, convert the preprocessed data into our own Grouping Compress Tree (GC Tree) structure, and finally propose an efficient algorithm that extracts the large itemsets from it, thereby reducing the complexity of both tree construction and mining. Experimental results show that the GC Tree outperforms the CATS Tree in both construction and mining time, and that its total run-time memory requirement can also be lower than that of the CATS Tree. It is thus an efficient association-rule mining method that improves system performance for real-world applications.
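The central idea above (compress the transaction database into a shared-prefix tree first, without fixing minisup, so that large itemsets can later be mined under any threshold the user chooses) can be sketched as follows. This is a minimal, generic prefix-tree illustration under our own assumptions, not the actual CATS Tree or GC Tree structure from the thesis; the names `build_tree`, `item_supports`, and `frequent_items` are hypothetical.

```python
from collections import defaultdict

class Node:
    """One node of a compressed transaction prefix tree."""
    def __init__(self, item=None):
        self.item = item       # item stored at this node (None for the root)
        self.count = 0         # number of transactions passing through this node
        self.children = {}     # item -> child Node

def build_tree(transactions):
    """Insert every transaction into a prefix tree, merging shared prefixes.

    No minimum support is applied during construction, so the same tree
    can afterwards be mined with any minisup value (the property the
    abstract attributes to CATS/GC-style trees)."""
    root = Node()
    for t in transactions:
        node = root
        for item in sorted(t):          # a canonical item order maximizes prefix sharing
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
            child.count += 1
            node = child
    return root

def item_supports(root):
    """Aggregate per-item support counts in a single walk over the tree."""
    support = defaultdict(int)
    stack = [root]
    while stack:
        node = stack.pop()
        if node.item is not None:
            support[node.item] += node.count
        stack.extend(node.children.values())
    return dict(support)

def frequent_items(root, minisup):
    """Return the large 1-itemsets for a threshold chosen *after* building."""
    return {i for i, s in item_supports(root).items() if s >= minisup}
```

For brevity the sketch only extracts large 1-itemsets; the thesis's algorithm additionally mines multi-item large itemsets from the compressed tree, which is where the structural differences between CATS Tree and GC Tree matter.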
[1] Y. Sucahyo and P. Gopalan, “CT-ITL: Efficient Frequent Item Set Mining Using a Compressed Prefix Tree with Pattern Growth,” Proc. of ADC, 2003.
[2] J. Liu, Y. Pan, K. Wang, and J. Han, “Mining Frequent Item Sets by Opportunistic Projection,” Proc. of 2002 Int. Conf. on Knowledge Discovery in Databases, 2002.
[3] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. of 2000 ACM-SIGMOD Int. Conf. on Management of Data, Dallas, 2000.
[4] W. Cheung and R. Zaiane, “Incremental Mining of Frequent Patterns without Candidate Generation or Support Constraint,” Proc. of the Seventh International Database Engineering and Applications Symposium, 2003.
[5] M. Lin and S. Lee, “Improving the Efficiency of Interactive Sequential Pattern Mining by Incremental Pattern Discovery,” Proc. of the 36th Hawaii International Conference on System Sciences, 2002.
[6] Y. Woon, W. Ng, and A. Das, “Fast Online Dynamic Association Rule Mining,” Proc. of the Second International Conference on Web Information Systems Engineering, Vol. 1, 2001.
[7] R. Agrawal, C. Faloutsos, and A. Swami, “Efficient similarity search in sequence databases,” Proc. of the Fourth International Conference on Foundations of Data Organization and Algorithms, 1993.
[8] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases,” Proc. of the IEEE ICDM, 2001.
[9] R. Agarwal, C. Aggarwal, and V. Prasad, “A Tree Projection Algorithm for Generation of Frequent Itemsets,” Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), 2000.
[10] W. Cheung, “Frequent Pattern Mining without Candidate Generation or Support Constraint,” Master's Thesis, University of Alberta, 2002.
[11] H. Huang, X. Wu, and R. Relue, “Association Analysis with One Scan of Databases,” Proc. of the 2002 IEEE International Conference on Data Mining, 2002.
[12] K. Wang, L. Tang, J. Han, and J. Liu, “Top Down FP-Growth for Association Rule Mining,” Proc. of Pacific-Asia Conference, 2002.
[13] M. Zaki, and C. Hsiao, “CHARM: An Efficient Algorithm for Closed Itemset Mining,” Proc. of SIAM International Conference on Data Mining, 2002.
[14] J. Pei, J. Han, S. Nishio, S. Tang, and D. Yang, “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases,” Proc. of 2001 Int. Conf. on Data Mining, 2001.
[15] J. Pei, J. Han, and R. Mao, “CLOSET: An efficient algorithm for mining frequent closed itemsets,” Proc. of SIGMOD, 2000.
[16] A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. of the VLDB Conference, 1995.
[17] R. Agrawal, and R. Srikant, “Fast algorithms for mining association rules,” Proc. of VLDB, 1994.
[18] S. Brin, R. Motwani, J. Ullman, and S. Tsur, “Dynamic itemset counting and implication rules for market basket data,” Proc. of SIGMOD, 1997.
[19] IBM, QUEST Data Mining Project, http://www.almaden.ibm.com/cs/quest
[20] http://www.ecn.purdue.edu/KDDCUP/