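| Graduate student | 張元哲 Yuan-Che Chang |
|---|---|
| Thesis title | Research on Adjustment and Maintenance Techniques for the FP-tree (Frequent Pattern Tree) |
| Advisor | 陳彥良 Yen-Liang Chen |
| Oral examination committee | |
| Degree | Master |
| Department | College of Management, Department of Information Management |
| Graduation academic year | 89 (ROC calendar; 2000-2001) |
| Language | Chinese |
| Pages | 51 |
| Keywords (Chinese) | 資料挖掘 (data mining) |
| Keywords (English) | Data Mining, association rules, FP-tree |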
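However, both categories of methods described above must scan the database at least twice, which is very time-consuming on a very large transaction database. It is therefore valuable to keep the association rules correct by updating the large itemsets incrementally as the transaction database changes, without rescanning the database. All incremental maintenance methods proposed so far target category (1); no incremental maintenance method has yet been proposed for category (2). This study addresses the FP-tree structure from category (2) and proposes an incremental maintenance algorithm, FPI (FP-tree Incremental). When the database undergoes an insert, delete, or update, FPI adjusts the FP-tree dynamically and efficiently so that it keeps its correct structure, without rescanning the entire database.

Another common situation is a decrease in the minimum support used to generate association rules. Because the FP-tree contains no nodes for items that were infrequent under the original minimum support, the database would normally have to be rescanned to build a new FP-tree. FPI can adjust the tree dynamically in this case as well, avoiding the I/O time of scanning the whole database to rebuild the FP-tree.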
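The abstract does not spell out FPI's update rules, so the following is only a rough illustration, not the FPI algorithm. This Python sketch shows one simple way a prefix tree of this kind can absorb inserts, deletes and updates without touching the database: each transaction is added or removed along its own path, and node counts and node-links are adjusted in place. It assumes a fixed global item order (the hypothetical `rank` mapping) instead of re-sorting by current frequency. It also keeps every item, frequent or not, so that a smaller minimum support can be answered from the tree without a rescan. The class and method names are illustrative.

```python
import math
from collections import defaultdict


class Node:
    """One tree node: an item, its count, a parent link and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


class IncrementalFPTree:
    """Prefix tree of transactions kept under a FIXED item order (assumption).

    Every item is stored, frequent or not, so lowering the minimum support
    can be answered from the tree itself instead of rescanning the database.
    """

    def __init__(self, rank=None):
        self.rank = rank or {}           # hypothetical: item -> position in the global order
        self.root = Node(None, None)
        self.header = defaultdict(list)  # item -> nodes carrying that item (node-links)
        self.support = defaultdict(int)  # item -> total count of the item in the tree

    def _path(self, transaction):
        # Ranked items first, in rank order; unranked items after them, alphabetically.
        return sorted(set(transaction), key=lambda i: (self.rank.get(i, math.inf), i))

    def insert(self, transaction, count=1):
        node = self.root
        for item in self._path(transaction):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                self.header[item].append(child)
            child.count += count
            self.support[item] += count
            node = child

    def delete(self, transaction, count=1):
        # Follow the transaction's path; it must have been inserted earlier.
        node = self.root
        for item in self._path(transaction):
            node = node.children[item]
        # Walk back up, decrementing counts and unlinking nodes that reach zero.
        while node is not self.root:
            node.count -= count
            self.support[node.item] -= count
            parent = node.parent
            if node.count == 0:
                del parent.children[node.item]
                self.header[node.item].remove(node)
            node = parent

    def update(self, old, new):
        # An update is treated as deleting the old transaction and inserting the new one.
        self.delete(old)
        self.insert(new)

    def frequent_items(self, min_count):
        return {item for item, c in self.support.items() if c >= min_count}
```

Fixing the order avoids restructuring paths when item frequencies shift. The cost is a tree that may be less compact than one sorted by descending frequency. How FPI itself handles reordering is not described in the abstract.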
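Because an FP-tree is usually much smaller than the transaction database itself and can be held in main memory, generating frequent patterns with FP-growth is very fast once the FP-tree has been built. Most of the running time is therefore spent on the I/O of scanning the database and on constructing the FP-tree. The FPI algorithm proposed in this study keeps the FP-tree structure correct as the data changes, so that association rules can be generated quickly.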
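A compact FP-growth sketch in Python shows why mining is fast once the tree is in memory. It follows the usual FP-growth recursion. First it builds a tree in descending-frequency order with infrequent items dropped. Then, for each item, it collects the item's conditional pattern base (the prefix paths above the nodes carrying that item, weighted by the node counts) and mines that base recursively. The input format (a list of `(items, count)` pairs) and the function names `build_tree` and `fp_growth` are illustrative assumptions. At the top level, the two passes inside `build_tree` correspond to the two database scans; every recursive call works only on in-memory pattern bases.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_paths, min_count):
    """Build an FP-tree from (items, count) pairs; return (header, frequent supports)."""
    support = defaultdict(int)
    for items, count in weighted_paths:          # pass 1: item supports
        for item in set(items):
            support[item] += count
    frequent = {i: c for i, c in support.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)                   # item -> nodes carrying that item
    for items, count in weighted_paths:          # pass 2: insert frequent items only
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(weighted_paths, min_count, suffix=()):
    """Yield (itemset, support) for every frequent itemset, FP-growth style."""
    header, frequent = build_tree(weighted_paths, min_count)
    for item, count in frequent.items():
        pattern = suffix + (item,)
        yield frozenset(pattern), count
        # Conditional pattern base: the prefix path above each node carrying `item`.
        base = []
        for node in header[item]:
            prefix, parent = [], node.parent
            while parent.item is not None:       # stop at the root
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                base.append((prefix, node.count))
        yield from fp_growth(base, min_count, pattern)
```

For example, with `db = ["abc", "ab", "ac", "bc", "abcd"]` and a minimum count of 3, `dict(fp_growth([(t, 1) for t in db], 3))` contains exactly six itemsets: {a}, {b} and {c} with count 4, and {a, b}, {a, c} and {b, c} with count 3. Item `d` (count 1) and {a, b, c} (count 2) fall below the threshold.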