| Graduate student: | 楊士賢 Shi-Hsan Yang |
|---|---|
| Thesis title: | Extending SWF for Incremental Association Mining by Incorporating Previously Discovered Information (遞增資料關聯式規則探勘之改進) |
| Advisor: | 張嘉惠 Chia-Hui Chang |
| Oral examination committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Graduation academic year: | 90 (ROC calendar) |
| Language: | Chinese |
| Pages: | 69 |
| Chinese keywords: | 關聯式規則 (association rules), 資料探勘 (data mining) |
| English keywords: | Data Mining, Association Rules |
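In practical applications, data mining has evolved from mining static databases to mining dynamic ones, and incremental mining of association rules was one of the earliest of these problems to receive attention. Recently proposed algorithms for incremental association rule mining include FUP2, MAAP, PELICAN, and SWF, of which SWF outperforms the other algorithms of its type. In this thesis, we propose two algorithms that improve on SWF: FI_SWF and CI_SWF. By storing the frequent itemsets and their supports from the previous mining run, the current run only needs to scan the changed portion of the database to obtain the new supports of the stored itemsets. This not only shortens the final database scan performed by SWF but also speeds up candidate itemset generation. Experiments show that the improved SWF algorithms do reduce execution time. Although our algorithms need more disk space to store the previous frequent itemsets or candidate itemsets, their peak memory usage is comparable to that of SWF. In practice, when data mining becomes a repeated and frequent task, execution time matters even more, and the algorithms proposed in this thesis offer an effective and simple way to perform it.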
Incremental mining of association rules from dynamic databases refers to the maintenance and utilization of the knowledge discovered in previous mining operations. Sliding-window filtering (SWF) is a technique proposed to filter out false candidate 2-itemsets by segmenting a transaction database into several partitions. SWF computes a set of candidate 2-itemsets that is close to the set of frequent 2-itemsets, so candidate k-itemsets of several sizes can be generated and counted in a single database scan. This scan-reduction technique greatly improves the performance of frequent itemset discovery. In this thesis, we extend SWF by incorporating previously discovered information and propose two algorithms to boost the performance of incremental mining. The first algorithm, FI_SWF (SWF with Frequent Itemsets), reuses the frequent itemsets (and their counts) from the previous mining task, as FUP2 does, to reduce the number of new candidate itemsets that have to be checked. The second algorithm, CI_SWF (SWF with Candidate Itemsets), reuses the candidate itemsets (and their counts) from the previous mining task. Experimental studies evaluate the performance of the new algorithms and show that the new incremental algorithms are significantly faster than SWF. More importantly, the additional disk space needed to store the previously discovered knowledge does not increase the maximum memory required during execution.
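The partition-based filtering of candidate 2-itemsets described above can be sketched as follows. This is a simplified illustration of the general SWF idea, assuming transactions are Python sets of item labels and `min_sup` is a relative minimum support; the function name `swf_candidate_2itemsets` is hypothetical. Each candidate remembers the partition where it was admitted and is kept only while its accumulated count meets the threshold over the partitions since then. The sketch covers only the first partition-by-partition pass, not SWF's incremental step or its generation of larger candidates.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def swf_candidate_2itemsets(partitions: List[List[Set[str]]], min_sup: float) -> Dict[Itemset, int]:
    """Filter candidate 2-itemsets partition by partition, SWF-style.

    Each candidate records the partition where it was admitted (start)
    and its count accumulated from that partition onward.
    """
    cands: Dict[Itemset, Tuple[int, int]] = {}  # itemset -> (start, count)
    sizes: List[int] = []
    for j, part in enumerate(partitions):
        sizes.append(len(part))
        # Count every 2-itemset occurring in this partition.
        local: Dict[Itemset, int] = {}
        for t in part:
            for pair in combinations(sorted(t), 2):
                key = frozenset(pair)
                local[key] = local.get(key, 0) + 1
        for key, c in local.items():
            if key in cands:  # existing candidate: accumulate its count
                start, count = cands[key]
                cands[key] = (start, count + c)
            elif c >= math.ceil(min_sup * len(part)):  # frequent within this partition
                cands[key] = (j, c)
        # Drop candidates below min_sup over the partitions since their start.
        for key in list(cands):
            start, count = cands[key]
            if count < math.ceil(min_sup * sum(sizes[start:])):
                del cands[key]
    return {key: count for key, (start, count) in cands.items()}


# Hypothetical three-partition database with three transactions per partition.
parts = [
    [{"a", "b"}, {"a", "b"}, {"c", "d"}],
    [{"a", "b"}, {"c", "d"}, {"c", "d"}],
    [{"c", "d"}, {"c", "d"}, {"e", "f"}],
]
print(swf_candidate_2itemsets(parts, 0.5))
```

With `min_sup = 0.5`, `{a, b}` is dropped after the third partition because 3 < ⌈0.5 × 9⌉ = 5, and `{e, f}` is never admitted. `{c, d}` survives with count 4 accumulated since partition 1, although its true count over all nine transactions is 5. Because the surviving counts are partial, SWF still needs a final scan to obtain exact supports, which is the scan the proposed algorithms aim to shorten.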