| Graduate student: | 趙書榮 Shu-Jun Chao |
|---|---|
| Thesis title: | FP-Tree不同實作方式之效能比較 FP-Tree in different implement methods to compare the performances |
| Advisor: | 陳彥良 Y.L. Chen |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Information Management |
| Graduation academic year: | 91 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 31 |
| Chinese keywords: | 資料挖掘 (data mining), 關聯規則 (association rules), 演算法 (algorithm), FP-tree |
| Foreign-language keywords: | data mining, association rule, algorithm, FP-tree |
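Abstract
Algorithms for mining association rules can be divided into two classes according to whether they generate candidate itemsets: Apriori-like approaches, which do, and the Frequent-Pattern tree (FP-tree) approach, which does not. Instead of generating candidate itemsets, FP-tree compresses the database into a Frequent-Pattern tree structure, which avoids repeated, costly database scans.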
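As a rough illustration of how an FP-tree compresses a database without generating candidates, the minimal Python sketch below builds the tree in two scans: one to count items, one to insert each transaction's frequent items in descending frequency order so that shared prefixes reuse nodes. The names `build_fp_tree` and `count_nodes`, the nested-dict node layout, and the sample transactions are assumptions made for this illustration and are not taken from the thesis.

```python
from collections import Counter


def build_fp_tree(transactions, min_support):
    """Build a prefix tree of frequent items; each node is [count, children]."""
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_support}

    # Scan 2: insert each transaction's frequent items, most frequent first
    # (ties broken alphabetically), so common prefixes share nodes.
    root = {}
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-counts[i], i))
        children = root
        for item in items:
            node = children.setdefault(item, [0, {}])
            node[0] += 1
            children = node[1]
    return root


def count_nodes(tree):
    """Count all nodes below the (implicit) root."""
    return sum(1 + count_nodes(children) for _count, children in tree.values())


# Hypothetical sample data, not from the thesis.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "d"]]
tree = build_fp_tree(transactions, min_support=2)
print(count_nodes(tree))  # 5 nodes hold the 11 frequent-item occurrences
```

In this toy input the 11 frequent-item occurrences collapse into 5 tree nodes, and the database is read only twice regardless of how long the frequent patterns are.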
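This thesis applies the theory of the Frequent-Pattern tree and, on the implementation side, compares the performance of different data-structure techniques in order to determine which data structure gives the best execution time when applied to the FP-tree.
Three algorithms are built: (1) FP-tree_tail, which adds a tail field to the head table; (2) FP-tree_hash, which builds the FP-tree by using a hash function to compute the position of each node; and (3) FP-tree_hash+tail, which combines the advantages of (1) and (2). These three algorithms are compared with the traditional FP-tree algorithm to identify the strengths and weaknesses of each. In the experimental data, under every parameter setting tested, the traditional FP-tree algorithm took tens of times as long as the three improved FP-tree algorithms.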
Algorithms for mining association rules can be separated into two kinds: Apriori-like approaches and the Frequent-Pattern tree. The main difference between them is that the Frequent-Pattern tree does not generate candidate itemsets, which avoids the high cost of scanning the database many times.
This paper applies three different data structures (FP-tree_tail, FP-tree_hash, FP-tree_hash+tail) to improve the Frequent-Pattern tree algorithm and then compares their performance.
According to the test data, the traditional FP-tree algorithm performs many times worse than the other algorithms.
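The abstract names the tail field and the hash-based node placement but does not show their implementation. The sketch below is one plausible reading, not the thesis's code: the head table stores both the head and the tail of each item's node-link chain, so a new node is appended in constant time instead of walking the chain, and a hash table keyed by (parent node, item) finds an existing child without scanning a child list. The names `Node`, `FPTree`, `head_table`, `index`, and `chain` are hypothetical, and transactions are assumed to arrive already filtered and sorted by descending item frequency.

```python
class Node:
    """One FP-tree node; hashable by identity, so it can be part of a dict key."""
    __slots__ = ("item", "count", "parent", "next")

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.next = None  # node-link to the next node carrying the same item


class FPTree:
    def __init__(self):
        self.root = Node(None, None)
        self.head_table = {}  # item -> [head node, tail node] of its node-link chain
        self.index = {}       # (parent node, item) -> child node (assumed "hash" variant)

    def insert(self, items):
        """Insert one transaction whose items are already frequency-ordered."""
        node = self.root
        for item in items:
            key = (node, item)
            child = self.index.get(key)  # hash lookup instead of scanning children
            if child is None:
                child = Node(item, node)
                self.index[key] = child
                self._link(child)
            child.count += 1
            node = child

    def _link(self, node):
        """Append a new node to its item's chain via the tail field."""
        entry = self.head_table.get(node.item)
        if entry is None:
            self.head_table[node.item] = [node, node]
        else:
            entry[1].next = node  # no walk along the chain is needed
            entry[1] = node

    def chain(self, item):
        """Yield every node carrying `item`, following node-links from the head."""
        entry = self.head_table.get(item)
        node = entry[0] if entry else None
        while node is not None:
            yield node
            node = node.next


# Hypothetical sample data, not from the thesis.
tree = FPTree()
for items in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]):
    tree.insert(items)
print([n.count for n in tree.chain("c")])  # [2, 1]
```

Summing the counts along an item's chain gives that item's support (here 3 for `c`). Without the tail field, each new node would require walking its item's chain to the end, a cost that grows with the number of nodes sharing that item.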