Graduate student: 林國瑞 (Kuo-Zui Lin)
Thesis title: ClosedPROWL: Efficient Mining of Closed Frequent Continuities in Temporal Databases (時序資料庫中緊密頻繁連續事件型樣之有效探勘)
Advisor: 張嘉惠 (Chia-Hui Chang)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Academic year of graduation: 92 (ROC calendar; 2003-2004)
Language: Chinese
Number of pages: 40
Keywords: Pattern Mining, Closed Frequent Continuities, Inter-Transaction Association Mining, Data Mining
  • Pattern mining has long been a central problem in data mining. Early work, such as frequent itemset mining, mainly looked for associations among items within a single transaction. More recently, to better predict and analyze behavioral trends in a database, researchers have turned to inter-transaction association mining, which describes relationships among items in different transactions. A continuity is one kind of inter-transaction association pattern: it explicitly describes the relative positions and temporal order of different transactions. Because continuities break the barrier between transactions, the number of potential patterns and rules increases drastically, which both lowers the efficiency of the mining algorithm and makes the mined results hard to use. We therefore choose to mine closed frequent continuities. Closed frequent continuities are a representative subset of the frequent continuities: they are comparatively few, yet all frequent continuities can be enumerated from them, so redundant information is eliminated without loss of completeness. In this thesis, we propose an efficient algorithm, ClosedPROWL, which mines closed frequent continuities mainly by means of the projected window list technique. Experimental results on both synthetic and real-world datasets show that our algorithm is more efficient and scalable than previously proposed methods.
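
The two ideas named in the abstract, growing patterns through a projected window list and keeping only closed patterns, can be illustrated with a toy sketch. This is not the thesis's ClosedPROWL procedure: it assumes a single event per time slot (the thesis works with event sets), counts support as the number of matches, and finds closed patterns with a naive post-hoc filter rather than pruning during the search. The names `mine_continuities`, `closed_only`, `is_sub`, `min_sup`, and `max_win`, as well as the sample sequence, are hypothetical.

```python
from collections import defaultdict

WILDCARD = "*"  # "don't care" slot inside a continuity


def mine_continuities(seq, min_sup, max_win):
    """Depth-first growth of frequent continuities of at most max_win slots.

    Assumption: seq holds exactly one event per time slot.
    Each pattern carries the positions where its last slot matched; shifting
    those positions by one slot gives its projected window list (PWL).
    """
    results = {}

    def grow(pattern, ends):
        if pattern[-1] != WILDCARD:          # patterns never end with '*'
            results[tuple(pattern)] = len(ends)
        if len(pattern) == max_win:
            return
        # Projected window list: the slot right after each match.
        pwl = [t + 1 for t in ends if t + 1 < len(seq)]
        by_event = defaultdict(list)
        for t in pwl:
            by_event[seq[t]].append(t)
        for event, positions in sorted(by_event.items()):
            if len(positions) >= min_sup:
                grow(pattern + [event], positions)
        # Wildcard extension: skip one slot, but only if a real event
        # can still follow inside the window.
        if len(pwl) >= min_sup and len(pattern) + 1 < max_win:
            grow(pattern + [WILDCARD], pwl)

    starts = defaultdict(list)
    for t, event in enumerate(seq):
        starts[event].append(t)
    for event, positions in sorted(starts.items()):
        if len(positions) >= min_sup:
            grow([event], positions)
    return results


def is_sub(p, q):
    """True if p can be aligned inside q, matching every non-wildcard slot."""
    return any(
        all(a == WILDCARD or a == q[offset + i] for i, a in enumerate(p))
        for offset in range(len(q) - len(p) + 1)
    )


def closed_only(freq):
    """Naive filter: drop a pattern if a mined super-pattern has equal support."""
    return {
        p: s
        for p, s in freq.items()
        if not any(q != p and sq == s and is_sub(p, q) for q, sq in freq.items())
    }


if __name__ == "__main__":
    freq = mine_continuities(list("ABCABDABC"), min_sup=2, max_win=3)
    for pattern, support in closed_only(freq).items():
        print("".join(pattern), support)
```

On the sample sequence `ABCABDABC` with `min_sup=2` and `max_win=3`, the sketch finds eight frequent continuities, of which three are closed relative to the mined set: `AB` (support 3), `ABC` (support 2), and `B*A` (support 2). The filter compares every pair of mined patterns, so it only shows what "closed" means; it says nothing about how the thesis avoids generating the redundant patterns in the first place.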

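    Chapter 1 Introduction
    1.1. Motivation and Objectives
    1.2. Contributions
    1.3. Thesis Organization
    Chapter 2 Related Work
    2.1. Frequent Episode Mining
    2.1.1. The WINEPI Algorithm
    2.1.2. The MINEPI Algorithm
    2.2. Periodic Pattern Mining
    2.2.1. The LSI Algorithm
    2.2.2. The SMCA Algorithm
    2.3. Frequent Continuity Mining
    2.3.1. The FITI Algorithm
    Chapter 3 Problem Definition
    Chapter 4 The ClosedPROWL Algorithm
    4.1. Architecture of the ClosedPROWL Algorithm
    4.2. Mining Closed Frequent Event Sets
    4.3. Encoding Closed Frequent Event Sets and Database Transformation
    4.4. Mining Closed Frequent Continuities
    4.4.1. Mining Procedure
    4.4.2. Search Space Pruning Techniques
    4.4.3. Closure Checking Mechanism
    4.4.4. A Worked Example
    4.5. Correctness Analysis of the ClosedPROWL Algorithm
    Chapter 5 Experimental Results
    5.1. Synthetic Data
    5.1.1. Data Generator
    5.1.2. Performance and Scalability Analysis
    5.2. Real World Data
    Chapter 6 Conclusion
    References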

