Graduate student: 李翊銘
Yi-Ming Lee
Thesis title: 從交易資料庫中以自我推導方式探勘具有多層次FP-tree
Mining Self-derivable Multilevel FP-tree From a Transactional Database
Advisor: 蔡孟峰
Meng-Feng Tsai
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Academic year of graduation: 95
Language: English
Number of pages: 61
Keywords: multilevel association rule, FP-growth, FP-tree, Apriori, association rule mining
  • Recent work has produced approaches to mining association rules that outperform the original Apriori-like algorithms. Mining frequent patterns (itemsets) plays an important role in discovering association rules. In the past, Apriori-like methods were adopted to mine frequent itemsets, but they are inefficient because they scan the database repeatedly and iteratively check a large set of candidates by pattern matching. A compact structure called the FP-tree was developed to overcome these drawbacks. The FP-growth approach, proposed by J. Han, makes mining frequent itemsets easier. Although FP-growth is relatively efficient for this task, its results may be too detailed for managers or policymakers. We propose that the lower-level frequent itemsets and the compressed data within an FP-tree can be generalized further to mine higher-level frequent itemsets. Our basic idea is to exploit the properties and structure of the FP-tree according to an existing user-defined conceptual hierarchy on the mined items. We then provide efficient evolution algorithms that transform the original FP-tree into a higher-level FP-tree. In our approach, the transformed FP-tree retains the properties of the original FP-tree. With these algorithms, we can transform a lower-level FP-tree into a higher-level one and provide more generalized information to managers. Our experimental results also show the effectiveness of the proposed methods.
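The idea of lifting a lower-level FP-tree to a higher level can be illustrated with a minimal Python sketch. This is not the thesis's evolution algorithm, which this record does not reproduce. It only assumes that each root-to-node path of the lower tree can be relabelled through a concept hierarchy and re-inserted, so the database is not scanned again. The names `Node`, `build_fp_tree`, `tree_paths`, `lift_fp_tree`, and the example hierarchy are all hypothetical.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item label, a count, and child links."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from transactions (iterables of item labels)."""
    # First pass: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}
    root = Node(None)
    # Second pass: insert each transaction as a path, most frequent item first.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


def tree_paths(node, prefix=()):
    """Yield (path, n) where n transactions end exactly at that path."""
    for child in node.children.values():
        path = prefix + (child.item,)
        ending = child.count - sum(c.count for c in child.children.values())
        if ending > 0:
            yield path, ending
        yield from tree_paths(child, path)


def lift_fp_tree(root, hierarchy, min_support):
    """Assumed lifting step: relabel every path with parent concepts
    and rebuild a tree from the relabelled paths."""
    lifted = []
    for path, count in tree_paths(root):
        general = sorted({hierarchy.get(item, item) for item in path})
        lifted.extend([general] * count)
    return build_fp_tree(lifted, min_support)


# Hypothetical data: four transactions and a two-level hierarchy.
transactions = [
    ["milk", "bread"],
    ["milk", "cheese"],
    ["bread", "butter"],
    ["cheese", "bread"],
]
hierarchy = {"milk": "dairy", "cheese": "dairy",
             "butter": "dairy", "bread": "bakery"}

low_root, low_support = build_fp_tree(transactions, min_support=1)
high_root, high_support = lift_fp_tree(low_root, hierarchy, min_support=1)
# At the higher level, "dairy" is supported by 4 transactions, "bakery" by 3.
```

One limitation of this sketch: items pruned from the lower tree as infrequent contribute nothing to the lifted tree, even when their parent concept would be frequent. The example avoids this by using `min_support=1` at the lower level.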

    CHINESE ABSTRACT
    ABSTRACT
    ACKNOWLEDGEMENT
    TABLE OF CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    1 INTRODUCTION
    2 BACKGROUND AND RELATED WORK
    2.1 A DATA WAREHOUSE VS. DATA MINING
    2.2 ASSOCIATION RULES MINING
    2.3 APRIORI ALGORITHM AND RELATED IMPROVEMENT
    2.4 FREQUENT PATTERN TREE
    3 PROBLEM DESCRIPTION
    4 THE PROPOSED EVOLUTION APPROACHES
    4.1 THE EVOLUTION APPROACHES
    4.2 VALIDATION OF THE PROCESSES
    5 EXPERIMENT
    5.1 GENERATION OF SYNTHETIC DATA
    5.2 GENERATION OF A CONCEPTUAL HIERARCHY
    5.3 COMPARISON OF RUN TIME FOR MINING FREQUENT ITEMSETS
    5.4 COMPARISON OF REDUCED SIZES
    5.5 COMPARISON OF EVOLUTION AND REBUILDING
    5.6 COMPARISON OF NUMBER OF RULES
    6 CONCLUSION
    REFERENCE

    [AIS93b] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'93), pages 207-216, Washington, DC, May 1993.
    [AS94b] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, Sept. 1994.
    [CD97] S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26:65-74, March 1997.
    [HPY00] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), pages 1-12, Dallas, TX, May 2000.
    [KMR+94] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and I. Verkamo. Finding interesting rules from large sets of discovered association rules. In Proc. 3rd Int. Conf. Information and Knowledge Management (CIKM'94), pages 401-408, Gaithersburg, MD, Nov. 1994.
    [PCY95a] J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for mining association rules. In Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'95), pages 175-186, San Jose, CA, May 1995.
    [PS91a] G. Piatetsky-Shapiro. Discovery, analysis and presentation of strong rules. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, pages 229-238. Cambridge, MA: AAAI/MIT Press, 1991.
    [SON95] A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proc. 1995 Int. Conf. Very Large Data Bases (VLDB'95), pages 432-443, Zurich, Switzerland, Sept. 1995.
    [STA98] S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'98), pages 343-354, Seattle, WA, June 1998.
    [Toi96] H. Toivonen. Sampling large databases for association rules. In Proc. 1996 Int. Conf. Very Large Data Bases (VLDB'96), pages 134-145, Bombay, India, Sept. 1996.
