
Graduate student: Yu-Mei Chang (張毓美)
Thesis title: Association Based Classification Using Chi-Square Independence Test (應用卡方獨立性檢定於關連式分類問題)
Advisor: Chia-Hui Chang (張嘉惠)
Degree: Master
Department: College of Electrical Engineering and Computer Science (資訊電機學院), Department of Computer Science & Information Engineering
Academic year of graduation: 90 (ROC calendar)
Language: English
Number of pages: 33
Keywords: Association Rules, Classification, Data Mining
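  • Classification has long been a central problem in machine learning. In recent years, with the rise of association rule mining, a growing number of studies have applied association rule mining techniques to classification. In this thesis, we study several association based classification methods and propose a new classification method called ACC (Association based Classification using Chi-square independence test). ACC uses association rule mining to find all frequent and interesting itemsets and uses them to model the relations between attributes. In addition, ACC applies the chi-square independence test to examine the relation between attributes and classes, retaining only the class-related frequent itemsets for prediction. We run experiments on 13 datasets from the UCI machine learning repository and compare ACC with NB and LB, two efficient and accurate classifiers. The experimental results show that ACC outperforms NB and LB on most of the datasets and is itself an efficient and accurate classifier.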


    For many years, classification has been one of the key problems in machine learning research. Because association rule mining is an important and highly active area of data mining research, a growing number of classification methods are based on association rule mining techniques. In this thesis, we study several association based classification methods and compare these classifiers. We present a new classification method called ACC (Association based Classification using Chi-square independence test).
    ACC finds frequent and interesting itemsets, which describe the relations between attributes. It then applies the chi-square independence test to retain only the class-related itemsets, which are used to predict the class of new data objects. ACC also handles missing values with an approach that takes the probability of a missing value occurring into account. We evaluate our method on 13 datasets from the UCI machine learning repository. We compare ACC with NB and LB, two state-of-the-art classifiers, and the experimental results show that our method is a highly effective, accurate classifier.
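    The class-relatedness filter described in the abstract can be illustrated with a short sketch. This is not the thesis's implementation: it only shows a standard Pearson chi-square independence test on an "itemset present / absent" by class contingency table. The function names, the record layout (a set of items plus a class label), and the default critical value of 3.841 (significance level 0.05 with one degree of freedom, which holds for two classes only) are assumptions made for this example.

```python
from collections import Counter


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def contingency_table(records, itemset, classes):
    """Build a 2 x k table: row 0 = itemset present, row 1 = itemset absent."""
    present, absent = Counter(), Counter()
    for items, label in records:
        if itemset <= items:  # subset test: the record contains every item
            present[label] += 1
        else:
            absent[label] += 1
    return [[present[c] for c in classes], [absent[c] for c in classes]]


def is_class_related(records, itemset, classes, critical_value=3.841):
    """Keep an itemset if independence from the class is rejected.

    The default critical value is an assumption of this sketch: chi-square
    at alpha = 0.05 with 1 degree of freedom (two classes). With k classes
    the degrees of freedom are k - 1 and the threshold must change.
    """
    table = contingency_table(records, itemset, classes)
    return chi_square_statistic(table) > critical_value


# Toy data (hypothetical): item "a" always co-occurs with class "yes",
# item "b" is spread evenly over both classes.
records = []
for i in range(10):
    extra = {"b"} if i < 5 else set()
    records.append((frozenset({"a"} | extra), "yes"))
    records.append((frozenset({"c"} | extra), "no"))

classes = ["yes", "no"]
print(is_class_related(records, frozenset({"a"}), classes))
print(is_class_related(records, frozenset({"b"}), classes))
```

    On this toy data the itemset {"a"} is kept as class-related and {"b"} is discarded, since its presence says nothing about the class. How ACC chooses its significance level and combines the retained itemsets for prediction is described in the thesis itself, not here.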

    1 Introduction
        1.1 Association Rule Mining
        1.2 Concepts of Association Based Classification
        1.3 Method and Goal
        1.4 Organization of the Thesis
    2 Related Work
        2.1 NB - Naïve Bayes Classifier
        2.2 LB - Large Bayes Classifier
        2.3 CBA - Classification Based on Associations
        2.4 CMAR - Classification Based on Multiple Association Rules
        2.5 Comparison
        2.6 Summary
    3 Classification Method
        3.1 Learning Phase
            3.1.1 Discovering Frequent Itemsets
            3.1.2 Discovering Interesting Itemsets
            3.1.3 Discovering Class-Related Itemsets
        3.2 Classification Phase
        3.3 Learning Algorithm
        3.4 Classification Algorithm
        3.5 Zero Counts Smoothing
    4 Experimental Results and Discussion
        4.1 Parameter Setting
        4.2 Experimental Results
        4.3 Discussion
            4.3.1 The Effect of Missing Value
            4.3.2 The Effect of Parameter Setting
    5 Conclusion and Future Work

