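Graduate student: 張簡雅文 (Ya-Wen Changchien)
Thesis title: Mining Associative Classification Rules from Numerical Data
Advisor: 陳彥良 (Yen-Liang Chen)
Oral examination committee:
Degree: Doctorate
Department: Department of Information Management, College of Management
Academic year of graduation: 98 (ROC calendar)
Language: English
Number of pages: 113
Keywords (Chinese, translated): numerical data, data mining, genetic algorithm, associative classification rules
Keywords (English): Data Mining, Genetic Algorithm, Associative Classification, Numerical Data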
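  • (Chinese abstract, translated) Associative classification is a data mining method that builds a classification system from association rules. Previous studies have shown that associative classification achieves higher classification accuracy than traditional classification methods such as C4.5 and ILP; however, it cannot process numerical data or express relations between numerical data. Inductive logic programming (ILP), one of the traditional classification methods, has the advantages of expressing relations easily and of being flexible with respect to problem representation and problem-specific constraints. Its drawbacks are zero tolerance for errors (noise), the inability to process numerical data effectively, and reduced efficiency when the relations have too many arguments. This study first proposes a genetic algorithm with a multi-level phenotypic structure (PGA) to remedy these shortcomings of ILP systems. Because this structure can represent relations between numerical data, it is applied to encode associative classification rules and to build an associative classification system, with the aim of combining the ability to express relations between numerical data with high classification accuracy. Experimental results show that the proposed method (GA-ACR) achieves high predictive classification accuracy and outperforms the data distribution method, which decides the class label according to the distribution of the data.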
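The abstract describes associative classification rules whose conditions express relations between numerical values, but it does not give the exact rule format. The Python sketch below assumes, purely for illustration, that a condition compares two numerical attributes of the same tuple (for example `x[0] > x[1]`), scores each rule by support and confidence, and classifies a new tuple with the highest-ranked matching rule in the style of CBA (confidence first, then support). The classes `Condition` and `Rule` and the functions `support_confidence` and `classify` are hypothetical names, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Condition:
    """Hypothetical relational condition on one tuple: x[left] op x[right]."""
    left: int
    op: str  # ">" or "<"
    right: int

    def holds(self, row: Sequence[float]) -> bool:
        a, b = row[self.left], row[self.right]
        return a > b if self.op == ">" else a < b


@dataclass(frozen=True)
class Rule:
    """Associative classification rule: all conditions => label."""
    conditions: tuple[Condition, ...]
    label: str

    def matches(self, row: Sequence[float]) -> bool:
        return all(c.holds(row) for c in self.conditions)


def support_confidence(rule: Rule, rows, labels) -> tuple[float, float]:
    """Support = fraction of all rows that match and carry rule.label;
    confidence = that count divided by the number of matching rows."""
    covered = [y for x, y in zip(rows, labels) if rule.matches(x)]
    hits = sum(1 for y in covered if y == rule.label)
    support = hits / len(rows)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def classify(rules, rows, labels, row, default):
    """Rank rules by (confidence, support), CBA-style, and return the label
    of the first rule that matches `row`; otherwise return `default`."""
    def key(rule):
        support, confidence = support_confidence(rule, rows, labels)
        return (confidence, support)

    for rule in sorted(rules, key=key, reverse=True):
        if rule.matches(row):
            return rule.label
    return default


# Toy training data with three numerical attributes (illustrative only).
rows = [[5.0, 3.0, 1.0], [2.0, 4.0, 1.0], [6.0, 1.0, 2.0], [1.0, 2.0, 3.0]]
labels = ["up", "down", "up", "down"]
r1 = Rule((Condition(0, ">", 1),), "up")
r2 = Rule((Condition(0, "<", 1),), "down")

print(support_confidence(r1, rows, labels))  # (0.5, 1.0)
print(classify([r1, r2], rows, labels, [3.0, 7.0, 0.0], default="up"))  # down
```

In this sketch a tuple whose compared attributes are equal satisfies neither operator and falls through to the default class; how the thesis handles such ties and chooses its default class is not stated in this record.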


    Associative classification is a data mining technique that builds a classification system from associative classification rules. Although associative classification has been shown to be more accurate than traditional classification approaches such as C4.5 and ILP, it cannot handle numerical data or the relations among numerical values. How to build associative classifiers from numerical data is therefore an open research problem. Inductive logic programming (ILP), a traditional classification approach, is strong at representing relations and flexible in handling problem representation and problem-specific constraints. However, it is not suitable for noisy environments and has weak facilities for processing numerical data; in addition, its learning time becomes unsatisfactory when the relations have a large number of arguments. To address these problems in ILP systems, a phenotypic genetic algorithm (PGA) with a multi-level phenotypic encoding structure is proposed. This structure can represent relations between numerical data, and it is used to encode such relations when mining associative classification rules. The experimental results show that the proposed approach (GA-ACR) achieves high predictive accuracy and is highly competitive with the data distribution method.
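The abstract states that PGA searches for encoded relations with a genetic algorithm, but this record does not specify the encoding, fitness function, or operators. The following is a plain single-level GA sketch, not the multi-level phenotypic encoding of PGA: each chromosome is a fixed-length list of hypothetical genes `(i, op, j)` meaning `x[i] op x[j]`, fitness is assumed to be support times confidence for a fixed target class, and the loop uses tournament selection, uniform crossover, per-gene mutation, and elitism. All names and parameter values (`evolve`, `pop_size`, `p_mut`, and so on) are illustrative assumptions.

```python
import random
from typing import Sequence

# Hypothetical gene (i, op, j): the condition x[i] op x[j] on one tuple.
Gene = tuple[int, str, int]
OPS = (">", "<")


def holds(gene: Gene, row: Sequence[float]) -> bool:
    i, op, j = gene
    return row[i] > row[j] if op == ">" else row[i] < row[j]


def fitness(genes: list[Gene], target: str, rows, labels) -> float:
    """Assumed fitness: support * confidence of the rule `genes => target`."""
    covered = [y for x, y in zip(rows, labels) if all(holds(g, x) for g in genes)]
    if not covered:
        return 0.0
    hits = sum(1 for y in covered if y == target)
    return (hits / len(rows)) * (hits / len(covered))


def random_gene(n_attrs: int, rng: random.Random) -> Gene:
    i, j = rng.sample(range(n_attrs), 2)  # two distinct attributes
    return (i, rng.choice(OPS), j)


def evolve(rows, labels, target, n_conditions=2, pop_size=30,
           generations=40, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n_attrs = len(rows[0])

    def score(genes):
        return fitness(genes, target, rows, labels)

    pop = [[random_gene(n_attrs, rng) for _ in range(n_conditions)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: carry the two best rules over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of size 3 for each parent.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            # Uniform crossover: each gene is taken from either parent.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: replace each gene with a random one with probability p_mut.
            child = [random_gene(n_attrs, rng) if rng.random() < p_mut else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return best, score(best)


# Toy data: the label is "up" exactly when x[0] > x[1].
rows = [[float(i % 7), float((i * 3) % 5), float(i % 4)] for i in range(40)]
labels = ["up" if r[0] > r[1] else "down" for r in rows]
best, best_score = evolve(rows, labels, target="up")
print(best, round(best_score, 3))
```

Because fitness here is support times confidence, no rule for the target class can score above that class's share of the data; on the toy data a rule equivalent to `x[0] > x[1]` reaches that bound. Elitism keeps the best score from decreasing between generations.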

    Abstract
    Chinese Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background
      1.2 Organization of this Dissertation
    Chapter 2 Related Works
      2.1 Data Mining
      2.2 Inductive Logic Programming (ILP)
      2.3 Genetic Algorithm (GA)
        2.3.1 Encoding Structure of GA
        2.3.2 Multi-level Structure of GA
      2.4 Associative Classification Rules
        2.4.1 CBA
        2.4.2 CMAR
        2.4.3 CPAR
        2.4.4 Summary
    Chapter 3 Phenotypic Encoding Structure of GA
      3.1 Problem Definition
      3.2 PGA Algorithm
        3.2.1 Multi-Level Encoding
        3.2.2 Fitness Evaluation
        3.2.3 Crossover
        3.2.4 Mutation
      3.3 Experiments
        3.3.1 Experiment 1: Discover Top-k Rules
        3.3.2 Experiment 2: Sensitivities of Different Settings
        3.3.3 Summary
    Chapter 4 Mining Associative Classification Rules from Numerical Data
      4.1 Problem Definition
      4.2 A Phenotypic Encoding Genetic Algorithm for Classification Rule Mining
        4.2.1 A Phenotype Encoding Genetic Algorithm
        4.2.2 Classify a New Tuple into a Class
    Chapter 5 Experiments
      5.1 Experiment Environment
      5.2 Experiment 1: Predictive Accuracy
      5.3 Experiment 2: Sensitivity Test
        5.3.1 Sensitivity Test for Associative Classification
        5.3.2 Sensitivity Test for Genetic Algorithm
      5.4 Summary
    Chapter 6 Conclusions
      6.1 Implications for Academic Research
      6.2 Implications for Business Practitioners
      6.3 Future Works
    References

