Mining Associative Classification Rules from Numerical Data | National Central University Theses and Dissertations System

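Author:	Ya-Wen Changchien (張簡雅文)
Thesis title:	Mining Associative Classification Rules from Numerical Data (挖掘數值資料之關聯分類規則)
Advisor:	Yen-Liang Chen (陳彥良)
Oral defense committee:
Degree:	Doctor
Department:	College of Management, Department of Information Management
Academic year of graduation:	98 (ROC calendar; 2009-2010)
Language:	English
Pages:	113
Keywords (Chinese):	Numerical Data, Data Mining, Genetic Algorithm, Associative Classification Rules
Keywords (English):	Data Mining, Genetic Algorithm, Associative Classification, Numerical Data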

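Associative classification is a data mining method that builds a classification system from association rules. Previous studies report that associative classification achieves higher classification accuracy than traditional classification methods such as C4.5 and ILP; however, it cannot handle numerical data or express relations among numerical data. Inductive logic programming (ILP), one of the traditional classification methods, expresses relations easily and is flexible in problem representation and problem-specific constraints. Its drawbacks are zero fault tolerance, ineffective handling of numerical data, and processing efficiency that degrades when relations have too many arguments. This study first proposes a genetic algorithm with a multi-level phenotypic structure (PGA) to address the drawbacks of ILP systems. This structure can represent relations among numerical data; it is applied to encode associative classification rules and to build an associative classification system, with the aim of combining the ability to express relations among numerical data with high classification accuracy. Experimental results show that the proposed method (GA-ACR) achieves high predictive classification accuracy and outperforms the data distribution method, which assigns classes according to the data distribution.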

Associative classification, a data mining technique, builds a classification system from associative classification rules. Although associative classification is more accurate than traditional classification approaches such as C4.5 and ILP, it cannot handle numerical data or the relations among them. How to build associative classifiers from numerical data is therefore an open research problem. Inductive logic programming (ILP), one of the traditional classification approaches, represents relations well and is flexible with respect to problem representation and problem-specific constraints. However, it is not suitable for noisy environments and has weak facilities for processing numerical data, and its learning time is unsatisfactory when relations have a large number of arguments. A phenotypic genetic algorithm (PGA) with a multi-level phenotypic encoding structure is proposed to solve these problems in the ILP system. This structure represents relations between numerical data well and is used to encode those relations when mining associative classification rules. The experimental results show that the proposed approach (GA-ACR) has high prediction accuracy and is highly competitive with the data distribution method.
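To make the idea of a classification rule over relations between numerical attributes concrete, the following is a minimal sketch in Python. It is not the dissertation's PGA or GA-ACR implementation, and the multi-level phenotypic encoding is not reproduced here. The names `Relation`, `Rule`, `support_confidence`, the attribute names `open` and `close`, and the toy rows are all hypothetical. The sketch only shows how a rule whose antecedent compares two numerical attributes could be scored with the standard support and confidence measures, which a genetic algorithm might plausibly use as part of a fitness evaluation.

```python
import operator
from dataclasses import dataclass

# Comparison operators allowed between two numerical attributes (assumed set).
OPS = {"<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Relation:
    """A hypothetical relation between two numerical attributes, e.g. open < close."""
    left: str
    op: str
    right: str

    def holds(self, row):
        return OPS[self.op](row[self.left], row[self.right])


@dataclass(frozen=True)
class Rule:
    """A rule: if every relation in the antecedent holds, predict `label`."""
    antecedent: tuple
    label: str

    def matches(self, row):
        return all(rel.holds(row) for rel in self.antecedent)


def support_confidence(rule, rows):
    """Return (support, confidence) of `rule` over `rows`."""
    covered = [r for r in rows if rule.matches(r)]
    correct = [r for r in covered if r["label"] == rule.label]
    support = len(correct) / len(rows) if rows else 0.0
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence


# Toy data, invented purely for illustration.
ROWS = [
    {"open": 10.0, "close": 12.0, "label": "up"},
    {"open": 11.0, "close": 13.0, "label": "up"},
    {"open": 12.0, "close": 12.5, "label": "down"},
    {"open": 9.0, "close": 8.0, "label": "down"},
]
RULE = Rule((Relation("open", "<", "close"),), "up")

if __name__ == "__main__":
    print(support_confidence(RULE, ROWS))
```

On the toy rows, the rule covers three of the four tuples and predicts two of them correctly, giving a support of 0.5 and a confidence of 2/3.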

Abstract
Chinese Abstract
Acknowledgments
Contents
List of Tables
List of Figures
Chapter 1 Introduction
1. Background
2. Organization of this Dissertation
Chapter 2 Related Works
1. Data Mining
2. Inductive Logic Programming (ILP)
3. Genetic Algorithm (GA)
3.1. Encoding Structure of GA
3.2. Multi-level Structure of GA
4. Associative Classification Rules
4.1. CBA
4.2. CMAR
4.3. CPAR
4.4. Summary
Chapter 3 Phenotypic Encoding Structure of GA
1. Problem Definition
2. PGA Algorithm
2.1. Multi-Level Encoding
2.2. Fitness Evaluation
2.3. Crossover
2.4. Mutation
3. Experiments
3.1. Experiment 1: Discover Top-k Rules
3.2. Experiment 2: Sensitivities of Different Settings
3.3. Summary
Chapter 4 Mining Associative Classification Rules from Numerical Data
1. Problem Definition
2. A Phenotypic Encoding Genetic Algorithm for Classification Rule Mining
2.1. A Phenotype Encoding Genetic Algorithm
2.2. Classify a New Tuple into a Class
Chapter 5 Experiments
1. Experiment Environment
2. Experiment 1: Predictive Accuracy
3. Experiment 2: Sensitivity Test
3.1. Sensitivity Test for Associative Classification
3.2. Sensitivity Test for Genetic Algorithm
4. Summary
Chapter 6 Conclusions
1. Implications for Academic Researches
2. Implications for Business Practitioners
3. Future Works
References

                                

