

Student: Cheng-Hsiung Weng (翁政雄)
Title: Discovering Association Rules from Imprecise or Uncertain Data (Chinese title: 從不精準或不確定性資料中挖掘關聯規則)
Advisor: Yen-Liang Chen (陳彥良)
Oral examination committee:
Degree: Doctorate (Ph.D.)
Department: Department of Information Management, College of Management
Graduation academic year: 97 (ROC calendar, i.e., the 2008–2009 academic year)
Language: English
Pages: 101
Keywords: data mining, association rule, fuzzy sets, uncertain data, imprecise ordinal data, possibility theory
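    Association rule mining is a line of research that analyzes associations among data. Many studies have addressed association rules for nominal data, but research on ordinal data is largely absent. In addition, previous studies usually assume that the data are precise and certain, an assumption that does not match real-world conditions: human error and the limitations of measuring instruments and record keeping mean that values often cannot be recorded precisely, so real-world data are frequently imprecise or uncertain. This study therefore proposes a mining approach, Discovering Association Rules from Imprecise or Uncertain Data (DARIUD), that combines data mining, fuzzy sets, and possibility theory to guide the discovery of interesting, highly certain association rules over multiple data types from imprecise or uncertain data. Three studies, each on a different type of data, demonstrate that the approach is workable and generalizable, and they point toward new research combining these fields.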


    Association rule mining is an emerging data analysis method for discovering associations within data. Although there have been numerous studies on finding association rules in nominal data, few have attempted to do so for ordinal data. Moreover, previous mining algorithms usually assume that the input data are precise and certain. Unfortunately, real-world data tend to be imprecise or uncertain because of human errors, instrument errors, recording errors, and so on. A question therefore arises: how can association rules be mined from imprecise or uncertain data? To answer it, this study proposes a work process, Discovering Association Rules from Imprecise or Uncertain Data (DARIUD), which takes a more general view by combining the fields of data mining, fuzzy sets, and possibility theory to discover interesting and certain patterns. The process is intended to integrate these fields so that the steps of mining association rules from imprecise or uncertain data can be understood and analyzed. Three studies demonstrate that DARIUD is workable and can be generalized to future studies in these three fields.
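    The fuzzy-set side of this idea can be illustrated with the fuzzy support and confidence measures that are common in the fuzzy association rule literature: an item's membership degree replaces 0/1 presence, a transaction supports an itemset to the minimum of its items' degrees, and support is the average of that minimum over all transactions. The sketch below is not DARIUD itself. The 1-5 rating scale, the triangular terms low/mid/high, and the helper names triangular, fuzzify, fuzzy_support and fuzzy_confidence are hypothetical, and the sketch treats every rating as precise. The dissertation's own handling of imprecise or uncertain values is not reproduced here.

```python
from typing import Dict, Iterable, List

# Hypothetical encoding: each transaction maps a fuzzy item such as
# "quality.high" to its membership degree in [0, 1].
Transaction = Dict[str, float]

# Assumed linguistic terms (a, b, c) for a 1-5 ordinal scale; not from the thesis.
TERMS = {"low": (1, 1, 3), "mid": (1, 3, 5), "high": (3, 5, 5)}


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a, c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(record: Dict[str, int]) -> Transaction:
    """Map each ordinal rating to its membership degree in every term."""
    return {
        f"{attr}.{term}": triangular(value, *abc)
        for attr, value in record.items()
        for term, abc in TERMS.items()
    }


def fuzzy_support(transactions: List[Transaction], itemset: Iterable[str]) -> float:
    """Average over transactions of the minimum membership of the items (min t-norm)."""
    items = list(itemset)
    total = sum(min(t.get(i, 0.0) for i in items) for t in transactions)
    return total / len(transactions)


def fuzzy_confidence(transactions: List[Transaction],
                     antecedent: List[str], consequent: List[str]) -> float:
    """support(X and Y) / support(X); 0.0 if the antecedent never occurs."""
    base = fuzzy_support(transactions, antecedent)
    if base == 0:
        return 0.0
    return fuzzy_support(transactions, antecedent + consequent) / base


if __name__ == "__main__":
    raw = [{"quality": 5, "loyalty": 4},
           {"quality": 4, "loyalty": 5},
           {"quality": 2, "loyalty": 3}]
    db = [fuzzify(r) for r in raw]
    print(fuzzy_support(db, ["quality.high"]))                       # 0.5
    print(fuzzy_confidence(db, ["quality.high"], ["loyalty.high"]))  # about 0.667
```

    Here "quality.high" has degrees 1.0, 0.5 and 0.0 in the three transactions, giving a support of 0.5. A threshold on such fuzzy support and confidence would then play the role that crisp minimum support and confidence play in Apriori-style mining.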

    1. Introduction
        1.1 Description of the Process
        1.2 Organization of the Dissertation
    2. Literature Review
        2.1 Completeness of Patterns
        2.2 Dimensions of Patterns
        2.3 Levels of Patterns
        2.4 Data-Types of Patterns
        2.5 Kinds of Rules
        2.6 Fuzzy Sets Application
    3. Mining Association Rules from Imprecise Ordinal Data
        3.1 Research Problem
        3.2 Problem Definition
        3.3 Algorithm for Mining Association Rules from Imprecise Ordinal Data
        3.4 Experiment Results
        3.5 Summary and Managerial Implications
    4. Mining Fuzzy Association Rules from Questionnaire Data
        4.1 Research Problem
        4.2 Problem Definition
        4.3 An Algorithm for Mining Fuzzy Association Rules from Questionnaire Data
        4.4 Experiment Results
        4.5 Summary and Managerial Implications
    5. Mining Fuzzy Association Rules from Uncertain Data
        5.1 Research Problem
        5.2 Possibility Theory Application
        5.3 Using Possibility Theory to Represent Uncertain Data
        5.4 Problem Definition
        5.5 Algorithm for Mining Fuzzy Association Rule from Uncertain Data
        5.6 Experiment Result
        5.7 Summary and Managerial Implications
    6. Conclusions and Future Works
    References
    Appendixes
        Appendix A
        Appendix B
    Publication List
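    Chapter 5 represents uncertain data with possibility theory. In standard possibility theory, what is known about an uncertain value is given by a possibility distribution pi over candidate values. An event F with membership function mu_F is then scored by two measures. The possibility is Pi(F) = sup_x min(pi(x), mu_F(x)), which expresses how plausible F is. The necessity is N(F) = inf_x max(1 - pi(x), mu_F(x)), which expresses how certain F is. A crisp event is the special case where mu_F is 0 or 1.

    The sketch below computes both measures for a single record. The 1-5 scale and the example distributions are hypothetical. The support_bounds helper averages N and Pi across records into a [lower, upper] degree of support; it is a hypothetical illustration, not the dissertation's definition.

```python
from typing import Dict, List, Tuple

# A possibility distribution maps each candidate value of an uncertain
# attribute to a degree in [0, 1]; unlisted values have degree 0.
Distribution = Dict[int, float]
# A (fuzzy) event maps values to membership degrees; a crisp set uses 1.0.
Event = Dict[int, float]


def possibility(pi: Distribution, event: Event) -> float:
    """Pi(F) = sup_x min(pi(x), mu_F(x)): how plausible it is that F holds."""
    domain = set(pi) | set(event)
    return max((min(pi.get(x, 0.0), event.get(x, 0.0)) for x in domain), default=0.0)


def necessity(pi: Distribution, event: Event) -> float:
    """N(F) = inf_x max(1 - pi(x), mu_F(x)): how certain it is that F holds."""
    domain = set(pi) | set(event)
    return min((max(1.0 - pi.get(x, 0.0), event.get(x, 0.0)) for x in domain), default=1.0)


def support_bounds(records: List[Distribution], event: Event) -> Tuple[float, float]:
    """Hypothetical aggregation: mean necessity (lower) and mean possibility (upper)."""
    n = len(records)
    lower = sum(necessity(pi, event) for pi in records) / n
    upper = sum(possibility(pi, event) for pi in records) / n
    return lower, upper


if __name__ == "__main__":
    # Hypothetical uncertain rating: "probably 4, possibly 3 or 5".
    pi_a = {3: 0.5, 4: 1.0, 5: 0.5}
    high_crisp = {4: 1.0, 5: 1.0}   # crisp event "rating is 4 or 5"
    high_fuzzy = {4: 0.5, 5: 1.0}   # fuzzy term "high"
    print(possibility(pi_a, high_crisp), necessity(pi_a, high_crisp))  # 1.0 0.5
    print(possibility(pi_a, high_fuzzy), necessity(pi_a, high_fuzzy))  # 0.5 0.5
    records = [pi_a, {5: 1.0}, {1: 1.0, 2: 0.5}]
    print(support_bounds(records, high_crisp))  # lower 0.5, upper about 0.667
```

    For a precise value (a distribution with a single value at degree 1), Pi and N coincide with the ordinary membership degree. The gap between them therefore measures how much the uncertainty in a record weakens the evidence for a pattern.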

