
Graduate Student: Chang-Ling Hsu (許昌齡)
Thesis Title: Multi-valued and Multi-labeled Decision Tree Classifiers for Data Mining (資料挖掘的多值及多標籤決策樹分類法)
Advisor: Shih-Chieh Chou (周世傑)
Oral Defense Committee:
Degree: Doctor
Department: College of Management, Department of Information Management
Graduation Academic Year: 92 (2003-2004)
Language: English
Number of Pages: 45
Chinese Keywords: 資料挖掘 (data mining), 多值屬性 (multi-valued attribute), 多標籤 (multi-labeled), 分類 (classification), 決策樹 (decision tree)
Foreign Keywords: multiple labels, classification, decision tree, data mining, multi-valued attribute


    Presently, decision tree classifiers require that the attributes and the class label of a data set be single-valued. However, there exist classification problems with multi-valued and multi-labeled data. To handle such data, this research first developed a decision tree classifier named MMC (Multi-valued and Multi-labeled Classifier). Then, by redesigning the algorithm, this research developed a second classifier, named MMDT (Multi-valued and Multi-labeled Decision Tree), to improve on the accuracy of MMC.
    MMC and MMDT differ from traditional decision tree classifiers in several major functions, including growing the decision tree, selecting attributes, assigning labels to represent a leaf, and making predictions for new data. The development strategy of MMC is based mainly on measuring similarity among multiple labels; the development strategy of MMDT is based on both measuring similarity and scoring among multiple labels.
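    The "measuring similarity among multiple labels" that drives attribute selection can be sketched as a set-similarity computed over the records a candidate branch would receive. The Jaccard-style measure and the pairwise averaging below are illustrative assumptions only; the thesis defines its own similarity measure, which may differ.

```python
def label_similarity(a, b):
    """Jaccard-style similarity between two label sets.

    Illustrative stand-in only: the thesis defines its own similarity
    measure for MMC/MMDT, which may differ from plain Jaccard.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty label sets are trivially identical
    return len(a & b) / len(a | b)


def average_pairwise_similarity(label_sets):
    """Average similarity over all pairs of label sets that would fall
    into one candidate branch; a split that groups records with similar
    label sets scores higher and would be preferred when growing the tree."""
    n = len(label_sets)
    if n < 2:
        return 1.0
    total = sum(
        label_similarity(label_sets[i], label_sets[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)
```

    For example, `label_similarity({'a', 'b'}, {'b', 'c'})` yields 1/3, since the two sets share one of three distinct labels.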
    The experimental results show that MMC and MMDT can not only mine classification rules from a large multi-valued and multi-labeled data set, but also achieve convincing accuracy and goodness of rules.
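    The "label ratio" and scoring idea behind MMDT, mentioned in the abstract, might be sketched as follows. The uniform per-record weighting and the 0.5 threshold are assumptions made here for illustration; they are not necessarily what the thesis's weighted-labelRatio function computes.

```python
from collections import Counter


def label_ratios(label_sets):
    """For each label, the fraction of records at a node whose label set
    contains it. A sketch of MMDT's 'label ratio' idea as described in the
    abstract; the thesis's weighted-labelRatio may weight labels differently."""
    n = len(label_sets)
    counts = Counter(label for labels in label_sets for label in set(labels))
    return {label: count / n for label, count in counts.items()}


def leaf_labels(label_sets, threshold=0.5):
    """Assign to a leaf every label whose ratio meets the threshold.
    The 0.5 default is an illustrative assumption, not taken from the thesis."""
    return {label for label, ratio in label_ratios(label_sets).items()
            if ratio >= threshold}
```

    With four records labeled {a, b}, {a}, {b, c}, and {a, c}, the ratios are a: 0.75, b: 0.5, c: 0.5, so a threshold of 0.6 would represent the leaf by the single label a.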

    Contents:
    Chinese Abstract
    English Abstract
    Contents
    List of Figures
    List of Tables
    1. Introduction
       1.1 Background and Motivation
       1.2 Statements of Problem
       1.3 Purposes of the Study
    2. Literature Review
       2.1 Clarification for the Confusion among the Multi-labeled Data, Two-classed Data and Multi-classed Data
       2.2 Difficulties in Handling the Multi-valued and Multi-labeled Data by Traditional Classifiers
    3. The Algorithms
       3.1 MMC and MMDT Related Affairs and Symbols
           3.1.1 MMC and MMDT Related Affairs
           3.1.2 Symbols of MMC and MMDT
       3.2 The Algorithms of MMC and MMDT
           3.2.1 Measuring the Label Similarity
           3.2.2 The MMC Algorithm
                 3.2.2.1 To Determine the Internal Node and Its Branches
                 3.2.2.2 To Determine the Leaf Node
           3.2.3 Label Ratio and the MMDT Algorithm
                 3.2.3.1 Label Ratio
                         3.2.3.1.1 Label Similarity
                 3.2.3.2 The MMDT Algorithm
                         3.2.3.2.1 Function next_attribute of MMDT
                                   3.2.3.2.1.1 Function weighted-labelRatio
           3.2.4 Label Prediction for New Data and Evaluation on the Label Prediction
    4. Experiments
       4.1 Experimental Design
       4.2 Experimental Results
           4.2.1 Comparisons between MMC and MMDT
           4.2.2 Examination on the Behavior of MMC and MMDT
    5. Summary and Conclusion
    References

