| Graduate Student: | 謝孟錫 Meng-Sei Hsieh |
|---|---|
| Thesis Title: | 分徑指標在建立決策樹的比較 (A Comparison of Splitting Indices in Decision Tree Construction) |
| Advisor: | 王丕承 Pe-Cheng Wang |
| Committee Members: | |
| Degree: | 碩士 Master |
| Department: | 管理學院 College of Management - 工業管理研究所 Graduate Institute of Industrial Management |
| Graduation Academic Year: | 90 (ROC calendar, i.e., 2001) |
| Language: | Chinese |
| Pages: | 67 |
| Keywords (Chinese): | 資料探勘 (data mining), 分類 (classification), 決策樹 (decision tree), 分徑指標 (splitting index) |
| Keywords (English): | splitting index, data mining, classification, decision tree |
Abstract

Data mining has attracted great attention in recent years. Simply put, it refers to methods for extracting important or interesting information from very large volumes of data, and its goals fall broadly into two categories: classification and prediction, and clustering. Among classification methods, the decision tree is one of the most widely accepted. It exploits algorithms and the convenience of computing to display, in tree form, how the data are influenced by each variable, thereby achieving classification and overcoming the inability of traditional statistical analysis to examine very large data sets in full.

However, the existing literature concentrates almost exclusively on the algorithms for building decision trees. In fact, once an algorithm has been chosen, how the splitting attribute and splitting point are selected is the key issue. In view of this, this study adopts the BOAT algorithm, which builds trees efficiently, and applies methods for mining significant association rules to the splitting indices used in decision tree construction. It compares the prediction accuracy of several indices, including the Gini index, Entropy, λ, Rule interest, and Laplace, and uses differences in the number of nodes, maximum depth, and average depth to examine how the resulting trees differ in form.
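Of the five splitting indices compared above, the Gini index and Entropy are the two most widely used in decision tree construction. A minimal Python sketch of how a candidate binary split might be scored with them (the label lists, split, and function names below are hypothetical illustrations, not data or code from the thesis):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum(p * log2(p)) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_score(left, right, impurity):
    """Size-weighted impurity of a binary split; a lower score is a better split."""
    n = len(left) + len(right)
    return (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)

# Hypothetical class labels falling on each side of a candidate split point
left = ["yes", "yes", "yes", "no"]
right = ["no", "no", "yes"]

print(round(split_score(left, right, gini), 4))     # 0.4048
print(round(split_score(left, right, entropy), 4))  # 0.8571
```

In practice, an algorithm such as BOAT would evaluate a score like this for every candidate splitting attribute and splitting point, and keep the split with the lowest weighted impurity.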
References
BICKEL, P., RITOV, Y. and STOKER, T. (2001). Tailor-made tests for goodness of fit for semiparametric hypotheses. Unpublished manuscript.
BREIMAN, L. (2001). Statistical modeling: The two cultures. Statist. Sci. 16(3) 199-231.
BREIMAN, L., FRIEDMAN, J., OLSHEN, R. and STONE, C. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
BRIN, S., MOTWANI, R. and SILVERSTEIN, C. (1997). Beyond market baskets: Generalizing association rules to correlations. In Proceedings of ACM SIGMOD Conference on Management of Data 265-276. Tucson, Arizona.
BRIN, S., MOTWANI, R., ULLMAN, J. and TSUR, S. (1997). Dynamic itemset counting and implication rules for market basket data. In Proceedings of ACM SIGMOD Conference on Management of Data 255-264. Tucson, Arizona.
CATLETT, J. (1991). Megainduction: Machine Learning on Very Large Databases. PhD thesis, Sydney Univ.
CHAN, P. and STOLFO, S. (1993). Experiments on multistrategy learning by meta-learning. In Proceedings of the Second International Conference on Information and Knowledge Management 314-323. Washington, DC.
CLARK, P. and BOSWELL, R. (1991). Rule induction with CN2: Some recent improvements. In Proceedings of the Fifth European Working Session on Learning 151-163. Springer, Berlin.
GEHRKE, J., GANTI, V., RAMAKRISHNAN, R. and LOH, W. Y. (1998). RainForest - A framework for fast decision tree construction of large datasets. In Proceedings of the 1998 VLDB Conference 416-427. New York.
GEHRKE, J., GANTI, V., RAMAKRISHNAN, R. and LOH, W. Y. (1999). BOAT - Optimistic decision tree construction. In Proceedings of the 1999 SIGMOD Conference 169-180. Philadelphia, Pennsylvania.
GOODMAN, L. and KRUSKAL, W. (1959). Measures of association for cross classifications. II: Further discussion and references. J. Amer. Statist. Assoc. 54 123-163.
MEHTA, M., AGRAWAL, R. and RISSANEN, J. (1996). SLIQ: A fast scalable classifier for data mining. In Proceedings of the Fifth EDBT Conference 18-32. Avignon, France.
MEHTA, M., AGRAWAL, R. and SHAFER, J. (1996). SPRINT: A scalable parallel classifier for data mining. In Proceedings of the 1996 VLDB Conference 544-555. Mumbai (Bombay), India.
PIATETSKY-SHAPIRO, G. (1991). Discovery, analysis and presentation of strong rules. In Knowledge Discovery in Databases 229-248. AAAI/MIT Press, Menlo Park, California.
QUINLAN, J. (1979). Induction over large data bases. Technical Report 79-14, Dept. of Computer Science, Stanford Univ.
QUINLAN, J. (1986). Induction of decision trees. Machine Learning 1 81-106.
STONE, M. (1974). Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. Ser. B 36 111-147.
TAN, P. and KUMAR, V. (2000). Interestingness measures for association patterns: A perspective. Technical Report 00-36, Dept. of Computer Science, Univ. of Minnesota.
WEISS, S. and KULIKOWSKI, C. (1991). Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann.