| Graduate student: | 王暉元 Hui-Yuan Wang |
|---|---|
| Thesis title: | A Study of Hybrid Machine Learning Techniques for Bankruptcy Prediction (混合式機器學習技術於破產預測之研究) |
| Advisor: | 蔡志豐 Chih-Fong Tsai |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Management - Executive Master of Information Management |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | Chinese |
| Number of pages: | 79 |
| Keywords (Chinese): | Data mining, Hybrid architecture, Support vector machine, Affinity propagation, Logistic regression, K-means |
| Keywords (English): | Data mining, Support vector machine, Affinity propagation, Logistic regression, K-means, Hybrid machine learning model |
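The demand for assessing corporate financial distress keeps growing: cases of financial distress among companies worldwide are increasing, and with them the need for more effective financial distress prediction models. Applications of machine learning to prediction problems have mainly focused on finding, among different techniques, the single model with the highest accuracy. With a single supervised-learning model, breakthroughs in prediction accuracy have become hard to achieve, which has led to a new trend: integrating multiple algorithms to improve data mining performance. Hybrid data mining draws on the strengths of two or more learning methods to improve the effectiveness or efficiency of a single learning method. As hybrid classifiers have become a research trend, hybrid architectures that integrate multiple techniques have indeed been found to outperform single classification techniques. However, few studies in the literature have examined how different types of hybrid model combinations perform in prediction.
Given the importance of making more accurate predictions, this study compares machine learning and statistical classifiers paired with machine learning or statistical clustering algorithms, to determine which hybrid model gives the most accurate predictions on financial prediction datasets. Four hybrid combinations are tested on multiple bankruptcy prediction datasets: Affinity propagation with Support vector machine; K-means with Logistic regression; Affinity propagation with Logistic regression; and K-means with Support vector machine. In the experiments, Affinity propagation with Support vector machine achieves the best predictive performance on more datasets than the other combinations and also has the best average AUC. Affinity propagation also improves the prediction accuracy of Logistic regression, which makes that combination an option when a short model-building time is required. The results are expected to serve as a reference for building prediction models in practice.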
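The average AUC used above to rank the combinations can be illustrated with a minimal sketch. It assumes binary labels (1 = bankrupt, 0 = non-bankrupt) and one predicted score per firm; the function names `auc` and `mean_auc` and the toy numbers are hypothetical and are not taken from the thesis.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(per_dataset):
    """Average the AUC over several (labels, scores) pairs, one per dataset."""
    return sum(auc(y, s) for y, s in per_dataset) / len(per_dataset)


# Toy example (made-up scores): two small "datasets".
dataset_a = ([1, 0], [0.9, 0.1])                  # perfectly ranked
dataset_b = ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # one pair out of four misranked
print(auc(*dataset_b))                  # 0.75
print(mean_auc([dataset_a, dataset_b]))  # 0.875
```

Because AUC depends only on the ranking of scores, it can compare classifiers whose raw outputs are on different scales, such as SVM decision values and logistic regression probabilities.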
This study investigates the efficacy of applying various hybrid machine learning models to the bankruptcy prediction problem. Although hybrid models are well known to perform well in prediction tasks, the approach is limited in that finding an appropriate hybrid model structure remains something of an art. Few studies have explored how different types of hybrid model combinations perform in prediction. This study compares machine learning and statistical classifiers paired with machine learning or statistical clustering algorithms. Four hybrid combinations were tested on multiple bankruptcy prediction datasets: Affinity propagation with Support vector machine; K-means with Logistic regression; Affinity propagation with Logistic regression; and K-means with Support vector machine.
The results demonstrate that Affinity propagation with Support vector machine has the best predictive performance on many datasets.
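The abstract does not state how the clustering step and the classifier are coupled, so the following is only a sketch of one possible reading: the clusterer filters out training instances whose label disagrees with the majority of their cluster, and the classifier is then trained on what remains. It shows the K-means with Logistic regression pairing in plain Python. The helper names, the farthest-first initialisation, the learning rate, and the toy data are all assumptions made for illustration; a practical experiment would more likely rely on a library such as scikit-learn.

```python
import math


def dist2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iters=100):
    # Deterministic farthest-first initialisation (an illustrative choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:  # converged
            break
        centroids = updated
    return assign


def filter_by_cluster_majority(X, y, k):
    # Assumed hybrid step: keep only instances that agree with their cluster's majority label.
    assign = kmeans(X, k)
    keep = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:
            majority = 1 if 2 * sum(y[i] for i in idx) >= len(idx) else 0
            keep.extend(i for i in idx if y[i] == majority)
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    return sigmoid(w[0] + sum(wj * v for wj, v in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=500):
    # Batch gradient descent on the logistic loss; w[0] is the intercept.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Toy data (made up): two financial ratios per firm; index 2 is a noisy label.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
     [3.0, 3.0], [3.2, 2.9], [2.9, 3.1], [3.1, 3.3]]
y = [0, 0, 1, 0, 1, 1, 1, 1]

X_kept, y_kept = filter_by_cluster_majority(X, y, k=2)  # drops the noisy instance
w = fit_logistic(X_kept, y_kept)
print(len(X_kept), predict_proba(w, [3.0, 3.0]) > 0.5)
```

Under the same assumption, the other three pairings would swap in a different clusterer or classifier while keeping the filtering step unchanged.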