Application Study of Self-built Risk Control Models in Cost Reduction and Revenue Enhancement

Author:	Chung-Pao Hsiao (蕭琮寶)
Thesis title:	Application Study of Self-built Risk Control Models in Cost Reduction and Revenue Enhancement (自建風控模型在降低成本和提高收益方面的應用研究)
Advisor:	Deron Liang (梁德容)
Oral examination committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Executive Master of Computer Science & Information Engineering
Year of publication:	2024
Academic year of graduation:	112
Language:	Chinese
Number of pages:	48
Keywords (Chinese):	風控評分卡、機器學習、模型解釋性、成本控制、收益率
Keywords (English):	Risk Scoring System, Machine Learning, Model Interpretability, Cost Control, Profitability

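This study examines how a self-built risk control model can reduce costs and increase
revenue. Many companies currently rely on external risk control vendors for risk
assessment, which leads to high costs and opaque models. This study proposes a self-built
risk control model based on stacking that uses internal data to build an accurate and
efficient risk scorecard model, replacing external vendors and improving overall revenue.
The thesis proposes a risk control model that uses stacking to combine multiple base
models (such as logistic regression, decision trees, XGBoost, and LightGBM) and introduces
LIME (Local Interpretable Model-agnostic Explanations) to improve model interpretability.
The company's internal loan data is collected first, and the information submitted by
users is extracted from it. The default probability predicted by the model is then mapped
to a scorecard score, which is used to adjust loan limits.
Experimental results show that the self-built model lowers the default rate and raises
the rate of return and that, compared with external risk control models, it reduces risk
control costs and improves both model transparency and the accuracy of assessment
results. A risk control model built on internal data has clear advantages in responding
to changing market demands and safeguarding data security.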
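
The stacking setup described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the data is synthetic (`make_classification`), scikit-learn's `GradientBoostingClassifier` stands in for XGBoost and LightGBM so that the example depends on scikit-learn only, and the logistic-regression meta-learner, the hyperparameters, and all variable names are hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for internal loan data (assumption): label 1 = default.
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Base models. GradientBoostingClassifier is a stand-in for the
# XGBoost / LightGBM learners named in the abstract (assumption).
base_models = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ("gbdt", GradientBoostingClassifier(random_state=0)),
]

# The meta-learner is trained on out-of-fold base-model probabilities (cv=5),
# which keeps it from simply memorising the base models' training fit.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# Predicted probability of default for each held-out applicant.
p_default = stack.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, p_default)
print(f"held-out AUC: {auc:.3f}")
```

Swapping the stand-in for `xgboost.XGBClassifier` or `lightgbm.LGBMClassifier` should only require changing the `base_models` list, since both expose the scikit-learn estimator interface.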

This study explores how self-built risk control models can reduce costs and increase
revenue. Many companies currently rely on external providers for risk assessment, which
leads to high costs and opaque models. This study proposes a self-built risk control model
based on stacking that uses internal data to build an accurate and efficient risk scoring
model, replacing external providers and improving overall revenue.
The thesis proposes a risk control model that uses stacking to combine multiple base
models (such as logistic regression, decision trees, XGBoost, and LightGBM) and applies
LIME (Local Interpretable Model-agnostic Explanations) to improve interpretability. First,
the company's internal loan data is collected and the user-submitted loan information is
extracted. Then, the default probability output by the model is mapped to a scorecard
score, which is used to adjust loan limits.
Experimental results show that the self-built risk control model reduces default rates
and improves rates of return. Compared with external risk control models, it lowers risk
control costs, improves model transparency, and produces more accurate evaluation
results. Risk control models based on internal data also have significant advantages in
responding to changing market demands and ensuring data security.
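
One common way to map a default probability to a scorecard score is points-to-double-odds (PDO) scaling, sketched below. The record does not state which scaling the author used, so everything here is an assumption: the base score of 600 at 50:1 good-to-bad odds, the 20 points needed to double the odds, the 300 to 850 clipping range (borrowed from the familiar FICO range), and the limit tiers in `credit_limit` are all hypothetical.

```python
import math


def probability_to_score(p_default, base_score=600.0, base_odds=50.0,
                         pdo=20.0, lo=300.0, hi=850.0):
    """Map a default probability to a score with PDO scaling.

    score = offset + factor * ln(odds), where odds = (1 - p) / p,
    factor = pdo / ln(2) and offset = base_score - factor * ln(base_odds).
    All default parameter values are illustrative assumptions.
    """
    # Clamp the probability away from 0 and 1 so the log-odds stay finite.
    p = min(max(p_default, 1e-6), 1.0 - 1e-6)
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = (1.0 - p) / p  # good-to-bad odds
    score = offset + factor * math.log(odds)
    # Clip to the assumed reporting range.
    return min(max(score, lo), hi)


def credit_limit(score, base_limit=10000.0):
    """Hypothetical tiering of the loan limit by score band."""
    if score >= 700:
        return base_limit * 1.5
    if score >= 600:
        return base_limit
    if score >= 500:
        return base_limit * 0.5
    return 0.0


for p in (0.01, 0.05, 0.20):
    s = probability_to_score(p)
    print(f"p_default={p:.2f} -> score={s:.0f}, limit={credit_limit(s):.0f}")
```

With this scaling, halving the good-to-bad odds always costs the same number of points, which makes score differences easy to interpret when setting limit tiers.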

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1 Research Motivation and Purpose
2 Research Objectives
3 Thesis Organization
Chapter 2 Literature Review
1 Risk Control Models
2 Current State of Risk Assessment Techniques
3 Machine Learning Models
3.1 Logistic Regression
3.2 Random Forest
3.3 XGBoost
3.4 LightGBM
3.5 LIME
Chapter 3 Proposed Solution
1 Introduction
2 System Architecture Design
3 Data Collection and Preprocessing
3.1 Data Sources
3.2 Data Preprocessing
4 Model Selection and Training
4.1 Logistic Regression Model Training
5 Risk Scorecard Design
5.1 FICO Score Conversion
5.2 Scorecard Generation
Chapter 4 Experimental Design and Results
1 Evaluation Metrics
2 Experiment 1: Model Performance Evaluation
2.1 Experimental Procedure
2.2 Experimental Results
3 Experiment 2: Model Interpretability Evaluation
3.1 Experimental Procedure
3.2 Experimental Results
4 Experiment 3: Comparison of Default Rates between Internal and External Risk Control
4.1 Experimental Procedure
4.2 Experimental Results
5 Experiment 4: Comparison of Rates of Return between Internal and External Risk Control
5.1 Experimental Procedure
5.2 Experimental Results
Chapter 5 Conclusion and Future Work
1 Conclusion
2 Future Work
References

                                

