Graduate student: 賴沂彤 (Yi-Tung Lai)
Thesis title: Predicting Corporate Credit Ratings Based on Qualitative and Quantitative Data (根據定性與定量資料預測企業信用評級)
Advisor: 葉英傑
Oral examination committee:
Degree: Master
Department: College of Management - Graduate Institute of Industrial Management
Year of publication: 2025
Academic year of graduation: 113 (ROC calendar)
Language: Chinese
Number of pages: 49
Keywords: Corporate Credit Rating Prediction, MD&A, Text Analysis, Deep Learning, XGBoost
  • Corporate credit ratings are a key reference for investors evaluating a company in the capital market. Publishing rating results makes a company's default risk and credit status more transparent, which reduces information asymmetry between investors and managers and promotes fairer transactions in the capital market.

    The traditional credit rating process is time-consuming and costly. To update ratings quickly as market conditions change, predicting corporate credit ratings with machine learning has become a widely discussed research topic in recent years.

    This study processes textual data (Management's Discussion and Analysis, MD&A, which conveys the views of company management) with a natural language processing (NLP) model to produce text vector representations, and adds the output of Retrieval-Augmented Generation (RAG). These features are then combined with traditional financial data and fed into Extreme Gradient Boosting (XGBoost) to predict corporate credit ratings, with the aim of obtaining a multi-class classification model that balances accuracy and interpretability.
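
    The feature-combination step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the `FirmYear` schema, the field names, the vector sizes, and the sample values are all hypothetical, and the sketch only shows how text vectors, RAG-derived features, and financial data might be concatenated into one row per observation before classification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FirmYear:
    """One observation: a company in one fiscal year (hypothetical schema)."""
    text_vector: List[float]       # assumed: pooled embedding of the MD&A text
    rag_features: List[float]      # assumed: numeric scores derived from RAG output
    financial_ratios: List[float]  # assumed: traditional financial indicators
    rating_class: int              # assumed: integer-encoded rating category


def fuse_features(obs: FirmYear) -> List[float]:
    """Concatenate the three feature groups into a single flat row."""
    return obs.text_vector + obs.rag_features + obs.financial_ratios


def build_design_matrix(
    observations: List[FirmYear],
) -> Tuple[List[List[float]], List[int]]:
    """Return (X, y): one fused feature row and one label per observation."""
    X = [fuse_features(o) for o in observations]
    y = [o.rating_class for o in observations]
    # Every row must have the same width for a tabular classifier.
    if len({len(row) for row in X}) > 1:
        raise ValueError("all observations must yield rows of equal length")
    return X, y


# Made-up illustrative values, not data from the thesis.
sample = [
    FirmYear([0.12, -0.40, 0.88], [0.7], [1.5, 0.08], 2),
    FirmYear([0.05, 0.31, -0.22], [0.2], [3.1, -0.02], 0),
]
X, y = build_design_matrix(sample)
```

    Under these assumptions, `X` and `y` would then be passed to a gradient-boosted multi-class classifier such as XGBoost's `XGBClassifier`; plain concatenation keeps each input column identifiable, which is what allows feature-importance analysis afterwards.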

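    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
      1-1 Research Background and Motivation
      1-2 Problem Definition
      1-3 Research Objectives
      1-4 Research Methods
      1-5 Thesis Structure
    Chapter 2. Literature Review
      2-1 Related Research on Predicting Corporate Credit Ratings
      2-2 Text Analysis
      2-3 DistilBERT (Distilled Bidirectional Encoder Representations from Transformers)
        2-3-1 Word Embedding
        2-3-2 Transformer
      2-4 Retrieval-Augmented Generation (RAG)
        2-4-1 Retriever and Generator
        2-4-2 Large Language Models (LLM)
      2-5 Extreme Gradient Boosting (XGBoost)
    Chapter 3. Methodology
      3-1 Raw Data and Data Preprocessing
        3-1-1 Numerical Data
        3-1-2 Text Data
        3-1-3 Classification of Corporate Credit Ratings
      3-2 Model Design
        3-2-1 DistilBERT (Distilled Bidirectional Encoder Representations from Transformers)
        3-2-2 Retrieval-Augmented Generation (RAG)
        3-2-3 Extreme Gradient Boosting (XGBoost)
      3-3 Model Evaluation
    Chapter 4. Experimental Results
      4-1 Experimental Design
        4-1-1 Dataset Splitting
        4-1-2 Input Variable Combinations and Training Parameter Design
        4-1-3 Optimized Model Configuration
      4-2 Experimental Analysis and Evaluation
        4-2-1 Predictive Performance of Different Configuration Combinations
        4-2-2 Predictive Performance of Different Configuration Combinations under Different Models
        4-2-3 Interpretability of XGBoost
    Chapter 5. Conclusions and Future Research Directions
    References
    Appendix 1. Financial Indicators
    Appendix 2. Query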

