
Graduate student: 陳弘偉
Hong-Wei Chen
Thesis title: 以分層注意力網路建構財務報表欺詐檢測模型
Constructing a Financial Statement Fraud Detection Model Using Hierarchical Attention Networks
Advisor: 葉英傑
Ying-Chieh Yeh
Oral defense committee:
Degree: Master
Department: College of Management - Graduate Institute of Industrial Management
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: Chinese
Number of pages: 47
Keywords (Chinese): 財務報表欺詐、深度學習、文本分析、情緒分析、年度報表
Keywords (English): financial statement fraud, deep learning, text analysis, sentiment analysis, annual reports
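    Abstract (translated from the Chinese original): Financial statement fraud is a white-collar crime with serious consequences, including financial losses for investors and creditors, damage to corporate reputation, and legal and regulatory consequences for individuals and companies. Traditional fraud detection methods are time-consuming and require a great deal of manual work. This study proposes an architecture for a financial statement fraud detection system built on four considerations, including: using a hierarchical attention network (HAN) to extract textual features from the Management's Discussion and Analysis (MD&A) section of annual reports, capturing management's view of the company's operations; using Bi-LSTM and similarity analysis to extract the change in MD&A over time; and using the company's financial statements to understand its operating condition. Unlike the typical black-box neural networks of earlier work, this study combines the predictive power of deep models with attention to obtain an interpretable model, uses a Bayesian neural network (BNN) to quantify the model's uncertainty, and uses HAN to improve the accuracy and efficiency of textual feature extraction. The study contributes to the literature by improving the accuracy and efficiency of predictive analysis and by building interpretability and uncertainty into the model, and it offers regulators a way to monitor and predict financial statement fraud by examining large volumes of public text and data.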


    Abstract (English): Financial statement fraud is a white-collar crime with serious consequences, including financial losses for investors and creditors, reputational damage to companies, and legal and regulatory penalties for individuals and firms. Traditional methods of detecting fraud are time-consuming and rely heavily on manual work. This study proposes an architecture for a financial statement fraud detection system that incorporates four considerations. These include using a Hierarchical Attention Network (HAN) to extract textual features from the Management's Discussion and Analysis (MD&A) section of annual reports, which captures management's perspective on company operations; using Bi-LSTM and similarity analysis to extract temporal changes in the MD&A; and using the company's financial statements to understand its operating condition. Unlike the typical black-box neural networks used in the past, this research combines the predictive ability of deep models with attention mechanisms to obtain an interpretable model. It also quantifies the model's uncertainty with a Bayesian neural network (BNN) and uses HAN to improve the accuracy and efficiency of textual feature extraction. The study contributes to the literature by improving the accuracy and efficiency of predictive analytics and by incorporating interpretability and uncertainty into the model, and it gives regulatory agencies a way to monitor and predict financial statement fraud by examining large volumes of public text and data.
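    The abstract attributes the model's interpretability to attention. As a rough illustration of why attention weights can be read as explanations, the sketch below implements the attention pooling step from the general HAN formulation (score each hidden state against a learned context vector, softmax the scores, take the weighted sum) and applies it twice, once over words and once over sentences. It is not the thesis's implementation: the Bi-LSTM encoders and FinBERT embeddings are omitted, and the function names, parameter layout, and toy numbers are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Matrix, W: Matrix, b: Vector,
                   context: Vector) -> Tuple[Vector, Vector]:
    """Pool a sequence of hidden states into one vector.

    u_t = tanh(W h_t + b), alpha = softmax(u_t . context),
    pooled = sum_t alpha_t * h_t. Returns (pooled, alpha).
    """
    scores = []
    for h in hidden:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    alpha = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[j] for a, h in zip(alpha, hidden))
              for j in range(dim)]
    return pooled, alpha


def encode_document(sentences: List[Matrix], W: Matrix, b: Vector,
                    word_ctx: Vector, sent_ctx: Vector):
    """Hypothetical two-level pooling: words -> sentences -> document.

    For brevity the same W and b are shared by both levels; a real HAN
    learns separate parameters and runs a Bi-LSTM before each level.
    """
    sent_vecs, word_weights = [], []
    for words in sentences:
        vec, alpha = attention_pool(words, W, b, word_ctx)
        sent_vecs.append(vec)
        word_weights.append(alpha)
    doc_vec, sent_weights = attention_pool(sent_vecs, W, b, sent_ctx)
    return doc_vec, sent_weights, word_weights


# Toy input: two sentences of 2-dimensional word vectors (made-up values).
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
doc = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
doc_vec, sent_w, word_w = encode_document(doc, identity, zeros,
                                          [1.0, 0.0], [0.0, 1.0])
print(sent_w, word_w)
```

    The returned weights are what an analyst would inspect: within each sentence they rank words, and across the document they rank sentences, which matches the word-level and sentence-level interpretability sections listed in the thesis outline.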

    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Challenges
      1.3 Research Objectives
      1.4 Research Methods
    Chapter 2 Literature Review
      2.1 Research on Financial Statement Fraud Detection
      2.2 Text Analysis
      2.3 Hierarchical Attention Network (HAN)
        2.3.1 Word Embedding
        2.3.2 Bi-directional Long Short-Term Memory (Bi-LSTM)
        2.3.3 Attention Mechanism
      2.4 Bayesian Neural Network (BNN)
    Chapter 3 Methodology
      3.1 Hierarchical Attention Network (HAN) Model
        3.1.1 FinBERT Embedding
        3.1.2 Feature Selection
        3.1.4 Bi-directional Long Short-Term Memory (Bi-LSTM)
        3.1.5 Sentence Attention
      3.2 Temporal Change in MD&A
      3.3 Bayesian Neural Network (BNN)
      3.4 Evaluation Metrics
    Chapter 4 Experimental Results
      4.1 Data and Preprocessing
        4.1.1 Numerical Data
        4.1.2 Text Data
        4.1.3 Label Data
        4.1.4 Dataset
      4.2 Experimental Setup
      4.3 Classification Results
        4.3.1 Financial Data (FIN)
        4.3.2 Text Data (TXT)
        4.3.3 Text and Financial Data (TXT+FIN)
      4.4 Interpretability
        4.4.1 Word Level
        4.4.2 Sentence Level
    Chapter 5 Conclusion
    Appendix 1 Financial Variables
    References
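    Section 3.2 of the outline concerns how a company's MD&A changes over time, which the abstract says is measured with Bi-LSTM and similarity analysis. This record does not state which similarity measure the thesis uses, so the sketch below is only an assumed stand-in: cosine similarity over raw token counts between consecutive years' filings, with the change defined as one minus the similarity. The tokenizer, the function names, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the token-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def yearly_changes(mdna_by_year: Dict[int, str]) -> Dict[int, float]:
    """Map each year (except the first) to 1 - similarity with the previous year."""
    years = sorted(mdna_by_year)
    return {
        curr: 1.0 - cosine_similarity(mdna_by_year[prev], mdna_by_year[curr])
        for prev, curr in zip(years, years[1:])
    }


# Made-up MD&A snippets for one hypothetical company.
filings = {
    2019: "Revenue grew steadily and margins remained stable.",
    2020: "Revenue grew steadily and margins remained stable.",
    2021: "Liquidity risks increased due to unexpected litigation.",
}
print(yearly_changes(filings))
```

    A year whose text is nearly unchanged scores close to 0, while a rewritten MD&A scores close to 1. In a fuller pipeline the per-year change values would be one more feature fed to the classifier alongside the textual and financial features.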

