| Graduate student: | 江昀澄 Yun-Cheng Chiang |
|---|---|
| Thesis title: | 股票特徵交互效果於報酬率預測的重要性:基於 SHAP-IQ 框架的實證分析 The Importance of Stock Feature Interactions in Return Prediction: An Empirical Analysis Based on the SHAP-IQ Framework |
| Advisor: | 邱信瑜 Hsin-Yu Chiu |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Information Management |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 |
| Language: | Chinese |
| Number of pages: | 37 |
| Keywords (Chinese): | SHAP-IQ, feature interactions, model interpretability, XGBoost, time-varying analysis |
| Keywords (English): | SHAP-IQ, Feature Interactions, Model Interpretability, XGBoost, Temporal Analysis |
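As the structure of financial markets grows increasingly complex, machine learning techniques are being applied ever more widely to stock return prediction, yet the "black box" nature of these models makes their decision process hard to understand. Traditional feature importance analysis mostly focuses on the marginal effect of a single feature and overlooks the potential influence of interactions among features on predictive accuracy. Market factors do not operate independently; they are linked by complex relationships, and these relationships change with the market environment. Developing prediction methods that capture feature interaction effects and explain their time-varying behavior is therefore important for improving the practical usefulness of models.
This study uses the SHAP-IQ framework to examine feature interaction effects in stock return prediction, with the aim of building a more comprehensive and interpretable prediction model. We take prediction error, rather than model output, as the target of analysis, which offers a new perspective on why models fail. The study uses 33 years of Taiwan stock market data, from 1991 to 2023, comprising 110 features that cover price information, financial data, technical indicators, and market variables. An XGBoost model is combined with a sliding-window technique, in which the preceding three years of data are used to predict stock returns in the following year, and the SHAP-IQ method quantifies how much feature interactions contribute to prediction error, in order to identify the key feature combinations that make predictions unstable. The resulting analytical framework not only extends the knowledge boundary of quantitative finance theory but also gives investment decision makers a more reliable model diagnostic tool.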
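The sliding-window scheme described above can be sketched as follows. This is a minimal illustration, not the thesis code: the abstract states only that the preceding three years are used to predict the next year over 1991 to 2023, so the one-calendar-year step and the helper name `rolling_windows` are assumptions made here.

```python
from typing import Iterator, List, Tuple


def rolling_windows(
    first_year: int, last_year: int, train_len: int = 3
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_years, test_year) pairs for a rolling-window backtest.

    Assumption: the window advances one calendar year at a time.
    """
    for test_year in range(first_year + train_len, last_year + 1):
        # The training block is the train_len years just before test_year.
        train_years = list(range(test_year - train_len, test_year))
        yield train_years, test_year


windows = list(rolling_windows(1991, 2023))

# Show the first and last split, then the total number of splits.
for train_years, test_year in (windows[0], windows[-1]):
    print(f"train on {train_years}, predict {test_year}")
print(len(windows), "windows in total")
```

Under these assumptions the first predictable year is 1994 and the last is 2023, so a model would be refit once per window and its errors analysed window by window.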
As financial markets become increasingly complex, the use of machine learning techniques in forecasting stock returns has grown in popularity. However, the inherent "black-box" nature of such models often hinders interpretability, limiting their practical application in financial decision-making. Conventional feature importance analyses primarily emphasize the marginal effects of individual variables, while often overlooking the potential influence of interactions among features on predictive accuracy. In reality, financial and market factors rarely operate in isolation; instead, they exhibit intricate interdependencies that may shift over time with changes in the market environment. Accordingly, developing approaches that can account for feature interactions and their temporal dynamics is essential for enhancing model robustness and explanatory power.
This study adopts the SHAP-IQ framework to examine the effects of feature interactions on stock return prediction. By focusing on prediction errors rather than model outputs, we aim to offer insights into the underlying causes of model underperformance. The empirical analysis is based on data from the Taiwan stock market, spanning the 33 years from 1991 to 2023. The dataset includes 110 variables, covering price information, financial indicators, technical metrics, and market-level features. We implement an XGBoost model combined with a rolling-window approach, using three years of historical data to predict returns in the subsequent year. Feature interaction effects on prediction error are evaluated through SHAP-IQ, a decomposition method that extends SHAP (SHapley Additive exPlanations) to feature interactions and allows for quantifying the contribution of specific interactions to model performance. The proposed framework contributes to the literature by providing a systematic approach for model diagnostics and enhancing the interpretability of machine learning forecasts in financial contexts.
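To make the idea of attributing prediction error to feature pairs concrete, the sketch below computes the pairwise Shapley interaction index exactly by enumerating coalitions, with squared error as the value function. Everything in it is an illustrative assumption: the three-feature `toy_model`, the zero baseline used for "absent" features, and the single instance stand in for the thesis's XGBoost model and data, and exact enumeration stands in for the sampling-based approximation that SHAP-IQ provides.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence, Tuple


def pairwise_interactions(
    n: int, value: Callable[[FrozenSet[int]], float]
) -> Dict[Tuple[int, int], float]:
    """Exact pairwise Shapley interaction index via full enumeration."""
    result = {}
    for i, j in combinations(range(n), 2):
        others = [k for k in range(n) if k not in (i, j)]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (n - |S| - 2)! / (n - 1)! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Discrete second difference: joint effect minus individual effects.
                delta = value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
                total += weight * delta
        result[(i, j)] = total
    return result


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical model: feature 0 is additive, features 1 and 2 interact."""
    return z[0] + z[1] * z[2]


x = (1.0, 2.0, 3.0)          # instance being explained (made up)
baseline = (0.0, 0.0, 0.0)   # values used for features outside the coalition
y_true = 5.0                 # realised return for this instance (made up)


def squared_error(coalition: FrozenSet[int]) -> float:
    """Value function: squared error when only `coalition` features are known."""
    z = [x[k] if k in coalition else baseline[k] for k in range(len(x))]
    return (toy_model(z) - y_true) ** 2


interactions = pairwise_interactions(len(x), squared_error)
for pair, score in interactions.items():
    print(pair, score)
```

In this toy setup a negative score means the pair jointly lowers the squared error and a positive score means it raises it. Note that features 0 and 1 receive a nonzero interaction even though the model combines them additively: squaring the error couples them, which is one reason explaining error can give a different picture from explaining model output.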