
Graduate student: 余宥辰 (You-Chen Yu)
Thesis title: A Multimodal Fake Review Detection Model Integrating with Multi-Source Textual Data and Self-Attention Mechanism
Original title: 結合多來源文本與自注意力機制之多模態假評論偵測模型
Advisor: 曾富祥 (Fu-Shiang Tseng)
Oral defense committee:
Degree: Master
Department: College of Management - Graduate Institute of Industrial Management
Year of publication: 2025
Academic year of graduation: 113 (ROC calendar)
Language: English
Number of pages: 61
Keywords (Chinese): 假評論偵測、多來源文本資料、多模態模型、自注意力機制、模型解釋性
Keywords (English): Fake Review Detection, Multi-Source Textual Data, Multimodal Model, Self-Attention Mechanism, Model Interpretability
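  • With the spread of e-commerce and social media, more and more consumers rely on online reviews as a reference when shopping. Before actually purchasing a product or service, many people first read the experiences others have shared in order to reduce information gaps and lower the risk of a poor decision. However, the problem of fake reviews is growing more serious: it misleads consumers' judgment and further damages market fairness and platform credibility. In particular, with rapid advances in recommender systems and generative AI, the volume, speed of spread, and realism of fake reviews have all risen markedly, making it increasingly hard for traditional detection methods to cope with these new challenges. Past research has mostly focused on analyzing linguistic features and paid less attention to reviewer behavior patterns and model interpretability, which limits its flexibility and extensibility in practical applications. In recent years, research has moved toward integrating multiple features with deep learning techniques. However, because deep learning models lack transparency, managers and practitioners still have doubts about trusting model predictions, which in turn affects their willingness to adopt such models and their feasibility in practice.
    On this basis, this study integrates multi-source text and structured data from different platforms and combines them with a self-attention mechanism to propose a multimodal fake review detection model with high generalizability and interpretability, in order to respond more effectively to today's diverse and complex fake review problem. The data sources include Yelp, Amazon, and fake reviews generated with ChatGPT, the latter to simulate the challenges and risks brought by generative AI. The research is divided into three stages. The first stage tests textual features with various machine learning models and selects the best base model combination. The second stage adds structured data such as reviewer behavioral features and product attributes to build a multimodal model, strengthening overall detection performance and cross-platform adaptability. The third stage introduces a self-attention mechanism to strengthen the model's identification of key features and its ability to explain predictions, and verifies the potential value and contribution it shows in practical applications.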


    With the widespread adoption of e-commerce and social media, online reviews have
    become an important reference for consumer purchasing decisions. However, the emergence
    and growing severity of fake reviews not only mislead consumers but also undermine market
    fairness and platform trust. In particular, with the rapid development of recommendation
    systems and generative artificial intelligence, the volume, spread, and realism of fake reviews
    have greatly increased, making traditional detection methods increasingly inadequate. The
    literature indicates that early detection methods mainly focused on linguistic features, with
    limited attention to reviewer behavior and model interpretability. Recent studies have shifted
    towards multi-feature integration and deep learning. However, the lack of interpretability
    inherent in deep learning models poses challenges for managerial trust, thereby reducing their
    practical adoption in real-world settings. In response, this study aims to develop a multimodal
    fake review detection model with high generalizability and interpretability by integrating
    review data from various platforms and multiple data types and incorporating a self-attention
    mechanism. The data sources include Yelp, Amazon, and synthetic fake reviews generated by
    ChatGPT to balance the dataset. The experiment is structured in three phases: first, evaluating
    various machine learning algorithms using textual features to establish a performance
    benchmark; second, incorporating reviewer behavioral data and product-related attributes to
    construct a multimodal framework aimed at boosting detection accuracy and cross-domain
    generalization; and third, implementing a self-attention mechanism to strengthen the model’s
    focus on critical features and enhance interpretability.
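
    The second and third phases described above can be illustrated with a minimal sketch. It is not the thesis's architecture, which this record does not specify. It assumes each modality (review text, reviewer behavior, product attributes) has already been encoded as a fixed-length vector, and it applies single-head scaled dot-product self-attention with randomly initialised, untrained projection matrices. The feature names, the dimension of 4, and all helper functions are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Untrained projection matrix; a real model would learn these weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Each output is the attention-weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


rng = random.Random(0)
dim = 4  # assumed embedding size, for illustration only

# Hypothetical modality "tokens": one vector per feature group.
feature_names = ["review_text", "reviewer_behavior", "product_attributes"]
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in feature_names]

w_q = random_matrix(dim, dim, rng)
w_k = random_matrix(dim, dim, rng)
w_v = random_matrix(dim, dim, rng)

outputs, weights = self_attention(tokens, w_q, w_k, w_v)

# Inspect which feature groups each token attends to.
for name, row in zip(feature_names, weights):
    print(name, [round(w, 3) for w in row])
```

    Each row of `weights` sums to 1, so it can be read as a distribution over feature groups. In a trained model, such rows are one possible signal for which inputs influenced a prediction, although how far attention weights count as an explanation depends on the model and task.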

    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Figure
    List of Table
    1. Introduction
        1.1. Research Background
        1.2. Research Motivation
        1.3. Research Objectives
    2. Literature Review
        2.1. Definition of Fake Reviews and Their Impact on Consumers
        2.2. Dataset Generation Using ChatGPT
        2.3. Review and Comparison of Fake Review Detection Methods
            2.3.1. Machine Learning Methods
            2.3.2. Deep Learning Methods
        2.4. Model Interpretability and the Self-Attention Mechanism
    3. Research Method
        3.1. Data Sources
        3.2. Data Preprocessing
        3.3. Feature Extraction
        3.4. Modeling Approaches
            3.4.1. Machine Learning Methods
            3.4.2. Deep Learning Methods
        3.5. Evaluation Metrics
        3.6. Experimental Framework
            3.6.1. Experiment 1: Optimal Machine Learning Model for Multi-Source Textual Data
            3.6.2. Experiment 2: Multimodal Model with Unstructured and Structured Data Integration
            3.6.3. Experiment 3: Building an Explainable Multimodal Model with Self-Attention
    4. Experimental Results and Analysis
        4.1. Experiment 1: Optimal Machine Learning Model for Multi-Source Textual Data
        4.2. Experiment 2: Multimodal Model with Textual and Structured Data Integration
        4.3. Experiment 3: Building an Explainable Multimodal Model with Self-Attention
    5. Conclusion and Future Work
        5.1. Conclusion
        5.2. Future Research
    Reference

