A Multimodal Fake Review Detection Model Integrating with Multi-Source Textual Data and Self-Attention Mechanism


Graduate student:	余宥辰 You-Chen Yu
Thesis title:	A Multimodal Fake Review Detection Model Integrating with Multi-Source Textual Data and Self-Attention Mechanism
Advisor:	曾富祥 Fu-Shiang Tseng
Oral defense committee:
Degree:	Master
Department:	College of Management - Graduate Institute of Industrial Management
Year of publication:	2025
Academic year of graduation:	113
Language:	English
Number of pages:	61
Keywords:	Fake Review Detection, Multi-Source Textual Data, Multimodal Model, Self-Attention Mechanism, Model Interpretability

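With the spread of e-commerce and social media, a growing number of consumers rely on online reviews as a reference when shopping. Before buying a product or service, many people first read the experiences others have shared in order to narrow the information gap and lower the risk of a poor decision. Fake reviews, however, are an increasingly serious problem: they mislead consumer judgment and further erode market fairness and platform credibility. With recommendation systems and generative artificial intelligence advancing rapidly, the volume, speed of spread, and realism of fake reviews have all risen markedly, and traditional detection methods find it ever harder to cope with these new challenges. Earlier research focused mostly on the analysis of linguistic features and paid less attention to reviewer behavior patterns and model interpretability, which limited its flexibility and extensibility in practice. In recent years research has moved toward integrating multiple features with deep learning. Because deep learning models lack transparency, however, managers and practitioners remain hesitant to trust their predictions, which in turn lowers the willingness to adopt such models and their feasibility in practice.
Accordingly, this study integrates multi-source textual and structured data from different platforms and combines them with a self-attention mechanism to propose a multimodal fake review detection model with high generalizability and interpretability, so as to respond more effectively to today's varied and complex fake review problem. The data sources are Yelp, Amazon, and fake reviews generated with ChatGPT, the last included to simulate the challenges and risks posed by generative AI. The study proceeds in three stages. The first stage tests textual features against a range of machine learning models and selects the best baseline combination. The second stage adds structured data such as reviewer behavior features and product attributes to build a multimodal model, strengthening overall detection performance and cross-platform adaptability. The third stage introduces a self-attention mechanism to strengthen the model's ability to identify key features and explain its predictions, and verifies the potential value and contribution of the approach in practical applications.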
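
The second stage, which combines review text with structured reviewer and product data, can be pictured with a minimal Python sketch. The abstract does not say how the two kinds of data are joined, so everything here is an assumption made for illustration: the bag-of-words text features, the min-max scaling, the fusion by simple concatenation, the toy vocabulary, and the two structured columns (a star rating and a reviewer's review count).

```python
from collections import Counter


def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in a whitespace-tokenized review."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def min_max_scale(rows):
    """Rescale each structured column to [0, 1]; constant columns become 0.0."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


def fuse(text_vectors, structured_vectors):
    """Early fusion (assumed): concatenate text and structured vectors per review."""
    return [t + s for t, s in zip(text_vectors, structured_vectors)]


# Hypothetical toy data: two reviews, each with [star rating, reviewer review count].
vocab = ["great", "refund", "best"]
reviews = ["Best product ever best price", "Asked for a refund twice"]
structured = [[5.0, 1.0], [1.0, 40.0]]

text_vectors = [bag_of_words(review, vocab) for review in reviews]
fused = fuse(text_vectors, min_max_scale(structured))
```

Each row of `fused` is a single feature vector that a downstream classifier could consume. Scaling the structured columns first keeps a large raw value, such as a review count, from dominating the word counts.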

With the widespread adoption of e-commerce and social media, online reviews have
become an important reference for consumer purchasing decisions. However, the emergence
and growing severity of fake reviews not only mislead consumers but also undermine market
fairness and platform trust. In particular, with the rapid development of recommendation
systems and generative artificial intelligence, the volume, spread, and realism of fake reviews
have greatly increased, making traditional detection methods increasingly inadequate. The
literature indicates that early detection methods mainly focused on linguistic features, with
limited attention to reviewer behavior and model interpretability. Recent studies have shifted
towards multi-feature integration and deep learning. However, the lack of interpretability
inherent in deep learning models poses challenges for managerial trust, thereby reducing their
practical adoption in real-world settings. In response, this study aims to develop a multimodal
fake review detection model with high generalizability and interpretability by integrating
review data from various platforms and multiple data types and incorporating a self-attention
mechanism. The data sources include Yelp, Amazon, and synthetic fake reviews generated by
ChatGPT to balance the dataset. The experiment is structured in three phases: first, evaluating
various machine learning algorithms using textual features to establish a performance
benchmark; second, incorporating reviewer behavioral data and product-related attributes to
construct a multimodal framework aimed at boosting detection accuracy and cross-domain
generalization; and third, implementing a self-attention mechanism to strengthen the model’s
focus on critical features and enhance interpretability.
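
To make the third phase concrete, the following is a minimal Python sketch of scaled dot-product self-attention applied to three feature vectors, one per data type. It is an illustration under stated assumptions, not the thesis's architecture: the queries, keys, and values are the raw inputs with no learned projection matrices, and the three vectors standing in for text, reviewer behavior, and product attributes are hypothetical toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    A trained model would apply learned projections first; they are
    omitted here (assumption) to keep the sketch short.
    """
    dim = len(tokens[0])
    weights = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, tokens)) for j in range(dim)]
        for row in weights
    ]
    return weights, outputs


# Hypothetical embeddings for the three data types.
features = {
    "text": [1.0, 0.0],
    "reviewer_behavior": [0.9, 0.1],
    "product": [0.0, 1.0],
}
weights, outputs = self_attention(list(features.values()))
for name, row in zip(features, weights):
    print(name, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and shows how strongly one input attends to the others. Weights of this kind are what an attention-based model can expose as a rough interpretability signal, though they should be read as an indication of focus rather than a complete explanation of a prediction.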

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Introduction
1.    Research Background
2.    Research Motivation
3.    Research Objectives
Literature Review
1.    Definition of Fake Reviews and Their Impact on Consumers
2.    Dataset Generation Using ChatGPT
3.    Review and Comparison of Fake Review Detection Methods
3.1.    Machine Learning Methods
3.2.    Deep Learning Methods
4.    Model Interpretability and the Self-Attention Mechanism
Research Method
1.    Data Sources
2.    Data Preprocessing
3.    Feature Extraction
4.    Modeling Approaches
4.1.    Machine Learning Methods
4.2.    Deep Learning Methods
5.    Evaluation Metrics
6.    Experimental Framework
6.1.    Experiment 1: Optimal Machine Learning Model for Multi-Source Textual Data
6.2.    Experiment 2: Multimodal Model with Unstructured and Structured Data Integration
6.3.    Experiment 3: Building an Explainable Multimodal Model with Self-Attention
Experimental Results and Analysis
1.    Experiment 1: Optimal Machine Learning Model for Multi-Source Textual Data
2.    Experiment 2: Multimodal Model with Textual and Structured Data Integration
3.    Experiment 3: Building an Explainable Multimodal Model with Self-Attention
Conclusion and Future Work
1.    Conclusion
2.    Future Research
References


                                

