
Graduate student: Li-Ju Chen (陳莉茿)
Thesis title: Identify Deceptive Reviews in Cross-domain Content with BERT (跨領域分辨真假評論之研究-以BERT為基礎模型)
Advisor: Ping-Yu Hsu (許秉瑜)
Oral defense committee:
Degree: Master
Department: College of Management - Department of Business Administration
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar, 2022-2023)
Language: Chinese
Number of pages: 53
Keywords (Chinese): 跨領域, BERT, 假評論, 虛假偵測, 遮蔽資訊
Keywords (English): cross-domain, BERT, fraud reviews, deception detection, masking information
  • Online reviews carry significant weight in e-commerce, and consumers increasingly rely on them when making purchasing decisions. However, unethical businesses may spread deceptive reviews to manipulate consumer opinion, and Ott et al. (2011) [19] showed that humans identify fake reviews with an accuracy of only 57.3%. Cross-domain classification of deceptive reviews faces a further challenge: there is little research on text features or rules that are shared across domains. Because existing models rely too heavily on data from a single source, their accuracy drops sharply when they are tested on datasets from a different domain.
    This study therefore proposes a model based on Bidirectional Encoder Representations from Transformers (BERT) that replaces domain-specific words in reviews with [MASK] to overcome the large stylistic differences between reviews from different domains. The experiments use reviews from Ott et al. (2011) [19] and Li et al. (2014) [33] in the restaurant, hotel, and doctor domains, supplemented with genuine Yelp reviews as additional training data. MASK-BERT is compared with the results of Ren & Ji (2017) [25], the best reported to date. In the cross-domain setting, its best F1-score is 88.49%; in the doctor domain, whose content differs most from the others, the proposed masking mechanism improves accuracy by 15-20%.
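
    The masking idea described in the abstract can be illustrated with a minimal sketch. This record does not say how the thesis selects domain-specific words, so the vocabulary `DOMAIN_WORDS` and the helper `mask_domain_words` below are hypothetical and serve only to show the substitution step. `[MASK]` is the standard BERT mask token.

```python
import re

# Hypothetical domain-specific vocabulary. The record does not state how the
# thesis builds this list, so these words are purely illustrative.
DOMAIN_WORDS = {
    "hotel", "room", "lobby",
    "restaurant", "waiter", "menu",
    "doctor", "clinic",
}

MASK_TOKEN = "[MASK]"  # BERT's standard mask token


def mask_domain_words(text, domain_words=DOMAIN_WORDS):
    """Replace each domain-specific word with [MASK]; leave other text as-is."""

    def _substitute(match):
        word = match.group(0)
        # Compare case-insensitively so "Hotel" and "hotel" are both masked.
        return MASK_TOKEN if word.lower() in domain_words else word

    # Match runs of letters and apostrophes, so punctuation and spacing
    # outside the words are left untouched.
    return re.sub(r"[A-Za-z']+", _substitute, text)


if __name__ == "__main__":
    print(mask_domain_words("The hotel room was clean and the staff were kind."))
```

    After masking, a hotel review and a restaurant review share more of their surface form, which is the intuition behind training one classifier across domains. In practice the masked text would then be passed to a BERT tokenizer and fine-tuned classifier, a step this sketch omits.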

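    Chinese Abstract
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1-1 Research Background
            1-1-1 Influence of Online Reviews
            1-1-2 Sources of Fake Reviews
            1-1-3 Models Applied to Deceptive Review Classification
        1-2 Research Motivation
            1-2-1 Labeling of Fake Reviews
            1-2-2 Results of Previous Research
        1-3 Research Objectives
        1-4 Research Structure
    Chapter 2 Literature Review
        2-1 BERT Applied to Cross-domain Deceptive Review Classification
        2-2 Definition of Cross-domain
        2-3 Review of Algorithms Applied to Cross-domain Deceptive Review Classification
    Chapter 3 Methodology
        3-1 Research Process
        3-2 BERT
        3-3 MASK Mechanism
        3-4 Fine-tuning
            3-4-1 AE-BERT (Auto-encoder based on BERT)
            3-4-2 MASK-BERT (MASK mechanism based on BERT)
    Chapter 4 Experiments
        4-1 Data Collection
        4-2 Data Preprocessing
            4-2-1 MongoDB
            4-2-2 Feature Generation
        4-3 Hyperparameters
        4-4 Experimental Results and Analysis
            4-4-1 Loss Function
            4-4-2 In-domain
            4-4-3 Cross-domain
    Chapter 5 Conclusions and Suggestions for Future Research
        5-1 Conclusions
        5-2 Limitations and Future Suggestions
    Chapter 6 References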

    [1] Cao, N., Ji, S., Chiu, D.K.W., He, M. and Sun, X., (2020). A Deceptive Review Detection Framework: Combination of Coarse and Fine-grained Features. Expert Systems with Applications.
    [2] Zhang, D., Li, W., Niu, B. and Wu, C., (2023). A deep learning approach for detecting fake reviewers: Exploiting reviewing behavior and textual information. Decision Support Systems, 166, 113911.
    [3] Du, C., Sun, H., Wang, J., Qi, Q. and Liao, J., (2020). Adversarial and domain-aware BERT for cross-domain sentiment analysis. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 4019-4028).
    [4] Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F. and Vaughan, J. W., (2010). A theory of learning from different domains. Machine Learning, 79.
    [5] Salunkhe, A., (2021). Attention-based Bidirectional LSTM for Deceptive Opinion Spam Classification. arXiv preprint arXiv:2112.14789v1.
    [6] Devlin, J., Chang, M. W., Lee, K. and Toutanova, K., (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805v2.
    [7] Lee, K. D., Han, K. and Myaeng, S. H., (2016). Capturing word choice patterns with LDA for fake review detection in sentiment analysis. Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics.
    [8] Salminen, J., Kandpal, C., Kamel, A. M., Jung, S. G. and Jansen, B. J., (2022). Creating and detecting fake reviews of online products. Journal of Retailing and Consumer Services, 64, 102771.
    [9] Hernández-Castañeda, Á., Calvo, H., Gelbukh, A. and Flores, J. J. G., (2017). Cross-domain deception detection using support vector networks. Soft Computing, 21, 585-595.
    [10] Alsubari, S. N., Deshmukh, S. N., Alqarni, A. A., Alsharif, N., Aldhyani, T. H., Alsaade, F. W. and Khalaf, O. I., (2022). Data analytics for the identification of fake reviews using supervised learning. Computers, Materials & Continua, 70(2), 3189-3204.
    [11] Cao, Z., Zhou, Y., Yang, A. and Peng, S., (2021). Deep transfer learning mechanism for fine-grained cross-domain sentiment classification. Connection Science, 33(4), 911-928.
    [12] Cagnina, L. C. and Rosso, P., (2017). Detecting deceptive opinions: intra and cross-domain classification using an efficient representation. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 25(Suppl. 2), 151-174.
    [13] Qu, Z., Jia, Q., Lyu, C., Liu, J., Liu, X. and Zheng, K., (2022). Detecting Fake Reviews with Generative Adversarial Networks for Mobile Social Networks. Security and Communication Networks, 2022.
    [14] Alsubari, S. N., Deshmukh, S. N., Al-Adhaileh, M. H., Alsaade, F. W. and Aldhyani, T. H., (2021). Development of integrated neural network model for identification of fake reviews in E-commerce using multidomain datasets. Applied Bionics and Biomechanics, 2021.
    [15] Wei, C. S., Hsu, P. Y., Huang, C. W., Cheng, M. S. and Prassida, G. F., (2020). Devising a Cross-Domain Model to Detect Fake Review Comments. Advances in Computational Collective Intelligence: 12th International Conference, ICCCI 2020, Da Nang, Vietnam, November 30–December 3, 2020, Proceedings 12 (pp. 714-725). Springer International Publishing.
    [16] Wu, Y., Ngai, E. W., Wu, P. and Wu, C., (2020). Fake online reviews: Literature review, synthesis, and directions for future research. Decision Support Systems, 132, 113280.
    [17] Jia, S., Zhang, X., Wang, X. and Liu, Y., (2018). Fake reviews detection based on LDA. 2018 4th International Conference on Information Management (ICIM) (pp. 280-283). IEEE.
    [18] Lin, T. Y., Goyal, P., Girshick, R., He, K. and Dollár, P., (2017). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision (pp. 2980-2988).
    [19] Ott, M., Choi, Y., Cardie, C. and Hancock, J. T., (2011). Finding deceptive opinion spam by any stretch of the imagination. arXiv preprint arXiv:1107.4557.
    [20] Wang, Z., Gu, S. and Xu, X., (2018). GSLDA: LDA-based group spamming detection in product reviews. Applied Intelligence, 48, 3094-3107.
    [21] Li, Z., Wei, Y., Zhang, Y. and Yang, Q., (2018). Hierarchical attention transfer network for cross-domain sentiment classification. Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 32, No. 1).
    [22] Gupta, P., Gandhi, S. and Chakravarthi, B. R., (2021). Leveraging transfer learning techniques: BERT, RoBERTa, ALBERT and DistilBERT for fake review detection. Forum for Information Retrieval Evaluation (pp. 75-82).
    [23] Sánchez-Junquera, J., Villaseñor-Pineda, L., Montes-y-Gómez, M., Rosso, P. and Stamatatos, E., (2020). Masking domain-specific information for cross-domain deception detection. Pattern Recognition Letters, 135, 122-130.
    [24] Dos Santos, B. N., Marcacini, R. M. and Rezende, S. O., (2021). Multi-domain aspect extraction using bidirectional encoder representations from transformers. IEEE Access, 9, 91604-91613.
    [25] Ren, Y. and Ji, D., (2017). Neural networks for deceptive opinion spam detection: An empirical study. Information Sciences, 385, 213-224.
    [26] Loper, E. and Bird, S., (2002). NLTK: The natural language toolkit. arXiv preprint cs/0205028.
    [27] Redko, I., Habrard, A. and Sebban, M., (2019). On the analysis of adaptability in multi-source domain adaptation. Machine Learning, 108(8-9), 1635-1652.
    [28] Luca, M., (2016). Reviews, reputation, and revenue: The case of Yelp.com. Harvard Business School NOM Unit Working Paper No. 12-016.
    [29] Floh, A., Koller, M. and Zauner, A., (2013). Taking a deeper look at online reviews: The asymmetric effect of valence intensity on shopping behaviour. Journal of Marketing Management, 29(5-6), 646-670.
    [30] Hasanat, M. W., Hoque, A., Shikha, F. A., Anwar, M., Hamid, A. B. A. and Tat, H. H., (2020). The impact of coronavirus (COVID-19) on e-business in Malaysia. Asian Journal of Multidisciplinary Studies, 3(1), 85-90.
    [31] Li, J., Cardie, C. and Li, S., (2013). Topicspam: a topic-model based approach for spam detection. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 217-221).
    [32] Klaus, T. and Changchit, C., (2019). Toward an understanding of consumer attitudes on online review usage. Journal of Computer Information Systems, 59(3), 277-286.
    [33] Li, J., Ott, M., Cardie, C. and Hovy, E., (2014). Towards a general rule for identifying deceptive opinion spam. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1566-1576).
    [34] Fellbaum, C., (2010). WordNet. Theory and applications of ontology: computer applications (pp. 231-243). Dordrecht: Springer Netherlands.
