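Identify Deceptive Reviews in Cross-domain Content with BERT (跨領域分辨真假評論之研究－以BERT為基礎模型) | National Central University Master's and Doctoral Thesis System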

Author:	Li-Ju Chen (陳莉茿)
Thesis title:	Identify Deceptive Reviews in Cross-domain Content with BERT (跨領域分辨真假評論之研究－以BERT為基礎模型)
Advisor:	Ping-Yu Hsu (許秉瑜)
Oral defense committee:
Degree:	Master
Department:	College of Management - Department of Business Administration
Year of publication:	2023
Academic year of graduation:	111 (ROC calendar)
Language:	Chinese
Number of pages:	53
Keywords:	cross-domain, BERT, fraud reviews, deception detection, masking information

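Online reviews have a strong influence on e-commerce, and consumers increasingly rely on them when making purchasing decisions. However, unethical businesses may spread fake reviews to manipulate consumer opinion, and the experiments of Ott et al. (2011) [19] show that humans identify fake reviews with an accuracy of only 57.3%. For cross-domain classification of genuine and fake reviews, research on text features and rules that can be shared across domains is still lacking. Because models rely too heavily on data from a single source, the accuracy of a given model drops sharply when it is tested on other datasets.
Therefore, this study proposes a model based on Bidirectional Encoder Representations from Transformers (BERT) that replaces the domain-specific words appearing in a review with [MASK], in order to overcome the large differences in review style between domains. The study uses reviews in the restaurant, hotel, and doctor domains from Ott et al. (2011) [19] and Li et al. (2014) [33], and additionally includes genuine Yelp reviews as training data. Finally, the experimental results of MASK-BERT are compared with those of Ren & Ji (2017) [25], the best results reported in current research. In the cross-domain setting, the best F1-score is 88.49%; for the doctor domain, whose content differs more from the other domains, accuracy also improves by 15-20% once the proposed masking mechanism is applied.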
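For readers unfamiliar with the two metrics reported here, the following is a minimal Python sketch of how accuracy and the F1-score are computed for a binary genuine/fake classifier. The label encoding (1 = fake, 0 = genuine), the choice of the fake class as the positive class, and the function names are assumptions made for illustration; the abstract does not state which averaging scheme the thesis uses.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted label matches the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumed encoding: 1 = fake review, 0 = genuine review).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

In this toy example two fake reviews are caught, one is missed, and one genuine review is flagged by mistake, so precision, recall, F1, and accuracy all equal 2/3.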

Online reviews play a significant role in e-commerce, and consumers increasingly rely on them when making purchasing decisions. However, unethical businesses may spread deceptive reviews to manipulate consumers' opinions. Research by Ott et al. (2011) [19] showed that humans identify deceptive reviews with an accuracy of only 57.3%. In addition, recent research faces a crucial challenge: cross-domain classification models rely too heavily on similar datasets from the same domain, which causes a sharp decline in accuracy when they are tested on datasets from a different domain. Methods for finding text features or rules that can be shared across domains are currently lacking.
Hence, our study proposes a model based on Bidirectional Encoder Representations from Transformers (BERT). We suggest replacing domain-specific words in reviews with [MASK] to overcome the significant stylistic differences between reviews from different domains. Our research uses reviews from Ott et al. (2011) [19] and Li et al. (2014) [33] in the restaurant, hotel, and doctor domains, supplemented with genuine Yelp reviews as training data. Finally, we compare the results of MASK-BERT with the state-of-the-art approach of Ren & Ji (2017) [25]. In the cross-domain setting, particularly in the doctor domain, whose content differs most from the other domains, the proposed masking mechanism improves accuracy by 15-20%.
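The masking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how domain-specific words are selected, so the frequency-ratio criterion, the thresholds `min_ratio` and `min_count`, and the function names `domain_specific_words` and `mask_review` are all hypothetical. The only detail taken from the source is that such words are replaced with the [MASK] token before the text is given to BERT.

```python
import re
from collections import Counter

MASK = "[MASK]"  # the mask token named in the abstract


def tokenize(text):
    """Lowercase word tokenizer (a simplification; BERT uses WordPiece)."""
    return re.findall(r"[a-z']+", text.lower())


def domain_specific_words(domain_reviews, min_ratio=0.8, min_count=2):
    """Return words whose occurrences are concentrated in one domain.

    Assumed criterion (not from the thesis): a word is domain-specific if
    it occurs at least `min_count` times overall and at least `min_ratio`
    of those occurrences come from a single domain.
    """
    per_domain = {
        domain: Counter(w for review in reviews for w in tokenize(review))
        for domain, reviews in domain_reviews.items()
    }
    total = Counter()
    for counts in per_domain.values():
        total.update(counts)

    specific = set()
    for word, n in total.items():
        if n < min_count:
            continue
        top = max(counts[word] for counts in per_domain.values())
        if top / n >= min_ratio:
            specific.add(word)
    return specific


def mask_review(text, specific):
    """Replace every domain-specific word in `text` with the mask token."""
    def replace(match):
        word = match.group(0)
        return MASK if word.lower() in specific else word
    return re.sub(r"[A-Za-z']+", replace, text)


# Toy corpus with two domains (invented for this example).
reviews = {
    "hotel": ["The hotel room was great and the staff was friendly.",
              "Great hotel, clean room."],
    "doctor": ["The doctor was great and the staff was friendly.",
               "Great doctor, short wait."],
}
specific = domain_specific_words(reviews)
print(mask_review("The hotel room was great.", specific))
# prints: The [MASK] [MASK] was great.
```

After masking, a hotel review and a doctor review that share the same sentiment wording look far more alike, which is the intuition behind using the mask to reduce stylistic differences between domains.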

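Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-1-1 Influence of Online Reviews
1-1-2 Sources of Fake Reviews
1-1-3 Models Applied to Classifying Genuine and Fake Reviews
1-2 Research Motivation
1-2-1 Fake Review Labeling
1-2-2 Results of Previous Research
1-3 Research Objectives
1-4 Research Structure
Chapter 2 Literature Review
2-1 BERT Applied to Cross-domain Classification of Genuine and Fake Reviews
2-2 Definition of Cross-domain
2-3 Literature Review of Algorithms Applied to Cross-domain Classification of Genuine and Fake Reviews
Chapter 3 Research Method
3-1 Research Process
3-2 BERT
3-3 MASK Mechanism
3-4 Fine-tuning
3-4-1 AE-BERT (Auto-encoder based on BERT)
3-4-2 MASK-BERT (MASK mechanism based on BERT)
Chapter 4 Experiments
4-1 Data Collection
4-2 Data Preprocessing
4-2-1 MongoDB
4-2-2 Feature Generation
4-3 Hyperparameters
4-4 Experimental Results and Analysis
4-4-1 Loss Function
4-4-2 In-domain
4-4-3 Cross-domain
Chapter 5 Conclusions and Suggestions for Future Research
5-1 Research Conclusions
5-2 Research Limitations and Future Suggestions
Chapter 6 References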

[1] Cao, N., Ji, S., Chiu, D. K. W., He, M. and Sun, X., (2020). A Deceptive Review Detection Framework: Combination of Coarse and Fine-grained Features. Expert Systems with Applications.
[2] Zhang, D., Li, W., Niu, B. and Wu, C., (2023). A deep learning approach for detecting fake reviewers: Exploiting reviewing behavior and textual information. Decision Support Systems, 166, 113911.
[3] Du, C., Sun, H., Wang, J., Qi, Q. and Liao, J., (2020). Adversarial and domain-aware BERT for cross-domain sentiment analysis. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 4019-4028).
[4] Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F. and Vaughan, J. W., (2010). A theory of learning from different domains. Machine Learning, 79.
[5] Salunkhe, A., (2021). Attention-based Bidirectional LSTM for Deceptive Opinion Spam Classification. arXiv preprint arXiv:2112.14789v1.
[6] Devlin, J., Chang, M. W., Lee, K. and Toutanova, K., (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805v2.
[7] Lee, K. D., Han, K. and Myaeng, S. H., (2016). Capturing word choice patterns with LDA for fake review detection in sentiment analysis. Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics.
[8] Salminen, J., Kandpal, C., Kamel, A. M., Jung, S. G. and Jansen, B. J., (2022). Creating and detecting fake reviews of online products. Journal of Retailing and Consumer Services, 64, 102771.
[9] Hernández-Castañeda, Á., Calvo, H., Gelbukh, A. and Flores, J. J. G., (2017). Cross-domain deception detection using support vector networks. Soft Computing, 21, 585-595.
[10] Alsubari, S. N., Deshmukh, S. N., Alqarni, A. A., Alsharif, N., Aldhyani, T. H., Alsaade, F. W. and Khalaf, O. I., (2022). Data analytics for the identification of fake reviews using supervised learning. Computers, Materials & Continua, 70(2), 3189-3204.
[11] Cao, Z., Zhou, Y., Yang, A. and Peng, S., (2021). Deep transfer learning mechanism for fine-grained cross-domain sentiment classification. Connection Science, 33(4), 911-928.
[12] Cagnina, L. C. and Rosso, P., (2017). Detecting deceptive opinions: intra and cross-domain classification using an efficient representation. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 25(Suppl. 2), 151-174.
[13] Qu, Z., Jia, Q., Lyu, C., Liu, J., Liu, X. and Zheng, K., (2022). Detecting Fake Reviews with Generative Adversarial Networks for Mobile Social Networks. Security and Communication Networks, 2022.
[14] Alsubari, S. N., Deshmukh, S. N., Al-Adhaileh, M. H., Alsaade, F. W. and Aldhyani, T. H., (2021). Development of integrated neural network model for identification of fake reviews in E-commerce using multidomain datasets. Applied Bionics and Biomechanics, 2021.
[15] Wei, C. S., Hsu, P. Y., Huang, C. W., Cheng, M. S. and Prassida, G. F., (2020). Devising a Cross-Domain Model to Detect Fake Review Comments. Advances in Computational Collective Intelligence: 12th International Conference, ICCCI 2020, Da Nang, Vietnam, November 30–December 3, 2020, Proceedings 12 (pp. 714-725). Springer International Publishing.
[16] Wu, Y., Ngai, E. W., Wu, P. and Wu, C., (2020). Fake online reviews: Literature review, synthesis, and directions for future research. Decision Support Systems, 132, 113280.
[17] Jia, S., Zhang, X., Wang, X. and Liu, Y., (2018). Fake reviews detection based on LDA. 2018 4th International Conference on Information Management (ICIM) (pp. 280-283). IEEE.
[18] Lin, T. Y., Goyal, P., Girshick, R., He, K. and Dollár, P., (2017). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision (pp. 2980-2988).
[19] Ott, M., Choi, Y., Cardie, C. and Hancock, J. T., (2011). Finding deceptive opinion spam by any stretch of the imagination. arXiv preprint arXiv:1107.4557.
[20] Wang, Z., Gu, S. and Xu, X., (2018). GSLDA: LDA-based group spamming detection in product reviews. Applied Intelligence, 48, 3094-3107.
[21] Li, Z., Wei, Y., Zhang, Y. and Yang, Q., (2018). Hierarchical attention transfer network for cross-domain sentiment classification. Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 32, No. 1).
[22] Gupta, P., Gandhi, S. and Chakravarthi, B. R., (2021). Leveraging transfer learning techniques - BERT, RoBERTa, ALBERT and DistilBERT for fake review detection. Forum for Information Retrieval Evaluation (pp. 75-82).
[23] Sánchez-Junquera, J., Villaseñor-Pineda, L., Montes-y-Gómez, M., Rosso, P. and Stamatatos, E., (2020). Masking domain-specific information for cross-domain deception detection. Pattern Recognition Letters, 135, 122-130.
[24] Dos Santos, B. N., Marcacini, R. M. and Rezende, S. O., (2021). Multi-domain aspect extraction using bidirectional encoder representations from transformers. IEEE Access, 9, 91604-91613.
[25] Ren, Y. and Ji, D., (2017). Neural networks for deceptive opinion spam detection: An empirical study. Information Sciences, 385, 213-224.
[26] Loper, E. and Bird, S., (2002). NLTK: The natural language toolkit. arXiv preprint cs/0205028.
[27] Redko, I., Habrard, A. and Sebban, M., (2019). On the analysis of adaptability in multi-source domain adaptation. Machine Learning, 108(8-9), 1635-1652.
[28] Luca, M., (2016). Reviews, reputation, and revenue: The case of Yelp.com. Harvard Business School NOM Unit Working Paper, (12-016).
[29] Floh, A., Koller, M. and Zauner, A., (2013). Taking a deeper look at online reviews: The asymmetric effect of valence intensity on shopping behaviour. Journal of Marketing Management, 29(5-6), 646-670.
[30] Hasanat, M. W., Hoque, A., Shikha, F. A., Anwar, M., Hamid, A. B. A. and Tat, H. H., (2020). The impact of coronavirus (COVID-19) on e-business in Malaysia. Asian Journal of Multidisciplinary Studies, 3(1), 85-90.
[31] Li, J., Cardie, C. and Li, S., (2013). TopicSpam: a topic-model based approach for spam detection. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 217-221).
[32] Klaus, T. and Changchit, C., (2019). Toward an understanding of consumer attitudes on online review usage. Journal of Computer Information Systems, 59(3), 277-286.
[33] Li, J., Ott, M., Cardie, C. and Hovy, E., (2014). Towards a general rule for identifying deceptive opinion spam. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1566-1576).
[34] Fellbaum, C., (2010). WordNet. Theory and applications of ontology: computer applications (pp. 231-243). Dordrecht: Springer Netherlands.
