
Graduate Student: 葉詠心 (Ip Weng Sam)
Thesis Title: 針對特定領域任務—基於常識的BERT模型之應用
The Application of Common-Sense Knowledge-based BERT on Domain-Specific Tasks
Advisors: 周惠文 (Huey-Wen Chou)
柯士文 (Shih-Wen Ke)
Oral Defense Committee:
Degree: Master
Department: College of Management, Department of Information Management
Year of Publication: 2023
Academic Year of Graduation: 111 (ROC calendar)
Language: English
Number of Pages: 105
Keywords (Chinese): 文本分類, 知識圖譜, 情感分析, 主題分類
Keywords (English): text classification, knowledge graph, sentiment analysis, topic categorization
  • In today's highly competitive business environment, organizations can benefit greatly from topic analysis through text classification. Among the available methods, BERT is one of the most effective techniques in natural language processing. BERT is typically used as a domain-specific classification model, but it generally has no knowledge beyond its training data, such as the common sense and the awareness of relationships between things that humans have, which limits how closely it resembles human intelligence.
    To address this limitation, this research explores combining BERT with another valuable tool, the knowledge graph, to extend the capabilities of the classification model. By incorporating a knowledge graph, the BERT model can acquire general knowledge as humans do, improving its classification ability. The combination has the potential to significantly improve an organization's ability to extract valuable insights from large volumes of text.
    The experiments show that the effect of adding a knowledge graph to BERT varies with the type of knowledge graph and with the classification task. The study also found that a knowledge-enhanced BERT model faces several challenges: greater training complexity, difficulties in handling both long and short texts, and ensuring that the knowledge triples (the knowledge representation used) are relevant to the sentence they are attached to.
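    The following is a minimal sketch of how knowledge triples can be attached to a sentence before it reaches a masked Transformer encoder, in the spirit of the K-BERT model named in the table of contents. It is an illustration under stated assumptions, not the thesis's implementation: the toy graph `TOY_KG` and the helper names `inject_triples` and `visible_matrix` are hypothetical, and tokens are matched to graph entities by exact string equality.

```python
# Hypothetical toy knowledge graph: entity -> list of (relation, object) pairs.
TOY_KG = {
    "coffee": [("is_a", "beverage")],
    "bitter": [("related_to", "unpleasant")],
}


def inject_triples(tokens, kg, max_per_entity=1):
    """Insert matching triples after each sentence token as short branches.

    Returns three parallel lists:
      expanded  - sentence tokens with branch tokens inserted
      soft_pos  - position index per token; a branch continues counting from
                  its entity, while the sentence keeps its own numbering
      anchor    - None for sentence tokens, otherwise the index (in
                  `expanded`) of the entity token the branch hangs from
    """
    expanded, soft_pos, anchor = [], [], []
    for pos, tok in enumerate(tokens):
        expanded.append(tok)
        soft_pos.append(pos)
        anchor.append(None)
        entity_idx = len(expanded) - 1
        for rel, obj in kg.get(tok, [])[:max_per_entity]:
            for offset, branch_tok in enumerate((rel, obj), start=1):
                expanded.append(branch_tok)
                soft_pos.append(pos + offset)
                anchor.append(entity_idx)
    return expanded, soft_pos, anchor


def visible_matrix(anchor):
    """Boolean matrix: entry [i][j] is True when token i may attend to token j.

    Sentence tokens see each other; branch tokens see only their own branch
    and the entity they hang from, so injected knowledge does not leak into
    unrelated parts of the sentence.
    """
    n = len(anchor)
    vis = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            both_sentence = anchor[i] is None and anchor[j] is None
            same_branch = anchor[i] is not None and anchor[i] == anchor[j]
            branch_and_entity = anchor[i] == j or anchor[j] == i
            vis[i][j] = both_sentence or same_branch or branch_and_entity
    return vis


tokens, soft_pos, anchor = inject_triples(["coffee", "tastes", "bitter"], TOY_KG)
vis = visible_matrix(anchor)
print(tokens)
print(soft_pos)
```

    In this sketch the branch token "beverage" is visible to "coffee" but not to "tastes", which is one way to address the relevance concern raised in the abstract; the extra tokens also lengthen the input, which hints at why long texts and training cost become harder to manage.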

    Chinese Abstract
    Abstract
    Acknowledgements
    Table of Contents
    Figures
    Tables
    1. INTRODUCTION
      1.1 Background
      1.2 Motivation
      1.3 Objective
      1.4 Structure
    2. LITERATURE REVIEW
      2.1 Text Classification
        2.1.1 Text Preprocessing
        2.1.2 Feature Extraction
      2.2 Method of Text Classification
        2.2.1 The Machine Learning Approaches
        2.2.2 Lexicon-based Approach
      2.3 Knowledge Graph
        2.3.1 Knowledge Graph Representation
        2.3.2 Knowledge Graph Reasoning Methods
        2.3.3 The Integration of the Knowledge Graphs
      2.4 Works of the Combination of Knowledge Graph and Text Classification
        2.4.1 Previous Works of the Combination
        2.4.2 The K-BERT Model
    3. RESEARCH METHOD
      3.1 Overview
      3.2 Knowledge Graph Layer
      3.3 The Knowledge Layer
        3.3.1 Embedding Layer
        3.3.2 Seeing Layer
      3.4 Mask-Transformer Encoder
    4. EXPERIMENT
      4.1 Datasets
        4.1.1 Sentiment-Related
        4.1.2 Topic-Related
        4.1.3 Question-Related
      4.2 Knowledge Graph
        4.2.1 The Common-Sense Knowledge Graph (CSKG)
        4.2.2 Each Knowledge Graphs Used in the CSKG
      4.3 Baseline
      4.4 Parameters
    5. RESULTS
      5.1 BERT-CSKG
      5.2 BERT-AT
      5.3 BERT-CN
      5.4 BERT-FN
      5.5 BERT-RG
      5.6 BERT-VG
      5.7 BERT-WD
      5.8 BERT-WN
      5.9 Inspiration of the Experiment
        5.9.1 Important Findings of the Experiment
        5.9.2 Answers to the Research Questions
    6. CONCLUSION AND FUTURE PERSPECTIVES
    REFERENCES
    APPENDIX A, B, C, D, E, F, G, H, I, J, K

