| Graduate Student: | 丁于晏 Yu-Yen Ting |
|---|---|
| Thesis Title: | 基於提示學習的中文事實查核任務之研究 The Study of Prompt-Based Learning for Chinese Fact Checking |
| Advisor: | 張嘉惠 Chia-Hui Chang |
| Oral Defense Committee: | |
| Degree: | Master (碩士) |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of Publication: | 2023 |
| Graduation Academic Year: | 111 |
| Language: | Chinese |
| Number of Pages: | 40 |
| Chinese Keywords: | 事實查核、提示學習、提示微調、參數高效微調 |
| English Keywords: | Fact Checking, Prompt-Based Learning, Prompt Tuning, Parameter-Efficient Fine-Tuning |
In today's era of rapidly proliferating information, the Web is flooded with claims whose truthfulness is often difficult to determine, and verifying them manually is no easy task; automated fact-checking is therefore needed. This thesis focuses on the Chinese fact-checking task. Previous research has concentrated mainly on English or multilingual datasets and on the traditional pre-train-then-fine-tune approach, so this study aims to improve Chinese fact-checking with prompt-based learning, following the emerging "pre-train, prompt, and predict" paradigm in natural language processing.
The fact-checking task consists of two subtasks: evidence retrieval and claim verification. For claim verification, we investigate several prompt-based learning strategies. Since prompt-based learning requires designing a template that is attached to the input, we distinguish between manually designed templates and automatically generated ones; for the automatic approach, we adopt the Automatic Prompt Engineer (APE) [1] to generate prompt templates. Our results show that prompt-based learning improves claim-verification F1 by 1%-2% (from 78.99% to 80.70%). For evidence retrieval, we improve performance with supervised SentenceBERT [2] and unsupervised PromptBERT [3]: unsupervised PromptBERT raises F1 by 18% (from 12.66% to 30.61%), while supervised SentenceBERT lifts F1 substantially, to 88.15%. Finally, after integrating claim verification with evidence retrieval, we reach an F1 of 80.54% on the Chinese fact-checking dataset CHEF [4], far surpassing the 63.47% baseline and even exceeding the 78.99% obtained with human-annotated golden evidence. Overall, prompt-based learning improves on the performance of traditional fine-tuning for Chinese fact-checking.
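The template-plus-verbalizer mechanism behind prompt-based claim verification can be sketched in a few lines. The template wording, label set, and verbalizer tokens below are illustrative assumptions, not the exact prompts evaluated in the thesis:

```python
# Sketch of cloze-style prompt construction for claim verification.
# The template string and verbalizer tokens are illustrative assumptions;
# they are not the exact prompts evaluated in the thesis.

# Verbalizer: maps each class label to the token a masked language
# model is expected to predict at the [MASK] position.
VERBALIZER = {
    "SUPPORTED": "正確",
    "REFUTED": "錯誤",
    "NOT ENOUGH INFO": "未知",
}
INVERSE_VERBALIZER = {tok: label for label, tok in VERBALIZER.items()}

def build_prompt(claim: str, evidence: str) -> str:
    """Attach a manually designed template to the input, leaving a
    [MASK] slot for the masked language model to fill."""
    return f"證據:{evidence} 宣稱:{claim} 這個宣稱是[MASK]的。"

def label_from_prediction(predicted_token: str) -> str:
    """Map the token predicted at [MASK] back to a class label."""
    return INVERSE_VERBALIZER.get(predicted_token, "NOT ENOUGH INFO")
```

At inference time, a masked language model scores the verbalizer tokens at the [MASK] position and the highest-scoring token determines the label; APE-style methods automate the search for the template string itself rather than relying on a hand-written one.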
With the widespread dissemination of information, many false claims circulate on the Web, and it is difficult for humans to check whether a given claim is true. Automated fact-checking addresses this problem. Our research focuses on Chinese fact-checking: previous work has concentrated on English or multilingual fact-checking datasets and on conventional pre-train-and-fine-tune methods, so we aim to enhance the performance of Chinese fact-checking through prompt-based learning.
The fact-checking task consists of two subtasks: evidence retrieval and claim verification. Since prompt-based learning requires designing a template to be added to the input, we distinguish manually designed templates from automatically generated ones. For the automated method, we generate templates with the Automatic Prompt Engineer (APE) [1] and apply various prompt-based training strategies to claim verification. Additionally, we use supervised SentenceBERT [2] and unsupervised PromptBERT [3] models to improve evidence retrieval. We show that prompt-based learning improves the F1 score of claim verification by 1%-2% (from 78.99% to 80.70%). Both evidence retrieval models also show significant gains: unsupervised PromptBERT improves F1 by 18% (from 12.66% to 30.61%), and supervised SentenceBERT reaches 88.15%. Finally, we combine evidence retrieval with claim verification into a complete fact-checking pipeline, achieving an F1 score of 80.54%, which outperforms the 63.47% baseline and even the gold-evidence-based claim verification at 78.99%.
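Both retrieval models ultimately rank candidate evidence sentences by embedding the claim and each sentence into a shared vector space and scoring cosine similarity. A minimal sketch of that ranking step, with a toy bag-of-characters vector standing in for the SentenceBERT/PromptBERT encoders (the encoder choice and top-k cutoff here are assumptions for illustration):

```python
# Sketch of similarity-based evidence retrieval. A real system would
# replace embed() with SentenceBERT or PromptBERT sentence vectors;
# the bag-of-characters embedding here is only a stand-in.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: character-frequency vector of the text."""
    return Counter(text)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(claim: str, sentences: list[str], top_k: int = 5) -> list[str]:
    """Rank candidate evidence sentences by similarity to the claim
    and keep the top_k most similar ones."""
    c = embed(claim)
    ranked = sorted(sentences, key=lambda s: cosine(c, embed(s)), reverse=True)
    return ranked[:top_k]
```

The retrieved sentences are then passed to the claim-verification model, which is how the two subtasks are chained into the full pipeline.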
[1] Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. Large language models are human-level prompt engineers, 2023.
[2] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using siamese BERT-networks, 2019.
[3] Ting Jiang, Jian Jiao, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Denvy Deng, and Qi Zhang. PromptBERT: Improving BERT sentence embeddings with prompts. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8826–8837, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics.
[4] Xuming Hu, Zhijiang Guo, GuanYu Wu, Aiwei Liu, Lijie Wen, and Philip Yu. CHEF: A pilot Chinese dataset for evidence-based fact-checking. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3362–3376, Seattle, United States, July 2022. Association for Computational Linguistics.
[5] Neema Kotonya and Francesca Toni. Explainable automated fact-checking: A survey. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5430–5443, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics.
[6] Mubashara Akhtar, Michael Schlichtkrull, Zhijiang Guo, Oana Cocarascu, Elena Simperl, and Andreas Vlachos. Multimodal automated fact-checking: A survey, 2023.
[7] Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, and Jakob Grue Simonsen. MultiFC: A real-world multi-domain dataset for evidence-based fact checking of claims. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4685–4697, Hong Kong, China, November 2019. Association for Computational Linguistics.
[8] Rami Aly, Zhijiang Guo, Michael Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, and Arpit Mittal. FEVEROUS: Fact extraction and verification over unstructured and structured information, 2021.
[9] Ashim Gupta and Vivek Srikumar. X-fact: A new benchmark dataset for multilingual fact checking. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 675–682, Online, August 2021. Association for Computational Linguistics.
[10] David Wadden, Kyle Lo, Lucy Lu Wang, Arman Cohan, Iz Beltagy, and Hannaneh Hajishirzi. MultiVerS: Improving scientific claim verification with weak supervision and full-document context. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 61–76, Seattle, United States, July 2022. Association for Computational Linguistics.
[11] Anab Maulana Barik, Wynne Hsu, and Mong Li Lee. Incorporating external knowledge for evidence-based fact verification. In Companion Proceedings of the Web Conference 2022, WWW ’22, page 429–437, New York, NY, USA, 2022. Association for Computing Machinery.
[12] Canasai Kruengkrai, Junichi Yamagishi, and Xin Wang. A multi-level attention model for evidence-based fact checking, 2021.
[13] Pawan Kumar Sahu, Saksham Aggarwal, Taneesh Gupta, and Gyanendra Das. GPTs at Factify 2022: Prompt aided fact-verification (short paper). ArXiv, abs/2206.14913, 2022.
[14] Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv., 55(9), jan 2023.
[15] Xiaoyu Li, Weihong Wang, Jifei Fang, Li Jin, Hankun Kang, and Chunbo Liu. PEINet: Joint prompt and evidence inference network via language family policy for zero-shot multilingual fact checking. Applied Sciences, 12(19), 2022.
[16] James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. FEVER: a large-scale dataset for fact extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 809–819, New Orleans, Louisiana, June 2018. Association for Computational Linguistics.
[17] Yixin Nie, Haonan Chen, and Mohit Bansal. Combining fact extraction and verification with neural semantic matching networks, 2018.
[18] Andreas Hanselowski, Hao Zhang, Zile Li, Daniil Sorokin, Benjamin Schiller, Claudia Schulz, and Iryna Gurevych. UKP-athene: Multi-sentence textual entailment for claim verification. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pages 103–108, Brussels, Belgium, November 2018. Association for Computational Linguistics.
[19] Chen Zhao, Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, and Saurabh Tiwary. Transformer-xh: Multi-evidence reasoning with extra hop attention. In International Conference on Learning Representations, 2020.
[20] Chris Samarinas, Wynne Hsu, and Mong Li Lee. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, pages 84–91, Online, June 2021. Association for Computational Linguistics.
[21] Amir Soleimani, Christof Monz, and Marcel Worring. BERT for evidence retrieval and claim verification. In Joemon M. Jose, Emine Yilmaz, João Magalhães, Pablo Castells, Nicola Ferro, Mário J. Silva, and Flávio Martins, editors, Advances in Information Retrieval, pages 359–366, Cham, 2020. Springer International Publishing.
[22] Zhenghao Liu, Chenyan Xiong, Maosong Sun, and Zhiyuan Liu. Fine-grained fact verification with kernel graph attention network. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7342–7351, Online, July 2020. Association for Computational Linguistics.
[23] Shaden Shaar, Nikolay Babulkov, Giovanni Da San Martino, and Preslav Nakov. That is a known lie: Detecting previously fact-checked claims. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3607–3618, Online, July 2020. Association for Computational Linguistics.
[24] Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE: Simple contrastive learning of sentence embeddings, 2022.
[25] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners, 2020.
[26] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
[27] Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, Online, August 2021. Association for Computational Linguistics.
[28] Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts, 2020.
[29] Tianyu Gao, Adam Fisch, and Danqi Chen. Making pre-trained language models better few-shot learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3816–3830, Online, August 2021. Association for Computational Linguistics.
[30] Timo Schick and Hinrich Schütze. Exploiting cloze-questions for few-shot text classification and natural language inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 255–269, Online, April 2021. Association for Computational Linguistics.
[31] ClueAI. PromptCLUE: A zero-shot learning model for all Chinese tasks (全中文任務零樣本學習模型), 2022.
[32] Xu Han, Weilin Zhao, Ning Ding, Zhiyuan Liu, and Maosong Sun. PTR: Prompt tuning with rules for text classification, 2021.
[33] Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. GPT understands, too, 2021.
[34] Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics.
[35] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models, 2021.
[36] Yiheng Liu, Tianle Han, Siyuan Ma, Jiayue Zhang, Yuanyuan Yang, Jiaming Tian, Hao He, Antong Li, Mengshen He, Zhengliang Liu, Zihao Wu, Dajiang Zhu, Xiang Li, Ning Qiang, Dingang Shen, Tianming Liu, and Bao Ge. Summary of ChatGPT/GPT-4 research and perspective towards the future of large language models, 2023.
[37] Hao-Wen Cheng. Challenges and limitations of ChatGPT and artificial intelligence for scientific research: A perspective from organic materials. AI, 4(2):401–405, 2023.