
Author: Zheng-Lun Li (李正倫)
Title: Does the Tokenization Influence the Faithfulness? Evaluation of Hallucinations for Chinese Abstractive Summarization (Chinese title: 評估中文摘要之事實一致性並探討斷詞對其之影響)
Advisor: Richard Tzong-Han Tsai (蔡宗翰)
Committee members:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Information and Electrical Engineering
Year of publication: 2021
Graduation academic year: 109
Language: English
Pages: 41
Chinese keywords: automatic abstractive summarization, pre-trained model, Chinese word segmentation, factual consistency
English keywords: Abstractive Summarization, Pre-trained Model, Tokenization, Hallucination
    Chinese abstract (translated):
    Factual consistency (hallucination) is a critical and difficult problem in abstractive
    summarization that has drawn the attention of many researchers in recent years. However,
    prior work has concentrated on factual consistency in English summarization; factual
    consistency in Chinese summarization has not yet been evaluated or studied.
    We focus on an aspect in which Chinese differs markedly from English: word segmentation
    (tokenization). Most current Chinese pre-trained models adopt the same tokenization
    system as BERT, which in practice is very close to plain character-level tokenization.
    By training Chinese BART models with different Chinese word-segmentation packages and
    fine-tuning them on the LCSTS Chinese summarization dataset, we confirm that tokenization
    affects not only the traditional ROUGE score but also factual consistency.
    Moreover, considering the vocabulary differences between simplified and traditional
    Chinese, we also build TWNSum, a weakly supervised abstractive summarization dataset of
    Taiwanese news, extracting summaries with the simplest LEAD method and filtering them
    with a factual-consistency evaluation. This shows that generating an abstractive
    summarization dataset from a large amount of unlabeled news corpora is feasible.
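    The segmentation contrast described above can be sketched in a few lines. This is an
    illustrative toy, not any of the tokenizers evaluated in the thesis: the dictionary, the
    greedy longest-match strategy, and the function names are hypothetical stand-ins for real
    segmenters such as jieba or CKIP.

```python
# Toy contrast between character-level tokenization (which BERT-style
# Chinese tokenizers approximate) and word-level segmentation done by
# greedy longest-match against a small dictionary. Real segmenters use
# far richer lexicons and statistical models.

TOY_DICT = {"中文", "斷詞", "影響", "摘要", "事實", "一致性"}

def char_tokenize(text: str) -> list[str]:
    """Character-level: every character becomes its own token."""
    return list(text)

def word_tokenize(text: str, dictionary=TOY_DICT, max_len=4) -> list[str]:
    """Word-level: greedy longest-match segmentation over `dictionary`."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens

sentence = "中文斷詞影響摘要的事實一致性"
print(char_tokenize(sentence))  # one token per character (14 tokens)
print(word_tokenize(sentence))  # ['中文', '斷詞', '影響', '摘要', '的', '事實', '一致性']
```

    A model trained on the two token streams sees very different vocabularies, which is
    exactly the variable the thesis isolates.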


    Hallucination is a critical and difficult problem in abstractive summarization
    that has received increasing attention in recent years. However, hallucination
    in other languages, and specifically in Chinese, remains unexplored. We
    experiment with a procedure peculiar to Chinese modeling, tokenization, to
    determine its effect on hallucination in abstractive summarization.
    Tokenization is rarely singled out for separate study in English because of
    the characteristics of the language. In the Chinese setting, current models
    use either character-level tokenization or tokenization close to the character
    level, such as the BERT tokenizer. By applying different Chinese tokenizers to
    the BART model, we confirm that the tokenizer affects both the ROUGE score and
    the faithfulness of the model. Moreover, considering the differences between
    traditional and simplified Chinese, we create the Taiwan Weakly supervised
    News Summarization dataset (TWNSum) using the simple LEAD method and
    hallucination-evaluation filtering. Our TWNSum dataset shows that creating an
    abstractive summarization dataset from a large amount of unlabeled news by a
    weakly supervised method is feasible.
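    A minimal sketch of the weak-supervision recipe the abstract describes: take the LEAD
    sentence of each article as a pseudo-summary, then keep only the pairs that pass a
    faithfulness filter. Everything here is a hypothetical stand-in — the thesis filters with
    a hallucination evaluation, whereas this sketch substitutes a simple character-overlap
    proxy, and the function names and threshold are invented for illustration.

```python
# Weakly supervised dataset construction: LEAD extraction + faithfulness filter.

def lead_summary(sentences: list[str], k: int = 1) -> str:
    """LEAD-k baseline: the first k sentences serve as the pseudo-summary."""
    return "".join(sentences[:k])

def overlap_score(summary: str, body: str) -> float:
    """Toy faithfulness proxy: fraction of summary characters found in the body."""
    if not summary:
        return 0.0
    return sum(ch in body for ch in summary) / len(summary)

def build_weak_dataset(articles: list[list[str]], threshold: float = 0.8):
    """Keep only (document, summary) pairs whose summary is supported by the body."""
    dataset = []
    for sentences in articles:
        summary = lead_summary(sentences, k=1)
        body = "".join(sentences[1:])  # the rest of the article must support the lead
        if overlap_score(summary, body) >= threshold:
            dataset.append({"document": "".join(sentences), "summary": summary})
    return dataset

articles = [
    ["台北今日降雨。", "氣象局表示台北今日持續降雨。"],   # lead supported by body: kept
    ["股市大漲千點。", "氣象局表示台北今日持續降雨。"],   # lead unsupported: filtered out
]
print(build_weak_dataset(articles))
```

    The design point is that the filter, not the extraction, carries the quality burden:
    LEAD is deliberately the weakest possible extractor, so whatever survives filtering
    measures how far unlabeled news alone can go.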

    Contents
      Chinese Abstract (中文摘要)
      Abstract
      Acknowledgements (誌謝)
      Contents
      List of Figures
      List of Tables
      1 Introduction
      2 Related work
        2.1 Transformer
        2.2 Pre-trained Language Model
        2.3 Tokenization
        2.4 Summarization
          2.4.1 Extractive Summarization
          2.4.2 Abstractive Summarization
        2.5 Hallucination
          2.5.1 Intrinsic and Extrinsic Hallucinations
          2.5.2 Current Evaluations and Solutions for Hallucination
      3 Method
        3.1 Dataset
      4 Experiments and Analysis
      5 Conclusion
      Bibliography

