| Graduate Student | 李正倫 (Zheng-Lun Li) |
|---|---|
| Thesis Title | 評估中文摘要之事實一致性並探討斷詞對其之影響 (Does the Tokenization Influence the Faithfulness? Evaluation of Hallucinations for Chinese Abstractive Summarization) |
| Advisor | 蔡宗翰 (Richard Tzong-Han Tsai) |
| Oral Defense Committee | |
| Degree | Master |
| Department | College of Electrical Engineering & Computer Science - Department of Computer Science & Information Engineering |
| Publication Year | 2021 |
| Academic Year | 109 |
| Language | English |
| Pages | 41 |
| Chinese Keywords | 自動萃取式摘要, 預訓練模型, 中文斷詞, 事實一致性 |
| Keywords | Abstractive Summarization, Pretrained Model, Tokenization, Hallucination |
Hallucination, i.e. factual inconsistency, is a critical and difficult problem in abstractive summarization and has drawn considerable attention from researchers in recent years. However, prior work has concentrated on factual consistency in English summarization; the faithfulness of Chinese summarization has not yet been evaluated or studied.
We focus on an aspect in which Chinese differs substantially from English, namely word segmentation (tokenization). Most current Chinese pretrained models adopt the same tokenization scheme as BERT, which in practice is very close to pure character-level segmentation.
By training Chinese BART models with different Chinese word-segmentation toolkits and fine-tuning them on the LCSTS Chinese summarization dataset, we confirm that tokenization affects not only the conventional ROUGE scores but also faithfulness.
Furthermore, considering the vocabulary differences between simplified and traditional Chinese, we build TWNSum, a weakly supervised abstractive summarization dataset of Taiwanese news, by extracting summaries with the simple LEAD method and filtering them with a hallucination evaluation, showing that generating an abstractive summarization dataset from a large amount of unlabeled news is feasible.
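The two tokenization granularities contrasted above can be sketched in a few lines. The tiny vocabulary and the greedy longest-match segmenter below are illustrative stand-ins, not the thesis's actual toolkits (real segmenters such as jieba or CKIP use much larger lexicons and statistical models):

```python
# Character-level tokenization: every Chinese character is its own token,
# which is close to what BERT-style tokenizers do for Chinese text.
def char_tokenize(text):
    return list(text)

# Toy dictionary-based word segmentation via greedy longest-match.
# The vocabulary is a hypothetical stand-in for a real segmenter's lexicon.
def word_tokenize(text, vocab, max_len=4):
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens

vocab = {"中文", "斷詞", "摘要"}
sentence = "中文斷詞影響摘要"
print(char_tokenize(sentence))        # ['中', '文', '斷', '詞', '影', '響', '摘', '要']
print(word_tokenize(sentence, vocab))  # ['中文', '斷詞', '影', '響', '摘要']
```

The point of the contrast: a word-level tokenizer groups characters into multi-character units wherever its lexicon allows, while the character-level scheme never does, which is the difference the experiments vary.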
Hallucination is a critical and difficult problem in abstractive summarization and has received increasing attention in recent years. However, hallucination in other languages, and in Chinese specifically, remains unexplored. We experiment with a step specific to Chinese modeling, namely tokenization, to determine its effect on hallucinations in abstractive summarization.
Tokenization is rarely studied as a separate factor in English because of the characteristics of the language. In the Chinese scenario, current models
use either character-level tokenization or tokenization close to character level, such as the BERT tokenizer. By applying different Chinese tokenizers to the BART model, we confirm that the choice of tokenizer
affects both the ROUGE score and the faithfulness of the model. Moreover, considering the differences between traditional Chinese and simplified Chinese, we create the Taiwan Weakly supervised News Summarization dataset (TWNSum) using the simple LEAD method and
hallucination-evaluation filtering. Additionally, our TWNSum dataset shows
that creating an abstractive summarization dataset from a large amount of
unlabeled news by a weakly supervised method is feasible.
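The LEAD baseline and the ROUGE-style scoring mentioned above can be illustrated with a minimal sketch. The naive punctuation-based sentence splitter and the character-unigram ROUGE-1 here are simplifications for illustration; the thesis's actual pipeline, the official ROUGE package, and the hallucination-evaluation filter all differ in detail:

```python
import re
from collections import Counter

# LEAD baseline: take the first k sentences of the article as its summary.
def lead(article, k=1):
    # Split after Chinese/Western sentence-final punctuation (naive splitter).
    sentences = [s for s in re.split(r"(?<=[。!?!?])", article) if s]
    return "".join(sentences[:k])

# Character-unigram ROUGE-1 F1, a simplified stand-in for the official scorer.
def rouge1_f1(candidate, reference):
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

article = "今天天氣很好。我們去公園散步。晚上下雨了。"
summary = lead(article, k=1)
print(summary)                           # 今天天氣很好。
print(rouge1_f1(summary, "今天天氣好。"))  # 12/13 ≈ 0.923
```

In a weakly supervised setup like TWNSum, candidate summaries produced this way would then be kept or discarded by a hallucination-evaluation filter rather than by human labels.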