
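Graduate student: 莊家閔
Chia-Min Chuang
Thesis title: 使用預訓練編碼器提升跨語言摘要能力
Improving Cross-Lingual Text Summarization using Pretrained Encoder
Advisor: 蔡宗翰
Tzong-Han Tsai
Oral examination committee:
Degree: Master
Department: 資訊電機學院 (College of Electrical Engineering and Computer Science) - Graduate Institute of Software Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 59
Chinese keywords: 文本摘要 (text summarization), 預訓練模型 (pretrained model), 跨語言處理 (cross-lingual processing)
Foreign-language keywords: Summarization, Pretraining language model, Cross-lingual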
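  • Cross-lingual text summarization uses a machine to turn a document
    in one language into a summary in another language. Most previous
    studies handle the task with a two-step approach: either "translate,
    then summarize" or "summarize, then translate". However, both
    approaches suffer from translation errors, and the machine translation
    model they rely on is hard to keep updating and fine-tuning together
    with the summarization task. To address these problems, we adopt a
    pretrained cross-lingual encoder to represent inputs from different
    languages as vectors, mapping them into the same vector space.
    Pretraining methods have been widely applied to natural language
    generation tasks and have achieved excellent model performance. This
    encoder lets the model retain its cross-lingual ability while it
    learns to summarize. In this study, we experiment with three different
    fine-tuning methods and show that the pretrained cross-lingual encoder
    can learn word-level semantic features. Among all of our model
    configurations, the best model exceeds the baseline model by 3 points
    on ROUGE-1.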


    Cross-lingual text summarization (CLTS) is the task of generating a summary in one language given a document in another language. Most previous work treats CLTS as a two-step pipeline: translate-then-summarize or summarize-then-translate. Both approaches suffer from translation errors, and the translation system is hard to fine-tune directly on text summarization. To deal with these problems, we utilize a pretrained cross-lingual encoder, an approach that has proven effective in natural language generation, to represent text inputs from different languages. We augment a standard sequence-to-sequence (Seq2Seq) network with our pretrained cross-lingual encoder so as to capture cross-lingual contextualized word representations. We show that the pretrained cross-lingual encoder can be fine-tuned on a text summarization dataset while keeping its cross-lingual ability. We experiment with three different fine-tuning strategies and show that the pretrained encoder can capture cross-lingual semantic features. The best of the proposed models obtains 42.08 ROUGE-1 on the ZH2ENSUM dataset [Zhu et al., 2019], significantly improving on our baseline model by more than 3 ROUGE-1 points.
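
    For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The following is a minimal sketch, not the evaluation code used in the thesis: the function name `rouge_1`, the lowercasing, and the whitespace tokenization are illustrative assumptions, and the official ROUGE package [Lin, 2004] offers further options that are omitted here. Scores such as 42.08 are conventionally reported on a 0-100 scale; the abstract does not state whether the recall or the F1 variant is reported.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall and F1 (hypothetical helper).

    Assumes lowercased, whitespace-separated tokens; real evaluations
    typically use a proper tokenizer for each language.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: a unigram is counted at most as many times
    # as it occurs in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
    # Multiply by 100 to match the scale on which scores are usually reported.
    print({name: round(100 * value, 2) for name, value in scores.items()})
```

    Under this sketch, a gain of 3 ROUGE-1 points corresponds to a 0.03 increase in the unscaled score.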

    Chinese Abstract
    English Abstract
    Acknowledgements
    Content
    List of Figure
    List of Table
    1 Introduction
    2 Related Work
        2.1 End-to-end Cross-lingual Language Generation
        2.2 Cross-lingual Pretraining
    3 Method
        3.1 Transformer
        3.2 Baseline Model
        3.3 Weights Transformation
        3.4 Pretrained Cross-lingual Masked Language Model
        3.5 Cross-lingual Contextualized Word Representations
            3.5.1 Cross-lingual Encoder (CLTS-XENC)
            3.5.2 Cross-lingual ELMo (CLTS-ELMo)
    4 Experiments
        4.1 Datasets
        4.2 Evaluation Metrics
        4.3 Training Details
        4.4 Result and Analysis
            4.4.1 Fine-Tuning Strategies
            4.4.2 Pretraining Steps
            4.4.3 Cross-lingual Word Embeddings
            4.4.4 Human Evaluation
    5 Conclusion and Future Work
        5.1 Conclusion
        5.2 Future Work
            5.2.1 Adversarial Training
            5.2.2 Multi-task Learning
    Appendix A
        A.1 Cross-lingual Text Summarization Examples
    Appendix B
        B.1 Round-Trip Translation

    Hanan Aldarmaki and Mona Diab. Context-aware cross-lingual
    mapping. arXiv preprint arXiv:1903.03243, 2019.
    Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer
    normalization. arXiv preprint arXiv:1607.06450, 2016.
    Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural
    machine translation by jointly learning to align and translate.
    arXiv preprint arXiv:1409.0473, 2014.
    Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao,
    and Heyan Huang. Cross-lingual natural language generation via
    pre-training. arXiv preprint arXiv:1909.10481, 2019.
    Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic
    Denoyer, and Hervé Jégou. Word translation without parallel
    data. arXiv preprint arXiv:1710.04087, 2017.
    Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams,
    Samuel R Bowman, Holger Schwenk, and Veselin Stoyanov.
    XNLI: Evaluating cross-lingual sentence representations. arXiv
    preprint arXiv:1809.05053, 2018.
    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina
    Toutanova. BERT: Pre-training of deep bidirectional transformers
    for language understanding. arXiv preprint arXiv:1810.04805,
    2018.
    Sergey Edunov, Alexei Baevski, and Michael Auli. Pre-trained
    language model representations for language generation. arXiv
    preprint arXiv:1903.09722, 2019.
    Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and
    Yann N Dauphin. Convolutional sequence to sequence learning.
    arXiv preprint arXiv:1705.03122, 2017.
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep
    residual learning for image recognition. In Proceedings of the
    IEEE conference on computer vision and pattern recognition,
    pages 770–778, 2016.
    Jeremy Howard and Sebastian Ruder. Universal language
    model fine-tuning for text classification. arXiv preprint
    arXiv:1801.06146, 2018.
    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic
    optimization. arXiv preprint arXiv:1412.6980, 2014.
    Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch,
    Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade
    Shen, Christine Moran, Richard Zens, et al. Moses: Open source
    toolkit for statistical machine translation. In Proceedings of the
    45th annual meeting of the association for computational linguistics
    companion volume proceedings of the demo and poster sessions,
    pages 177–180, 2007.
    Vishwajeet Kumar, Nitish Joshi, Arijit Mukherjee, Ganesh Ramakrishnan,
    and Preethi Jyothi. Cross-lingual training for automatic
    question generation. arXiv preprint arXiv:1906.02525,
    2019.
    Guillaume Lample and Alexis Conneau. Cross-lingual language
    model pretraining. arXiv preprint arXiv:1901.07291, 2019.
    Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries.
    In Text summarization branches out, pages 74–81, 2004.
    Edward Loper and Steven Bird. NLTK: The natural language toolkit.
    arXiv preprint cs/0205028, 2002.
    Minh-Thang Luong, Hieu Pham, and Christopher D Manning.
    Bilingual word representations with monolingual quality in mind.
    In Proceedings of the 1st Workshop on Vector Space Modeling for
    Natural Language Processing, pages 151–159, 2015.
    Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff
    Dean. Distributed representations of words and phrases and their
    compositionality. In Advances in neural information processing
    systems, pages 3111–3119, 2013.
    Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, and
    Eneko Agirre. Analyzing the limitations of cross-lingual word
    embedding mappings. arXiv preprint arXiv:1906.05407, 2019.
    Jessica Ouyang, Boya Song, and Kathleen McKeown. A robust abstractive
    system for cross-lingual summarization. In Proceedings
    of the 2019 Conference of the North American Chapter of the Association
    for Computational Linguistics: Human Language Technologies,
    Volume 1 (Long and Short Papers), pages 2025–2031,
    2019.
    Matthew E Peters, Waleed Ammar, Chandra Bhagavatula, and Russell
    Power. Semi-supervised sequence tagging with bidirectional
    language models. arXiv preprint arXiv:1705.00108, 2017.
    Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner,
    Christopher Clark, Kenton Lee, and Luke Zettlemoyer.
    Deep contextualized word representations. arXiv preprint
    arXiv:1802.05365, 2018.
    Telmo Pires, Eva Schlinger, and Dan Garrette. How multilingual is
    multilingual BERT? arXiv preprint arXiv:1906.01502, 2019.
    Martin Popel and Ondřej Bojar. Training tips for the transformer
    model. The Prague Bulletin of Mathematical Linguistics, 110(1):
    43–70, 2018.
    Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya
    Sutskever. Improving language understanding by generative
    pre-training. URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf,
    2018.
    Sebastian Ruder, Ivan Vulić, and Anders Søgaard. A survey of
    cross-lingual word embedding models. Journal of Artificial Intelligence
    Research, 65:569–631, 2019.
    Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine
    translation of rare words with subword units. arXiv preprint
    arXiv:1508.07909, 2015.
    Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence
    learning with neural networks. In Advances in neural information
    processing systems, pages 3104–3112, 2014.
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
    Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin.
    Attention is all you need. In Advances in neural information
    processing systems, pages 5998–6008, 2017.
    Xiaojun Wan, Huiying Li, and Jianguo Xiao. Cross-language document
    summarization based on machine translation quality prediction.
    In Proceedings of the 48th Annual Meeting of the Association
    for Computational Linguistics, pages 917–926. Association
    for Computational Linguistics, 2010.
    Shijie Wu and Mark Dredze. Beto, bentz, becas: The surprising
    cross-lingual effectiveness of BERT. arXiv preprint
    arXiv:1904.09077, 2019.
    Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron
    Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio.
    Show, attend and tell: Neural image caption generation with
    visual attention. In International conference on machine learning,
    pages 2048–2057, 2015.
    Junnan Zhu, Qian Wang, Yining Wang, Yu Zhou, Jiajun Zhang,
    Shaonan Wang, and Chengqing Zong. NCLS: Neural cross-lingual
    summarization. arXiv preprint arXiv:1909.00156, 2019.
