
Author: Yu-Chi Lin (林佑錡)
Thesis Title: History Aware Multi-Stage Prompting for Neural Chat Translation
Advisor: Shin-Wen Ke (柯士文)
Degree: Master
Department: Department of Information Management, College of Management
Year of Publication: 2023
Academic Year of Graduation: 111
Language: English
Number of Pages: 95
Keywords (Chinese): 神經網路聊天翻譯、機器翻譯、提示調整、深度學習
Keywords (English): neural chat translation, machine translation, prompt tuning, deep learning


    Neural Chat Translation (NCT) is an emerging task in the field of machine translation. Unlike Neural Machine Translation (NMT), which translates isolated sentences, NCT must translate utterances embedded in multi-turn conversations, making it a challenging two-in-one task. Previous research has addressed it with context-aware models and auxiliary training tasks, but often at a high training cost.
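As a concrete illustration of how NCT differs from sentence-level NMT, a training instance pairs the utterance to translate with the bilingual dialogue history. The sketch below shows one possible data layout; the field names are illustrative assumptions, not the thesis's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ChatTranslationExample:
    # Utterance to translate in the current turn (source language).
    source_utterance: str
    # Gold translation of that utterance (target language).
    reference: str
    # Preceding turns on the source and target sides, oldest first.
    source_history: List[str] = field(default_factory=list)
    target_history: List[str] = field(default_factory=list)

ex = ChatTranslationExample(
    source_utterance="See you tomorrow, then.",
    reference="那就明天見。",
    source_history=["Are you free tomorrow?", "Yes, after 3 pm."],
    target_history=["你明天有空嗎?", "有,下午三點以後。"],
)

# A sentence-level NMT system would see only ex.source_utterance;
# an NCT system additionally conditions on the two history fields.
```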
    As the cost of fine-tuning pre-trained language models continues to rise, prompt tuning has emerged as a promising alternative: it is parameter-efficient while achieving performance comparable to full fine-tuning. Prompt tuning has recently been applied to machine translation, but existing approaches consider only sentence-level translation and cannot exploit the conversational content that is crucial to NCT. In this study, we therefore propose a new prompt tuning method for NCT called History Aware Multi-Stage Prompting (HAMSP). By incorporating information from the chat history into the prompts, HAMSP guides the pre-trained language model to generate translations that are consistent with the conversational context.
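The idea can be sketched as follows: under prompt tuning the pre-trained model's weights stay frozen and only a small prompt generator is trained, and a history-aware variant feeds the encoded chat history into that generator so the resulting prompt vectors reflect the conversation. The NumPy sketch below is a minimal illustration under our own assumptions (toy dimensions, mean pooling, one linear projection per stage), not the thesis's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, prompt_len = 16, 4   # toy sizes for illustration only

def embed(tokens):
    # Stand-in for the frozen pre-trained model's embedding layer.
    return rng.standard_normal((len(tokens), d_model))

def history_aware_prompt(history_states, W):
    """Pool the encoded chat history and project it into
    `prompt_len` continuous prompt vectors. Only W is trainable;
    the pre-trained language model itself stays frozen."""
    pooled = history_states.mean(axis=0)              # (d_model,)
    return (W @ pooled).reshape(prompt_len, d_model)  # (prompt_len, d_model)

history = embed("Are you free tomorrow ? Yes , after 3 pm .".split())

# "Multi-stage": a separate trainable projection, and hence a separate
# prompt, for each stage of the frozen model's computation.
stages = ["encoding", "re-encoding", "decoding"]
W = {s: 0.02 * rng.standard_normal((prompt_len * d_model, d_model))
     for s in stages}
prompts = {s: history_aware_prompt(history, W[s]) for s in stages}

# Each stage prompt is prepended to the utterance representation.
src = embed("See you tomorrow , then .".split())
model_input = np.concatenate([prompts["encoding"], src], axis=0)
```

Because the history enters through the prompt generator rather than the model weights, adapting to a new conversation domain means training only the small per-stage projections.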
    Our experiments show that the proposed HAMSP outperforms the baseline methods and is competitive with fine-tuning. Further intrinsic evaluation illustrates that our method is more robust and improves the dialogue coherence of translations. In addition, HAMSP improves training efficiency and reduces hardware costs, making it suitable for a wide range of real-world chat systems.

    摘要 / Abstract / Acknowledgements / Table of Contents / List of Figures / List of Tables
    1. Introduction
       1.1. Overview
       1.2. Motivation
       1.3. Objectives
       1.4. Thesis Organization
    2. Related Works
       2.1. Neural Machine Translation
          2.1.1. Sentence-level NMT
          2.1.2. Document-level NMT
          2.1.3. Neural Chat Translation
       2.2. Prompt Tuning
          2.2.1. Manual Prompt
          2.2.2. Discrete Prompt
          2.2.3. Continuous Prompt
       2.3. Multilingual Pre-trained Language Models
          2.3.1. mBART
          2.3.2. mT5
          2.3.3. mGPT
       2.4. Discussion
    3. Methodology
       3.1. Model Overview
       3.2. Model Architecture
          3.2.1. Prompt Generator
          3.2.2. Multi-Stage
       3.3. Training Phase
       3.4. Datasets
       3.5. Experiment Setting
          3.5.1. Data Preprocessing and Postprocessing
          3.5.2. Model Setting
       3.6. Flow Chart
       3.7. Experiment Design
          3.7.1. Experiment - The effectiveness of our proposed prompting method applied to NCT tasks
          3.7.2. Evaluation Metrics
    4. Experiment Results
       4.1. Experiment - The effectiveness of our proposed prompting method applied to NCT tasks
          4.1.1. Experiment Results
          4.1.2. Intrinsic Evaluation
    5. Conclusion
       5.1. Overall Summary
       5.2. Contributions
       5.3. Study Limitations
       5.4. Future Work
    References

