
Author: 呂昕恩 (Sin-En Lu)
Title: Hokkien-Mandarin Code-Mixing Dataset and Neural Machine Translation
(基於台語與華語之語碼混合資料集與翻譯模型)
Advisor: 蔡宗翰 (Richard Tzong-Han Tsai)
Oral defense committee:
Degree: Master
Department: Executive Master of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2022
Graduation academic year: 110
Language: English
Pages: 96
Keywords (Chinese): 語碼混合 (code-mixing), 機器翻譯 (machine translation), 損失函數重構 (loss function reconstruction), 低資源語言 (low-resource language)
Keywords (English): Code-Mixing, Loss Function Reconstruction
Code-mixing between Taiwanese Hokkien and Mandarin is a common spoken-language phenomenon in Taiwan, yet Taiwan did not begin establishing an official writing system for Hokkien until the 21st century. The lack of an official writing system not only means a shortage of NLP resources, which makes breakthrough research on dialect code-mixing tasks difficult, but also means the language faces difficulties in transmission. Given these problems, this study begins with a brief introduction to the history of Taiwanese Hokkien and the code-mixing phenomenon in Taiwan, discusses the language proportions and grammatical structure of Taiwanese code-mixing, builds a Hokkien-Mandarin code-mixing dataset based on written Taiwanese, and introduces existing word-segmentation tools applicable to written Hokkien.
    We also describe the training procedure for a Hokkien language model and, using our proposed dataset, develop a Hokkien code-mixing translation model based on XLM.

    To suit code-mixing scenarios, we propose a dynamic language identification (DLI) mechanism and use transfer learning to improve translation performance.
    Finally, motivated by the shortcomings of cross-entropy (CE), we propose three loss-function reconstructions that exploit word-level similarity. We propose the WBI mechanism to resolve the incompatibility between word-level information and character-level pre-trained models, and we introduce WordNet knowledge into the model. Compared with standard CE, experimental results on monolingual and code-mixed datasets show that our best loss function improves BLEU by 2.42 points (62.11 to 64.53) on monolingual data and 0.7 points (62.86 to 63.56) on code-mixed data. Our experiments demonstrate that even with a character-level language model, lexical information can be carried into downstream tasks.


    Code-mixing is a complicated task in Natural Language Processing (NLP), especially when the mixed languages are dialects. In Taiwan, code-mixing is a common phenomenon, and the most popular code-mixed language pair is Hokkien and Mandarin. However, Hokkien suffers from a lack of NLP resources. We therefore propose a Hokkien-Mandarin code-mixing dataset and offer an efficient Hokkien word segmentation method through an open-source toolkit, which helps overcome the morphological issues within the Sino-Tibetan family. We modify an XLM (cross-lingual language model) with a dynamic language identification (DLI) mechanism and use transfer learning to train it on our proposed dataset for translation tasks. We found that by applying linguistic knowledge and rules and supplying language tags, the model achieves good performance on code-mixed translation while maintaining the quality of monolingual translation.
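The DLI mechanism above attaches a language tag to each token of a code-mixed input. As a minimal illustrative sketch only (the thesis's actual mechanism operates inside the XLM pipeline; the lexicon and the Latin-letter heuristic here are assumptions), a toy per-token tagger might look like this:

```python
def tag_tokens(tokens, hokkien_vocab):
    """Toy per-token language tagger in the spirit of DLI.

    A token is tagged 'nan' (Hokkien) if it appears in a hypothetical
    Hokkien lexicon or contains Latin letters, as romanized Tai-lo
    spellings do; everything else falls back to 'zh' (Mandarin).
    """
    tags = []
    for tok in tokens:
        is_hokkien = tok in hokkien_vocab or any(
            c.isascii() and c.isalpha() for c in tok
        )
        tags.append("nan" if is_hokkien else "zh")
    return tags

# A code-mixed sentence: Mandarin tokens followed by Hokkien tokens.
print(tag_tokens(["我", "等一下", "愛", "轉去", "矣"], {"愛", "轉去", "矣"}))
# → ['zh', 'zh', 'nan', 'nan', 'nan']
```

The resulting tag sequence is what a translation model can consume alongside the token sequence, so that each position carries an explicit language identity.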

    Recently, most neural machine translation (NMT) models, including XLM, use cross-entropy as the loss function. However, standard cross-entropy penalizes the model whenever it fails to generate the ground-truth token, eliminating any opportunity to consider other plausible outputs; this can cause overcorrection or over-confidence. Solutions that reconstruct the loss function using word similarity have been proposed, but they are unsuitable for Chinese because most Chinese models are pre-trained at the character level. In this work, we propose a simple but effective method, Word Boundary Insertion (WBI), which addresses the inconsistency between the word level and the character level by reconstructing the loss function of Chinese NMT models. WBI considers word similarity without modifying or retraining the language model. We propose three modified loss functions for use with XLM, whose calculation also draws on WordNet. Compared with standard cross-entropy, experimental results on both monolingual and code-mixed Hokkien-Mandarin datasets show that our best loss function achieves BLEU score improvements of 2.42 (62.11 to 64.53) and 0.7 (62.86 to 63.56) on monolingual and code-mixed data, respectively.
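To make the similarity-reconstruction idea concrete, here is a minimal sketch, not the thesis's exact formulation: a cross-entropy variant over a soft target that keeps most of the mass on the ground-truth token and redistributes a small fraction to tokens in proportion to their similarity to it. The mixing weight `alpha` and the normalization scheme are illustrative assumptions.

```python
import math

def similarity_smoothed_ce(probs, target_idx, sim, alpha=0.1):
    """Cross-entropy against a soft target: (1 - alpha) of the mass
    stays on the ground-truth token, and alpha is spread over the
    vocabulary in proportion to each token's similarity sim[i] to
    the ground truth (with sim[target_idx] == 1.0). A one-hot sim
    vector reduces this exactly to standard cross-entropy.
    """
    total = sum(sim)
    loss = 0.0
    for i, p in enumerate(probs):
        # Soft target distribution q over the vocabulary.
        q = (1 - alpha) * (1.0 if i == target_idx else 0.0) + alpha * sim[i] / total
        if q > 0.0:
            loss -= q * math.log(p)
    return loss

probs = [0.1, 0.7, 0.2]  # model distribution over a 3-token vocabulary
print(similarity_smoothed_ce(probs, 1, [0.0, 1.0, 0.0]))  # ≈ 0.3567 (= -ln 0.7)
```

Under such a loss, probability mass placed on a near-synonym of the ground truth is penalized less than the same mass placed on an unrelated token, which is the behavior the word-similarity reconstruction is after.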

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
      1.1 Motivation
        1.1.1 Code-Mixing
        1.1.2 Neural Machine Translation
      1.2 Goal
    2 Background of Taiwanese Hokkien
      2.1 History of Taiwanese Hokkien
      2.2 Taiwanese Hokkien Writing System and Difficulties
      2.3 Difficulties in Written Taiwanese Hokkien
        2.3.1 Ambiguous Word Boundary in Written Taiwanese Hokkien
        2.3.2 Literary and Colloquial Readings
        2.3.3 Insufficient Resources
      2.4 Recap
    3 Related Work
      3.1 Code-Mixing Theory
      3.2 Code-Mixing Research in Taiwanese Hokkien
      3.3 Code-Mixed Corpus
      3.4 Code-Mixed Translation
      3.5 Pre-trained Language Models
      3.6 Transfer Learning
      3.7 Chinese Language Model and Machine Translation
      3.8 Recap
    4 Dataset and Evaluation
      4.1 Data Source
      4.2 Synthetic Hokkien-Mandarin Code-Mixed Data
        4.2.1 Problems in Using Chinese Toolkits
        4.2.2 Articut: Solution for Hokkien Word Segmentation
        4.2.3 Synthetic Approach
      4.3 Data Analysis
        4.3.1 Human Scoring
        4.3.2 Code-Mixing Complexity
        4.3.3 Inter-rater Score
      4.4 Recap
    5 Hokkien Language Model and Translation System
      5.1 Assumption
      5.2 Hokkien Language Model
      5.3 XLM Model
        5.3.1 Dynamic Language Identification and AutoEncoder
        5.3.2 Transfer Learning
      5.4 Loss Function
        5.4.1 Word-Boundary Insertion (WBI)
        5.4.2 Word Similarity
        5.4.3 Loss Function Modification
      5.5 Recap
    6 Experiment and Result
      6.1 Dataset, Baseline, and Evaluation Metrics
      6.2 Experiment Settings
      6.3 Metrics
      6.4 Hokkien Language Model and XLM
      6.5 Dynamic Language Identification and AutoEncoder
      6.6 Transfer Learning
      6.7 Loss Function
      6.8 Recap
    7 Loss Function Analysis
      7.1 Similar-Word Configuration
      7.2 Proposed Loss Function
      7.3 Case Study
        7.3.1 Similar-Word Configuration
        7.3.2 Proposed Loss Function
      7.4 Recap
    8 Conclusion and Future Work
      8.1 Conclusion
      8.2 Future Work
    Bibliography

