
Graduate Student: 黃晉豪 (Jin-Hao Huang)
Thesis Title: Attention-Mechanism-Assisted Data Augmentation in Text Classification
Advisor: 林熙禎 (She-Jen Lin)
Oral Defense Committee:
Degree: Master
Department: College of Management - Department of Information Management
Year of Publication: 2020
Graduation Academic Year: 108 (2019–2020)
Language: Chinese
Pages: 70
Chinese Keywords: 資料增益、文本分類、自然語言處理、注意力機制
English Keywords: Data augmentation, Attention mechanism, Text classification, Natural language processing
Access Counts: Views: 9 / Downloads: 0
Data augmentation has been shown to improve model prediction accuracy in many research fields. In natural language processing, however, most augmentation methods are largely random, or require substantial prior linguistic knowledge before augmentation can be performed. This study therefore proposes a data augmentation method assisted by an attention mechanism, and investigates whether the attention mechanism affects data augmentation for text classification. The experimental results confirm that the proposed method effectively improves classification accuracy, raising the classifier's accuracy by 10% with a dataset of only 500 samples.


Data augmentation is a strategy for increasing the quantity of data in order to improve model performance. It is widely used in natural language processing; however, current augmentation strategies in this field either involve a great deal of randomness or require many human-defined rules. In our work, we propose a novel approach that augments data according to attention weights, which requires no human-defined rules while avoiding that randomness. Our approach increases the accuracy of the classifier model and demonstrates the feasibility of using attention weights as a basis for data augmentation.
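The abstract does not spell out the augmentation procedure, but the core idea of using attention weights to decide which words to perturb can be sketched as follows. This is a minimal illustration, not the thesis's actual method: the 0.1 threshold, the toy synonym table, and the choice to replace only low-attention tokens (so that the words the classifier attends to are left intact) are all assumptions made for the example.

```python
import random

def attention_guided_augment(tokens, attn_weights, synonyms,
                             threshold=0.1, seed=0):
    """Return an augmented copy of `tokens` in which tokens whose
    attention weight falls below `threshold` are replaced with a
    synonym; high-attention (salient) tokens are left unchanged.
    `synonyms` maps a token to a list of candidate replacements."""
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    for tok, w in zip(tokens, attn_weights):
        if w < threshold and tok in synonyms:
            out.append(rng.choice(synonyms[tok]))
        else:
            out.append(tok)
    return out

# Toy example: attention weights from a trained classifier would be
# used here; these numbers are made up for illustration.
tokens = ["the", "movie", "was", "absolutely", "wonderful"]
attn = [0.02, 0.30, 0.03, 0.15, 0.50]
syns = {"the": ["this"], "was": ["seemed"]}
print(attention_guided_augment(tokens, attn, syns, threshold=0.1))
# -> ['this', 'movie', 'seemed', 'absolutely', 'wonderful']
```

In practice the attention weights would come from a model such as a hierarchical attention network (the architecture named in the table of contents), and the synonym table from a resource such as WordNet; the design point is that the attention scores, rather than randomness or hand-written rules, decide which tokens are safe to rewrite.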

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
    1-1 Research Background
    1-2 Research Motivation
    1-3 Research Objectives
    1-4 Thesis Organization
2. Literature Review
    2-1 Image Data Augmentation
        2-1-1 Augmentation by transforming the original image
        2-1-2 Deep-learning-based augmentation
    2-2 Text Data Augmentation
        2-2-1 Augmentation of the original sentence
        2-2-2 Back-translation
    2-3 Augmentation assisted by external knowledge
        2-3-1 Stop-word-list-assisted augmentation
        2-3-2 LDA (Latent Dirichlet Allocation)-assisted augmentation
        2-3-3 Part-of-speech-assisted augmentation
        2-3-4 TF-IDF-assisted augmentation
    2-4 Attention Mechanism
        2-4-1 Development of the attention mechanism
3. Methodology
    3-1 Research Process
    3-2 Hierarchical Attention Mechanism
    3-3 Attention-Based Data Augmentation
    3-4 Downstream Classifier
4. Experimental Design and Analysis
    4-1 Preprocessing, Datasets, and Downstream Models
    4-2 Experimental Environment
    4-3 Experimental Design and Results
        4-3-1 Experiment 1: Attention-threshold settings
        4-3-2 Experiment 2: Attention-based vs. linguistic-knowledge-based augmentation
        4-3-3 Experiment 3: Performance of the attention mechanism across augmentation methods
        4-3-4 Experiment 4: Attention-generated data vs. original data
    4-4 Summary of Experiments
5. Conclusions and Future Directions
    5-1 Conclusions
    5-2 Research Limitations
    5-3 Future Research Directions
References

