| Graduate Student: | 黃晉豪 Jin-Hao Huang |
|---|---|
| Thesis Title: | 以注意力機制輔助文本分類中的資料增益 (Attention-Mechanism-Assisted Data Augmentation for Text Classification) |
| Advisor: | 林熙禎 She-Jen Lin |
| Committee Members: | |
| Degree: | 碩士 Master |
| Department: | College of Management - Department of Information Management |
| Year of Publication: | 2020 |
| Academic Year of Graduation: | 108 (ROC calendar) |
| Language: | Chinese |
| Pages: | 70 |
| Chinese Keywords: | 資料增益、文本分類、自然語言處理、注意力機制 |
| English Keywords: | Data augmentation, Attention Mechanism, Text classification, Natural language processing |
Data augmentation has been shown to improve model prediction accuracy across many research fields. In natural language processing, however, most augmentation methods either rely heavily on randomness or require substantial prior linguistic knowledge before augmentation can be performed. This study therefore proposes a data augmentation method assisted by an attention mechanism and investigates whether the attention mechanism affects data augmentation for text classification. The experimental results confirm that the proposed method effectively improves classification accuracy, raising the classifier's accuracy by 10% with a dataset of only 500 samples.
Data augmentation is a strategy for increasing the quantity of data in order to improve model performance. The strategy is widely used in natural language processing; however, current augmentation techniques in this field either involve a great deal of randomness or require many human-predefined rules. In this work, we propose a novel approach that augments data according to attention weights, which requires no human-predefined rules while avoiding randomness. Our approach increases the accuracy of the classifier and demonstrates the feasibility of using attention weights as a basis for data augmentation.
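The abstract does not spell out how attention weights drive the augmentation, but one plausible reading can be sketched as follows. This is a hypothetical illustration, not the thesis's actual algorithm: it assumes that tokens receiving the least attention from a trained classifier carry the least label information and are therefore the safest to replace, and the `synonyms` dictionary stands in for whatever substitution resource (e.g. a thesaurus or embedding neighbours) the method would actually use.

```python
import random

def augment_by_attention(tokens, attn_weights, synonyms, n_replace=1):
    """Replace the lowest-attention tokens with synonyms.

    A minimal sketch of attention-guided augmentation: tokens the
    classifier attends to least are assumed safest to alter, so the
    label-bearing (high-attention) words are preserved.
    """
    # Rank token positions by ascending attention weight.
    order = sorted(range(len(tokens)), key=lambda i: attn_weights[i])
    augmented = list(tokens)
    replaced = 0
    for i in order:
        if replaced >= n_replace:
            break
        word = augmented[i]
        if word in synonyms:  # replace only when a substitute is known
            augmented[i] = random.choice(synonyms[word])
            replaced += 1
    return augmented

# Hypothetical example: "fantastic" dominates the attention mass, so the
# low-attention function word "this" is swapped instead.
tokens = ["this", "film", "was", "fantastic"]
weights = [0.05, 0.10, 0.05, 0.80]  # attention from a trained classifier (assumed)
syn = {"film": ["movie"], "this": ["the"]}
print(augment_by_attention(tokens, weights, syn, n_replace=1))
# → ['the', 'film', 'was', 'fantastic']
```

Selecting replacement sites by attention rank rather than uniformly at random is what removes the randomness the abstract criticizes: the sentiment-bearing word is never touched, so the augmented sentence is more likely to keep its original label.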