| Graduate Student: | Tzu-Hsuan Tung (董子瑄) |
|---|---|
| Thesis Title: | A Study on Combining the Selective Mechanism with Multi-Head Attention for Automatic Text Summarization |
| Advisor: | Shi-Jen Lin (林熙禎) |
| Committee Members: | |
| Degree: | Master |
| Department: | Department of Information Management, School of Management |
| Year of Publication: | 2021 |
| Academic Year of Graduation: | 109 (ROC calendar) |
| Language: | Chinese |
| Pages: | 53 |
| Keywords: | Transformer, selective mechanism, self-attention, abstractive summarization, Chinese text summarization |
The goal of text summarization is to re-express an original text in concise language while retaining its key points and preserving its meaning. This study combines the selective mechanism with the multi-head attention mechanism of the Transformer model to improve the quality of summaries generated by an abstractive summarization model. A trainable selective gate network filters the multi-head attention outputs of the Transformer encoder to produce a second-level latent semantic representation. This filtering step refines the encoding by discarding secondary information and extracting the key information that should be kept in the summary; decoding from the second-level representation then yields better summaries.
The model is applied to Chinese text summarization, with ROUGE scores as the evaluation metric. Experimental results show that the model surpasses the baseline on ROUGE-1, ROUGE-2, and ROUGE-L, improving word-based ROUGE by about 7.3% to 12.7% and character-based ROUGE by about 4.9% to 7.9%. Furthermore, combining a word-to-character tokenization method with an enlarged encoder substantially improves all ROUGE metrics: word-based ROUGE improves by a further 20.4% to 41.8%, and character-based ROUGE by a further 21.5% to 31.1%.
The text summarization task aims to re-express an original article in condensed form while retaining its key points and original semantics. This research combines a selective mechanism with multi-head attention to improve the quality of summaries generated by an abstractive summarization model. A trainable selective gate network filters the multi-head attention outputs in the Transformer encoder, selecting important information and discarding unimportant information to construct a second-level representation. This second-level representation is a tailored sentence representation that can be decoded into a better summary.
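The selective gate idea above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the thesis implementation: the gate is assumed to combine each token's encoder output with a mean-pooled sentence vector (in the style of the selective encoding of Zhou et al., 2017), pass the result through a sigmoid, and rescale the encoder outputs elementwise.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def selective_gate(H, W, U, b):
    """Filter encoder outputs H (n_tokens x d) with a selective gate.

    The sentence vector s is the mean of H (an assumption; the thesis
    may use a different pooling). The gate lies in (0, 1) per token and
    per dimension, so gating can only attenuate, never amplify.
    """
    s = H.mean(axis=0)                 # pooled sentence vector, shape (d,)
    gate = sigmoid(H @ W + s @ U + b)  # per-token, per-dimension gate
    return gate * H                    # second-level representation

# Toy example with random "multi-head attention outputs".
rng = np.random.default_rng(0)
d = 4
H = rng.normal(size=(3, d))
W = rng.normal(size=(d, d))
U = rng.normal(size=(d, d))
b = np.zeros(d)
H2 = selective_gate(H, W, U, b)   # same shape as H, elementwise damped
```

Because the sigmoid output is strictly between 0 and 1, every entry of the gated representation is smaller in magnitude than the corresponding encoder output, which is exactly the "filtering" behaviour described above.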
The model is applied to a Chinese text summarization task and evaluated with ROUGE scores. Experimental results show that the model exceeds the baseline by 7.3% to 12.7% on word-based ROUGE and by 4.9% to 7.9% on character-based ROUGE. Moreover, word-to-character tokenization combined with larger vocabulary banks further improves performance significantly: word-based ROUGE increases by an additional 20.4% to 41.8%, and character-based ROUGE by an additional 21.5% to 31.1%.
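ROUGE, the metric used above, scores a candidate summary by its n-gram overlap with a reference. A minimal ROUGE-1 sketch is given below, using whitespace tokens for the word-based variant; the official ROUGE package adds stemming and other options, and the character-based variant used for Chinese simply tokenizes into characters instead.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Unigram-overlap recall, precision, and F1 between two strings.

    Simplified sketch: tokens are whitespace-split words; overlap is the
    clipped count of shared unigrams (Counter intersection).
    """
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())
    recall = overlap / max(sum(r.values()), 1)
    precision = overlap / max(sum(c.values()), 1)
    f1 = 2 * recall * precision / max(recall + precision, 1e-9)
    return recall, precision, f1

r, p, f = rouge_1("the cat sat", "the cat sat on the mat")
# recall 0.5 (3 of 6 reference words), precision 1.0
```

For character-based ROUGE on Chinese text, the same function applies after replacing `split()` with `list(...)` so that each character is a token.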