
Graduate Student: Tzu-Hsuan Tung (董子瑄)
Thesis Title: A Study on Combining the Selective Mechanism with Multi-Head Attention for Automatic Text Summarization
Advisor: Shi-Jen Lin (林熙禎)
Oral Examination Committee:
Degree: Master
Department: Department of Information Management, College of Management
Year of Publication: 2021
Graduation Academic Year: 109 (2020-2021)
Language: Chinese
Number of Pages: 53
Keywords: Transformer, Selective mechanism, Self-attention, Abstractive summarization, Chinese text summarization
The goal of text summarization is to re-present the original text in condensed form while retaining its key points and original semantics. This research combines the Selective Mechanism with the multi-head attention of the Transformer model to improve the quality of summaries generated by an abstractive summarization model. A trainable Selective Gate Network filters the multi-head attention outputs of the Transformer encoder to produce a second-level latent semantic representation. This filtering refines the encoding by removing secondary information and distilling the key information that should be kept in the summary; the decoder then generates a better summary from the second-level representation.
    The model is applied to Chinese text summarization, with ROUGE as the evaluation metric. Experimental results show that the model surpasses the baseline on ROUGE-1, ROUGE-2, and ROUGE-L, improving word-based ROUGE by about 7.3-12.7% and character-based ROUGE by about 4.9-7.9%. Furthermore, combining the word-to-character tokenization method with a larger encoder vocabulary substantially improves all ROUGE metrics: word-based ROUGE gains a further 20.4-41.8%, and character-based ROUGE a further 21.5-31.1%.
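The word-to-character tokenization mentioned above can be illustrated with a minimal sketch of the hybrid word/character idea (keep in-vocabulary words, fall back to single characters for out-of-vocabulary words). The function name and `vocab` argument are illustrative; the thesis's exact procedure may differ.

```python
def word_to_char(tokens, vocab):
    """Hybrid word/character tokenization (illustrative sketch):
    keep words that appear in the vocabulary; split out-of-vocabulary
    words into individual characters so no token maps to <unk>."""
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)       # known word: keep as one token
        else:
            out.extend(tok)       # unknown word: fall back to characters
    return out
```

For example, with a vocabulary containing only "深度", the input ["深度", "學習"] would become ["深度", "學", "習"].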


    The text summarization task aims to represent an original article in condensed text while retaining its key points and original semantics. This research combines a selective mechanism with multi-head attention to improve the quality of summaries generated by an abstractive summarization model. A trainable selective gate network filters the multi-head attention outputs of the Transformer encoder, selecting important information, discarding unimportant information, and constructing a second-level representation. This second-level representation is a tailored sentence representation that can be decoded into a better summary.
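As a rough sketch of the gating idea described above (not the thesis's exact parameterization, which is not given on this page), a per-position sigmoid gate over the encoder outputs might look like the following, where `W_g` and `b_g` are assumed learned parameters:

```python
import numpy as np

def selective_gate(H, W_g, b_g):
    """Illustrative selective gate. H is the encoder (multi-head attention)
    output of shape (seq_len, d_model). A linear map followed by a sigmoid
    yields an element-wise gate in (0, 1); multiplying H by the gate
    suppresses less important features, producing the second-level
    representation passed on to the decoder."""
    gate = 1.0 / (1.0 + np.exp(-(H @ W_g + b_g)))  # sigmoid gate
    return H * gate                                 # element-wise filtering
```

In Zhou et al.'s original selective encoding, the gate additionally conditions on a whole-sentence vector; the variant above is the simplest per-position form, shown only to convey the filtering effect.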
    The model is applied to the Chinese text summarization task, with the ROUGE score as the evaluation metric. Experimental results show that the model exceeds the baseline by 7.3 to 12.7% on word-based ROUGE and by 4.9 to 7.9% on character-based ROUGE. Moreover, word-to-character tokenization combined with a larger vocabulary significantly improves performance further: word-based ROUGE increases by another 20.4 to 41.8%, and character-based ROUGE by another 21.5 to 31.1%.
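ROUGE measures n-gram overlap between a generated summary and a reference summary. As a simplified illustration (the official ROUGE package adds stemming options, confidence intervals, and the longest-common-subsequence variant used for ROUGE-L), ROUGE-1 F1 over token lists can be computed as:

```python
from collections import Counter

def rouge_1_f1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram precision/recall/F1 between two
    token lists. For character-based ROUGE on Chinese text, pass lists of
    characters; for word-based ROUGE, pass lists of segmented words."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For instance, `rouge_1_f1(["the", "cat", "sat"], ["the", "cat", "ran"])` yields 2/3, since two of three unigrams match in both directions.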

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
      1-1 Research Background
      1-2 Research Motivation
      1-3 Research Objectives
      1-4 Thesis Organization
    2. Literature Review
      2-1 Automatic Text Summarization
      2-2 Encoder-Decoder Architecture
      2-3 RNN
        2-3-1 RNN + Attention Mechanism
        2-3-2 RNN + Selective Mechanism
      2-4 Transformer
      2-5 BERT
    3. Methodology
      3-1 Research Process
      3-2 Data Preprocessing
      3-3 Summarization Model Architecture
        3-3-1 Pre-trained Word Embeddings
        3-3-2 Selective Gate Multi-Head Attention
      3-4 Evaluation
    4. Experiments
      4-1 Experimental Setup
      4-2 Datasets
      4-3 Experimental Design and Results
        4-3-1 Experiment 1: Transformer vs. Transformer + Selective Mechanism
        4-3-2 Experiment 2: Selective Mechanism in Different Architectures
        4-3-3 Experiment 3: Effects of Tokenization Mode and Vocabulary Size
        4-3-4 Experiment 4: Effects of Vocabulary Size on Training Time and Evaluation Metrics
      4-4 Comparison with Prior Work
    5. Conclusions and Future Work
      5-1 Conclusions
      5-2 Research Limitations
      5-3 Future Directions
    References

    Bahdanau, D., Cho, K., & Bengio, Y. (2016). Neural Machine Translation by Jointly Learning to Align and Translate. ArXiv:1409.0473 [Cs, Stat]. http://arxiv.org/abs/1409.0473
    Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching Word Vectors with Subword Information. ArXiv:1607.04606 [Cs]. http://arxiv.org/abs/1607.04606
    Chang, C.-T., Huang, C.-C., Yang, C.-Y., & Hsu, J. Y.-J. (2018). A Hybrid Word-Character Approach to Abstractive Summarization. ArXiv:1802.09968 [Cs]. http://arxiv.org/abs/1802.09968
    Chen, Q., Zhu, X., Ling, Z., Wei, S., & Jiang, H. (2016). Distraction-Based Neural Networks for Document Summarization. ArXiv:1610.08462 [Cs]. http://arxiv.org/abs/1610.08462
    Chen, X., Xu, L., Liu, Z., Sun, M., & Luan, H. (2015). Joint learning of character and word embeddings. Proceedings of the 24th International Conference on Artificial Intelligence, 1236–1242.
    Christian, H., Agus, M. P., & Suhartono, D. (2016). Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF). ComTech: Computer, Mathematics and Engineering Applications, 7(4), 285–294. https://doi.org/10.21512/comtech.v7i4.3746
    Chuang, W. T., & Yang, J. (2000). Extracting sentence segments for text summarization: A machine learning approach. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 152–159. https://doi.org/10.1145/345508.345566
    Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv:1810.04805 [Cs]. http://arxiv.org/abs/1810.04805
    Duan, X., Yu, H., Yin, M., Zhang, M., Luo, W., & Zhang, Y. (2019). Contrastive Attention Mechanism for Abstractive Sentence Summarization. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3044–3053. https://doi.org/10.18653/v1/D19-1301
    Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211. https://doi.org/10.1016/0364-0213(90)90002-E
    Gu, J., Lu, Z., Li, H., & Li, V. O. K. (2016). Incorporating Copying Mechanism in Sequence-to-Sequence Learning. ArXiv:1603.06393 [Cs]. http://arxiv.org/abs/1603.06393
    Hochreiter, S., & Schmidhuber, J. (1997). Long Short-term Memory. Neural Computation, 9, 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
    Hu, B., Chen, Q., & Zhu, F. (2016). LCSTS: A Large Scale Chinese Short Text Summarization Dataset. ArXiv:1506.05865 [Cs]. http://arxiv.org/abs/1506.05865
    Kaibi, I., Nfaoui, E. H., & Satori, H. (2019). A Comparative Evaluation of Word Embeddings Techniques for Twitter Sentiment Analysis. 2019 International Conference on Wireless Technologies, Embedded and Intelligent Systems (WITS), 1–4. https://doi.org/10.1109/WITS.2019.8723864
    Kedzie, C., McKeown, K., & Daume III, H. (2019). Content Selection in Deep Learning Models of Summarization. ArXiv:1810.12343 [Cs]. http://arxiv.org/abs/1810.12343
    Kilimci, Z. H., & Akyokuş, S. (2019). The Evaluation of Word Embedding Models and Deep Learning Algorithms for Turkish Text Classification. 2019 4th International Conference on Computer Science and Engineering (UBMK), 548–553. https://doi.org/10.1109/UBMK.2019.8907027
    Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. ArXiv:1408.5882 [Cs]. http://arxiv.org/abs/1408.5882
    Klein, G., Kim, Y., Deng, Y., Senellart, J., & Rush, A. M. (2017). OpenNMT: Open-Source Toolkit for Neural Machine Translation. ArXiv:1701.02810 [Cs]. http://arxiv.org/abs/1701.02810
    Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out, 74–81. https://www.aclweb.org/anthology/W04-1013
    Lin, J., Sun, X., Ma, S., & Su, Q. (2018). Global Encoding for Abstractive Summarization. ArXiv:1805.03989 [Cs]. http://arxiv.org/abs/1805.03989
    Liu, Y. (2019). Fine-tune BERT for Extractive Summarization. ArXiv:1903.10318 [Cs]. http://arxiv.org/abs/1903.10318
    Liu, Y., & Lapata, M. (2019). Text Summarization with Pretrained Encoders. ArXiv:1908.08345 [Cs]. http://arxiv.org/abs/1908.08345
    Luong, M.-T., Pham, H., & Manning, C. D. (2015). Effective Approaches to Attention-based Neural Machine Translation. ArXiv:1508.04025 [Cs]. http://arxiv.org/abs/1508.04025
    Ma, S., Sun, X., Lin, J., & Wang, H. (2018). Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 725–731. https://doi.org/10.18653/v1/P18-2115
    Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing Order into Text. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 404–411. https://www.aclweb.org/anthology/W04-3252
    Nallapati, R., Zhou, B., dos Santos, C. N., Gulcehre, C., & Xiang, B. (2016). Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. ArXiv:1602.06023 [Cs]. http://arxiv.org/abs/1602.06023
    Nenkova, A., & Vanderwende, L. (2005). The impact of frequency on summarization. Microsoft Research Technical Report MSR-TR-2005-101.
    Rush, A. M., Chopra, S., & Weston, J. (2015). A Neural Attention Model for Abstractive Sentence Summarization. ArXiv:1509.00685 [Cs]. http://arxiv.org/abs/1509.00685
    Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. ArXiv:1409.3215 [Cs]. http://arxiv.org/abs/1409.3215
    Tas, O., & Kiyani, F. (2017). A SURVEY AUTOMATIC TEXT SUMMARIZATION. PressAcademia Procedia, 5(1), 205–213. https://doi.org/10.17261/Pressacademia.2017.591
    Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. ArXiv:1706.03762 [Cs]. http://arxiv.org/abs/1706.03762
    Wang, L., Yao, J., Tao, Y., Zhong, L., Liu, W., & Du, Q. (2018). A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 4453–4460. https://doi.org/10.24963/ijcai.2018/619
    Wei, B., Ren, X., Sun, X., Zhang, Y., Cai, X., & Su, Q. (2018). Regularizing Output Distribution of Abstractive Chinese Social Media Text Summarization for Improved Semantic Consistency. ArXiv:1805.04033 [Cs]. http://arxiv.org/abs/1805.04033
    Zhou, Q., Yang, N., Wei, F., & Zhou, M. (2017). Selective Encoding for Abstractive Sentence Summarization. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1095–1104. https://doi.org/10.18653/v1/P17-1101
    張昇暉 (2017). A study of summary extraction for Chinese document streams. Master's thesis, Graduate Institute of Information Management, National Central University, Taoyuan City.
    楊佩臻 (2013). A study of automatic document summarization using sentence-relation networks. Master's thesis, Graduate Institute of Information Management, National Central University, Taoyuan City.
    王美淋 (2020). A study of combining extractive and abstractive models in a two-stage approach to improve summarization performance. Master's thesis, Graduate Institute of Information Management, National Central University, Taoyuan City.
    王蓮淨 (2015). Summary extraction based on topic event tracking. Master's thesis, Graduate Institute of Information Management, National Central University, Taoyuan City.
    蔡汶霖 (2018). Improving the performance of an RNN-based Chinese text summarization system with word embedding models. Master's thesis, Graduate Institute of Information Management, National Central University, Taoyuan City.
    陳俞琇 (2019). A summarization model with both extractive and abstractive capabilities. Master's thesis, Graduate Institute of Information Management, National Central University, Taoyuan City.
    麥嘉芳 (2019). A study of attention-based word-embedding Chinese abstractive summarization. Master's thesis, Graduate Institute of Information Management, National Central University, Taoyuan City.
    黃嘉偉 (2014). Multi-document summarization via sentence-network clustering. Master's thesis, Graduate Institute of Information Management, National Central University, Taoyuan City.
