| Graduate Student: | 簡國峻 Kuo-Chun Chien |
|---|---|
| Thesis Title: | 利用記憶增強條件隨機場域之深度學習及自動化詞彙特徵於中文命名實體辨識之研究 (Leveraging Memory-Enhanced Conditional Random Fields with Convolutional and Automatic Lexical Features for Chinese Named Entity Recognition) |
| Advisor: | 張嘉惠 Chia-Hui Chang |
| Oral Defense Committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 資訊工程學系在職專班 (Executive Master of Computer Science & Information Engineering) |
| Publication Year: | 2018 |
| Graduation Academic Year: | 107 |
| Language: | Chinese |
| Pages: | 52 |
| Chinese Keywords: | 機器學習、命名實體辨識、記憶網路、特徵探勘 |
| Keywords: | Machine Learning, Named Entity Recognition, Memory Network, Feature Mining |
Sequence labeling models are widely used in natural language processing (NLP) for tasks such as named entity recognition (NER), part-of-speech (POS) tagging, and word segmentation. NER is an important NLP task because it extracts named entities from raw text and classifies them into predefined categories such as person names, place names, and organizations.

Most NER research has focused on English datasets. English words are usually separated by spaces, and each word typically carries a distinct meaning. Chinese characters, in contrast, encode many kinds of information: the same character can take on different meanings depending on its position within a word, and Chinese text has no explicit word delimiters. Traditional machine learning approaches to Chinese NER are mostly statistical, using conditional random fields (CRFs) for sequence labeling, and are therefore limited to extracting features from a small local window. Capturing long-range contextual information in Chinese text, resolving the correct sense of the current character, and thereby recognizing named entities correctly is a challenging and forward-looking task.

To overcome these challenges, this study applies deep-learning-based conditional random fields to Chinese NER. First, an embedding model is trained to convert characters into numeric vectors. A convolutional layer, a bidirectional GRU layer, and a memory layer that integrates long-range document information then allow the model to draw on rich, document-level context instead of the small local windows available to conventional approaches. In addition, lexical features are mined [1], and trainable parameters of the deep learning model automatically adjust the weights of the word embeddings and lexical features, so that beyond long-range context the model also captures information hidden in the document.

The datasets used in this research are PerNews, consisting of web articles collected with a custom crawler as training data and online news as test data [3], and SIGHAN Bakeoff-3 [2]. Experimental results show a tagging accuracy of 91.67% on the online social media data, a 2.9% improvement over the model without the memory layer; adding word embeddings and lexical features improves on the basic memory model by a further 6.04%. On SIGHAN-MSRA, the proposed model also achieves the highest F1-score, 92.45%, on location entities, and an overall recall of 90.95%.
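The final step of the architecture above, where the CRF layer picks the best tag sequence from per-character scores, can be sketched with a toy Viterbi decoder. This is a minimal, self-contained illustration in pure Python: the hand-picked emission and transition scores stand in for the scores the thesis's CNN + BiGRU + memory layers would actually produce, and the small `B-PER`/`I-PER`/`O` tag set is an illustrative assumption, not the thesis's full label inventory.

```python
def viterbi(emissions, transitions, tags):
    """Decode the highest-scoring tag path for one sentence.

    emissions:   list (one entry per token) of {tag: score} dicts
    transitions: {(prev_tag, cur_tag): score}, missing pairs score 0.0
    tags:        the tag inventory
    """
    # trellis[i][t] = (best score of any path ending at token i with tag t,
    #                  the previous tag on that best path)
    trellis = [{t: (emissions[0].get(t, 0.0), None) for t in tags}]
    for em in emissions[1:]:
        row = {}
        for cur in tags:
            score, prev = max(
                (trellis[-1][p][0] + transitions.get((p, cur), 0.0) + em.get(cur, 0.0), p)
                for p in tags
            )
            row[cur] = (score, prev)
        trellis.append(row)
    # Pick the best final tag, then follow backpointers to the start.
    tag = max(tags, key=lambda t: trellis[-1][t][0])
    path = [tag]
    for row in reversed(trellis[1:]):
        tag = row[tag][1]
        path.append(tag)
    return list(reversed(path))


# Toy sentence of four characters forming a person name followed by a verb,
# e.g. "簡 國 峻 說". Scores are hand-made for illustration only.
tags = ["B-PER", "I-PER", "O"]
emissions = [
    {"B-PER": 2.0, "O": 0.5},
    {"I-PER": 2.0, "O": 0.8},
    {"I-PER": 1.5, "O": 1.0},
    {"O": 2.0},
]
# Penalize the illegal O -> I-PER transition; reward staying inside an entity.
transitions = {("O", "I-PER"): -5.0, ("B-PER", "I-PER"): 1.0, ("I-PER", "I-PER"): 0.5}

print(viterbi(emissions, transitions, tags))  # ['B-PER', 'I-PER', 'I-PER', 'O']
```

A trained CRF layer learns these transition scores jointly with the network, which is how it rules out label sequences (such as `O` followed by `I-PER`) that per-character classification alone would permit.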
[1] C. Chou and C. Chang, "Mining features for web NER model construction based on distant learning," 2017 International Conference on Asian Language Processing (IALP), Singapore, 2017, pp. 322–325.
[2] G.-A. Levow. 2006. The third international Chinese language processing bakeoff: word segmentation and named entity recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp. 108–117.
[3] Y. Y. Huang and C. H. Chung, "A Tool for Web NER Model Generation Based on Google Snippets," Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, pp. 148–163, 2015.
[4] Sunita Sarawagi, "Information Extraction," Foundations and Trends® in Databases, pp. 261–377, 2008.
[5] L. Satish and B. I. Gururaj. 1993. Use of hidden Markov models for partial discharge pattern classification. IEEE Transactions on Electrical Insulation 28(2) (Apr 1993), 172–182.
[6] Gideon S. Mann and Andrew McCallum. 2010. Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data. J. Mach. Learn. Res. 11 (March 2010), 955–984.
[7] Andrew McCallum and Wei Li. 2003. Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-enhanced Lexicons. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4 (CONLL '03). Association for Computational Linguistics, Stroudsburg, PA, USA, 188–191.
[8] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5(2):157–166.
[9] Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies.
[10] Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent convolutional neural networks for text classification. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI 2015), Austin, USA, pages 2267–2273.
[11] Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics (TACL 2016) 4:521–535.
[12] Jason Weston, Sumit Chopra, and Antoine Bordes. 2015. Memory networks. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015). San Diego, USA.
[13] Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2016. Language modeling with gated convolutional networks. arXiv preprint arXiv:1612.08083.
[14] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Pages 282–289.
[15] Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005). Ann Arbor, USA, pages 363–370.
[16] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.
[17] C. Wang and B. Xu. 2017. Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation. arXiv preprint arXiv:1711.04411.
[18] Sepp Hochreiter and Jürgen Schmidhuber, "Long Short-Term Memory," Neural Computation 9(8):1735–1780, December 1997.
[19] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[20] Z. Huang, W. Xu, and K. Yu. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv preprint arXiv:1508.01991.
[21] M. Schuster and K. K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673–2681.
[22] Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation. Doha, Qatar, pages 103–111.
[23] Fei Liu, Timothy Baldwin, and Trevor Cohn. 2017. Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (IJCNLP 2017), Taipei, Taiwan, pages 555–565.
[24] TensorFlow, https://www.tensorflow.org/
[25] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proceedings of NAACL-2016, San Diego, California, USA, June.
[26] Joohui An, Seungwoo Lee, and Gary Geunbae Lee. 2003. Automatic Acquisition of Named Entity Tagged Corpus from World Wide Web. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2 (ACL '03). Association for Computational Linguistics, Stroudsburg, PA, USA, 165–168.
[27] G. Salton, A. Wong, and C. S. Yang, "A Vector Space Model for Automatic Indexing," Commun. ACM, vol. 18, 1975, pp. 613–620.
[28] T. Mikolov, K. Chen, G. Corrado, and J. Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[29] Jieba, https://github.com/fxsjy/jieba
[30] CRF++: Yet Another CRF Toolkit, http://crfpp.sourceforge.net/
[31] J. Zhou, L. He, X. Dai, and J. Chen. 2006. Chinese named entity recognition with a multi-phase model. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp. 213–216.
[32] A. Chen, F. Peng, R. Shan, and G. Sun. 2006. Chinese named entity recognition with conditional probabilistic models. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp. 173–176.
[33] J. Zhou, W. Qu, and F. Zhang. 2013. Chinese named entity recognition via joint identification and categorization. Chin. J. Electron. 22, 225–230.
[34] S. Zhang, Y. Qin, J. Wen, and X. Wang. 2006. Word segmentation and named entity recognition for SIGHAN Bakeoff3. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp. 158–161.
[35] Chuanhai Dong, Jiajun Zhang, Chengqing Zong, Masanori Hattori, and Hui Di. 2016. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In International Conference on Computer Processing of Oriental Languages. Springer, pages 239–250.
[36] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of EMNLP-2014, pages 1532–1543, Doha, Qatar, October.
[37] L. Bottou. 1991. Stochastic gradient learning in neural networks. In Proceedings of Neuro-Nîmes. EC2.
[38] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. JMLR 15(1):1929–1958.
[39] Nanyun Peng and Mark Dredze. 2015. Named entity recognition for Chinese social media with jointly trained embeddings. In Proceedings of EMNLP-2015, pages 548–554, Lisbon, Portugal, September.
[40] Y. Zhang and S. Clark. 2010. A fast decoder for joint word segmentation and POS-tagging using a single discriminative model. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 843–852.
[41] Xinxiong Chen, Lei Xu, Zhiyuan Liu, Maosong Sun, and Huanbo Luan. 2015. Joint Learning of Character and Word Embeddings. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2015).