

Author: Chien-Lung Chou (周建龍)
Title: A Framework for Web NER Model Training based on Semi-supervised Learning (基於半監督式學習的網路命名實體辨識模型訓練框架)
Advisor: Chia-Hui Chang (張嘉惠)
Committee:
Degree: Doctor
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2019
Graduation academic year: 107
Language: English
Number of pages: 86
Keywords (Chinese): 命名實體辨識、半監督式學習、Distant supervision、局部敏感哈希、Tri-training
Keywords (English): Named entity recognition (NER), Semi-supervised Learning, Distant supervision, Locality-Sensitive Hashing (LSH), Tri-training
    Named entity recognition (NER) is an important task in natural language understanding, as it extracts the key entities (person names, locations, organizations, dates, numbers, etc.) and objects (product, song, movie, and activity names, etc.) mentioned in text.
    These entities are essential to many related applications, such as opinion analysis for gauging public sentiment on social networks, and intelligent conversational systems for interactive dialogue and smart customer service.

    However, existing natural language processing (NLP) tools (e.g., the Stanford named entity recognizer) recognize only general named entities (person names, locations, organizations), or require annotated training data in a specific format plus feature engineering before a custom NER model can be trained with supervised learning.
    Since not every language or entity type has a publicly available NER tool, building an NER model training framework is essential for extracting entities in low-resource languages or for rare entity types.

    Building a custom NER model usually takes a significant amount of time to prepare, annotate, and evaluate training/testing data and to perform language-dependent feature engineering.
    Existing studies rely on annotated training data, which is very expensive in time and labor to prepare at scale.
    This, in turn, limits the effectiveness of named entity recognition.
    In this thesis, we study and develop a semi-supervised framework for training web NER models, which exploits large amounts of data collected from the Web together with known named entities to solve the problem of preparing a training corpus for custom NER models.

    We consider the effectiveness and efficiency of automatic labeling, together with language-independent feature mining, to prepare and annotate the training data.
    The main challenges of automatic labeling are choosing labeling strategies that avoid false positive and false negative training examples caused by short and long seeds, and keeping the labeling time manageable given a huge corpus and a large set of known entities.
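    The labeling strategy discussed above can be illustrated with a minimal sketch: match known seed entities against each token sequence and prefer the longest match, which is one common way to suppress the false positives that short seeds produce inside longer mentions. The BIO scheme, the seed list, and the matching rule below are illustrative assumptions, not the thesis's exact algorithm.

    ```python
    def auto_label(tokens, seeds):
        """Tag tokens with B/I/O by longest-match against known seed entities."""
        labels = ["O"] * len(tokens)
        i = 0
        while i < len(tokens):
            # Prefer the longest seed matching at position i, so that a short
            # seed embedded in a longer entity does not truncate the mention.
            best = 0
            for seed in seeds:
                n = len(seed)
                if n > best and tokens[i:i + n] == seed:
                    best = n
            if best:
                labels[i] = "B"
                for j in range(i + 1, i + best):
                    labels[j] = "I"
                i += best
            else:
                i += 1
        return labels

    tokens = ["I", "ate", "at", "Din", "Tai", "Fung", "yesterday"]
    seeds = [["Din", "Tai", "Fung"], ["Din", "Tai"]]
    print(auto_label(tokens, seeds))  # → ['O', 'O', 'O', 'B', 'I', 'I', 'O']
    ```

    Note that the shorter seed ["Din", "Tai"] is ignored here because the longer match wins; with shortest-match the last token of the mention would be mislabeled "O".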

    Distant supervision, which uses known named entities as keywords to collect search snippets as training data, is not a new idea; however, when there are many known entities (e.g., 550k) and collected sentences (e.g., 2M), the efficiency of automatic labeling becomes critical.
    Another issue is the language-dependent feature mining required by supervised learning.
    In addition, we modify tri-training for sequence labeling, derive a proper initialization formula for large datasets, and improve tri-training's performance on larger datasets.

    Finally, we conduct experiments on five types of entity recognition tasks, covering Chinese person names, food names, locations, points of interest (POIs), and activity names, to demonstrate the effectiveness of the proposed web NER model construction framework.


    Named entity recognition (NER) is an important task in natural language understanding because it extracts the key entities (e.g., person, organization, location, date, and number) and objects (e.g., product, song, movie, and activity name) mentioned in texts.
    These entities are essential to numerous text applications, such as those used for analyzing public opinion on social networks, and to the interfaces used to conduct interactive conversations and provide intelligent customer services.

    However, existing natural language processing (NLP) tools (such as Stanford named entity recognizer) recognize only general named entities or require annotated training examples and feature engineering for supervised model construction.
    Since not all languages or entities have public NER support, constructing a framework for NER model training is essential for low-resource language or entity information extraction (IE).

    Building a customized NER model often requires a significant amount of time to prepare, annotate, and evaluate the training/testing data and to perform language-dependent feature engineering.
    Existing studies rely on annotated training data; however, it is quite expensive to obtain large datasets, thus limiting the effectiveness of recognition.
    In this thesis, we examine the problem of developing a framework to prepare a training corpus from the web with known entities for custom NER model training via semi-supervised learning.

    We consider the effectiveness and efficiency problems of automatic labeling and language-independent feature mining to prepare and annotate the training data.
    The major challenge of automatic labeling lies in choosing labeling strategies that avoid false positive and false negative examples, caused by short and long seeds, and in keeping the labeling time manageable for a large corpus and many seed entities.
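    One standard way to tame the labeling time at this scale is locality-sensitive hashing (LSH), which appears in the keyword list; the sketch below uses MinHash signatures with banding so that each sentence is compared only against seeds landing in the same bucket, instead of against all of them. Every parameter here (2-gram shingles, 4 hash functions, 2 bands) is an illustrative assumption, not the thesis's configuration.

    ```python
    import hashlib

    def shingles(text, k=2):
        """Character k-grams of a string, as a set."""
        return {text[i:i + k] for i in range(len(text) - k + 1)}

    def minhash(sh, num_hashes=4):
        """MinHash signature: per hash function, the minimum hash over shingles."""
        return [min(int(hashlib.md5(f"{h}:{s}".encode()).hexdigest(), 16)
                    for s in sh)
                for h in range(num_hashes)]

    def lsh_buckets(items, bands=2):
        """Bucket (name, text) pairs by band keys of their MinHash signatures."""
        buckets = {}
        for name, text in items:
            sig = minhash(shingles(text))
            rows = len(sig) // bands
            for b in range(bands):
                key = (b, tuple(sig[b * rows:(b + 1) * rows]))
                buckets.setdefault(key, set()).add(name)
        return buckets

    items = [("s1", "taipei 101 tower"),
             ("s2", "taipei 101 towers"),
             ("s3", "an unrelated sentence")]
    buckets = lsh_buckets(items)
    # Names sharing a band key are candidate near-matches; only these pairs
    # need exact comparison, so most of the 550k-seed x 2M-sentence
    # comparisons are skipped.
    ```

    The bands/rows split trades recall against bucket size: more bands catch lower-similarity pairs but produce more candidates to verify.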

    Distant supervision, which collects training sentences from search snippets with known entities, is not new; however, the efficiency of automatic labeling becomes critical when dealing with a large number of known entities (e.g., 550k) and sentences (e.g., 2M).
    Additionally, to address the language-dependent feature mining for supervised learning, we modify tri-training for sequence labeling and derive a proper initialization for large dataset training to improve the entity recognition performance for a large corpus.
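    The general tri-training loop (Zhou and Li) that the thesis adapts can be sketched as follows. The whole-sequence agreement rule and the toy `train`/`predict` functions are illustrative simplifications (a real tagger would be, e.g., a CRF or BiLSTM-CRF, and the thesis derives its own initialization), not the thesis's exact co-labeling procedure.

    ```python
    def tri_train(train, predict, labeled, unlabeled, rounds=3):
        """Schematic tri-training for sequence labeling."""
        # In practice each model starts from a different bootstrap sample of
        # the labeled data for diversity; this sketch keeps it simple.
        models = [train(labeled) for _ in range(3)]
        for _ in range(rounds):
            new_data = [[], [], []]
            for sent in unlabeled:
                tags = [predict(m, sent) for m in models]
                for k in range(3):
                    i, j = [x for x in range(3) if x != k]
                    # If the other two taggers agree on the whole sequence,
                    # their prediction becomes a pseudo-label for model k.
                    if tags[i] == tags[j]:
                        new_data[k].append((sent, tags[i]))
            models = [train(labeled + new_data[k]) for k in range(3)]
        return models

    def train(data):
        """Toy 'tagger': remember the tag each token was last seen with."""
        lex = {}
        for sent, tags in data:
            for tok, tag in zip(sent, tags):
                lex[tok] = tag
        return lex

    def predict(model, sent):
        return [model.get(tok, "O") for tok in sent]

    labeled = [(["Taipei", "is", "nice"], ["B", "O", "O"])]
    unlabeled = [["Taipei", "rocks"]]
    models = tri_train(train, predict, labeled, unlabeled, rounds=1)
    ```

    After one round each model has absorbed the agreed pseudo-labels for the unlabeled sentence; the initialization formula mentioned above governs how many such examples are trusted when the dataset is large.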

    We conduct experiments on five types of entity recognition tasks, including Chinese person names, food names, locations, points of interest (POIs), and activity names, to demonstrate the improvements achieved with the proposed web NER model construction framework.

    Table of Contents

    摘要 (Chinese Abstract)
    Abstract
    1 Introduction
      1.1 Motivation
      1.2 Overview of the Thesis
      1.3 Organization of the Thesis
    2 Related Work
      2.1 Supervised Sequence Labeling Models
        2.1.1 Named Entity Recognition (NER)
        2.1.2 DNN-Based Sequence Labeling
        2.1.3 Word Embedding
      2.2 Semi-supervised Learning
        2.2.1 Distant Supervision
        2.2.2 Co-training Based Learning
      2.3 Locality-Sensitive Hashing (LSH)
    3 System Architecture
      3.1 Preprocessing
        3.1.1 Preprocessing Pseudo Code
      3.2 Automatic Labeling
        3.2.1 Automatic Labeling Pseudo Code
      3.3 Feature Mining
      3.4 Self-testing
      3.5 Evaluation and Category-Based Error Analysis
    4 Tri-training
      4.1 Modification for the Initialization
      4.2 Co-labeling
      4.3 Tri-training Pseudo Code
    5 Experiments
      5.1 Corpus Details
      5.2 Labeling Efficiency
      5.3 NER Effectiveness
      5.4 Error Analysis
      5.5 Self-testing Performance
      5.6 Tri-training Performance
      5.7 Tri-training Log
    6 Applications
    7 Conclusions
    Bibliography

