

Author: Chien-Lung Chou (周建龍)
Title: A Framework for Web NER Model Training based on Semi-supervised Learning (基於半監督式學習的網路命名實體辨識模型訓練框架)
Advisor: Chia-Hui Chang (張嘉惠)
Committee:
Degree: Doctor
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2019
Graduation academic year: 107
Language: English
Number of pages: 86
Keywords (Chinese): 命名實體辨識、半監督式學習、Distant supervision、局部敏感哈希、Tri-training
Keywords (English): Named entity recognition (NER), Semi-supervised Learning, Distant supervision, Locality-Sensitive Hashing (LSH), Tri-training
    Named entity recognition (NER) is an important task in natural language understanding, as it extracts the key entities (person names, locations, organizations, dates, numbers, etc.) and objects (product, song, movie, and activity names, etc.) mentioned in text.
    These entities are essential to many related applications, such as opinion analysis for gauging public sentiment on social networks, and intelligent conversational systems for interactive dialogue and smart customer service.

    However, existing natural language processing (NLP) tools (e.g., the Stanford named entity recognizer) recognize only general named entities (person names, locations, organizations), or require annotated training data in a specific format plus feature engineering before a custom NER model can be trained with supervised learning.
    Since not every language or entity type has a publicly available NER tool, building an NER model training framework is essential for extracting entities in low-resource languages or for rare entity types.

    Building a custom NER model usually takes a significant amount of time to prepare, annotate, and evaluate training/testing data and to perform language-dependent feature engineering.
    Existing studies rely on annotated training data, which is very expensive in time and labor to prepare at scale.
    This, in turn, limits the effectiveness of named entity recognition.
    In this thesis, we study and develop a semi-supervised framework for training web NER models, which exploits large amounts of data collected from the Web together with known named entities to solve the problem of preparing a training corpus for custom NER models.

    We consider the effectiveness and efficiency of automatic labeling, together with language-independent feature mining, to prepare and annotate the training data.
    The main challenges of automatic labeling are choosing labeling strategies that avoid false positive and false negative training examples caused by short and long seeds, and keeping the labeling time manageable given a huge corpus and a large set of known entities.
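    The labeling strategy discussed above can be illustrated with a minimal sketch: match known seed entities against each token sequence and prefer the longest match, which is one common way to suppress the false positives that short seeds produce inside longer mentions. The BIO scheme, the seed list, and the matching rule below are illustrative assumptions, not the thesis's exact algorithm.

    ```python
    def auto_label(tokens, seeds):
        """Tag tokens with B/I/O by longest-match against known seed entities."""
        labels = ["O"] * len(tokens)
        i = 0
        while i < len(tokens):
            # Prefer the longest seed matching at position i, so that a short
            # seed embedded in a longer entity does not truncate the mention.
            best = 0
            for seed in seeds:
                n = len(seed)
                if n > best and tokens[i:i + n] == seed:
                    best = n
            if best:
                labels[i] = "B"
                for j in range(i + 1, i + best):
                    labels[j] = "I"
                i += best
            else:
                i += 1
        return labels

    tokens = ["I", "ate", "at", "Din", "Tai", "Fung", "yesterday"]
    seeds = [["Din", "Tai", "Fung"], ["Din", "Tai"]]
    print(auto_label(tokens, seeds))  # → ['O', 'O', 'O', 'B', 'I', 'I', 'O']
    ```

    Note that the shorter seed ["Din", "Tai"] is ignored here because the longer match wins; with shortest-match the last token of the mention would be mislabeled "O".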

    Distant supervision, which uses known named entities as keywords to collect search snippets as training data, is not a new idea; however, when there are many known entities (e.g., 550k) and collected sentences (e.g., 2M), the efficiency of automatic labeling becomes critical.
    Another issue is the language-dependent feature mining required by supervised learning.
    In addition, we modify tri-training for sequence labeling, derive a proper initialization formula for large datasets, and improve tri-training's performance on larger datasets.

    Finally, we conduct experiments on five types of entity recognition tasks, covering Chinese person names, food names, locations, points of interest (POIs), and activity names, to demonstrate the effectiveness of the proposed web NER model construction framework.


    Named entity recognition (NER) is an important task in natural language understanding because it extracts the key entities (e.g., person, organization, location, date, and number) and objects (e.g., product, song, movie, and activity name) mentioned in texts.
    These entities are essential to numerous text applications, such as those used for analyzing public opinion on social networks, and to the interfaces used to conduct interactive conversations and provide intelligent customer services.

    However, existing natural language processing (NLP) tools (such as Stanford named entity recognizer) recognize only general named entities or require annotated training examples and feature engineering for supervised model construction.
    Since not all languages or entities have public NER support, constructing a framework for NER model training is essential for low-resource language or entity information extraction (IE).

    Building a customized NER model often requires a significant amount of time to prepare, annotate, and evaluate the training/testing data and to perform language-dependent feature engineering.
    Existing studies rely on annotated training data; however, it is quite expensive to obtain large datasets, thus limiting the effectiveness of recognition.
    In this thesis, we examine the problem of developing a framework to prepare a training corpus from the web with known entities for custom NER model training via semi-supervised learning.

    We consider the effectiveness and efficiency problems of automatic labeling and language-independent feature mining to prepare and annotate the training data.
    The major challenge of automatic labeling lies in choosing labeling strategies that avoid false positive and false negative examples, caused by short and long seeds, and in keeping the labeling time manageable for a large corpus and many seed entities.
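    One standard way to tame the labeling time at this scale is locality-sensitive hashing (LSH), which appears in the keyword list; the sketch below uses MinHash signatures with banding so that each sentence is compared only against seeds landing in the same bucket, instead of against all of them. Every parameter here (2-gram shingles, 4 hash functions, 2 bands) is an illustrative assumption, not the thesis's configuration.

    ```python
    import hashlib

    def shingles(text, k=2):
        """Character k-grams of a string, as a set."""
        return {text[i:i + k] for i in range(len(text) - k + 1)}

    def minhash(sh, num_hashes=4):
        """MinHash signature: per hash function, the minimum hash over shingles."""
        return [min(int(hashlib.md5(f"{h}:{s}".encode()).hexdigest(), 16)
                    for s in sh)
                for h in range(num_hashes)]

    def lsh_buckets(items, bands=2):
        """Bucket (name, text) pairs by band keys of their MinHash signatures."""
        buckets = {}
        for name, text in items:
            sig = minhash(shingles(text))
            rows = len(sig) // bands
            for b in range(bands):
                key = (b, tuple(sig[b * rows:(b + 1) * rows]))
                buckets.setdefault(key, set()).add(name)
        return buckets

    items = [("s1", "taipei 101 tower"),
             ("s2", "taipei 101 towers"),
             ("s3", "an unrelated sentence")]
    buckets = lsh_buckets(items)
    # Names sharing a band key are candidate near-matches; only these pairs
    # need exact comparison, so most of the 550k-seed x 2M-sentence
    # comparisons are skipped.
    ```

    The bands/rows split trades recall against bucket size: more bands catch lower-similarity pairs but produce more candidates to verify.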

    Distant supervision, which collects training sentences from search snippets with known entities, is not new; however, the efficiency of automatic labeling becomes critical when dealing with a large number of known entities (e.g., 550k) and sentences (e.g., 2M).
    Additionally, to address the language-dependent feature mining for supervised learning, we modify tri-training for sequence labeling and derive a proper initialization for large dataset training to improve the entity recognition performance for a large corpus.
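    The general tri-training loop (Zhou and Li) that the thesis adapts can be sketched as follows. The whole-sequence agreement rule and the toy `train`/`predict` functions are illustrative simplifications (a real tagger would be, e.g., a CRF or BiLSTM-CRF, and the thesis derives its own initialization), not the thesis's exact co-labeling procedure.

    ```python
    def tri_train(train, predict, labeled, unlabeled, rounds=3):
        """Schematic tri-training for sequence labeling."""
        # In practice each model starts from a different bootstrap sample of
        # the labeled data for diversity; this sketch keeps it simple.
        models = [train(labeled) for _ in range(3)]
        for _ in range(rounds):
            new_data = [[], [], []]
            for sent in unlabeled:
                tags = [predict(m, sent) for m in models]
                for k in range(3):
                    i, j = [x for x in range(3) if x != k]
                    # If the other two taggers agree on the whole sequence,
                    # their prediction becomes a pseudo-label for model k.
                    if tags[i] == tags[j]:
                        new_data[k].append((sent, tags[i]))
            models = [train(labeled + new_data[k]) for k in range(3)]
        return models

    def train(data):
        """Toy 'tagger': remember the tag each token was last seen with."""
        lex = {}
        for sent, tags in data:
            for tok, tag in zip(sent, tags):
                lex[tok] = tag
        return lex

    def predict(model, sent):
        return [model.get(tok, "O") for tok in sent]

    labeled = [(["Taipei", "is", "nice"], ["B", "O", "O"])]
    unlabeled = [["Taipei", "rocks"]]
    models = tri_train(train, predict, labeled, unlabeled, rounds=1)
    ```

    After one round each model has absorbed the agreed pseudo-labels for the unlabeled sentence; the initialization formula mentioned above governs how many such examples are trusted when the dataset is large.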

    We conduct experiments on five types of entity recognition tasks, including Chinese person names, food names, locations, points of interest (POIs), and activity names, to demonstrate the improvements achieved with the proposed web NER model construction framework.

    Table of Contents

    摘要 (Chinese Abstract)
    Abstract
    1 Introduction
      1.1 Motivation
      1.2 Overview of the Thesis
      1.3 Organization of the Thesis
    2 Related Work
      2.1 Supervised Sequence Labeling Models
        2.1.1 Named Entity Recognition (NER)
        2.1.2 DNN-Based Sequence Labeling
        2.1.3 Word Embedding
      2.2 Semi-supervised Learning
        2.2.1 Distant Supervision
        2.2.2 Co-training Based Learning
      2.3 Locality-Sensitive Hashing (LSH)
    3 System Architecture
      3.1 Preprocessing
        3.1.1 Preprocessing Pseudo Code
      3.2 Automatic Labeling
        3.2.1 Automatic Labeling Pseudo Code
      3.3 Feature Mining
      3.4 Self-testing
      3.5 Evaluation and Category-Based Error Analysis
    4 Tri-training
      4.1 Modification for the Initialization
      4.2 Co-labeling
      4.3 Tri-training Pseudo Code
    5 Experiments
      5.1 Corpus Details
      5.2 Labeling Efficiency
      5.3 NER Effectiveness
      5.4 Error Analysis
      5.5 Self-testing Performance
      5.6 Tri-training Performance
      5.7 Tri-training Log
    6 Applications
    7 Conclusions
    Bibliography

