
Author: Chuang Tseng (曾莊)
Title: Hypernym and Hyponym Detection Based on Auxiliary Sentences and the BERT Model (利用輔助語句與BERT模型偵測詞彙的上下位關係)
Advisor: 陳弘軒
Committee members: (not listed)
Degree: Master
Department: Graduate Institute of Software Engineering, College of Information and Electrical Engineering
Year of publication: 2021
Academic year: 109
Language: Chinese
Pages: 56
Keywords: word embedding, BERT, language model, fine-tuning, hypernym-hyponym relation
  • A word embedding model is a technique that uses a word's textual context to generate a corresponding vector for each word. Typically, the cosine similarity between two word embeddings can be used to measure how related the two words are. However, word embeddings are of little direct help in detecting whether two words stand in a hypernym-hyponym relationship. Moreover, because hypernymy is an asymmetric semantic relation, even when we are given a pair of words known to be in a hypernym-hyponym relationship, an ordinary symmetric distance measure cannot tell us which word is the hypernym and which is the hyponym.
    This thesis proposes a method that combines the pre-trained BERT language model with additionally constructed auxiliary sentences to judge the hypernym-hyponym relation of a word pair. The task is divided into two stages. Stage 1 decides whether the word pair has a hypernym-hyponym relation at all. If Stage 1 returns true, Stage 2 decides which word is the hypernym and which is the hyponym. Our experiments show that two ways of constructing auxiliary sentences, BERT+Q and BERT+Q+PosNeg, can effectively use word embeddings to handle both Stage 1 and Stage 2.
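The abstract's point about symmetry can be made concrete: cosine similarity returns the same score regardless of argument order, so it can score relatedness but not direction. A minimal sketch, using made-up toy vectors (not embeddings from any real model):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors for illustration only; real word embeddings
# would come from a trained model.
animal = [0.9, 0.1, 0.3]
dog = [0.8, 0.2, 0.4]

# cosine(animal, dog) == cosine(dog, animal): the measure is symmetric,
# so a high score alone cannot reveal which word is the hypernym.
print(cosine(animal, dog))
print(cosine(dog, animal))
```

This symmetry is exactly why the thesis turns to an asymmetric formulation (auxiliary sentences fed to a classifier) rather than a distance in embedding space.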


    The word embedding model is a technique that utilizes contextual words to generate a vector, called a word embedding, for each word. Usually, we can use the cosine similarity between a pair of word embeddings to calculate the relevance score between the two words. However, it is difficult to use word embeddings to detect the hypernym-hyponym relationship between two words. In addition, because hypernymy is an asymmetric semantic relationship, even when given a pair of words with a hypernym-hyponym relationship, it is challenging to apply general distance measures, which are often symmetric, to determine which word is the hypernym and which is the hyponym.
    This thesis proposes a model based on a pre-trained BERT model with auxiliary sentences to determine the hypernym-hyponym relationship of a pair of words. The entire process consists of two tasks. First, given a pair of words, the model determines whether the pair has a hypernym-hyponym relationship. If the result is true, the model proceeds to the second task: distinguishing the hypernym from the hyponym. Experimental results show that two approaches to constructing auxiliary sentences, BERT+Q and BERT+Q+PosNeg, can effectively accomplish both tasks.
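The two-stage pipeline can be sketched as template functions that turn a word pair into auxiliary sentences. The templates below are illustrative assumptions in the spirit of the BERT+Q approach; the thesis's exact wording is not reproduced here, and `task1_sentence`/`task2_sentence` are hypothetical helper names:

```python
def task1_sentence(w1, w2):
    """Task 1 (assumed template): does the pair have any
    hypernym-hyponym relation?"""
    return f"Is there a hypernym-hyponym relation between {w1} and {w2}?"

def task2_sentence(w1, w2):
    """Task 2 (assumed template): directionality — asks whether
    w2 is the hypernym of w1."""
    return f"Is {w1} a kind of {w2}?"

pair = ("dog", "animal")
print(task1_sentence(*pair))
print(task2_sentence(*pair))

# In the full pipeline, each auxiliary sentence would be paired with the
# original words and fed to a BERT sequence classifier as a sentence pair,
# roughly: "[CLS] dog animal [SEP] Is dog a kind of animal? [SEP]".
```

Framing both stages as sentence-pair classification lets a single fine-tuned BERT model handle an inherently asymmetric relation, since the template fixes which word plays which role.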

    Table of Contents:
    Abstract (Chinese); Abstract; Table of Contents; List of Figures; List of Tables
    1. Introduction
      1.1 Motivation
      1.2 Research Goals
      1.3 Contributions
      1.4 Thesis Organization
    2. Related Work
      2.1 Improving word embeddings' ability to distinguish synonyms and antonyms
        2.1.1 Retrofitting
        2.1.2 JointReps
      2.2 Improving word embeddings' ability to distinguish hypernyms and hyponyms
        2.2.1 HyperVec
        2.2.2 Poincaré
        2.2.3 LEAR
        2.2.4 HWE
        2.2.5 Roller and Erk
        2.2.6 Shwartz
        2.2.7 BiRRE
      2.3 Language models (e.g., BERT) with auxiliary sentences
        2.3.1 Auxiliary sentences for aspect-based sentiment analysis
        2.3.2 Auxiliary sentences for text classification
    3. Model and Method
      3.1 Model Architecture
      3.2 Task 1 Model
      3.3 Task 2 Model
      3.4 Loss Function
    4. Experimental Results
      4.1 Experimental Settings
      4.2 Training Datasets
        4.2.1 The Shwartz training dataset
        4.2.2 Hypernym-hyponym dataset collected from WordNet
        4.2.3 SVM training dataset
      4.3 Experiment 1: Task 1 model evaluation
        4.3.1 Evaluation datasets: Shwartz, Kotlerman, BLESS, Baroni, and Levy
        4.3.2 Results on Kotlerman, BLESS, Baroni, Levy, and Shwartz
        4.3.3 Results on Shwartz
      4.4 Experiment 2: Task 2 evaluation
        4.4.1 Evaluation dataset: BLESShyper
        4.4.2 Evaluation dataset: BIBLESS
        4.4.3 Results on BLESShyper
        4.4.4 Results on BIBLESS
      4.5 Experiment 3: Task 1 + Task 2 evaluation
        4.5.1 Evaluation dataset: BIBLESS
        4.5.2 Evaluation dataset: HyperLex
        4.5.3 Results on BIBLESS
        4.5.4 Results on HyperLex
      4.6 Experiment 4: Task 1 Pos-Neg followed by Task 2 (Q, Pos-Neg, AB)
        4.6.1 Results on BIBLESS
        4.6.2 Results on HyperLex
      4.7 Experiment 5: Task 1 + Task 2 for tree-structure prediction
    5. Conclusion
      5.1 Conclusions
      5.2 Future Work
    References

