Graduate student: 楊景都 (Ching-Tu Yang)
Thesis title: An automatic approach for finding keywords to classify opposite concepts
Advisor: 陳彥良 (Yen-Liang Chen)
Oral defense committee:
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2019
Graduation academic year: 107
Language: Chinese
Pages: 64
Chinese keywords: 資訊檢索, 意見探勘, 文本情感分析, 種子集
English keywords: Information Retrieval, Opinion Mining, Sentiment Analysis, Seed Set
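  • With advances in technology and the growth of data, opinion mining has become one of the most popular tasks in information retrieval in recent years. To understand the semantic orientation of a text, one must first analyze the semantic orientation of each word in it. Prior work offers several approaches to word-level opinion mining, such as corpus-based, dictionary-based, and seed set methods. This study focuses on the seed set method. A seed set is a list of words labeled as positive or negative, and it is the basis on which researchers build further opinion mining research and applications. The traditional seed set method has a drawback: past studies mostly obtained seed sets by manual selection or by reusing data from earlier work. If a user wants to explore a different pair of opposing concepts, manual selection again costs time, labor, and money, and it is hard to quickly find a seed set matching those concepts in earlier research. To address these problems, this study uses the lexical database WordNet to extract words semantically related to the concepts, applies word2vec word vectors to filter for words with high semantic similarity to the concepts, and finally uses Google Search to retain the more frequently used words as the seed set. The contribution of this study is an automated method for selecting a seed set for any pair of semantically opposing concepts. The method improves the efficiency of seed set selection and is not limited to the traditional positive and negative concepts; it can build a seed set for any opposing concepts the user specifies. Besides using the selected seed set to determine the semantic orientation of words, users can also apply it to further opinion mining applications.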


    With the development of technology and the growth of data volume, opinion mining has become one of the most popular tasks in information retrieval in recent years. To understand the semantic orientation of a text, we must first analyze the semantic orientation of each word in it. Past studies describe several methods for word-level opinion mining, such as the corpus-based method, the dictionary-based method, and the seed set method. This study focuses on the seed set method. A seed set is a list of words with positive and negative labels, and it is the basis on which scholars carry out further opinion mining research and applications. Traditional seed sets have a shortcoming: most studies obtained them by manual selection or by citing data from past research. If users want to explore other opposing concepts, it is hard for them to build a new seed set or to find a suitable existing one in a short time. To solve these problems, this study first uses the lexical database WordNet to extract words related to the opposing concepts. It then uses the word-vector tool word2vec to keep the words with higher similarity to the concepts. Finally, it uses Google Search to retain the more widely used words as the seed set of the opposing concepts. The contribution of this research is an automated approach for selecting a seed set for any pair of semantically opposite concepts. The approach improves the efficiency of seed set selection and is not limited to the traditional opposite concepts (positive and negative).
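
    The three stages described above can be illustrated with a minimal sketch. It is not the thesis's implementation: this record does not give the thresholds, the similarity measure, or the ranking rule, so the cosine-similarity cutoff, the `top_k` limit, the function name `select_seed_set`, and all toy data below are assumptions made for illustration. Small in-memory dictionaries stand in for WordNet, word2vec vectors, and Google Search hit counts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_seed_set(concept, related_words, vectors, hit_counts,
                    min_similarity=0.5, top_k=3):
    """Hypothetical three-stage seed selection for one concept.

    related_words: stand-in for words extracted from a lexical database
    vectors:       stand-in for word embeddings
    hit_counts:    stand-in for search-engine result counts
    """
    # Stage 1: collect candidate words related to the concept.
    candidates = [w for w in related_words.get(concept, []) if w in vectors]

    # Stage 2: keep candidates whose vector is close to the concept's vector.
    concept_vec = vectors[concept]
    similar = [w for w in candidates
               if cosine(vectors[w], concept_vec) >= min_similarity]

    # Stage 3: prefer words in wider use, approximated here by hit counts.
    ranked = sorted(similar, key=lambda w: hit_counts.get(w, 0), reverse=True)
    return ranked[:top_k]


# Toy data, invented for illustration only.
RELATED = {
    "dog": ["puppy", "hound", "canine", "frank"],
    "cat": ["kitten", "feline", "tabby", "gossip"],
}
VECTORS = {
    "dog": (1.0, 0.1, 0.0), "puppy": (0.9, 0.2, 0.1),
    "hound": (0.8, 0.0, 0.3), "canine": (0.9, 0.1, 0.1),
    "frank": (0.1, 0.0, 1.0),
    "cat": (0.1, 1.0, 0.0), "kitten": (0.2, 0.9, 0.1),
    "feline": (0.1, 0.9, 0.1), "tabby": (0.0, 0.8, 0.3),
    "gossip": (0.0, 0.1, 1.0),
}
HITS = {
    "puppy": 900, "hound": 120, "canine": 300, "frank": 5000,
    "kitten": 800, "feline": 200, "tabby": 50, "gossip": 7000,
}

if __name__ == "__main__":
    for concept in ("dog", "cat"):
        print(concept, select_seed_set(concept, RELATED, VECTORS, HITS, top_k=2))
```

    In this toy setup the similarity filter removes words that are lexically related but semantically distant (such as "frank" for "dog"), even though they have high hit counts, which is why the similarity stage is placed before the popularity stage.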

    Abstract (Chinese)
    ABSTRACT
    CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
    Chapter 2 Literature Review
        2.1 Word-Level Opinion Mining Methods
            2.1.1 Corpus-Based Methods
            2.1.2 Dictionary-Based Methods
            2.1.3 Seed Set Methods
        2.2 Approaches to Building Seed Sets
        2.3 Theoretical Foundations
            2.3.1 WordNet
            2.3.2 word2vec
            2.3.3 PMI (pointwise mutual information)
    Chapter 3 Methodology
        3.1 Research Framework
        3.2 WordNet
            3.2.1 Method
            3.2.2 Example
        3.3 word2vec
            3.3.1 Method
            3.3.2 Example
        3.4 Google Search
            3.4.1 Method
            3.4.2 Example
        3.5 Word Classification
            3.5.1 Method
            3.5.2 Example
    Chapter 4 Experiments
        4.1 Experimental Design
        4.2 Experimental Details
            4.2.1 Opposing Concepts Examined in Past Research
            4.2.2 Opposing Concepts Not Examined in Past Research
        4.3 Experimental Results
    Chapter 5 Conclusion
        5.1 Findings
        5.2 Limitations and Future Work
    References
    Appendix 1: Positive Word List
    Appendix 2: Negative Word List
    Appendix 3: "Dog" Word List
    Appendix 4: "Cat" Word List

