| Author: | 賴威廷 (Wei-Ting Lai) |
|---|---|
| Thesis Title: | 旅館評論自動分析與歸納系統 (Automatic Analysis and Summarization System for Hotel Reviews) |
| Advisor: | 鄭旭詠 (Hsu-Yung Cheng) |
| Oral Defense Committee: | |
| Degree: | Master |
| Department: | College of Information and Electrical Engineering, Department of Computer Science & Information Engineering |
| Year of Publication: | 2020 |
| Academic Year: | 108 (ROC calendar) |
| Language: | Chinese |
| Pages: | 55 |
| Chinese Keywords: | 深度學習, 句子邊界檢測, 文字分析 |
| Keywords: | Deep learning, Sentence Boundary Detection, Text Analysis |
| Usage: | Views: 7, Downloads: 0 |
With the rapid development of the Internet, anyone can easily leave a review online to express their opinions or feelings. These reviews are valuable data, but manually searching and tallying such a large volume of text is difficult and inefficient. Natural language processing makes it possible to quickly learn people's specific opinions about products, services, and so on. This thesis therefore implements an automatic analysis and summarization system for hotel reviews, which replaces the manual process of reading through reviews to find information and thereby saves time.

In this system, we train a sentence boundary detection model to improve sentence segmentation on online reviews, whose frequent grammatical and punctuation errors are difficult for approaches based on statistical models or grammar rules to handle. We then train sentiment analysis and keyword extraction models to identify the strengths and weaknesses of a hotel that reviewers mention most often, categorize user reviews by keyword, and apply a clustering algorithm to group related keywords. Finally, the results are presented on a web page so that users can query them conveniently.
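As a rough illustration of the pipeline described above (split reviews into sentences, classify sentiment per sentence, then group sentences by the hotel aspects they mention), the following sketch substitutes naive rule-based stand-ins for the trained models. The thesis itself uses neural models for these stages; the word lists, aspect vocabulary, and sample reviews below are invented purely for illustration.

```python
import re
from collections import Counter

# Hypothetical lexicons for illustration only; the thesis trains neural
# models for sentence segmentation, sentiment, and keyword extraction
# rather than using fixed word lists like these.
POSITIVE = {"clean", "friendly", "great", "comfortable"}
NEGATIVE = {"dirty", "noisy", "rude", "small"}
ASPECTS = {"room", "staff", "breakfast", "location", "bed"}

def split_sentences(review: str) -> list[str]:
    """Naive punctuation-based stand-in for the trained boundary model."""
    parts = re.split(r"[.!?]+\s*", review)
    return [p.strip() for p in parts if p.strip()]

def sentiment(sentence: str) -> str:
    """Lexicon-lookup stand-in for the trained sentiment classifier."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def summarize(reviews: list[str]) -> dict:
    """Group sentences by aspect keyword and tally sentiment per aspect."""
    tally: dict[str, Counter] = {}
    for review in reviews:
        for sent in split_sentences(review):
            label = sentiment(sent)
            for word in re.findall(r"[a-z']+", sent.lower()):
                if word in ASPECTS:
                    tally.setdefault(word, Counter())[label] += 1
    return tally

reviews = [
    "The room was clean and the staff were friendly. Breakfast was small",
    "Noisy room at night! Great location though.",
]
print(summarize(reviews))
```

The per-aspect sentiment tallies produced here correspond to the "most frequently mentioned strengths and weaknesses" that the system surfaces on its web front end, where the real system would additionally cluster related extracted keywords before display.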