| Author: | 賴威廷 (Wei-Ting Lai) |
|---|---|
| Thesis Title: | 旅館評論自動分析與歸納系統 (Automatic Analysis and Summarization System for Hotel Reviews) |
| Advisor: | 鄭旭詠 (Hsu-Yung Cheng) |
| Oral Defense Committee: | |
| Degree: | Master |
| Department: | College of Information and Electrical Engineering, Department of Computer Science & Information Engineering |
| Year of Publication: | 2020 |
| Academic Year: | 108 (ROC calendar) |
| Language: | Chinese |
| Pages: | 55 |
| Chinese Keywords: | 深度學習, 句子邊界檢測, 文字分析 |
| Keywords: | Deep learning, Sentence Boundary Detection, Text Analysis |
| Usage: | Views: 7, Downloads: 0 |
With the rapid development of the Internet, anyone can easily leave a review online to express their opinions or feelings. These reviews are valuable data, but manually searching and tallying such a large volume of text is difficult and inefficient. Natural language processing makes it possible to quickly learn people's specific opinions about products, services, and so on. This thesis therefore implements an automatic analysis and summarization system for hotel reviews, which replaces the manual process of reading through reviews to find information and thereby saves time.

In this system, we train a sentence boundary detection model to improve sentence segmentation on online reviews, whose frequent grammatical and punctuation errors are difficult for approaches based on statistical models or grammar rules to handle. We then train sentiment analysis and keyword extraction models to identify the strengths and weaknesses of a hotel that reviewers mention most often, categorize user reviews by keyword, and apply a clustering algorithm to group related keywords. Finally, the results are presented on a web page so that users can query them conveniently.
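As a rough illustration of the pipeline described above (split reviews into sentences, classify sentiment per sentence, then group sentences by the hotel aspects they mention), the following sketch substitutes naive rule-based stand-ins for the trained models. The thesis itself uses neural models for these stages; the word lists, aspect vocabulary, and sample reviews below are invented purely for illustration.

```python
import re
from collections import Counter

# Hypothetical lexicons for illustration only; the thesis trains neural
# models for sentence segmentation, sentiment, and keyword extraction
# rather than using fixed word lists like these.
POSITIVE = {"clean", "friendly", "great", "comfortable"}
NEGATIVE = {"dirty", "noisy", "rude", "small"}
ASPECTS = {"room", "staff", "breakfast", "location", "bed"}

def split_sentences(review: str) -> list[str]:
    """Naive punctuation-based stand-in for the trained boundary model."""
    parts = re.split(r"[.!?]+\s*", review)
    return [p.strip() for p in parts if p.strip()]

def sentiment(sentence: str) -> str:
    """Lexicon-lookup stand-in for the trained sentiment classifier."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def summarize(reviews: list[str]) -> dict:
    """Group sentences by aspect keyword and tally sentiment per aspect."""
    tally: dict[str, Counter] = {}
    for review in reviews:
        for sent in split_sentences(review):
            label = sentiment(sent)
            for word in re.findall(r"[a-z']+", sent.lower()):
                if word in ASPECTS:
                    tally.setdefault(word, Counter())[label] += 1
    return tally

reviews = [
    "The room was clean and the staff were friendly. Breakfast was small",
    "Noisy room at night! Great location though.",
]
print(summarize(reviews))
```

The per-aspect sentiment tallies produced here correspond to the "most frequently mentioned strengths and weaknesses" that the system surfaces on its web front end, where the real system would additionally cluster related extracted keywords before display.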