| Graduate student: | 翁梓勝 Tzu-Sheng Weng |
|---|---|
| Thesis title: | 社群論壇之問題檢索 Question Retrieval of Community Forum |
| Advisor: | 蔡宗翰 Tzong-Han Tsai |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Executive Master of Computer Science & Information Engineering |
| Publication year: | 2016 |
| Academic year of graduation: | 104 (ROC calendar; August 2015 to July 2016) |
| Language: | English |
| Pages: | 54 |
| Keywords: | Question, Retrieval, Community, Forum |
In recent years, community-based question answering (cQA) sites have grown rapidly. The number of large-scale Q&A sites has increased significantly, and the information they hold forms a valuable online knowledge pool. However, these sites commonly suffer from duplicate questions. Question retrieval aims to find previously answered, semantically similar questions in cQA archives. Its main obstacle is lexical variation: the same question can be asked with entirely different, synonymous words. Some approaches address this by estimating the probability that a new question is related to an archived one, while much recent research has focused on surface string similarity between questions.
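To make the lexical-variation problem concrete, the Python sketch below scores questions with one simple surface-string measure, Jaccard overlap of word sets. The abstract does not name a specific string-similarity measure, so Jaccard is chosen purely for illustration, and the example questions are invented. A paraphrase with the same intent shares few words with the query, while a question about something else shares most of them, so the surface measure ranks the wrong question higher.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Surface-string similarity |A & B| / |A | B| over word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


query = "How do I update the BIOS on my laptop?"
paraphrase = "How can I flash the firmware of my notebook?"  # same intent, different words
distractor = "How do I update the drivers on my laptop?"     # different intent, shared words

print(round(jaccard(query, paraphrase), 3))  # 0.286
print(round(jaccard(query, distractor), 3))  # 0.8
```

Only four of fourteen distinct words overlap with the paraphrase, versus eight of ten with the distractor, which is exactly the gap that word-overlap measures cannot bridge.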
In this thesis, we propose a method that first trains a continuous bag-of-words (CBoW) model on data from ASUS's Republic of Gamers (ROG) forum and then uses it to compute the similarity between a new question and the archived Q&As. Unlike most other studies, we treat the title and the full description of each archived question as two separate features and compute the new question's similarity to each independently. In addition, we factor users' reputation points into our ranking model. Our experiments on the ROG forum dataset show that the CBoW model with reputation features outperforms the other methods we compared.
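The abstract does not spell out the scoring function, so the following is only a minimal Python sketch under stated assumptions: each text is represented by the average of its word vectors, similarity is cosine similarity, and the final score is a weighted sum of title similarity, description similarity, and a log-scaled reputation term. The toy vectors in `EMB`, the `Archived` record, the weights `w_title`, `w_desc`, `w_rep`, and the reputation scaling are all hypothetical. Real vectors would come from a CBoW model trained on the forum corpus (for example, gensim's `Word2Vec` trains CBoW when `sg=0`, its default).

```python
import math
import re
from dataclasses import dataclass

# Hypothetical 4-d toy vectors standing in for embeddings from a CBoW model
# trained on forum posts; synonyms ("update"/"flash", "bios"/"firmware",
# "laptop"/"notebook") were placed close together by hand.
EMB: dict[str, list[float]] = {
    "update":   [1.0, 0.0, 0.0, 0.0],
    "flash":    [0.9, 0.1, 0.0, 0.0],
    "bios":     [0.0, 1.0, 0.0, 0.0],
    "firmware": [0.1, 0.9, 0.0, 0.0],
    "laptop":   [0.0, 0.0, 1.0, 0.0],
    "notebook": [0.0, 0.1, 0.9, 0.0],
    "drivers":  [0.0, 0.0, 0.0, 1.0],
}
DIM = 4


@dataclass
class Archived:
    """Hypothetical archived question record."""
    title: str
    description: str
    reputation: int  # reputation points of the associated user (assumption)


def embed(text: str) -> list[float]:
    """Average the vectors of in-vocabulary words; zero vector if none match."""
    vecs = [EMB[w] for w in re.findall(r"[a-z0-9]+", text.lower()) if w in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank(query: str, archive: list[Archived], w_title: float = 0.5,
         w_desc: float = 0.3, w_rep: float = 0.2) -> list[tuple[float, Archived]]:
    """Rank archived questions by a weighted sum of title similarity,
    description similarity, and log-scaled reputation (weights are assumptions)."""
    q = embed(query)
    max_rep = max((a.reputation for a in archive), default=0)
    denom = math.log1p(max_rep) or 1.0  # avoid dividing by zero when all reps are 0
    scored = []
    for a in archive:
        score = (w_title * cosine(q, embed(a.title))
                 + w_desc * cosine(q, embed(a.description))
                 + w_rep * math.log1p(a.reputation) / denom)
        scored.append((score, a))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


query = "How do I update the BIOS on my laptop?"
archive = [
    Archived("How do I update the drivers on my laptop?",
             "My laptop drivers are out of date.", 5000),
    Archived("How can I flash the firmware of my notebook?",
             "The firmware update on my notebook keeps failing.", 50),
]
for score, item in rank(query, archive):
    print(f"{score:.3f}  {item.title}")
# 0.890  How can I flash the firmware of my notebook?
# 0.656  How do I update the drivers on my laptop?
```

Keeping title and description as separate terms lets their weights be tuned independently, and the logarithm keeps a handful of very high-reputation users from dominating the ranking. With these toy vectors the firmware question ranks first even though it shares few words with the query and its asker has far less reputation.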