| Graduate student: | 陳沛伃 Pei-Yu Chen |
|---|---|
| Thesis title: | 基於歷史資訊向量與主題專精程度向量應用於尋找社群問答網站中專家 Finding experts in Community Question Answering websites using History Post Embedding and Topic Expertise Model features |
| Advisor: | 蔡宗翰 Tzong-Han Tsai |
| Oral defense committee: | |
| Degree: | Master |
| Department: | 資訊電機學院 - 資訊工程學系 Department of Computer Science & Information Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 (ROC calendar) |
| Language: | English |
| Number of pages: | 54 |
| Keywords (Chinese): | 詞嵌入, 社群問答網站, TEM, 佩奇排名, 主題模型, 專家 |
| Keywords (English): | Word2Vec, CQA, TEM, PageRank, Topic Model, Experts |
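As technology advances rapidly, we must keep improving ourselves and acquiring new knowledge to avoid being left behind. This has driven the rise of Community Question Answering (CQA) websites such as Stack Overflow, Yahoo Answers, Quora, and Zhihu (知乎), where users ask and answer questions and which serve as platforms for exchange and learning.
Although CQA websites are very convenient for users, the sheer number of questions means that most go unanswered, and getting a correct answer in time undeniably takes luck and waiting. We believe that if experts can be correctly identified on a CQA website, the right questions can be recommended to experts capable of answering them, which improves user interaction and the efficiency of problem solving.
This study first uses an unsupervised method, the TEM (Topic Expertise Model) built by Yang, Liu, et al. (2013), to extract feature vectors describing each user's expertise under each topic. It then uses History Post Embedding, which exploits the properties of word embeddings, to extract semantic feature vectors, and uses the similarity between a question and candidate answerers as the basis for recommending experts. We target Stack Overflow, one of the world's largest Q&A websites for programming, and obtain good accuracy; we expect the results to be applicable to other CQA websites.
The contribution of this thesis is to combine the TEM model with word-embedding-based historical information so that, when the social network structure is incomplete, the right questions are still matched effectively to experts capable of answering them, addressing the problem of low participation in the community.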
As technology changes ever faster, people must keep learning to avoid being left behind. This need has led to the rise of community question answering (CQA) websites such as Stack Overflow, Yahoo Answers, Quora, and Zhihu (知乎). On these platforms users can ask and answer questions and exchange and discuss ideas with each other.
Although CQA websites bring great convenience to users, there is still room for improvement. Because of the large number of questions, most questions receive no response or receive inappropriate answers, and getting a correct answer in time depends on luck and waiting. We therefore believe that if experts can be identified precisely in CQA websites, routing the right questions to those experts can raise the participation rate and make problem solving more efficient.
In this study, we first use TEM (Topic Expertise Model), an unsupervised model published by Yang, Liu, et al. (2013), to capture the degree of expertise of questions and answerers under different topics. We then use History Post Embedding, which is proposed in this thesis and built on word embedding techniques, to extract semantic meaning and improve the understanding of question sets. Finally, we combine the topical expertise vector with the History Post Embedding and apply a recommendation formula to rank experts. We target Stack Overflow, one of the largest CQA websites for computer programming in the world, and obtain good results. We expect the approach to be applicable to other CQA websites.
The main contribution of this thesis is the combination of the TEM model with a distributed representation of users' historical information, which can address the problem of low participation in CQA websites when the social network structure is incomplete.
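The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the recommendation formula, so this sketch assumes that a post is embedded as the mean of its word vectors, that a user's History Post Embedding is the mean over that user's past posts, that the result is concatenated with a topic expertise vector, and that candidates are ranked by cosine similarity to the question. The word vectors and topic vectors below are hand-made toy values standing in for Word2Vec and TEM output, and all function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def post_embedding(tokens, word_vectors):
    """Assumed post representation: mean of the vectors of known words."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    return mean_vector(known) if known else [0.0] * dim

def user_feature(history_posts, topic_expertise, word_vectors):
    """Assumed combination: history embedding concatenated with topic vector."""
    history = mean_vector([post_embedding(p, word_vectors) for p in history_posts])
    return history + list(topic_expertise)

def rank_experts(question_tokens, question_topics, users, word_vectors):
    """Rank user names by cosine similarity to the question feature vector."""
    q = post_embedding(question_tokens, word_vectors) + list(question_topics)
    scores = {
        name: cosine(q, user_feature(posts, topics, word_vectors))
        for name, (posts, topics) in users.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy stand-ins for Word2Vec output (2-dimensional for readability).
word_vectors = {
    "python": [1.0, 0.0],
    "list": [0.8, 0.2],
    "sql": [0.0, 1.0],
    "join": [0.1, 0.9],
}

# Each user: (tokenized past posts, toy stand-in for a TEM topic expertise vector).
users = {
    "alice": ([["python", "list"], ["python"]], [0.9, 0.1]),
    "bob": ([["sql", "join"]], [0.1, 0.9]),
}

ranking = rank_experts(["python", "list"], [0.8, 0.2], users, word_vectors)
print(ranking)
```

Concatenation followed by cosine similarity is only one way to combine the two feature types; a weighted sum of separate similarities would be an equally plausible reading of the abstract.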
1. Riahi, F., et al. Finding expert users in community question answering. in Proceedings of the 21st International Conference on World Wide Web. 2012. ACM.
2. Guo, J., et al. Tapping on the potential of q&a community by recommending answer providers. in Proceedings of the 17th ACM conference on Information and knowledge management. 2008. ACM.
3. "Stackoverflow.com Site Info". Alexa Internet. Retrieved 2017-08-14.
4. Spolsky, J., "Stack Overflow Launches". Joel on Software. (2008-09-15).
5. Duan, J., J. Zeng, and B. Luo. Identification of opinion leaders based on user clustering and sentiment analysis. in Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT)-Volume 01. 2014. IEEE Computer Society.
6. Weng, J., et al. TwitterRank: finding topic-sensitive influential twitterers. in Proceedings of the third ACM international conference on Web search and data mining. 2010. ACM.
7. Agarwal, N., et al. Identifying the influential bloggers in a community. in Proceedings of the 2008 international conference on web search and data mining. 2008. ACM.
8. Yu, X., X. Wei, and X. Lin, Algorithms of BBS Opinion Leader Mining Based on Sentiment Analysis. WISM, 2010. 10: p. 360-369.
9. Katz, E. and P.F. Lazarsfeld, Personal Influence: The part played by people in the flow of mass communications. 1966: Transaction Publishers.
10. Wang, W. and W.N. Street, Modeling influence diffusion to uncover influence centrality and community structure in social networks. Social Network Analysis and Mining, 2015. 5(1): p. 15.
11. Bonacich, P., Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology, 1972. 2(1): p. 113-120.
12. Katz, L., A new status index derived from sociometric analysis. Psychometrika, 1953. 18(1): p. 39-43.
13. Page, L., et al., The PageRank citation ranking: Bringing order to the web. 1999, Stanford InfoLab.
14. Zhu, H., et al., Ranking user authority with relevant knowledge categories for expert finding. World Wide Web, 2014. 17(5): p. 1081-1107.
15. Zhou, G., et al. Topic-sensitive probabilistic model for expert finding in question answer communities. in Proceedings of the 21st ACM international conference on Information and knowledge management. 2012. ACM.
16. Liu, X., W.B. Croft, and M. Koll. Finding experts in community-based question-answering services. in Proceedings of the 14th ACM international conference on Information and knowledge management. 2005. ACM.
17. Miller, D.R., T. Leek, and R.M. Schwartz. A hidden Markov model information retrieval system. in Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. 1999. ACM.
18. Lavrenko, V. and W.B. Croft. Relevance based language models. in Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. 2001. ACM.
19. Xu, J. and W.B. Croft. Cluster-based language models for distributed retrieval. in Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. 1999. ACM.
20. Ponte, J.M. and W.B. Croft. A language modeling approach to information retrieval. in Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. 1998. ACM.
21. Qu, M., et al. Probabilistic question recommendation for question answering communities. in Proceedings of the 18th international conference on World wide web. 2009. ACM.
22. Yang, L., et al. CQArank: jointly model topics and expertise in community question answering. in Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2013. ACM.
23. Blei, D.M., A.Y. Ng, and M.I. Jordan, Latent Dirichlet allocation. Journal of Machine Learning Research, 2003. 3(Jan): p. 993-1022.
24. Mikolov, T., et al., Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
25. Rong, X., word2vec parameter learning explained. arXiv preprint arXiv:1411.2738, 2014.
26. Adamic, L.A., et al. Knowledge sharing and Yahoo Answers: everyone knows something. in Proceedings of the 17th international conference on World Wide Web. 2008. ACM.