Automatically Constructing Ontology for Detecting User's Interests and Document Recommendation Based on Hadoop Environment


Graduate student:	趙濬 Chun Chao
Thesis title:	在Hadoop 環境下以自建本體進行使用者興趣偵測與文件推薦 (Automatically Constructing Ontology for Detecting User's Interests and Document Recommendation Based on Hadoop Environment)
Advisor:	林熙禎 Shi-Jen Lin
Oral defense committee:
Degree:	Master
Department:	College of Management - Department of Information Management
Year of publication:	2016
Academic year of graduation:	104 (ROC calendar)
Language:	Chinese
Number of pages:	57
Keywords:	Recommendation System, Chinese Recommendation System, Ontology, Distributed System, Hadoop, User Profile

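A recommendation system is a common type of information filtering system and a very
important technology for businesses and individuals alike. To make customized
recommendations, such systems often use a user profile to record the user's past behavior,
and recommendation systems that build the user profile on an ontology can make more
accurate and more diverse recommendations.
This study has two main parts: automatic ontology construction, and user profile
construction and recommendation. The system first builds ontologies automatically from
English documents and from Chinese documents, separately. User behavior is then mapped
onto the ontology to build a user profile, which is further managed and used for
recommendation. In addition, this study introduces agglomerative hierarchical clustering to
reduce the over-clustering observed when earlier studies constructed ontologies. To cope
with future growth in data volume, the system runs in a Hadoop distributed environment,
which improves its efficiency and future scalability.
For the experiments, this study uses English book descriptions from the Amazon online
bookstore and Chinese book descriptions from the Books.com.tw (博客來) online bookstore as
datasets, and simulates changes in users' interests across different languages and
conditions to test the recommendation quality of the system. The experimental results show
that this study improves recommendation quality and that the system will be able to
process larger volumes of text data in the future.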
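The step of mapping user behavior onto an ontology to build a user profile can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy concept hierarchy `ONTOLOGY`, the `decay` factor, and the function names `build_profile` and `recommend` are hypothetical and are not taken from the thesis, whose actual weighting scheme is not described in this record.

```python
from collections import defaultdict

# Hypothetical toy ontology: child concept -> parent concept (None marks the root).
ONTOLOGY = {
    "books": None,
    "computing": "books",
    "hadoop": "computing",
    "databases": "computing",
    "cooking": "books",
}

def ancestors(concept):
    """Yield the concept itself, then each ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = ONTOLOGY[concept]

def build_profile(viewed_concepts, decay=0.5):
    """Accumulate interest weights; each step up the hierarchy gets a decayed share."""
    profile = defaultdict(float)
    for concept in viewed_concepts:
        weight = 1.0
        for node in ancestors(concept):
            profile[node] += weight
            weight *= decay  # assumed decay factor, not a value from the thesis
    return dict(profile)

def recommend(profile, candidates, top_n=2):
    """Rank candidate documents by the profile weight of their concept."""
    ranked = sorted(candidates.items(),
                    key=lambda item: profile.get(item[1], 0.0),
                    reverse=True)
    return [doc for doc, _ in ranked[:top_n]]

# A user who viewed two Hadoop documents and one database document.
profile = build_profile(["hadoop", "hadoop", "databases"])
candidates = {"doc_a": "cooking", "doc_b": "hadoop", "doc_c": "databases"}
print(recommend(profile, candidates))
```

In this sketch, interest spreads upward through the hierarchy, so a related concept such as `computing` gains weight even though the user never viewed it directly. That is one plausible way an ontology-based profile could support more diverse recommendations.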

A recommendation system is a common kind of information filtering system and an important
technology for both businesses and individuals. To make customized recommendations, such
systems usually record users' past behavior in a user profile. Recommendation systems that
build the user profile on an ontology can produce recommendations of higher accuracy and
diversity.
This study consists of two main parts: automatic ontology construction, and user profile
construction and recommendation. First, English and Chinese documents are fed into the
system separately to construct ontologies automatically. Next, user behavior is mapped onto
the ontology to construct a user profile, which is then managed and used for
recommendation. Moreover, this study adds agglomerative hierarchical clustering to the
system to address the excessive clustering that occurred when ontologies were constructed
in past studies. To deal with future growth in data, this study improves system efficiency
and future scalability by using a Hadoop distributed environment.
In the experiments, this study uses book descriptions from the Amazon online bookstore
(English) and the Books.com.tw online bookstore (Chinese) as datasets, and simulates changes
in users' interests under different conditions and languages to test the quality of the
system's recommendations. The results show that the system improves recommendation quality
and is capable of handling larger volumes of text data in the future.
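As an illustration of the agglomerative hierarchical clustering mentioned in the abstract, the following is a minimal Python sketch. It assumes bag-of-words vectors, cosine similarity, average linkage, and a similarity `threshold` as the stopping rule. These choices, the function names, and the sample documents are assumptions made for this example and are not the thesis's actual implementation.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def average_link(c1, c2, vectors):
    """Average pairwise similarity between the members of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

def agglomerate(docs, threshold=0.3):
    """Merge the most similar pair of clusters until none reaches the threshold."""
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start with one cluster per document
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = average_link(clusters[x], clusters[y], vectors)
                if sim > best:
                    best, pair = sim, (x, y)
        if best < threshold:  # assumed stopping rule: no pair is similar enough
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

docs = [
    "hadoop cluster storage",
    "hadoop cluster jobs",
    "bread baking recipes",
    "cake baking recipes",
]
print(agglomerate(docs))
```

Raising `threshold` keeps more documents in separate clusters, and lowering it merges more aggressively, so a parameter of this kind is one way the granularity of the resulting concept groups could be tuned.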

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Similarity Computation
2.2 Interest Detection
2.3 Document Concept Clustering
2.4 Text Recommendation Systems
2.5 Distributed Systems
2.6 Chinese Word Segmentation Systems
Chapter 3 System Architecture
3.1 System Architecture
3.2 Ontology Construction
3.3 User Profile Construction and Recommendation
Chapter 4 System Implementation and Demonstration
4.1 Experimental Environment
4.2 Experiment 1: Hadoop System Performance Comparison
4.3 Experiment 2: Agglomerative Hierarchical Clustering Comparison
4.4 Experiment 3: English Ontology Quality Experiment
4.5 Experiment 4: Chinese Ontology Quality Experiment
Chapter 5 Conclusions and Future Work
5.1 Research Contributions
5.2 Future Research Directions
References
                                

