| Graduate student: | 吳登翔 WU, DENG-SIANG |
|---|---|
| Thesis title: | User-Model-Based Concept Drift Prediction (使用者模型為基礎的概念飄移預測) |
| Advisor: | 林熙禎 |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Information Management |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 |
| Language: | Chinese |
| Pages: | 68 |
| Keywords (Chinese): | 概念飄移, 遺忘因子, 參與中間度分群, 主題關係 |
| Keywords (English): | Concept Drift, Forgetting factor, Betweenness centrality, Topic relationship |
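Abstract (translated from the Chinese)
The information environment users face today far exceeds that of the past in both the volume of information and the rate at which it grows, so data stream research, which accounts for continually growing information, has become a focus of the information retrieval field. Concept drift refers to errors in data filtering caused by data categories in a stream changing over time or by changes in a user's reading interests. This study takes into account the finer-grained interests of system users: in the document relevance judgment stage, it uses the co-occurrence relationships among the topics in documents the user has read as the basis for judging relevance. To meet the computation speed required of a data stream retrieval system, it proposes an NGD-based similarity tolerance method that lets terms carrying similar information stand in for one another, reducing the number of terms and thereby lowering system execution time. The study further divides user interests into four categories and designs a dynamic forgetting factor for each according to its needs for retaining and discarding data. Finally, to address the drop in performance during the period between the occurrence of concept drift and its detection by the system, the study predicts likely concept drift from users' document-viewing behavior, reducing the impact on system performance when concept drift occurs.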
Abstract
Users today face far more data, arriving far faster, than in the past, so data stream research has become a trend in information retrieval. Concept drift refers to data filtering errors that arise when the data categories in a stream change over time or when a user's interests change. This study takes users' finer-grained interests into account: it identifies which topics the documents a user has read belong to and judges relevance from the co-occurrence between topics. To meet the system's demand for calculation speed, we propose an NGD similarity tolerance method that reduces the number of terms and thereby lowers system execution time. We also divide users' interests into four categories and design a forgetting factor for each, keeping or filtering data accordingly to limit the loss of effectiveness caused by concept drift. Finally, the study predicts concept drift from users' reading behavior to reduce its effect on the system when it occurs.
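
The abstract does not spell out how the NGD similarity tolerance is applied, so the following is only a minimal sketch of one possible reading. It computes the Normalized Google Distance using the standard formula from Cilibrasi and Vitanyi (2007), then greedily maps each term to an earlier representative term whose distance falls within a tolerance. The function names, the greedy merging scheme, the tolerance value, and all hit counts are assumptions made for illustration and are not taken from the thesis.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      total number of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never co-occur are treated as infinitely far apart.
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def merge_terms(terms, freq, cofreq, n, tolerance):
    """Hypothetical greedy merge: map each term to the first earlier
    representative whose NGD is within the tolerance, else keep it."""
    representatives = []
    mapping = {}
    for term in terms:
        for rep in representatives:
            pair = frozenset((term, rep))
            if ngd(freq[term], freq[rep], cofreq.get(pair, 0), n) <= tolerance:
                mapping[term] = rep
                break
        else:
            # No representative is close enough; the term stands for itself.
            representatives.append(term)
            mapping[term] = term
    return mapping


# Made-up hit counts, for illustration only.
freq = {"car": 1000, "automobile": 800, "banana": 900}
cofreq = {
    frozenset(("car", "automobile")): 700,
    frozenset(("car", "banana")): 10,
}
mapping = merge_terms(["car", "automobile", "banana"], freq, cofreq,
                      n=1_000_000, tolerance=0.3)
print(mapping)
```

With these illustrative counts, "automobile" collapses into "car" while "banana" remains its own term, so downstream relevance computation would handle two terms instead of three. A larger tolerance merges more aggressively and trades precision for speed.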