User-Model-Based Concept Drift Prediction | National Central University Electronic Theses and Dissertations System


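Author:	吳登翔 WU, DENG-SIANG
Thesis title:	User-Model-Based Concept Drift Prediction (使用者模型為基礎的概念飄移預測)
Advisor:	林熙禎
Oral defense committee:
Degree:	Master
Department:	College of Management - Department of Information Management
Year of publication:	2014
Academic year of graduation:	102
Language:	Chinese
Number of pages:	68
Chinese keywords:	概念飄移、遺忘因子、參與中間度分群、主題關係
English keywords:	Concept drift, Forgetting factor, Betweenness centrality, Topic relationship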

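Abstract (translated from Chinese)
Users today face an information environment in which both the volume of information and its rate of growth far exceed those of the past, so data-stream research, which accounts for continuously arriving information, has become an active topic in information retrieval. Concept drift refers to data-filtering errors that arise when the categories of data in a stream change over time or when a user's reading interests change. This study takes the finer-grained aspects of a user's interests into account: in the document relevance judgment stage, it uses the co-occurrence relationships among the topics of the documents the user has read as the basis for judging relevance. To meet the computation-speed requirements of a data-stream retrieval system, it proposes an NGD-based similarity tolerance method that lets terms carrying similar information substitute for one another, reducing the number of terms and thereby the system's execution time. The study further divides user interests into four categories and designs a dynamic forgetting factor according to each category's needs for retaining and discarding data. To address the performance drop in the interval after a concept drift occurs but before the system detects it, the study predicts likely concept drifts from the user's document-viewing behavior, reducing the impact of concept drift on system performance.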

Abstract
Users today face a volume of data, and a rate of data growth, far greater than in the past, so data-stream research has become a trend in information retrieval. Concept drift refers to data-filtering errors caused by data categories changing over time or by changes in a user's interests. This study takes users' fine-grained interests into account: it determines which topics the documents a user has read belong to and judges relevance based on the co-occurrence between topics. To meet the system's demand for computation speed, we propose an NGD similarity tolerance method that reduces the number of terms and thereby the system's execution time. We also divide users' interests into four categories and design a forgetting factor for these categories to retain and filter data, mitigating the loss of effectiveness caused by concept drift. Finally, the study predicts concept drift from users' reading behavior to reduce its effect on the system when it occurs.
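
As a rough illustration of the NGD similarity tolerance idea mentioned in the abstract, the sketch below computes the Normalized Google Distance from hit counts, using the standard definition by Cilibrasi and Vitanyi, and then greedily maps each term to an earlier term whose distance falls within a tolerance. The merging strategy, the function names `ngd` and `merge_similar_terms`, the count dictionaries, and the tolerance value are all assumptions made for illustration; the record does not describe how the thesis actually implements this step.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing each term; fxy: pages containing both;
    n: total number of indexed pages (assumed larger than fx and fy).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # terms that never co-occur are treated as unrelated
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def merge_similar_terms(terms, term_counts, pair_counts, n, tolerance):
    """Hypothetical greedy merge: map each term to the first earlier
    representative whose NGD is within `tolerance`, else keep it as-is."""
    representatives = []
    mapping = {}
    for term in terms:
        for rep in representatives:
            fxy = pair_counts.get(frozenset((term, rep)), 0)
            if ngd(term_counts[term], term_counts[rep], fxy, n) <= tolerance:
                mapping[term] = rep
                break
        else:
            # No close representative found: the term represents itself.
            representatives.append(term)
            mapping[term] = term
    return mapping


# Made-up counts, purely for demonstration.
term_counts = {"car": 1000, "automobile": 800, "banana": 500}
pair_counts = {
    frozenset(("car", "automobile")): 700,
    frozenset(("car", "banana")): 5,
}
mapping = merge_similar_terms(
    ["car", "automobile", "banana"], term_counts, pair_counts,
    n=1_000_000, tolerance=0.3,
)
```

With these made-up counts, "automobile" is folded into "car" while "banana" stays separate, so later stages would handle two terms instead of three. A larger tolerance merges more aggressively and trades precision for speed.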

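Table of Contents

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
2. Literature Review
2-1 Document Preprocessing
2-1-1 Part-of-Speech Filtering and Keyword Merging Based on Part-of-Speech Combinations
2-1-2 Term Length Filtering
2-1-3 Stemming
2-1-4 Filtering by Wikipedia Search Result Count
2-2 Document Features
2-2-1 Term Frequency (TF)
2-2-2 Term Network
2-2-3 Betweenness Centrality Clustering
2-3 User Interests
2-4 Concept Drift
2-5 Normalized Google Distance (NGD)
3. System Architecture
3-1 Research Assumptions
3-2 System Architecture
3-3 Document Preparation
3-3-1 Document Preprocessing
3-3-2 Document Features
3-4 Similarity Tolerance
3-5 User Model
3-5-1 Term Activity Distribution Matrix
3-5-2 Topic Co-occurrence Matrix
3-6 Topic Mapping
3-7 Dynamic Forgetting Factor
3-8 Interest Removal
3-9 Document Filtering
3-10 Concept Drift Prediction
4. Experiments
This chapter describes the experimental environment, the evaluation criteria used, and the dataset.
4-1 Experimental Environment
4-2 Dataset and Evaluation Criteria
4-3 Experimental Design
4-3-1 Threshold Experiment
4-3-2 Experiment on Time Reduction from Similarity Tolerance
4-3-3 Experiment on the User Model's Learning Ability
4-3-4 Dynamic Forgetting Factor Experiment
4-3-5 Experiment on Concept Drift Prediction Effectiveness
5. Conclusions and Future Research Directions
5-1 Conclusions
5-2 Future Research Directions
References
Chinese-Language References
English-Language References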

                                

