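Author: Wun-Yu Lin (林文羽)
Thesis Title: Keyword-Based Multi-Topic Concept Drift Learning (關鍵字為基礎的多主題概念飄移學習)
Advisor: Shi-Jen Lin (林熙禎)
Committee Members:
Degree: Master
Department: College of Management - Department of Information Management
Year of Publication: 2013
Academic Year of Graduation: 101 (ROC calendar)
Language: Chinese
Number of Pages: 95
Keywords: Concept Drift (概念飄移), Information Filtering (資訊過濾), User Modeling (使用者模型)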
    With the rapid growth of the Internet, users can easily obtain large amounts of information from search engines and portal sites. At the same time, they face the problem of information overload, which has motivated research on information filtering. A user's interests, however, are not static; they change with time and place. The phenomenon in which a target concept changes over time is called concept drift. Previous research has mostly addressed concept drift in single-label classification. In practice, a user's information needs span multiple topics, and each topic has its own drift pattern. Documents also often belong to more than one class, so classifying a document only by its main concept may cause users to miss documents they would potentially find relevant. This thesis therefore proposes a user model based on a keyword network. The model filters incoming documents according to the user's preferences across multiple topics, and when one of those preferences drifts, it can detect the change and update itself accordingly.
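    The idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's method: the keyword network is replaced here by a flat weighted keyword set per topic, and drift detection by a simple sliding window of rejections. All names (`TopicProfile`, `filter_document`, `relearn`) and all threshold values are hypothetical and chosen only for illustration.

```python
from collections import deque


class TopicProfile:
    """One topic of interest, held as a weighted keyword set (illustrative assumption)."""

    def __init__(self, name, keywords, window=5, drift_threshold=0.6):
        self.name = name
        self.keywords = dict(keywords)       # keyword -> weight
        self.recent = deque(maxlen=window)   # True = user rejected a document filed under this topic
        self.drift_threshold = drift_threshold

    def score(self, tokens):
        """Share of the topic's total keyword weight that occurs in the document."""
        total = sum(self.keywords.values())
        if total == 0:
            return 0.0
        hit = sum(w for k, w in self.keywords.items() if k in tokens)
        return hit / total

    def feedback(self, liked):
        """Record one piece of user feedback; return True if drift is suspected."""
        self.recent.append(not liked)
        full = len(self.recent) == self.recent.maxlen
        return full and sum(self.recent) / len(self.recent) >= self.drift_threshold

    def relearn(self, new_keywords):
        """Replace the keyword set after drift and clear the feedback window."""
        self.keywords = dict(new_keywords)
        self.recent.clear()


def filter_document(tokens, profiles, alpha=0.3):
    """Return every topic whose score reaches alpha, so one document may match several topics."""
    token_set = set(tokens)
    return [p.name for p in profiles if p.score(token_set) >= alpha]
```

    Because each topic keeps its own feedback window, a run of rejections in one topic triggers relearning for that topic only and leaves the other topics untouched, which mirrors the per-topic drift described above.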

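    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
      1-1 Research Background
      1-2 Research Motivation
      1-3 Scenario Description
      1-4 Problem Definition
      1-5 Research Objectives
      1-6 Thesis Organization
    2. Literature Review
      2-1 User Models
        2-1-1 Vector
        2-1-2 Bag of Words
        2-1-3 Network-Based User Models
        2-1-4 Ontology-Based User Models
      2-2 Document Preprocessing and Feature Selection
        2-2-1 Preprocessing
          2-2-1-1 Part of Speech and Keyword Merging
          2-2-1-2 Term Length
          2-2-1-3 Number of Wikipedia Search Results
        2-2-2 Feature Selection
        2-2-3 Google Similarity Distance
      2-3 Concept Drift
        2-3-1 Definition and Problems of Concept Drift
        2-3-2 Concept Drift Learning Methods
          2-3-2-1 Continuous Learners
          2-3-2-2 Detection-Based Learners
      2-4 Multi-Label Document Classification
        2-4-1 Random Selection and Removal of Multi-Label Data
        2-4-2 Label Powerset
        2-4-3 Binary Relevance
        2-4-4 Example Decomposition
        2-4-5 Summary
      2-5 Complex Network Analysis
        2-5-1 Degree
        2-5-2 K-Core
        2-5-3 Participation-Betweenness Clustering (參與中間度分群)
        2-5-4 Community Structure
    3. System Architecture and Design
      3-1 Research Limitations
      3-2 System Architecture
      3-3 Document Preprocessing
      3-4 Feature Selection
      3-5 Participation-Betweenness Clustering
      3-6 Document Filtering
      3-7 Concept Drift Detection and Handling
    4. Experimental Results and Discussion
      4-1 Experimental Environment
      4-2 Experimental Datasets
      4-3 Evaluation Criteria
      4-4 Experimental Design
        4-4-1 Experiment 1: Differences in Feature Selection
        4-4-2 Experiment 2: Threshold Experiments for the Proposed Method
          4-4-2-1 Participation-Betweenness Clustering Thresholds βsingle and βmulti
          4-4-2-2 Comparison of Four Relevance Methods and Setting of γ and the Relevance Threshold α
        4-4-3 Experiment 3: Evaluation of the Ability to Find Potentially Relevant Documents
        4-4-4 Experiment 4: Evaluation of the User Model's Learning Ability
        4-4-5 Experiment 5: Simulation of Multi-Topic Concept Drift Scenarios
      4-5 System Performance Analysis
        4-5-1 Time Complexity
        4-5-2 Actual Execution Time
    5. Conclusions and Future Research Directions
      5-1 Conclusions
      5-2 Future Research Directions
      5-3 Managerial Implications
    References
      Chinese References
      English References
    Appendix 1
    Appendix 2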

