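| Graduate student: | 林文羽 Wun-Yu Lin |
|---|---|
| Thesis title: | 關鍵字為基礎的多主題概念飄移學習 (Keyword-Based Multi-Topic Concept Drift Learning) |
| Advisor: | 林熙禎 Shi-Jen Lin |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 資訊管理學系 College of Management - Department of Information Management |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 95 |
| Chinese keywords: | 概念飄移、資訊過濾、使用者模型 |
| English keywords: | Concept Drift, Information Filtering, User Modeling |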
With the rapid growth of information on the Internet, users can easily obtain large amounts of information from search engines and portal sites. At the same time, however, they face the problem of information overload, which has given rise to research on information filtering. Users' interests are not static; they change with time and circumstances. This phenomenon, in which the target concept shifts over time, is called concept drift. Previous research on concept drift has mostly focused on single-label classification. In practice, however, users' information needs are diverse and span multiple topics, and each topic has its own drift pattern. Furthermore, documents often belong to more than one class, so classifying a document only by its main concept may cause users to miss potentially relevant documents. This study therefore proposes a user model based on a keyword network. The model filters incoming documents according to the user's preferences across multiple topics, and when the preference for one of those topics drifts, it can detect the change and update itself accordingly.