A Study of Applying the Multi-Store Memory Model to a Forgetting-Factor-Based User Interest Model

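Graduate student:	Ting-Wen Su (蘇鼎文)
Thesis title:	A Study of Applying the Multi-Store Memory Model to a Forgetting-Factor-Based User Interest Model
Advisor:	Shi-Jen Lin (林熙禎)
Oral defense committee:
Degree:	Master
Department:	College of Management - Department of Information Management
Year of publication:	2015
Academic year of graduation:	103
Language:	Chinese
Number of pages:	68
Keywords (translated from Chinese):	multi-store memory model, user profile, forgetting factor, document filtering, graph centrality
Keywords (English):	Atkinson–Shiffrin memory model, User Profile, Forgetting Factor, Document Filter, Graph Centrality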

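As users' reading habits shift from print to digital media and from desktop computers to phones and tablets, they can read anytime and anywhere. This raises the average amount of reading but also creates an environment in which attention is more easily scattered. Faced with these challenges, a filtering system must handle concept drift in user interests and must also stay responsive as the volume of online data grows exponentially.
To address these problems, this study uses different centrality algorithms to identify the core keywords of the topic keyword graphs in the user profile. Using these more representative core keywords in the system pipeline shortens the time needed to build the user profile and even improves the profile's performance in judging documents. For concept drift, the study applies the architecture of the multi-store memory model to the classification of interests in the user profile, dividing the user's interest topics into long-term and short-term interests. The experiments show that the dynamic forgetting factor for short-term interests adapts to changes in interest more quickly, while the static forgetting factor for long-term interests retains more information. Under a simulated web stream, the system achieves a higher F-measure than previous studies and also runs faster.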
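
The core-keyword step described above can be illustrated with a minimal sketch. The abstract does not say which centrality measures the thesis compares, so degree centrality is used here purely as a stand-in. The function names, the sample graph, and the cutoff `k` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected graph given as edge pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n <= 1:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def core_keywords(edges, k):
    """Pick the k highest-scoring keywords; ties are broken alphabetically."""
    scores = degree_centrality(edges)
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return ranked[:k]


# Hypothetical topic keyword graph: an edge links two co-occurring keywords.
sample_edges = [
    ("news", "sports"),
    ("news", "politics"),
    ("news", "economy"),
    ("sports", "football"),
]

print(core_keywords(sample_edges, 2))
```

Working with only the top `k` keywords instead of the whole graph is what would reduce the cost of later steps; how the thesis actually chooses the core size is covered in its experiments, not here.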

As users' reading channels shift from print to digital and from desktop computers to mobile devices, it becomes easier to read anywhere and at any time. This not only increases the average amount of reading but also causes user interests to drift more often. To solve these problems, an information filtering system has to adapt to concept drift in user interests and train fast enough to cope with the explosive growth of streaming documents.
This research uses different centrality algorithms to find the core set of keywords in the user profile's graph. By using these strong keywords instead of all the keywords in the graph, the system builds the user profile faster and even improves its filtering performance. In addition, the research designs the user profile's interests on the basis of the Atkinson-Shiffrin multi-store model: the framework divides user interests into long-term and short-term interests. The short-term interest uses a dynamic forgetting factor to adapt to concept drift in the user profile. In contrast, the long-term interest uses a static forgetting factor to retain information for the system to use. The experiments show that the short-term forgetting factor adapts to concept drift more quickly and that the long-term forgetting factor preserves more information in the interest. Finally, the proposed system achieves a better F-measure and higher efficiency than previous research.
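
The contrast between the two forgetting factors can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the update formulas, so the multiplicative decay, the constant `STATIC_FACTOR`, and the rule in `dynamic_factor` (the factor shrinks the longer a topic goes without relevant feedback) are all hypothetical.

```python
STATIC_FACTOR = 0.9  # assumed constant factor for long-term interests


def dynamic_factor(idle_steps, base=0.9, min_factor=0.1):
    """Hypothetical dynamic factor: decays faster the longer a topic stays idle."""
    return max(min_factor, base ** (idle_steps + 1))


def simulate(feedback, dynamic):
    """Replay a stream of relevance feedback and return the final interest weight.

    feedback: iterable of booleans, True when a document matched the topic.
    dynamic:  True for the short-term (dynamic) factor, False for the static one.
    """
    weight, idle = 0.0, 0
    for relevant in feedback:
        idle = 0 if relevant else idle + 1
        factor = dynamic_factor(idle) if dynamic else STATIC_FACTOR
        # Decay the old weight, then reinforce it if the document was relevant.
        weight = weight * factor + (1.0 if relevant else 0.0)
    return weight


# A topic that is relevant three times and then ignored for five steps.
stream = [True] * 3 + [False] * 5
short_term = simulate(stream, dynamic=True)
long_term = simulate(stream, dynamic=False)
print(short_term < long_term)
```

In this sketch the short-term weight falls off quickly once the topic stops appearing, while the long-term weight keeps more of its history, which mirrors the qualitative behavior the abstract reports.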

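Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
2. Related Work
2-1 Preprocessing Framework
2-2 Keyword Graph Representations of Documents
2-2-1 Term Frequency
2-2-2 NGD Distance
2-2-3 Keyword Topics
2-3 Centrality Algorithms
2-4 Concept Drift
2-4-1 Sliding Window
2-4-2 Forgetting Factor
2-4-3 Forgetting Factors Combined with User Interests
3. System Architecture
3-1 System Flow
3-2 Preprocessing Flow
3-3 User Profile
3-3-1 Topic Keyword Graph
3-3-2 Core Identification in Keyword Graphs
3-4 Topic Mapping
3-5 Document Filtering
3-6 Long-Term and Short-Term Forgetting Factors
3-7 Life Cycle of Topic Interests
3-8 Short-Term Interest Removal
4. Experiments
4-1 Experimental Settings
4-2 Datasets
4-3 Threshold Experiments
4-3-1 Topic Threshold Ratio
4-3-2 Interest Removal Ratio Experiment
4-3-3 Core Size and Algorithm Experiment
4-4 Concept Drift Experiment
4-5 User Profile Learning Performance Experiment
5. Conclusions and Future Research Directions
5-1 Conclusions
5-2 Future Research
References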
