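| Graduate student: | 陳信夫 Hsin-fu Chen |
|---|---|
| Thesis title: | 基於字詞關係動態建立階層分群 Dynamic Hierarchical Clustering Based on Taxonomy |
| Advisor: | 林熙禎 Shi-jen Lin |
| Oral examination committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Information Management |
| Graduation academic year: | 99 |
| Language: | Chinese |
| Number of pages: | 58 |
| Chinese keywords: | Hierarchical clustering algorithm, dynamic clustering algorithm, taxonomy, document clustering |
| Foreign-language keywords: | Dynamic clustering algorithm, Hierarchical clustering, Taxonomy |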
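With the arrival of the information-explosion era, more and more users search the Internet for relevant material to read. The goal of this research is to apply hierarchical clustering to a large collection of documents and to build, from term relationships, a taxonomy with parent-child containment relations that serves as the labels of the hierarchical clusters. In practice, this lets users quickly see which topics a document collection covers and promptly select the documents on the topic they need. The system architecture proposed in this research effectively improves on five issues in hierarchical clustering research: high-dimensional vectors, dynamic feature selection and document clustering, document processing order, clustering of documents that span multiple domains, and the relationships among cluster labels.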
With the popularity of the Internet, the World Wide Web holds a vast amount of information, and finding relevant information among a large number of texts has become a challenge for users. Hierarchical clustering is one way to address this problem, because it lets users browse topics step by step and find the documents most relevant to their interests. However, several challenges in hierarchical clustering still need to be addressed: the high dimensionality of the data, dynamic data sets, sensitivity to input order, documents that cover several concepts, and the relationship between clusters and tags.
In this paper, we propose a dynamic hierarchical clustering approach based on taxonomy to address these challenges. The experimental results show that our method is not only suitable for building hierarchical clusters over dynamic data sets, but also offers a structure that is easier to browse than those produced by the traditional algorithms BKM and UPGMA. In addition, the clusters are labeled with meaningful tags linked by containment relationships, which lets users quickly grasp the overall concept of each cluster.
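For readers unfamiliar with the UPGMA baseline named in the abstract, the following is a minimal sketch of average-linkage (UPGMA) agglomerative clustering. It is not the method proposed in the thesis. The cosine distance, the function names `cosine_distance`, `average_distance`, and `upgma`, and the toy term-weight vectors are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def average_distance(vectors, a, b):
    """Mean pairwise distance between the members of clusters a and b."""
    total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def upgma(vectors):
    """Average-linkage agglomerative clustering.

    Each cluster is a tuple of document indices. Returns the merge history
    as a list of (cluster_a, cluster_b, distance) tuples, closest pair first.
    """
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(vectors, pair[0], pair[1]),
        )
        merges.append((a, b, average_distance(vectors, a, b)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy vectors over four made-up terms (cat, dog, stock, bond).
docs = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.1, 1.0, 0.9],
]
history = upgma(docs)
for left, right, dist in history:
    print(left, right, round(dist, 4))
```

The merge history forms a binary tree over the documents. A static algorithm of this kind has to be rerun when new documents arrive and produces unlabeled clusters, which relates to the dynamic-data and cluster-label issues the abstract lists.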