| Graduate student: | 陳慶治 Qingzhi Chen |
|---|---|
| Thesis title: | Improvement of Kernel Dependency Estimation and Case Study on Skewed Data |
| Advisor: | 張嘉惠 Chia-Hui Chang |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 (ROC calendar) |
| Language: | English |
| Pages: | 32 |
| Keywords: | classification, kernel dependency estimation |
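Kernel dependency estimation is a learning framework for computing the dependencies between two abstract objects. It has been used in many applications, but some of its properties have not been thoroughly studied. This paper discusses two problems that commonly arise when kernel dependency estimation is used in practice. The first is that its real-valued output for each label still differs from the binary values the final task requires. The usual solution is a particular thresholding strategy; this paper proposes an alternative that adds a second-level classifier through a special form of stacked generalization. The second problem concerns the degradation in performance when kernel dependency estimation is applied to imbalanced datasets. Our experimental results show that kernel dependency estimation is not directly applicable to imbalanced datasets, and we propose a remedy for handling them.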
Kernel dependency estimation (KDE) is a learning framework for finding dependencies between two general classes of objects. Although KDE has been applied successfully in many kinds of applications, its properties have not been fully studied. In this paper we discuss two practical issues. The first concerns its real-valued output for each label, which differs from the ultimate target: a binary value under the one-of-k coding scheme. There is therefore usually a gap between the real values predicted by KDE and the ground-truth binary values. A common practice for reducing this gap is to use thresholding strategies. In this paper we provide an alternative approach that combines KDE with a second-level classifier through a special degenerate form of stacked generalization. The second issue is how performance degrades when KDE is applied to classification with skewed data. Our experiments show that KDE is not an appropriate approach for skewed data, and we then provide a remedy for handling it.
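The gap between KDE's real-valued label scores and binary one-of-k targets can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the thesis's implementation. The assumptions are:

- an RBF kernel on the inputs;
- a linear kernel on one-of-k output codes, with all output principal components kept;
- hypothetical settings `lam=0.1` and `gamma=1.0`.

Under these assumptions, the KDE regression step amounts to kernel ridge regression onto the mean-centred codes. The pre-image step reduces to taking the argmax. The abstract does not name the second-level classifier or the exact degenerate form of stacking. The function names and the three-fold out-of-fold scheme with a nearest-centroid second-level classifier are therefore hypothetical choices made only for illustration.

```python
import math

def rbf(a, b, gamma=1.0):
    # Gaussian (RBF) kernel on input vectors
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def solve(A, B):
    # Solve A X = B by Gauss-Jordan elimination with partial pivoting
    n = len(A)
    M = [ra[:] + rb[:] for ra, rb in zip(A, B)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_kde(X, y, k, lam=0.1, gamma=1.0):
    # Assumption: linear output kernel on one-of-k codes, all output principal
    # components kept, so KDE reduces to kernel ridge regression onto centred codes
    codes = [[1.0 if c == label else 0.0 for c in range(k)] for label in y]
    mean = [sum(col) / len(codes) for col in zip(*codes)]
    T = [[v - m for v, m in zip(row, mean)] for row in codes]
    K = [[rbf(a, b, gamma) + (lam if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    return X, solve(K, T), mean, gamma

def kde_scores(model, x):
    # Real-valued score per label: mean code + sum_i k(x, x_i) * alpha_i
    Xtr, alpha, mean, gamma = model
    kx = [rbf(x, xi, gamma) for xi in Xtr]
    return [m + sum(kv * a[c] for kv, a in zip(kx, alpha)) for c, m in enumerate(mean)]

def decode_argmax(scores):
    # Pre-image over one-hot codes: the nearest code is the one at the largest score
    return max(range(len(scores)), key=lambda c: scores[c])

def fit_stacked(X, y, k, folds=3, **kw):
    # Level 0: out-of-fold KDE scores, so level 1 never sees in-sample scores
    meta = [None] * len(X)
    for f in range(folds):
        tr = [i for i in range(len(X)) if i % folds != f]
        m = fit_kde([X[i] for i in tr], [y[i] for i in tr], k, **kw)
        for i in range(len(X)):
            if i % folds == f:
                meta[i] = kde_scores(m, X[i])
    # Level 1 (hypothetical choice): nearest centroid in score space
    centroids = []
    for c in range(k):
        rows = [s for s, lab in zip(meta, y) if lab == c]
        centroids.append([sum(v) / len(rows) for v in zip(*rows)])
    return fit_kde(X, y, k, **kw), centroids

def predict_stacked(model, x):
    base, centroids = model
    s = kde_scores(base, x)
    dist = [sum((a - b) ** 2 for a, b in zip(s, cen)) for cen in centroids]
    return min(range(len(centroids)), key=lambda c: dist[c])

# Toy data: three well-separated 2-D clusters, one per class
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
     [3.0, 3.0], [3.2, 2.9], [2.9, 3.1],
     [0.0, 3.0], [0.2, 3.2], [0.1, 2.8]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
kde = fit_kde(X, y, k=3)
stacked = fit_stacked(X, y, k=3)
scores = kde_scores(kde, [0.1, 0.1])
print([round(v, 3) for v in scores])  # real-valued scores, not a binary code
print(decode_argmax(scores), predict_stacked(stacked, [0.1, 3.0]))
```

Under these assumptions the score vector always sums to one, because the centred targets have zero row sums. Its entries are not confined to {0, 1}, or even to [0, 1]. That is the gap a thresholding rule or a second-level classifier has to close.

Training the second-level classifier on out-of-fold scores follows the usual stacked-generalization recipe. The level-1 model should see scores of the kind the level-0 model produces on unseen inputs, not its overfit in-sample scores.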