

Graduate student: 范文翔 (Wen-Hsiang Fan)
Thesis title: A new method for estimating the number of clusters (一個估計資料群數的新方法)
Advisor: 銀慶剛 (Ching-Kang Ing)
Oral defense committee:
Degree: Master
Department: College of Science, Graduate Institute of Statistics
Academic year of graduation: 96 (ROC calendar, i.e. 2007-08)
Language: Chinese
Number of pages: 32
Keywords: K-means clustering algorithm (K平均值分群演算法), information criterion (訊息準則)
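  • Estimating the number of clusters in a data set is an important problem in cluster analysis. In this thesis, we try using the Bayesian information criterion (BIC), the most widely used criterion in model selection, as the rule for choosing the number of clusters. However, when the data are one-dimensional, we find that BIC overestimates the true number of clusters, and even after trying a variety of penalty terms we did not find an effective consistent information criterion. This thesis therefore proposes a new method for estimating the number of clusters and demonstrates its accuracy through simulation.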


    A major problem in cluster analysis is determining the number of clusters. In this thesis, we use the Bayesian information criterion (BIC), a criterion widely used in model selection, to estimate the number of clusters. However, we find that BIC overestimates the true number of clusters in the one-dimensional case, and we were unable to find a consistent information criterion for this estimation problem. We therefore propose a new method for estimating the number of clusters and demonstrate its accuracy in a simulation study.
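    To make the BIC-based selection rule described in the abstracts concrete, here is a minimal sketch for one-dimensional data. It is not the thesis's code, and the record does not state which likelihood or penalty the author used. This sketch assumes a Gaussian model with a common variance estimated as W_k/n (W_k being the within-cluster sum of squares from K-means), ignores mixing proportions, and counts k + 1 free parameters (k means plus one variance). The function names `kmeans_1d`, `within_ss`, `bic`, and `choose_k` are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def kmeans_1d(x: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's K-means on 1-D data, started at evenly spaced sample quantiles."""
    if not 1 <= k <= len(x):
        raise ValueError("k must satisfy 1 <= k <= len(x)")
    xs = sorted(x)
    n = len(xs)
    centers = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * len(x)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: (xi - centers[j]) ** 2) for xi in x]
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [xi for xi, lab in zip(x, new_labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return centers, labels


def within_ss(x: List[float], centers: List[float], labels: List[int]) -> float:
    """Within-cluster sum of squares W_k."""
    return sum((xi - centers[lab]) ** 2 for xi, lab in zip(x, labels))


def bic(x: List[float], k: int) -> float:
    """Assumed BIC form: n*log(W_k/n) + (k + 1)*log(n), additive constants dropped."""
    n = len(x)
    centers, labels = kmeans_1d(x, k)
    w = max(within_ss(x, centers, labels), 1e-12)  # guard against log(0)
    return n * math.log(w / n) + (k + 1) * math.log(n)


def choose_k(x: List[float], k_max: int) -> Tuple[int, Dict[int, float]]:
    """Return the k in 1..k_max with the smallest BIC, plus all scores."""
    scores = {k: bic(x, k) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical simulation: three 1-D Gaussian clusters centred at 0, 10 and 20.
    data = [rng.gauss(mu, 1.0) for mu in (0.0, 10.0, 20.0) for _ in range(50)]
    k_hat, scores = choose_k(data, k_max=8)
    print("selected k:", k_hat)
    for k, s in scores.items():
        print(k, round(s, 2))
```

    Printing the full score table for simulated data with a known number of clusters, rather than only the minimiser, is a quick way to see whether a given penalty tends to over- or under-estimate k; swapping `(k + 1) * math.log(n)` for another penalty term is the natural place to experiment.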

    1. Introduction
       1.1 Research background
       1.2 Motivation
    2. Literature review
       2.1 The gap statistic
       2.2 Calinski-Harabasz index
       2.3 Krzanowski-Lai index
       2.4 Hartigan's statistic
    3. Consistent information criteria in cluster analysis
       3.1 Overestimation of the number of clusters
       3.2 Underestimation of the number of clusters
       3.3 Simulation results for estimating the number of clusters with consistent information criteria
    4. A new method for estimating the number of clusters
       4.1 The minimum variance method (最小變量法)
       4.2 Simulation study
    5. Conclusions and future directions
    References

    [1] Caliński, T. and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics 3, 1-27.
    [2] Hartigan, J. A. (1975). Clustering Algorithms. New York: Wiley.
    [3] Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley.
    [4] Krzanowski, W. J. and Lai, Y. T. (1988). A criterion for determining the number of clusters in a data set. Biometrics 44, 23-34.
    [5] Milligan, G. W. and Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159-179.
    [6] Sugar, C. A. and James, G. M. (2003). Finding the number of clusters in a data set: An information-theoretic approach. Journal of the American Statistical Association 98, 750-763.
    [7] Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society, Series B 63, 411-423.
