
Author: Yung-Peng Hsu (徐永棚)
Title: Two probabilistic clustering models based on bivariate and multivariate beta distributions (基於雙變量及多變量貝他分布的兩個新型機率分群模型)
Advisor: Hung-Hsuan Chen (陳弘軒)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Publication year: 2022
Graduation academic year: 110
Language: Chinese
Pages: 52
Keywords: Mixture Models, Bivariate and Multivariate Beta Distribution, Clustering, Expectation-Maximization Algorithm
Abstract:

    This thesis presents two probabilistic model-based clustering methods: the Multivariate Beta Mixture Model (MBMM) and the Flexible Bivariate Beta Mixture Model (FBBMM). The two models differ in the number of input variates (multivariate versus bivariate) and in the definition of the underlying beta distribution. We estimate model parameters with the Expectation-Maximization (EM) algorithm, using maximum likelihood estimation (MLE) and the sequential least squares programming (SLSQP) optimizer. We conduct experiments on synthetic and real-world datasets to confirm the effectiveness of MBMM and FBBMM.
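    The estimation strategy named in the abstract (EM, with the weighted likelihood maximized numerically by SLSQP in the M-step) can be sketched on a simplified stand-in model. The sketch below fits a mixture of independent beta marginals rather than the thesis's multivariate or bivariate beta densities, so it illustrates only the optimization loop, not MBMM or FBBMM themselves; `em_beta_mixture` and its biased initialization are hypothetical names for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist

def em_beta_mixture(X, k=2, n_iter=20, init_params=None, seed=0):
    """EM for a mixture of independent beta components (illustrative sketch).

    X: (n, d) array with entries in (0, 1). Each component j packs its
    per-dimension shape parameters as [a_1..a_d, b_1..b_d]; the M-step
    maximizes the responsibility-weighted log-likelihood with SLSQP,
    mirroring the EM + MLE + SLSQP strategy described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)  # mixing weights
    params = (np.asarray(init_params, dtype=float) if init_params is not None
              else rng.uniform(0.5, 2.0, size=(k, 2 * d)))

    def log_pdf(x, theta):
        # joint log-density of d independent beta marginals
        return beta_dist.logpdf(x, theta[:d], theta[d:]).sum(axis=1)

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        log_r = np.stack([np.log(pi[j]) + log_pdf(X, params[j])
                          for j in range(k)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exp
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: closed-form update for weights; SLSQP for shape parameters
        pi = r.mean(axis=0)
        for j in range(k):
            res = minimize(lambda th, w=r[:, j]: -np.sum(w * log_pdf(X, th)),
                           params[j], method="SLSQP",
                           bounds=[(1e-3, None)] * (2 * d))
            params[j] = res.x
    return pi, params, r

# usage: two clusters on the unit interval, init biased toward low/high modes
rng = np.random.default_rng(1)
X = np.concatenate([rng.beta(2, 8, size=(100, 1)),
                    rng.beta(8, 2, size=(100, 1))])
init = np.array([[1.0, 3.0], [3.0, 1.0]])
pi, params, r = em_beta_mixture(X, init_params=init)
labels = r.argmax(axis=1)
```

    The bounded SLSQP call keeps the beta shape parameters strictly positive, which is the role the thesis assigns to the constrained optimizer when no closed-form M-step update exists.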

    Table of Contents

    Abstract (Chinese); Abstract (English); Acknowledgements; Table of Contents; List of Figures; List of Tables
    1. Introduction
       1.1 Motivation
       1.2 Research goals
       1.3 Contributions
       1.4 Thesis organization
    2. Related work
    3. Models and methods
       3.1 Multivariate Beta Mixture Model
           3.1.1 Multivariate beta distribution
           3.1.2 MBMM probability density function
           3.1.3 MBMM parameter learning
       3.2 Flexible Bivariate Beta Mixture Model
           3.2.1 Flexible bivariate beta distribution
           3.2.2 Issues with the flexible multivariate beta distribution
           3.2.3 FBBMM probability density function
           3.2.4 FBBMM parameter learning
    4. Experimental results
       4.1 Experimental setup
       4.2 Evaluation metrics
       4.3 Training datasets
           4.3.1 Synthetic datasets
           4.3.2 Real-world datasets
       4.4 Results
           4.4.1 Synthetic datasets
           4.4.2 Wine dataset
           4.4.3 Breast cancer dataset
           4.4.4 MNIST dataset
    5. Conclusion
       5.1 Conclusions
       5.2 Future work
    References
    Appendix A: Experiment code
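    For the evaluation-metrics chapter (§4.2), standard label-permutation-invariant clustering scores such as the adjusted Rand index (ARI) and normalized mutual information (NMI) fit the experimental setup; the snippet below shows how they are computed with scikit-learn, which the thesis's bibliography cites. The toy label vectors are illustrative only.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

y_true = [0, 0, 0, 1, 1, 1]       # ground-truth cluster assignments
y_pred = [1, 1, 0, 0, 0, 0]       # predicted clusters; label names don't matter

# both scores are invariant to relabeling the clusters
ari = adjusted_rand_score(y_true, y_pred)                 # in [-1, 1], 1 = perfect
nmi = normalized_mutual_info_score(y_true, y_pred)        # in [0, 1], 1 = perfect
```

    Because both metrics compare partitions rather than raw labels, a clustering that swaps cluster ids 0 and 1 still scores 1.0 when the grouping itself is perfect.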

