| Graduate Student: | 徐永棚 Yung-Peng Hsu |
|---|---|
| Thesis Title: | 基於雙變量及多變量貝他分布的兩個新型機率分群模型 (Two probabilistic clustering models based on bivariate and multivariate beta distributions) |
| Advisor: | 陳弘軒 Hung-Hsuan Chen |
| Committee Members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Publication Year: | 2022 |
| Academic Year: | 110 |
| Language: | Chinese |
| Pages: | 52 |
| Keywords: | Mixture Models, Bivariate and Multivariate Beta Distributions, Clustering, Expectation-Maximization Algorithm |
This thesis presents two probabilistic model-based clustering methods: the Multivariate Beta Mixture Model (MBMM) and the Flexible Bivariate Beta Mixture Model (FBBMM). The two models differ in the number of input variates (multivariate versus bivariate) and in the definition of the underlying beta distribution. We estimate the model parameters with the Expectation-Maximization (EM) algorithm, using maximum likelihood estimation (MLE) and the sequential least squares programming (SLSQP) optimizer. We conduct experiments on synthetic and real-world datasets to confirm the effectiveness of MBMM and FBBMM.
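The abstract describes the estimation pipeline at a high level: an EM loop whose M-step maximizes a responsibility-weighted likelihood with SLSQP. The sketch below illustrates that pipeline under a simplifying assumption: each component is a product of *independent* univariate beta densities, which is not the correlated multivariate/bivariate beta density used by the thesis's MBMM/FBBMM; `fit_beta_mixture` and all its settings are illustrative, not the thesis's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

def fit_beta_mixture(X, k=2, n_iter=30, seed=0):
    """EM for a mixture of independent-beta components (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)                        # mixing weights
    params = rng.uniform(1.0, 3.0, size=(k, d, 2))  # (alpha, beta) per dimension

    def comp_logpdf(x, p):
        # log-density of one component: sum of independent Beta log-pdfs
        return sum(beta.logpdf(x[:, j], p[j, 0], p[j, 1]) for j in range(x.shape[1]))

    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component for each point
        log_r = np.stack([np.log(pi[c]) + comp_logpdf(X, params[c])
                          for c in range(k)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)   # stabilize before exp
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: closed-form weight update, then SLSQP on each component's
        # responsibility-weighted negative log-likelihood
        pi = r.mean(axis=0)
        for c in range(k):
            def nll(theta, c=c):
                p = theta.reshape(d, 2)
                return -np.sum(r[:, c] * comp_logpdf(X, p))
            res = minimize(nll, params[c].ravel(), method="SLSQP",
                           bounds=[(1e-3, None)] * (2 * d))
            params[c] = res.x.reshape(d, 2)

    return pi, params, r

# Usage: cluster two well-separated groups in (0, 1)^2 by maximum responsibility.
rng = np.random.default_rng(1)
X = np.vstack([rng.beta(2, 8, size=(100, 2)),
               rng.beta(8, 2, size=(100, 2))])
pi, params, r = fit_beta_mixture(X, k=2)
labels = r.argmax(axis=1)
```

Because beta densities are supported on (0, 1), the SLSQP bounds keep every shape parameter positive; the thesis applies the same EM/MLE/SLSQP combination to its bivariate and multivariate beta densities instead of this simplified product form.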