Gamma-EM Clustering on Longitudinal Data | National Central University Theses and Dissertations System


Graduate student:	Yu-Hsu Li (李宥序)
Thesis title:	Gamma-EM clustering on longitudinal data (長期追蹤資料上的 Gamma-EM 分群)
Advisor:	Shao-Hsuan Wang (王紹宣)
Oral defense committee:
Degree:	Master
Department:	College of Science - Graduate Institute of Statistics
Year of publication:	2022
Academic year of graduation:	110 (ROC calendar)
Language:	English
Number of pages:	55
Keywords:	EM algorithm, divergence, longitudinal data, linear mixed effect model, clustering

With advances in modern technology and medicine, many precision instruments can now accurately measure a wide range of biological indicators. In practice, determining whether a drug has a significant effect is an important question in drug research and development. Traditionally, one tests whether the experimental group and the control group differ significantly and then draws conclusions about the drug's efficacy. In some clinical data, however, the grouping underlying the data, on which a judgment of efficacy would rest, is unknown; this thesis uses the primary biliary cirrhosis (PBC) data as an example. Here we use the γ-EM algorithm proposed by Lin and Wang (2021) to perform cluster analysis on a population whose groups are unknown.
γ-EM is an EM algorithm modified through the γ-divergence, which can make the classification robust. In this setting, γ-EM can give a preliminary picture of whether the population contains groups that behave differently.
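The robustness that the γ-divergence brings can be illustrated on a much simpler problem than the one in the thesis. The sketch below is not the γ-EM algorithm of Lin and Wang (2021), whose details this record does not give; it assumes a single normal group and uses the fixed-point iteration commonly associated with γ-divergence estimation, in which each observation is weighted by the model density raised to the power γ. The function names `gamma_normal_fit` and `make_contaminated_sample`, the choice γ = 0.5, and the simulated data are all hypothetical.

```python
import math
import random
import statistics


def gamma_normal_fit(xs, gamma=0.5, max_iter=200, tol=1e-10):
    """Fit a normal model by a gamma-weighted fixed-point iteration.

    Each point gets a weight proportional to f(x; mu, sigma2) ** gamma, so
    points far from the bulk of the data contribute almost nothing.
    With gamma = 0 all weights are equal and the ordinary mean is returned.
    """
    # Robust starting values (an assumption): median and MAD-based variance.
    mu = statistics.median(xs)
    mad = statistics.median([abs(x - mu) for x in xs])
    sigma2 = max((1.4826 * mad) ** 2, 1e-12)

    for _ in range(max_iter):
        # Unnormalised weights; the normalising constant of f cancels below.
        w = [math.exp(-gamma * (x - mu) ** 2 / (2.0 * sigma2)) for x in xs]
        total = sum(w)
        new_mu = sum(wi * x for wi, x in zip(w, xs)) / total
        # The factor (1 + gamma) undoes the shrinkage caused by the weights.
        new_sigma2 = (1.0 + gamma) * sum(
            wi * (x - new_mu) ** 2 for wi, x in zip(w, xs)
        ) / total
        new_sigma2 = max(new_sigma2, 1e-12)

        converged = abs(new_mu - mu) + abs(new_sigma2 - sigma2) < tol
        mu, sigma2 = new_mu, new_sigma2
        if converged:
            break
    return mu, sigma2


def make_contaminated_sample(n_clean=200, n_outliers=20, seed=0):
    """Simulated data: N(0, 1) draws plus a clump of outliers at 10."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(n_clean)]
    return clean + [10.0] * n_outliers


if __name__ == "__main__":
    data = make_contaminated_sample()
    print("ordinary mean:", statistics.fmean(data))
    print("gamma-weighted (mu, sigma2):", gamma_normal_fit(data, gamma=0.5))
```

On such contaminated data the ordinary mean is pulled toward the outliers, while the γ-weighted estimate stays near the centre of the clean observations. A mixture version would apply the same kind of weighting inside each EM step, but how γ-EM does so is left to the thesis itself.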

Contents
Abstract (Chinese)
Abstract
Acknowledgement
Contents
List of Figures
List of Tables
Introduction
1 Longitudinal data
2 Classification and Clustering
3 Clustering of longitudinal data
Background
1 Linear mixed effect model (LMM)
2 Maximum Likelihood Estimation (MLE)
3 Expectation-maximization (EM) algorithm
4 Divergence
5 q-Gaussian Distribution
6 Density-based spatial clustering of applications with noise (DBSCAN)
Methodology
1 LMM in longitudinal clustering data
2 Gamma-EM
Simulation
1 Two groups
1.1 Model setting
1.2 Estimation
1.3 Comparison
2 Three groups
2.1 Model setting
2.2 Estimation
2.3 Comparison
Application
1 Primary Biliary Cirrhosis (PBC) Data
2 Result
Conclusion
Bibliography
                                

Bibliography
[1] Folly Adjogou, Alejandro Murua, and Wolfgang Raffelsberger. Bayesian lasso functional clustering for time-course and longitudinal data.
[2] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
[3] Shinto Eguchi, Osamu Komori, and Shogo Kato. Projective power entropy and maximum Tsallis entropy distributions. Entropy, 13(10):1746–1764, 2011.
[4] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226–231, 1996.
[5] Hironori Fujisawa and Shinto Eguchi. Robust parameter estimation with a small bias against heavy contamination. Journal of Multivariate Analysis, 99(9):2053–2081, 2008.
[6] MC Jones, Nils Lid Hjort, Ian R Harris, and Ayanendranath Basu. A comparison of related density-based minimum divergence estimators. Biometrika, 88(3):865–873, 2001.
[7] Solomon Kullback and Richard A Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.
[8] James MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297. Oakland, CA, USA, 1967.
[9] Wim Meeus, Jurjen Iedema, Marianne Helsen, and Wilma Vollebergh. Patterns of adolescent identity development: Review of literature and longitudinal analysis. Developmental Review, 19(4):419–461, 1999.
[10] Giovanni Saraceno, Abhik Ghosh, Ayanendranath Basu, and Claudio Agostinelli. Robust estimation under linear mixed models: The minimum density power divergence approach. arXiv preprint arXiv:2010.05593, 2020.
[11] Jean D Skinner, Betty Ruth Carruth, Wendy Bounds, and Paula J Ziegler. Children’s food preferences: a longitudinal analysis. Journal of the American Dietetic Association, 102(11):1638–1647, 2002.
[12] Xi Zhang. Longitudinal Data Clustering Via Kernel Mixture Models. PhD thesis, 2021.
