Graduate student: 胡雅萱 (Ya-Hsuan Hu)
Thesis title: Maximum likelihood estimation for double-truncation data under a special exponential family
Advisor: 江村剛志 (Takeshi Emura)
Oral examination committee:
Degree: Master
Department: College of Science - Graduate Institute of Statistics
Year of publication: 2014
Academic year of graduation: 102
Language: English
Number of pages: 80
Keywords: Lindeberg-Feller central limit theorem, Newton-Raphson algorithm, Fixed point iteration
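  • Truncation often occurs in the analysis of human lifetime data, because observation is often restricted to a specific time interval. This thesis focuses on inference under parametric models when the observed sample is subject to double truncation, that is, both left truncation and right truncation. Efron and Petrosian (1999) proposed the special exponential family (SEF) as a model [the defining formula of the SEF is not preserved in this record].
    The SEF can be fitted to doubly truncated data, but Efron and Petrosian did not pursue further theoretical or simulation studies, so this thesis carries out the theoretical and simulation study of the SEF for doubly truncated data. Based on this model, we use the Newton-Raphson algorithm and fixed-point iteration to obtain the maximum likelihood estimator (MLE) of the parameters, and we compare the performance of the two methods in parameter estimation. To ensure convergence for the three-parameter SEF, we propose a randomized Newton-Raphson algorithm. On the theoretical side, when double truncation makes the random variables independent but not identically distributed, we can still obtain the large-sample properties of the MLE, such as consistency, efficiency, and asymptotic normality. Finally, we use human lifetime data as an illustration.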


    Truncation often occurs in lifetime data analysis, where samples are collected under certain time constraints. This thesis considers parametric inference when random samples are subject to double truncation, i.e., both left and right truncation. Efron and Petrosian (1999) proposed to fit the special exponential family (SEF) [density formula not preserved in this record] to doubly truncated data, but did not study its computational and theoretical properties. This thesis fills this gap.
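    The displayed SEF formula is missing from this record. As a hedged reconstruction, and as an assumption rather than a quotation from the thesis, a k-parameter SEF is commonly written with a polynomial exponent, which would match the one-parameter, two-parameter, and cubic cases listed in the table of contents. The symbols \eta, \phi, \mathcal{Y}, u_i, and v_i below are assumed notation, with [u_i, v_i] denoting the truncation interval of the i-th observation.

```latex
% Assumed form of a k-parameter SEF density (not quoted from the thesis);
% \phi(\eta) is the normalizing term that makes f_\eta integrate to one.
\[
  f_{\eta}(y) = \exp\{ \eta_1 y + \eta_2 y^2 + \cdots + \eta_k y^k - \phi(\eta) \},
  \qquad y \in \mathcal{Y}.
\]
% Likelihood for doubly truncated observations y_i \in [u_i, v_i], i = 1, \dots, n.
\[
  L(\eta) = \prod_{i=1}^{n} \frac{f_{\eta}(y_i)}{\int_{u_i}^{v_i} f_{\eta}(y)\,dy}.
\]
```

    Under this assumed form, \phi(\eta) cancels in each ratio, so the likelihood depends on \eta only through the exponent and the interval-specific integrals.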
    We develop computational algorithms based on the Newton-Raphson and fixed-point iteration techniques to obtain the maximum likelihood estimator (MLE) of the parameters, and then compare the performance of these two methods by simulations. To stabilize the convergence under the three-parameter SEF, we propose a randomized Newton-Raphson method. We also study the asymptotic properties of the MLE based on the theory of independent but not identically distributed (i.n.i.d.) random variables, which accommodates the heterogeneity of truncation intervals. Lifetime data from the Channing House study are used for illustration.
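    To make the Newton-Raphson idea concrete, the following is a minimal sketch for a one-parameter model with density proportional to exp(eta * y) on each truncation interval [u_i, v_i]. It is not the thesis's implementation: the density form, the function names, the grid-based trapezoid integration, and the simulation design (u ~ Uniform(0, 1), v = u + 2) are all assumptions made for illustration. The score is the sum of y_i minus the truncated mean, and the information is the sum of the truncated variances.

```python
import numpy as np

def truncated_moments(eta, u, v, n_grid=401):
    """Mean and variance of the density proportional to exp(eta*y) on [u, v],
    approximated with the trapezoid rule on an equally spaced grid."""
    y = np.linspace(u, v, n_grid)
    w = np.exp(eta * (y - 0.5 * (u + v)))  # centring guards against overflow
    w[0] *= 0.5                            # trapezoid-rule end-point weights
    w[-1] *= 0.5
    w /= w.sum()
    mean = np.sum(w * y)
    var = np.sum(w * (y - mean) ** 2)
    return mean, var

def score_and_information(eta, y, u, v):
    """Score and observed information of the doubly truncated log-likelihood."""
    moments = np.array([truncated_moments(eta, ui, vi) for ui, vi in zip(u, v)])
    score = np.sum(y - moments[:, 0])
    information = np.sum(moments[:, 1])
    return score, information

def newton_raphson(y, u, v, eta0=0.0, tol=1e-8, max_iter=50):
    """Iterate eta <- eta + score / information until the step is below tol."""
    eta = eta0
    for _ in range(max_iter):
        score, information = score_and_information(eta, y, u, v)
        step = score / information
        eta += step
        if abs(step) < tol:
            return eta
    raise RuntimeError("Newton-Raphson did not converge")

def simulate(eta, n, rng):
    """Hypothetical design: u ~ Uniform(0, 1), v = u + 2, and y drawn from the
    truncated density by inverting its CDF (requires eta != 0)."""
    u = rng.uniform(0.0, 1.0, n)
    v = u + 2.0
    p = rng.uniform(0.0, 1.0, n)
    a, b = np.exp(eta * u), np.exp(eta * v)
    y = np.log(a + p * (b - a)) / eta
    return y, u, v
```

    A typical call would be `y, u, v = simulate(0.5, 1000, np.random.default_rng(0))` followed by `newton_raphson(y, u, v)`. Because the one-parameter log-likelihood is concave in this sketch, a single starting value is usually enough; the randomized restarts mentioned in the abstract concern the three-parameter case and are not shown here.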

    Table of contents
    Abstract (in Chinese)
    Abstract
    Acknowledgements
    List of Tables
    List of Figures
    1 Introduction
    2 Special exponential family (SEF)
        Example 1: One-parameter SEF
        Example 2: Two-parameter SEF
        Example 3: Cubic SEF
    3 Method of estimation
        3.1 Likelihood functions
            Example 1: One-parameter SEF
            Example 2: Two-parameter SEF
            Example 3: Cubic SEF
        3.2 Newton-Raphson method
            Example 1: One-parameter SEF
            Example 2: Two-parameter SEF
            Example 3: Cubic SEF
        3.3 Fixed-point iteration method
            Example 1: One-parameter SEF
            Example 2: Two-parameter SEF
    4 Asymptotic Theory
    5 Simulation
        5.1 Data generation
            5.1.1 One-parameter SEF
            5.1.2 Two-parameter SEF
            5.1.3 Cubic SEF
        5.2 Simulation results for the one-parameter SEF
        5.3 Simulation results for the two-parameter SEF
        5.4 Simulation results for the cubic SEF
        5.5 Simulation results for confidence interval
    6 Data analysis
        6.1 Data Background
        6.2 Numerical result
    Appendix A: Proof of the asymptotic properties of the MLE
    Appendix B: Data
    References

    Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Proc. 2nd International Symposium on Information Theory. Akademiai Kiado, Budapest, pp. 267-281.
    Bakoyannis G, Touloumi G (2012) Practical methods for competing risks data: a review. Statistical Methods in Medical Research 21: 257-272.
    Balakrishnan N, Basu AP (1996) The Exponential Distribution: Theory, Methods and Applications. Taylor & Francis Ltd, United States.
    Bradley RA, Gart JJ (1962) The asymptotic properties of ML estimators when sampling from associated populations. Biometrika 49: 205-214.
    Burden RL, Faires JD (2011) Numerical Analysis. Cengage Learning, Boston.
    Chen YH (2009) Weighted Breslow-type and maximum likelihood estimation in semiparametric transformation models. Biometrika 96: 235-251.
    Chen YH (2012) Maximum likelihood analysis of semi-competing risks data with semiparametric regression models. Lifetime Data Analysis 18: 36-57.
    Chang SM, Genton MG (2007) Extreme value distributions for the skew-symmetric family of distributions. Communications in Statistics - Theory and Methods 36: 1705-1717.
    Cohen AC (1991) Truncated and Censored Samples. Marcel Dekker, New York.
    Casella G, Berger RL (2002) Statistical Inference. Duxbury Thomson Learning, Australia.
    Castillo JD (1994) The singly truncated normal distribution: A non-steep exponential family. Annals of the Institute of Statistical Mathematics 46: 57-66.
    Cheng YJ (2014) Personal communication. Date: 2014/06/23. Place: National Central University.
    Commenges D (2002) Inference for multi-state models from interval-censored data. Statistical Methods in Medical Research 11: 167-182.
    Efron B, Tibshirani R (1996) Using specially designed exponential families for density estimation. The Annals of Statistics 24: 2431-2461.
    Efron B, Petrosian R (1999) Nonparametric methods for doubly truncated data. Journal of the American Statistical Association 94: 824-834.
    Emura T, Konno Y (2012) Multivariate normal distribution approaches for dependently truncated data. Statistical Papers 53: 133-149.
    Emura T, Wang W (2012) Nonparametric maximum likelihood estimation for dependent truncation data based on copulas. Journal of Multivariate Analysis 110: 171-188.
    Emura T, Konno Y, Michimae H (2014) Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Analysis. DOI: 10.1007/s10985-014-9297-5.
    Emura T, Chen YH (2014) Gene selection for survival data under dependent censoring: A copula-based approach. Statistical Methods in Medical Research. DOI: 10.1177/0962280214533378.
    Klein JP, Moeschberger ML (2003) Survival Analysis: Techniques for Censored and Truncated Data. Springer, New York.
    Knight K (2000) Mathematical Statistics. Chapman and Hall, Boca Raton.
    Lehmann EL, Casella G (1998) Theory of Point Estimation. Springer, New York.
    Lehmann EL, Romano JP (2005) Testing Statistical Hypotheses. Springer, New York.
    Miller RG, Efron B, Brown BW, Moses LE (1980) Survival analysis with incomplete observations, Hyde J, Biostatistics Casebook, Wiley, New York, pp. 31-46.
    Moreira C, de Uña-Álvarez J (2010) Bootstrapping the NPMLE for doubly truncated data. Journal of Nonparametric Statistics 22: 567-583.
    Moreira C, de Uña-Álvarez J, Keilegom IV (2014) Goodness-of-fit tests for a semiparametric model under random double truncation. Computational Statistics. DOI: 10.1007/s00180-014-0496-z.
    Moreira C, de Uña-Álvarez J (2012) Kernel density estimation with doubly-truncated data. Electronic Journal of Statistics 6: 501-521.
    Moreira C, Keilegom IV (2013) Bandwidth selection for kernel density estimation with doubly truncated data. Computational Statistics & Data Analysis 61: 107-123.
    Nelsen RB (2006) An Introduction To Copulas. Springer, New York.
    Robertson HT, Allison DB (2012) A novel generalized normal distribution for human longevity and other negatively skewed data. PLoS ONE 7: e37025.
    R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, R version 3.0.2.
    Shao J (2003) Mathematical Statistics. Springer, New York.
    Sankaran PG, Sunoj SM (2004) Identification of models using failure rate and mean residual life of doubly truncated random variables. Statistical Papers 45: 97-109.
    Shen PS (2010) Nonparametric analysis of doubly truncated data. Annals of the Institute of Statistical Mathematics 62: 835-853.
    Stovring H, Wang MC (2007) A new approach of nonparametric estimation of incidence and lifetime risk based on birth rates and incidence events. BMC Medical Research Methodology 7: 53.
    van der Vaart AW (1998) Asymptotic Statistics. Cambridge University Press, Cambridge.
    Zhu H, Wang MC (2012) Analysing bivariate survival data with interval sampling and application to cancer epidemiology. Biometrika 99: 345-361.
