
Graduate student: 薛承恩 (Cheng-En Hsueh)
Thesis title: Two-stage model selection under a misspecified spatial covariance function
Advisor: 陳春樹 (Chun-Shu Chen)
Oral defense committee:
Degree: Master
Department: College of Science - Graduate Institute of Statistics
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 45
Keywords: Covariance matrix, generalized degrees of freedom, mean squared prediction error, model selection, variable selection
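  • Spatial regression models are widely applied to data analysis in geology, atmospheric science, hydrology, ecology, and related fields. Two important issues are how to fit a reasonable spatial regression model and how to choose a suitable combination of explanatory variables from the candidate models. Model selection criteria commonly used in spatial statistics include AICc, BIC, and RIC. Previous studies show that AICc tends to select models with more explanatory variables, whereas BIC and RIC tend to select models with fewer. In real data analysis, however, the true number of explanatory variables is unknown, and so is the correlation structure among the data. Hence, if these criteria select different models, there is no way to judge which model best describes the data. This study proposes a mean squared prediction error criterion to compare the models selected by the various criteria fairly and to determine a final model that better suits the data. Simulation experiments verify the effectiveness of the proposed method. Finally, a spatial data set of heavy metal pollution near the Meuse river in France illustrates the practical use of the proposed method.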


    Spatial regression models are widely used for data analysis in geology, atmospheric science, hydrology, ecology, and related fields, where selecting a suitable subset of covariates among candidate models is an important issue. Commonly used selection criteria include AICc, BIC, and RIC. Previous research has shown that AICc tends to select models with more covariates, while BIC and RIC tend to select models with fewer covariates. Moreover, the covariance structure of the observed data is generally unknown in practice. Determining an appropriate model is therefore difficult, especially when the covariance structure is misspecified or the criteria select different models. In this thesis, a mean squared prediction error criterion is proposed to compare the selected models fairly and then determine a final model. Simulation studies show that the proposed method has an adaptive feature that AICc, BIC, and RIC cannot achieve. Finally, a real data example on heavy metal pollution near the Meuse river in France is analyzed for illustration.
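    As a rough illustration only, the two-stage idea described in the abstract (let each criterion choose its own model, then compare the chosen models by prediction error) can be sketched in Python. This sketch makes several simplifying assumptions that are not from the thesis: errors are treated as independent Gaussian and fitted by ordinary least squares rather than with a spatial covariance function, RIC is omitted, and the mean squared prediction error is estimated by a plain hold-out split instead of the thesis's own criterion. All function names and the simulated data are hypothetical.

```python
import itertools
import numpy as np


def design_matrix(X, subset):
    """Intercept column plus the covariate columns listed in `subset`."""
    cols = X[:, np.array(subset, dtype=int)]
    return np.column_stack([np.ones(X.shape[0]), cols])


def aicc_bic(y, design):
    """AICc and BIC for an OLS fit, ASSUMING i.i.d. Gaussian errors."""
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = p + 1  # regression coefficients plus the error variance
    neg2loglik = n * (np.log(2 * np.pi * rss / n) + 1)
    aicc = neg2loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2loglik + k * np.log(n)
    return aicc, bic


def stage_one(y, X):
    """Stage 1: each criterion selects its own covariate subset."""
    scores = {}
    for size in range(X.shape[1] + 1):
        for subset in itertools.combinations(range(X.shape[1]), size):
            scores[subset] = aicc_bic(y, design_matrix(X, subset))
    return {
        "AICc": min(scores, key=lambda s: scores[s][0]),
        "BIC": min(scores, key=lambda s: scores[s][1]),
    }


def holdout_mspe(y_tr, X_tr, y_te, X_te, subset):
    """Stage 2 stand-in: hold-out mean squared prediction error."""
    beta, *_ = np.linalg.lstsq(design_matrix(X_tr, subset), y_tr, rcond=None)
    pred = design_matrix(X_te, subset) @ beta
    return float(np.mean((y_te - pred) ** 2))


# Simulated data (hypothetical): only covariates 0 and 1 affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)
train, test = slice(0, 150), slice(150, 200)

chosen = stage_one(y[train], X[train])
mspe = {name: holdout_mspe(y[train], X[train], y[test], X[test], subset)
        for name, subset in chosen.items()}
final = min(mspe, key=mspe.get)  # criterion whose model predicts best
print(chosen, mspe, final)
```

    The point of the second stage is that the comparison is made on a common scale (prediction error), so the final choice does not depend on which criterion's penalty happens to be used in the first stage.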

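    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    Chapter 1 Introduction
    Chapter 2 Overview of Spatial Models
      2.1 Spatial linear model
      2.2 Correlation functions
    Chapter 3 Model Selection Criteria
      3.1 Introduction to the criteria
      3.2 Information criteria
    Chapter 4 Two-Stage Model Selection Criterion
      4.1 Generalized degrees of freedom
      4.2 Adaptive model selection criterion
      4.3 Model selection procedure
    Chapter 5 Simulation Studies
      5.1 Simulation settings
      5.2 Simulation results
    Chapter 6 Data Analysis
    Chapter 7 Conclusion
    References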

    References

    [1] Akaike, H. (1973), Information theory and an extension of the maximum likelihood principle, Proceedings of the Second International Symposium on Information Theory, Budapest: Akadémiai Kiadó, 267-281.
    [2] Barnett, A.G., Koper, N., Dobson, A.J., Schmiegelow, F., Manseau, M. (2010), Using information criteria to select the correct variance-covariance structure for longitudinal data in ecology, Methods Ecol Evol, 1:15-24.
    [3] Burnham, K.P., Anderson, D.R. (2002), Model selection and multimodel inference: a practical information-theoretic approach, (2nd ed.), New York: Springer.
    [4] Cressie, N.A.C. (1993), Statistics for Spatial Data, New York: Wiley.
    [5] Hin, L.Y., Wang, Y.G. (2009), Working correlation structure identification in generalized estimating equations, Stat Med, 28:642-658.
    [6] Hoeting, J.A., Davis, R.A., Merton, A.A., Thompson, S.E. (2006), Model selection for geostatistical models, Ecol Appl, 16:87-98.
    [7] Huang, H.C., Chen, C.S. (2007), Optimal Geostatistical Model Selection, Journal of the American Statistical Association, 102:1009-1024.
    [8] Hurvich, C.M., Tsai, C.L. (1989), Regression and time series model selection in small samples, Biometrika, 76:297-307.
    [9] Lee, H., Ghosh, S.K. (2009), Performance of information criteria for spatial models, J Stat Comput Simul, 79:93-106.
    [10] Minasny, B., McBratney, A.B. (2005), The Matérn function as a general model for soil variograms, Geoderma, 128(3-4):192-207.
    [11] Pan, W. (2001), Akaike’s information criterion in generalized estimating equations, Biometrics, 57:120-125.
    [12] Rissanen, J. (1978), Modeling by the shortest data description, Automatica, 14:465-471.
    [13] Schwarz, G. (1978), Estimating the dimension of a model, Ann Statist, 6:461-464.
    [14] Shen, X., Huang, H.-C. (2006), Optimal Model Assessment, Selection and Combination, Journal of the American Statistical Association, 101:554-568.
    [15] Shen, X., Huang, H.-C., Ye, J. (2004), Adaptive Model Selection and Assessment for Exponential Family Models, Technometrics, 46:306-317.
    [16] Shen, X., Ye, J. (2002), Adaptive Model Selection, Journal of the American Statistical Association, 97:210-221.
    [17] Shi, P., Tsai, C.L. (2002), Regression model selection - a residual likelihood approach, J Roy Statist Soc Ser B, 64:237-252.
    [18] Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A. (2002), Bayesian measures of model complexity and fit (with discussion), J Roy Statist Soc Ser B, 64:583-640.
    [19] Wang, Y.G., Carey, V. (2003), Working correlation structure misspecification, estimation and covariate design: implications for generalized estimating equations performance, Biometrika, 90:29-41.
    [20] Wang, Y.G., Lin, X. (2005), Effects of variance-function misspecification in analysis of longitudinal data, Biometrics, 61:413-421.
    [21] Xu, L., Wang, Y.G., Zheng, S., Shi, N.Z. (2014), Model selection with misspecified spatial covariance structure, Journal of Statistical Computation and Simulation, 85:2276-2294.
    [22] Ye, J. (1998), On Measuring and Correcting the effects of Data Mining and Model Selection, Journal of the American Statistical Association, 93:120-131.
