| Graduate student: | 薛承恩 Cheng-En Hsueh |
|---|---|
| Thesis title: | Two-stage model selection under a misspecified spatial covariance function |
| Advisor: | 陳春樹 Chun-Shu Chen |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Science - Graduate Institute of Statistics |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | Chinese |
| Number of pages: | 45 |
| Keywords (Chinese): | 共變異數矩陣、廣義自由度、均方預測誤差、模型選擇、變數選取 |
| Keywords (English): | Covariance matrix, generalized degrees of freedom, mean squared prediction error, model selection, variable selection |
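Spatial regression models are widely used to analyze data in geology, atmospheric science, hydrology, ecology, and related fields. Two important issues are how to fit a reasonable spatial regression model and how to choose a suitable combination of explanatory variables from the candidate models. Model selection criteria commonly used in spatial statistics include AICc, BIC, and RIC. Previous studies have shown that AICc tends to select models with more explanatory variables, whereas BIC and RIC tend to select models with fewer. In practice, however, neither the true number of explanatory variables nor the correlation structure among the data is known. Consequently, when these criteria select different models, there is no way to judge which model best describes the data. This study proposes a mean squared prediction error criterion that fairly compares the models selected by the various criteria and determines a final model better suited to the data. Simulation experiments verify the effectiveness of the proposed method. Finally, a spatial data set on heavy metal contamination near the Meuse River in France illustrates the practical use of the proposed method.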
Spatial regression models are widely used for data analysis in geology, atmospheric science, hydrology, ecology, and related fields, where selecting a suitable subset of covariates among candidate models is an important issue. Commonly used selection criteria such as AICc, BIC, and RIC can be applied for model selection. Previous research has shown that AICc tends to select models with more covariates, whereas BIC and RIC tend to select models with fewer covariates. Moreover, the covariance structure of the observed data is generally unknown in practice. Determining an appropriate model for the observed data is therefore difficult, especially when the covariance structure is misspecified or when these criteria select different models. In this thesis, a mean squared prediction error criterion is proposed to compare the selected models fairly and to determine a final model. Simulation studies show that the proposed method has an adaptive feature that cannot be achieved by AICc, BIC, or RIC alone. Finally, a real data example concerning heavy metal pollution near the Meuse River in France is analyzed for illustration.
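The two-stage idea in the abstract (let each criterion pick a model, then compare the picks by prediction error) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the thesis: errors are treated as i.i.d. Gaussian rather than spatially correlated, the prediction error is estimated on a simple hold-out set rather than by the thesis's own estimator, RIC is omitted, and all function names (`gaussian_loglik`, `select`, `holdout_mspe`) and the simulated data are hypothetical.

```python
import itertools
import numpy as np

def design_matrix(X, subset):
    """Intercept column plus the covariates indexed by `subset`."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in subset])

def gaussian_loglik(y, D):
    """Maximized Gaussian log-likelihood of an OLS fit (assumes i.i.d. errors)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    rss = float(np.sum((y - D @ beta) ** 2))
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1), beta

def aicc(loglik, k, n):
    return -2 * loglik + 2 * k * n / (n - k - 1)

def bic(loglik, k, n):
    return -2 * loglik + k * np.log(n)

def select(y, X, criterion):
    """Stage 1: exhaustive search for the subset minimizing one criterion."""
    n, p = X.shape
    best_score, best_subset = np.inf, ()
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            D = design_matrix(X, subset)
            loglik, _ = gaussian_loglik(y, D)
            k = D.shape[1] + 1  # regression coefficients plus error variance
            score = criterion(loglik, k, n)
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset

def holdout_mspe(y_tr, X_tr, y_te, X_te, subset):
    """Stage 2: hold-out mean squared prediction error (a stand-in estimator)."""
    _, beta = gaussian_loglik(y_tr, design_matrix(X_tr, subset))
    return float(np.mean((y_te - design_matrix(X_te, subset) @ beta) ** 2))

# Simulated data (hypothetical): only covariates 0 and 1 affect the response.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
train, test = np.arange(0, 100), np.arange(100, 200)

criteria = {"AICc": aicc, "BIC": bic}
selected = {name: select(y[train], X[train], c) for name, c in criteria.items()}
scores = {name: holdout_mspe(y[train], X[train], y[test], X[test], s)
          for name, s in selected.items()}
final = min(scores, key=scores.get)  # criterion whose model predicts best
print(selected, scores, final)
```

When both criteria choose the same subset, the second stage is a tie and the choice is immaterial; the comparison only matters when the criteria disagree, which is the situation the abstract describes.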