| Graduate student: | 楊思芃 Szu-peng Yang |
|---|---|
| Thesis title: | A class of generalized ridge estimator for high-dimensional linear regression |
| Advisor: | 江村剛志 Takeshi Emura |
| Committee members: | |
| Degree: | Master |
| Department: | College of Science - Graduate Institute of Statistics |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 |
| Language: | English |
| Number of pages: | 49 |
| Chinese keywords: | 脊迴歸、高維度資料 (ridge regression, high-dimensional data) |
| English keywords: | High dimensional, Generalized ridge regression |
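This thesis is built on the multiple linear regression model. Under this model, the commonly used least squares estimator is unsuitable when the number of variables is large because it suffers from collinearity, especially when the number of variables exceeds the sample size. Hoerl and Kennard proposed the generalized ridge regression method in 1970. In theory, the generalized ridge estimator resolves the collinearity problem of the least squares estimator. Since then, many authors have discussed special forms of the generalized ridge estimator. However, as the number of variables grows, the number of parameters to be estimated grows with it, which makes the method difficult to implement; most work therefore considers only the case where the sample size exceeds the number of variables. In this thesis, we propose a special form of the generalized ridge estimator that also works when the number of variables is large. In addition, this estimator has a suitable interpretation in Bayesian theory and can be linked with prior information to obtain better estimates. We carry out significance testing, simulations, and a real data analysis. In the data analysis, the ordinary ridge estimator is compared with the proposed estimator, and the proposed estimator performs better than the ridge estimator in terms of mean square error.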
In multiple linear regression, the least squares estimator is inappropriate for high-dimensional regressors, especially for p≥n. Consider the linear regression model. The generalized ridge estimator has been considered by many authors under the usual p<n setting. In this paper, we propose a class of generalized ridge estimators that appropriately chooses W under high-dimensionality. The proposed method has a natural Bayesian interpretation under the prior knowledge of sparsity. We also consider significance testing based on the proposed estimator. Simulations show that the proposed estimator performs better than the usual ridge estimator in terms of the mean square error criterion, under both p<n and p≥n. We demonstrate the method using the non-small cell lung cancer data with microarrays.
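The abstract does not state the formula for the estimator or the proposed choice of W. As a minimal sketch, assuming the commonly used form (X'X + λW)⁻¹X'y with a diagonal, positive W, the following Python code computes a generalized ridge estimate in a p≥n setting. The function name `generalized_ridge`, the penalty level `lam`, and the weight vector `w` are hypothetical illustrations, not the choice of W proposed in the thesis. Under this assumed form, setting all weights to 1 recovers the ordinary ridge estimator.

```python
import numpy as np


def generalized_ridge(X, y, lam, w):
    """Return (X'X + lam * diag(w))^{-1} X'y.

    Assumed form of a generalized ridge estimator; `w` holds positive
    penalty weights, and w = (1, ..., 1) gives ordinary ridge regression.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # The penalty term makes the matrix invertible even when p >= n.
    penalized_gram = X.T @ X + lam * np.diag(w)
    return np.linalg.solve(penalized_gram, X.T @ y)


# Simulated sparse example with more regressors than observations.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0  # only three nonzero coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Ordinary ridge: every coefficient is penalized equally.
beta_ridge = generalized_ridge(X, y, lam=1.0, w=np.ones(p))

# Hypothetical unequal weights: lighter penalty on the first three regressors.
w_hypothetical = np.ones(p)
w_hypothetical[:3] = 0.1
beta_gen = generalized_ridge(X, y, lam=1.0, w=w_hypothetical)
```

With a Gaussian prior on the coefficients whose precision is proportional to λW, this expression is also the posterior mean, which is one way to read the Bayesian interpretation mentioned in the abstract.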
Allen, D. M. (1974). The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16, 125-127.
Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J. (2009). Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics 25, 890-896.
Cule, E., Vineis, P., and De Iorio, M. (2011). Significance testing in ridge regression for genetic data. BMC Bioinformatics 12, 372.
Dicker, A. P., and Rodeck, U. (2005). Predicting the future from trials of the past: epidermal growth factor receptor expression and outcome of fractionated radiation therapy trials. Journal of Clinical Oncology 23, 5437-5439.
Emura, T., Chen, Y. H., and Chen, H. Y. (2012). Survival prediction based on compound covariate under Cox proportional hazard models. PLoS ONE 7, e47627.
Emura, T., and Chen, Y. H. (2014). Gene selection for survival data under dependent censoring: a copula-based approach. Statistical Methods in Medical Research, DOI: 10.1177/0962280214533378.
Golub, G. H., Heath, M., and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21, 215-223.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning. Springer-Verlag, New York.
Hoerl, A. E., and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55-67.
Ing, C.-K., and Lai, T.-L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica 21, 1473-1513.
Jenssen, T. K., Kuo, W. P., Stokke, T., et al. (2002). Association between gene expressions in breast cancer and patient survival. Human Genetics 111, 411-420.
Kim, S.-Y., and Lee, J.-W. (2007). Ensemble clustering method based on the resampling similarity measure for gene expression data. Statistical Methods in Medical Research 16, 539-564.
Loesgen, K.-H. (1990). A generalization and Bayesian interpretation of ridge-type estimators with good prior means. Statistical Papers 31, 147-154.
Mallows, C. L. (1973). Some comments on Cp. Technometrics 15, 661-675.
Matsui, S. (2006). Predicting survival outcomes using subsets of significant genes in prognostic marker studies with microarrays. BMC Bioinformatics 7, 156.
McLachlan, G. J. (1980). On the mean square error associated with adaptive generalized ridge regression. Biometrical Journal 22, 125-129.
Trenkler, G. (1985). Mean square error matrix comparisons of estimators in linear regression. Communications in Statistics A14, 2495-2509.
Trenkler, G., and Toutenburg, H. (1990). Mean squared error matrix comparisons between biased estimators: an overview of recent results. Statistical Papers 31, 165-179.
Wain, J. M., Bruford, E. A., Lovering, R. C., et al. (2002). Guidelines for human gene nomenclature. Genomics 79, 464-470.
Whittaker, J. C., Thompson, R., and Denham, M. C. (2000). Marker-assisted selection using ridge regression. Genetical Research 75, 249-252.
Yanagihara, H., and Satoh, K. (2010). An unbiased Cp criterion for multivariate ridge regression. Journal of Multivariate Analysis 101, 1226-1238.
Zhao, X., Rødland, E. A., Sørlie, T., et al. (2011). Combining gene signatures improves prediction of breast cancer survival. PLoS ONE 6, e17845.