Graduate student: 陳俊廷
Chen, Chun-Ting
Thesis title: 邏輯斯迴歸的子取樣方法之比較
A Comparison among Subsampling Methods for Logistic Regression
Advisor: 黃世豪
Huang, Shih-Hao
Oral examination committee:
Degree: Master
Department: College of Science - Department of Mathematics
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 46
Chinese keywords: A-最佳性, D-最佳性, 邏輯斯迴歸, 子取樣
Foreign-language keywords: A-optimality, D-optimality, Logistic regression, Subsampling
  • Logistic regression, with the binary variable of interest as the response, is a common way to make inference about or to predict that variable. When labelling the response incurs extra cost and the budget is limited, only a small part of the sample can be labelled, so selecting from a large sample a subsample that is efficient for fitting the logistic regression model becomes an important problem. This thesis addresses that subsampling problem, with the aim of estimating the parameters efficiently when the explanatory variables are known but the responses are not. We first review the subsampling methods proposed by Wang et al. (2018) and Hsu et al. (2019). We then propose variants of these methods, based on our setting and on optimal design theory, which we expect to perform better. Finally, we compare the methods in simulation studies and in a real-world data analysis.
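To make the setting concrete, here is a minimal sketch of one way to choose which unlabelled points to label under a D-optimality criterion. It is not the thesis's GATE, GATED, mMSE, or mEMSE procedure. It assumes a pilot parameter value `beta` is available and greedily adds the point that most increases the determinant of the logistic Fisher information, which is the sum of p_i(1 - p_i) x_i x_i^T over the selected points. The function name, the `ridge` starting value, and the greedy rule are illustrative assumptions.

```python
import math


def greedy_d_optimal_subsample(X, beta, k, ridge=1e-6):
    """Greedily pick k rows of X (a list of feature lists) that maximize the
    determinant of the logistic Fisher information at a pilot value `beta`.

    Illustrative sketch only: `beta` is an assumed pilot estimate and
    `ridge` is a hypothetical small constant that makes the starting
    information matrix (ridge * I) invertible.
    """
    n, d = len(X), len(beta)

    # Logistic variance weights w_i = p_i * (1 - p_i), computed in a
    # numerically stable form that is symmetric in the linear predictor.
    w = []
    for x in X:
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        t = math.exp(-abs(eta))
        w.append(t / (1.0 + t) ** 2)

    # Inverse of the starting information matrix ridge * I.
    M_inv = [[(1.0 / ridge if r == c else 0.0) for c in range(d)]
             for r in range(d)]

    chosen = []
    for _ in range(k):
        best_i, best_gain, best_v = None, -math.inf, None
        for i, x in enumerate(X):
            if i in chosen:
                continue
            # Matrix determinant lemma:
            # det(M + w x x^T) = det(M) * (1 + w * x^T M^{-1} x)
            v = [sum(M_inv[r][c] * x[c] for c in range(d)) for r in range(d)]
            gain = w[i] * sum(x[r] * v[r] for r in range(d))
            if gain > best_gain:
                best_i, best_gain, best_v = i, gain, v
        if best_i is None:  # fewer than k points available
            break
        # Sherman-Morrison update of M^{-1} after adding the chosen point.
        denom = 1.0 + best_gain
        for r in range(d):
            for c in range(d):
                M_inv[r][c] -= w[best_i] * best_v[r] * best_v[c] / denom
        chosen.append(best_i)
    return chosen
```

For example, with rows `[1, 0]`, `[1, 0]`, `[0, 1]` and `beta = [0, 0]`, asking for two points selects the first row and then the third, because a repeated row adds far less information than a new direction.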

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
    2. Methods
    2.1 Greedy active learning algorithm (GATE)
    2.2 Greedy active learning algorithm under D-optimal design (GATED)
    2.3 Minimum mean squared error subsampling (mMSE)
    2.4 Minimum expected mean squared error subsampling (mEMSE)
    2.5 Comparison of methods
    3. Simulation studies
    3.1 Simulation settings
    3.2 Results
    4. Real data analysis
    5. Conclusion
    References

    [1] Deng, X., Joseph, V. R., Sudjianto, A., and Wu, C. F. J. (2009). Active learning through sequential design, with applications to detection of money laundering. Journal of the American Statistical Association, 104(487), 969-981.
    [2] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Retrieved 2021/07/15, from http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.
    [3] Ford, I., Torsney, B., and Wu, C. F. J. (1992). The use of a canonical form in the construction of locally optimal designs for non-linear problems. Journal of the Royal Statistical Society: Series B (Methodological), 54(2), 569-583.
    [4] Hsu, H. L., Chang, Y. C. I., and Chen, R. B. (2019). Greedy active learning algorithm for logistic regression models. Computational Statistics & Data Analysis, 129, 119-134.
    [5] Huang, S. H., Huang, M. N. L., and Lin, C. W. (2020). Optimal designs for binary response models with multiple nonnegative variables. Journal of Statistical Planning and Inference, 206, 75-83.
    [6] Kabera, G. M., Haines, L. M., and Ndlovu, P. (2015). The analytic construction of D-optimal designs for the two-variable binary logistic regression model without interaction. Statistics, 49(5), 1169-1186.
    [7] Kohavi, R. (1996). Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 96, 202-207.
    [8] Wang, H., Zhu, R., and Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522), 829-844.
