
Author: 陳奕儒 (Yi-Ju Chen)
Title: Contrastive Principal Component Analysis for High Dimension, Low Sample Size Data
Advisor: 王紹宣 (Shao-Hsuan Wang)
Committee:
Degree: Master
Department: College of Science - Graduate Institute of Statistics
Year of publication: 2022
Graduation academic year: 110
Language: English
Number of pages: 52
Keywords: subgroup discovery, visualizing, feature selection, denoising


    Principal Component Analysis (PCA) is a widely used linear dimensionality reduction method that preserves the variability among variables, and it is typically applied to visualize a single dataset. Contrastive Principal Component Analysis (CPCA) is a generalization of PCA for settings with multiple datasets, such as a treatment group and a control group: it explores the low-dimensional structure that is unique to a target dataset relative to a reference (background) dataset. However, although CPCA has been shown in many fields to uncover important data patterns that PCA overlooks (Abid et al., 2017), it lacks a statistical model that explains why it can identify the variation of interest. In this thesis, we propose a model assumption for CPCA. We decompose the target data into a signal matrix of interest and a nuisance matrix that is not of interest, and show that the influence of the nuisance matrix on the target data can be removed by CPCA. We then illustrate, through simulation, the advantage of CPCA in recovering the signal matrix. Furthermore, based on our model assumption, we propose a new method for selecting the contrast parameter, a choice that is crucial for performing CPCA. Finally, by tuning the contrast parameter we recover the data patterns of interest in a synthetic-image example, and we verify that our new method for choosing the contrast parameter achieves the same effect.
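    The core operation of CPCA described above can be sketched in a few lines: the contrastive directions are the leading eigenvectors of the target covariance minus α times the background covariance, with α = 0 recovering ordinary PCA on the target. A minimal NumPy illustration, following Abid et al. (2017) — the function name, defaults, and random data below are illustrative, not taken from the thesis:

    ```python
    import numpy as np

    def cpca(target, background, alpha, k=2):
        """Sketch of contrastive PCA: directions with high target variance
        but low background variance.

        target, background: (n_samples, n_features) arrays.
        alpha: contrast parameter (alpha = 0 reduces to PCA on the target).
        k: number of contrastive components to return.
        """
        Xt = target - target.mean(axis=0)       # center each dataset
        Xb = background - background.mean(axis=0)
        Ct = Xt.T @ Xt / Xt.shape[0]            # target covariance
        Cb = Xb.T @ Xb / Xb.shape[0]            # background covariance
        # Eigen-decompose the contrastive matrix C_t - alpha * C_b.
        vals, vecs = np.linalg.eigh(Ct - alpha * Cb)
        order = np.argsort(vals)[::-1]          # largest eigenvalues first
        return vecs[:, order[:k]]               # contrastive directions

    # Usage: embed the target data along the contrastive directions.
    rng = np.random.default_rng(0)
    bg = rng.normal(size=(100, 10))             # background dataset
    tg = rng.normal(size=(80, 10))              # target dataset
    V = cpca(tg, bg, alpha=1.0)
    scores = (tg - tg.mean(axis=0)) @ V         # 2-D embedding of the target
    ```

    Because `np.linalg.eigh` returns orthonormal eigenvectors, the projection behaves like a PCA embedding; only the matrix being decomposed changes with α.
    
    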

    Abstract (Chinese)
    Abstract
    Acknowledgements
    1 Introduction
        1.1 Related works
        1.2 PCA
    2 Method
        2.1 CPCA
        2.2 The CPCA Algorithm
        2.3 Model
        2.4 Estimation
        2.5 Contrast parameter α
    3 Theory
    4 Numerical Study
        4.1 Setting
            4.1.1 Target dataset
            4.1.2 Background dataset
        4.2 Simulation
    5 Application: synthetic images
        5.1 Handwritten digits on grassy backgrounds
        5.2 Merchandise on grassy backgrounds
    6 Conclusion
        6.1 Future works
    Bibliography

    Abid, A., Zhang, M. J., Bagaria, V. K., & Zou, J. (2017). Contrastive principal component analysis.
    Aoshima, M., Shen, D., Shen, H., Yata, K., Zhou, Y.-H., & Marron, J. S. (2018). A survey of high dimension low sample size asymptotics. Australian & New Zealand Journal of Statistics, 60(1), 4–19.
    Cox, M. A. A., & Cox, T. F. (2008). Multidimensional scaling. In Handbook of data visualization (pp. 315–347). Berlin, Heidelberg: Springer.
    du Prel, J.-B., Röhrig, B., Hommel, G., & Blettner, M. (2010). Choosing statistical tests: Part 12 of a series on evaluation of scientific publications. Deutsches Ärzteblatt International, 107(19), 343–348.
    Fujiwara, T., Kwon, O.-H., & Ma, K.-L. (2020). Supporting analysis of dimensionality reduction results with contrastive learning. IEEE Transactions on Visualization and Computer Graphics, 26, 45–55.
    Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 498–520.
    Hung, H., Wu, P., Tu, I., & Huang, S. (2012). On multilinear principal component analysis of order-two tensors. Biometrika, 99(3), 569–583.
    LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278–2324.
    Lu, H., Plataniotis, K. N., & Venetsanopoulos, A. N. (2008). MPCA: Multilinear principal component analysis of tensor objects. IEEE Transactions on Neural Networks, 19(1), 18–39. doi: 10.1109/TNN.2007.901277
    Obeya, P. O., & Akinlabi, G. O. (2021). Application of the regular perturbation method for the solution of first-order initial value problems. Journal of Physics: Conference Series, 1734(1), 012021. doi: 10.1088/1742-6596/1734/1/012021
    van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86), 2579–2605.
    Yata, K. (2009). PCA consistency for non-Gaussian data in high dimension, low sample size context (Vol. 38). doi: 10.1080/03610910902936083
    Yata, K., & Aoshima, M. (2012). Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. Journal of Multivariate Analysis, 105(1), 193–215.
    Yata, K., & Aoshima, M. (2013). PCA consistency for the power spiked model in high-dimensional settings. Journal of Multivariate Analysis, 122, 334–354. doi: 10.1016/j.jmva.2013.08.003
    Yata, K., & Aoshima, M. (2016). Reconstruction of a high-dimensional low-rank matrix. Electronic Journal of Statistics, 10(1), 895–917. doi: 10.1214/16-ejs1128
    Ye, J. (2004). Generalized low rank approximations of matrices. In Proceedings of the Twenty-First International Conference on Machine Learning (p. 112). Association for Computing Machinery.
    Zhu, P., & Knyazev, A. (2013). Angles between subspaces and their tangents. Journal of Numerical Mathematics, 21(4). doi: 10.1515/jnum-2013-0013
