| Graduate Student: | 賴彥儒 Yen-Ru Lai |
|---|---|
| Thesis Title: | Contrastive Principal Component Analysis for High-Dimension, Low-Sample-Size Data with Noise-Reduction |
| Advisor: | 王紹宣 Shao-Hsuan Wang |
| Oral Defense Committee: | |
| Degree: | 碩士 Master |
| Department: | 理學院 College of Science - 統計研究所 Graduate Institute of Statistics |
| Publication Year: | 2023 |
| Graduation Academic Year: | 111 |
| Language: | English |
| Pages: | 43 |
| Keywords (Chinese): | 子組發現 (subgroup discovery), 視覺化 (visualization), 特徵選取 (feature selection), 去噪 (denoising) |
| Keywords (English): | subgroup discovery, visualizing, feature selection, denoising |
Contrastive Principal Component Analysis (cPCA) is a useful dimensionality reduction technique in scenarios where datasets are collected under different conditions, e.g., a treatment and a control experiment, and is especially suited to visualizing and exploring patterns that are specific to one dataset. In this study, we propose a new methodology for cPCA in high-dimension, low-sample-size (HDLSS) settings. The proposed method, called cPCA-NR, applies the noise-reduction (NR) method proposed by Yata and Aoshima (2012) to mitigate the adverse effects of noisy data points, improving the robustness and reliability of the dimensionality reduction process. In a simulation study, we demonstrate that cPCA-NR outperforms traditional PCA in terms of classification accuracy and clustering performance. Moreover, the proposed method exhibits strong resilience to noisy data, achieving notable improvements in scenarios with high noise levels. These results highlight the superior performance of cPCA-NR and establish its potential as a valuable tool for applications such as image recognition, anomaly detection, and data visualization.
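As a rough illustration of the two ingredients named in the abstract, the sketch below separately implements the contrastive step of cPCA following Abid et al. [1] (leading eigenvectors of C_target − α·C_background) and a noise-reduction eigenvalue adjustment in the spirit of Yata and Aoshima [3]. Function names and the way the pieces are shown are illustrative assumptions, not the thesis's actual cPCA-NR implementation.

```python
import numpy as np

def cpca_directions(target, background, alpha=1.0, n_components=2):
    """Contrastive PCA (Abid et al. [1]): directions that maximize
    target-set variance while penalizing background-set variance."""
    c_t = np.cov(target, rowvar=False)      # target covariance
    c_b = np.cov(background, rowvar=False)  # background covariance
    # Leading eigenvectors of the contrastive matrix C_t - alpha * C_b.
    evals, evecs = np.linalg.eigh(c_t - alpha * c_b)
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order]

def nr_eigenvalues(data):
    """Noise-reduction adjustment in the spirit of Yata and Aoshima [3]:
    each leading sample eigenvalue is shrunk by an estimate of the noise
    spread across the trailing eigenvalues (sketch, not the exact cPCA-NR
    estimator of the thesis)."""
    n = data.shape[0]
    s = np.cov(data, rowvar=False)
    lam = np.sort(np.linalg.eigvalsh(s))[::-1]  # descending eigenvalues
    adjusted = lam.copy()
    for j in range(min(n - 2, len(lam))):
        # Subtract the average of the remaining spectrum mass.
        adjusted[j] = lam[j] - (np.trace(s) - lam[:j + 1].sum()) / (n - 2 - j)
    return adjusted

# Usage: project the target data onto the contrastive directions.
rng = np.random.default_rng(0)
target = rng.normal(size=(30, 10))
background = rng.normal(size=(30, 10))
projected = target @ cpca_directions(target, background, alpha=1.0)
```

The contrast parameter α trades off the two variances: α = 0 reduces the first function to ordinary PCA on the target set, while larger α increasingly suppresses directions that also carry background variance.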
[1] Abid, A., Zhang, M. J., Bagaria, V. K., and Zou, J. "Contrastive principal component analysis." arXiv preprint arXiv:1709.06716, 2017.
[2] Ahn, J. "High dimension, low sample size data analysis." The University of North Carolina at Chapel Hill, 2006.
[3] Yata, K. and Aoshima, M. "Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations." Journal of Multivariate Analysis, 105(1), pp. 193-215, 2012.
[4] Yata, K. and Aoshima, M. "PCA consistency for non-Gaussian data in high dimension, low sample size context." Communications in Statistics - Theory and Methods, 38(16-17), pp. 2634-2652, 2009.
[5] Chen, Y.-J. and Wang, S.-H. "Contrastive principal component analysis for high dimension, low sample size data." Master's thesis, National Central University, 2022.
[6] Hotelling, H. "Analysis of a complex of statistical variables into principal components." Journal of Educational Psychology, 24, pp. 498-520, 1933.
[7] Fujiwara, T., Kwon, O.-H., and Ma, K.-L. "Supporting analysis of dimensionality reduction results with contrastive learning." IEEE Transactions on Visualization and Computer Graphics, 26(1), pp. 45-55, 2020.
[8] Kaish, I., Hossain, J., Papalexakis, E., and Chen, J. "COVID-19 or flu? Discriminative knowledge discovery of COVID-19 symptoms from Google Trends data." 4th International Workshop on Epidemiology meets Data Mining and Knowledge Discovery, 2021.
[9] Marchetti-Bowick, M. "Structured sparse regression methods for learning from high-dimensional genomic data." Ph.D. thesis, Carnegie Mellon University, 2020.