| Graduate student: | 張家瑜 Chia-Yu Zhang |
|---|---|
| Thesis title: | 典型相關分析中維度檢定方法之比較 Comparison of Dimensionality Testing Methods in Canonical Correlation Analysis |
| Advisor: | 黃世豪 |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Science - Graduate Institute of Statistics |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 |
| Language: | Chinese |
| Number of pages: | 54 |
| Chinese keywords: | 典型相關分析, 卡方檢定, 變數擴充下的維度推論方法, Tracy-Widom 檢定, 維度檢定, 逐步檢定 |
| English keywords: | Canonical correlation analysis, Chi-square test, Dimension inference using variable augmentation (DIVA), Tracy-Widom test, Dimensionality test, Sequential testing |
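Canonical correlation analysis (CCA) is a statistical method for measuring the linear relationship between two sets of variables. The canonical correlation coefficients are obtained from a generalized eigenvalue decomposition of the covariance matrices and the cross-covariance matrix of the two variable sets, and the canonical variate pairs are constructed from the corresponding eigenvectors. To test whether the canonical correlations are significant, the two most commonly used tests are both eigenvalue-based: the traditional chi-square test (which assumes a fixed dimension and a sample size tending to infinity) and the Tracy-Widom test for high-dimensional settings (which assumes that the dimension and the sample size tend to infinity together). Recently, an eigenvector-based alternative, dimension inference using variable augmentation (DIVA), was developed, originally for dimension testing in the sufficient dimension reduction framework. In this thesis we show that this method can also be applied to testing the significance of canonical correlations in CCA. To evaluate the finite-sample performance of these methods, we conducted a comprehensive simulation study under different dimension settings, sample sizes, and correlation strengths. We find that, when testing whether the two sets of variables are correlated, the chi-square test performs better when the correlation is low, whereas the Tracy-Widom test is more suitable when the correlation is high. When estimating the number of significant canonical variate pairs, the chi-square test works better when the correlation is low, whereas the s-DIVA method performs better when the correlation is high.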
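As an illustration of the eigen-decomposition step described in the abstract, the following minimal sketch computes sample canonical correlations with NumPy. The function name `canonical_correlations` and the choice to solve the eigenproblem through the matrix Sxx^{-1} Sxy Syy^{-1} Syx are assumptions made for this example; the thesis itself may use a different (for instance, more numerically stable) formulation.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Sample canonical correlations between X (n x p) and Y (n x q).

    Illustrative sketch only: assumes n > p + q so that the sample
    covariance matrices Sxx and Syy are invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Covariance and cross-covariance matrices of the two variable sets.
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)

    # The squared canonical correlations are the eigenvalues of
    # Sxx^{-1} Sxy Syy^{-1} Syx (equivalent to the generalized problem
    # Sxy Syy^{-1} Syx a = rho^2 Sxx a).
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    eigvals = np.linalg.eigvals(M).real

    # Guard against round-off outside [0, 1], then sort in decreasing order.
    eigvals = np.sort(np.clip(eigvals, 0.0, 1.0))[::-1]
    m = min(X.shape[1], Y.shape[1])
    return np.sqrt(eigvals[:m])
```

The returned vector has length min(p, q) and is sorted from the largest to the smallest canonical correlation, which is the form the eigenvalue-based tests take as input.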
Canonical correlation analysis (CCA) assesses linear relationships between two sets of variables. Canonical correlations are obtained via a generalized eigen-decomposition of the covariance and cross-covariance matrices, and canonical pairs are based on the corresponding eigenvectors. Two common eigenvalue-based significance tests are the traditional chi-square test (assuming fixed p and n → ∞) and the Tracy-Widom test (for high-dimensional settings where both p and n → ∞). In this work, the eigenvector-based "dimension inference using variable augmentation" (DIVA) method, originally developed for dimension testing in the sufficient dimension reduction framework, is applied to CCA. We evaluate these methods via simulation studies with varying dimensions, sample sizes, and correlation strengths. Our numerical results show that, for testing whether the two sets of variables are correlated, the chi-square test performs better under weak correlations, while the Tracy-Widom test excels under strong correlations. For selecting the number of significant canonical pairs, the chi-square test is recommended for weak correlations, whereas DIVA is preferable for strong correlations.
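The sequential use of the chi-square test to select the number of canonical pairs can be sketched as follows. This example assumes the common Bartlett-type statistic, -(n - 1 - (p + q + 1)/2) Σ_{i>k} log(1 - r_i²), referred to a chi-square distribution with (p - k)(q - k) degrees of freedom; the exact variant used in the thesis is not stated in the abstract. The helper names `chi2_sf` and `estimate_dimension` are hypothetical, and the tail probability is written with the standard library only (in practice `scipy.stats.chi2.sf` gives the same quantity).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-z) * sum_{j < m} z^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= z / j
            total += term
        return math.exp(-z) * total
    # Odd df = 2m + 1: erfc(sqrt(z)) + exp(-z) * sum_{j=1..m} z^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(z))
    term = math.exp(-z) * math.sqrt(z) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= z / (j + 0.5)
    return total


def estimate_dimension(r, n, p, q, alpha=0.05):
    """Sequentially test H0: at most k nonzero canonical correlations.

    r: sample canonical correlations in decreasing order, each < 1.
    Returns the smallest k whose null hypothesis is not rejected at
    level alpha, or min(p, q) if every null is rejected.
    """
    m = min(p, q)
    scale = n - 1 - (p + q + 1) / 2.0  # Bartlett's correction factor (assumed form)
    for k in range(m):
        stat = -scale * sum(math.log(1.0 - ri ** 2) for ri in r[k:m])
        df = (p - k) * (q - k)
        if chi2_sf(stat, df) > alpha:
            return k
    return m
```

For example, `estimate_dimension([0.9, 0.05], n=200, p=3, q=2)` rejects the hypothesis of no correlation but not the hypothesis of at most one nonzero canonical correlation, so it selects one canonical pair. No multiplicity adjustment across the sequential steps is applied in this sketch.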
Andrew, G., Arora, R., Bilmes, J., & Livescu, K. (2013). Deep canonical correlation analysis. International Conference on Machine Learning, 1247–1255.
Ayo, F. E., Ogundele, L. A., Olakunle, S., Awotunde, J. B., & Kasali, F. A. (2024). A hybrid correlation-based deep learning model for email spam classification using fuzzy inference system. Decision Analytics Journal, 10, 100390.
Bao, Z., Hu, J., Pan, G., & Zhou, W. (2019). Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case. The Annals of Statistics, 47, 612–640.
Bartlett, M. S. (1941). The statistical significance of canonical correlations. Biometrika, 32, 29–37.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29, 1165–1188.
Cai, Y., Xu, Q., Yang, J., Tan, J., & Xue, J. (2024). Canonical correlation-based relationships between social support and sleep quality in a hospital psychiatric outpatient population with examining the mediating roles of anxiety and depressive symptoms. Scientific Reports, 14, 27139.
Chaghooshi, A. J., Soltani-Neshan, M., & Moradi-Moghadam, M. (2015). Canonical correlation analysis between supply chain quality management and competitive advantages. Foundations of Management, 7, 83–92.
Dos Santos, S. F., & Brandi, H. S. (2014). A canonical correlation analysis of the relationship between sustainability and competitiveness. Clean Technologies and Environmental Policy, 16, 1735–1746.
Fukumizu, K., Bach, F. R., & Gretton, A. (2007). Statistical consistency of kernel canonical correlation analysis. Journal of Machine Learning Research, 8, 361–383.
Gao, J., Zheng, C., & Wang, P. (2010). Online removal of muscle artifact from electroencephalogram signals based on canonical correlation analysis. Clinical EEG and Neuroscience, 41, 53–59.
Górecki, T., & Smaga, Ł. (2017). Testing conditional independence via estimated residuals with applications to gene expression data. Journal of Multivariate Analysis, 159, 82–100.
Hotelling, H. (1936). Relations between two sets of variables. Biometrika, 28, 321–377.
Huang, S.-H., Shedden, K., & Chang, H.-w. (2023). Inference for the dimension of a regression relationship using pseudo-covariates. Biometrics, 79, 2394–2403.
Johnstone, I. M., Ma, Z., Perry, P. O., & Shahram, M. (2022). RMTstat: Distributions, statistics and tests derived from random matrix theory [R package version 0.3.1].
Kim, S. M., & Baek, J.-G. (2015). Correlation analysis on semiconductor process variables using CCA (Canonical Correlation Analysis): Focusing on the relationship between the voltage variables and fail bit counts through the wafer process. Journal of Korean Institute of Industrial Engineers, 41, 579–587.
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12, 153–157.
Menotti, A., & Puddu, P. E. (2024). Canonical correlation for the analysis of lifestyle behaviors versus cardiovascular risk factors and the prediction of cardiovascular mortality: A population study. Hearts, 5, 29–44.
Talaei Pashiri, R., Rostami, Y., & Mahrami, M. (2020). Spam detection through feature selection using artificial neural network and sine–cosine algorithm. Mathematical Sciences, 14, 193–199.
Tracy, C. A., & Widom, H. (2002). Distribution functions for largest eigenvalues and their applications. Proceedings of the International Congress of Mathematicians (Beijing, 2002), 1, 587–596.