
Graduate student: 張家瑜 (Chia-Yu Zhang)
Thesis title: Comparison of Dimensionality Testing Methods in Canonical Correlation Analysis
Advisor: 黃世豪
Oral examination committee:
Degree: Master
Department: College of Science, Graduate Institute of Statistics
Year of publication: 2025
Academic year of graduation: 113 (ROC calendar, i.e., 2024-2025)
Language: Chinese
Number of pages: 54
Keywords: Canonical correlation analysis, chi-square test, dimension inference using variable augmentation (DIVA), Tracy-Widom test, dimensionality test, sequential testing
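  • Canonical correlation analysis (CCA) is a statistical method for measuring the linear relationship between two sets of variables. The canonical correlation coefficients come from a generalized eigen-decomposition of the covariance matrices of the two variable sets and their cross-covariance matrix. The canonical variate pairs are built from the corresponding eigenvectors. Two tests are most commonly used to decide whether the canonical correlations are significant, and both are eigenvalue-based:
    - the traditional chi-square test, which assumes a fixed dimension while the sample size tends to infinity;
    - the Tracy-Widom test, which suits high-dimensional settings and assumes that the dimension and the sample size tend to infinity together.
  In recent years, an eigenvector-based alternative called dimension inference using variable augmentation (DIVA) was developed. It was originally intended for dimension testing within the sufficient dimension reduction framework, and in this thesis we show that it can also test the significance of canonical correlations in CCA. To assess how these methods perform in finite samples, we ran a comprehensive simulation study across different dimension settings, sample sizes and correlation strengths. For testing whether the two sets of variables are correlated at all, the chi-square test performs better when the correlation is weak, whereas the Tracy-Widom test is more suitable when it is strong. For estimating the number of significant canonical variate pairs, the chi-square test works better under weak correlation, whereas the s-DIVA method performs better under strong correlation.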
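The generalized eigen-decomposition described above can be sketched in a few lines of NumPy. This is an illustration only and is not the thesis's code. The function name `canonical_correlations` and the simulated data are hypothetical, and the sketch assumes ordinary sample covariance matrices. It solves Sxy Syy^{-1} Syx a = rho^2 Sxx a by using the Cholesky factor of Sxx to turn the generalized problem into a standard symmetric eigenproblem. The square roots of the eigenvalues are the canonical correlations, and the back-transformed eigenvectors are the X-side canonical weights.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Sample canonical correlations (descending) and X-side weight vectors.

    Solves the generalized symmetric eigenproblem
        Sxy Syy^{-1} Syx a = rho^2 Sxx a
    by reducing it to a standard eigenproblem with the Cholesky factor of Sxx.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                   # center each variable
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)                 # covariance matrix of X
    Syy = Yc.T @ Yc / (n - 1)                 # covariance matrix of Y
    Sxy = Xc.T @ Yc / (n - 1)                 # cross-covariance matrix
    M = Sxy @ np.linalg.solve(Syy, Sxy.T)     # Sxy Syy^{-1} Syx

    L = np.linalg.cholesky(Sxx)               # Sxx = L L^T
    Linv = np.linalg.inv(L)
    C = Linv @ M @ Linv.T                     # same eigenvalues as the generalized problem
    C = (C + C.T) / 2                         # remove rounding asymmetry
    evals, W = np.linalg.eigh(C)              # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]           # sort descending
    evals = np.clip(evals[order], 0.0, 1.0)
    A = Linv.T @ W[:, order]                  # a = L^{-T} w, so a^T Sxx a = 1
    k = min(X.shape[1], Y.shape[1])           # at most min(p, q) canonical pairs
    return np.sqrt(evals[:k]), A[:, :k]


# Usage: one shared latent signal z links the first columns of X and Y.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal(n)
X = np.column_stack([z + rng.standard_normal(n), rng.standard_normal((n, 2))])
Y = np.column_stack([z + rng.standard_normal(n), rng.standard_normal(n)])
rho, A = canonical_correlations(X, Y)
print(np.round(rho, 3))   # population canonical correlations are 0.5 and 0
```

The Cholesky reduction keeps the problem symmetric, so `np.linalg.eigh` applies. With SciPy available, `scipy.linalg.eigh(M, Sxx)` solves the same generalized problem directly.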


    Canonical Correlation Analysis (CCA) assesses linear relationships between two sets of variables. Canonical correlations are obtained through a generalized eigen-decomposition of the covariance and cross-covariance matrices, and canonical pairs are constructed from the corresponding eigenvectors. Two common eigenvalue-based significance tests are available:
    - the traditional chi-square test, which assumes that p is fixed as n → ∞;
    - the Tracy-Widom test, designed for high-dimensional settings where both p and n → ∞.
    In this work, the eigenvector-based method "dimension inference using variable augmentation" (DIVA) is applied to CCA. DIVA was originally developed for dimension testing in the sufficient dimension reduction framework. We evaluate these methods through simulation studies with varying dimensions, sample sizes and correlation strengths. Our numerical results show that the chi-square test performs better under weak correlations, while the Tracy-Widom test excels under strong correlations. For selecting the number of significant canonical pairs, the chi-square test is recommended for weak correlations, whereas DIVA is preferable for strong correlations.
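A chi-square approach to selecting the number of significant canonical pairs can be illustrated with Bartlett's approximation. This sketch assumes the thesis's chi-square test takes that common form. For k = 0, 1, ..., the statistic -(n - 1 - (p + q + 1)/2) * sum_{i>k} log(1 - r_i^2) is compared with a chi-square distribution on (p - k)(q - k) degrees of freedom, and testing stops at the first non-rejection. All function names are hypothetical. To stay short, the sketch uses two swapped-in shortcuts so that only NumPy is needed:
- it computes the sample canonical correlations by an equivalent QR/SVD route rather than by the eigen-decomposition;
- it evaluates the chi-square tail with a closed-form series for integer degrees of freedom.

The sketch does not cover the Tracy-Widom or DIVA tests.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) for an integer df >= 1.

    Closed-form series for integer degrees of freedom; with SciPy available,
    scipy.stats.chi2.sf(x, df) would normally be used instead.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:                           # even df: finite Poisson-type sum
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))           # odd df: start from the df = 1 case
    term = math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * term
        term *= h / (i + 0.5)
    return total


def sample_canonical_correlations(X, Y):
    """Canonical correlations as singular values of Qx^T Qy (QR of centered data)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.clip(np.linalg.svd(Qx.T @ Qy, compute_uv=False), 0.0, 1.0)


def bartlett_sequential(rho, n, p, q, alpha=0.05):
    """Sequential chi-square estimate of the number of nonzero canonical correlations.

    For k = 0, 1, ..., tests H0_k: rho_{k+1} = ... = rho_m = 0 (m = min(p, q))
    with Bartlett's statistic on (p - k)(q - k) degrees of freedom and stops
    at the first k that is not rejected.
    """
    m = min(p, q)
    scale = n - 1 - (p + q + 1) / 2.0         # Bartlett's correction factor
    pvalues = []
    for k in range(m):
        # guard against log(0) when a sample correlation is numerically 1
        stat = -scale * sum(math.log(max(1.0 - r * r, 1e-300)) for r in rho[k:m])
        pval = chi2_sf(stat, (p - k) * (q - k))
        pvalues.append(pval)
        if pval > alpha:
            return k, pvalues
    return m, pvalues


# Usage: two shared latent factors, so the true number of canonical pairs is 2.
rng = np.random.default_rng(1)
n, p, q = 300, 4, 3
Z = rng.standard_normal((n, 2))
X = rng.standard_normal((n, p))
X[:, :2] += Z
Y = rng.standard_normal((n, q))
Y[:, :2] += Z
rho = sample_canonical_correlations(X, Y)
d_hat, pvals = bartlett_sequential(rho, n, p, q)
print(d_hat, [round(v, 4) for v in pvals])
```

Each step is tested at level alpha with no multiplicity adjustment. As a result, the estimated dimension occasionally exceeds the true one whenever the first true null hypothesis happens to be rejected.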

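    Abstract (Chinese)
    Abstract (English)
    Contents
    1. Introduction
    2. Methods
      2.1 Canonical correlation analysis
      2.2 Chi-square test
      2.3 Tracy-Widom test
      2.4 The DIVA method
      2.5 Comparison of methods
    3. Simulation study
      3.1 Comparison of independence tests
        3.1.1 Scenario 1: no correlation (H_0 true)
        3.1.2 Scenario 2: correlation present (H_1 true)
      3.2 Comparison of sequential dimension tests
        3.2.1 Scenario 1: canonical correlation dimension of one
        3.2.2 Scenario 2: canonical correlation dimension of two
        3.2.3 Scenario 3: canonical correlation dimension of three
        3.2.4 Summary
    4. Real data analysis
    5. Conclusion
    Appendix A. Introduction to the Tracy-Widom distribution
    Appendix B. Proof of Theorem 3
    Appendix C. Comparison of sequential dimension tests with n = 1000
      C.1 Scenario 1: canonical correlation dimension of one
      C.2 Scenario 2: canonical correlation dimension of two
      C.3 Scenario 3: canonical correlation dimension of three
    References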

