| Graduate student: | 田銀錦 Yin-Jing Tien |
|---|---|
| Thesis title: | 多變量變異數分析建模之矩陣視覺化 Matrix Visualization for MANOVA Modeling |
| Advisors: | 鄒宗山 Tsung-Shan Tsou; 陳君厚 Chun-houh Chen |
| Oral defense committee: | |
| Degree: | 博士 Doctor |
| Department: | 理學院 - 統計研究所 Graduate Institute of Statistics |
| Academic year of graduation: | 98 |
| Language: | English |
| Number of pages: | 72 |
| Keywords (Chinese): | 矩陣視覺化、多變量變異數分析 |
| Keywords (English): | Matrix Visualization, MANOVA |
相較於箱型圖、散佈圖陣列、與平行座標圖等傳統方法,矩陣視覺化相關方法在高維度資料的視覺化與分群上,為更具效率且功能較強大之探索式資料分析工具。Chen (2002)的廣義相關圖是一個全方位的矩陣視覺化環境;許多的廣義相關圖模組與擴充功能已被開發以觀察更多樣的科學資料與更複雜的統計模型。本研究將針對多變量變異數分析之建模過程提出一套完整的矩陣視覺化程序,作為廣義相關圖環境的新成員。
現有的矩陣視覺化方法並不適用於多變量變異數分析模型中資料與資訊之群集與觀察,因為這些方法視個別樣本為分析基本單位,並未將統計模型效應納入分析與視覺化過程。為了將多變量變異數分析模型中相關資料與訊息結構做全面的視覺化呈現,必須同時探索模型與殘差兩個層次的資訊。在我們提出的方法中,不僅呈現共變異矩陣分解成模型及殘差,也將資料矩陣相對應的分解一併呈現。我們進一步將各類統計檢定的結果(多變量變異數分析與個別變異數分析之p-值)以矩陣視覺化方式呈現,預期對於多變量變異數分析建模過程,在資料描述上或是統計推論上,都能有更強大且完整的呈現與了解。藉由變數校正的方法,此一矩陣視覺化過程,亦得以延伸應用於多變數共變異數分析模型之矩陣視覺化呈現。
Matrix visualization (MV) methods are more efficient and powerful exploratory data analysis (EDA) tools for visualizing and clustering high-dimensional data than conventional displays such as box plots, scatterplot matrices, and parallel coordinate plots. Generalized association plots (GAP), introduced by Chen (2002), provide a general-purpose environment for matrix visualization. Many GAP modules and extensions have been developed to visualize scientific datasets in more varied formats and to study more complex statistical models. This study proposes a new member of the GAP family: a comprehensive matrix visualization procedure for multivariate analysis of variance (MANOVA) modeling.
Existing matrix visualization methods are not suitable for clustering and visualizing data and information structures in a MANOVA setting, because they treat individual samples as the basic unit of analysis and do not take model effects into account. To visualize the data and information structures in MANOVA modeling comprehensively, information must be explored at both the model and the residual levels simultaneously. Our proposed method visualizes not only the decomposition of the covariance matrix into model and residual components, but also the corresponding decomposition of the data matrix. We further convert statistical test results (p-values from MANOVA and from ANOVA for individual variables) into MV format, to obtain a more powerful and complete visualization for understanding MANOVA modeling at both the descriptive and inferential levels. By applying a covariate-adjusted MV before the MANOVA MV procedure, the proposed method can be extended to visualize MANCOVA modeling.
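The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes a one-way MANOVA layout and is not the GAP implementation; the function name `manova_decompose` and the simulated data are hypothetical. The data matrix is split into a model part (group means) and a residual part, and the total sum-of-squares-and-cross-products (SSCP) matrix is split accordingly into model and residual components.

```python
import numpy as np

def manova_decompose(Y, groups):
    """One-way layout: split Y (n x p) into fitted (group-mean) and residual parts,
    and split the total SSCP matrix T into model (H) and residual (E) components."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    grand_mean = Y.mean(axis=0)

    # Model level: every sample is replaced by the mean vector of its group.
    fitted = np.empty_like(Y)
    for g in np.unique(groups):
        mask = groups == g
        fitted[mask] = Y[mask].mean(axis=0)

    # Residual level: what remains after removing the group effect.
    residual = Y - fitted

    centered = Y - grand_mean
    model = fitted - grand_mean
    T = centered.T @ centered   # total SSCP
    H = model.T @ model         # model (hypothesis) SSCP
    E = residual.T @ residual   # residual (error) SSCP
    return fitted, residual, T, H, E

# Simulated example (hypothetical data): 3 groups, 10 samples each, 4 variables.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 10)
Y = rng.normal(size=(30, 4)) + groups[:, None] * np.array([1.0, 0.5, 0.0, -0.5])

fitted, residual, T, H, E = manova_decompose(Y, groups)

# Wilks' lambda, one common MANOVA test statistic, compares E with T = H + E.
wilks_lambda = np.linalg.det(E) / np.linalg.det(T)
```

Each of `fitted`, `residual`, `H`, and `E` is a matrix that could in principle be shown as a heat map, which is the kind of model-level and residual-level display the abstract refers to.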
Bagga S., Bracht J., Hunter S., Massirer K., Holtz J., Eachus R., Pasquinelli A. E. (2005), “Regulation by let-7 and lin-4 miRNAs results in target mRNA degradation,” Cell, 122(4), 553-563.
Barbarotto E., Schmittgen T. D., Calin G. A. (2008), “MicroRNAs and cancer: profile, profile, profile,” Int J Cancer, 122(5), 969-977.
Bar-Joseph Z., Demaine E. D., Gifford D. K., Srebro N., Hamel A. M., Jaakkola T. S. (2003), “K-ary clustering with optimal leaf ordering for gene expression data,” Bioinformatics, Special section on Microarray Analysis, 19, 1070-1078.
Bar-Joseph Z., Gifford D. K., Jaakkola T.S. (2001), “Fast optimal leaf ordering for hierarchical clustering,” Bioinformatics, 17 Suppl 1, S22-S29.
Bauer R. (1999), “Chemistry, analysis and immunological investigations of Echinacea phytopharmaceuticals,” In: Wagner H, editor. Immunomodulatory agents from plants. Boston (Mass): Birkhäuser Inc., 41–88.
Bertin, J. (1967), Sémiologie Graphique, Paris: Éditions Gauthier-Villars. English translation by William J. Berg as Semiology of Graphics: Diagrams, Networks, Maps, The University of Wisconsin Press, Madison, WI, 1983.
Campbell, N. A. and Mahon, R. J. (1974), “A multivariate study of variation in two species of rock crab of the genus Leptograpsus,” Australian Journal of Zoology, 22, 417–425.
Carmichael, J. W. and Sneath, P. H. A. (1969), “Taxometric maps,” Systematic Zoology, 18, 402 – 415.
Chen, C. H. (1996), “The properties and applications of the convergence of correlation matrices,” Proceedings of the Section on Statistical Graphics of the American Statistical Association, 49 – 54.
Chen, C. H. (1999), “Extensions of generalized association plots,” Proceedings of the Section on Statistical Graphics of the American Statistical Association, 111 – 116.
Chen C. H. (2002), “Generalized association plots: information visualization via iteratively generated correlation matrices,” Statistica Sinica, 12, 7-29.
Chen C. H., Hwu H. G., Jang W. J., Kao C. H., Tien Y. J., Tzeng S., and Wu H. M. (2004), “Matrix visualization and information mining,” Proceedings in Computational Statistics, 85-100.
Chen, H. Y., Yu, S. L., Chen, C. H., Chang, G. C., Chen, C. Y., Yuan, A., Cheng, C. L., Wang, C. H., Terng, H. J., Kao, S. F., Chen, W. J., Chen, J. J. W., Yang, P. C. (2007), “A Five-Gene Signature and Clinical Outcome in Non–Small-Cell Lung Cancer,” The New England Journal of Medicine, 356, 11-20.
Chen, J. J. W., Lin, Y. C., Yao, P. L., Yuan, A., Chen, H. Y., Shun, C. T., Tsai, M. F., Chen, C. H., and Yang, P. C. (2005), “Tumor-Associated Macrophages: The Double Edged Sword in Cancer Progression,” Journal of Clinical Oncology 23, 1-12.
Chaw S. M., Chen, C. H., Chen Shih-Huei (1998), “Flora of Taiwan”, In: Doufford, D.E., Hsieh, C.F., Huang, T.C., Lowry, P.P., II, Ohashi, H., Peng, C.-I. (Eds.), vol. 4, National Taiwan University.
Chien, S. C., Young P. H., Hsu Y. J., Chen C. H., Tien Y. J., Shiu S. Y., Li T. H., Yang C. W., Marimuthu P., Tsai L. F. L., Yang W. C., (2009), “Anti-diabetic properties of three common Bidens pilosa variants in Taiwan.” Phytochemistry, 70(10), 1246-1254.
Cho W. C. (2007), “OncomiRs: the discovery and progress of microRNAs in cancers,” Mol Cancer, 6:60.
Croce C. M. (2008), “Oncogenes and cancer,” N Engl J Med, 358(5), 502-511.
De Leeuw J. and Van Rijckevorsel J. (1980), “Homals and Princals. Some generalizations of principal components analysis,” Data Analysis and Informatics II, Diday et al. (eds.), 231-242, Amsterdam: North Holland.
Degerman R. (1982), “Ordered binary trees constructed through an application of Kendall's tau,” Psychometrika, 47, 523-527.
Eisen M. B., Spellman P. T., Brown P. O. and Botstein D. (1998), “Cluster analysis and display of genome-wide expression patterns,” Proc. Nat’l. Acad. Sci. U. S. A. 95, 14863-14868.
Enright A. J., John B., Gaul U., Tuschl T., Sander C., Marks D. S. (2003), “Micro-RNA targets in Drosophila,” Genome Biol, 5(1), R1.
Eulalio A., Huntzinger E., Izaurralde E. (2008), “Getting to the Root of miRNA-Mediated Gene Silencing,” Cell, 132(1), 9-14.
Filipowicz W., Bhattacharyya S. N., Sonenberg N. (2008), “Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight?” Nat Rev Genet, 9(2), 102-114.
Friendly M. (2002), “Corrgrams: exploratory displays for correlation matrices,” The American Statistician, 56(4), 316-324.
Friendly M. (2006), “HE plots for multivariate general linear models,” J Comput Graph Stat, 16, 421–444.
Friendly, M. and Kwan, E. (2003), “Effect ordering for data displays,” Computational Statistics and Data Analysis, 43(4), 509 – 539.
Gale N., Halperin C. W., and Costanzo C. M. (1984), “Unclassed matrix shading and optimal ordering in hierarchical cluster analysis,” J Classification, 1, 75-92.
Ghoniem M., Fekete J., and Castagliola P. (2005), “On the readability of graphs using node-link and matrix-based representations: a controlled experiment and statistical analysis,” Information Visualization 4(2), 114–135.
Gilroy C. M., Steiner J. F., Byers T., Shapiro H., and Georgian W. (2003), “Echinacea and truth in labeling,” Arch Intern Med, 163, 699–704.
Gower, J., and Digby, P. (1981), “Expressing complex relationships in two dimensions,” in Interpreting Multivariate Data, ed. V. Barnett, Chichester, U.K.: Wiley, 83–118.
Grimson A, Farh K. K., Johnston W. K., Garrett-Engele P, Lim L. P., and Bartel D. P. (2007) “MicroRNA targeting specificity in mammals: determinants beyond seed pairing,” Mol Cell, 27(1), 91-105.
Gruvaeus G., and Wainer H. (1972), “Two additions to hierarchical cluster analysis,” British Journal of Mathematical and Statistical Psychology, 25, 200-206.
Hartigan, J. A. (1972), “Direct clustering of a data matrix,” Journal of the American Statistical Association, 78, 123 -129.
Hermeking H. (2007), “p53 enters the microRNA world,” Cancer Cell, 12(5), 414-418.
Henry N., Fekete J., and McGuffin M. J. (2007), “NodeTrix: a hybrid visualization of social networks,” IEEE Trans Vis Comput Graph, 13(6), 1302–1309.
Holm, S. (1979), “A simple sequentially rejective multiple test procedure,” Scand. J. Statist, 6, 65-70.
Hochberg, Y. (1988), “A sharper Bonferroni procedure for multiple tests of significance,” Biometrika, 75, 800-803.
Hommel, G. (1988), “A stagewise rejective multiple test procedure based on a modified Bonferroni test,” Biometrika, 75, 383-386.
Hoti F., Tuulio-Henriksson, A., Haukka J., Partonen T., Holmström L., Lönnqvist J. (2004), “Family-based clusters of cognitive test performance in familial schizophrenia,” BMC Psychiatry, 4:20.
Hurley, C. B. (2004), “Clustering visualization of multidimensional data,” Journal of Computational and Graphical Statistics, 13, 788 – 806.
Hou, C. C., Chen, C. H., Yang, N. S., Chen, Y. P., Lo, C. P., Wang, S. Y., Tien, Y. J., Tsai, P. W., and Shyur, L. F., (2010), “Comparative metabolomics approach coupled with cell- and gene-based assays for species classification and anti-inflammatory bioactivity validation of Echinacea plants.” to appear in Journal of Nutritional Biochemistry.
Hwu, H. G., Chen, C. H., Hwang, T. J., Liu, C. M., Cheng, J. J., Lin, S. K., Liu, S. K., Chen, C. H., Chi, Y. Y., OuYoung, C. W., Lin, H. N., and Chen, W. J. (2002), “Symptom Patterns and Subgrouping of Schizophrenic Patients: Significance of Negative Symptoms Assessed on Admission,” Schizophrenia Research 56, 105-119.
Inselberg, A. (1985), “The Plane with Parallel Coordinates,” The Visual Computer, 1, 69-97.
Krek A., Grun D., Poy M.N., Wolf R., Rosenberg L., Epstein E. J., MacMenamin P., da Piedade I., Gunsalus K. C., Stoffel M., et al. (2005), “Combinatorial microRNA target predictions,” Nat Genet, 37(5), 495-500.
Lawler E. L., Lenstra J. K., Rinnooy Kan A. H. G., and Shmoys D. B. (1985), “The travelling salesman problem: A guided tour of combinatorial optimization,” Wiley, Chichester.
Lee, Y. S., Chen, C. H., Chao, A., Chen, E. S., Wei, M. L., Chen, L. K., Yang, K., Lin, M. C., Wang, Y. H., Liu, J. W., Eng, H. L., Chiang, P. C., Wu, T. S., Tsao, K. C., Huang, C. G., Tien, Y. J., Wang, T. H., Wang, H. S., and Lee, Y. S. (2005), “Molecular signature of clinical severity in recovering patients with severe acute respiratory syndrome coronavirus (SARS-CoV),” BMC Genomics, 6:132.
Lenstra, J. (1974), “Clustering a data array and the traveling salesman problem,” Operations Research, 22, 413 – 414.
Lewis B. P., Burge C. B., and Bartel D. P. (2005), “Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets,” Cell, 120(1), 15-20.
Lewis B. P., Shih I. H., Jones-Rhoades M. W., Bartel D. P., and Burge C. B. (2003), “Prediction of mammalian microRNA targets,” Cell, 115(7), 787-798.
Liang Y., Ridzon D., Wong L., and Chen C. (2007), “Characterization of micro-RNA expression profiles in normal human tissues,” BMC Genomics, 8:166.
Liiv I. (2010), “Seriation and Matrix Reordering Methods: An Historical Overview,” Statistical analysis and data mining, DOI:10.1002/sam.10071.
Lim L. P., Lau N. C., Garrett-Engele P., Grimson A., Schelter J. M., Castle J., Bartel D. P., Linsley P. S., and Johnson J. M. (2005), “Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs,” Nature, 433(7027), 769-773.
Lin, S. H., Liu, C. M., Liu, Y. L., Fann, C. S. J., Hsiao, P. C., Wu, J. Y., Hung, S. I., Chen, C. H., Wu, H. M., Jou, Y. S., Liu, S. K., Hwang, T. J., Hsieh, M. H., Chang, C. C., Yang, W. C., Lin, J. J., Chou, F. H. C., Faraone, S. V., Tsuang, M. T., Hwu, H. G., and Chen, W. J. (2009), “Clustering by neurocognition for fine-mapping of the schizophrenia susceptibility loci on chromosome 6p,” Genes, Brain and Behavior, 8(8), 785-794.
Ling, R.L. (1973), “A computer generated aid for cluster analysis,” Communications of the ACM, 16(6), 355 – 361.
Marchette, D. J., and Solka, J. L. (2003), “Using data images for outlier detection,” Computational Statistics and Data Analysis, 43, 541 – 552.
Minnotte, M. and West, W. (1998), “The data image: a tool for exploring high dimensional data sets,” in Proceedings of the ASA Section on Statistical Graphics, Dallas, Texas, 25 – 33.
Murdoch, D. J. and Chow, E. D. (1996), “A graphical display of large correlation matrices,” The American Statistician, 50, 178 – 180.
O’Connor T., Dunn, J., Jenkins J., Pickering K., and Rasbash J. (2001), “Family settings and children's adjustment: differential adjustment within and across families,” The British Journal of Psychiatry, 179, 110-115.
Raloff J. (2003), “Herbal lottery: what's on a dietary supplement's label may not be what's in the bottle,” Sci News, 163.
Rom, D. M. (1990), “A sequentially rejective procedure based on a modified Bonferroni inequality,” Biometrika, 77, 663-665.
Seggerson K., Tang L., Moss E. G. (2002), “Two genetic circuits repress the Caenorhabditis elegans heterochronic gene lin-28 after translation initiation,” Dev Biol, 243(2), 215-225.
Shen Z. and Ma K. L. (2007), “Path visualization for adjacency matrix,” Proceedings of Eurographics/IEEE VGTC Symposium on Visualization, 83-90.
Sher, Y. P., Chou, C. C., Chou, R. H., Wu, H. M., Chang, W. W-S, Chen, C. H., Wu, C. W., Yang, P. C., Yu, C. L., and Peck, K. (2006), “Human Kallikrein 8 Protease Confers a Favorable Clinical Outcome in Non–Small Cell Lung Cancer by Suppressing Tumor Cell Invasiveness,” Cancer Research, 66, 11763-11770.
Shyu A. B., Wilkinson M. F., and van Hoof A. (2008), “Messenger RNA regulation: to translate or to degrade,” EMBO J, 27(3), 471-481.
Simes, R. J. (1986), “An improved Bonferroni procedure for multiple tests of significance,” Biometrika, 73, 751-754.
Slagle, J. R., Chang, C. L. and Heller, S. R. (1975), “A clustering and data-reorganizing algorithm,” IEEE Trans. Syst. Man Cybern, 5, 125-128.
Tibshirani, R., Hastie, T., Eisen, M., Ross, D., Botstein, D., Brown, P. (1999), “Clustering methods for the analysis of DNA microarray data” Technical Report, Stanford University, Oct.
Tien Y. J., Lee, Y. S., Wu, H. M., and Chen C. H. (2008), “Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles,” BMC Bioinformatics, 9, 155.
Tubb, A., Parker, A., and Nickless, G. (1980), “The analysis of Romano-British pottery by atomic absorption spectrophotometer,” Archaeometry, 22, 153–171.
Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
Wang Y. P., and Li K. B. (2009), “Correlation of expression profiles between microRNAs and mRNA targets using NCI-60 data,” BMC Genomics, 10, 218.
Wegman, E. J. (1990), “Hyperdimensional data analysis using parallel coordinates,” Journal of the American Statistical Association. 85, 664 – 675.
Wilkinson L., and Friendly M. (2009), “The History of the Cluster Heat Map,” The American Statistician, 63(2), 179-184.
Wu H. M., Tien Y. J., and Chen C. H. (2010b), “GAP: a graphical environment for matrix visualization and cluster analysis,” Computational Statistics and Data Analysis. 54, 767-778.
Wu, H. M., Tien Y. J., Hwu, H. G, and Chen, C. H. (2010a), “Matrix visualization Covariate adjustment”, Manuscript.
Wu L., and Belasco J. G. (2005), “Micro-RNA regulation of the mammalian lin-28 gene during neuronal differentiation of embryonal carcinoma cells,” Mol Cell Biol, 25(21), 9198-9208.
Yeh, L. L., Hwu, H. G., Chen, C. H., Chen, C. H., and Wu, A. C. C., (2008), “Factors Related to Perceived Needs of Primary Caregivers of Patients with Schizophrenia,” Journal of the Formosan Medical Association. 107 (8), 644-652.
Yin J.Q., Zhao R.C., and Morris K.V. (2008), “Profiling microRNA expression with microarrays,” Trends Biotechnol, 26(2), 70-76.