Graduate student: 張美虹 (Mei-hong Chang)
Thesis title: 主成分分析與叢集分析於DNA微陣列數據前處理的應用與實作
Application and Implementation of PCA and Clustering in DNA Microarray Data Preprocessing
Advisor: 王孫崇 (Sun-chong Wang)
Oral defense committee:
Degree: Master
Department: 生醫理工學院 (College of Health Sciences and Technology) - Graduate Institute of Systems Biology and Bioinformatics
Year of publication: 2014
Academic year of graduation: 102
Language: Chinese
Number of pages: 70
Chinese keywords: 微陣列晶片, R語言, 主成分分析, 叢集分析
Foreign-language keywords: Microarray, R language, Principal component analysis, Clustering
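  • In recent years, analysis software for microarray gene chips has become increasingly easy to obtain, yet none of it is fully satisfactory. Commercial or standardized analysis packages also limit analysts' efficiency and flexibility.
    To improve the quality and efficiency of gene chip data analysis, this thesis builds an analysis workflow better suited to biochips produced in-house by industry. Analysis software developed by others was integrated through scripts written in the R statistical language, and a module was built to carry out principal component analysis (PCA) and clustering. The module replaces the several analysis programs used previously and automates the PCA and clustering steps of the quality assessment stage of microarray analysis.
    Once the module was complete, the automated workflow replaced the original time-consuming process. The time needed for this step on a single case fell by more than 80%, and the simpler operation made the work easier for analysts, so the goal of improving analysis efficiency was met. The flexibility of R also makes it easier to build workflows tailored to in-house biochip analysis in the future.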


    To improve the efficiency of quality assessment in microarray data preprocessing, an automated analysis pipeline for clustering and principal component analysis (PCA) was developed in the R language.
    The automated analysis module replaced the third-party analysis software used previously and was integrated into the routine pipeline for microarray analysis.
    The automated pipeline for PCA and clustering reduced processing time by almost 80% compared to the previous approach, showing that the project goal was met. The R-based package is also flexible enough to be readily incorporated into other bioinformatics applications.
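
The two quality assessment steps named in the abstract, PCA and hierarchical clustering of the arrays, can be illustrated with a minimal sketch. This is not the thesis's R module: it is written in plain Python for illustration, the function names (`first_pc_scores`, `average_linkage`) are hypothetical, the expression values are made-up toy numbers, and the choices of 1 - Pearson correlation as the distance and average linkage are assumptions rather than settings confirmed by the record.

```python
import math

def center_genes(matrix):
    """Subtract each gene's mean across samples (rows = genes, columns = samples)."""
    return [[v - sum(row) / len(row) for v in row] for row in matrix]

def first_pc_scores(matrix, iterations=200):
    """First principal component score of each sample.

    Uses power iteration on the sample-by-sample Gram matrix of the
    gene-centered data; the sign of the scores is arbitrary.
    """
    x = center_genes(matrix)
    n = len(x[0])
    gram = [[sum(row[i] * row[j] for row in x) for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
    return [c * math.sqrt(eigenvalue) for c in v]

def pearson(a, b):
    """Pearson correlation between two equally long profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = math.sqrt(sum((p - ma) ** 2 for p in a))
    sb = math.sqrt(sum((q - mb) ** 2 for q in b))
    return cov / (sa * sb)

def average_linkage(matrix, k):
    """Agglomerative clustering of samples (1 - Pearson distance), stopped at k clusters."""
    samples = list(zip(*matrix))
    n = len(samples)
    dist = [[1 - pearson(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Toy data (invented): 4 genes x 4 arrays; arrays 0-1 and 2-3 are replicate pairs.
expr = [
    [2.0, 2.1, 8.0, 8.2],
    [5.0, 5.2, 1.0, 1.1],
    [3.0, 2.9, 3.1, 3.0],
    [7.0, 7.1, 2.0, 2.2],
]

scores = first_pc_scores(expr)
groups = average_linkage(expr, 2)
print(groups)  # → [[0, 1], [2, 3]]
```

In a quality check of this kind, replicate arrays are expected to sit close together on the first principal component and to fall into the same cluster; an array that separates from its replicates would be flagged for review.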

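    Chinese Abstract
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
      1-1 DNA Microarrays
      1-2 Systems Biology
      1-3 Microarray Data Analysis
        1-3-1 Principal Component Analysis
        1-3-2 Clustering
      1-4 Research Motivation
    2. Materials and Methods
      2-1 Microarray Chip Data
        2-1-1 Introduction to 華聯生技 Microarray Chips
        2-1-2 Contents of the Analyzed Chip Data
      2-2 Development Platform
        2-2-1 R Language
        2-2-2 RStudio
      2-3 Analysis Workflow Architecture
        2-3-1 Original Workflow for Generating Analysis Results
        2-3-2 Automated Workflow for Generating Analysis Results
      2-4 Data Preprocessing Procedure
      2-5 PCA Method and Procedure
      2-6 Hierarchical Clustering Method and Analysis Procedure
    3. Results
      3-1 PCA Results
        3-1-1 Comparison with ArrayTrack PCA Results
        3-1-2 PCA Results for Different Gene Data
      3-2 Clustering Results
      3-3 Analysis Module Setup
        3-3-1 Required Files
        3-3-2 Parameter Settings
        3-3-3 Output File Reference
    4. Case Studies
      4-1 Case Analysis (1): No. 1****102804
      4-2 Case Analysis (2): No. 1****022401
      4-3 Case Analysis (3): No. 2****101802
    5. Discussion
    6. Conclusion
    7. References
    Appendix 1
    Appendix 2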

