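Graduate student: Chun-Hsiang Chuang (莊淳翔)
Thesis title: Two-stage clustering method for identifying transcriptional regulatory sites based on gene expression profiles (利用兩階段群集方法以微陣列辨識轉錄調控點)
Advisor: Jorng-Tzong Horng (洪炯宗)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Academic year of graduation: 94 (ROC calendar)
Language: English
Number of pages: 43
Keywords (Chinese, translated): microarray, transcriptional regulation
Keywords (English): transcription, gene expression, microarray, regulatory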
  • Understanding gene regulation mechanisms is an important topic in molecular biology, and identifying transcriptional regulatory sites plays a key role in understanding how gene expression is regulated. Discovering transcriptional regulatory sites experimentally is accurate but time-consuming. Microarray technology can reveal the expression profiles of thousands of genes in parallel, and analyzing these profiles makes it possible to find groups of genes with similar expression patterns. Large-scale gene expression studies and genome sequencing projects provide a large amount of information that can be used to identify or predict transcriptional regulatory sites. In this study, we propose a two-stage clustering method that identifies transcriptional regulatory sites from both gene expression data and promoter sequence data, namely the occurrence counts of specific motifs in the upstream regions of genes. Using publicly available yeast cell-cycle data, we examine the correlation between time-series gene expression patterns and the occurrence frequency of several motifs in the upstream sequences. The results show that the two-stage clustering method yields tighter groups of co-regulated genes and helps identify transcriptional regulatory sites.
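    The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the greedy seed-based grouping, the thresholds `min_corr` and `max_dist`, the gene names, the sequences, and the motif list are all hypothetical stand-ins chosen for illustration, and the motifs are supplied by hand here rather than found by a motif discovery tool. Stage one groups genes by Pearson correlation of their expression profiles; stage two splits each group by how often each motif occurs in the upstream sequences.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile has no defined correlation
    return cov / math.sqrt(vx * vy)

def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count, start = 0, 0
    while True:
        idx = seq.find(motif, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1

def greedy_cluster(names, is_similar):
    """Assumed stand-in clustering: join the first cluster whose seed
    (first member) is similar, otherwise start a new cluster."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if is_similar(cluster[0], name):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def two_stage(expression, upstream, motifs, min_corr=0.9, max_dist=1.0):
    # Stage 1: group genes with similar expression profiles.
    stage1 = greedy_cluster(
        list(expression),
        lambda a, b: pearson(expression[a], expression[b]) >= min_corr,
    )
    # Motif occurrence vector for each gene's upstream sequence.
    counts = {g: [count_motif(upstream[g], m) for m in motifs]
              for g in expression}
    # Stage 2: split each expression cluster by motif occurrence counts.
    result = []
    for cluster in stage1:
        result.extend(greedy_cluster(
            cluster,
            lambda a, b: math.dist(counts[a], counts[b]) <= max_dist,
        ))
    return result

# Hypothetical toy data (not taken from the thesis).
EXPRESSION = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [1.1, 2.1, 2.9, 4.2],
    "geneD": [4.0, 3.0, 2.0, 1.0],
}
UPSTREAM = {
    "geneA": "TTACGCGTAAACGCGTTT",
    "geneB": "ACGCGTGGGGACGCGT",
    "geneC": "TTCACGAAATTGGGG",
    "geneD": "ACGCGTTTTTTTTTTT",
}
MOTIFS = ["ACGCGT", "CACGAAA"]

clusters = two_stage(EXPRESSION, UPSTREAM, MOTIFS)
```

    In this toy data, geneA, geneB, and geneC share an expression pattern, but only geneA and geneB share the same motif counts, so the second stage separates geneC from them. Any real analysis would replace the greedy grouping and the hand-picked motifs with the clustering and motif discovery methods described in the thesis.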

    Chapter 1 Introduction
      1.1 Background
        1.1.1 Gene Expression
        1.1.2 Microarray
        1.1.3 Regulation of gene expression
        1.1.4 Gene Ontology
      1.2 Motivation
      1.3 Goal
    Chapter 2 Related Works
      2.1 Clustering methods
        2.1.1 K-means clustering
        2.1.2 Hierarchical clustering
        2.1.3 Self-organizing map
      2.2 Motif discovery
        2.2.1 MEME
        2.2.2 Gibbs sampler
        2.2.3 Consensus
      2.3 Integrate system for gene expression analysis
    Chapter 3 Materials and Methods
      3.1 Materials
      3.2 Overview of our methodology
      3.3 Similarity measurement for gene expression data
      3.4 Clustering method
      3.5 Motif discovery
      3.6 Eliminate redundant motifs
      3.7 Extract motif occurrence frequency
    Chapter 4 Results
      4.1 First stage clustering
      4.2 Second stage clustering
      4.3 Case study
    Chapter 5 Discussion
    Appendix

