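Graduate student: 黃楹楹
Ying-Ying Huang
Thesis title: Genome-wide Co-occurrence Detection of Putative Regulatory Sites Based on Co-regulated Gene Clusters in Yeast Genomes
Chinese title (translated): Identifying Combinations of Regulatory Factors That Cause Expression Differences among Genes of Different Functions
Advisor: 洪炯宗
Jorng-Tzong Horng
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Academic year of graduation: 91 (ROC calendar)
Language: English
Number of pages: 41
Keywords (Chinese, translated): gene expression regulatory factors
Keywords (English): regulatory sites, transcription factor binding sites, pattern discovery, mining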
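  • This thesis annotates known transcription factor binding sites, repetitive sequences, and tool-predicted binding sequences located in the promoter regions upstream of genes. Data mining techniques are applied to the combinations of repetitive sequences with transcription factors and of tool-predicted binding sequences with transcription factors. Redundant rules are then removed from the resulting association rules, statistical methods are used to select the more meaningful ones, and possible transcription factor binding sites are sought among the repetitive sequences and tool-predicted binding sequences in those rules. Because the binding of different combinations of transcription factors leads to different gene transcription, we identify the combinations that best discriminate among groups of related genes with different functions. Our experiments were carried out mainly on yeast and worm genomes. The study yielded valuable information on transcription factors, and the results are publicly available at http://dbms68.csie.ncu.edu.tw/REDB/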


    Association rule mining, a data mining approach, is applied to find associations among combinations of candidate regulatory sites and known regulatory sites. We apply a set of statistical methods to characterize the site combinations in a co-regulated gene group and compare them statistically against other co-regulated gene groups, in order to find the site combinations that occur preferentially, and with significant frequency, in a specific gene group. The regulatory sites in these group-specific site combinations are putative transcription factor binding sites. The methodology facilitates the analysis of combinatorial interactions among multiple transcription factors and is applied to the promoter regions of ORFs in two organisms, Saccharomyces cerevisiae and Caenorhabditis elegans. The results are available at http://dbms68.csie.ncu.edu.tw/REDB/
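
    The two steps named in the abstract, mining associations between sites and testing whether a site combination is specific to one gene group, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the gene names, site labels, and thresholds are hypothetical, only pairwise rules are mined, and the hypergeometric tail probability is an assumed stand-in for whatever significance test the thesis actually uses.

```python
from itertools import combinations
from math import comb


def pair_rules(promoters, min_support=0.5, min_confidence=0.6):
    """Mine pairwise association rules (lhs -> rhs) between sites.

    promoters: dict mapping a gene name to the set of site labels
    found in its promoter (hypothetical input format).
    Returns a list of (lhs, rhs, support, confidence) tuples.
    """
    n = len(promoters)
    single = {}  # site -> number of promoters containing it
    pair = {}    # (site_a, site_b) -> number of promoters containing both
    for sites in promoters.values():
        for s in sites:
            single[s] = single.get(s, 0) + 1
        for a, b in combinations(sorted(sites), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, K of which carry the combination.

    Assumed significance test: a small value suggests the combination is
    over-represented in the gene group of size n.
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Hypothetical toy data: four promoters with labelled sites.
promoters = {
    "gene1": {"siteA", "siteB"},
    "gene2": {"siteA", "siteB", "repeat1"},
    "gene3": {"siteA"},
    "gene4": {"repeat1"},
}
rules = pair_rules(promoters)

# Group of 2 genes, both carrying a combination present in 2 of 4 genes overall.
p_value = hypergeom_tail(k=2, n=2, K=2, N=4)
```

    In this toy input, the pair siteA and siteB co-occurs in two of four promoters, so both directions of the rule pass the assumed thresholds. A fuller treatment would extend the mining to larger site sets, as Apriori does, and would correct the p-values for multiple testing.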

    Contents
    Chapter 1 Introduction
      1.1 Motivations
      1.2 Goals
      1.3 Background
    Chapter 2 Related Works
      2.1 Pattern discovery tools
      2.2 TRANSFAC database
      2.3 GenBank
      2.4 RSDB [3]
      2.5 Functional related gene groups
      2.6 Co-regulated gene groups
    Chapter 3 Materials and Methods
      3.2 Preprocessing phase
      3.3 Prediction phase
        3.3.1 Over-represented repeats statistics analysis
        3.3.2 Known site homologs and DNA binding motifs discovery
      3.4 Annotation phase
        3.4.1 Site co-occurrence Analysis
        3.4.2 Significance filtering
        3.4.3 Distance filtering
    Chapter 5 Results
      5.1 Positional biased of motif groups
      5.2 Group specific site combinations
    Chapter 6 Summary
    References
    Appendix
      A. Database schema of web
      B. Comparison with other approaches
      C. Enrichment of gene expression clusters for ORFs within MIPS functional categories

    List of Figures
      Figure 1. The transcriptional regulation of a gene.
      Figure 2. Top periodic clusters, their motifs and overall distribution in all clusters.
      Figure 3. System Flow
      Figure 4. Map showing the locations of experimentally verified binding sites of Mat

    References
    1. Horng, J.T., et al., The repetitive sequence database and mining putative regulatory elements in gene promoter regions. J Comput Biol, 2002. 9(4): p. 621-40.
    2. Wingender, E., et al., The TRANSFAC system on gene expression regulation. Nucleic Acids Res, 2001. 29(1): p. 281-3.
    3. Horng, J.T., J.H. Lin, and C.Y. Kao. RSDB-A Database of Repetitive Elements in Complete Genomes. in Proceedings of the Atlantic Symposium on Computational Biology and Genome Information Systems & Technology. 2001. Durham, NC, USA.
    4. Roth, F.P., et al., Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol, 1998. 16(10): p. 939-45.
    5. Bailey, T.L. and C. Elkan, The value of prior knowledge in discovering motifs with MEME. Proc Int Conf Intell Syst Mol Biol, 1995. 3: p. 21-9.
    6. Lawrence, C.E., et al., Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 1993. 262(5131): p. 208-14.
    7. Sinha, S. and M. Tompa, A statistical method for finding transcription factor binding sites. Proc Int Conf Intell Syst Mol Biol, 2000. 8: p. 344-54.
    8. Hughes, J.D., et al., Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol, 2000. 296(5): p. 1205-14.
    9. Kel-Margoulis, O.V., et al., COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation. Nucleic Acids Res, 2000. 28(1): p. 311-5.
    10. Bjorklund, S. and Y.J. Kim, Mediator of transcriptional regulation. Trends Biochem Sci, 1996. 21(9): p. 335-7.
    11. Neuwald, A.F., J.S. Liu, and C.E. Lawrence, Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci, 1995. 4(8): p. 1618-32.
    12. Bailey, T.L. and C. Elkan, Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol, 1994. 2: p. 28-36.
    13. Liu, X., D.L. Brutlag, and J.S. Liu, BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput, 2001: p. 127-38.
    14. Eskin, E. and P.A. Pevzner, Finding composite regulatory patterns in DNA sequences. Bioinformatics, 2002. 18 Suppl 1: p. S354-63.
    15. GuhaThakurta, D. and G.D. Stormo, Identifying target sites for cooperatively binding factors. Bioinformatics, 2001. 17(7): p. 608-21.
    16. Eskin, E., Sparse Sequence Modeling with Applications to Computational Biology and Intrusion Detection. 2002.
    17. Kielbasa, S.M., et al., Combining frequency and positional information to predict transcription factor binding sites. Bioinformatics, 2001. 17(11): p. 1019-26.
    18. van Helden, J., M. del Olmo, and J.E. Perez-Ortin, Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals. Nucleic Acids Res, 2000. 28: p. 1000-10.
    19. van Helden, J., A.F. Rios, and J. Collado-Vides, Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res, 2000. 28(8): p. 1808-18.
    20. Mewes, H.W., et al., MIPS: a database for protein sequences, homology data and yeast genome information. Nucleic Acids Res, 1997. 25(1): p. 28-30.
    21. Costanzo, M.C., et al., The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): comprehensive resources for the organization and comparison of model organism protein information. Nucleic Acids Res, 2000. 28(1): p. 73-6.
    22. Tavazoie, S., et al., Systematic determination of genetic network architecture. Nat Genet, 1999. 22(3): p. 281-5.
    23. van Helden, J., B. Andre, and J. Collado-Vides, Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol, 1998. 281(5): p. 827-42.
    24. Agrawal, R., T. Imielinski, and A. Swami. Mining Associations between Sets of Items in Large Databases. in Proc. of the ACM SIGMOD Int'l Conference on Management of Data. 1993. Washington D.C.
    25. Agrawal, R. and R. Srikant, Fast Algorithms for Mining Association Rules. 1994, IBM Almaden Research Center. p. 1-32.
    26. Zhu, J. and M.Q. Zhang, SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics, 1999. 15(7-8): p. 607-11.
    27. Jensen, L.J. and S. Knudsen, Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation. Bioinformatics, 2000. 16(4): p. 326-33.
    28. Sudarsanam, P., Y. Pilpel, and G.M. Church, Genome-wide co-occurrence of promoter elements reveals a cis-regulatory cassette of rRNA transcription motifs in Saccharomyces cerevisiae. Genome Res, 2002. 12(11): p. 1723-31.
    29. Matthews, B.W., Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta, 1975. 405(2): p. 442-51.
    30. Pilpel, Y., P. Sudarsanam, and G.M. Church, Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet, 2001. 29(2): p. 153-9.
