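| Student: | 黃憲達 Hsien-Da Huang |
|---|---|
| Thesis title: | 真核生物體基因轉錄調控因子之預測系統 A Predictive System for Transcriptional Regulatory Sites in Eukaryotic Genomes |
| Advisor: | 洪炯宗 Jorng-Tzong Horng |
| Oral examination committee: | |
| Degree: | Doctor (Ph.D.) |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Graduation academic year: | 91 (ROC calendar, i.e., academic year 2002-2003) |
| Language: | English |
| Number of pages: | 128 |
| Keywords (Chinese): | gene expression, bioinformatics, transcription factor |
| Keywords (English): | gene expression, transcription factor, data mining |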
| 相關次數: | 點閱:15 下載:0 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
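Studying gene regulatory mechanisms is an important part of understanding how eukaryotic genomes function. The availability of large amounts of gene expression data makes it possible to explore these mechanisms computationally. In general, a group of genes that shows the same expression pattern in an organism is likely to be regulated by the same set of transcription factors, so it is feasible to predict computationally the binding sites to which those transcription factors bind. Traditional analysis approaches, however, are tedious, inconvenient, and time-consuming. The aim of this research is to design and implement an automated, integrated system for predicting transcription factor binding sites, called RgS-Miner. Given a group of genes as input, RgS-Miner analyzes the upstream regulatory regions of these genes and uses statistically grounded computational methods to predict the transcription factor binding sites that may co-regulate them. The system further applies data mining methods to discover occurrence associations among transcription factor binding sites. A web interface lets users query, run, and analyze the system directly, and its graphical presentation makes the prediction results easier to interpret. Compared with other systems, our system provides biologists with a more convenient tool for analyzing gene regulatory mechanisms in eukaryotic genomes.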
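The occurrence-association step mentioned in the abstract can be illustrated with a generic association-rule computation, restricted to site pairs for brevity. This is a minimal sketch assuming each gene's upstream region has already been reduced to the set of binding-site labels predicted in it; the input format, the site labels, the function name `site_pair_rules`, and the thresholds are hypothetical, and support/confidence are the standard association-rule measures rather than a confirmed description of RgS-Miner's own scoring.

```python
from collections import Counter
from itertools import combinations


def site_pair_rules(gene_sites, min_support=0.5, min_confidence=0.8):
    """Return pairwise association rules (lhs, rhs, support, confidence).

    gene_sites maps a gene identifier to the set of binding-site labels
    predicted in its upstream region (hypothetical input format).
    """
    n_genes = len(gene_sites)
    single = Counter()  # number of genes containing each site
    pairs = Counter()   # number of genes containing each unordered site pair
    for sites in gene_sites.values():
        unique = sorted(set(sites))
        single.update(unique)
        pairs.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pairs.items():
        support = count / n_genes  # fraction of genes containing both sites
        if support < min_support:
            continue
        # Evaluate both directions: a => b and b => a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical site labels for four co-expressed genes.
    example = {
        "gene1": {"site_A", "site_B", "site_C"},
        "gene2": {"site_A", "site_B"},
        "gene3": {"site_A", "site_B", "site_D"},
        "gene4": {"site_C", "site_D"},
    }
    for rule in site_pair_rules(example):
        print(rule)
```

With the four hypothetical genes above, only the pair site_A/site_B reaches 50% support (3 of 4 genes), and both rule directions have confidence 1.0. Extending the sketch to larger site sets would follow the usual Apriori pattern of growing only frequent itemsets.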
The availability of genome-wide gene expression data provides sets of genes that share a common transcriptional response, from which the mechanisms underlying that response can be deciphered. The identification of transcription factor binding sites provides valuable information on gene expression and regulation. Biological data and analysis methods for studying gene expression and transcriptional regulatory sequences have recently become available. However, users must carry out an elaborate analysis process: querying data from different databases, analyzing gene upstream regions with different prediction tools, and finally converting among different data formats. Beyond individual methods for predicting transcriptional regulatory sites, new automated and integrated methods for higher-level analysis of gene upstream sequences are needed. Because the identification of regulatory sites draws on a large set of biological databases, efficient and integrated data management is also crucial. In this dissertation, we propose a predictive system, designated RgS-Miner, which predicts transcriptional regulatory sites in eukaryotes and detects the co-occurrence of these sites for an input group of genes, i.e., a set of genes considered likely to share common regulatory mechanisms. The system integrates several regulatory site detection methods, including known site matching, over-represented oligonucleotide detection, and DNA motif discovery. Three case studies in the yeast and human genomes are carried out with the proposed system. In addition, the system successfully builds a biological data warehouse that integrates a variety of heterogeneous biological databases. Compared with other systems, our system is a useful tool for analyzing transcriptional regulatory sites when investigating the regulation of gene expression.
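Over-represented oligonucleotide detection can be sketched as follows: count every k-mer in the upstream sequences of the input genes and score how surprising each count is given a background probability, here with a binomial upper-tail p-value. This is a minimal sketch under stated assumptions: the function names, the default k = 6, the p-value cutoff, single-strand counting, and the uniform 4^-k fallback for missing background entries are all illustrative choices, and the abstract does not specify which statistic RgS-Miner actually uses. A real analysis would typically estimate background frequencies from genome-wide upstream regions and count both strands.

```python
import math
from collections import Counter

_DNA = frozenset("ACGT")


def kmer_counts(seqs, k):
    """Count overlapping k-mers (one strand only) in a list of DNA strings."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if set(word) <= _DNA:  # skip words containing N or other symbols
                counts[word] += 1
    return counts


def binomial_upper_tail(n, p, x):
    """P(X >= x) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if x <= 0:
        return 1.0
    if x > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(x, n + 1):
        term = math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                        + i * log_p + (n - i) * log_q)
        total += term
        # Past the mode (i > n*p) terms only shrink; stop once negligible.
        if i > n * p and term <= total * 1e-15:
            break
    return min(total, 1.0)


def over_represented(seqs, background_freq, k=6, max_pvalue=1e-3):
    """Return (word, observed, expected, p-value) for k-mers occurring more
    often in seqs than expected from background_freq, sorted by p-value.

    background_freq maps each k-mer to its assumed per-position probability;
    words missing from it fall back to a uniform 4**-k (an assumption).
    """
    counts = kmer_counts(seqs, k)
    n_positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    hits = []
    for word, observed in counts.items():
        p = background_freq.get(word, 4.0 ** -k)
        pvalue = binomial_upper_tail(n_positions, p, observed)
        if pvalue <= max_pvalue:
            hits.append((word, observed, n_positions * p, pvalue))
    return sorted(hits, key=lambda hit: hit[3])
```

Summing the binomial terms in log space with `math.lgamma` avoids the float overflow that multiplying exact binomial coefficients by probabilities would cause once the sequence set reaches tens of thousands of positions.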