| Graduate student: | 謝立青 Li-Ching Hsieh |
|---|---|
| Thesis title: | Universal Lengths of Bacterial Genomes and Model for Genome Growth |
| Advisor: | 李弘謙 Hoong-Chien Lee |
| Oral defense committee: | |
| Degree: | Doctor |
| Department: | College of Science - Department of Physics |
| Academic year of graduation: | 91 |
| Language: | Chinese |
| Number of pages: | 34 |
| Keywords: | genome, growth model, Shannon information, root-sequence length, relative spectral width, quasireplicas, RNA world |
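We show that the relative spectral width of a frequency distribution and its Shannon information are simply related. Computing the relative spectral widths of the frequency distributions of oligonucleotides 2 to 10 bases long (k-mers, k = 2 to 10) in 108 complete bacterial genome sequences reveals a set of "root-sequence lengths" shared by all bacterial genomes; they are independent of genome length and base composition but depend exponentially on k, increasing as k increases. For a given k, a bacterial genome sequence has the same relative spectral width as a random sequence whose length is exactly the root-sequence length. Guided by this idea, we used computer simulation to grow a random sequence with an initial length of about 200 bases through a process of highly stochastic self-duplication of short segments; the resulting "quasireplica" sequences also share some properties with bacterial genome sequences. A quasireplica is a self-organized, complex, aperiodic sequence and an ideal place to store large amounts of information. On small scales it is a high-multiple replica of a short random sequence; on large scales it merely resembles a random sequence. From these findings we infer that genomes began to grow by duplication when the ancestral genome was about 200 bases long and already had a rudimentary duplication mechanism, and that the genetic world at that time contained only DNA and RNA and no proteins.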
The relative spectral width and the Shannon information of a frequency distribution are shown to be simply related. Measurements of the relative spectral widths of the frequency distributions of words two to ten nucleotides long (k-mers, k = 2 to 10) in 108 complete bacterial genomes reveal the existence of a set of universal "root-sequence lengths" that are shared by all bacterial genomes, are independent of sequence length and base composition, and grow exponentially with k. For a given k, the relative spectral widths of all bacterial genomes are the same as that of a random sequence whose length is the root-sequence length for k-mers. We use computer modelling to show that such properties of bacterial genomes are reproduced by "quasireplicas": sequences "grown" by maximally stochastic short-segmental duplications from initial random root sequences about 200 bases long. Ideal for storing large amounts of information, quasireplicas are self-organized, complex, aperiodic sequences that appear on short scales as high-multiple replicas of random sequences and on large scales as random sequences. From our findings we infer that genomes began to grow by duplication in a world with only DNA/RNA and devoid of proteins, when the ancestral genomes were about 200 bases long and had acquired a rudimentary duplication machinery.
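The quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption for illustration rather than the thesis's own procedure: the relative spectral width is taken to be the standard deviation of the k-mer counts divided by their mean, the Shannon information is taken to be the log of the number of possible k-mers minus the Shannon entropy of the distribution, and `grow_quasireplica` with its `max_seg` parameter is a hypothetical stand-in for growth by random short-segment duplication. Under these assumed definitions, a second-order expansion gives information ≈ (relative spectral width)² / 2 when the fluctuations are small, which is one way the two quantities can be "simply related".

```python
import math
import random
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers; k-mers that never occur get a count of 0."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(w), 0) for w in product("ACGT", repeat=k)]


def relative_spectral_width(counts):
    """Assumed definition: standard deviation of the counts over their mean."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return math.sqrt(var) / mean


def shannon_information(counts):
    """Assumed definition: log(number of k-mers) minus the entropy, in nats."""
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.log(len(counts)) - entropy


def grow_quasireplica(root_len, final_len, max_seg, rng):
    """Hypothetical growth rule: start from a random root sequence, then
    repeatedly copy a random short segment and insert it at a random site."""
    seq = [rng.choice("ACGT") for _ in range(root_len)]
    while len(seq) < final_len:
        seg_len = rng.randint(1, max_seg)
        start = rng.randrange(len(seq) - seg_len + 1)
        insert_at = rng.randrange(len(seq) + 1)
        seq[insert_at:insert_at] = seq[start:start + seg_len]
    return "".join(seq[:final_len])  # trim any overshoot from the last copy


if __name__ == "__main__":
    rng = random.Random(0)
    random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
    quasi_seq = grow_quasireplica(200, 20000, 25, rng)
    for name, seq in (("random", random_seq), ("quasireplica", quasi_seq)):
        counts = kmer_counts(seq, 2)
        print(name, relative_spectral_width(counts), shannon_information(counts))
```

In this toy setting the grown sequence keeps much of the count imbalance of its 200-base root, so its relative spectral width stays well above that of a purely random sequence of the same final length. The root length of 200 follows the abstract; the final length of 20000 and `max_seg = 25` are arbitrary illustrative choices.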