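Graduate student: 許博凱
Bo-Kai HSU
Thesis title: 蛋白質重複序列分析系統 (Protein Repeat Sequence Analysis System)
ProDup: a software to compute structure internal duplications in protein sequences
Advisors: 洪炯宗
Jorng-Tzong Horng
黃憲達
Hsien-Da Huang
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Academic year of graduation: 92 (ROC calendar)
Language: English
Number of pages: 43
Chinese keywords: 蛋白質重複序列分析系統 (protein repeat sequence analysis system)
Foreign-language keywords: secondary structure, internal duplication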
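  • Understanding repeated sequences in proteins helps in analyzing the spatial arrangement, function, and structure of proteins, as well as their interactions with one another. The literature indicates that many longer protein sequences may have evolved from ancestral proteins containing many repeated sequences, and that many repeated sequences in proteins are related to function and structure. This study develops a protein sequence analysis tool called ProDup, which can use additional information, such as secondary structure, to analyze repeated sequences within protein sequences. Analysis with this tool yielded many interesting results, which we compare with those of other tools for finding protein repeats. We also use the tool to analyze the repeats families in the Pfam database. ProDup offers a different perspective for research on protein repeated sequences.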


    Protein internal duplications are regular arrays of spatial and functional groups. They are useful for structural packing and for protein-protein interactions. Many large proteins have evolved by internal duplication, and many protein internal duplications correspond to functional and structural units. In this study, we develop a system called ProDup that finds protein internal duplications under user-specified constraints. We obtain several interesting results from the system and compare them with those found by other tools. Furthermore, we provide statistics on the duplications that ProDup finds in the Pfam repeats families. ProDup provides an alternative viewpoint on protein internal duplications.

    Chapter 1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Goal
    Chapter 2 Related Works
      2.1 PAM (Percent/Point Accepted Mutation)
      2.2 CLUSTAL W
      2.3 Databases
      2.4 TRIPS
      2.5 Radar
      2.6 REP
      2.7 PSIPRED
    Chapter 3 Materials and Methods
      3.1 Data Sets
      3.2 Develop Environment
      3.3 Main Ideals
      3.4 Methods
    Chapter 4 Results
      4.1 Web Interface
      4.2 Case Study
    Chapter 5 Discussions and Conclusions
    References
    Appendix

