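Graduate student: 楊兵河 (Bing-He Yang)
Thesis title: 應用遺傳演算法解決蛋白質多重序列排比問題
An Approach to Multiple Protein Sequence Alignment Using A Genetic Algorithm
Advisor: 洪炯宗 (Jorng-Tzong Horng)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Academic year of graduation: 89 (ROC calendar, i.e. 2000-2001)
Language: Chinese
Number of pages: 43
Chinese keywords: 多重序列排比, 遺傳演算法
English keywords: Multiple Sequence Alignment, Genetic Algorithm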
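  • The basic material of human life is deoxyribonucleic acid (DNA), but most of its functions are carried out by proteins. The genome of a single human comprises about 3 billion base pairs, and within this volume of data we can uncover relationships among sequences. Sequence alignment, and multiple sequence alignment in particular, helps predict the secondary or tertiary structure of new sequences and identify the relationships among them. Multiple sequence alignment is an important and challenging problem in bioinformatics. In this thesis, we combine a genetic algorithm with dynamic programming (DP) to find the best multiple alignment. Genetic algorithms are a powerful tool for complex problem domains; we use one mainly to find fixed, reasonable match blocks, and then use DP to handle the mismatch areas between them. Experiments on several data sets show that this approach is feasible.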
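The abstract does not describe how the genetic algorithm encodes or scores a match block, so the following is only a minimal sketch under stated assumptions: a chromosome holds one start offset per sequence for an ungapped block of fixed width, and fitness is the number of identical residue pairs across the block's columns. The names `block_score` and `find_match_block`, the truncation selection, the uniform crossover, and all parameter values are hypothetical choices for illustration, not the thesis's method.

```python
import random
from itertools import combinations


def block_score(seqs, offsets, width):
    """Count identical residue pairs over an ungapped block of `width` columns."""
    total = 0
    for col in range(width):
        residues = [s[o + col] for s, o in zip(seqs, offsets)]
        total += sum(x == y for x, y in combinations(residues, 2))
    return total


def find_match_block(seqs, width, pop_size=30, generations=60,
                     mutation_rate=0.2, seed=0):
    """Toy GA: evolve one block start offset per sequence (illustrative only)."""
    rng = random.Random(seed)
    # Largest valid start offset in each sequence (assumes width <= len(s)).
    limits = [len(s) - width for s in seqs]

    def fitness(individual):
        return block_score(seqs, individual, width)

    population = [[rng.randint(0, lim) for lim in limits]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged,
        # so the best individual is never lost.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Uniform crossover: each offset comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            if rng.random() < mutation_rate:
                # Mutation: re-draw the offset of one randomly chosen sequence.
                k = rng.randrange(len(child))
                child[k] = rng.randint(0, limits[k])
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return best, fitness(best)
```

Calling `find_match_block(["MKTAYIAKQR", "GGMKTAYLLS", "AMKTAYPQWE"], width=5)` returns a list of three start offsets and the block score they achieve; because the search is stochastic, the block it returns is not guaranteed to be the best one.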


    Multiple sequence alignment (MSA) is an important and challenging problem in computational biology. Sequence alignment techniques, and MSA in particular, can help infer the function of genes, predict the secondary or tertiary structure of new sequences, and find relationships between sequences. In this thesis, we combine a genetic algorithm (GA) with dynamic programming (DP) to align protein sequences. Genetic algorithms are a strong stochastic approach for efficient and robust search in large search spaces. We use a genetic approach to find reasonable match blocks and isolate them. We then apply a modified dynamic programming procedure to perform pairwise alignment within each mismatch block. We apply our approach to several data sets, and the experimental results suggest that it is promising.
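The abstract says only that a "modified" dynamic programming step aligns the mismatch regions pairwise; the modification itself is not described. As a point of reference, the sketch below shows the standard textbook global alignment (Needleman-Wunsch) with a linear gap penalty. The function name `global_align` and the scoring values (match +1, mismatch -1, gap -2) are assumptions made for illustration; a real protein aligner would more likely use a substitution matrix.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for the residue pair ending at (i, j).
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `global_align("ACGT", "AGT")` returns the score together with the two gapped strings. Restricting this step to the short regions between match blocks keeps each DP table small, which is one plausible reason to pair it with a block-finding search.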

    Chapter 1 Introduction
        1.1 Background
        1.2 Motivation and Goal
        1.3 Problem Description
        1.4 Organization of The Thesis
    Chapter 2 Related Work
    Chapter 3 The Proposed Approach
        3.1 An Overview of Our Approach
        3.2 Using GA to Find Match Blocks
        3.3 Using DP to Do Pairwise Alignment
    Chapter 4 Experiments and Results
        4.1 Algorithmic Implementation
        4.2 Protein Dataset
        4.3 Interface
        4.4 Experimental Results
    Chapter 5 Discussion and Conclusion
    References
    Appendix

