
Student: 曾世維
Shih-Wei Tseng
Thesis title: 以隱藏馬可夫模型預測蛋白質序列中之磷酸化位置
Prediction of the Phosphorylation Sites in Protein Sequences Using Profile Hidden Markov Model
Advisors: 洪炯宗
Jorng-Tzong Horng
黃憲達
Hsien-Da Huang
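Oral examination committee:
Degree: Master
Department: College of Biomedical Science and Engineering, Department of Life Science
Graduation academic year: 92 (ROC calendar, i.e., 2003-2004)
Language: English
Number of pages: 58
Chinese keywords: 磷酸化, 隱藏馬可夫模型 (phosphorylation, hidden Markov model)
English keywords: Phosphorylation, Profile Hidden Markov Model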
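    Phosphorylation is a very important mechanism of post-translational modification. It strongly influences many cellular physiological processes, including metabolism, signal transduction, cell differentiation, and transport across the cell membrane. Phosphorylation has also been shown to be closely associated with certain diseases, including cancer and Alzheimer's disease. Phosphorylation is catalyzed by kinases, and the substrate binding regions that a kinase recognizes differ from one kinase to another. The aim of this thesis is to build a computational model that predicts which amino acids in a protein sequence will be phosphorylated. This study uses hidden Markov models (HMMs) to build the models, constructed separately for kinases of different classes and for substrates with different sequence compositions, and implements them as a prediction system called KinasePhos. After a user submits a protein sequence whose phosphorylation status is unknown, the system predicts the phosphorylation sites and the kinases that catalyze them. Compared with previously proposed methods, our approach achieves better predictive performance.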
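The abstract states that separate models are built for each kinase class and substrate type. One plausible way to prepare such training data is to cut a fixed-length window around each known phosphorylated residue and group the windows by (kinase, residue). The sketch below assumes a hypothetical half-window of 4 residues and an `X` padding symbol at sequence ends; it is an illustration, not the thesis's actual preprocessing code.

```python
from collections import defaultdict

# Hypothetical choices, not taken from the thesis:
HALF_WINDOW = 4  # residues on each side of the site, giving a 9-residue window
PAD = "X"        # placeholder for positions beyond either end of the sequence


def extract_window(sequence: str, position: int, half: int = HALF_WINDOW) -> str:
    """Return the residues centred on a 1-based position, padded with PAD at the ends."""
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    padded = PAD * half + sequence + PAD * half
    # In the padded string the site sits at idx + half, so this slice
    # takes `half` residues on each side of it.
    return padded[idx : idx + 2 * half + 1]


def group_by_kinase(sites):
    """Group windows by (kinase, residue) so each group can train its own model.

    `sites` is an iterable of (sequence, position, kinase) tuples.
    """
    groups = defaultdict(list)
    for sequence, position, kinase in sites:
        window = extract_window(sequence, position)
        residue = sequence[position - 1]  # e.g. S, T or Y
        groups[(kinase, residue)].append(window)
    return dict(groups)
```

For example, `group_by_kinase([("AAASAAA", 4, "PKA"), ("TTTYTTT", 4, "Src")])` (toy sequences) yields one group for PKA serine sites and one for Src tyrosine sites, each holding 9-residue windows.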


    The phosphorylation of proteins, an important mechanism of post-translational modification, affects essential cellular processes such as metabolism, cell signaling, differentiation, and membrane transport. Phosphorylation is carried out by protein kinases. The aim here is to computationally predict phosphorylation sites within given protein sequences. The known phosphorylation sites are categorized by their substrate sequences and by the classes of protein kinase that catalyze them. A profile hidden Markov model (HMM) is trained on each group of sequences surrounding the phosphorylated residues. A tool for predicting protein phosphorylation sites, named KinasePhos, is implemented so that users can submit protein sequences for prediction. Compared with previously developed approaches, our method achieves higher accuracy and reports not only the locations of the phosphorylation sites but also the protein kinases that catalyze them.
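To make the idea of "training a profile on each group and scoring a query" concrete, the sketch below keeps only the match states of a profile HMM. That is, it is an ungapped, position-specific log-odds model, which is a deliberate simplification: a real profile HMM (as built by HMMER's `hmmbuild` and searched with `hmmsearch`) also has insert and delete states. The uniform background frequency and the pseudocount of 1 are assumptions made for illustration. Scores are reported in bits, the same unit HMMER uses.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def train_profile(windows, pseudocount=1.0):
    """Estimate per-position emission probabilities from equal-length windows.

    Simplification: match states only (no insert/delete states).
    """
    if not windows:
        raise ValueError("need at least one training window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all training windows must have the same length")
    profile = []
    for i in range(length):
        # Padding or non-standard symbols are ignored when counting.
        counts = Counter(w[i] for w in windows if w[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def bit_score(profile, window):
    """Log-odds score in bits: sum over positions of log2(P(residue) / background)."""
    if len(window) != len(profile):
        raise ValueError("window length must match profile length")
    score = 0.0
    for column, residue in zip(profile, window):
        if residue in column:  # padding or unknown residues contribute 0 bits
            score += math.log2(column[residue] / BACKGROUND)
    return score
```

A candidate site would then be reported for a kinase when its window scores above some cutoff; the cutoff is not specified here and would need to be chosen, for example, by cross-validation.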

    Table of contents:
    Chapter 1 Introduction
        1.1 Background
        1.2 Motivation
        1.3 The Specific Aims
    Chapter 2 Related Work
        2.1 PhosphoBase
        2.2 Swiss-Prot
        2.3 NetPhos
        2.4 Protein Kinase Specificity Determinants
    Chapter 3 Methods
        3.1 Materials
        3.2 System Flow
        3.3 Data Preprocessing
            3.3.1 Data Set Construction
            3.3.2 Sequence Logos of Sequences Surrounding the Phosphorylated Residues
            3.3.3 The Statistics of Protein Kinase Types Involved in Phosphorylation
            3.3.4 Sequence Logos of Sequences Surrounding Substrates Catalyzed by Each Kinase Type
        3.4 Maximal Dependence Decomposition (MDD)
        3.5 Profile Hidden Markov Model (HMM)
            3.5.1 Building Predictive Models
            3.5.2 Expectation Values and HMMER Bit Scores
        3.6 Cross-validation
        3.7 GOR IV
    Chapter 4 Results
        4.1 The Secondary Structure of Phosphorylated Proteins
        4.2 Predictive Models Built Using Profile HMM
            4.2.1 Claim One: The model with kinase information is more accurate than the model without it
            4.2.2 Claim Two: The model built from the MDD-applied dataset performs better than the model built from the original dataset
            4.2.3 Claim Three: The model built from the combined dataset performs better than models built from individual datasets
            4.2.4 Claim Four: The model built from the MDD-applied dataset performs better
            4.2.5 The Selection of the Models
        4.3 Different Search Methods and Cross-validation
            4.3.1 By E-value and HMMER Bit Scores
            4.3.2 k-fold Cross-validation and Leave-one-out Cross-validation
        4.4 The Performance of One Kinase Compared to Others
        4.5 The Web Interface
    Chapter 5 Discussion
        5.1 Comparison
        5.2 Future Work
    Chapter 6 Conclusions
    References
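Sections 3.3.2 and 3.3.4 summarize the residues around phosphorylation sites with sequence logos. Following Schneider and Stephens (1990), the height of a logo column is its information content, log2(20) minus the Shannon entropy of the column for proteins, and each residue's letter height is its frequency times that value. The sketch below omits the small-sample correction term from that paper, so it overestimates information for small groups; treat it as an illustration of the formula rather than a replacement for a tool such as WebLogo (Crooks et al. 2004).

```python
import math
from collections import Counter


def column_information(windows, position, alphabet_size=20):
    """Information content (bits) of one aligned column.

    R = log2(alphabet_size) - H, where H = -sum(f * log2 f).
    Note: the small-sample correction of Schneider and Stephens is omitted.
    """
    residues = [w[position] for w in windows]
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def letter_heights(windows, position, alphabet_size=20):
    """Height of each residue's letter in the logo stack: frequency times column information."""
    counts = Counter(w[position] for w in windows)
    n = sum(counts.values())
    info = column_information(windows, position, alphabet_size)
    return {aa: (c / n) * info for aa, c in counts.items()}
```

A fully conserved column (for example, the central serine in a group of serine sites) reaches the maximum of log2(20), about 4.32 bits, while a column split evenly between two residues loses exactly 1 bit.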

    Bairoch, A. and R. Apweiler. 1998. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1998. Nucleic Acids Res 26: 38-42.
    Berry, E.A., A.R. Dalby, and Z.R. Yang. 2004. Reduced bio basis function neural network for identification of protein phosphorylation sites: comparison with pattern recognition algorithms. Comput Biol Chem 28: 75-85.
    Blom, N., S. Gammeltoft, and S. Brunak. 1999. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 294: 1351-1362.
    Blom, N., A. Kreegipuu, and S. Brunak. 1998. PhosphoBase: a database of phosphorylation sites. Nucleic Acids Res 26: 382-386.
    Boeckmann, B., A. Bairoch, R. Apweiler, M.C. Blatter, A. Estreicher, E. Gasteiger, M.J. Martin, K. Michoud, C. O'Donovan, I. Phan, S. Pilbout, and M. Schneider. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31: 365-370.
    Burge, C. and S. Karlin. 1997. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268: 78-94.
    Crooks, G.E., G. Hon, J.M. Chandonia, and S.E. Brenner. 2004. WebLogo: A Sequence Logo Generator. Genome Res 14: 1188-1190.
    Eddy, S.R. 1998. Profile hidden Markov models. Bioinformatics 14: 755-763.
    Garnier, J., J.F. Gibrat, and B. Robson. 1996. GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol 266: 540-553.
    Gibrat, J.F., J. Garnier, and B. Robson. 1987. Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. J Mol Biol 198: 425-443.
    Iakoucheva, L.M., P. Radivojac, C.J. Brown, T.R. O'Connor, J.G. Sikes, Z. Obradovic, and A.K. Dunker. 2004. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 32: 1037-1049.
    Kreegipuu, A., N. Blom, S. Brunak, and J. Jarv. 1998. Statistical analysis of protein kinase specificity determinants. FEBS Lett 430: 45-50.
    Lindberg, R.A., A.M. Quinn, and T. Hunter. 1992. Dual-specificity protein kinases: will any hydroxyl do? Trends Biochem Sci 17: 114-119.
    Schneider, T.D. and R.M. Stephens. 1990. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 18: 6097-6100.
