
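Graduate student: 黃振家
Chen-Chia Huang
Thesis title: 預測訊息核醣核酸剪接中保留序列子之整合性系統
An Integrated System to Identify Conserved Sequence Elements Associated with mRNA Splicing
Advisors: 黃憲達
Hsien-Da Huang
洪炯宗
Jorng-Tzong Horng
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Academic year of graduation: 92 (ROC calendar)
Language: English
Pages: 67
Keywords (Chinese): 資料庫 (database), 替代性剪接 (alternative splicing), 剪接 (splicing)
Keywords (English): alternative splicing, database, splicing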
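  • Alternative splicing is an important phenomenon in eukaryotic gene expression: it allows a single gene to yield multiple expression products, namely proteins, and it is currently drawing many biologists into active research. Protein sequences, mRNA sequences, and expressed sequence tag (EST) sequences provide useful information about the alternative splicing of genes.
    The main contribution of this thesis is the design of an analysis platform that uses computational analysis to process all human genes at large scale and extract the genes that carry alternative splicing information. By combining gene expression data, functional information, and cross-species gene comparison with statistics and data mining, the platform organizes and searches for functional sites on mRNA for biologists to verify.
    We define the types of alternative splicing and design an algorithm that extracts alternative splicing information, which is then stored in a database. Human gene expression information and gene function information also help the analysis of alternative splicing. We classify genes by tissue specificity and function, and combine this classification with the SpliceMotif database to predict motifs. We apply DNA motif prediction tools to the collected SpliceMotif database and identify motifs associated with alternative splicing, which we call SpliceMotifs. Through case studies, we show that the tool developed in this thesis can find exonic splicing enhancers (ESEs) in the selected genes.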


    In eukaryotes, alternative splicing (AS) of pre-mRNA plays an important role in generating multiple isoforms from a single gene. In this thesis, we propose an integrated approach that automatically identifies conserved sequences in selected exon/intron regions of a group of genes. First, the system uses ProSplicer, an alternative splicing database constructed in our previous research. Second, several alternative splicing types, such as exon skipping, alternative 5' splice sites, alternative 3' splice sites, and mutually exclusive exons, are defined and extracted from the evidence data. Finally, for each type of alternative splicing, the flanking intronic sequences are collected and passed to motif discovery tools, which computationally detect conserved motifs related to alternative splicing, named SpliceMotifs. Tissue-specific information and gene functions corresponding to the selected regions are also taken into account. The main contribution of this work is an integrated platform for detecting conserved sequences within exon/intron regions of a particular alternative splicing type, e.g., exon skipping. In several case studies, the system detected known, experimentally verified functional sites related to alternative splicing, e.g., exonic splicing enhancers (ESEs).
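
    The exon-skipping step and the collection of flanking intronic regions described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function names `skipped_exons` and `flanking_intron_windows`, the representation of exons as `(start, end)` tuples in 1-based inclusive coordinates, and the `flank` window size are all assumptions made for illustration, and only the simplest case (one exon skipped between two shared neighbouring exons) is handled.

```python
from typing import List, Tuple

# An exon as (start, end) in 1-based inclusive genomic coordinates.
# This representation is an assumption for illustration only.
Exon = Tuple[int, int]


def skipped_exons(reference: List[Exon], variant: List[Exon]) -> List[Exon]:
    """Return internal exons of `reference` that `variant` skips.

    An exon counts as skipped when it is absent from `variant` while both of
    its neighbours in `reference` are present in `variant` and adjacent there.
    """
    position = {exon: i for i, exon in enumerate(variant)}
    skipped = []
    for i in range(1, len(reference) - 1):
        upstream, exon, downstream = reference[i - 1], reference[i], reference[i + 1]
        if exon in position:
            continue  # the exon is included in the variant
        # Skipped only if the variant joins the two neighbouring exons directly.
        if (upstream in position and downstream in position
                and position[downstream] == position[upstream] + 1):
            skipped.append(exon)
    return skipped


def flanking_intron_windows(reference: List[Exon], exon: Exon,
                            flank: int = 100) -> Tuple[Exon, Exon]:
    """Return (upstream, downstream) intronic windows next to an internal exon.

    Each window is at most `flank` bases long (a hypothetical default) and is
    clipped so that it never extends into the neighbouring exon.
    """
    i = reference.index(exon)
    if i == 0 or i == len(reference) - 1:
        raise ValueError("exon must be internal to have two flanking introns")
    prev_end = reference[i - 1][1]
    next_start = reference[i + 1][0]
    upstream = (max(prev_end + 1, exon[0] - flank), exon[0] - 1)
    downstream = (exon[1] + 1, min(next_start - 1, exon[1] + flank))
    return upstream, downstream


# Toy example: the variant transcript omits the middle exon.
reference = [(100, 200), (300, 350), (500, 600)]
variant = [(100, 200), (500, 600)]
for exon in skipped_exons(reference, variant):
    print(exon, flanking_intron_windows(reference, exon, flank=50))
# prints: (300, 350) ((250, 299), (351, 400))
```

    In a pipeline of the kind the abstract outlines, the sequences under such windows would be gathered across a group of genes and handed to a motif discovery tool; that step is omitted here because it depends on the specific tools and data used.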

    Chapter 1 Introduction
      1.1 Background
        1.1.1 The Central Dogma
        1.1.2 Pre-mRNA Splicing
        1.1.3 Alternative Splicing
        1.1.4 Splice Site
        1.1.5 Alternative Splicing Modes
        1.1.6 SR Protein Involving in ESE-dependent Splicing
      1.2 Motivation
      1.3 Biological Significance
      1.4 The Specific Aim
    Chapter 2 Related Works
      2.1 Alternative Splicing Databases
      2.2 Splicing Site Databases
      2.3 Conserved Sequence Analysis
    Chapter 3 Materials and Method
      3.1 Materials
      3.2 Overview of SpliceMotif
      3.3 System Flow
      3.4 Preprocessing
        3.4.1 Logical Operations
        3.4.2 Definitions of Alternative Splicing Modes
        3.4.3 Filtering Features
      3.5 Motif Discovery
        3.5.1 Selected Regions and Flanking Regions
        3.5.2 Motif Discovery Tools
        3.5.3 Sequence Logos
        3.5.4 Converting Motifs Into Profile Hidden Markov Models (HMM)
        3.5.5 RNA Secondary Structure Prediction
      3.6 Motif Display
      3.7 Scanning for Motif Instances
    Chapter 4 Results
      4.1 Database System
      4.2 Web Interfaces
        4.2.1 Overview of Web Interfaces
        4.2.2 Motif Discovery
        4.2.3 Motif Display
        4.2.4 Motif Search
      4.3 Case Studies
        4.3.1 An Alternative Spliced Gene (BRCA1)
        4.3.2 A Variety of Genes Which Has ESEs
        4.3.3 Conserved Exon Skipping between Human and Mouse
      4.4 Summary of Result
    Chapter 5 Discussions
      5.1 Limitations of SpliceMotif
      5.2 Comparison to Other Tools
      5.3 Future Works
    Chapter 6 Conclusions
    References

