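Graduate student: 呂理維 (Li-Wei Lu)
Thesis title: 重複序列資料庫效能之改進與基因資訊之整合應用
Performance Improvements on RSDB and Integration of Repetitive Elements with Genes
Advisor: 洪炯宗 (Jorng-Tzong Horng)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Graduation academic year: 89 (ROC calendar, i.e. 2000-2001)
Language: Chinese
Number of pages: 90
Foreign-language keywords: RSDB, repetitive element, repeat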
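  • Repetitive sequences make up a substantial share of genomic sequence, and biologists have identified a large number of regulatory mechanisms within them. Analyzing repetitive sequences can therefore shed further light on how chromosome structure is organized and on the relationship between genes and the evolution of species. The Repeat Sequence Database (RSDB) stores more than a hundred million repeat records, covering direct, bi-directional, palindromic, interspersed, and tandem repeats. By using index-organized tables, key compression, pipelined data loading, data warehousing, cache processing, and suffix arrays, RSDB can access this volume of data more efficiently. In addition, RSDB provides statistics on all repetitive sequences and integrates gene data, with the aim of helping biologists discover more, and more significant, information.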
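As a rough illustration of why suffix arrays suit repeat queries, the sketch below builds a suffix array for a short DNA string and uses binary search to list every position where a pattern occurs. It is not taken from the thesis: the function names `build_suffix_array` and `find_occurrences`, the sample sequence, and the naive sort-based construction are all assumptions made for the example, and a genome-scale system would need a far more efficient construction.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in lexicographic order.

    Hypothetical helper: a naive sort-based construction, adequate only for
    short illustrative inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions at which pattern occurs in text."""
    m = len(pattern)
    # Truncating lexicographically sorted suffixes to m characters keeps them
    # in non-decreasing order, so binary search over the prefixes is valid.
    # The list is materialized here for clarity, not for efficiency.
    prefixes = [text[i:i + m] for i in suffix_array]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(suffix_array[lo:hi])


if __name__ == "__main__":
    sequence = "ACGTACGTTACG"  # made-up sample sequence
    sa = build_suffix_array(sequence)
    print(find_occurrences(sequence, sa, "ACG"))
```

All suffixes that share a prefix sit next to each other in the array, so a pattern that occurs more than once (a repeat) shows up as a contiguous block that two binary searches can delimit.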


    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Background
        1.2 Motivation
        1.3 Goal and Purpose
        1.4 Related Work
        1.5 Organization of This Thesis
    Chapter 2 System Architecture
    Chapter 3 Improvements on RSDB
        3.1 Changes to Columns’ Data Type
        3.2 Usage of Index-Organized Table
        3.3 Key Compression
        3.4 Changes to the Order of Table Columns
        3.5 Data Load
        3.6 Partitioning Data
        3.7 The Pipelined Flow of Data Load
        3.8 Data Warehousing in RSDB
        3.9 Manual Cache Processing
        3.10 Intelligent Auxiliary Query Processor
        3.11 Queries on Suffix Arrays of Organisms
        3.12 Delta Daemon and Delta Hash Table
        3.13 Relationship between Repeats and Genes
        3.14 Integration with Other Biological Databases
    Chapter 4 Performance Evaluation
        4.1 Tuned and Non-Tuned Queries
        4.2 Query Performance on Ordinary Tables and IOTs
        4.3 Comparison among IOT, Ordinary Table, and Summary Table
        4.4 Comparison between CLOB and VARCHAR2
    Chapter 5 Statistics about RSDB
        5.1 Cross Reference
    Chapter 6 Tools & Function on RSDB V2
        6.1 Search By Feature
        6.2 Search By Range
        6.3 Search By Repeat Pattern
        6.4 Search By Non-Repeat Pattern
        6.5 Search By Accession Number or Sequence ID
        6.6 Search By Tandem Repeats
        6.7 Delta Daemon and Tool
        6.8 Statistics for RSDB
        6.9 Personal Query History
    Chapter 7 Discussion and Conclusion
        7.1 Discussion
        7.2 Conclusion
        7.3 Future Work
    References
    Appendix

