| Graduate student: | 林冠宏 Guan-Hong Lin |
|---|---|
| Thesis title: | 經由潛在語義的線索從蛋白質交互作用網路進行蛋白質功能的預測 Protein Function Prediction from Protein Interaction Networks by Latent Semantic Indexing |
| Advisor: | 何錦文 Chin-Wen Ho |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 資訊工程學系 Department of Computer Science & Information Engineering |
| Academic year of graduation: | 93 |
| Language: | English |
| Number of pages: | 44 |
| Keywords (Chinese): | 蛋白質功能預測, 蛋白質交互作用網路, 潛在語義的線索 |
| Keywords (English): | protein interaction network, protein function prediction, latent semantic indexing |
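Understanding the roles that proteins play in the cell has long been an important topic in biology. In recent years, new experimental techniques have emerged, some of which produce a large number of results in a single experiment. The yeast two-hybrid system, for example, can generate a large amount of protein interaction data in one experiment, and such data usually carry biologically meaningful information.
In this thesis, we propose a method based on latent semantic indexing that extracts the biologically meaningful information hidden in protein interaction networks. In information retrieval, polysemy (one word with several meanings) and synonymy (several words with one meaning) have long been the main causes of incorrect retrieval results, and latent semantic indexing is able to address both. Protein interaction networks often contain erroneous or ambiguous information, and we use latent semantic indexing to filter it out. Our results show that the method does filter this information and retrieves proteins that are highly functionally related.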
Determining protein function is one of the most important tasks of the post-genomic era. Large-scale experimental results such as protein interaction networks are now available, and these data often carry information about protein function.
In this thesis, we present an approach based on Latent Semantic Indexing (LSI) to extract this information from protein interaction networks. LSI is an information retrieval technique designed to handle synonymy and polysemy. Because protein interaction networks are widely believed to contain many false positives and false negatives, we use the properties of LSI to filter out erroneous and ambiguous information in these networks. Our results show that the approach can identify functionally related proteins in cells.
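To make the idea concrete, the following is a minimal sketch of LSI applied to an interaction network. It is not the thesis's actual procedure, which the abstract does not specify. It assumes that each protein is described by its row of the adjacency matrix (with a self-loop added), that the latent space comes from a rank-`k` truncated SVD, and that functional relatedness is scored by cosine similarity. The protein names, the edges, and the helper functions `lsi_embed` and `cosine_similarity` are hypothetical.

```python
import numpy as np

def lsi_embed(matrix, k):
    """Return a k-dimensional latent vector for each row via truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the k largest singular directions (singular values are sorted).
    return u[:, :k] * s[:k]

def cosine_similarity(embedding):
    """Pairwise cosine similarity between the rows of an embedding."""
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = embedding / norms
    return unit @ unit.T

# Hypothetical toy network: two tightly connected groups, {A, B, C} and
# {D, E, F}, joined by a single interaction C-D.
proteins = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("D", "E"), ("D", "F"), ("E", "F"),
         ("C", "D")]

index = {name: i for i, name in enumerate(proteins)}
adjacency = np.eye(len(proteins))  # assumption: each protein neighbors itself
for a, b in edges:
    adjacency[index[a], index[b]] = 1.0
    adjacency[index[b], index[a]] = 1.0

# Assumption: k = 2 latent dimensions, chosen by hand for this toy example.
similarity = cosine_similarity(lsi_embed(adjacency, k=2))

for a, b in [("A", "B"), ("C", "A"), ("C", "D"), ("A", "F")]:
    print(f"{a}-{b}: {similarity[index[a], index[b]]:.3f}")
```

In this toy network the low-rank projection scores pairs within a group higher than the lone bridging pair C-D, even though C and D interact directly. This is the kind of smoothing that could help de-emphasize isolated, possibly spurious interactions. The choice of `k` controls how aggressive that smoothing is and would need to be tuned on real data.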