Protein Function Prediction from Protein Interaction Networks by Latent Semantic Indexing

Author:	林冠宏 (Guan-Hong Lin)
Thesis title:	Protein Function Prediction from Protein Interaction Networks by Latent Semantic Indexing
Advisor:	何錦文 (Chin-Wen Ho)
Oral examination committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Academic year of graduation:	93 (ROC calendar)
Language:	English
Number of pages:	44
Keywords:	protein interaction network, protein function prediction, latent semantic indexing

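Understanding the roles that proteins play in the cell has long been an important problem in biology. In recent years, new experimental techniques have emerged, some of which produce a large volume of results in a single experiment. The two-hybrid system, for example, can generate a large amount of protein interaction data in one experiment, and such data usually carry biologically meaningful information.
In this thesis, we propose a method based on latent semantic indexing that extracts the biologically meaningful information hidden in protein interaction networks. In information retrieval, polysemy (one word with several meanings) and synonymy (several words with one meaning) have long been major causes of incorrect retrieval results, and latent semantic indexing is able to address both problems. Protein interaction networks often contain erroneous or ambiguous information, and we use latent semantic indexing to filter it out. Our results show that the method does filter out this information and retrieves proteins that are highly related in function.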

Determining protein function is one of the most important tasks in the post-genomic era. Large-scale experimental results such as protein interaction networks are now available, and these data often carry information about protein function.
In this thesis, we present an approach based on Latent Semantic Indexing (LSI) to extract this information from protein interaction networks. LSI is an information retrieval technique that can address the synonymy and polysemy problems. Because biologists believe that protein interaction networks contain many false positives and false negatives, we use the properties of LSI to filter out the erroneous and ambiguous information in these networks. Our results show that our approach can identify functionally related proteins in cells.
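The general idea of applying LSI to an interaction network can be illustrated with a minimal sketch. It is not the thesis's actual model, which this record does not describe. The sketch assumes that the network's adjacency matrix (with self-loops added) plays the role of the term-document matrix, that the rank `k` is chosen by hand, and that similarity is the cosine between rows in the reduced space. The function name `lsi_similarity` and the six-protein toy network are hypothetical.

```python
import numpy as np


def lsi_similarity(adjacency, k):
    """Cosine similarity between proteins in a rank-k LSI space.

    Assumption: each row of `adjacency` describes one protein by its
    interaction partners, in the way a row of a term-document matrix
    describes one term.
    """
    # Truncated SVD: keep only the k largest singular values.
    u, s, _ = np.linalg.svd(adjacency, full_matrices=False)
    coords = u[:, :k] * s[:k]

    # Normalize rows so that the dot product is the cosine similarity.
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = coords / norms
    return unit @ unit.T


# Hypothetical toy network: proteins 0-2 and 3-5 form two dense groups,
# joined by a single bridging interaction between proteins 2 and 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.eye(n)  # self-loops (an assumption): each protein also describes itself
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

sim = lsi_similarity(A, k=2)
print(np.round(sim, 2))
```

In this toy case, proteins within the same dense group score higher against each other than against proteins in the other group, even though the bridging edge connects the two groups. A clustering step could then be run on `sim`, but the choice of `k` and of clustering method would have to follow the thesis itself.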

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
RELATED WORK
1	Methods based on Sequence Similarity
2	Methods based on Biological Experiment Data
3	Comparison between These Directions
MATERIALS AND METHODS
1	Introduction of Latent Semantic Indexing
1.1	Term-Document Matrix
1.2	Truncated Singular Value Decomposition
1.3	Similarity Definition
1.4	Basic Properties of LSI
2	Latent Semantic Indexing of Protein Interaction Network
2.1	Modeling
2.2	Similarity and Clustering
EXPERIMENTS, RESULTS, AND DISCUSSIONS
1	Data Handling
2	Validations of Our Method
2.1	Experiment Process
2.2	Similarity Test Results
2.3	Clustering Test Results
3	Fault Tolerance Experiments
3.1	Experiment Process
3.2	Fault Tolerance Results
4	Comparison with Other Methods
CONCLUSION AND FUTURE WORK
REFERENCES

                                

