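| Graduate student: | 魏忠志 Chung-Chih Wei |
|---|---|
| Thesis title: | SCI/SSCI文章比對方法之研究 (A Study of Article Matching Methods for SCI/SSCI) |
| Advisor: | 陳彥良 Yen-Liang Chen |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | College of Management - Department of Information Management |
| Graduation academic year: | 93 |
| Language: | Chinese |
| Number of pages: | 77 |
| Chinese keywords: | 文章比對 (article matching), 倒傳遞類神經網路 (back-propagation neural network), 資料挖掘 (data mining) |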
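Document retrieval techniques have been in use for many years and are now widely applied in online document retrieval systems of all kinds. Most retrieval tools perform full-text matching against the query string entered by the user, or partially process the query string before matching it against articles. Many researchers have worked on article classification, related-article matching, document similarity measures, and term-weighting models, and many improved methods have been put into practice in retrieval systems, gradually improving retrieval effectiveness and efficiency.
In this thesis, we take the well-known retrieval tool for the SCI/SSCI journal literature database as our target and develop an additional article similarity matching method based on the characteristics of such articles. The study focuses on four important attributes with distinct characteristics that the retrieval tool provides: title, abstract, keywords, and cited references. It uses the well-known vector space model and the TFIDF formula to compute the similarity of article vectors. Because the weights assigned to the four attributes affect the overall similarity between any pair of articles, we additionally use a back-propagation artificial neural network (ANN) to model the relationship between the attribute weight allocation and the total similarity value of each article pair. To verify the effectiveness of the ANN module, and to examine how the proposed matching method differs from traditional matching methods, we built a database of real journal articles and implemented the study following the article matching process. Finally, we designed an experiment in which invited participants tested the matching results.
The experimental results show that the proposed article matching method substantially improves similarity matching compared with traditional methods. We also verified that the ANN module yields better results.
For SCI/SSCI retrieval tools, this study hopes to add the article similarity matching method developed in this thesis while retaining the standard field-based query functions, thereby improving the flexibility and practicality of the retrieval tool and helping researchers and general users find related articles in the database more effectively.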
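The field-wise similarity described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy articles, the reference identifiers (`ref_a`, `ref_b`, ...), the function names, the `tf * log(N / df)` TF-IDF variant, and the equal attribute weights are all assumptions made for illustration. In the thesis the attribute weights are related to the total similarity by a back-propagation neural network, which is not reproduced here.

```python
import math
from collections import Counter

# The four attributes named in the abstract.
FIELDS = ("title", "abstract", "keywords", "references")


def tfidf_vectors(docs):
    """Build one sparse {term: weight} vector per tokenized document.

    Uses weight = tf * log(N / df), one common TF-IDF variant (assumed here).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def field_similarities(articles, i, j):
    """Cosine similarity between articles i and j, computed per attribute."""
    sims = {}
    for field in FIELDS:
        vecs = tfidf_vectors([a[field] for a in articles])
        sims[field] = cosine(vecs[i], vecs[j])
    return sims


def overall_similarity(sims, weights):
    """Weighted sum of the per-attribute similarities."""
    return sum(weights[f] * sims[f] for f in FIELDS)


# Hypothetical toy collection; cited references are treated as opaque tokens.
articles = [
    {"title": ["neural", "network", "retrieval"],
     "abstract": ["document", "similarity", "vector", "model"],
     "keywords": ["retrieval", "similarity"],
     "references": ["ref_a", "ref_b"]},
    {"title": ["neural", "network", "matching"],
     "abstract": ["document", "similarity", "tfidf", "weights"],
     "keywords": ["matching", "similarity"],
     "references": ["ref_a", "ref_c"]},
    {"title": ["stock", "market", "forecast"],
     "abstract": ["price", "prediction", "time", "series"],
     "keywords": ["finance", "forecast"],
     "references": ["ref_d"]},
]

# Placeholder weights (assumption); the thesis derives these with an ANN.
weights = {f: 0.25 for f in FIELDS}

sim_01 = overall_similarity(field_similarities(articles, 0, 1), weights)
sim_02 = overall_similarity(field_similarities(articles, 0, 2), weights)
print(f"article 0 vs 1: {sim_01:.3f}")
print(f"article 0 vs 2: {sim_02:.3f}")
```

Computing the similarity separately for each attribute keeps a long abstract from drowning out a short title or keyword list, and it leaves the relative importance of the attributes as an explicit set of weights that can be tuned or learned.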