| Graduate student: | 張振超 Chen-Chao Chang |
|---|---|
| Thesis title: | 結合基因演算法與使用者興趣檔之資訊檢索研究 A Research on Combining the Genetic Algorithm and User Profile for Information Retrieval |
| Advisor: | 周世傑 Shih-Chieh Chou |
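| Oral examination committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Information Management |
| Academic year of graduation: | 94 |
| Language: | Chinese |
| Number of pages: | 63 |
| Keywords (Chinese): | genetic algorithm, user profile, example documents, relevance feedback, evolutionary and adaptive system |
| Keywords (English): | user profile, genetic algorithm, Evolutionary and Adaptive System, example documents, relevance feedback |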
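Over time, the Internet and the World Wide Web have become increasingly important for information search, but directory-based search engines and keyword-based search engines each face problems. Directory-based search engines still rely on human maintenance and can hardly keep pace with the rate at which information is added to the World Wide Web; keyword-based search engines return too much meaningless information and cause information overload, because keywords usually cannot accurately express the user's real needs.
To assist information search on the World Wide Web, this research constructs a new system in the field of evolutionary and adaptive systems to support users in retrieving web page information. The system applies a genetic algorithm to expand queries and uses profiles to describe the user's real needs. First, the user enters a concept string and weights to describe the need; the terms of this concept string are used to compose the AND operation. To compose suitable query strings, the system builds two kinds of profiles based on the example documents and relevance feedback provided by the user: the positive term profile provides the terms used to compose the OR operation, and the negative term profile provides the terms used to compose the NOT operation. Next, the system selects a number of terms from the positive term profile to form the actual user profile, which is compared against the returned web pages to calculate similarity. Finally, an experiment in which users tested the system verifies that it does improve the efficiency of information retrieval.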
The Internet and the World Wide Web (WWW) are becoming increasingly important for information search, but directory-based and keyword-based search engines each have their own shortcomings. Directory-based search engines cannot keep up with the growth of WWW resources because categorization requires human work; keyword-based search engines usually return too much irrelevant information because keywords usually cannot precisely specify the user’s real requirements.
To assist information search on the WWW, our research constructs a new “Evolutionary and Adaptive” system to help users retrieve web information. The system applies a genetic algorithm to expand queries and develops profiles to describe the users’ requirements. First, the users input a concept and weights to describe what they want, and the terms of the concept are combined with the AND operation. To compose suitable query strings, the system establishes two kinds of profiles based on example documents and relevance feedback: the positive term profile supplies the terms combined with the OR operation, and the negative term profile supplies the terms used with the NOT operation. Next, the system picks some terms from the positive term profile to form the actual user profile and compares it with the returned web pages to calculate similarity. Finally, a user-testing experiment reports improved retrieval performance.
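The query composition and profile matching described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names `build_query` and `profile_similarity`, the exact query syntax, and the use of cosine similarity over raw term counts are all assumptions made for illustration, since the abstract does not specify them. The genetic algorithm step that selects and evolves terms is not shown.

```python
import math
from collections import Counter


def build_query(concept_terms, positive_terms, negative_terms):
    """Compose a Boolean query string (hypothetical syntax).

    Concept terms are joined with AND, positive-profile terms with OR,
    and each negative-profile term is excluded with NOT.
    """
    parts = []
    if concept_terms:
        parts.append(" AND ".join(concept_terms))
    if positive_terms:
        parts.append("(" + " OR ".join(positive_terms) + ")")
    query = " AND ".join(parts)
    for term in negative_terms:
        query += f" NOT {term}"
    return query.strip()


def profile_similarity(profile, page_text):
    """Cosine similarity between a weighted term profile and a page.

    `profile` maps term -> weight. The page is represented by raw term
    counts after lowercasing and whitespace splitting (an assumption;
    stemming and stop-word removal are omitted here).
    """
    counts = Counter(page_text.lower().split())
    dot = sum(weight * counts[term] for term, weight in profile.items())
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_page = math.sqrt(sum(c * c for c in counts.values()))
    if norm_profile == 0 or norm_page == 0:
        return 0.0
    return dot / (norm_profile * norm_page)


query = build_query(["genetic", "algorithm"], ["retrieval", "search"], ["biology"])
print(query)  # genetic AND algorithm AND (retrieval OR search) NOT biology

user_profile = {"genetic": 1.0, "retrieval": 0.5}
print(profile_similarity(user_profile, "Genetic algorithm for document retrieval"))
```

In this sketch a page that shares no terms with the profile scores 0.0, and pages could be ranked by descending score before being shown to the user for relevance feedback.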