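Graduate student: Shih-Yong Wang (王仕榮)
Thesis title: SmartArchie: An Intelligent File Search System (具檔案敘述相關語查詢之智慧型檔案搜尋系統)
Advisor: Li-Ming Tseng (曾黎明)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Academic year of graduation: 88 (ROC calendar; 1999-2000)
Language: Chinese
Number of pages: 58
Chinese keywords: file description, file download, World Wide Web, proxy server, file transfer, resource reuse
English keywords: description, file retrieval, Archie, World Wide Web, proxy, FTP, resource reuse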
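  • It is well known that file downloads consume a large share of network bandwidth, yet it is hard to find where a
    desired piece of software is hosted. Well-known or foreign sites therefore become the destination of choice for
    many users, but always connecting to external sites wastes precious network resources.
    Archie is a common Internet service that lets users search for files by filename string. Archie has not kept pace
    with the times, however: more and more software can now be found only on WWW sites, while Archie can search only
    files stored on FTP sites. General WWW search engines, for their part, do not perform satisfactorily at file search either.
    In this thesis, we implement an intelligent file search system that supports queries by file-description terms, called SmartArchie.
    Users can search for a desired file by filename string or by terms related to the file's description, whether the
    file resides on an FTP site or a WWW site on the Internet. Because the system cooperates with a proxy server,
    SmartArchie can effectively raise the rate at which downloaded files are reused as well as the overall hit rate.
    The system is also fully compatible with Chinese.
    Our long-term observation shows that more than 20% of users already search for files by file-description terms.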


    It is believed that file retrieval accounts for a large percentage of network traffic. However, finding desired
    software can be very difficult, so people tend to fetch it from well-known foreign sites.
    As a result, bandwidth savings and resource reuse are rarely achieved.
    The Archie service addresses the problem of locating files on the Internet by their attributes, typically the filename. But Archie has lost sight of the fact that more and more software packages are stored on Web sites; it can handle only archives on FTP sites. In addition, most WWW search engines perform poorly at filename search and at locating
    a file by descriptive words.
    In this thesis, we present an intelligent file search system, called SmartArchie, for Internet archive discovery. It allows searching for files not only by filename but also by descriptions relevant to the file, and it locates the requested file whether it is on a public FTP host or a Web site. By cooperating with a proxy caching server, SmartArchie increases the archive reuse ratio as well as the request and byte hit rates. Finally, indexing and searching in Chinese are also supported.
    Our results show that about 20% of users search for files by description.
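
    The two query modes described above (by filename string and by description terms) can be illustrated with a
    minimal sketch. This is not SmartArchie's implementation: the class name FileIndex, its methods, and the example
    URLs are all hypothetical, and the tokenizer splits on word characters only, so it does not segment Chinese text
    the way a Chinese-capable index would need to.

```python
import re
from collections import defaultdict


class FileIndex:
    """Toy index over file URLs (hypothetical; not the thesis's design)."""

    def __init__(self):
        self._entries = []               # (url, filename, description)
        self._terms = defaultdict(set)   # description term -> entry ids

    def add(self, url, description=""):
        """Register a file found on an FTP or Web site, with optional description."""
        filename = url.rsplit("/", 1)[-1]
        entry_id = len(self._entries)
        self._entries.append((url, filename, description))
        for term in re.findall(r"\w+", description.lower()):
            self._terms[term].add(entry_id)

    def search_by_filename(self, substring):
        """Archie-style search: case-insensitive substring match on the filename."""
        needle = substring.lower()
        return [url for url, name, _ in self._entries if needle in name.lower()]

    def search_by_description(self, query):
        """Return files whose description contains every term of the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        ids = set.intersection(*(self._terms.get(t, set()) for t in terms))
        return [self._entries[i][0] for i in sorted(ids)]


# Example usage with made-up URLs: one FTP archive and one Web download.
index = FileIndex()
index.add("ftp://ftp.example.org/pub/tools/wget-1.5.3.tar.gz",
          "command line tool to download files over HTTP and FTP")
index.add("http://www.example.com/software/viewer.zip",
          "image viewer for Windows")
print(index.search_by_filename("wget"))
print(index.search_by_description("download tool"))
```

    The same index answers both kinds of query regardless of whether the URL points at an FTP host or a Web site,
    which is the property the abstract emphasizes.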

    ABSTRACT
    ABSTRACT (IN CHINESE)
    ACKNOWLEDGEMENTS (IN CHINESE)
    CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
      1.1 MOTIVATION
      1.2 SYSTEM OVERVIEW
      1.3 ORGANIZATION OF THIS THESIS
    CHAPTER 2 ARCHIVE DISCOVERY ON THE INTERNET
      2.1 ARCHIVES ON FTP SERVERS
      2.2 ARCHIVES ON WEB SITES
    CHAPTER 3 RELATED WORKS
      3.1 GLIMPSE AND WEBGLIMPSE
      3.2 GAIS
      3.3 FTPSEARCH
      3.4 FTPLOCATE
      3.5 PERLFECT
      3.6 ARCHIE
      3.7 SUMMARY
    CHAPTER 4 PROBLEM OVERVIEW
      4.1 BUILT-IN FILE DESCRIPTION
      4.2 BETTER FTP AND WEB SITES
      4.3 CONTRIBUTION BY LINUX AND BSD
      4.4 PROXY CACHING SERVER
      4.5 RESOURCE REUSE
    CHAPTER 5 SYSTEM ARCHITECTURE
      5.1 FTP FILE NAME AND DESCRIPTION COLLECTION
        5.1.1 Main Functions and Improvement
        5.1.2 Linux Packages
        5.1.3 BSD Ports
      5.2 PROXY CACHING SERVER
        5.2.1 Log Analysis
        5.2.2 ProxyFTP Server
        5.2.3 Classification in FTP Directory
      5.3 FILE RELATED WEB PAGES COLLECTION
        5.3.1 Retrieving Web pages
        5.3.2 Indexing Web pages
        5.3.3 Searching Web Pages
      5.4 INTEGRATED QUERY INTERFACE
    CHAPTER 6 IMPLEMENTATION AND EVALUATION
      6.1 PLATFORM
      6.2 INDEX/SEARCH FTP CONTENT
      6.3 INDEX/SEARCH WEB ARCHIVES
      6.4 PROXYFTP SERVER
    CHAPTER 7 CONCLUSIONS AND FUTURE WORKS
    ACKNOWLEDGEMENT
    REFERENCES

