Graduate student: Shih-Feng Yang (楊士鋒)
Thesis title: Multiple Source Data Management for Gadget Creation on Web Portals
(應用資料擷取於Web小工具開發之研究–多個資料源之資料整合)
Advisor: Chia-Hui Chang (張嘉惠)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Number of pages: 44
Chinese keywords: 資訊擷取, 資訊整合, 網路 2.0
English keywords: Information Extraction, Information Integration, Web 2.0
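  • As Web usage grows rapidly and online activity becomes ever more frequent, personalized information integration and management is receiving increasing attention. From the early My Yahoo to the more recent iGoogle and Netvibes, personal portal services combine the software-as-a-service concept to let people manage the information and services they need in a few simple steps. On these platforms, software is presented to users as gadgets, so there is no software update problem. By simply adding existing gadgets, users can place news, stock, calendar, e-mail, and many other useful gadgets on their personal portal, and the interactive interface provided by AJAX then achieves personal information integration. Through a personal portal, users can make their daily routine web browsing easier and turn their accumulated bookmarks into a visual presentation of information. However, although users can choose from existing gadgets, designing a gadget themselves is still not an approachable task. In this thesis, we propose an online gadget creation website that lets even users with no knowledge of programming languages produce modules usable on personal portals in a few simple steps. Our goal is to let users monitor web page updates and present data in a variety of ways with only simple operations. In addition, through the Page Fetch Plan that we designed, users can turn web page data into multiple data sources, extract the information they need, and perform personalized data integration. Finally, through the gadget platform, the generated modules can be shared with other users on the Web.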


    The Web 2.0 trend has brought more and more users onto the World Wide Web and allowed them to create and share information there. As a result, personal information integration has become an important research area. Personal portals such as My Yahoo, iGoogle, and Netvibes give end users a convenient platform for managing the web information they want. Users can easily add any application to a personal portal, and there is no application update problem. Users can add any kind of gadget, such as news, stocks, calendar, and e-mail, to their personal portal and manage all of this information through an interactive interface.
    Although a personal portal makes it easy for people to manage their information, designing a gadget is hard for a non-expert user. In this thesis, we present an online gadget creation service that helps non-expert users create gadgets to manage the web information they want. We propose a simple process for creating a gadget that can monitor web page updates and present its data in several different forms. In addition, users can extract web information from multiple data sources through the Page Fetch Plan process and thereby accomplish personal information integration. Finally, users can easily share their gadgets with friends through the personal portal platform.

    Chinese Abstract
    English Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
    2. Related Work
    2.1 Web Data Integration
    2.2 Web Page Block Tracking
    2.3 Automatic Gadget Generation
    2.4 Web Page Fetching
    2.5 Information Extraction Techniques
    3. System Architecture
    3.1 Selecting a Display Module
    3.2 Building the Initial Data Source
    3.2.1 Web Page Collection
    3.2.2 Data Extraction Techniques
    3.2.3 Presentation of Extraction Results
    3.3 Page Fetch Plan
    3.4 Module Publishing
    4. Merging Multiple Data Sources
    4.1 Preprocessing
    4.2 Bottom-Up Merging
    5. Restoring Tree-Structured Data Sources
    6. Case Study
    7. System Comparison
    7.1 Web Data Integration
    7.2 Web Page Block Tracking
    7.3 Automatic Gadget Generation
    8. Conclusions and Future Work
    References

