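| Author: | 李逸群 Yi-Chun Li |
|---|---|
| Thesis title: | 網頁異動偵測技術在網際網路新聞資訊擷取上之應用 (Application of Web Page Change Detection Techniques to Internet News Information Extraction) |
| Advisor: | 陳奕明 Yi-Ming Chen |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Information Management |
| Academic year of graduation: | 92 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 54 |
| Keywords (translated from Chinese): | web page redesign, extraction program, information extraction technology, news web pages, change detection technology |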
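Information on the Internet keeps growing richer, and almost any information can now be found online; news is a particularly important part of this data. As the volume of data grows rapidly, Internet users want a way to quickly filter out the news they do not need and obtain the news they want. Information extraction technology can help users filter and select news, presenting it to them in an organized, uniform format. However, when the format of a news page changes because the site is redesigned, some information extraction functions fail. To address this problem, this study designs a Change Detection Information Extraction System (CDIES). When CDIES encounters a redesigned page during extraction, it automatically updates the extraction path so that extraction is not interrupted by the redesign. Finally, this study describes CDIES in actual use and compares it with the information extraction system of the "Information Security Database" (資訊安全資料庫) to verify the effectiveness and usability of the system.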
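
The idea of repairing an extraction path after a page redesign can be illustrated with a minimal sketch. This is not the CDIES algorithm, which the abstract does not specify. It assumes one simple heuristic: if the stored path no longer matches, search the new page for a value extracted earlier and take that element's path as the updated one. The sample pages and the names `extract`, `find_path`, and `extract_with_repair` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages: the same headline before and after a site redesign.
OLD_PAGE = "<html><body><table><tr><td>Headline A</td></tr></table></body></html>"
NEW_PAGE = "<html><body><div><h1>Site</h1><div><p>Headline A</p></div></div></body></html>"


def extract(root, path):
    """Return the stripped text at `path`, or None if the path matches nothing."""
    node = root.find(path)
    if node is None:
        return None
    return (node.text or "").strip()


def find_path(root, text):
    """Depth-first search for an element whose text equals `text`.

    Returns a slash-separated tag path relative to `root`, or None.
    """
    def walk(node, parts):
        for child in node:
            child_parts = parts + [child.tag]
            if (child.text or "").strip() == text:
                return "/".join(child_parts)
            found = walk(child, child_parts)
            if found:
                return found
        return None

    return walk(root, [])


def extract_with_repair(page, path, last_value):
    """Extract with the stored path; if it fails, try to derive a new path.

    Assumption: `last_value` (a previously extracted string) still appears
    somewhere on the redesigned page.
    """
    root = ET.fromstring(page)
    value = extract(root, path)
    if value is not None:
        return value, path  # the stored path still works
    new_path = find_path(root, last_value)
    if new_path is None:
        return None, path  # repair failed; keep the old path
    return extract(root, new_path), new_path


value, path = extract_with_repair(NEW_PAGE, "body/table/tr/td", "Headline A")
print(value, path)
```

The sketch only notices a redesign when the old path matches nothing. A path that still resolves but now points at a different element would need a separate content check, and real news pages are rarely well-formed XML, so a tolerant HTML parser would be needed in practice.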