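| Graduate student: | 蔡子宸 Tzu-chen Tsai |
|---|---|
| Thesis title: | 自動偵測HTML語言的語意區塊 An Automatic Semantic-Segment Detection Method in the HTML Language |
| Advisor: | 楊鎮華 Stephen J.H. Yang |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Graduation academic year: | 95 (ROC academic year, i.e., 2006–2007) |
| Language: | English |
| Number of pages: | 106 |
| Chinese keywords: | content adaptation, semantic segment, adaptation strategy, automatic detection |
| English keywords: | Semantic Segment, Content Adaptation, Structure Fragment, Context Aware |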
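In the 21st century, Internet use has spread rapidly, and major manufacturers have released a wide variety of portable Internet-capable devices. Beyond the traditional personal computer, users now have more choices and opportunities to go online anytime and anywhere, retrieving web information and learning on the web. These portable devices are light, compact, highly mobile, and versatile, which makes them popular. They also have troublesome drawbacks, however: their small screens often cause web content to be laid out poorly and to be hard to read, and their network bandwidth and computing power are weaker than those of a personal computer, so users must wait longer. Content adaptation addresses these problems by analyzing and decomposing the original web page and then, according to the user's context, physical condition, and device, transforming the content into a tailor-made page that conveys the author's intended information in a better presentation. In short, a content adaptation mechanism is a system that automatically generates adapted pages to compensate for websites that do not provide pages suited to each user. Before the system can adapt a page to the user's conditions, however, it must first decompose the page using the correct character encoding. In this thesis, I therefore propose an efficient, modular method for decomposing pages and automatically detecting semantic segments. The method identifies the smallest members that cannot be subdivided further and, based on their semantic relatedness and structure, automatically detects semantic segments that serve as the units of adaptation. The objects within each unit must preserve their semantic homogeneity, complete functionality, readability, and the hierarchy of their layout positions. Adaptation strategies such as object transformation are then applied per semantic segment to produce pages suited to the user, solving problems such as long download waits and pages so full of information that the user must keep scrolling and finds them hard to read.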
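As the abstract notes, a page must be decoded with the correct character encoding (code page) before it can be decomposed. The thesis does not say how the encoding is chosen, so the following is only a minimal Python sketch under an assumed, simplified precedence: a `charset` parameter in the HTTP `Content-Type` header wins, then a `<meta ... charset=...>` declaration near the top of the page, then a UTF-8 fallback. The helper names `detect_charset` and `decode_html` are hypothetical, and the full encoding-sniffing algorithm in the HTML standard is more involved (for example, it also honors a byte-order mark).

```python
import re
from typing import Optional

# Assumed, simplified precedence (not taken from the thesis):
# HTTP Content-Type charset > <meta> charset in the first 2 KB > fallback.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE)


def detect_charset(raw: bytes, content_type: Optional[str] = None,
                   fallback: str = "utf-8") -> str:
    """Return a lower-cased charset name for an HTML byte string (hypothetical helper)."""
    if content_type:
        m = _HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # Only ASCII-compatible encodings can be sniffed this way (not UTF-16).
    m = _META_CHARSET.search(raw[:2048])
    if m:
        return m.group(1).decode("ascii").lower()
    return fallback


def decode_html(raw: bytes, content_type: Optional[str] = None) -> str:
    """Decode raw HTML with the detected charset, falling back to lenient UTF-8."""
    charset = detect_charset(raw, content_type)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or bytes that are invalid for the declared codec.
        return raw.decode("utf-8", errors="replace")
```

A Big5-encoded page that declares `charset=big5` in a `<meta>` tag is detected as `big5` and decoded back to its original Chinese text; an unrecognized codec name or undecodable bytes fall back to UTF-8 with replacement characters instead of raising an exception.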
The amount of information on the World Wide Web continues to grow at an astonishing rate, yet most web pages are designed for large screens and powerful computing devices such as PCs and notebooks, so their content does not fit small devices such as personal digital assistants (PDAs). In addition, factors such as the user's personal condition and the capabilities of the device can affect whether the user successfully understands the content of a page. In this thesis, we propose a mediator system that makes surfing the WWW easier for users. The main purpose of the system is to adapt the original content into content suited to each user through context awareness. We call this system Content Adaptation (CA); in other words, the CA system produces a webpage suited to the user.
CA can be separated into two steps: content decomposition and content re-composition. Because the content decomposer must analyze the semantics of the HTML before the content can be adapted to the user's condition, my research focuses on automatic content decomposition. In the decomposition process, I use the correct code page to parse the HTML file and take all of its tags and information into structural account; furthermore, I developed a method that analyzes the page's semantic context, architecture, arrangement, structure, and visual effect, and splits it into small Semantic Segments (S.S.) that cannot be subdivided further. An S.S. has several important properties: complete function (functionality related), readable typesetting (readability related), presentation relationships (space and time related), and literary context (semantics related). My experimental results show that the proposed conventions for detecting semantic segments, together with a page-splitting scheme that partitions a web page into many smaller semantic segments, greatly improve users' browsing experience on the small screens of hand-held devices.
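The abstract describes splitting a page into small semantic segments that cannot be subdivided further while keeping functional units, such as a form, intact. The thesis's own detection rules weigh semantics, layout, and visual effect and are not reproduced here. As a rough illustration only, the Python sketch below uses a much simpler, assumed heuristic built on the standard library's `html.parser`: every block-level element (the hypothetical `BLOCK_TAGS` set) starts a new segment, a `<form>` is treated as atomic so its label, text, and controls stay together, and each segment records the chain of enclosing block tags as a stand-in for the layout hierarchy. `SegmentParser`, `segment_html`, and the tag sets are illustrative names, not part of the thesis.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List

# Hypothetical tag sets chosen for this sketch; the thesis's own rules differ.
BLOCK_TAGS = {"body", "div", "p", "ul", "ol", "li", "table", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6", "form"}
ATOMIC_TAGS = {"form"}            # never split: keeps a form's controls together
SKIP_TAGS = {"script", "style"}   # non-visible content


@dataclass
class Segment:
    path: str   # chain of enclosing block tags, e.g. "body/div/p"
    text: str   # whitespace-normalized visible text


class SegmentParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []    # open block-level elements
        self.buffer: List[str] = []   # text of the segment being built
        self.atomic_depth = 0         # > 0 while inside an ATOMIC_TAGS element
        self.skip_depth = 0           # > 0 while inside <script>/<style>
        self.segments: List[Segment] = []

    def flush(self) -> None:
        text = " ".join("".join(self.buffer).split())
        if text:
            self.segments.append(Segment("/".join(self.stack), text))
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "input":
            self.buffer.append(" [input] ")   # placeholder for a void control
        elif tag in BLOCK_TAGS:
            if self.atomic_depth == 0:
                self.flush()                  # a new block starts a new segment
            else:
                self.buffer.append(" ")       # inside a form: only separate words
            self.stack.append(tag)
            if tag in ATOMIC_TAGS:
                self.atomic_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if tag not in BLOCK_TAGS or tag not in self.stack:
            return
        # Innermost open element with this name; anything above it was left
        # unclosed in the source and is closed implicitly here.
        idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
        closed = self.stack[idx:]
        atomic_closed = sum(t in ATOMIC_TAGS for t in closed)
        if self.atomic_depth - atomic_closed == 0:
            self.flush()                      # path still names this container
        else:
            self.buffer.append(" ")
        del self.stack[idx:]
        self.atomic_depth -= atomic_closed

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)


def segment_html(html: str) -> List[Segment]:
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    parser.flush()    # text left outside any closed block
    return parser.segments
```

For a page containing a heading, a paragraph, and a small search form, `segment_html` returns three segments (`body/div/h1`, `body/div/p`, and `body/div/form`), with the form's label, paragraph text, and `[input]` placeholder kept in one segment and `<script>` content dropped. A real detector would also need to merge neighboring fragments that belong together semantically, which this block-boundary rule cannot do.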