A Data Mining Algorithm to Enhance Proxy Prefetching


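Graduate student:	林書呈 Shu-Cheng Lin
Thesis title:	利用資料探勘改善代理伺服器預先擷取效率之研究 (A Data Mining Algorithm to Enhance Proxy Prefetching)
Advisors:	張瑞益 Ray-I Chang, 陳彥良 Yen-Leiang Chen
Oral defense committee:
Degree:	Master
Department:	College of Management - Department of Information Management
Academic year of graduation:	92 (ROC calendar)
Language:	Chinese
Number of pages:	61
Chinese keywords:	關聯規則 (association rules), 階層式分群 (hierarchical clustering), 預先擷取 (prefetching)
English keywords:	Hierarchical Clustering, Association Rule, Prefetching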

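With the rapid growth of the Internet, the number of Internet users has also risen sharply. As network applications diversify, large volumes of digital data must be sent over the network, but bandwidth has not kept pace with the growth of digital content, so transfers are delayed and overall quality of service declines. A proxy server's caching and prefetching can effectively reduce the time users wait for transfers and make resource use more efficient. How to further improve proxy cache performance and the prefetch hit rate has therefore become an important research topic.
This study attempts to use data mining techniques to address the shortcomings of current prefetching algorithms. First, we analyze the fields of the access log and preprocess it, filtering out records that would distort the analysis. Next, drawing on the concept of association rules, we propose a Portal-aware prefetching algorithm that strengthens the handling of cross-site browsing records, which helps improve prefetching efficiency. We also adopt the concept of web page information value, replacing the original calculation based purely on access counts with one based on information value, so that prefetching yields greater benefit. Finally, because users show different browsing trends at different times, we propose a hierarchical clustering algorithm that finds time partitions in which users' browsing trends are similar. Experimental results show that the proposed prefetching methods improve prefetching efficiency and that the time partitioning matches users' browsing trends, allowing the proxy server to provide better quality of service.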
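The abstract does not spell out the Portal-aware algorithm, so the following is only a minimal sketch of the general idea it builds on: mining pairwise association rules (support and confidence) between sites that appear in the same browsing session, then using the rules to rank prefetch candidates. The function names, thresholds, session format, and the `*.example` host names are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import permutations


def pairwise_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Return {(a, b): confidence} for rules a -> b over site-level sessions.

    Assumption: each session is a list of site names visited together.
    support(a -> b)    = sessions containing both a and b / all sessions
    confidence(a -> b) = sessions containing both a and b / sessions containing a
    """
    n = len(sessions)
    site_count = Counter()
    pair_count = Counter()
    for session in sessions:
        sites = set(session)  # count each site once per session
        site_count.update(sites)
        # ordered pairs, so a -> b and b -> a are tracked separately
        pair_count.update(permutations(sorted(sites), 2))

    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n
        confidence = count / site_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def prefetch_candidates(rules, current_site):
    """Sites to prefetch after a visit to current_site, strongest rule first."""
    hits = [(b, conf) for (a, b), conf in rules.items() if a == current_site]
    return [b for b, _ in sorted(hits, key=lambda x: (-x[1], x[0]))]


# Hypothetical sessions reconstructed from a preprocessed access log.
sessions = [
    ["portal.example", "news.example", "mail.example"],
    ["portal.example", "news.example"],
    ["portal.example", "shop.example"],
    ["news.example", "mail.example"],
    ["portal.example", "news.example", "shop.example"],
]
rules = pairwise_rules(sessions)
print(prefetch_candidates(rules, "portal.example"))
```

With these sample sessions, a visit to `portal.example` suggests prefetching `news.example`, because three of the four sessions containing the portal also contain the news site. A count-based ranking like this is the baseline that the abstract proposes to refine with web page information value.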

Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
1 Research Motivation
2 Research Objectives
3 Thesis Organization
Chapter 2  Literature Review
1 Prefetching Algorithms
1.1 Access Tree Algorithm
1.2 Domain-Top Algorithm
2 Data Mining
2.1 Association Rules
2.2 Clustering
2.3 Web Mining
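Chapter 3  Methods for Improving Proxy Server Prefetching Efficiency
1 Preprocessing the Access Log
1.1 Problem Description and Basic Idea
1.2 Analysis of Access Log Fields
2 Generating the Portal-aware Hot List
2.1 Problem Description and Basic Idea
2.2 Generating the Portal-aware Hot List with Association Rules
3 Using Web Page Information Value
3.1 Problem Description and Basic Idea
3.2 Information Value of Web Pages
3.3 Hot List Based on Web Page Information Value
4 Using Data Mining to Analyze the Periodicity of Browsing Trends
4.1 Problem Description and Basic Idea
4.2 Hierarchical Clustering Algorithm
4.3 Time Trend Analysis Algorithm: Agglomerative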
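Chapter 4  System Simulation and Performance Evaluation
1 System and Data Sources
1.1 Simulation Environment
1.2 Data Sources
2 Performance Metrics
3 Analysis of Portal-aware Experimental Results
3.1 Parameter Settings
3.2 Analysis of Results
4 Analysis of Information Value Experimental Results
4.1 Parameter Settings
4.2 Analysis of Results
5 Analysis of Browsing Trend Periodicity Experimental Results
5.1 Parameter Settings
5.2 Analysis of Results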
Chapter 5  Conclusions and Future Research Directions
1 Conclusions
2 Research Contributions
3 Future Research Directions
Appendix
References

                                

