A Profiling and Monitoring Framework for Cloud Applications on Hadoop System

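Graduate student:	劉勝豪 Sheng-Hao Liu
Thesis title:	A Profiling and Monitoring Framework for Cloud Applications on Hadoop System (基於Hadoop系統的雲端應用程式特徵擷取與計算監測架構)
Advisor:	王尉任 Wei-Jen Wang
Oral examination committee:
Degree:	Master
Department:	College of Information and Electrical Engineering - Department of Computer Science & Information Engineering
Academic year of graduation:	98 (ROC calendar)
Language:	Chinese
Number of pages:	48
Keywords (Chinese, translated):	Profiling, Monitoring Service, Cloud Computing
Keywords (English):	Profiling, Monitoring, Hadoop, Cloud Computing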

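In recent years, with the rapid development of cloud computing technology, more and more fields have adopted cloud computing to support their computations, and applications of many kinds have been developed. These applications consume large amounts of computing resources, so appropriate monitoring software is needed to provide system information that the cloud system can use to adjust its use of computing resources, improving overall system performance and user satisfaction. In addition, if the cloud system charges for use, it must provide accurate records of each user's resource usage and bill accordingly. However, most current monitoring systems only monitor hardware resource usage, such as CPU utilization, memory usage, and network transfer rate, which is not enough to meet the needs of a cloud system in actual operation. We therefore introduce the concept of application awareness and, building on an existing monitoring system, develop an application profiling and monitoring framework on the Hadoop system. The system hides the complex mapping relations among computing jobs so that administrators can keep track of applications in a simpler way.
The system consists of three main components: the Application-Aware Profiling Agent, the Profiling Database, and the Filter.
An Application-Aware Profiling Agent is installed on every computing node to record the execution status of each cloud application (for example, execution time and CPU usage). The Filter extracts the mapping relations from this monitoring information and sends the data to the Profiling Database for storage. With this system, we can keep track of how each cloud application executes. The system also provides a profiling function, so computing jobs can be classified by their attributes.
Under the principles of pay-per-use and quality of service, users whose computations run on more stable computing resources in a cloud system should pay more, and the cloud system should in turn give them stronger guarantees. The main contribution of this research is therefore to raise the level of monitoring: by taking the application as the object of monitoring and providing a mechanism for monitoring applications, it distinguishes tiers of applications and users, so that a cloud system can design ways to protect users at each tier. Furthermore, the processed data can be fed back to administrators and to the system's resource adjustment mechanism, supporting dynamic system adjustment in a dynamic cloud computing environment and guaranteeing the share of computing resources each user may occupy.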

The emerging cloud computing technology provides on-demand, powerful computing platforms for many complex scientific and industrial applications. These applications usually consume large amounts of computing resources and execute concurrently on a cloud platform. Therefore, a cloud system needs a good monitoring and profiling framework to keep track of users' applications, and uses the observed information for system management purposes such as process deployment, application optimization, and load balancing. A pay-per-use cloud system can also charge its customers according to the observed application usage. However, existing monitoring systems focus on hardware monitoring, such as CPU usage, memory usage, and network bandwidth usage. They provide no insight into how users' applications utilize the system resources. We therefore introduce the concept of application-aware monitoring to improve existing cloud monitoring systems, and develop an application profiling and monitoring framework based on a cloud system, Hadoop.
The proposed framework does not present low-level views of jobs, tasks, and local processes. Instead, it provides a more integrated, abstract view of cloud applications. The proposed architecture comprises three components: the application-aware profiling agents, the filters, and the profiling database. The application-aware profiling agents are installed on every computing node to record the execution status of users' applications. The observed information is then sent to the filters for preliminary processing. The filters extract the mapping relations, save the results as intermediate files, and deliver the files to the profiling database. In addition, our system provides a classification service that uses the profiling data to classify cloud applications, which helps users and administrators optimize their applications. The major difference between our system and other existing systems is that our system is application-oriented, while others are mostly hardware-oriented. The major contribution of our system is that it can integrate the information of users, applications, jobs, processes, and resources. When problems arise in a cloud system, applications with high performance guarantees can be identified easily and receive timely service. Cloud service providers can also take advantage of our system to develop a set of billing strategies, to create different service-level agreements, and to protect the rights of customers who pay different amounts. Furthermore, the processed data can be sent to the load-balancing service of a cloud system to support dynamic system reconfiguration and improve the resource utilization rate.
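The flow described in the abstract (per-node agents, a filter that resolves mapping relations, and a profiling database) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the record fields, the `task_to_job` and `job_to_app` lookup tables, and the in-memory `ProfilingDatabase` class are all hypothetical names chosen for illustration, and the sample values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSample:
    """One record from a (hypothetical) per-node profiling agent."""
    node: str
    pid: int
    task_id: str        # task that the local process belongs to
    cpu_seconds: float


def filter_samples(samples, task_to_job, job_to_app):
    """Filter step: attach each process-level sample to its application.

    Samples whose task or job cannot be mapped are skipped.
    """
    rows = []
    for s in samples:
        job = task_to_job.get(s.task_id)
        app = job_to_app.get(job)
        if app is None:
            continue
        user, app_name = app
        rows.append({"user": user, "application": app_name, "job": job,
                     "node": s.node, "cpu_seconds": s.cpu_seconds})
    return rows


class ProfilingDatabase:
    """In-memory stand-in for the profiling database (assumption)."""

    def __init__(self):
        self.rows = []

    def store(self, rows):
        self.rows.extend(rows)

    def cpu_by_application(self):
        """Aggregate CPU time per (user, application) pair."""
        totals = defaultdict(float)
        for r in self.rows:
            totals[(r["user"], r["application"])] += r["cpu_seconds"]
        return dict(totals)


# Illustrative data only; identifiers and values are invented.
samples = [
    ProcessSample("node1", 101, "task_a1", 2.0),
    ProcessSample("node2", 202, "task_a2", 3.0),
    ProcessSample("node1", 303, "task_b1", 1.5),
    ProcessSample("node2", 404, "task_unknown", 9.0),  # unmapped, dropped
]
task_to_job = {"task_a1": "job_a", "task_a2": "job_a", "task_b1": "job_b"}
job_to_app = {"job_a": ("alice", "wordcount"), "job_b": ("bob", "sort")}

db = ProfilingDatabase()
db.store(filter_samples(samples, task_to_job, job_to_app))
totals = db.cpu_by_application()
```

Keeping the mapping step separate from storage mirrors the abstract's division between filters and the database: the stored rows already carry user and application labels, so an application-level query needs no knowledge of tasks or local processes.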

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1	Introduction
1-1	Motivation
1-2	Objectives
1-3	Main Contributions
1-4	Thesis Organization
Chapter 2	Related Work
2-1	Monitoring Systems
2-2	Introduction to Hadoop
Chapter 3	System Architecture
Chapter 4	System Components
4-1	Application-Aware Profiling Agent
4-2	Profiling Database
4-3	Filters
4-4	Web Interface
Chapter 5	Component Implementation and Environment
5-1	Testbed
5-2	Application-Aware Profiling Agent
5-3	Profiling Database
5-4	Filters
5-5	Web Interface
Chapter 6	Future Work
6-1	Fault Tolerance
6-2	Scalability
6-3	Adaptive Control
6-4	Web Interface Improvements
Chapter 7	Conclusion
References

