| Graduate student: | 黃惠筠 Huei-Yun Huang |
|---|---|
| Thesis title: | 雲端系統之二階層虛擬機器高可靠度保護機制 Two-Layers High Availability Protection for Virtual Machine in Cloud System |
| Advisor: | 梁德容 |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 (ROC calendar, i.e. 2017–2018) |
| Language: | Chinese |
| Pages: | 72 |
| Keywords: | Cloud Computing, OpenStack, High Availability, Virtual Machine, Libvirt, IPMI, Software-Defined High Availability Cluster |
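In recent years cloud computing technology has matured steadily, and most enterprises now choose to deploy their services in cloud environments. Thanks to the scalability and convenience that cloud technology brings, a cloud environment can adjust and manage computing resources more dynamically and efficiently than a physical environment. As the open-source cloud platform OpenStack keeps releasing more complete versions, it has gradually become one of the options enterprises use to build their cloud platforms.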
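Because users deploy their business on the cloud platform, and the services are provided by virtual machines (the platform's units of computation), high availability (HA) of the cloud is correspondingly important if those services are to run without interruption. However, OpenStack's HA mechanisms protect only the services on management (controller) nodes, and their protection of virtual machines is incomplete. This study therefore proposes the Software-Defined High Availability Cluster (SDHAC) mechanism, which provides automated failure detection and recovery for the virtual machines inside a cluster. By combining real-time detection through the Libvirt service with OpenStack's virtual machine management service, it keeps virtual machines running normally on their compute nodes, so users do not have to intervene manually when a virtual machine goes down.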
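As a rough illustration of the VM-level detection step, the sketch below classifies libvirt lifecycle events into recovery actions. It is a minimal sketch under stated assumptions: the event and detail names follow libvirt's `VIR_DOMAIN_EVENT_*` constant naming but are passed as plain strings, and the policy table (which events count as failures, and that the response is a restart) is hypothetical rather than the thesis's actual rule set. In a live agent, `classify` would be called from a callback registered with libvirt-python's `conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, callback, None)` (after `libvirt.virEventRegisterDefaultImpl()`, with `libvirt.virEventRunDefaultImpl()` driving the loop); that callback receives `(conn, dom, event, detail, opaque)` with integer codes that would first be mapped to these names.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"        # expected state change; nothing to do
    RESTART = "restart"  # ask OpenStack to restart the VM
    NOTIFY = "notify"    # unexpected stop that is not clearly a fault; alert an operator


# Hypothetical policy: (event, detail) names mirror libvirt's
# VIR_DOMAIN_EVENT_<EVENT> and VIR_DOMAIN_EVENT_<EVENT>_<DETAIL> constants.
POLICY = {
    ("STOPPED", "CRASHED"): Action.RESTART,   # guest crashed
    ("STOPPED", "FAILED"): Action.RESTART,    # host emulator/management failure
    ("CRASHED", "PANICKED"): Action.RESTART,  # guest kernel panic reported
    ("STOPPED", "SHUTDOWN"): Action.NONE,     # guest shut down cleanly
    ("STOPPED", "DESTROYED"): Action.NONE,    # deliberate stop by an operator or Nova
    ("STOPPED", "MIGRATED"): Action.NONE,     # VM now runs on another host
}


def classify(event: str, detail: str, ha_enabled: bool) -> Action:
    """Map one libvirt lifecycle event for a VM to a recovery action."""
    if not ha_enabled:
        # Only VMs marked as HA-protected are handled automatically.
        return Action.NONE
    if (event, detail) in POLICY:
        return POLICY[(event, detail)]
    # Any other stop or crash is unexpected but ambiguous, so escalate to a human.
    if event in ("STOPPED", "CRASHED"):
        return Action.NOTIFY
    return Action.NONE
```

Keeping the policy as a table rather than nested conditionals makes it easy to audit which events trigger automatic recovery.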
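To prevent a virtual machine's service from stopping because of a hardware or software fault on the compute node that hosts it, this study also incorporates IPMI (Intelligent Platform Management Interface) into the detection and recovery mechanism. Sensor information obtained from each compute node through IPMI allows the node's state to be monitored in real time. If a node's state becomes abnormal, its virtual machines are immediately live-migrated, so that a fault on the compute node does not interrupt their services. If a compute node has already failed without warning, its virtual machines are failed over to another compute node in the cluster that is still running normally, and the detection and recovery mechanism is then applied to the failed node. Together these steps improve OpenStack's high availability for virtual machines.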
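The node-level IPMI check can likewise be sketched as parsing sensor readings and deciding whether to move a node's VMs away proactively. This minimal sketch assumes the pipe-separated layout printed by `ipmitool sensor` (name, reading, unit, status, then thresholds), where the status column holds codes such as `ok`, `nc` (non-critical), `cr` (critical), `nr` (non-recoverable) or `na`. Treating `cr` and `nr` as the trigger for live migration is an assumed policy, since the abstract does not state the thresholds used, and the sample readings are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sensor:
    name: str
    reading: str
    unit: str
    status: str


# Assumed trigger: ipmitool status codes for "critical" and "non-recoverable".
TRIGGER_STATUSES = {"cr", "nr"}


def parse_ipmitool_sensor(text: str) -> list[Sensor]:
    """Parse `ipmitool sensor` output: name | reading | unit | status | thresholds..."""
    sensors = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, reading, unit, status = fields[:4]
        sensors.append(Sensor(name, reading, unit, status))
    return sensors


def node_action(sensors: list[Sensor]) -> tuple[str, list[str]]:
    """Return ("live_migrate", offending sensor names) or ("none", [])."""
    bad = [s.name for s in sensors if s.status.lower() in TRIGGER_STATUSES]
    return ("live_migrate", bad) if bad else ("none", [])


# Illustrative (made-up) readings in ipmitool's format.
SAMPLE = """\
CPU Temp     | 45.000 | degrees C | ok | 0.000 | 0.000 | 0.000 | 85.000 | 90.000 | 95.000
System Fan 1 | 0.000  | RPM       | cr | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
"""
```

Returning the offending sensor names alongside the decision gives an operator a reason for every migration the system triggers.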
In recent years, cloud computing technology has matured. Because of its elasticity and manageability, most enterprises have decided to deploy their business on virtualized cloud platforms. Compared with deploying their own data center, a cloud platform makes it more convenient to adjust computing resources dynamically and manage them effectively. As the open-source cloud platform OpenStack continues to release improved versions, it has gradually become one of the choices enterprises make when building a private cloud computing platform.
Enterprises deploy their business on cloud platforms to serve their clients, and those services are provided by virtual machines. Keeping the services running therefore makes high availability (HA) of the cloud platform important. However, OpenStack's HA mechanism covers only the services on controller nodes and protects virtual machines incompletely. This study therefore proposes the Software-Defined High Availability Cluster (SDHAC) mechanism, which automatically detects failures of HA virtual machines and recovers them.
The detection mechanism uses the libvirt API to monitor virtual machine events in real time, and the recovery mechanism uses the OpenStack API to recover failed virtual machines. This keeps virtual machines running without users having to fix virtual machine failures themselves.
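The recovery side, which the abstract says goes through the OpenStack API, could be dispatched roughly as follows. This sketch assumes openstacksdk, whose compute proxy (`conn.compute`) provides `reboot_server`, `live_migrate_server` and `evacuate_server`; the action names and the choice of a hard reboot for a crashed VM are illustrative assumptions, and the keyword arguments accepted should be checked against the installed SDK version. `DryRunCompute` is a hypothetical stand-in that records calls so the dispatch logic can be exercised without a cloud.

```python
from typing import Any, List, Tuple


def recover(compute: Any, server_id: str, action: str) -> None:
    """Dispatch one recovery action to an openstacksdk-style compute proxy.

    `compute` is expected to be `conn.compute` from openstacksdk, or any
    object exposing the same three methods.
    """
    if action == "restart":
        # A hard reboot restarts the guest on the same (healthy) compute node.
        compute.reboot_server(server_id, "HARD")
    elif action == "live_migrate":
        # host=None lets the Nova scheduler pick the destination node.
        compute.live_migrate_server(server_id, host=None)
    elif action == "evacuate":
        # Rebuild the VM on another node after its original host has died.
        compute.evacuate_server(server_id)
    elif action != "none":
        raise ValueError(f"unknown recovery action: {action!r}")


class DryRunCompute:
    """Records calls instead of talking to a cloud; handy for testing policies."""

    def __init__(self) -> None:
        self.calls: List[Tuple] = []

    def reboot_server(self, server, reboot_type):
        self.calls.append(("reboot_server", server, reboot_type))

    def live_migrate_server(self, server, host=None, force=False, block_migration=None):
        self.calls.append(("live_migrate_server", server, host))

    def evacuate_server(self, server, host=None, admin_pass=None, force=None):
        self.calls.append(("evacuate_server", server, host))
```

Against a real cloud one would pass `openstack.connect(cloud=...).compute` instead of `DryRunCompute()`. Note that Nova accepts `evacuate` only once the source host's `nova-compute` service is reported down, so a node must be confirmed dead before its VMs are failed over.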
To avoid virtual machine failures caused by hardware and software problems on compute nodes, this study also uses IPMI (Intelligent Platform Management Interface) to read sensor information from compute nodes and to detect and recover node failures.
If a node's sensor information indicates a critical condition, the system immediately live-migrates that node's virtual machines, so that an error on the compute node does not interrupt virtual machine services.
If a compute node fails unexpectedly, its HA virtual machines are failed over to another healthy compute node in the HA cluster and the abnormal node is then recovered, improving OpenStack's high availability for virtual machines.
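For a node that fails without warning, the failover-and-repair sequence might be planned as below. Everything here is an assumption for illustration: the ordering (fence, evacuate, power back on), the use of IPMI power control for both fencing and recovery, and the hypothetical helper names `ipmitool_cmd` and `failover_plan`. The commands use real `ipmitool` options (`-I lanplus` for IPMI v2.0 over LAN, `-E` to read the password from the `IPMI_PASSWORD` environment variable, and `chassis power off|on`), but the plan is only built here, not executed.

```python
from typing import List, Tuple


def ipmitool_cmd(bmc_host: str, user: str, *args: str) -> List[str]:
    """Build an ipmitool command line for a remote BMC.

    -I lanplus selects IPMI v2.0 (RMCP+); -E reads the password from the
    IPMI_PASSWORD environment variable so it never appears in the process list.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *args]


def failover_plan(bmc_host: str, user: str, ha_vms: List[str]) -> List[Tuple[str, object]]:
    """Ordered steps for a compute node that has already been declared dead."""
    plan: List[Tuple[str, object]] = []
    # 1. Fence (assumed step): force the node off so a VM can never run on
    #    two hosts at once after it is rebuilt elsewhere.
    plan.append(("exec", ipmitool_cmd(bmc_host, user, "chassis", "power", "off")))
    # 2. Fail over every HA-protected VM; target selection is left to Nova.
    plan.extend(("evacuate", vm) for vm in ha_vms)
    # 3. Attempt node recovery by powering it back on.
    plan.append(("exec", ipmitool_cmd(bmc_host, user, "chassis", "power", "on")))
    return plan
```

Each `exec` step could be run with `subprocess.run(cmd, check=True)` with `IPMI_PASSWORD` set in the environment, and each `evacuate` step with openstacksdk's `evacuate_server`. Fencing before evacuation is the usual guard against split-brain, where two instances of the same VM write to one disk.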