

Graduate Student: 鄭鈞輿
Chun-Yu Cheng
Thesis Title: 軟體定義運算叢集之快速自動化軟硬體錯誤偵測與復原機制
Fast Failover Based on Software-Defined Computing Cluster
Advisor: 王尉任
Wei-Jen Wang
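Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2017
Academic Year of Graduation: 105 (ROC calendar)
Language: Chinese
Number of Pages: 59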
Keywords (Chinese): Cloud Computing, OpenStack, High Availability, Virtual Machine, IPMI, Software-Defined Computing Cluster
Keywords (English): Cloud Computing, OpenStack, High Availability, Virtual Machine, IPMI, Software-Defined High Availability Cluster
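  • In recent years, cloud computing technology has matured, and most enterprises choose to deploy their services in cloud environments. Because of the scalability and convenience of cloud technology, a cloud environment can adjust resources dynamically at lower cost than a physical environment and can make full use of machine resources, which has made OpenStack a popular choice for building enterprise clouds. Enterprises nevertheless still emphasize uninterrupted service, that is, cloud High Availability (HA), yet OpenStack lacks a complete HA mechanism for users' virtual machines. This research first proposes the Software-Defined High Availability Cluster (SDHAC) mechanism, which logically partitions computing resources into multiple SDHACs and sets an HA policy for each cluster according to its requirements, so that administrators can manage and allocate cloud resources more easily. Building on SDHAC, this research develops an automated failure detection and recovery mechanism for the compute nodes and virtual machines within a cluster. Besides monitoring the status of software services on compute nodes, it integrates with IPMI (Intelligent Platform Management Interface) to provide hardware-level monitoring, such as the operating system, power, and the hardware's internal temperature and voltage sensors. When a failure is detected, a recovery procedure is carried out according to the Failure Model proposed in this research. Because the proposed HA system integrates the IPMI interface, it greatly reduces failure detection time and provides a more complete recovery mechanism, improving the high availability of virtual machines on OpenStack.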
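The layered monitoring described above (software services, operating system, power, and temperature and voltage sensors) can be illustrated with a small sketch. This record does not give the thesis's actual Failure Model, so everything below is an assumption made for illustration: the failure categories, the field names, the thresholds, and the recovery actions are all hypothetical, and the probe values are passed in directly rather than read from a real IPMI interface.

```python
from dataclasses import dataclass
from enum import Enum


class Failure(Enum):
    """Hypothetical failure categories, not the thesis's Failure Model."""
    NONE = "none"
    SERVICE = "service failure"
    OS = "operating system failure"
    POWER = "power failure"
    SENSOR = "sensor out of range"


@dataclass
class NodeStatus:
    """One round of probe results for a compute node (all fields hypothetical)."""
    power_on: bool         # e.g. chassis power state reported by the hardware
    os_alive: bool         # e.g. an operating-system liveness signal
    service_alive: bool    # e.g. the compute service running on the node
    temperature_c: float   # hardware temperature sensor reading
    voltage_v: float       # hardware voltage sensor reading


# Illustrative thresholds only; real limits depend on the hardware.
MAX_TEMPERATURE_C = 85.0
VOLTAGE_RANGE_V = (11.4, 12.6)


def classify(status: NodeStatus) -> Failure:
    """Check the lowest layer first so the reported failure is the root cause."""
    if not status.power_on:
        return Failure.POWER
    low, high = VOLTAGE_RANGE_V
    if status.temperature_c > MAX_TEMPERATURE_C or not (low <= status.voltage_v <= high):
        return Failure.SENSOR
    if not status.os_alive:
        return Failure.OS
    if not status.service_alive:
        return Failure.SERVICE
    return Failure.NONE


# Recovery action per failure type; this mapping is an assumption for illustration.
RECOVERY = {
    Failure.NONE: "no action",
    Failure.SERVICE: "restart the service",
    Failure.OS: "reboot the node",
    Failure.POWER: "power on the node",
    Failure.SENSOR: "move virtual machines away and shut down the node",
}
```

Checking the hardware layer before the software layer is one plausible way to avoid reporting a stopped service when the underlying cause is a powered-off machine.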


    In recent years, virtualized cloud computing has matured. Most enterprises deploy their services on a virtualized cloud platform because of its elasticity and manageability. Compared to traditional computing platforms, a virtualized cloud platform can automatically adjust computing resources in response to changes in users' requirements. OpenStack is a popular virtualized cloud computing project that facilitates building such a platform, where computations are carried out on virtual machines. In earlier work, we proposed and implemented a cloud platform that supports the concept of the Software-Defined High Availability Cluster (SDHAC) to address the problem of cloud platform availability and manageability. This mechanism logically divides the computing pool into multiple HA clusters, and administrators can apply different HA policies to different software-defined HA clusters according to their demands. This research focuses on fast failure detection and recovery on a platform with Software-Defined High Availability Clusters. The proposed system supports IPMI machines, which are computers with an interface for fast hardware state detection, and can therefore efficiently identify the root cause of a failure. In addition, the proposed system provides a complete set of recovery features, such as VM recovery and machine recovery, when IPMI is used. Our experimental results show that the proposed system with IPMI machines achieves higher availability than a traditional system that uses heartbeat-based failure detection.
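The idea of logically dividing a computing pool into clusters, each with its own HA policy, can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the thesis's implementation: the class names, the policy fields (`poll_interval_s`, `auto_recover`), and the rule that a node belongs to at most one cluster are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class HAPolicy:
    """Hypothetical per-cluster policy fields, chosen for illustration."""
    poll_interval_s: float   # how often nodes in the cluster are checked
    auto_recover: bool       # whether recovery starts without an operator


@dataclass
class Cluster:
    name: str
    policy: HAPolicy
    nodes: Set[str] = field(default_factory=set)


class ComputePool:
    """A pool of compute nodes that can be logically split into clusters."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.unassigned: Set[str] = set(nodes)
        self.clusters: Dict[str, Cluster] = {}

    def create_cluster(self, name: str, policy: HAPolicy) -> Cluster:
        if name in self.clusters:
            raise ValueError(f"cluster {name!r} already exists")
        self.clusters[name] = Cluster(name, policy)
        return self.clusters[name]

    def assign(self, node: str, cluster_name: str) -> None:
        # Assumption: a node belongs to at most one cluster at a time.
        if node not in self.unassigned:
            raise ValueError(f"node {node!r} is unknown or already assigned")
        cluster = self.clusters[cluster_name]  # KeyError if the cluster is missing
        self.unassigned.remove(node)
        cluster.nodes.add(node)

    def policy_for(self, node: str) -> Optional[HAPolicy]:
        """Return the policy of the cluster that owns the node, if any."""
        for cluster in self.clusters.values():
            if node in cluster.nodes:
                return cluster.policy
        return None


# Usage: two clusters with different policies over the same physical pool.
pool = ComputePool(["node-1", "node-2", "node-3"])
pool.create_cluster("critical", HAPolicy(poll_interval_s=1.0, auto_recover=True))
pool.create_cluster("batch", HAPolicy(poll_interval_s=30.0, auto_recover=False))
pool.assign("node-1", "critical")
pool.assign("node-2", "batch")
```

Because the split is purely logical, moving a node between clusters in such a design would only change bookkeeping, not the physical deployment.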

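    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1-1 Research Background
      1-2 Research Motivation and Implementation Goals
      1-3 Research Contributions
      1-4 Thesis Organization
    Chapter 2 Related Work
      2-1 Background Knowledge
        2-1-1 Intelligent Platform Management Interface
        2-1-2 OpenStack
        2-1-3 OpenStack HA Mechanisms
      2-2 Related Research on High Availability
        2-2-1 VMware vSphere HA
        2-2-2 Literature Review
    Chapter 3 System Design
      3-1 System Architecture Model
      3-2 Software-Defined High Reliability Cluster
      3-3 Failure Detection and Recovery Mechanism
        3-3-1 Failure Detection Mechanism
        3-3-2 Failure Recovery Mechanism
      3-4 Comparison and Discussion with Other Systems
      3-5 Integration with OpenStack Horizon
    Chapter 4 Experimental Environment and Measurements
      4-1 Experimental Environment and Architecture
      4-2 Experimental Environment Assumptions
      4-3 Experimental Cases
      4-4 Experimental Results
    Chapter 5 Conclusion
    References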

