Graduate student: Moh. Wildan Habibi (未丹彼)
Thesis title: High-Availability Cloud Computing Platform with Configurable Fault Models (具有可重新配置的故障模型的高可用性雲計算平台)
Advisors: Prof. Wei-Jen Wang (王尉任)
Prof. Deron Liang (梁德容)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 36
Keywords (Chinese): 可重新配置的故障模型, 檢測器, 高可用性, 活動性檢測
Keywords (English): configurable fault models, detector, high availability, liveness detection
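  • Abstract (translated from the Chinese): In modern cloud computing platforms, high availability (HA) is one of the key concerns: the platform should ensure the liveness of the services running on its compute pool and fail over automatically when a service fails. Modern cloud platforms usually rely on system-layer heartbeating to detect faults. Earlier work, the High-Availability Software System (HASS) implemented on OpenStack, applied multiple detectors and was shown to be more efficient than system-layer heartbeating. However, HASS is less reliable in a heterogeneous environment, where the set of supported detectors may differ. The heterogeneous compute hosts in such an environment affect which detectors are supported, and in some cases a detector may not work properly or may be missing altogether. In those situations human intervention is needed to change the original fault model or to repair the faulty detector. This study therefore aims to provide a mechanism that automatically reconfigures the fault model so that the cloud platform continues to receive liveness detection; that is, HA protection continues even when some detectors fail or are missing. The experiments yield two main results: (i) the proposed mechanism is as reliable as system-layer heartbeating; and (ii) although its worst-case detection time equals that of system-layer heartbeating, in all other cases the proposed mechanism is more efficient.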


    In a modern cloud computing platform, high availability (HA) is an important aspect: the platform should ensure the liveness of the services that run on its compute pool and fail over automatically when a service goes down. Modern cloud computing platforms commonly use a system-layer heartbeat to detect faults. Previous research, the High-Availability Software System (HASS) implemented on OpenStack, applied several detectors and was shown to be more efficient than the heartbeat method. However, HASS is less reliable in environments that support a different set of detectors. The heterogeneous compute hosts that make up such an environment affect which detectors are supported. Under some conditions a detector may fail to work properly, or it may not be available at all. In these cases human intervention is needed, either to change the original fault model, to fix the detector fault, or to fall back on the common method for detecting all kinds of faults. This research therefore aims to provide an efficient mechanism, applicable to heterogeneous cloud computing platforms, that continues to provide liveness detection even if some detectors fail or are unavailable. Experiments yield two main results: (i) the proposed mechanism is as reliable as the heartbeat method; and (ii) the proposed mechanism is more efficient than the heartbeat method, although in the worst case its detection time is nearly equal to that of the heartbeat method.
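
The idea in the abstract, keeping liveness detection running when some detectors fail or are missing, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Detector` type, the `reconfigure` and `is_alive` functions, and the rule "add the heartbeat whenever a preferred detector is unavailable" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detector:
    """Hypothetical detector record; none of these names come from the thesis."""
    name: str                      # label such as "heartbeat"
    probe: Callable[[], bool]      # True if the monitored target looks alive
    available: Callable[[], bool]  # True if the detector itself is usable


def reconfigure(preferred: List[Detector], fallback: Detector) -> List[Detector]:
    """Build the active detector set.

    Assumption: when any preferred detector is unavailable, the common
    heartbeat detector is appended so that detection coverage is not lost.
    """
    active = [d for d in preferred if d.available()]
    if len(active) < len(preferred):
        active.append(fallback)
    return active


def is_alive(active: List[Detector]) -> bool:
    """Assumption: the target counts as alive only if every active probe succeeds."""
    return all(d.probe() for d in active)
```

Under these assumptions, a platform where every preferred detector is missing degrades to heartbeat-only detection, which is consistent with the abstract's remark that the worst-case detection time approaches that of the heartbeat method.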

    Chinese Abstract
    Abstract
    ACKNOWLEDGMENT
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
    1.1 Background
    1.2 Motivation
    1.3 Research Objective
    1.4 Research Contributions
    1.5 Thesis Structure
    CHAPTER 2 BACKGROUND KNOWLEDGE
    2.1 System Layer Heartbeat
    2.2 HASS on OpenStack
    2.3 Layer Dependency
    2.4 Layer Detector
    CHAPTER 3 THE PROPOSED MECHANISM
    3.1 Highest Layer Detector
    3.2 Layering in Proposed Mechanism
    3.3 Configurable Fault Models
    3.4 Fault Detection Mechanism
    3.5 Fault Recovery Mechanism
    3.6 Fault Detection and Recovery Mechanism Example
    CHAPTER 4 EXPERIMENTS
    4.1 Host and VM Specification
    4.2 Layer Fault Injection Method
    4.3 Experiments Design
    4.4 Experiments Results and Analysis
    4.5 Comparison with Heartbeat Method
    CHAPTER 5 CONCLUSIONS
    REFERENCES

    Cheng, Chun Yu, Zheng Jia Su, Chia Ching Chen, Shao Jui Chen, and Wei Jen Wang. 2017. "Supporting Software-Defined HA Clusters on OpenStack Platform." International Conference on Applied System Innovation (ICASI). Sapporo.
    Dobre, Ciprian, Florian Pop, Alexandru Costan, Mugurel Ionut Andreica, and Valentin Cristea. 2009. "Robust Failure Detection Architecture for Large Scale Distributed Systems." 17th International Conference on Control Systems and Computer Science. Bucharest.
    Lee, Yen Lin, Min Huang Ho, Aswin Suharsono, Yu Chen Pan, Wei Jen Wang, and Deron Liang. 2017. "NCU-HA: A Lightweight HA System for Kernel-Based Virtual Machine." 2017 International Conference on Platform Technology and Service (PlatCon). Busan.
    Lu, Charng Da. 2005. Scalable Diskless Checkpointing for Large Parallel Systems. Dissertation, Urbana-Champaign: University of Illinois. https://www.ideals.illinois.edu/bitstream/handle/2142/11054/Scalable%20Diskless%20Checkpointing%20for%20Large%20Parallel%20Systems.pdf?sequence=2&isAllowed=y.
    Rahman, M. Saifur, Md. Yusuf Sarwar Uddin, Tahmid Hasan, M. Sohel Rahman, and M. Kaykobad. 2018. "Using Adaptive Heartbeat Rate on Long-Lived TCP Connections." IEEE/ACM Transactions on Networking. IEEE. 203-2016.
    Shambroom, W. David. 1993. "Use of Protocol Validation and Verification Techniques in the Design of a Fault-Tolerant Computer Architecture." The Twenty-Third International Symposium on Fault-Tolerant Computing. Toulouse.
    Toeroe, Maria, and Francis Tam. 2012. Service Availability: Principles and Practice. John Wiley & Sons.
    Zhang, Xinjia, Lan Luan, Lu Han, and Zhang Lu. 2008. "Research and Improvement on Failure Detection Algorithm." Third International Conference on Pervasive Computing and Applications. Alexandria.
