| Graduate student: | 商晉瑋 Jin-Wei Shang |
|---|---|
| Thesis title: | 利用不定性事件記錄與重播之技術實現KVM虛擬機器自動容錯之研究 Using Non-Deterministic Event Log and Replay to Support Virtual Machine Fault Tolerance of Kernel-based Virtual Machine |
| Advisor: | 王尉任 Wei-Jen Wang |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of publication: | 2016 |
| Academic year of graduation: | 104 |
| Language: | Chinese |
| Pages: | 53 |
| Chinese keywords: | KVM, 虛擬機器 (virtual machine), Fault Tolerance, 記錄與重播 (log and replay) |
| Foreign-language keywords: | KVM, Virtual Machine, Fault Tolerance, Log and Replay |
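Many modern cloud services are built on virtual machines (VMs). When a VM stops running because of a fault, the services and applications on it stop as well, causing losses for both service providers and their customers, so improving the availability of VM systems has become an important issue. Fault tolerance is one technique for providing high VM availability: when a VM fails, a backup physical machine takes over the failed VM and lets it keep executing without interruption (continuous execution). VM fault tolerance is usually implemented in one of two ways. The first uses checkpointing to synchronize the state of the VM and the backup VM; we call this memory-level state-synchronization fault tolerance. The second records the instructions executed by the primary VM and replays them on the backup VM to synchronize the two VMs' states; this is instruction-level state-synchronization fault tolerance. This research targets the instruction-level mechanism on the KVM virtual machine system. The mechanism monitors the non-deterministic events that occur on the primary VM, computes each event's logical time, records the event's parameters, and then sends them to the backup VM for replay. The backup VM's initial state must match that of the primary VM, that is, it runs the same program instructions and holds the same memory contents, but it is kept paused. When the backup VM receives an event record, it sets the instruction breakpoint corresponding to that event and starts executing; when the breakpoint is hit, it injects the recorded event data and replays the event, thereby synchronizing the states of the two VMs. Finally, we use this non-deterministic event log-and-replay technique to design and implement a failure handling and recovery mechanism that achieves automatic VM fault tolerance.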
Virtual machine fault tolerance (VMFT) is a technology that enables continuous execution in the face of hardware or software failures, and it can therefore be used to protect critical virtualized software services. There are two ways to implement VMFT. The first uses a continuous-checkpointing strategy, in which a backup virtual machine (VM) keeps receiving the latest checkpoint from the protected VM. The other uses a log-and-replay strategy, in which all events in the protected VM are recorded and the recorded events are turned into deterministic events for replay in the backup VM. Once the protected VM fails, the backup VM immediately takes over its role to minimize service downtime. This research aims to provide a log-and-replay-based mechanism for VMFT on the Kernel-based Virtual Machine (KVM). Before entering the VMFT phase, the proposed mechanism creates the backup VM by live-cloning the protected VM. The two VMs then enter the fault tolerance phase, in which they synchronize periodically. In each synchronization epoch, the proposed mechanism monitors the non-deterministic events occurring on the protected VM and identifies the logical time and the parameters of each event. It then transfers the logged data to the backup VM for event replay. Upon receiving the data, the backup VM sets instruction breakpoints at the corresponding locations and starts execution, injecting each logged event when it reaches the corresponding breakpoint. The backup VM signals the protected VM when it finishes. If the protected VM fails during the fault tolerance phase, the backup VM is responsible for detecting the failure and taking over the role of the protected VM.
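The log-and-replay idea described in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the thesis's KVM implementation: `ToyVM`, `LoggedEvent`, `run_primary`, and `replay_on_backup` are hypothetical names, logical time is modeled simply as the number of instructions executed, and a "breakpoint" is modeled as running the backup until its instruction count matches the logged time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedEvent:
    logical_time: int  # instructions executed before the event was delivered
    payload: int       # event parameter, e.g. a value read from a device


@dataclass
class ToyVM:
    """Hypothetical deterministic machine: state depends only on steps and injected events."""
    state: int = 1
    executed: int = 0  # logical clock: number of instructions executed so far

    def step(self) -> None:
        # One deterministic "instruction".
        self.state = (self.state * 31 + 7) % 1_000_003
        self.executed += 1

    def inject(self, payload: int) -> None:
        # Deliver an external event; its effect depends on when it arrives.
        self.state = (self.state ^ payload) % 1_000_003


def run_primary(vm: ToyVM, arrivals: dict, total_steps: int) -> list:
    """Run the protected VM, logging each event with its logical time and parameter."""
    log = []
    for _ in range(total_steps):
        if vm.executed in arrivals:
            payload = arrivals[vm.executed]
            vm.inject(payload)
            log.append(LoggedEvent(vm.executed, payload))
        vm.step()
    return log


def replay_on_backup(vm: ToyVM, log: list, total_steps: int) -> None:
    """Replay the log: stop at each logged logical time, inject the event, then continue."""
    for event in sorted(log, key=lambda e: e.logical_time):
        while vm.executed < event.logical_time:  # run up to the "breakpoint"
            vm.step()
        vm.inject(event.payload)
    while vm.executed < total_steps:  # finish the epoch
        vm.step()


# Both VMs start from the same state, standing in for the live clone.
primary, backup = ToyVM(), ToyVM()
arrivals = {3: 42, 10: 7, 11: 99}  # assumed event times and payloads, for illustration only
log = run_primary(primary, arrivals, total_steps=20)
replay_on_backup(backup, log, total_steps=20)
print(backup.state == primary.state)
```

Because every instruction in the toy machine is deterministic, injecting each logged event at the same logical time is enough to bring the backup to the same state as the primary at the end of the epoch. A real system has to solve the harder problem of measuring logical time and stopping the guest at the exact instruction, which this sketch does not attempt.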