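| Graduate student: | 陳峻浩 Chun-Hao Chen |
|---|---|
| Thesis title: | 基於KVM的網路服務高可靠性容錯同步架構 A Fault-Tolerant KVM Architecture for Network Services of High Availability |
| Advisor: | 王尉任 Wei-Jen Wang |
| Committee members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Publication year: | 2014 |
| Academic year of graduation: | 102 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 44 |
| Keywords (Chinese): | 高可用性 (high availability), 虛擬機器 (virtual machine), KVM, 容錯機制 (fault-tolerance mechanism), 不間斷服務 (non-stop service) |
| Keywords (English): | High Availability, Virtualization, KVM, Fault Tolerance, Non-Stop Service |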
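High availability in a software system means that users can access the services the system provides at any time, without interruption caused by system faults. This property can be achieved with a variety of failure detection and recovery techniques, such as automatic fault tolerance. As cloud computing has developed, virtual machines have become more common in practice, so a range of high-availability tools and techniques is needed at the virtualization layer. Combining automatic fault tolerance with virtualization brings many benefits. For example, an application can gain high availability simply by running on a fault-tolerant virtual machine, without any modification to its code. After studying the automatic fault-tolerance mechanisms available for the KVM (Kernel-based Virtual Machine) virtualization platform, this research found that the current open-source mechanisms built on KVM do not let network services run smoothly, because they generate heavy communication traffic that severely interferes with those services. To solve this problem, this research designs and implements a fault-tolerant architecture for network services on the KVM virtualization platform. Our experiments show that the proposed architecture substantially improves network service quality compared with other existing open-source fault-tolerant architectures.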
In today’s cloud computing environment, virtual machines (VMs) are widely used to host many types of network services and applications that demand high availability, that is, the ability to keep serving users even when failures occur. In the literature, many failure detection and recovery technologies support high-availability features for cloud applications. However, application developers may need to modify their source code to adopt these technologies. One way to avoid this re-engineering is to provide high-availability features in the VM layer and then run the applications on the VMs. The Kemari KVM and the Micro-Checkpointing KVM are two well-known open-source projects that support high-availability features in the VM layer. Both employ a similar strategy, in which the execution state of an active VM (the primary VM) is continuously replicated to a remote VM (the secondary VM). The secondary VM is activated as soon as a failure is detected in the primary VM. Based on our observations, this primary-secondary synchronization strategy consumes a large amount of network bandwidth and thus degrades the quality of the network services running on the VMs. To solve this problem, we propose a new primary-secondary synchronization mechanism, which is implemented on KVM. The experimental results show that our approach outperforms the Kemari KVM and the Micro-Checkpointing KVM in network bandwidth consumption and response time when executing the same network services.
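The primary-secondary strategy described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the mechanism proposed in the thesis and not the Kemari or Micro-Checkpointing implementation: VM state is modeled as a dictionary, and the names `Primary`, `Secondary`, `checkpoint`, and `entries_sent` are hypothetical, with `entries_sent` serving as a rough stand-in for synchronization traffic.

```python
from dataclasses import dataclass, field


@dataclass
class Secondary:
    """Standby replica holding the last checkpoint it received (hypothetical model)."""
    state: dict = field(default_factory=dict)
    active: bool = False

    def apply_checkpoint(self, delta: dict) -> None:
        # Merge only the entries that changed since the previous checkpoint.
        self.state.update(delta)

    def activate(self) -> None:
        # Take over service from the last synchronized state.
        self.active = True


@dataclass
class Primary:
    """Active replica that tracks which entries changed in the current epoch."""
    secondary: Secondary
    state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    entries_sent: int = 0  # assumption: counts replicated entries, not real bytes

    def write(self, key: str, value: int) -> None:
        self.state[key] = value
        self.dirty.add(key)

    def checkpoint(self) -> None:
        # Ship only the dirty entries to the secondary, then start a new epoch.
        delta = {k: self.state[k] for k in self.dirty}
        self.secondary.apply_checkpoint(delta)
        self.entries_sent += len(delta)
        self.dirty.clear()


def run_demo() -> tuple:
    secondary = Secondary()
    primary = Primary(secondary)
    primary.write("a", 1)
    primary.write("b", 2)
    primary.checkpoint()    # epoch 1: two entries replicated
    primary.write("a", 3)
    primary.checkpoint()    # epoch 2: one entry replicated
    primary.write("b", 99)  # still unsynchronized when the failure occurs
    secondary.activate()    # simulated failure detection and failover
    return secondary.state, secondary.active, primary.entries_sent
```

In this sketch the secondary resumes from the last checkpoint, so the final write to `b` is lost on failover. The more often the primary checkpoints, the less work is lost but the more traffic is sent, which is the kind of bandwidth cost the abstract attributes to continuous synchronization.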