| Graduate student: | 王建文 Jiann-Wen Wang |
|---|---|
| Thesis title: | 基於libvirt與QEMU-KVM虛擬機器之記憶體層級同步容錯系統 An Adaptive Continuous Checkpointing Fault-Tolerant Virtual Machine System based on QEMU-KVM with libvirt |
| Advisors: | 梁德容 Deron Liang, 王尉任 Wei-Jen Wang |
| Oral examination committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 |
| Language: | English |
| Number of pages: | 57 |
| Keywords (Chinese): | QEMU-KVM 、Libvirt 、虛擬機器 、容錯系統 、持續同步 |
| Keywords (English): | QEMU-KVM, Libvirt, Virtual Machine, Fault Tolerance, Continuous Checkpointing |
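With the rapid development of cloud computing and virtualization technology, the IT industry can use these technologies to raise the utilization of physical machines and allocate resources elastically. However, consolidating multiple servers onto a single physical machine also means that a hardware failure on one host can bring down multiple services. When host hardware fails, a virtualization-based fault-tolerant system can protect the running state of the virtual machines that host critical services, along with the soft real-time programs they execute, further improving service availability.
This study builds on QEMU 3.0.0, libvirt 5.7.0, and a continuous checkpointing architecture to implement a fault-tolerant system that can be controlled through an external management interface. The continuous checkpointing architecture meets the basic requirements of a fault-tolerant system by continuously synchronizing the state of the primary virtual machine with that of the backup virtual machine and by guaranteeing the consistency of external output. The study also introduces compression tools to reduce the bandwidth needed for synchronization and sets parameters according to the sensed virtual machine workload, helping system administrators improve the performance of services running under the fault-tolerant system.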
The IT industry has widely adopted cloud computing and virtualization, making resource management more efficient and elastic. However, as more servers are consolidated onto one physical server, a hardware failure on that single host threatens the availability of every service running on it. A virtualization-based fault-tolerant system can protect mission-critical virtual machines running soft real-time applications from such hardware failures, thus improving the services' availability.
Based on QEMU 3.0.0, libvirt 5.7.0, and continuous checkpointing, this study implements a virtualization-based fault-tolerant system with a management interface. Continuous checkpointing keeps replicating the internal state of the VM on the primary host to the backup host to meet the requirements of fault tolerance, and outputs are buffered to ensure consistency. This study also designs and implements two methods to reduce the performance degradation that the system imposes on guest applications: adjusting the checkpointing parameter automatically, and using compression tools on demand to speed up dirty-page transfer. With these methods, system administrators can set up the system without searching for a suitable parameter for every application, and they gain more flexibility in deploying it.
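As a rough illustration of the checkpoint-then-release idea summarized above, the following Python sketch models a primary that tracks dirty pages, holds outgoing packets in a buffer, and releases them only after the backup acknowledges the checkpoint. It is a toy model under stated assumptions, not the thesis implementation: the names `Primary`, `Backup`, `write`, `send`, and `checkpoint` are hypothetical and do not come from QEMU or libvirt, memory is a plain dictionary, and device state, failover, compression, and the adaptive parameter are all left out.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup host holding the last acknowledged memory image."""
    pages: dict = field(default_factory=dict)

    def apply(self, delta: dict) -> bool:
        # Merge the dirty pages and acknowledge the checkpoint.
        self.pages.update(delta)
        return True


@dataclass
class Primary:
    """Hypothetical primary host: runs the guest and buffers its output."""
    pages: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    held_output: list = field(default_factory=list)
    released: list = field(default_factory=list)

    def write(self, page_no: int, data: bytes) -> None:
        # A guest memory write marks the page dirty for the next checkpoint.
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def send(self, packet: str) -> None:
        # Output is held until the state that produced it is replicated.
        self.held_output.append(packet)

    def checkpoint(self, backup: Backup) -> None:
        # Transfer only the pages dirtied since the previous checkpoint.
        delta = {n: self.pages[n] for n in self.dirty}
        if backup.apply(delta):
            self.dirty.clear()
            # The backup now holds matching state, so output may leave.
            self.released.extend(self.held_output)
            self.held_output.clear()


if __name__ == "__main__":
    primary, backup = Primary(), Backup()
    primary.write(0, b"state-after-request")
    primary.send("reply-1")
    primary.checkpoint(backup)
    print(primary.released, backup.pages)
```

Holding output until the acknowledgement arrives is what keeps the outside world from observing a state the backup could not reproduce. The cost is added output latency, which is why the checkpoint interval matters for guest application performance.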