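Graduate student: 錡靖 (Ching Chi)
Thesis title: 基於QEMU Block Migration的硬碟層級同步容錯系統 (A Fault-Tolerant QEMU-KVM System with Block Replication based on QEMU Block Migration)
Advisor: 王尉任 (Wei-Jen Wang)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 47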
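Chinese keywords: QEMU-KVM, Libvirt, 虛擬機器 (virtual machine), 容錯系統 (fault-tolerant system), 持續同步 (continuous synchronization), 硬碟同步 (disk synchronization)
Foreign-language keywords: QEMU-KVM, Virtual Machine, Fault Tolerance, Continuous Checkpointing, Block Replication, Disk Replication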
  • As more and more enterprises deploy their systems and services on cloud platforms, the reliability and availability of the computing resources on those platforms become increasingly important. For a highly reliable cloud platform, fault tolerance is an important part of the system. Fault-tolerance technology lets a cloud platform keep providing service when hardware fails: even if the physical machine behind the computing resources fails, redundant (standby) resources can take over the running services, so users do not notice an interruption.
    This research builds on the NCU MFTVM fault-tolerant system developed by the Parallel and Distributed Computing Laboratory of National Central University. It adds a feature so that the state of the virtual machine's disk (the virtual block device), which the original fault-tolerant system did not protect, is synchronized to the backup node together with the other states. Several virtual machine disk-write scenarios are considered, and the implementation is modified for each of them. These changes improve the performance of applications running on the virtual machine and shorten the virtual machine's response time to the outside world when it writes to disk, thereby improving the reliability and availability of the fault-tolerant system.

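    Chapter 1 Introduction
      1-1 Research Background
      1-2 Research Motivation
      1-3 Contributions
      1-4 Thesis Organization
    Chapter 2 Background
      2-1 QEMU Live Block Migration
      2-2 NCU M-FTVM Fault-Tolerant System
    Chapter 3 System Architecture
      3-1 Main Architecture
      3-2 Block Replication
      3-3 Main Workflow
    Chapter 4 Performance Improvements
      4-1 Experimental Environment
      4-2 Granularity Adjustment
      4-3 Consecutive Block Handling
      4-4 Block Live Copy
    Chapter 5 Experimental Results
      5-1 Experimental Environment
      5-2 Experimental Results
        5-2-1 DVDStore3
        5-2-2 Acme Air in Nodejs
        5-2-3 Kernel Compilation
        5-2-4 fio
      5-3 Analysis of Results
    Chapter 6 Related Work
      6-1 VMware
      6-2 COLO
      6-3 Kemari
      6-4 Remus
      6-5 Adaptive Remus
    Chapter 7 Conclusion and Future Work
    References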

