| Graduate student: | 李承恩 Cheng-En Li |
|---|---|
| Thesis title: | 利用備援實體連線改善虛擬機器容錯之錯誤與 Split-Brain 偵測 Efficient Detection Mechanism for VM Failures and Split-Brain Using Backup Physical Connection |
| Advisor: | 王尉任 Wei-Jen Wang |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | Chinese |
| Number of pages: | 47 |
| Keywords (Chinese): | QEMU-KVM、虛擬機器、容錯系統、Split-Brain |
| Keywords (English): | QEMU-KVM, virtual machine, fault-tolerant system, Split-Brain |
In recent years, as the information industry has grown and enterprises have become more dependent on information technology, cloud computing research has advanced rapidly worldwide, aiming to provide enterprise cloud services and raise productivity. A complete cloud service that is worth an enterprise's purchase, however, must deliver high availability. Adding a fault-tolerance mechanism allows a cloud service to achieve this: when the primary machine providing the service fails, the backup machine quickly takes over the service and keeps it running normally. This research builds on NCU M-FTVM and adds a Heartbeat mechanism over a backup physical connection to the original fault-detection framework. The primary machine periodically sends Heartbeat packets to the backup machine through the newly added network channel, giving the backup machine a basis for confirming whether the primary machine is alive. The thesis then analyzes every failure case that can occur in the NCU M-FTVM system once the Heartbeat mechanism has been added, using fault injection. In addition, it discusses fault detection in the original NCU M-FTVM and the conditions under which a Split-Brain state can arise.
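The role of the extra Heartbeat channel can be illustrated with a minimal sketch. It is not the NCU M-FTVM implementation: the class name `BackupMonitor`, the `timeout` parameter, and the three-way decision rule are all assumptions made for illustration. The idea shown is that the backup machine treats the primary as failed only when both the original channel and the Heartbeat channel have gone silent, so that the loss of a single link is not mistaken for a primary failure, which could otherwise lead to Split-Brain.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PRIMARY_ALIVE = "primary alive"
    LINK_FAILURE = "primary alive, one link down"
    PRIMARY_FAILED = "primary presumed failed"


@dataclass
class BackupMonitor:
    """Hypothetical liveness check running on the backup machine."""

    timeout: float                 # assumed: seconds of silence before a channel counts as down
    last_replication: float = 0.0  # time of the last message on the original channel
    last_heartbeat: float = 0.0    # time of the last Heartbeat packet on the backup link

    def on_replication(self, now: float) -> None:
        self.last_replication = now

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def verdict(self, now: float) -> Verdict:
        replication_up = now - self.last_replication <= self.timeout
        heartbeat_up = now - self.last_heartbeat <= self.timeout
        if replication_up and heartbeat_up:
            return Verdict.PRIMARY_ALIVE
        if replication_up or heartbeat_up:
            # Only one channel is silent: assume a link fault, do not take over.
            return Verdict.LINK_FAILURE
        # Both channels are silent: the backup may take over the service.
        return Verdict.PRIMARY_FAILED


monitor = BackupMonitor(timeout=1.0)
monitor.on_replication(10.0)
monitor.on_heartbeat(10.0)
healthy = monitor.verdict(10.5)      # both channels heard from recently

monitor.on_heartbeat(12.0)           # original channel stays silent
one_link_down = monitor.verdict(12.5)

both_silent = monitor.verdict(20.0)  # nothing received on either channel
```

Time is passed in explicitly rather than read from the system clock, which keeps the sketch deterministic; a real detector would also need to handle packet loss and clock jitter, which are not modeled here.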