Keywords: checkpoint; cloud services; failure recovery; efficiency; scalability
Abstract: Checkpoint replication is a widely used technique for maintaining virtual machine (VM) availability in the presence of host failures. Because checkpoint replication can place a heavy load on network resources, checkpoint compression has been suggested as a way to reduce network usage. This paper presents the first detailed evaluation and characterization of the effectiveness and overheads of checkpoint compression methods for various workloads frequently seen in high-availability systems. We propose a lightweight compression method, similarity compression, that exploits similarities in checkpoints to eliminate redundant network traffic, and compare it with two well-known methods, gzip and delta compression. Our results show that gzip and delta compression both reduce network traffic significantly for various workloads, but gzip incurs high CPU overhead and delta compression incurs high memory overhead. The proposed similarity compression is most effective for VM clusters running homogeneous workloads, and it uses both CPU and memory efficiently. Based on our extensive evaluation, we suggest guidelines for selecting and using these compression methods.
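As a rough illustration of the three approaches compared above, the following minimal sketch estimates how many payload bytes each would send for a checkpoint. It is not the paper's implementation: the 4 KiB page size, the XOR-based delta, the SHA-256 page hashing, and all function names are assumptions chosen for illustration only.

```python
import hashlib
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


def split_pages(checkpoint: bytes, page_size: int = PAGE_SIZE) -> list[bytes]:
    """Split a checkpoint image into fixed-size pages."""
    return [checkpoint[i:i + page_size] for i in range(0, len(checkpoint), page_size)]


def gzip_size(checkpoint: bytes) -> int:
    """Payload size when the whole checkpoint is DEFLATE-compressed.

    No extra state is kept, but every byte passes through the compressor.
    """
    return len(zlib.compress(checkpoint))


def delta_size(previous: bytes, current: bytes) -> int:
    """Payload size when the XOR against the previous checkpoint is compressed.

    Assumes both checkpoints have equal length. The sender must keep the
    previous checkpoint in memory, which is where the memory cost comes from.
    """
    delta = bytes(a ^ b for a, b in zip(previous, current))
    return len(zlib.compress(delta))


def similarity_size(checkpoint: bytes, seen: set[bytes]) -> int:
    """Payload size when pages already known to the receiver are sent as a hash.

    `seen` holds digests of pages sent earlier, possibly by other VMs that
    share the same replication target. Unseen pages are sent in full.
    """
    sent = 0
    for page in split_pages(checkpoint):
        digest = hashlib.sha256(page).digest()
        if digest in seen:
            sent += len(digest)  # send only the 32-byte reference
        else:
            seen.add(digest)
            sent += len(page)    # send the full page once
    return sent
```

In this sketch, a second VM running the same workload would pass the same `seen` set to `similarity_size`, so pages it shares with the first VM cost only a hash reference. That is one plausible reading of why such a scheme would favor homogeneous clusters.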