10 | 10 | Remus, built on top of the well-known Xen hypervisor, provides transparent, comprehensive high availability by using a checkpointing method under the Primary-Backup model (Figure 1). It checkpoints the runningVMon the primary host, and transfers the latest checkpoint to the backup host as whole-system migration. Once the primary host fails, the backup host will take over the service based on the latest checkpoint. Remus proves that it is possible to create a general, fully transparent, high-availability solution entirely in software. However, checkpointing at high frequency will introduce significant overhead, since significant CPU and memory resources are consumed by the migration. Therefore, clients endure a long network delay. Jiang et. al. proposed an integrated live migration mechanism, called LLM, which integrates both whole-system checkpointing and input replay to reduce the network delay in Remus. The basic idea is that the primary host migrates the guest VM image (including CPU/memory status updates and new writes to the file system) to the backup host at low frequency. In the meanwhile, the service requests from network clients are migrated at high frequency, and because of this, LLM significantly outperforms Remus in terms of network delay by more than 90%. |