Changes between Version 23 and Version 24 of VDEchp
Timestamp: 10/06/11 01:19:56
VDEchp
[[Image(figure1.jpg)]]

Figure 1. Execution cases under VDEchp.

In the VDEchp design, for each VM, the state of its stable copy is always one checkpoint interval behind the current VM's state, except for the initial state. This means that when a new checkpoint is generated, it is not copied to the stable copy immediately. Instead, the previous checkpoint is copied to the stable copy. The reason is that there is a latency between when an error occurs and when the failure caused by this error is detected.

…

[[Image(table1.jpg)]]

Table 1. Solo VM downtime comparison.

Table 1 shows the downtime results under the different mechanisms. We compare VDEchp with Remus and the VNsnap-memory daemon under the same checkpoint interval. We measure the downtime of all three mechanisms, with the same VM (512MB of RAM), for three cases: a) when the VM is idle, b) when the VM runs the NPB-EP benchmark program, and c) when the VM runs the Apache web server workload.

…

[[Image(figure3.jpg)]]

Figure 3. VDE downtime under Apache and NPB benchmarks.

The VDE downtime is the time from when a failure is detected in the VDE until the entire VDE resumes from the last globally consistent checkpoint. We conducted experiments to measure this downtime. To induce failures in the VDE, we developed an application program that causes a segmentation fault after executing for a while. This program is launched on several VMs to generate a failure while the distributed application workload is running in the VDE. The protected VDE is then rolled back to the last globally consistent checkpoint. We use the NPB-EP program (an MPI task in the VDE) and the Apache web server benchmark as the distributed workloads on the protected VMs.