Changes between Version 23 and Version 24 of VDEchp


Timestamp:
10/06/11 01:19:56 (13 years ago)
Author:
lvpeng
Comment:

--

  • VDEchp

    v23 v24  
    1515[[Image(figure1.jpg)]]
    1616
    17                    Figure 1. Execution cases under VDEchp.
     17              Figure 1. Execution cases under VDEchp.
    1818
    1919In the VDEchp design, the stable copy of each VM is always one checkpoint interval behind the VM's current state, except in the initial state. This means that when a new checkpoint is generated, it is not copied to the stable copy immediately. Instead, the previous checkpoint is copied to the stable copy. The reason is that there is a latency between when an error occurs and when the failure caused by this error is detected, so the newest checkpoint may already contain the error.
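
The lagging stable copy described above can be sketched as follows. This is a minimal illustration, not the VDEchp implementation: the class `ProtectedVM`, its method names, and the use of plain strings as VM states are all assumptions made for the example.

```python
class ProtectedVM:
    """Toy model of a VM whose stable copy lags one checkpoint behind.

    Hypothetical names throughout; real VM state would be memory pages,
    not a string.
    """

    def __init__(self, initial_state):
        # In the initial state the stable copy equals the VM state.
        self.stable_copy = initial_state
        # Most recent checkpoint, held back until the next one arrives.
        self.last_checkpoint = None

    def checkpoint(self, current_state):
        # Promote the previous checkpoint to the stable copy first,
        # then hold the new checkpoint without promoting it.
        if self.last_checkpoint is not None:
            self.stable_copy = self.last_checkpoint
        self.last_checkpoint = current_state

    def rollback(self):
        # On failure, resume from the stable copy, never from the
        # newest checkpoint, which may already contain the error.
        return self.stable_copy


vm = ProtectedVM("s0")
vm.checkpoint("s1")   # stable copy is still "s0"
vm.checkpoint("s2")   # "s1" is promoted; "s2" is held back
print(vm.rollback())
```

Holding back the newest checkpoint costs one interval of lost work on rollback, in exchange for a window in which a latent error can surface before the checkpoint that captured it is trusted.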
     
    4545[[Image(table1.jpg)]]
    4646
     47             Table 1. Solo VM downtime comparison.
     48
    4749Table 1 shows the downtime results under different mechanisms. We compare VDEchp with Remus and the VNsnap-memory daemon, using the same checkpoint interval. We measure the downtime of all three mechanisms, with the same VM (with 512MB of RAM), for three cases: a) when the VM is idle, b) when the VM runs the NPB-EP benchmark program, and c) when the VM runs the Apache web server workload.
    4850
     
    6062[[Image(figure3.jpg)]]
    6163
     64            Figure 3. VDE downtime under Apache and NPB benchmarks.
     65
    6266The VDE downtime is the time from when the failure is detected in the VDE until the entire VDE resumes from the last globally consistent checkpoint. We conducted experiments to measure this downtime. To induce failures in the VDE, we developed an application program that causes a segmentation fault after executing for a while. This program is launched on several VMs to generate a failure while the distributed application workload is running in the VDE. The protected VDE is then rolled back to the last globally consistent checkpoint. We use the NPB-EP program (MPI task in the VDE) and the Apache web server benchmark as the distributed workload on the protected VMs.
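
A fault injector of the kind described above might look like the sketch below. It is not the program used in the experiments: the name `run_fault_injector`, the delay parameter, and the choice to raise `SIGSEGV` through `os.kill` rather than through an invalid memory access are assumptions for illustration, and the code assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Hypothetical child program: runs for a while (sleeps), then sends
# SIGSEGV to itself so that it dies as if from a segmentation fault.
CHILD_SOURCE = (
    "import os, signal, sys, time\n"
    "time.sleep(float(sys.argv[1]))\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)


def run_fault_injector(delay_s):
    """Run the crashing child and return its exit status.

    On POSIX, subprocess reports death by signal N as return code -N.
    """
    proc = subprocess.run([sys.executable, "-c", CHILD_SOURCE, str(delay_s)])
    return proc.returncode


if __name__ == "__main__":
    status = run_fault_injector(0.1)
    print("child terminated by SIGSEGV:", status == -signal.SIGSEGV)
```

Running the injector in a child process keeps the crash isolated from the harness, which can then record when the failure occurred.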
    6367