Changes between Version 27 and Version 28 of VDEchp


Timestamp: 10/06/11 22:59:40
Author: lvpeng

            Table 1. Solo VM downtime comparison.

Table 1 shows the downtime of each mechanism. We compare [wiki:VDEchp VDEchp] with [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon under the same checkpoint interval. We measure the downtime of all three mechanisms on the same VM (512 MB of RAM) in three cases: a) the VM is idle, b) the VM runs the [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] benchmark program, and c) the VM runs the Apache web server workload.

Several observations are in order regarding the downtime measurements.
     
First, the downtime of all three mechanisms is short and very similar in the idle case. This is expected: memory is rarely updated while the VM is idle, so few dirty pages remain to be copied when the VM is suspended.

Second, the downtime of both [wiki:VDEchp VDEchp] and [http://nss.cs.ubc.ca/remus/ Remus] remains almost the same when running [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] and Apache. This is because the downtime depends on the amount of memory that remains to be copied when the guest VM is suspended. Since both [wiki:VDEchp VDEchp] and [http://nss.cs.ubc.ca/remus/ Remus] use a high-frequency methodology, the number of dirty pages left in the last round is almost the same for both workloads.

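The dependence of downtime on the memory left to copy can be sketched with a simple stop-and-copy model. This is a minimal illustration, not the VDEchp or Remus implementation: it assumes 4 KiB pages, a constant copy bandwidth, and a fixed suspend/resume overhead, and the function name `estimate_downtime` and all numbers are hypothetical.

```python
# Hypothetical stop-and-copy downtime model: the guest stays paused while the
# pages still dirty in the final round are copied, plus a fixed cost for
# suspending/resuming it. All parameter values are illustrative assumptions.

PAGE_SIZE = 4096  # bytes; assumes 4 KiB pages


def estimate_downtime(dirty_pages: int,
                      bandwidth_bytes_per_s: float,
                      fixed_overhead_s: float = 0.0) -> float:
    """Seconds the guest stays paused while the last-round dirty pages are copied."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return fixed_overhead_s + dirty_pages * PAGE_SIZE / bandwidth_bytes_per_s


# Two workloads that leave the same number of dirty pages in the last round
# get the same estimate, however busy they were before the suspension.
npb_like = estimate_downtime(dirty_pages=1000, bandwidth_bytes_per_s=1e9)
apache_like = estimate_downtime(dirty_pages=1000, bandwidth_bytes_per_s=1e9)
print(npb_like == apache_like)  # True
```

Because the model depends only on the last-round dirty page count, workloads with very different run-time behavior get the same estimate once that count is equal, which is the effect described above for NPB-EP and Apache.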
Third, when running the [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] program, [wiki:VDEchp VDEchp] has lower downtime than [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon (a reduction of more than 20%). This is because [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] is a computationally intensive workload, so the guest VM's memory is updated at a high frequency. Compared with higher-frequency checkpointing such as [wiki:VDEchp VDEchp], [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon take more time to save the checkpoint, because their lower memory transfer frequency leaves a larger amount of dirty data to be saved.

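How transfer frequency affects the dirty data left at checkpoint time can be illustrated with a toy model. It assumes, hypothetically, that the guest dirties pages at a constant rate, that each memory transfer clears the dirty set, and that the number of dirty pages is capped by a fixed writable working set; the function name and numbers are illustrative only.

```python
def dirty_pages_at_suspend(dirty_rate_pages_per_s: float,
                           transfer_interval_s: float,
                           writable_working_set_pages: int) -> int:
    """Pages dirtied since the last memory transfer, capped by the working set.

    Toy model (assumption): constant dirty rate, and every transfer clears
    the dirty set, so the backlog grows linearly until the working set is full.
    """
    if dirty_rate_pages_per_s < 0 or transfer_interval_s < 0:
        raise ValueError("rate and interval must be non-negative")
    accumulated = dirty_rate_pages_per_s * transfer_interval_s
    return int(min(accumulated, writable_working_set_pages))


# Hypothetical compute-heavy guest: 4000 pages/s, 50000-page working set.
# The less often memory is transferred, the more dirty pages are left to save.
for interval in (0.25, 2.0, 20.0):
    print(interval, dirty_pages_at_suspend(4000, interval, 50000))
# prints: 0.25 1000, then 2.0 8000, then 20.0 50000 (one pair per line)
```

Real dirty rates are bursty, so this only captures the trend: transferring memory more often keeps the backlog, and thus the time to save it, small.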
Finally, when running the Apache application, memory is updated less often than under [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] but more often than in the idle run. The results show that [wiki:VDEchp VDEchp] has lower downtime than [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon (a reduction of roughly 16%).

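The percentages above are presumably relative reductions with respect to the compared mechanism's downtime. The exact formula is not stated, so the following is an assumed definition, with D_base the downtime of Remus or the VNsnap-memory daemon and D_VDEchp the downtime of VDEchp:

```latex
R = \frac{D_{\mathrm{base}} - D_{\mathrm{VDEchp}}}{D_{\mathrm{base}}} \times 100\%
```

For instance, hypothetical downtimes of D_base = 100 ms and D_VDEchp = 84 ms would give R = 16%.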
=== VDE Downtime ===
[[Image(figure3.jpg)]]

            Figure 3. VDE downtime under Apache and [http://www.nas.nasa.gov/Resources/Software/npb.html NPB] benchmarks.

The VDE downtime is the time from when a failure is detected in the VDE until the entire VDE resumes from the last globally consistent checkpoint. To measure it, we developed an application program that triggers a segmentation fault after running for a while, and launched it on several VMs so that failures occur while the distributed application workload is running in the VDE. The protected VDE is then rolled back to the last globally consistent checkpoint. We use the [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] program (run as an MPI task in the VDE) and the Apache web server benchmark as the distributed workload on the protected VMs.

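Under this definition the downtime ends only when the last VM has resumed, so it is the maximum, over all VMs, of resume time minus detection time. A minimal sketch, assuming all timestamps come from a common (synchronized) clock; the function `vde_downtime` and the VM names are hypothetical:

```python
from typing import Mapping


def vde_downtime(failure_detected_at: float,
                 resumed_at: Mapping[str, float]) -> float:
    """Seconds from failure detection until the *last* VM in the VDE resumes
    from the globally consistent checkpoint (timestamps on a common clock)."""
    if not resumed_at:
        raise ValueError("need at least one VM resume timestamp")
    if min(resumed_at.values()) < failure_detected_at:
        raise ValueError("a VM cannot resume before the failure is detected")
    return max(resumed_at.values()) - failure_detected_at


# Hypothetical timestamps (seconds): the slowest VM determines the VDE downtime.
resumes = {"vm01": 102.5, "vm02": 103.0, "vm03": 101.75}
print(vde_downtime(100.0, resumes))  # 3.0
```

Taking the maximum rather than an average reflects that the VDE as a whole is not back in service until every VM has been restored.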
Figure 3 shows the results. In our 36-node (VM) environment, the measured VDE downtime under [wiki:VDEchp VDEchp] ranges from 2.46 seconds to 4.77 seconds, with an average of 3.54 seconds. The VDE downtime under [wiki:VDEchp VDEchp] also increases slightly as the checkpoint interval grows. This is because the VDE downtime depends on the number of memory pages restored during recovery: as the checkpoint interval grows, the checkpoint size grows, and so does the number of pages restored during recovery.
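One way to see why recovery time grows with the checkpoint interval, but only slightly, is a toy model in which writes land on uniformly random pages of a fixed working set. Under that assumption, the expected number of distinct dirty pages after n writes to W pages is W(1 - (1 - 1/W)^n), which grows with the interval but saturates. This is an illustrative assumption, not the analysis behind Figure 3; all names and parameter values are hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumes 4 KiB pages


def expected_dirty_pages(writes_per_s: float, interval_s: float,
                         working_set_pages: int) -> float:
    """Expected distinct pages touched by writes_per_s * interval_s writes,
    each landing on a uniformly random page of the working set (assumption)."""
    if working_set_pages < 1:
        raise ValueError("working set must contain at least one page")
    n_writes = writes_per_s * interval_s
    return working_set_pages * (1.0 - (1.0 - 1.0 / working_set_pages) ** n_writes)


def estimate_restore_time(interval_s: float, writes_per_s: float,
                          working_set_pages: int, restore_bw_bytes_per_s: float,
                          fixed_overhead_s: float) -> float:
    """Fixed recovery cost plus time to restore the expected checkpointed pages."""
    pages = expected_dirty_pages(writes_per_s, interval_s, working_set_pages)
    return fixed_overhead_s + pages * PAGE_SIZE / restore_bw_bytes_per_s


# Hypothetical parameters: a fixed cost dominates, and because pages are
# rewritten within an interval, doubling the interval adds less and less.
for interval in (1, 2, 4, 8):
    t = estimate_restore_time(interval, writes_per_s=5000, working_set_pages=20000,
                              restore_bw_bytes_per_s=1e8, fixed_overhead_s=1.0)
    print(f"interval {interval:>2} s -> estimated recovery {t:.2f} s")
```

The saturation comes from pages being rewritten repeatedly within one interval; a workload whose memory footprint keeps growing would show a more linear trend.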