Context Navigation

Changes between Version 27 and Version 28 of VDEchp

Timestamp:: 10/06/11 22:59:40 (14 years ago)
Author:: lvpeng
Comment:: --

Legend:

: Unmodified
: Added
: Removed
: Modified

VDEchp

-                      v27
+                      v28
              Table 1. Solo VM downtime comparison.
 Table I shows the downtime results under different mechanisms. We compare [wiki:VDEchp VDEchp] with [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon, under the same checkpoint interval. We measure the downtime of all three mechanisms, with the same VM (with 512MB of RAM), for three cases: a) when the VM is idle, b) when the VM runs the NPB-EP benchmark program, and c) when the VM runs the Apache web server workload.
+Table I shows the downtime results under different mechanisms. We compare [wiki:VDEchp VDEchp] with [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon, under the same checkpoint interval. We measure the downtime of all three mechanisms, with the same VM (with 512MB of RAM), for three cases: a) when the VM is idle, b) when the VM runs the [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] benchmark program, and c) when the VM runs the Apache web server workload.
 Several observations are in order regarding the downtime measurements.
 …
 First, the downtime results of all three mechanisms are short and very similar for the idle case. This is not surprising, as memory updates are rare during idle runs, so the downtime of all mechanisms is short and similar.
 Second, the downtime of both [wiki:VDEchp VDEchp] and [http://nss.cs.ubc.ca/remus/ Remus] remain almost the same when running NPB-EP and Apache. This is because, the downtime depends on the amount of memory remaining to be copied when the guest VM is suspended. Since both [wiki:VDEchp VDEchp] and [http://nss.cs.ubc.ca/remus/ Remus] use a high-frequency methodology, the dirty pages in the last round are almost the same.
+Second, the downtime of both [wiki:VDEchp VDEchp] and [http://nss.cs.ubc.ca/remus/ Remus] remain almost the same when running [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] and Apache. This is because, the downtime depends on the amount of memory remaining to be copied when the guest VM is suspended. Since both [wiki:VDEchp VDEchp] and [http://nss.cs.ubc.ca/remus/ Remus] use a high-frequency methodology, the dirty pages in the last round are almost the same.
 Third, when running the NPB-EP program, [wiki:VDEchp VDEchp] has lesser downtime than the [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon (reduction is more than 20%). This is because, NPB-EP is a computationally intensive workload. Thus, the guest VM memory is updated at high frequency. When saving the checkpoint, compared with other high-frequency checkpoint solutions, the [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon takes more time to save larger dirty data due to its low memory transfer frequency.
+Third, when running the [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] program, [wiki:VDEchp VDEchp] has lesser downtime than the [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon (reduction is more than 20%). This is because, [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] is a computationally intensive workload. Thus, the guest VM memory is updated at high frequency. When saving the checkpoint, compared with other high-frequency checkpoint solutions, the [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon takes more time to save larger dirty data due to its low memory transfer frequency.
 Finally, when running the Apache application, the memory update is not so much as that when running NPB. But the memory update is more than the idle run. The results show that [wiki:VDEchp VDEchp] has lower downtime than [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon (downtime is reduced by roughly 16%).
+Finally, when running the Apache application, the memory update is not so much as that when running [http://www.nas.nasa.gov/Resources/Software/npb.html NPB]. But the memory update is more than the idle run. The results show that [wiki:VDEchp VDEchp] has lower downtime than [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon (downtime is reduced by roughly 16%).
 === VDE Downtime ===
 [[Image(figure3.jpg)]]
             Figure 3. VDE downtime under Apache and NPB benchmarks.
+            Figure 3. VDE downtime under Apache and [http://www.nas.nasa.gov/Resources/Software/npb.html NPB] benchmarks.
 The VDE downtime is the time from when the failure was detected in the VDE until the entire VDE resumes from the last globally consistent checkpoint. We conducted experiments to measure the downtime. To induce failures in the VDE, we developed an application program that causes a segmentation failure after executing for a while. This program is launched on several VMs to generate a failure while the distributed application workload is running in the VDE. The protected VDE is then rolled back to the last globally consistent checkpoint. We use the NPB-EP program (MPI task in the VDE) and the Apache web server benchmark as the distributed workload on the protected VMs.
+The VDE downtime is the time from when the failure was detected in the VDE until the entire VDE resumes from the last globally consistent checkpoint. We conducted experiments to measure the downtime. To induce failures in the VDE, we developed an application program that causes a segmentation failure after executing for a while. This program is launched on several VMs to generate a failure while the distributed application workload is running in the VDE. The protected VDE is then rolled back to the last globally consistent checkpoint. We use the [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] program (MPI task in the VDE) and the Apache web server benchmark as the distributed workload on the protected VMs.
 Figure 3 shows the results. From the figure, we observe that, in our 36-node (VM) environment, the measured VDE downtime under [wiki:VDEchp VDEchp] ranges from 2.46 seconds to 4.77 seconds, with an average of 3.54 seconds. Another observation is that the VDE downtime in [wiki:VDEchp VDEchp] slightly increases as the checkpoint interval grows. This is because, the VDE downtime depends on the number of memory pages restored during recovery. Thus, as the checkpoint interval grows, the checkpoint size also grows, so does the number of restored pages during recovery.