Changes between Version 28 and Version 29 of VDEchp
Timestamp: 10/06/11 23:04:04
VDEchp
Table 1. Solo VM downtime comparison.

Table 1 shows the downtime results under the different mechanisms. We compare [wiki:VDEchp VDEchp] with [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon under the same checkpoint interval. We measure the downtime of all three mechanisms, with the same VM (512MB of RAM), for three cases: a) when the VM is idle, b) when the VM runs the [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] benchmark program, and c) when the VM runs the [http://httpd.apache.org/ Apache] web server workload.

Several observations are in order regarding the downtime measurements.

First, the downtime results of all three mechanisms are short and very similar for the idle case. This is not surprising: memory updates are rare during idle runs, so little dirty memory remains to be copied when the VM is suspended.

Second, the downtime of both [wiki:VDEchp VDEchp] and [http://nss.cs.ubc.ca/remus/ Remus] remains almost the same when running [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] and [http://httpd.apache.org/ Apache]. This is because the downtime depends on the amount of memory remaining to be copied when the guest VM is suspended; since both [wiki:VDEchp VDEchp] and [http://nss.cs.ubc.ca/remus/ Remus] use a high-frequency checkpointing methodology, the dirty-page sets left in the last round are of almost the same size.

Third, when running the [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] program, [wiki:VDEchp VDEchp] has lower downtime than [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon (a reduction of more than 20%). This is because [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] is a computationally intensive workload, so the guest VM's memory is updated at a high rate; when saving the checkpoint, [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon take more time to save the larger amount of dirty data, due to low memory-transfer frequency.
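The dependence of downtime on the final round's dirty-page set can be illustrated with a toy simulation of iterative pre-copy checkpointing. This is only a sketch of the general pre-copy idea, not code or parameters from [wiki:VDEchp VDEchp] itself; the page counts and rates below are hypothetical:

```python
def precopy_downtime(total_pages, dirty_rate, copy_rate, rounds):
    """Toy model of iterative pre-copy checkpointing.

    Each round copies the pages dirtied during the previous round while
    the VM keeps running; the VM is suspended only for the final
    stop-and-copy, so downtime is proportional to the last round's
    dirty-page set. Rates are in pages per second (hypothetical values).
    """
    to_copy = total_pages                       # round 1 copies all of memory
    for _ in range(rounds):
        copy_time = to_copy / copy_rate         # time spent copying this round
        # pages dirtied while this round's copy was in flight
        to_copy = min(total_pages, dirty_rate * copy_time)
    return to_copy / copy_rate                  # stop-and-copy time = downtime
```

With hypothetical numbers for a 512MB VM (131072 4KB pages), a dirty rate of 2000 pages/s, and a copy rate of 20000 pages/s, adding pre-copy rounds (i.e. raising the checkpoint frequency) shrinks the final dirty set geometrically, and with it the downtime, compared with a single stop-and-copy of the whole memory.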
Finally, when running the [http://httpd.apache.org/ Apache] application, memory is not updated as heavily as when running [http://www.nas.nasa.gov/Resources/Software/npb.html NPB], but more than during the idle run. The results show that [wiki:VDEchp VDEchp] has lower downtime than [http://nss.cs.ubc.ca/remus/ Remus] and the [http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vnsnap VNsnap]-memory daemon (downtime is reduced by roughly 16%).

=== VDE Downtime ===
[[Image(figure3.jpg)]]

Figure 3. VDE downtime under [http://httpd.apache.org/ Apache] and [http://www.nas.nasa.gov/Resources/Software/npb.html NPB] benchmarks.

The VDE downtime is the time from when a failure is detected in the VDE until the entire VDE resumes from the last globally consistent checkpoint. We conducted experiments to measure this downtime.
To induce failures in the VDE, we developed an application program that causes a segmentation fault after executing for a while. This program is launched on several VMs to generate a failure while the distributed application workload is running in the VDE. The protected VDE is then rolled back to the last globally consistent checkpoint. We use the [http://www.nas.nasa.gov/Resources/Software/npb.html NPB-EP] program (an MPI task in the VDE) and the [http://httpd.apache.org/ Apache] web server benchmark as the distributed workloads on the protected VMs.

Figure 3 shows the results. In our 36-node (VM) environment, the measured VDE downtime under [wiki:VDEchp VDEchp] ranges from 2.46 to 4.77 seconds, with an average of 3.54 seconds. We also observe that the VDE downtime under [wiki:VDEchp VDEchp] increases slightly as the checkpoint interval grows. This is because the VDE downtime depends on the number of memory pages restored during recovery: as the checkpoint interval grows, so does the checkpoint size, and hence the number of pages restored during recovery.
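The fault-injection program itself is not listed in the text. A minimal sketch of the idea — run normally for a while, then die of a segmentation fault — together with a parent that observes the crash (roughly as a failure detector might), could look like the following; the delay value and all names here are assumptions, not the actual program used in the experiments:

```python
import os
import signal
import time

def faulty_worker(run_seconds=0.1):
    """Run normally for a while, then trigger a segmentation fault.

    run_seconds is a placeholder; the text does not state the actual delay.
    """
    time.sleep(run_seconds)
    os.kill(os.getpid(), signal.SIGSEGV)  # induce the segmentation failure

def launch_and_observe():
    """Fork the worker and report the signal that killed it (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        faulty_worker()
        os._exit(0)  # never reached: the worker dies of SIGSEGV
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
```

Launching such a worker inside several protected VMs gives the failure detector a reproducible crash to react to; here `launch_and_observe()` returns `signal.SIGSEGV` once the child terminates abnormally.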