Changes between Version 26 and Version 27 of FGBI


Timestamp: 10/06/11 01:28:22
Author: lvpeng

== Downtime Problem in [wiki:LLM LLM] ==
[[Image(figure1.jpg)]]

Downtime is the primary factor in estimating the high availability of a system, since any long downtime experienced by clients may result in loss of client loyalty and thus revenue. Under the Primary-Backup model (Figure 1), there are two types of downtime: I) the time from when the primary host crashes until the VM resumes from the last checkpointed state on the backup host and starts to handle client requests (D1 = T3 - T1); II) the time from when the VM pauses on the primary (to save state for the checkpoint) until it resumes (D2). From Jiang's paper we observe that for memory-intensive workloads running on guest VMs (such as the highSys workload), LLM endures a much longer type I downtime than Remus. This is because these workloads update the guest memory at high frequency, while LLM migrates the guest VM image update (mostly from memory) at low frequency and uses input replay as an auxiliary. In this case, when a failure happens, a significant number of memory updates must be applied to ensure synchronization between the primary and backup hosts. Therefore, LLM needs significantly more time for the input replay process before the VM can resume on the backup host and begin handling client requests.
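
A minimal sketch of how the two downtime figures fall out of this model (our illustration; the timestamps are hypothetical example values, not measurements from the paper):

{{{
#!python
# Type I: failover window, D1 = T3 - T1. Type II: checkpoint pause, D2.
# All timestamps below are made-up example values in seconds.

t1_primary_crash = 100.0    # T1: primary host fails
t3_backup_serving = 102.4   # T3: VM resumes from the last checkpoint on the
                            #     backup and starts handling client requests

pause_on_primary = 50.00    # VM paused on the primary to save checkpoint state
resume_on_primary = 50.08   # VM resumes once the checkpoint is saved

d1 = t3_backup_serving - t1_primary_crash   # type I downtime
d2 = resume_on_primary - pause_on_primary   # type II downtime

print(f"Type I downtime  D1 = {d1:.2f} s")   # -> 2.40 s
print(f"Type II downtime D2 = {d2:.2f} s")   # -> 0.08 s
}}}
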
== Downtime Evaluations ==
[[Image(figure2.jpg)]]

Type I Downtime. Figures 2a, 2b, 2c, and 2d show the type I downtime
comparison among the FGBI, LLM, and Remus mechanisms under the Apache,
NPB-EP, SPECweb, and SPECsys applications, respectively. The block size
used in all
[...]
same value for the checkpointing frequency of Remus/FGBI and the network
buffer frequency of LLM, we ensure the fairness of the comparison. We
observe that Figures 2a and 2b show a reverse relationship between FGBI
and LLM. Under Apache (Figure 2a), the network load is high but system
updates are rare. Therefore, LLM performs better than FGBI, since it uses
a much higher frequency to migrate the network service requests. On the
other hand, when running memory-intensive applications (Figures 2b and 2d),
which involve high computational loads, LLM endures a much longer downtime
than FGBI (even worse than Remus).
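
A toy cost model (our own illustration, not from the paper) of why this reverse relationship appears: with low-frequency image migration, the memory updates pending since the last checkpoint must be replayed on failover, so type I downtime grows with the guest's memory-update rate. All rates and costs below are assumed values.

{{{
#!python
def type1_downtime(checkpoint_period_s, updates_per_s, replay_cost_s,
                   failover_delay_s=0.1):
    """Failover delay plus replay of the updates accumulated, on average,
    over half a checkpoint period."""
    pending_updates = updates_per_s * checkpoint_period_s / 2
    return failover_delay_s + pending_updates * replay_cost_s

# Network-bound workload (few memory updates): low-frequency migration is fine.
print(type1_downtime(5.0, updates_per_s=10, replay_cost_s=1e-3))     # 0.125 s
# Memory-intensive workload at the same low frequency: long replay on failover.
print(type1_downtime(5.0, updates_per_s=5000, replay_cost_s=1e-3))   # 12.6 s
# High-frequency checkpointing (Remus/FGBI-style) keeps the backlog small.
print(type1_downtime(0.1, updates_per_s=5000, replay_cost_s=1e-3))   # 0.35 s
}}}
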
[...]
pages/second. Thus, SPECweb is not a lightweight computational workload for
these migration mechanisms. As a result, the relationship between FGBI and
LLM in Figure 2c is more similar to that in Figures 2b and 2d than to that
in Figure 2a. In conclusion, compared with LLM, FGBI reduces the downtime
by as much as 77%. Moreover, compared with Remus, FGBI yields a shorter
downtime: by as much as 31% under Apache, 45% under NPB-EP, 39% under
SPECweb, and 35% under SPECsys.
[[Image(table1.jpg)]]

Type II Downtime. Table 1 shows the type II downtime comparison among the
Remus, LLM, and FGBI mechanisms under different applications. We have three
main observations: (1) their downtime results are very similar for the
"idle" run.
[...]
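
As a rough illustration of what a type II entry in Table 1 measures, the sketch below (a hypothetical harness, not the paper's) times the suspend/save/resume window around each checkpoint and averages it over a run:

{{{
#!python
import time

def checkpoint_with_timing(suspend, save_state, resume):
    """Suspend the VM, save its state, resume it; return D2 in seconds."""
    start = time.monotonic()
    suspend()
    save_state()
    resume()
    return time.monotonic() - start

# Stand-in callables; a real harness would drive the hypervisor's
# suspend/checkpoint/resume control interface instead of sleeping.
d2_samples = [checkpoint_with_timing(lambda: None,
                                     lambda: time.sleep(0.01),
                                     lambda: None)
              for _ in range(5)]
print(f"mean D2 = {sum(d2_samples) / len(d2_samples) * 1000:.1f} ms")
}}}
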
== Overhead ==
[[Image(figure3.jpg)]]

Figure 3a shows the overhead during VM migration. The figure compares the
applications' runtime with and without migration, under Apache, SPECweb,
NPB-EP, and SPECsys, with the size of the fine-grained blocks varying from
64
[...]
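The evaluation above varies the fine-grained block size starting at 64 bytes. A minimal sketch of block-level dirty identification as we read it (the hashing scheme and block size here are illustrative assumptions, not necessarily FGBI's exact implementation): split each dirty page into fixed-size blocks, hash each block, and resend only the blocks whose hashes changed since the last checkpoint.

{{{
#!python
import hashlib

BLOCK_SIZE = 64  # bytes; the evaluation varies this from 64 bytes upward

def changed_blocks(page, prev_hashes):
    """Yield (offset, block) for blocks that changed; update prev_hashes."""
    for off in range(0, len(page), BLOCK_SIZE):
        block = page[off:off + BLOCK_SIZE]
        digest = hashlib.md5(block).digest()
        if prev_hashes.get(off) != digest:
            prev_hashes[off] = digest
            yield off, block

page = bytearray(4096)                      # one guest page
hashes = {}
list(changed_blocks(bytes(page), hashes))   # first checkpoint seeds the hashes
page[100:104] = b"beef"                     # a 4-byte write dirties one block
resend = list(changed_blocks(bytes(page), hashes))
print(f"{len(resend)} of {4096 // BLOCK_SIZE} blocks resent")   # -> 1 of 64
}}}

Smaller blocks transfer less unchanged data per checkpoint but add hashing and bookkeeping overhead, which is the trade-off the block-size sweep explores.
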
In order to understand the respective contributions of the three proposed
techniques (i.e., FGBI, sharing, and compression), Figure 3b shows the
breakdown of the performance improvement among them under the NPB-EP
benchmark. It compares the downtime between integrated FGBI (which we use
for
[...]
than 10 times the downtime that FGBI incurs.
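
On top of block identification, the sharing and compression techniques cut the bytes sent per checkpoint. A minimal sketch of one plausible encoding (the details are our assumptions, not the paper's code): identical dirty blocks are stored once and referenced thereafter (sharing), and the remaining unique blocks are compressed before transfer.

{{{
#!python
import hashlib
import zlib

def encode_checkpoint(dirty_blocks):
    """Deduplicate dirty blocks, then compress the unique payload."""
    seen = {}      # block digest -> index into `unique`
    unique = []    # distinct blocks, each sent only once (sharing)
    refs = []      # per dirty block: index of its content in `unique`
    for block in dirty_blocks:
        digest = hashlib.sha1(block).digest()
        if digest not in seen:
            seen[digest] = len(unique)
            unique.append(block)
        refs.append(seen[digest])
    payload = zlib.compress(b"".join(unique))   # compression on unique data
    return payload, refs

# A highly redundant update: 50 zero blocks plus 2 identical distinct blocks.
blocks = [b"\x00" * 64] * 50 + [bytes(range(64))] * 2
payload, refs = encode_checkpoint(blocks)
print(f"raw {sum(map(len, blocks))} B -> {len(payload)} B + {len(refs)} refs")
}}}
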
We observe from Figure 3b that if we just apply the FGBI mechanism without
integrating sharing or compression support, the downtime is reduced
compared with that of Remus in Figure 2b, but it is not significant (the
reduction is no more