Changes between Version 26 and Version 27 of FGBI


Timestamp: 10/06/11 01:28:22
Author: lvpeng

== Downtime Problem in [wiki:LLM LLM] ==
[[Image(figure1.jpg)]]

Downtime is the primary factor in estimating the high availability of a system, since any long downtime experienced by clients may result in loss of client loyalty and thus revenue. Under the Primary-Backup model (Figure 1), there are two types of downtime: I) the time from when the primary host crashes until the VM resumes from the last checkpointed state on the backup host and starts to handle client requests (D1 = T3 - T1); II) the time from when the VM pauses on the primary (to save state for the checkpoint) until it resumes (D2). From Jiang's paper we observe that for memory-intensive workloads running on guest VMs (such as the highSys workload), LLM endures a much longer type I downtime than Remus. This is because these workloads update the guest memory at high frequency, while LLM migrates the guest VM image update (mostly from memory) at low frequency and uses input replay as an auxiliary. In this case, when a failure happens, a significant number of memory updates must be applied to ensure synchronization between the primary and backup hosts. Therefore, LLM needs significantly more time for the input replay process before the VM can resume on the backup host and begin handling client requests.
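
A minimal sketch of how the two downtime figures fall out of this model (our illustration; the timestamps are hypothetical example values, not measurements from the paper):

{{{
#!python
# Type I: failover window, D1 = T3 - T1. Type II: checkpoint pause, D2.
# All timestamps below are made-up example values in seconds.

t1_primary_crash = 100.0    # T1: primary host fails
t3_backup_serving = 102.4   # T3: VM resumes from the last checkpoint on the
                            #     backup and starts handling client requests

pause_on_primary = 50.00    # VM paused on the primary to save checkpoint state
resume_on_primary = 50.08   # VM resumes once the checkpoint is saved

d1 = t3_backup_serving - t1_primary_crash   # type I downtime
d2 = resume_on_primary - pause_on_primary   # type II downtime

print(f"Type I downtime  D1 = {d1:.2f} s")   # -> 2.40 s
print(f"Type II downtime D2 = {d2:.2f} s")   # -> 0.08 s
}}}
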
== Downtime Evaluations ==
[[Image(figure2.jpg)]]

Type I Downtime. Figures 2a, 2b, 2c, and 2d show the type I downtime
comparison among the FGBI, LLM, and Remus mechanisms under the Apache,
NPB-EP, SPECweb, and SPECsys applications, respectively. The block size
used in all
[...]
same value for the checkpointing frequency of Remus/FGBI and the network
buffer frequency of LLM, we ensure the fairness of the comparison. We
observe that Figures 2a and 2b show a reverse relationship between FGBI
and LLM. Under Apache (Figure 2a), the network load is high but system
updates are rare. Therefore, LLM performs better than FGBI, since it uses
a much higher frequency to migrate the network service requests. On the
other hand, when running memory-intensive applications (Figures 2b and 2d),
which involve high computational loads, LLM endures a much longer downtime
than FGBI (even worse than Remus).
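
A toy cost model (our own illustration, not from the paper) of why this reverse relationship appears: with low-frequency image migration, the memory updates pending since the last checkpoint must be replayed on failover, so type I downtime grows with the guest's memory-update rate. All rates and costs below are assumed values.

{{{
#!python
def type1_downtime(checkpoint_period_s, updates_per_s, replay_cost_s,
                   failover_delay_s=0.1):
    """Failover delay plus replay of the updates accumulated, on average,
    over half a checkpoint period."""
    pending_updates = updates_per_s * checkpoint_period_s / 2
    return failover_delay_s + pending_updates * replay_cost_s

# Network-bound workload (few memory updates): low-frequency migration is fine.
print(type1_downtime(5.0, updates_per_s=10, replay_cost_s=1e-3))     # 0.125 s
# Memory-intensive workload at the same low frequency: long replay on failover.
print(type1_downtime(5.0, updates_per_s=5000, replay_cost_s=1e-3))   # 12.6 s
# High-frequency checkpointing (Remus/FGBI-style) keeps the backlog small.
print(type1_downtime(0.1, updates_per_s=5000, replay_cost_s=1e-3))   # 0.35 s
}}}
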
[...]
pages/second. Thus, SPECweb is not a lightweight computational workload for
these migration mechanisms. As a result, the relationship between FGBI and
LLM in Figure 2c is more similar to that in Figures 2b and 2d than to that
in Figure 2a. In conclusion, compared with LLM, FGBI reduces the downtime
by as much as 77%. Moreover, compared with Remus, FGBI yields a shorter
downtime: by as much as 31% under Apache, 45% under NPB-EP, 39% under
SPECweb, and 35% under SPECsys.
[[Image(table1.jpg)]]

Type II Downtime. Table 1 shows the type II downtime comparison among the
Remus, LLM, and FGBI mechanisms under different applications. We have three
main observations: (1) their downtime results are very similar for the
"idle" run.
[...]
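
As a rough illustration of what a type II entry in Table 1 measures, the sketch below (a hypothetical harness, not the paper's) times the suspend/save/resume window around each checkpoint and averages it over a run:

{{{
#!python
import time

def checkpoint_with_timing(suspend, save_state, resume):
    """Suspend the VM, save its state, resume it; return D2 in seconds."""
    start = time.monotonic()
    suspend()
    save_state()
    resume()
    return time.monotonic() - start

# Stand-in callables; a real harness would drive the hypervisor's
# suspend/checkpoint/resume control interface instead of sleeping.
d2_samples = [checkpoint_with_timing(lambda: None,
                                     lambda: time.sleep(0.01),
                                     lambda: None)
              for _ in range(5)]
print(f"mean D2 = {sum(d2_samples) / len(d2_samples) * 1000:.1f} ms")
}}}
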
== Overhead ==
[[Image(figure3.jpg)]]

Figure 3a shows the overhead during VM migration. The figure compares the
applications' runtime with and without migration, under Apache, SPECweb,
NPB-EP, and SPECsys, with the size of the fine-grained blocks varying from
64
[...]
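The evaluation above varies the fine-grained block size starting at 64 bytes. A minimal sketch of block-level dirty identification as we read it (the hashing scheme and block size here are illustrative assumptions, not necessarily FGBI's exact implementation): split each dirty page into fixed-size blocks, hash each block, and resend only the blocks whose hashes changed since the last checkpoint.

{{{
#!python
import hashlib

BLOCK_SIZE = 64  # bytes; the evaluation varies this from 64 bytes upward

def changed_blocks(page, prev_hashes):
    """Yield (offset, block) for blocks that changed; update prev_hashes."""
    for off in range(0, len(page), BLOCK_SIZE):
        block = page[off:off + BLOCK_SIZE]
        digest = hashlib.md5(block).digest()
        if prev_hashes.get(off) != digest:
            prev_hashes[off] = digest
            yield off, block

page = bytearray(4096)                      # one guest page
hashes = {}
list(changed_blocks(bytes(page), hashes))   # first checkpoint seeds the hashes
page[100:104] = b"beef"                     # a 4-byte write dirties one block
resend = list(changed_blocks(bytes(page), hashes))
print(f"{len(resend)} of {4096 // BLOCK_SIZE} blocks resent")   # -> 1 of 64
}}}

Smaller blocks transfer less unchanged data per checkpoint but add hashing and bookkeeping overhead, which is the trade-off the block-size sweep explores.
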
In order to understand the respective contributions of the three proposed
techniques (i.e., FGBI, sharing, and compression), Figure 3b shows the
breakdown of the performance improvement among them under the NPB-EP
benchmark. It compares the downtime between integrated FGBI (which we use
for
[...]
than 10 times the downtime that FGBI incurs.
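
On top of block identification, the sharing and compression techniques cut the bytes sent per checkpoint. A minimal sketch of one plausible encoding (the details are our assumptions, not the paper's code): identical dirty blocks are stored once and referenced thereafter (sharing), and the remaining unique blocks are compressed before transfer.

{{{
#!python
import hashlib
import zlib

def encode_checkpoint(dirty_blocks):
    """Deduplicate dirty blocks, then compress the unique payload."""
    seen = {}      # block digest -> index into `unique`
    unique = []    # distinct blocks, each sent only once (sharing)
    refs = []      # per dirty block: index of its content in `unique`
    for block in dirty_blocks:
        digest = hashlib.sha1(block).digest()
        if digest not in seen:
            seen[digest] = len(unique)
            unique.append(block)
        refs.append(seen[digest])
    payload = zlib.compress(b"".join(unique))   # compression on unique data
    return payload, refs

# A highly redundant update: 50 zero blocks plus 2 identical distinct blocks.
blocks = [b"\x00" * 64] * 50 + [bytes(range(64))] * 2
payload, refs = encode_checkpoint(blocks)
print(f"raw {sum(map(len, blocks))} B -> {len(payload)} B + {len(refs)} refs")
}}}
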
We observe from Figure 3b that if we just apply the FGBI mechanism without
integrating sharing or compression support, the downtime is reduced
compared with that of Remus in Figure 2b, but it is not significant (the
reduction is no more