Changes between Version 21 and Version 22 of FGBI


Timestamp:
09/28/11 00:53:29
Author:
lvpeng
Comment:

--

  • FGBI

    v21 v22  
    66
    77== Downtime Problem in [wiki:LLM LLM] ==
    8 [[Image(figure1.jpg)]]
     8[[Image(figure1.jpg)]][[BR]]
    99Downtime is the primary factor in estimating the availability of a system, since any long downtime experienced by clients may result in loss of client loyalty and thus loss of revenue. Under the Primary-Backup model (Figure 1), there are two types of downtime: I) the time from when the primary host crashes until the VM resumes from the last checkpointed state on the backup host and starts to handle client requests (D1 = T3 - T1); II) the time from when the VM pauses on the primary (to save the checkpoint) until it resumes (D2). From Jiang’s paper we observe that for memory-intensive workloads running on guest VMs (such as the highSys workload), LLM endures a much longer type I downtime than Remus. This is because these workloads update the guest memory at high frequency, whereas LLM migrates the guest VM image updates (mostly memory) at low frequency and relies on input replay as an auxiliary mechanism. When a failure happens, a significant number of memory updates must therefore be replayed to synchronize the primary and backup hosts, so LLM needs significantly more time for the input replay process before the VM can resume on the backup host and begin handling client requests.
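The two downtime definitions above can be written out as a minimal Python sketch. It assumes all timestamps are seconds on a common clock. The function and parameter names are hypothetical, chosen only for illustration, and are not part of LLM, Remus, or FGBI.

```python
def type1_downtime(t_crash: float, t_resume_on_backup: float) -> float:
    """Type I downtime: D1 = T3 - T1.

    t_crash            -- T1, when the primary host crashes
    t_resume_on_backup -- T3, when the VM has resumed on the backup host
                          and starts handling client requests
    """
    if t_resume_on_backup < t_crash:
        raise ValueError("resume time must not precede crash time")
    return t_resume_on_backup - t_crash


def type2_downtime(t_pause: float, t_resume: float) -> float:
    """Type II downtime: D2, from the VM pausing on the primary
    (to save the checkpoint) until it resumes."""
    if t_resume < t_pause:
        raise ValueError("resume time must not precede pause time")
    return t_resume - t_pause


# Illustrative values only, not measurements from the evaluation.
d1 = type1_downtime(t_crash=10.0, t_resume_on_backup=13.5)
d2 = type2_downtime(t_pause=2.0, t_resume=2.25)
```

In this reading, the input replay time that LLM needs after a failure falls between T1 and T3, so it adds directly to D1, while D2 is paid at every checkpoint.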
    1010
     
    4242
    4343== Downtime Evaluations ==
     44
     45[[BR]]
    4446Type I Downtime. Figures 4a, 4b, 4c, and 4d show the type I downtime com-
    4547parison among FGBI, LLM, and Remus mechanisms under Apache, NPB-EP,