Changes between Version 33 and Version 34 of FGBI


Timestamp: 10/06/11 02:04:11 (13 years ago)
Author: lvpeng

Figure 1. Primary-Backup model and the downtime problem.

Downtime is the primary factor for estimating the high availability of a system, since any long downtime experienced by clients may result in loss of client loyalty and thus revenue loss. Under the Primary-Backup model (Figure 1), there are two types of downtime: I) the time from when the primary host crashes until the VM resumes from the last checkpointed state on the backup host and starts to handle client requests (D1 = T3 - T1); II) the time from when the VM pauses on the primary (to save the checkpoint) until it resumes (D2). From Jiang’s paper we observe that for memory-intensive workloads running on guest VMs (such as the highSys workload), [wiki:LLM LLM] endures a much longer type I downtime than [http://nss.cs.ubc.ca/remus/ Remus]. This is because these workloads update the guest memory at high frequency. [wiki:LLM LLM], on the other hand, migrates the guest VM image update (mostly from memory) at low frequency but uses input replay as an auxiliary. In this case, when a failure happens, a significant number of memory updates are needed to ensure synchronization between the primary and backup hosts, so the input replay process needs significantly more time to resume the VM on the backup host and begin handling client requests.

Regarding the type II downtime, there are several migration epochs between two checkpoints, and the newly updated memory data is copied to the backup host at each epoch. At the last epoch, the VM running on the primary host is suspended and the remaining memory state is transferred to the backup host. Thus, the type II downtime depends on the amount of memory that remains to be copied and transferred when the VM is paused on the primary host. If we reduce the dirty data that needs to be transferred at the last epoch, we reduce the type II downtime. Moreover, if we reduce the dirty data that needs to be transferred at each epoch, keeping the memory state synchronized between the primary and backup hosts all the time, then little new memory update remains to be transferred at the last epoch, so we reduce the type I downtime as well.
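
To make the epoch structure concrete, the sketch below is a minimal, simplified epoch loop in Python (an illustration only, not the actual Xen/Remus implementation; the guest memory model, dirty tracking, and the send_to_backup stand-in are all hypothetical). Dirty pages are copied to the backup at every epoch, and at the last epoch the VM is paused while the remaining dirty pages are flushed, which is exactly the type II downtime window (D2).

{{{
#!python
# Minimal sketch of epoch-based checkpointing (illustration only, not Remus/Xen code).
# Guest memory is modeled as a dict of page_id -> bytes; dirty tracking is simulated.
import os
import random
import time

memory = {pid: bytes(4096) for pid in range(8)}   # 8 guest pages, 4 KB each
dirty = set()                                     # pages written since the last copy

def guest_runs():
    """Simulate the guest writing a few pages during one epoch."""
    for pid in random.sample(sorted(memory), 3):
        memory[pid] = os.urandom(4096)
        dirty.add(pid)

def send_to_backup(pid, data):
    pass                                          # stand-in for the migration link

def flush_dirty():
    """Copy the newly updated pages to the backup host."""
    for pid in sorted(dirty):
        send_to_backup(pid, memory[pid])
    dirty.clear()

def checkpoint(num_epochs):
    for _ in range(num_epochs - 1):               # epochs between two checkpoints
        guest_runs()                              # the guest keeps running ...
        flush_dirty()                             # ... while dirty pages are copied

    # Last epoch: suspend the VM, transfer the remaining state, then resume.
    guest_runs()
    pause_start = time.time()                     # VM paused here
    flush_dirty()                                 # the less left here, the shorter D2
    return time.time() - pause_start              # type II downtime (D2)

print("D2 (seconds):", checkpoint(num_epochs=4))
}}}
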
== [wiki:FGBI FGBI] Design ==
Therefore, in order to achieve HA in these virtualized systems, especially to address the downtime problem under memory-intensive workloads, we propose a memory synchronization technique for tracking memory updates, called Fine-Grained Block Identification (or [wiki:FGBI FGBI]). [http://nss.cs.ubc.ca/remus/ Remus] and [wiki:LLM LLM] track memory updates by keeping evidence of the dirty pages at each migration epoch. [http://nss.cs.ubc.ca/remus/ Remus] uses the same page size as Xen (for x86, this is 4KB), which is also the granularity for detecting memory changes. However, this mechanism is not efficient. For instance, no matter what changes an application
[...]
blocks.

We propose the [wiki:FGBI FGBI] mechanism, which uses memory blocks (smaller than page sizes) as the granularity for detecting memory changes. [wiki:FGBI FGBI] calculates the hash value for each memory block at the beginning of each migration epoch. It then uses the same mechanism as [http://nss.cs.ubc.ca/remus/ Remus] to detect dirty pages. However, at the end of each epoch, instead of transferring the whole dirty page, [wiki:FGBI FGBI] computes new hash values for each block and compares them with the corresponding old values. A block is considered modified only if its new hash value does not match the old one. [wiki:FGBI FGBI] marks such blocks as dirty and replaces the old hash values with the new ones. Afterwards, [wiki:FGBI FGBI] transfers only the dirty blocks to the backup host.
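
The following is a minimal sketch of this block-level dirty detection, assuming a simple model in which a page is a bytes object split into fixed-size blocks hashed with MD5; the helper names and the choice of hash function are illustrative assumptions, not the actual [wiki:FGBI FGBI] implementation.

{{{
#!python
# Sketch of block-level dirty detection (illustrative; not the actual FGBI code).
import hashlib

BLOCK_SIZE = 64  # bytes; the block granularity used in the evaluation below

def block_hashes(page):
    """Hash every fixed-size block of a page (done at the start of an epoch)."""
    return [hashlib.md5(page[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(page), BLOCK_SIZE)]

def dirty_blocks(page, old_hashes):
    """At the end of the epoch, return the (offset, data) pairs whose hash changed,
    plus the new hash list that replaces the old one for the next epoch."""
    new_hashes = block_hashes(page)
    changed = [(i * BLOCK_SIZE, page[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
               for i, (old, new) in enumerate(zip(old_hashes, new_hashes))
               if old != new]
    return changed, new_hashes

# Usage: only the modified 64-byte blocks of a dirty 4 KB page are transferred.
page = bytearray(4096)                  # a 4 KB page, as in Xen on x86
hashes = block_hashes(bytes(page))      # taken at the beginning of the epoch
page[100:104] = b"ABCD"                 # the guest modifies a few bytes
to_send, hashes = dirty_blocks(bytes(page), hashes)
print(len(to_send), "dirty block(s) out of", 4096 // BLOCK_SIZE)  # -> 1 out of 64
}}}

In this example only one 64-byte block is sent instead of the whole 4 KB page, which is the saving that motivates the finer granularity.
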
However, because of using block granularity, [wiki:FGBI FGBI] introduces new overhead. If we want to accurately approximate the true dirty region, we need to set the block size as small as possible. For example, to obtain the highest accuracy,
[...]

Figures 2a, 2b, 2c, and 2d show the type I downtime comparison among the [wiki:FGBI FGBI], [wiki:LLM LLM], and [http://nss.cs.ubc.ca/remus/ Remus] mechanisms under the Apache, NPB-EP, SPECweb, and SPECsys applications, respectively. The block size used in all experiments is 64 bytes. For [http://nss.cs.ubc.ca/remus/ Remus] and [wiki:FGBI FGBI], the checkpointing period is the time interval of system update migration, whereas for [wiki:LLM LLM], the checkpointing period represents the interval of network buffer migration. By configuring the same value for the checkpointing frequency of [http://nss.cs.ubc.ca/remus/ Remus]/[wiki:FGBI FGBI] and the network buffer frequency of [wiki:LLM LLM], we ensure the fairness of the comparison. We observe that Figures 2a and 2b show an inverse relationship between [wiki:FGBI FGBI] and [wiki:LLM LLM]. Under Apache (Figure 2a), the network load is high but system updates are rare. Therefore, [wiki:LLM LLM] performs better than [wiki:FGBI FGBI], since it uses a much higher frequency to migrate the network service requests. On the other hand, when running memory-intensive applications (Figures 2b and 2d), which involve high computational loads, [wiki:LLM LLM] endures a much longer downtime than [wiki:FGBI FGBI] (even worse than [http://nss.cs.ubc.ca/remus/ Remus]).

Although SPECweb is a web workload, it still has a high page modifi-
[...]
ment, the 1 Gbps migration link is capable of transferring approximately 25,000 pages/second. Thus, SPECweb is not a lightweight computational workload for these migration mechanisms. As a result, the relationship between [wiki:FGBI FGBI] and [wiki:LLM LLM] in Figure 2c is more similar to that in Figure 2b (and also Figure 2d) than to Figure 2a. In conclusion, compared with [wiki:LLM LLM], [wiki:FGBI FGBI] reduces the downtime by as much as 77%. Moreover, compared with [http://nss.cs.ubc.ca/remus/ Remus], [wiki:FGBI FGBI] yields a shorter downtime, by as much as 31% under Apache, 45% under NPB-EP, 39% under SPECweb, and 35% under SPECsys.

[...]

Table 1 shows the type II downtime comparison among the [http://nss.cs.ubc.ca/remus/ Remus], [wiki:LLM LLM], and [wiki:FGBI FGBI] mechanisms under different applications. We make three main observations: (1) The downtime results are very similar for the "idle" run. This is because [http://nss.cs.ubc.ca/remus/ Remus] is a fast checkpointing mechanism and both [wiki:LLM LLM] and [wiki:FGBI FGBI] are based on it. There are few memory updates during the "idle" run, so the type II downtime of all three mechanisms is short. (2) When running the NPB-EP application, the guest VM memory is updated at high frequency. When saving the checkpoint, [wiki:LLM LLM] takes much more time to save the large amount of "dirty" data caused by its low memory transfer frequency. Therefore, in this case [wiki:FGBI FGBI] achieves a much lower downtime than [http://nss.cs.ubc.ca/remus/ Remus] (a reduction of more than 70%) and [wiki:LLM LLM] (more than 90%). (3) When running the Apache application, the memory update is not as heavy as when running NPB, but it is definitely heavier than during the "idle" run. The downtime results show that [wiki:FGBI FGBI] still outperforms both [http://nss.cs.ubc.ca/remus/ Remus] and [wiki:LLM LLM].

[...]
bytes to 128 bytes and 256 bytes. We observe that in all cases the overhead is low, no more than 13% (Apache with a 64-byte block). As we discuss in Section 3, the smaller the block size that [wiki:FGBI FGBI] chooses, the greater the memory overhead it introduces. In our experiments, the smallest block size that we chose is 64 bytes, so this is the worst-case overhead compared with the other block sizes.
[...]
In order to understand the respective contributions of the three proposed techniques (i.e., [wiki:FGBI FGBI], sharing, and compression), Figure 3b shows the breakdown of the performance improvement among them under the NPB-EP benchmark. It compares the downtime of integrated [wiki:FGBI FGBI] (which we use for evaluation in this section), [wiki:FGBI FGBI] with sharing but no compression support, [wiki:FGBI FGBI] with compression but no sharing support, and [wiki:FGBI FGBI] with neither sharing nor compression support. As previously discussed, since NPB-EP is a memory-intensive workload, it should present a clear difference among the three techniques, all of which focus on reducing the memory-related overhead. We do not include the downtime of [wiki:LLM LLM] here, since for this compute-intensive benchmark, [wiki:LLM LLM] incurs a very long downtime, which is more than 10 times the downtime that [wiki:FGBI FGBI] incurs.

We observe from Figure 3b that if we just apply the [wiki:FGBI FGBI] mechanism without integrating sharing or compression support, the downtime is reduced compared with that of [http://nss.cs.ubc.ca/remus/ Remus] in Figure 3b, but not significantly (the reduction is no more than twenty percent). However, compared with [wiki:FGBI FGBI] with no support, after integrating hybrid compression, [wiki:FGBI FGBI] further reduces the downtime, by as much as 22%. We also obtain a similar benefit after adding the sharing support (the downtime reduction is a further 26%). If we integrate both sharing and compression support, the downtime is reduced by as much as 33%, compared to [wiki:FGBI FGBI] without sharing or compression support.
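
As a rough illustration of the compression idea only (a generic zlib round trip, not the actual hybrid compression scheme evaluated above), a dirty block can be compressed on the primary host before it crosses the migration link and decompressed on the backup host before being applied:

{{{
#!python
# Generic compression round trip for a dirty block (illustrative sketch only).
import zlib

def compress_block(block):
    """Compress a dirty block before transferring it to the backup host."""
    return zlib.compress(block, 1)      # fast compression level keeps epochs short

def decompress_block(payload):
    """Decompress the block on the backup host before applying it."""
    return zlib.decompress(payload)

# Usage: a mostly-zero 64-byte block shrinks, so less data crosses the link.
block = bytes(60) + b"ABCD"
wire = compress_block(block)
assert decompress_block(wire) == block
print(len(block), "->", len(wire), "bytes on the wire")
}}}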