Changes between Version 26 and Version 27 of FGBI
- Timestamp: 10/06/11 01:28:22
FGBI
== Downtime Problem in [wiki:LLM LLM] ==
[[Image(figure1.jpg)]]

Downtime is the primary factor in estimating the high availability of a system, since any long downtime experienced by clients may result in a loss of client loyalty and thus of revenue. Under the Primary-Backup model (Figure 1), there are two types of downtime: I) the time from when the primary host crashes until the VM resumes from the last checkpointed state on the backup host and starts to handle client requests (D1 = T3 - T1); and II) the time from when the VM pauses on the primary (to save the checkpoint) until it resumes (D2). From Jiang's paper we observe that for memory-intensive workloads running on guest VMs (such as the highSys workload), LLM endures much longer type I downtime than Remus. This is because these workloads update the guest memory at high frequency, while LLM migrates the guest VM image updates (mostly from memory) at low frequency and uses input replay as an auxiliary. In this case, when a failure happens, a significant number of memory updates are needed to ensure synchronization between the primary and backup hosts. Therefore, the input replay process needs significantly more time before the VM can resume on the backup host and begin handling client requests.

…

== Downtime Evaluations ==
[[Image(figure2.jpg)]]

Figures 2a, 2b, 2c, and 2d show the type I downtime comparison among the FGBI, LLM, and Remus mechanisms under the Apache, NPB-EP, SPECweb, and SPECsys applications, respectively. The block size used in all

…

same value for the checkpointing frequency of Remus/FGBI and the network buffer frequency of LLM, we ensure the fairness of the comparison.
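The two downtime definitions above map directly onto event timestamps. A minimal sketch, assuming a hypothetical event log; the T1/T3 names follow Figure 1, and all concrete values are illustrative:

```python
# Minimal sketch of the two downtime types defined above.
# Timestamps are hypothetical; T1/T3 names follow Figure 1.

def type1_downtime(t_crash, t_resume_backup):
    """Type I: the primary crashes at T1 and the VM resumes on the
    backup at T3, so D1 = T3 - T1. For LLM this span also includes
    replaying buffered inputs before requests can be served."""
    return t_resume_backup - t_crash

def type2_downtime(pause_resume_pairs):
    """Type II: cumulative time the VM is paused on the primary
    while each checkpoint's state is saved (D2)."""
    return sum(resume - pause for pause, resume in pause_resume_pairs)

# Illustrative timestamps (seconds):
d1 = type1_downtime(t_crash=10.0, t_resume_backup=12.5)   # D1 = 2.5 s
d2 = type2_downtime([(1.0, 1.05), (2.0, 2.04)])           # D2 ≈ 0.09 s
```

This is also why the two types behave differently below: type II is paid on every checkpoint even without failures, while type I is paid only on failover.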
We observe that Figures 2a and 2b show a reverse relationship between FGBI and LLM. Under Apache (Figure 2a), the network load is high but system updates are rare. Therefore, LLM performs better than FGBI, since it uses a much higher frequency to migrate the network service requests. On the other hand, when running memory-intensive applications (Figures 2b and 2d), which involve high computational loads, LLM endures a much longer downtime than FGBI (even worse than Remus).

…

pages/second. Thus, SPECweb is not a lightweight computational workload for these migration mechanisms. As a result, the relationship between FGBI and LLM in Figure 2c is more similar to that in Figure 2b (and also Figure 2d) than to that in Figure 2a. In conclusion, compared with LLM, FGBI reduces the downtime by as much as 77%. Moreover, compared with Remus, FGBI yields a shorter downtime: by as much as 31% under Apache, 45% under NPB-EP, 39% under SPECweb, and 35% under SPECsys.

[[Image(table1.jpg)]]

Table 1 shows the type II downtime comparison among the Remus, LLM, and FGBI mechanisms under different applications. We have three main observations: (1) their downtime results are very similar for the "idle" run.

…

== Overhead ==
[[Image(figure3.jpg)]]

Figure 3a shows the overhead during VM migration.
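FGBI's downtime advantage on memory-intensive workloads comes from tracking dirty state at sub-page block granularity, so only the modified blocks of a dirty page need to be transferred at each checkpoint. A minimal sketch of that idea, assuming a 128-byte block size and MD5 block hashes; both are illustrative choices, not the exact mechanism evaluated here:

```python
import hashlib

BLOCK_SIZE = 128  # bytes; illustrative, the evaluation varies block size

def block_hashes(page: bytes):
    """Hash each fixed-size block of a memory page."""
    return [hashlib.md5(page[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(page), BLOCK_SIZE)]

def dirty_blocks(old_hashes, new_page: bytes):
    """Return (offset, data) for blocks whose hash changed since the
    last checkpoint -- only these need to be sent to the backup."""
    new_hashes = block_hashes(new_page)
    return [(i * BLOCK_SIZE, new_page[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
            for i, (old, new) in enumerate(zip(old_hashes, new_hashes))
            if old != new]

page = bytearray(4096)                 # one 4 KB guest page
baseline = block_hashes(bytes(page))
page[200] = 0xFF                       # guest touches a single byte
changed = dirty_blocks(baseline, bytes(page))
# only the one 128-byte block containing offset 200 is dirty,
# instead of the whole 4 KB page
```

With whole-page tracking (as in Remus), the single-byte write above would cost a full 4 KB transfer; block granularity cuts that to one block, at the price of hashing overhead.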
The figure compares the applications' runtime with and without migration, under Apache, SPECweb, NPB-EP, and SPECsys, with the size of the fine-grained blocks varying from 64

…

In order to understand the respective contributions of the three proposed techniques (i.e., FGBI, sharing, and compression), Figure 3b shows the breakdown of the performance improvement among them under the NPB-EP benchmark. It compares the downtime between integrated FGBI (which we use for

…

than 10 times the downtime that FGBI incurs.

We observe from Figure 3b that if we just apply the FGBI mechanism without integrating sharing or compression support, the downtime is reduced compared with that of Remus in Figure 2b, but it is not significant (the reduction is no more