
Evaluation

Downtime

Type I Downtime. Figures 4a, 4b, 4c, and 4d show the type I downtime comparison among the FGBI, LLM, and Remus mechanisms under the Apache, NPB-EP, SPECweb, and SPECsys applications, respectively. The block size used in all experiments is 64 bytes. For Remus and FGBI, the checkpointing period is the time interval between system update migrations, whereas for LLM, it is the interval between network buffer migrations. By configuring the checkpointing frequency of Remus/FGBI and the network buffer frequency of LLM to the same value, we ensure a fair comparison. Figures 4a and 4b show an inverse relationship between FGBI and LLM. Under Apache (Figure 4a), the network load is high but system updates are rare. Therefore, LLM performs better than FGBI, since it migrates the network service requests at a much higher frequency. On the other hand, when running memory-intensive applications (Figures 4b and 4d), which involve high computational loads, LLM endures a much longer downtime than FGBI (even worse than Remus).
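To see why the checkpointing frequency dominates type I downtime, consider a back-of-envelope model: the data dirtied during one checkpointing epoch must be flushed over the migration link before the backup can take over. The Python sketch below illustrates this; the dirty rates, epoch lengths, and the simplified stop-and-copy view are illustrative assumptions, not measurements from our experiments.

{{{#!python
# Back-of-envelope model of type I downtime per checkpoint epoch.
# All rates below are illustrative assumptions, not measured values.

def downtime_s(dirty_mb_per_s, epoch_s, link_mb_per_s):
    """Data dirtied during one epoch must be flushed over the migration
    link while the VM is paused (simplified stop-and-copy view)."""
    return (dirty_mb_per_s * epoch_s) / link_mb_per_s

LINK = 125.0  # a 1 Gbps migration link is roughly 125 MB/s

# Network-intensive load (Apache-like): little memory is dirtied per
# second, so even a long memory epoch accumulates little to flush.
print(downtime_s(5, 1.0, LINK))    # long epoch  (LLM-like memory frequency)
print(downtime_s(5, 0.1, LINK))    # short epoch (FGBI-like frequency)

# Memory-intensive load (NPB-EP-like): heavy dirtying, so a long memory
# epoch accumulates far more dirty data, hence far longer downtime.
print(downtime_s(50, 1.0, LINK))   # long epoch  -> 0.4 s
print(downtime_s(50, 0.1, LINK))   # short epoch -> 0.04 s
}}}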

Although SPECweb is a web workload, it still has a high page modification rate, approximately 12,000 pages/second [4]. In our experiment, the 1 Gbps migration link can transfer approximately 25,000 pages/second. Thus, SPECweb is not a lightweight computational workload for these migration mechanisms. As a result, the relationship between FGBI and LLM in Figure 4c more closely resembles that in Figures 4b and 4d than that in Figure 4a. In conclusion, compared with LLM, FGBI reduces the downtime by as much as 77%. Moreover, compared with Remus, FGBI yields a shorter downtime: by as much as 31% under Apache, 45% under NPB-EP, 39% under SPECweb, and 35% under SPECsys.
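A quick sanity check of these figures, assuming a 4 KB page size (the modification rate and link capacity are the numbers quoted above):

{{{#!python
# Sanity check on the SPECweb figures quoted above. The 4 KB page size is
# an assumption; the dirty rate (12,000 pages/s) and effective link
# capacity (~25,000 pages/s) are the figures from the text.

PAGE = 4 * 1024                 # bytes per page (assumption)
link_bytes = 1e9 / 8            # 1 Gbps -> 125 MB/s raw
raw_pages = link_bytes / PAGE   # ~30,500 pages/s before protocol overhead
effective_pages = 25_000        # figure quoted in the text
dirty_pages = 12_000            # SPECweb page-modification rate [4]

print(f"raw link capacity : {raw_pages:,.0f} pages/s")
print(f"dirty-page load   : {dirty_pages / effective_pages:.0%} of the link")
# Nearly half of the migration link is consumed by dirty pages alone, so
# SPECweb behaves like a memory-intensive workload for these mechanisms.
}}}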

Type II Downtime. Table 1 shows the type II downtime comparison among the Remus, LLM, and FGBI mechanisms under different applications. We have three main observations: (1) The downtime results are very similar for the “idle” run. This is because Remus is a fast checkpointing mechanism and both LLM and FGBI are based on it; since memory updates are rare during an “idle” run, the type II downtime of all three mechanisms is short. (2) When running the NPB-EP application, the guest VM memory is updated at a high frequency. When saving the checkpoint, LLM needs much more time to store the large amount of “dirty” data that accumulates because of its low memory-transfer frequency. Therefore, in this case FGBI achieves a much lower downtime than Remus (a reduction of more than 70%) and LLM (more than 90%). (3) When running the Apache application, memory is updated less heavily than under NPB, but definitely more than in the “idle” run. The downtime results show that FGBI still outperforms both Remus and LLM.

Overhead

Figure 5a shows the overhead during VM migration. The figure compares application runtime with and without migration, under Apache, SPECweb, NPB-EP, and SPECsys, with the fine-grained block size varied among 64, 128, and 256 bytes. We observe that in all cases the overhead is low, no more than 13% (Apache with 64-byte blocks). As we discuss in Section 3, the smaller the block size that FGBI chooses, the greater the memory overhead it introduces. The smallest block size in our experiments is 64 bytes, so it represents the worst-case overhead among the tested sizes. Even in this “worst” case, the overhead under all of these benchmarks is less than 8.21% on average.
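The block-size/overhead trade-off can be seen with a small sketch: FGBI keeps per-block state (e.g., a hash) to detect dirty blocks, so halving the block size doubles the amount of tracking state. The guest memory size and per-block digest size below are illustrative assumptions, not parameters of our implementation.

{{{#!python
# Why smaller blocks cost more memory: per-block tracking state grows
# inversely with block size. Both constants are illustrative assumptions.

GUEST_MEM = 1 << 30    # 1 GB of guest memory (assumption)
DIGEST = 4             # bytes per block, e.g. a 32-bit checksum (assumption)

for block in (64, 128, 256):
    n_blocks = GUEST_MEM // block
    overhead = n_blocks * DIGEST
    print(f"{block:3d}-byte blocks: {n_blocks:>10,} blocks, "
          f"{overhead / GUEST_MEM:.2%} of guest memory for tracking")
# 64-byte blocks need twice the tracking state of 128-byte blocks and four
# times that of 256-byte blocks, consistent with 64 bytes being the worst
# case in Figure 5a.
}}}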

In order to understand the respective contributions of the three proposed techniques (i.e., FGBI, sharing, and compression), Figure 5b shows the breakdown of the performance improvement among them under the NPB-EP benchmark. It compares the downtime of integrated FGBI (which we use for evaluation in this section), FGBI with sharing but no compression support, FGBI with compression but no sharing support, and FGBI with neither sharing nor compression support. As previously discussed, since NPB-EP is a memory-intensive workload, it should present a clear difference among the three techniques, all of which focus on reducing the memory-related overhead. We do not include the downtime of LLM here, since for this compute-intensive benchmark LLM incurs a very long downtime, more than 10 times the downtime that FGBI incurs.

We observe from Figure 5b that if we apply the FGBI mechanism alone, without sharing or compression support, the downtime is reduced compared with that of Remus in Figure 4b, but not significantly (by no more than 20%). However, after integrating hybrid compression, FGBI further reduces the downtime by as much as 22% compared with FGBI alone. We obtain a similar benefit from adding the sharing support (a further 26% downtime reduction). Integrating both sharing and compression reduces the downtime by as much as 33%, compared to FGBI with neither.
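For illustration, the sketch below captures the general idea behind the three techniques evaluated in Figure 5b: hash-based identification of dirty fine-grained blocks, sharing (skipping duplicate dirty blocks), and compression of the remaining payload (plain zlib here, standing in for the hybrid scheme). All names and details are ours; the actual FGBI implementation operates inside the hypervisor.

{{{#!python
# Minimal sketch of the three techniques from Figure 5b. Illustrative
# only: the real implementation works on live guest memory in the VMM.

import hashlib
import zlib

BLOCK = 64  # fine-grained block size in bytes

def blocks(mem: bytes):
    return [mem[i:i + BLOCK] for i in range(0, len(mem), BLOCK)]

def checkpoint(prev: bytes, cur: bytes) -> bytes:
    """Return the (compressed) payload one checkpoint epoch must migrate."""
    dirty, seen = [], set()
    for old, new in zip(blocks(prev), blocks(cur)):
        h = hashlib.sha1(new).digest()
        if h == hashlib.sha1(old).digest():
            continue                       # FGBI: block unchanged, skip it
        if h in seen:
            continue                       # sharing: duplicate dirty block
        seen.add(h)
        dirty.append(new)
    return zlib.compress(b"".join(dirty))  # compression: shrink the payload

prev = bytes(4096)
cur = bytearray(prev)
cur[0:64] = b"x" * 64                      # dirty one block
cur[512:576] = b"x" * 64                   # ...and an identical duplicate
payload = checkpoint(prev, bytes(cur))
print(len(payload), "bytes to migrate instead of", len(cur))
}}}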