== Downtime Evaluations ==
Type I Downtime. Figures 4a, 4b, 4c, and 4d show the type I downtime comparison among the FGBI, LLM, and Remus mechanisms under the Apache, NPB-EP, SPECweb, and SPECsys applications, respectively. The block size used in all experiments is 64 bytes. For Remus and FGBI, the checkpointing period is the time interval of system update migration, whereas for LLM, the checkpointing period represents the interval of network buffer migration. By configuring the same value for the checkpointing frequency of Remus/FGBI and the network buffer frequency of LLM, we ensure a fair comparison. Figures 4a and 4b show a reverse relationship between FGBI and LLM. Under Apache (Figure 4a), the network load is high but system updates are rare. Therefore, LLM performs better than FGBI, since it migrates the buffered network service requests at a much higher frequency. On the other hand, when running memory-intensive applications (Figures 4b and 4d), which involve high computational loads, LLM endures a much longer downtime than FGBI (even worse than Remus).
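This reversal can be captured with a simple back-of-the-envelope model: the downtime contributed by memory state is roughly the number of pages dirtied during one checkpointing period divided by the migration link's page rate. The sketch below uses the link rate quoted later in this section; the dirty rates and interval lengths are illustrative assumptions, not measured values.

```python
# Back-of-the-envelope model of type I downtime (illustrative only).
# Assumption: downtime is dominated by flushing the pages dirtied since
# the last memory checkpoint; dirty rates and intervals are hypothetical.

LINK_PAGES_PER_SEC = 25_000   # ~1 Gbps migration link (rate from the text)

def type1_downtime(dirty_rate_pps: float, ckpt_interval_s: float) -> float:
    """Seconds needed to flush the pages dirtied in one checkpoint interval."""
    dirty_pages = dirty_rate_pps * ckpt_interval_s
    return dirty_pages / LINK_PAGES_PER_SEC

# Web-style load: few memory updates, so even LLM's long memory-transfer
# interval stays cheap.
print(type1_downtime(dirty_rate_pps=500, ckpt_interval_s=1.0))     # 0.02 s

# Memory-intensive load: the same long interval accumulates a large
# dirty set, so downtime balloons (LLM's case in Figures 4b and 4d).
print(type1_downtime(dirty_rate_pps=12_000, ckpt_interval_s=1.0))  # 0.48 s

# A short interval (Remus/FGBI) keeps the dirty set, and the downtime, small.
print(type1_downtime(dirty_rate_pps=12_000, ckpt_interval_s=0.05)) # 0.024 s
```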
Although SPECweb is a web workload, it still has a high page modification rate, approximately 12,000 pages/second [4]. In our experiment, the 1 Gbps migration link is capable of transferring approximately 25,000 pages/second. Thus, SPECweb is not a lightweight computational workload for these migration mechanisms.
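As a sanity check on the 25,000 pages/second figure: assuming 4 KB pages (an assumption on our part; the section does not state the page size), a 1 Gbps link has a raw capacity of about 30,500 pages/second, so the reported effective rate corresponds to roughly 82% link utilization, and SPECweb's modification rate alone consumes nearly half of it.

```python
# Sanity check of the quoted 25,000 pages/second effective link rate.
# Assumes 4 KB pages; the page size is our assumption, not stated here.

LINK_BPS = 1_000_000_000   # 1 Gbps migration link
PAGE_BYTES = 4096          # assumed page size

raw_pages_per_sec = LINK_BPS / 8 / PAGE_BYTES
print(f"raw capacity   : {raw_pages_per_sec:,.0f} pages/s")      # ~30,518

effective = 25_000  # effective rate reported in the text
print(f"utilization    : {effective / raw_pages_per_sec:.0%}")   # ~82%

# SPECweb's ~12,000 pages/s modification rate alone is nearly half of
# the effective capacity -- hardly a lightweight workload.
print(f"dirty-rate load: {12_000 / effective:.0%}")               # 48%
```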
As a result, the relationship between FGBI and LLM in Figure 4c is more similar to that in Figures 4b and 4d than to that in Figure 4a. In conclusion, compared with LLM, FGBI reduces the downtime by as much as 77%. Moreover, compared with Remus, FGBI yields a shorter downtime: by as much as 31% under Apache, 45% under NPB-EP, 39% under SPECweb, and 35% under SPECsys.
Type II Downtime. Table 1 shows the type II downtime comparison among the Remus, LLM, and FGBI mechanisms under different applications. We make three main observations: (1) The downtime results are very similar for the "idle" run. This is because Remus is a fast checkpointing mechanism and both LLM and FGBI are based on it. Memory updates are rare during the "idle" run, so the type II downtime of all three mechanisms is short. (2) When running the NPB-EP application, the guest VM memory is updated at high frequency. When the checkpoint is saved, LLM takes much more time to save the huge "dirty" data set that accumulates because of its low memory transfer frequency. Therefore, in this case, FGBI achieves a much lower downtime than Remus (a reduction of more than 70%) and LLM (more than 90%); a rough sketch of the block-granularity effect follows this paragraph. (3) When running the Apache application, memory is not updated as heavily as under NPB-EP, but definitely more than during the "idle" run. The downtime results show that FGBI still outperforms both Remus and LLM.
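Why fine-grained tracking helps here: with page-granularity dirty tracking (as in Remus and LLM), a single small write marks an entire 4 KB page dirty, whereas FGBI's 64-byte blocks confine the checkpoint payload to the regions actually touched. The sketch below is illustrative only; the scattered-write pattern and page count are our assumptions, not measurements from the paper.

```python
# Illustrative checkpoint payloads: page-granularity dirty tracking
# (Remus/LLM) vs. FGBI-style 64-byte block tracking. The scattered
# small-write pattern and page count below are hypothetical.

PAGE = 4096   # bytes per page (assumed)
BLOCK = 64    # FGBI block size used in the experiments

touched_pages = 10_000  # pages that each receive one small (<64 B) write

page_payload = touched_pages * PAGE    # whole pages must be copied
block_payload = touched_pages * BLOCK  # only the dirty blocks are copied

print(f"page granularity: {page_payload / 2**20:5.1f} MiB")   # 39.1 MiB
print(f"64-byte blocks  : {block_payload / 2**20:5.1f} MiB")  #  0.6 MiB
print(f"payload reduced : {1 - BLOCK / PAGE:.0%}")            # 98%
```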