19 | 19 | For example, in Figure 3, an error happens at time t0 and causes the system to fail at time t1. Since most error latency is small, in most cases, t1 - t0 < Te. In the Case A, the latest checkpoint is chp1, and the system needs to roll back to the state S1 by resuming from the checkpoint chp1. However, in the Case B, an error happens at time t2, and then a new checkpoint chp3 is saved. After the system moves to the state S3, this error causes the system to fail at time t3. Here, we assume that t3 - t2 < Te. But, if we choose chp3 as the latest correct checkpoint and roll the system back to state S3, after resuming, the system will fail again. We can see that, in this case, the latest checkpoint should be chp2, and when the system crashes, we should roll it back to state S2, by resuming from the checkpoint chp2. |