Traditional Xen-based systems track memory updates by keeping evidence of the dirty pages at each migration epoch. For example, Remus uses the same page size as Xen (4KB on x86), which is also the granularity for detecting memory changes. [wiki:FGBI FGBI] (Fine-Grained Block Identification) is a mechanism that uses memory blocks smaller than a page as the granularity for detecting memory changes. [wiki:FGBI FGBI] calculates a hash value for each memory block at the beginning of each migration epoch. At the end of each epoch, instead of transferring whole dirty pages, [wiki:FGBI FGBI] computes a new hash value for each block and compares it with the corresponding old value. A block has been modified only if its hash values do not match, so [wiki:FGBI FGBI] marks such blocks as dirty and replaces the old hash values with the new ones. Afterwards, [wiki:FGBI FGBI] transfers only the dirty blocks to the backup host.
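As a rough illustration of this per-block scheme, the sketch below hashes each sub-page block at the start of an epoch and, at the end of the epoch, marks as dirty only the blocks whose hashes changed. The block size, the FNV-1a hash, and every function name here are assumptions made for the example, not details taken from the FGBI implementation.

{{{#!c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE  4096
#define BLOCK_SIZE 256                        /* hypothetical sub-page block size */
#define NBLOCKS    (PAGE_SIZE / BLOCK_SIZE)

/* FNV-1a over one block; any fast hash with a low collision rate would do. */
static uint64_t block_hash(const uint8_t *block)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (int i = 0; i < BLOCK_SIZE; i++) {
        h ^= block[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Beginning of an epoch: record one hash per block of the page. */
static void snapshot_page(const uint8_t *page, uint64_t hashes[NBLOCKS])
{
    for (int b = 0; b < NBLOCKS; b++)
        hashes[b] = block_hash(page + b * BLOCK_SIZE);
}

/* End of the epoch: recompute each hash, mark mismatching blocks dirty,
 * and replace the stored hash so the next epoch compares against it.
 * Only blocks with dirty[b] == true would be sent to the backup host. */
static int find_dirty_blocks(const uint8_t *page, uint64_t hashes[NBLOCKS],
                             bool dirty[NBLOCKS])
{
    int ndirty = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        uint64_t h = block_hash(page + b * BLOCK_SIZE);
        dirty[b] = (h != hashes[b]);
        if (dirty[b]) {
            hashes[b] = h;
            ndirty++;
        }
    }
    return ndirty;
}

int main(void)
{
    uint8_t  page[PAGE_SIZE] = {0};
    uint64_t hashes[NBLOCKS];
    bool     dirty[NBLOCKS];

    snapshot_page(page, hashes);   /* start of the migration epoch */
    page[10] = 0xAB;               /* a guest write that touches one block */

    int n = find_dirty_blocks(page, hashes, dirty);
    printf("%d of %d blocks dirty: transfer %d bytes instead of %d\n",
           n, NBLOCKS, n * BLOCK_SIZE, PAGE_SIZE);
    return 0;
}
}}}

Per the description above, this comparison would only need to run over the pages that Remus's existing dirty-page tracking has already flagged, rather than over all of guest memory.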
Regarding the type II downtime, there are several migration epochs between two checkpoints, and the newly updated memory data is copied to the backup host at each epoch. At the last epoch, the VM running on the primary host is suspended and the remaining memory state is transferred to the backup host. Thus, the type II downtime depends on the amount of memory that remains to be copied and transferred when the VM on the primary host is paused. If we reduce the dirty data that needs to be transferred at the last epoch, we reduce the type II downtime. Moreover, if we reduce the dirty data transferred at every epoch, keeping the memory states of the primary and backup hosts synchronized throughout, then little new memory update remains to be transferred at the last epoch, so we can reduce the type I downtime as well.
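The arithmetic behind this argument can be sketched with made-up numbers. Everything below (per-epoch dirty-page counts, a 4KB page, the link speed) is an illustrative assumption, not a measurement; the point is only that the final transfer happens while the VM is suspended, so its size is what determines the type II downtime.

{{{#!c
#include <stdio.h>

#define LINK_MBPS 1000.0                     /* assumed migration link speed */
#define PAGE_KB   4.0

/* time (ms) to push a given number of kilobytes over the assumed link */
static double transfer_ms(double kilobytes)
{
    return kilobytes * 8.0 / LINK_MBPS;
}

int main(void)
{
    /* dirty pages produced in each migration epoch between two checkpoints;
     * the last entry is what is still dirty when the VM gets suspended */
    int dirty_pages[] = { 900, 700, 650, 120 };
    int epochs = sizeof dirty_pages / sizeof dirty_pages[0];

    for (int e = 0; e < epochs - 1; e++)      /* VM keeps running here */
        printf("epoch %d: copy %d pages in the background (%.1f ms)\n",
               e, dirty_pages[e], transfer_ms(dirty_pages[e] * PAGE_KB));

    /* final epoch: the VM is suspended, so this transfer is type II downtime */
    printf("type II downtime ~= %.1f ms for the last %d pages\n",
           transfer_ms(dirty_pages[epochs - 1] * PAGE_KB),
           dirty_pages[epochs - 1]);
    return 0;
}
}}}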
We propose the FGBI mechanism, which uses memory blocks (smaller than page sizes) as the granularity for detecting memory changes. FGBI calculates the hash value for each memory block at the beginning of each migration epoch. Then it uses the same mechanism as Remus to detect dirty pages. However, at the end of each epoch, instead of transferring the whole dirty page, FGBI computes new hash values for each block and compares them with the corresponding old values. A block has been modified only if its hash values do not match. Therefore, FGBI marks such blocks as dirty and replaces the old hash values with the new ones. Afterwards, FGBI only transfers dirty blocks to the backup host.

However, because of using block granularity, FGBI introduces new overhead. To accurately approximate the true dirty region, the block size must be set as small as possible. For example, to obtain the highest accuracy, the best block size is one bit; that is impractical because it requires storing an additional bit for each bit in memory, which means doubling the main memory. Thus, a smaller block size leads to a greater number of blocks and also requires more memory for storing the hash values. Based on these past efforts illustrating the memory saving potential, we present two supporting techniques: block sharing and hybrid compression.
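To see why the block size cannot be shrunk arbitrarily, the sketch below tabulates the hash-storage overhead for a few candidate block sizes, assuming a 1 GiB guest and one 64-bit hash per block; both parameters are illustrative assumptions rather than FGBI's actual configuration.

{{{#!c
#include <stdio.h>

int main(void)
{
    const double mem_bytes  = 1024.0 * 1024.0 * 1024.0;  /* assumed 1 GiB of guest memory */
    const double hash_bytes = 8.0;                        /* one 64-bit hash per block */
    const int    block_sizes[] = { 4096, 1024, 256, 64 };
    const int    n = sizeof block_sizes / sizeof block_sizes[0];

    for (int i = 0; i < n; i++) {
        double nblocks  = mem_bytes / block_sizes[i];
        double overhead = nblocks * hash_bytes;
        printf("block = %4d B -> %10.0f blocks, %6.1f MiB of hashes (%.2f%% of RAM)\n",
               block_sizes[i], nblocks, overhead / (1024.0 * 1024.0),
               100.0 * overhead / mem_bytes);
    }
    return 0;
}
}}}

The growth in the last column is the kind of memory overhead that the block sharing and hybrid compression techniques are intended to offset.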