Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication
Keywords: deduplication; fragmentation; restore; caching; offline caching
Abstract: Slow restoration due to chunk fragmentation is a serious problem facing inline chunk-based data deduplication systems: restore speeds for the most recent backup can drop by orders of magnitude over the lifetime of a system. We study three techniques for alleviating this problem: increasing cache size, container capping, and using a forward assembly area. Container capping is an ingest-time operation that reduces chunk fragmentation at the cost of forfeiting some deduplication. Using a forward assembly area is a new restore-time caching and prefetching technique; it exploits the perfect knowledge of future chunk accesses that is available when restoring a backup to reduce the amount of RAM required for a given level of caching at restore time. We show that using a larger cache per stream can produce up to a 5-16X improvement (we see continuing benefits even up to 8 GB), that giving up as little as 8% deduplication with capping can yield a 2-6X improvement, and that using a forward assembly area is strictly superior to LRU, yielding a 2-4X improvement while holding the RAM budget constant.
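The abstract does not spell out how a forward assembly area works, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes a hypothetical recipe format of (chunk_id, container_id, length) tuples listed in restore order, and models containers as in-memory dictionaries, with each dictionary lookup standing in for one container read from disk. Because the recipe is known in advance, the restorer can plan a fixed-size window of output, read each container that the window needs exactly once, and copy every chunk from that container into its final position.

```python
from typing import Dict, List, Tuple

# Hypothetical types, introduced for illustration only.
Recipe = List[Tuple[str, int, int]]        # (chunk_id, container_id, length)
Containers = Dict[int, Dict[str, bytes]]   # container_id -> {chunk_id: data}


def restore_with_assembly_area(
    recipe: Recipe, containers: Containers, area_bytes: int
) -> Tuple[bytes, int]:
    """Restore a backup; return (restored data, number of container reads)."""
    output = bytearray()
    reads = 0
    i = 0
    while i < len(recipe):
        # Plan the next window: take recipe entries until the area is full.
        # A chunk larger than the area still gets a window of its own.
        window = []
        used = 0
        while i < len(recipe):
            chunk_id, cid, length = recipe[i]
            if window and used + length > area_bytes:
                break
            window.append((chunk_id, cid, used, length))
            used += length
            i += 1

        # Group the window's slots by the container that can fill them.
        needed: Dict[int, List[Tuple[str, int, int]]] = {}
        for chunk_id, cid, offset, length in window:
            needed.setdefault(cid, []).append((chunk_id, offset, length))

        # Read each needed container once and fill every slot it serves.
        area = bytearray(used)
        for cid, slots in needed.items():
            container = containers[cid]   # stands in for one disk read
            reads += 1
            for chunk_id, offset, length in slots:
                area[offset:offset + length] = container[chunk_id]

        output += area                    # flush the assembled window
    return bytes(output), reads


# Toy data: chunk "a" is referenced twice, and container 0 is needed
# both before and after container 1.
example_containers: Containers = {
    0: {"a": b"AAAA", "b": b"BBBB"},
    1: {"c": b"CCCC"},
}
example_recipe: Recipe = [("a", 0, 4), ("c", 1, 4), ("b", 0, 4), ("a", 0, 4)]
```

In this toy model, a 16-byte area covers the whole recipe in one window, so each of the two containers is read once. Shrinking the area to 8 bytes splits the recipe into two windows and container 0 is read a second time, which illustrates how the RAM budget trades off against container reads.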