关键词:路由器性能;交叉数据;节点
摘 要:High capacity, high throughput, chunk-based inline deduplication systems for backup have been commercially successful, but scaling them out has proved challenging. In such multi-node systems, the data needs to be routed at a large enough granularity to sustain locality at the back ends. Two routing algorithms, Min Hash and Auction, have been put forth for this purpose. We demonstrate that these algorithms perform poorly on interleaved data. Interleaved data occurs when multiple streams are multiplexed into a single high-speed stream to speed up backups. Of particular commercial importance, database backup procedures produce such interleaved data, where multiple threads read database files in parallel. We present a new routing algorithm, Sticky Auction routing, that, unlike existing algorithms, handles interleaved data with little deduplication loss. It also achieves comparable or better deduplication performance for non-interleaved data and good load balancing, especially when multiple streams are used, the typical case.