What is it about?

Inline deduplication removes redundant data in real time, as data is being written to the storage system. However, it causes data fragmentation: after deduplication, logically consecutive chunks are physically scattered across many containers. Many rewrite algorithms aim to alleviate the performance degradation caused by fragmentation by rewriting fragmented duplicate chunks into new containers as if they were unique chunks. Unfortunately, these algorithms decide whether a chunk is fragmented using a simple, pre-set fixed threshold, ignoring how data characteristics vary between segments. As a result, they often select an inappropriate set of old containers for rewriting, so restoring a backup retrieves containers that contain a substantial number of invalid chunks.
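The fixed-threshold rewrite decision described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function name, the chunk representation, and the utilization cutoff are all assumptions made for the example. Here a container's "utilization" is the fraction of its bytes that the current segment actually references, and any duplicate chunk whose container falls below a pre-set cutoff is rewritten.

```python
# Hypothetical sketch of a fixed-threshold rewrite decision (names and
# parameters are illustrative, not taken from the paper).
FIXED_THRESHOLD = 0.5  # assumed pre-set utilization cutoff

def classify_chunks(segment, container_size=4 * 1024 * 1024):
    """Split a segment's duplicate chunks into kept vs. rewritten.

    segment: list of (chunk_id, size_in_bytes, container_id) tuples for
    the duplicate chunks of one segment.
    """
    # Bytes of each old container that this segment references.
    referenced = {}
    for _, size, cid in segment:
        referenced[cid] = referenced.get(cid, 0) + size

    kept, rewritten = [], []
    for chunk in segment:
        _, _, cid = chunk
        utilization = referenced[cid] / container_size
        # A fixed cutoff ignores how utilization varies between segments.
        (kept if utilization >= FIXED_THRESHOLD else rewritten).append(chunk)
    return kept, rewritten
```

For example, a chunk whose container contributes 3 MB of a 4 MB container (utilization 0.75) is kept, while a chunk whose container contributes only 1 KB is rewritten; the weakness is that 0.5 is the right cutoff for some segments and the wrong one for others.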

Why is it important?

We propose an inline deduplication approach for storage systems, called InDe, which uses a greedy algorithm to detect valid container utilization and dynamically adjusts the number of old container references in each segment. InDe fully exploits the distribution of duplicate chunks to improve restore performance while maintaining high backup performance.
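The greedy, per-segment selection of old containers could look roughly like the sketch below. This is an assumption-laden illustration of the idea, not InDe's actual policy: the function name, the cumulative valid-fraction criterion, and the `min_valid_fraction` parameter are all invented for the example. Containers are ranked by how many bytes the segment references in them, and references are kept greedily until reading one more container would drag the restore's valid-chunk ratio too low; chunks in the remaining containers are rewritten.

```python
# Hypothetical sketch of greedy selection of valid old containers per
# segment (illustrative only; the paper's actual criterion may differ).
def select_containers(segment, container_size=4 * 1024 * 1024,
                      min_valid_fraction=0.25):
    """Greedily decide which old containers to keep referencing.

    segment: list of (chunk_id, size_in_bytes, container_id) tuples.
    min_valid_fraction: assumed lower bound on the cumulative fraction of
    valid (referenced) bytes among all containers read during restore.
    """
    referenced = {}
    for _, size, cid in segment:
        referenced[cid] = referenced.get(cid, 0) + size

    kept_ids, valid_bytes, read_bytes = set(), 0, 0
    # Greedy: consider the best-utilized containers first.
    for cid, size in sorted(referenced.items(), key=lambda kv: -kv[1]):
        if (valid_bytes + size) / (read_bytes + container_size) < min_valid_fraction:
            break  # keeping this container would hurt restore efficiency
        kept_ids.add(cid)
        valid_bytes += size
        read_bytes += container_size

    kept = [c for c in segment if c[2] in kept_ids]
    rewritten = [c for c in segment if c[2] not in kept_ids]
    return kept, rewritten
```

Unlike a fixed per-container cutoff, the number of old containers kept here adapts to each segment's own distribution of duplicate chunks: a segment whose duplicates cluster in a few well-utilized containers keeps more references, while a segment with scattered duplicates rewrites more chunks.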

Read the Original

This page is a summary of: InDe: An inline data deduplication approach via adaptive detection of valid container utilization, ACM Transactions on Storage, November 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3568426.