What is it about?

Data deduplication techniques construct an index of fingerprint entries to identify and eliminate duplicate copies of repeating data. Two challenging issues in data deduplication are the bottleneck of disk-based index lookup and the data fragmentation caused by eliminating duplicate chunks. Deduplication-based backup systems generally use containers, which store contiguous chunks together with their fingerprints, to preserve data locality and alleviate both issues, but this is still inadequate.
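To make the role of the fingerprint index concrete, here is a minimal sketch of chunk-level deduplication. It is an illustration only, not the system described in the paper: the SHA-1 fingerprint, the in-memory dictionary standing in for the on-disk index, and the names `fingerprint`, `deduplicate`, and `container_id` are all assumptions made for this example.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-1 is used as the chunk fingerprint for illustration.
    return hashlib.sha1(chunk).hexdigest()


def deduplicate(chunks, index, container_id):
    """Record unseen chunks in the index and return (recipe, stored_count).

    `index` maps fingerprint -> container id (a hypothetical in-memory
    stand-in for a disk-based fingerprint index).
    """
    recipe = []   # ordered fingerprints needed to restore the backup
    stored = 0    # number of chunks that were actually new
    for chunk in chunks:
        fp = fingerprint(chunk)
        if fp not in index:
            # New data: remember which container holds this chunk.
            index[fp] = container_id
            stored += 1
        recipe.append(fp)
    return recipe, stored


index = {}
recipe, stored = deduplicate([b"alpha", b"beta", b"alpha"], index, container_id=0)
```

In this toy run the repeated chunk is recorded once in the index but appears twice in the recipe, which is where both the lookup cost and the later fragmentation come from.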

Why is it important?

We propose a container-utilization-based hot fingerprint entry distilling strategy to improve the performance of deduplication-based backup systems. We divide the index into three parts: hot fingerprint entries, fragmented fingerprint entries, and useless fingerprint entries. A container whose utilization is smaller than a given threshold is called a sparse container. Fingerprint entries that point to non-sparse containers are hot fingerprint entries. Of the remaining fingerprint entries, one that matches any fingerprint of forthcoming backup chunks is classified as a fragmented fingerprint entry; otherwise, it is classified as a useless fingerprint entry.
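The three-way classification can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: utilization is assumed to be supplied as a fraction per container, the set of forthcoming fingerprints is assumed to be known in advance, and the names `classify_entries`, `utilization`, `upcoming`, and `threshold` are hypothetical.

```python
from typing import Dict, Set, Tuple


def classify_entries(
    index: Dict[str, int],          # fingerprint -> container id
    utilization: Dict[int, float],  # container id -> utilization (assumed 0..1)
    upcoming: Set[str],             # fingerprints of forthcoming backup chunks
    threshold: float,               # containers below this are sparse
) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, int]]:
    hot, fragmented, useless = {}, {}, {}
    for fp, cid in index.items():
        if utilization[cid] >= threshold:
            # Entry points to a non-sparse container.
            hot[fp] = cid
        elif fp in upcoming:
            # Sparse container, but the chunk will be referenced again.
            fragmented[fp] = cid
        else:
            # Sparse container and no forthcoming chunk matches.
            useless[fp] = cid
    return hot, fragmented, useless


hot, fragmented, useless = classify_entries(
    index={"fp_a": 1, "fp_b": 2, "fp_c": 2},
    utilization={1: 0.9, 2: 0.2},
    upcoming={"fp_b"},
    threshold=0.5,
)
```

With these sample values, container 2 is sparse, so `fp_a` is hot, `fp_b` is fragmented, and `fp_c` is useless.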

Read the Original

This page is a summary of: Improving the Performance of Deduplication-Based Backup Systems via Container Utilization Based Hot Fingerprint Entry Distilling, ACM Transactions on Storage, November 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3459626.