Re: [PATCH] erofs-utils: avoid redundant memcpy and sha256() for dedupe

From: zhaoyifan (H)
Date: Fri Aug 15 2025 - 06:02:45 EST


Hi Zijie,

It would be much appreciated if you could help us polish the multithreaded -Ededupe implementation. I will try to rebase the existing code onto the latest codebase ASAP.

You can find the design decisions behind multithreaded -Ededupe in this paper:

https://dl.acm.org/doi/pdf/10.1145/3671016.3671395


Thanks,

Yifan

On 2025/8/15 17:54, wangzijie wrote:
Hi Zijie,

On 2025/8/15 16:44, wangzijie wrote:
We already use xxh64() as a first-pass filter for dedupe. When a block
has to be skipped because its xxh64 hash is already recorded, there is no
need to do memcpy() and sha256(), so relocate the code to avoid them.

Signed-off-by: wangzijie <wangzijie1@xxxxxxxxx>
Thanks for the patch. It makes sense to me since, for now, we only keep
one record per xxh64 hash (rather than per sha256):

Reviewed-by: Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx>
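To make the reordering concrete, here is a minimal Python sketch of the idea, not the erofs-utils C code. Assumptions: zlib.crc32 stands in for xxh64() (which is not in Python's standard library), bytearray(block) stands in for the memcpy(), and DedupeTable, DedupeItem, insert() and expensive_calls are hypothetical names. The sketch keys on the whole block; the real code's key layout may differ. Because only one record is kept per cheap hash, a hit means the new record would be discarded anyway, so the copy and the sha256() can be skipped before they happen.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass
class DedupeItem:
    cheap_key: int         # stand-in for the xxh64 value (hypothetical field)
    strong_digest: bytes   # sha256 of the block
    data: bytearray        # private copy of the block (the memcpy() in C)


class DedupeTable:
    """Hypothetical dedupe index keeping one record per cheap hash."""

    def __init__(self) -> None:
        self.items: dict[int, DedupeItem] = {}
        self.expensive_calls = 0  # times the copy + sha256 path actually ran

    def insert(self, block: bytes) -> bool:
        """Return True if a new record was stored, False if it was skipped."""
        key = zlib.crc32(block)  # cheap first-pass hash (stand-in for xxh64)
        if key in self.items:
            # A record with this cheap hash already exists and only one is
            # kept per key, so skip the copy and the sha256() entirely.
            return False
        # Only a new key pays for the copy and the strong digest.
        self.expensive_calls += 1
        copy = bytearray(block)
        digest = hashlib.sha256(copy).digest()
        self.items[key] = DedupeItem(key, digest, copy)
        return True


if __name__ == "__main__":
    table = DedupeTable()
    a = bytes(4096)               # 4 KiB of zeros
    b = bytes(4095) + b"\x01"     # differs from a only in the last byte
    print(table.insert(a))        # True: new record
    print(table.insert(a))        # False: same cheap hash, skipped
    print(table.insert(b))        # True: new record
    print(table.expensive_calls)  # 2
```

Note that with this ordering a cheap-hash collision between two different blocks also causes the newer one to be skipped, which is exactly the "one record per xxh64" behavior described above; the patch only avoids wasted work on a path whose result was already being thrown away.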

That said, I think multi-threaded deduplication would be more useful; see:
https://github.com/erofs/erofs-utils/issues/25
but I'm not sure if you're interested in it... ;-)
Hi Xiang,
Thank you for the information. I wanted to reduce mkfs time with the
dedupe option, which is why I sent this patch. I will find time to look
into Yifan's demo of multi-threaded deduplication and try to help.

Thanks,
Gao Xiang