Re: [PATCH 00/10] Hardening page _refcount

From: Matthew Wilcox
Date: Wed Dec 08 2021 - 16:06:09 EST


On Wed, Dec 08, 2021 at 08:35:34PM +0000, Pasha Tatashin wrote:
> It is hard to root cause _refcount problems, because they usually
> manifest after the damage has occurred. Yet, they can lead to
> catastrophic failures such as memory corruption. There were a number
> of refcount-related issues discovered recently [1], [2], [3].
>
> Improve debuggability by adding more checks that ensure that
> page->_refcount never turns negative (i.e. double free does not
> happen, or free after freeze etc).
>
> - Check for overflow and underflow right from the functions that
>   modify _refcount
> - Remove set_page_count(), so we do not unconditionally overwrite
>   _refcount with an unrestrained value
> - Trace return values in all functions that modify _refcount

You're doing a lot more atomic instructions with these patches. Have you
done any performance measurements with these patches applied and debug
disabled? I'm really not convinced it's worth closing
one-instruction-wide races of this kind when they are "shouldn't ever
happen" situations. If the debugging will catch the problem in 99.99%
of cases and miss 0.01% without using atomic instructions, that seems
like a better set of tradeoffs than catching 100% of problems by using
the atomic instructions.
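
To make the tradeoff concrete, here is a minimal userspace sketch using C11 atomics. It is not taken from the patches and is only one reading of the two approaches. struct fake_page and every fake_* function are hypothetical names invented for illustration. The "loose" variant checks the counter with a separate plain load after the update, so it leaves a small window in which another thread can change the value. The "strict" variant checks the value returned by the read-modify-write itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page and its _refcount. */
struct fake_page {
	atomic_int refcount;
};

static void fake_page_init(struct fake_page *p, int count)
{
	assert(count >= 0);	/* a fresh counter must not start negative */
	atomic_init(&p->refcount, count);
}

/*
 * Loose check: decrement, then inspect the counter in a separate load.
 * Another thread may modify the counter between the two steps, so a
 * transient underflow can occasionally go unnoticed.
 * Returns true if the observed value is still non-negative.
 */
static bool fake_ref_dec_loose(struct fake_page *p)
{
	atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_relaxed);
	return atomic_load_explicit(&p->refcount, memory_order_relaxed) >= 0;
}

/*
 * Strict check: use the old value returned by the read-modify-write.
 * There is no window between the update and the check.
 * Returns true if the counter was positive before the decrement.
 */
static bool fake_ref_dec_strict(struct fake_page *p)
{
	int old = atomic_fetch_sub_explicit(&p->refcount, 1,
					    memory_order_relaxed);
	return old > 0;
}
```

With a single thread both variants report the same result. They can only disagree when another thread touches the counter inside the loose variant's window, which is the rare case the 99.99% versus 100% estimate above refers to. The relative cost of the two depends on the architecture and has to be measured.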