Re: [PATCH 1/1] mm: slub: fix corrupted freechain in deactivate_slab()

From: Dongli Zhang
Date: Fri Apr 17 2020 - 21:59:04 EST




On 4/17/20 6:12 PM, Andrew Morton wrote:
> On Mon, 30 Mar 2020 20:14:50 -0700 Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
>
>> The slub_debug is able to fix the corrupted slab freelist/page. However,
>> alloc_debug_processing() only checks the validity of current and next
>> freepointer during allocation path. As a result, once some objects have
>> their freepointers corrupted, deactivate_slab() may lead to page fault.
>>
>> Below is from a test kernel module when
>> 'slub_debug=PUF,kmalloc-128 slub_nomerge'. The test kernel corrupts the
>> freepointer of one free object on purpose. Unfortunately, deactivate_slab()
>> does not detect it when iterating the freechain.
>>
>> ...
>>
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2082,6 +2082,20 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page,
>>  		void *prior;
>>  		unsigned long counters;
>>
>> +		if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
>> +		    !check_valid_pointer(s, page, nextfree)) {
>> +			/*
>> +			 * If 'nextfree' is invalid, it is possible that
>> +			 * the object at 'freelist' is already corrupted.
>> +			 * Therefore, all objects starting at 'freelist'
>> +			 * are isolated.
>> +			 */
>> +			object_err(s, page, freelist, "Freechain corrupt");
>> +			freelist = NULL;
>> +			slab_fix(s, "Isolate corrupted freechain");
>> +			break;
>> +		}
>> +
>>  		do {
>>  			prior = page->freelist;
>>  			counters = page->counters;
>
> We could do it this way:
>
> --- a/mm/slub.c~mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix
> +++ a/mm/slub.c
> @@ -2083,6 +2083,7 @@ static void deactivate_slab(struct kmem_
>  		void *prior;
>  		unsigned long counters;
>
> +#ifdef CONFIG_SLUB_DEBUG
>  		if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
>  		    !check_valid_pointer(s, page, nextfree)) {
>  			/*
> @@ -2096,6 +2097,7 @@ static void deactivate_slab(struct kmem_
>  			slab_fix(s, "Isolate corrupted freechain");
>  			break;
>  		}
> +#endif
>
>  		do {
>  			prior = page->freelist;
>
> But it's a bit ugly. How about this?

Sorry, I did not realize that check_valid_pointer() requires CONFIG_SLUB_DEBUG.

Yes, it is much better to encapsulate the check in freelist_corrupted() and
simply return false when CONFIG_SLUB_DEBUG is not enabled. check_object() is
implemented in a similar way.
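
For illustration only, here is a minimal userspace sketch of that pattern. It
is not the actual mm/slub.c code: DEMO_DEBUG, struct demo_slab,
demo_valid_pointer() and demo_freelist_corrupted() are hypothetical names
standing in for CONFIG_SLUB_DEBUG, the slab page, check_valid_pointer() and
freelist_corrupted(). The sketch passes 'freelist' by address, on the
assumption that the caller needs to see it reset to NULL; with a by-value
parameter the assignment would only change the helper's local copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_SLUB_DEBUG; build with -DDEMO_DEBUG=0 for the stub. */
#ifndef DEMO_DEBUG
#define DEMO_DEBUG 1
#endif

/* Hypothetical, simplified model of a slab: 'count' objects of 'size' bytes. */
struct demo_slab {
	char *base;
	size_t size;
	size_t count;
};

#if DEMO_DEBUG
/* NULL terminates the chain; otherwise the pointer must sit on an object boundary. */
static bool demo_valid_pointer(const struct demo_slab *slab, const void *p)
{
	uintptr_t base = (uintptr_t)slab->base;
	uintptr_t addr = (uintptr_t)p;

	if (!p)
		return true;
	if (addr < base || addr >= base + slab->size * slab->count)
		return false;
	return (addr - base) % slab->size == 0;
}

/* Debug build: report the corruption and cut the chain off at 'freelist'. */
static bool demo_freelist_corrupted(const struct demo_slab *slab,
				    void **freelist, void *nextfree)
{
	if (!demo_valid_pointer(slab, nextfree)) {
		fprintf(stderr, "Freechain corrupt\n");
		*freelist = NULL;	/* visible to the caller */
		return true;
	}

	return false;
}
#else
/* Non-debug build: the check compiles away and never reports corruption. */
static inline bool demo_freelist_corrupted(const struct demo_slab *slab,
					   void **freelist, void *nextfree)
{
	(void)slab;
	(void)freelist;
	(void)nextfree;
	return false;
}
#endif
```

A caller can then test demo_freelist_corrupted(&slab, &freelist, nextfree)
unconditionally and break out of its loop on true, without any #ifdef at the
call site.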

Should I resend with your Signed-off-by, or would you just fix it up when applying?

This is the first time I have submitted a patch to mm, so I am not familiar
with the mm policy/process.

Thank you very much for the feedback!

Dongli Zhang

>
> --- a/mm/slub.c~mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix
> +++ a/mm/slub.c
> @@ -650,6 +650,20 @@ static void slab_bug(struct kmem_cache *
>  	va_end(args);
>  }
>
> +static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
> +			       void *freelist, void *nextfree)
> +{
> +	if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
> +	    !check_valid_pointer(s, page, nextfree)) {
> +		object_err(s, page, freelist, "Freechain corrupt");
> +		freelist = NULL;
> +		slab_fix(s, "Isolate corrupted freechain");
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  static void slab_fix(struct kmem_cache *s, char *fmt, ...)
>  {
>  	struct va_format vaf;
> @@ -1400,6 +1414,11 @@ static inline void inc_slabs_node(struct
>  static inline void dec_slabs_node(struct kmem_cache *s, int node,
>  							int objects) {}
>
> +static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
> +			       void *freelist, void *nextfree)
> +{
> +	return false;
> +}
>  #endif /* CONFIG_SLUB_DEBUG */
>
>  /*
> @@ -2083,19 +2102,13 @@ static void deactivate_slab(struct kmem_
>  		void *prior;
>  		unsigned long counters;
>
> -		if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
> -		    !check_valid_pointer(s, page, nextfree)) {
> -			/*
> -			 * If 'nextfree' is invalid, it is possible that
> -			 * the object at 'freelist' is already corrupted.
> -			 * Therefore, all objects starting at 'freelist'
> -			 * are isolated.
> -			 */
> -			object_err(s, page, freelist, "Freechain corrupt");
> -			freelist = NULL;
> -			slab_fix(s, "Isolate corrupted freechain");
> +		/*
> +		 * If 'nextfree' is invalid, it is possible that the object at
> +		 * 'freelist' is already corrupted. So isolate all objects
> +		 * starting at 'freelist'.
> +		 */
> +		if (freelist_corrupted(s, page, freelist, nextfree))
>  			break;
> -		}
>
>  		do {
>  			prior = page->freelist;
> _
>