Re: [PATCH v2] rbtree: fix the red root

From: Qian Cai
Date: Sun Jan 13 2019 - 21:36:56 EST




On 1/13/19 9:20 PM, David Lechner wrote:
> On 1/11/19 8:58 PM, Michel Lespinasse wrote:
>> On Fri, Jan 11, 2019 at 3:47 PM David Lechner <david@xxxxxxxxxxxxxx> wrote:
>>>
>>> On 1/11/19 2:58 PM, Qian Cai wrote:
>>>> A GPF was reported,
>>>>
>>>> kasan: CONFIG_KASAN_INLINE enabled
>>>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>>>> general protection fault: 0000 [#1] SMP KASAN
>>>>   kasan_die_handler.cold.22+0x11/0x31
>>>>   notifier_call_chain+0x17b/0x390
>>>>   atomic_notifier_call_chain+0xa7/0x1b0
>>>>   notify_die+0x1be/0x2e0
>>>>   do_general_protection+0x13e/0x330
>>>>   general_protection+0x1e/0x30
>>>>   rb_insert_color+0x189/0x1480
>>>>   create_object+0x785/0xca0
>>>>   kmemleak_alloc+0x2f/0x50
>>>>   kmem_cache_alloc+0x1b9/0x3c0
>>>>   getname_flags+0xdb/0x5d0
>>>>   getname+0x1e/0x20
>>>>   do_sys_open+0x3a1/0x7d0
>>>>   __x64_sys_open+0x7e/0xc0
>>>>   do_syscall_64+0x1b3/0x820
>>>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> It turned out,
>>>>
>>>> gparent = rb_red_parent(parent);
>>>> tmp = gparent->rb_right; <-- GPF was triggered here.
>>>>
>>>> Apparently, "gparent" is NULL, which indicates that "parent" is the
>>>> rbtree's root and is red. Otherwise, it would have been handled properly
>>>> a few lines above.
>>>>
>>>> /*
>>>>  * If there is a black parent, we are done.
>>>>  * Otherwise, take some corrective action as,
>>>>  * per 4), we don't want a red root or two
>>>>  * consecutive red nodes.
>>>>  */
>>>> if(rb_is_black(parent))
>>>>         break;
>>>>
>>>> Hence, it violates rule #1 (the root can't be red) and needs a fix-up.
>>>> Also add a regression test for it. This looks like it was introduced by
>>>> 6d58452dc06, after which the root is no longer always painted black.
>>>>
>>>> Fixes: 6d58452dc06 (rbtree: adjust root color in rb_insert_color() only
>>>> when necessary)
>>>> Reported-by: Esme <esploit@xxxxxxxxxxxxx>
>>>> Tested-by: Joey Pabalinas <joeypabalinas@xxxxxxxxx>
>>>> Signed-off-by: Qian Cai <cai@xxxxxx>
>>>> ---
>>>
>>> Tested-by: David Lechner <david@xxxxxxxxxxxxxx>
>>> FWIW, this fixed the following crash for me:
>>>
>>> Unable to handle kernel NULL pointer dereference at virtual address 00000004
>>
>> Just to clarify, do you have a way to reproduce this crash without the fix?
>
> I am starting to suspect that my crash was caused by some new code
> in the drm-misc-next tree that might be causing a memory corruption.
> It threw me off that the stack trace didn't contain anything related
> to drm.
>
> See: https://patchwork.freedesktop.org/patch/276719/
>

It may be useful for those who can reproduce this issue to turn on the
following memory corruption debug options to help narrow it down.

CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_SLUB_DEBUG_ON=y