Re: get_user_pages returning 0 (was Re: kernel BUG at drivers/vhost/vhost.c:LINE!)

From: Dmitry Vyukov
Date: Mon Mar 19 2018 - 15:52:22 EST


On Mon, Mar 19, 2018 at 4:29 PM, David Sterba <dsterba@xxxxxxx> wrote:
> On Mon, Mar 19, 2018 at 05:09:28PM +0200, Michael S. Tsirkin wrote:
>> Hello!
>> The following code was triggered by syzbot:
>>
>>     r = get_user_pages_fast(log, 1, 1, &page);
>>     if (r < 0)
>>         return r;
>>     BUG_ON(r != 1);
>>
>> Just looking at get_user_pages_fast's documentation, this seems
>> impossible - it is supposed to only ever return the number of pages
>> pinned or an errno.
>>
>> However, poking at code, I see at least one path that might cause this:
>>
>>     ret = faultin_page(tsk, vma, start, &foll_flags,
>>                        nonblocking);
>>     switch (ret) {
>>     case 0:
>>         goto retry;
>>     case -EFAULT:
>>     case -ENOMEM:
>>     case -EHWPOISON:
>>         return i ? i : ret;
>>     case -EBUSY:
>>         return i;
>>
>> which originally comes from:
>>
>> commit 53a7706d5ed8f1a53ba062b318773160cc476dde
>> Author: Michel Lespinasse <walken@xxxxxxxxxx>
>> Date: Thu Jan 13 15:46:14 2011 -0800
>>
>> mlock: do not hold mmap_sem for extended periods of time
>>
>> __get_user_pages gets a new 'nonblocking' parameter to signal that the
>> caller is prepared to re-acquire mmap_sem and retry the operation if
>> needed. This is used to split off long operations if they are going to
>> block on a disk transfer, or when we detect contention on the mmap_sem.
>>
>> [akpm@xxxxxxxxxxxxxxxxxxxx: remove ref to rwsem_is_contended()]
>> Signed-off-by: Michel Lespinasse <walken@xxxxxxxxxx>
>> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
>> Cc: Rik van Riel <riel@xxxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Nick Piggin <npiggin@xxxxxxxxx>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxx>
>> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: David Howells <dhowells@xxxxxxxxxx>
>> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>
>> I have started looking into this. If anyone has any feedback in the
>> meantime, that would be appreciated.
>>
>> In particular, I don't really see why this would trigger
>> on commit 8f5fd927c3a7576d57248a2d7a0861c3f2795973:
>>
>> Merge: 8757ae2 093e037
>> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Date: Fri Mar 16 13:37:42 2018 -0700
>>
>> Merge tag 'for-4.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
>>
>> Is btrfs used on these systems?
>
> There were 3 patches pulled by that tag, and none of them is even
> remotely related to the reported bug, AFAICS. If there is some impact,
> it must be indirect: obvious bugs like a NULL pointer dereference would
> show up in a different way and leave at least some trace in the stacks.

That is just the commit on which the bug was hit. It's provided so that
developers can make sense of the line numbers and check whether the tree
includes a particular commit or not, etc. It does not mean that this
commit introduced the bug.
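
For reference, the return-value question quoted above can be modelled in
userspace. The following is only a sketch under stated assumptions, not
the kernel code and not a proposed fix: gup_error_path() and
check_single_pin() are hypothetical names, gup_error_path() mirrors only
the quoted switch (with -EHWPOISON left out so it builds anywhere), and
turning a short count into -EFAULT is one possible caller policy.

```c
#include <errno.h>

/*
 * Hypothetical model of the quoted error path in __get_user_pages().
 * i   - number of pages already pinned when faultin_page() failed
 * ret - the (negative) value faultin_page() returned
 * Returns what the quoted switch would hand back to the caller.
 */
static int gup_error_path(int i, int ret)
{
    switch (ret) {
    case -EFAULT:
    case -ENOMEM:
        /* Report partial progress if any, otherwise the error. */
        return i ? i : ret;
    case -EBUSY:
        /* Returns the bare count: 0 if nothing was pinned yet. */
        return i;
    }
    return ret; /* other cases are not modelled here */
}

/*
 * Hypothetical caller-side check for a request of exactly one page.
 * r is the value returned by the pinning call. A negative errno is
 * passed through, any count other than 1 becomes -EFAULT (an assumed
 * policy), and 0 means the single page was pinned.
 */
static int check_single_pin(int r)
{
    if (r < 0)
        return r;
    if (r != 1)
        return -EFAULT;
    return 0;
}
```

In this model the -EBUSY branch with i == 0 yields 0, which is neither
a negative errno nor the requested count of 1, so a caller that only
tests r < 0 before BUG_ON(r != 1) would hit the BUG_ON. Whether the
real path is reachable from get_user_pages_fast() is exactly the open
question in this thread.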