Re: possible deadlock in get_user_pages_unlocked

From: Eric Biggers
Date: Fri Feb 02 2018 - 00:35:23 EST


On Fri, Feb 02, 2018 at 04:50:20AM +0000, Al Viro wrote:
> On Thu, Feb 01, 2018 at 04:58:00PM -0800, syzbot wrote:
> > Hello,
> >
> > syzbot hit the following crash on upstream commit
> > 7109a04eae81c41ed529da9f3c48c3655ccea741 (Thu Feb 1 17:37:30 2018 +0000)
> > Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
> >
> > So far this crash happened 2 times on upstream.
> > C reproducer is attached.
>
> Umm... How reproducible that is?
>
> > syzkaller reproducer is attached.
> > Raw console output is attached.
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached.
>
> Can't reproduce with gcc 5.4.1 (same .config, same C reproducer).
>
> It looks like __get_user_pages_locked() returning with *locked zeroed,
> but ->mmap_sem not dropped. I don't see what could've lead to it and
> attempts to reproduce had not succeeded so far...
>
> How long does it normally take for lockdep splat to trigger?
>

Try starting up multiple instances of the program; that sometimes helps with
these races that are hard to hit (since you may e.g. have a different number of
CPUs than syzbot used). If I start up 4 instances, I see the lockdep splat after
around 2-5 seconds. This is on the latest Linus tree (4bf772b1467). Also note the
reproducer uses KVM, so if you're running it in a VM it will only work if you've
enabled nested virtualization on the host (kvm_intel.nested=1).
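
For what it's worth, starting several instances at once can be scripted along
these lines. This is only a rough sketch: the binary name "./repro" is a
hypothetical name for the compiled C reproducer, and checking for /dev/kvm is
just an assumed quick sanity test that KVM is usable where the reproducer runs
(inside a VM, that generally means nested virtualization is enabled on the host).

```python
import subprocess
import sys
from pathlib import Path

# Assumption: /dev/kvm is the device node the reproducer needs to open.
KVM_DEVICE = Path("/dev/kvm")


def kvm_available(dev=KVM_DEVICE):
    """Return True if the KVM device node exists on this machine."""
    return dev.exists()


def launch_instances(cmd, count=4):
    """Start `count` copies of `cmd` in parallel and return the Popen handles."""
    return [subprocess.Popen(cmd) for _ in range(count)]


def wait_all(procs):
    """Wait for every process to exit and return the exit codes in order."""
    return [p.wait() for p in procs]


def main(cmd, count=4):
    """Warn if KVM looks unavailable, then run `count` parallel instances."""
    if not kvm_available():
        print("warning: /dev/kvm not found; the reproducer needs KVM",
              file=sys.stderr)
    return wait_all(launch_instances(cmd, count))
```

E.g. main(["./repro"], count=4), then watch dmesg for the lockdep splat. The
count of 4 is just what worked for me; a different number may work better
depending on how many CPUs you have.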

Also, the splat appears to go away if I revert ce53053ce378c21 ("kvm: switch
get_user_page_nowait() to get_user_pages_unlocked()").

- Eric