Re: kernel BUG at kernel/futex.c:679 on v4.13-rc3-ish on arm64

From: Mel Gorman
Date: Tue Aug 08 2017 - 11:32:40 EST


On Tue, Aug 08, 2017 at 11:52:05AM +0100, Mark Rutland wrote:
> Hi,
>
> As a heads-up, I hit the below splat when using Syzkaller to fuzz arm64
> VMAP_STACK patches [1] atop of v4.13-rc3. I haven't hit anything else
> major, and so far I haven't had any luck reproducing this, so it may be
> an existing issue that's difficult to hit.
>
> Note that while reported as a BUG(), it's actually the WARN_ON_ONCE()
> introduced in commit:
>
> 65d8fc777f6dcfee ("futex: Remove requirement for lock_page() in get_futex_key()")
>
> ... misreported as I accidentally throw away the flags in __BUG_FLAGS().
> Other than that, I believe BUG() and friends are working correctly.
>
> The Syzkaller log is huge (1.0M), so rather than attaching it, I've
> uploaded the log, report, and kernel config to:
>
> http://data.yaey.co.uk/bugs/20170808-futex-bug/
>
> I'll continue trying to reproduce and minimize this.
>
> ------------[ cut here ]------------
> kernel BUG at kernel/futex.c:679!

This corresponds to the warning

	/*
	 * Take a reference unless it is about to be freed. Previously
	 * this reference was taken by ihold under the page lock
	 * pinning the inode in place so i_lock was unnecessary. The
	 * only way for this check to fail is if the inode was
	 * truncated in parallel so warn for now if this happens.
	 *
	 * We are not calling into get_futex_key_refs() in file-backed
	 * cases, therefore a successful atomic_inc return below will
	 * guarantee that get_futex_key() will still imply smp_mb(); (B).
	 */
	if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
		rcu_read_unlock();
		put_page(page);

		goto again;
	}

The comment is pretty self-explanatory. The only situation I could think
of where it could happen is if a futex existed on a shared mapping that
was truncated during the operation. Why would an application truncate a
mapping with a key on it? As weird as it is, the situation is recoverable,
which is what the code does by retrying, but the warning was included in
case I was not imaginative enough.

Can you tell me if it's possible that syzkaller, when fuzz testing, was
creating a shared mapping, creating a futex backed by the mapping and
then truncating it? If so, and that's what triggers the warning, then I
think it would be reasonable to remove the warning as the source of the
confusion is userspace truncating a mapping with active keys on it.

If you manage to create a test case, then it would be nice to test without
that warning and see if it completes successfully or if there is other
fallout.
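
As a starting point, a test case along those lines might look roughly like
the sketch below. It is only a minimal sketch under assumptions: the helper
name futex_truncate_race() and the /tmp path are made up for illustration,
and there is no guarantee the window is wide enough to hit the warning. One
process repeatedly truncates and re-extends a file while the other issues
a non-private FUTEX_WAKE on a MAP_SHARED mapping of it, so the key lookup
goes through the inode-backed path in get_futex_key().

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of any existing test suite.
 * Returns 0 if both sides of the race ran to completion, -1 otherwise.
 */
int futex_truncate_race(long iterations)
{
	char path[] = "/tmp/futex-trunc-XXXXXX";	/* assumed scratch location */
	long page = sysconf(_SC_PAGESIZE);
	uint32_t *uaddr;
	pid_t child;
	int status;
	int fd;
	long i;

	if (page <= 0)
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open fd keeps the inode alive */

	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}

	/* Shared file mapping, so the futex key is inode based. */
	uaddr = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (uaddr == MAP_FAILED) {
		close(fd);
		return -1;
	}

	child = fork();
	if (child < 0) {
		munmap(uaddr, page);
		close(fd);
		return -1;
	}

	if (child == 0) {
		/* Child: repeatedly drop the page and bring the file back. */
		for (i = 0; i < iterations; i++) {
			if (ftruncate(fd, 0) < 0 || ftruncate(fd, page) < 0)
				_exit(1);
		}
		_exit(0);
	}

	/*
	 * Parent: FUTEX_WAKE without FUTEX_PRIVATE_FLAG. The mapping is never
	 * dereferenced from userspace, so no SIGBUS is expected here. Errors
	 * are ignored because the page may be gone while the file is empty.
	 */
	for (i = 0; i < iterations; i++)
		syscall(SYS_futex, uaddr, FUTEX_WAKE, 1, NULL, NULL, 0);

	if (waitpid(child, &status, 0) < 0) {
		munmap(uaddr, page);
		close(fd);
		return -1;
	}

	munmap(uaddr, page);
	close(fd);
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling futex_truncate_race() from main() with a large iteration count and
watching dmesg for the WARN_ON_ONCE() would be the obvious way to use it.
fork() is used rather than threads only to keep the example free of extra
link dependencies; inode-based keys behave the same across processes.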

--
Mel Gorman
SUSE Labs