Re: WARNING in __static_key_slow_dec

From: Willem de Bruijn
Date: Sun May 20 2018 - 18:10:29 EST


On Fri, May 18, 2018 at 4:30 PM, Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
> On Fri, May 18, 2018 at 4:03 AM, DaeRyong Jeong <threeearcat@xxxxxxxxx> wrote:
>> We report the crash: WARNING in __static_key_slow_dec
>>
>> This crash has been found in v4.8 using RaceFuzzer (a modified
>> version of Syzkaller), which we describe more at the end of this
>> report.
>> Even though v4.8 is the relatively old version, we did manual verification
>> and we think the bug still exists.
>> Our analysis shows that the race occurs when invoking two syscalls
>> concurrently, setsockopt() with optname SO_TIMESTAMPING and ioctl() with
>> cmd SIOCGSTAMPNS.
>>
>>
>> Diagnosis:
>> We think if timestamp was previously enabled with
>> SOCK_TIMESTAMPING_RX_SOFTWARE flag, the concurrent execution of
>> sock_disable_timestamp() and sock_enable_timestamp() causes the crash.
>>
>>
>> Thread interleaving:
>> (Assume sk->flag has the SOCK_TIMESTAMPING_RX_SOFTWARE flag by the
>> previous setsockopt() call with SO_TIMESTAMPING)
>>
>> CPU0 (sock_disable_timestamp())                   CPU1 (sock_enable_timestamp())
>> =====                                             =====
>> (flag == 1UL << SOCK_TIMESTAMPING_RX_SOFTWARE)    (flag == SOCK_TIMESTAMP)
>>
>>                                                   if (!sock_flag(sk, flag)) {
>>                                                       unsigned long previous_flags = sk->sk_flags;
>>
>> if (sk->sk_flags & flags) {
>>     sk->sk_flags &= ~flags;
>>     if (sock_needs_netstamp(sk) &&
>>         !(sk->sk_flags & SK_FLAGS_TIMESTAMP))
>>         net_disable_timestamp();
>>                                                       sock_set_flag(sk, flag);
>>
>>                                                       if (sock_needs_netstamp(sk) &&
>>                                                           !(previous_flags & SK_FLAGS_TIMESTAMP))
>>                                                           net_enable_timestamp();
>>                                                       /* Here, net_enable_timestamp() is not called because
>>                                                        * previous_flags has the SOCK_TIMESTAMPING_RX_SOFTWARE
>>                                                        * flag
>>                                                        */
>> /* After the race, sk->sk has the flag SOCK_TIMESTAMP, but
>>  * net_enable_timestamp() is not called one more time.
>>  * Consequently, when the socket is closed, __sk_destruct()
>>  * calls net_disable_timestamp() that leads WARNING.
>>  */
>
> Thanks for the detailed analysis.
>
> Indeed the updates to sk->sk_flags and calls to net_(dis|en)able_timestamp
> should happen atomically, but this is not the case. The setsockopt
> path holds the socket lock, but not all ioctl paths.
>
> Perhaps we can take lock_sock_fast in sock_get_timestamp and
> variants.

Some callers of sock_get_timestamp already hold the socket lock,
e.g., ax25_ioctl, so that is out.

There is some known non-determinism in this path. Callers of
sock_get_timestamp do not necessarily expect a valid sk_stamp
when they enable the timestamp, so that function can continue
to test sk_flags without taking the lock.

net_enable_timestamp enables timestamping using a static_branch
and possibly a workqueue, so it already does not complete
synchronously within the sock_enable_timestamp call.

The only requirement is that updates to sk_flags do not race. This
should be solvable with cmpxchg. The situation is slightly complicated
because sk_flags has two bits that may toggle timestamping
(SOCK_TIMESTAMP and SOCK_TIMESTAMPING_RX_SOFTWARE). Only setting the
first of these bits must trigger a call to net_enable_timestamp, and
only clearing the last one must trigger a call to net_disable_timestamp.

Something like

-static bool sock_needs_netstamp(const struct sock *sk)
+static bool sock_needs_netstamp(const struct sock *sk, unsigned long flags)
 {
         switch (sk->sk_family) {
         case AF_UNSPEC:
         case AF_UNIX:
                 return false;
         default:
-                return true;
+                return (flags & SK_FLAGS_TIMESTAMP);
         }
 }

-static void sock_disable_timestamp(struct sock *sk, unsigned long flags)
+static void sock_disable_timestamp(struct sock *sk, unsigned long flag)
 {
-        if (sk->sk_flags & flags) {
-                sk->sk_flags &= ~flags;
-                if (sock_needs_netstamp(sk) &&
-                    !(sk->sk_flags & SK_FLAGS_TIMESTAMP))
-                        net_disable_timestamp();
-        }
+        unsigned long prev;
+
+        do {
+                prev = READ_ONCE(sk->sk_flags);
+
+                if (!(prev & flag))
+                        return;
+
+                if (cmpxchg(&sk->sk_flags, prev, prev & ~flag) == prev)
+                        break;
+        } while (1);
+
+        /* disable only if this operation removed the last tstamp flag */
+        if (!sock_needs_netstamp(sk, prev & ~flag))
+                net_disable_timestamp();
 }

and analogously for enable.
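
To illustrate the first-bit-set / last-bit-cleared rule on both sides,
here is a minimal userspace sketch. It is not kernel code: it models
sk_flags with C11 atomics in place of the kernel's cmpxchg, passes bit
masks to both functions, and leaves out the sk_family check. The model_*
and MODEL_* names, the bit positions, and the netstamp_refs counter are
hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit masks; the real kernel bit numbers differ. */
#define MODEL_SOCK_TIMESTAMP                (1UL << 0)
#define MODEL_SOCK_TIMESTAMPING_RX_SOFTWARE (1UL << 1)
#define MODEL_SK_FLAGS_TIMESTAMP \
    (MODEL_SOCK_TIMESTAMP | MODEL_SOCK_TIMESTAMPING_RX_SOFTWARE)

struct model_sock {
    _Atomic unsigned long sk_flags;
};

/* Stands in for the count behind net_{enable,disable}_timestamp. */
static atomic_int netstamp_refs;

static void model_sock_enable_timestamp(struct model_sock *sk,
                                        unsigned long flag)
{
    unsigned long prev = atomic_load(&sk->sk_flags);

    do {
        if (prev & flag)
            return; /* already set, nothing to do */
        /* on failure, prev is refreshed with the current value */
    } while (!atomic_compare_exchange_weak(&sk->sk_flags, &prev,
                                           prev | flag));

    /* enable only if this operation set the first tstamp flag */
    if (!(prev & MODEL_SK_FLAGS_TIMESTAMP))
        atomic_fetch_add(&netstamp_refs, 1);
}

static void model_sock_disable_timestamp(struct model_sock *sk,
                                         unsigned long flag)
{
    unsigned long prev = atomic_load(&sk->sk_flags);

    do {
        if (!(prev & flag))
            return; /* none of the requested bits are set */
    } while (!atomic_compare_exchange_weak(&sk->sk_flags, &prev,
                                           prev & ~flag));

    /* disable only if this operation removed the last tstamp flag */
    if (!((prev & ~flag) & MODEL_SK_FLAGS_TIMESTAMP))
        atomic_fetch_sub(&netstamp_refs, 1);
}

/* Single-threaded walk-through; returns the final reference count. */
int model_demo(void)
{
    struct model_sock sk;

    atomic_init(&sk.sk_flags, 0);

    model_sock_enable_timestamp(&sk, MODEL_SOCK_TIMESTAMPING_RX_SOFTWARE);
    assert(atomic_load(&netstamp_refs) == 1); /* first bit: enable */

    model_sock_enable_timestamp(&sk, MODEL_SOCK_TIMESTAMP);
    assert(atomic_load(&netstamp_refs) == 1); /* second bit: no extra enable */

    model_sock_disable_timestamp(&sk, MODEL_SOCK_TIMESTAMPING_RX_SOFTWARE);
    assert(atomic_load(&netstamp_refs) == 1); /* one bit still set */

    /* clear both bits at once, as a teardown path would */
    model_sock_disable_timestamp(&sk, MODEL_SK_FLAGS_TIMESTAMP);
    return atomic_load(&netstamp_refs);
}
```

In this model each side decides from the value it actually swapped
out, so a concurrent update forces a retry rather than a decision based
on a stale snapshot, and enable and disable calls stay balanced.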