Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

From: Mathieu Desnoyers
Date: Fri Aug 16 2019 - 13:19:24 EST


----- On Aug 16, 2019, at 12:25 PM, rostedt rostedt@xxxxxxxxxxx wrote:

> On Fri, 16 Aug 2019 10:26:43 -0400 Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
[...]
>>
>> Also, write and read to/from those variables should be done with
>> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
>> probes without holding the sched_register_mutex.
>>
>
> I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
> It's done while holding the mutex. It's not that critical of a path,
> and makes the code look ugly.

The update is done while holding the mutex, but the read-side does not
hold that mutex, so it can observe intermediate states caused by store
tearing or by invented stores, both of which the compiler is allowed to
generate for a plain store on the update-side.

Please refer to the following LWN article:

https://lwn.net/Articles/793253/

Sections:
- "Store tearing"
- "Invented stores"

Arguably, based on that article, store tearing has so far only been observed
in the wild for stores of constants (which is not the case here), and invented
stores seem to require specific code patterns. But why would we ever want to
pair a fragile plain store with a READ_ONCE()? Considering the pain of
reproducing and hunting down this kind of issue in the wild, I would be
tempted to enforce that any variable read with READ_ONCE() also be updated
with WRITE_ONCE() or with atomic operations, so the pairing can eventually be
validated by static code checkers and code sanitizers.

If coding style is your only concern here, we may want to consider
introducing new macros in compiler.h:

WRITE_ONCE_INC(v) /* v++ */
WRITE_ONCE_DEC(v) /* v-- */
WRITE_ONCE_ADD(v, count) /* v += count */
WRITE_ONCE_SUB(v, count) /* v -= count */

Thanks,

Mathieu

>
> -- Steve
>
>
>
>> [ Compile-tested only. I suspect it might fix the following syzbot
>> report:
>>
>> syzbot+774fddf07b7ab29a1e55@xxxxxxxxxxxxxxxxxxxxxxxxx ]
>>
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
>> CC: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>> CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> CC: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
>> CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
>> ---
>> kernel/trace/trace_sched_switch.c | 32 ++++++++++++++++++++++----------
>> 1 file changed, 22 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/trace/trace_sched_switch.c
>> b/kernel/trace/trace_sched_switch.c
>> index e288168661e1..902e8bf59aeb 100644
>> --- a/kernel/trace/trace_sched_switch.c
>> +++ b/kernel/trace/trace_sched_switch.c
>> @@ -26,8 +26,8 @@ probe_sched_switch(void *ignore, bool preempt,
>> {
>> int flags;
>>
>> - flags = (RECORD_TGID * !!sched_tgid_ref) +
>> - (RECORD_CMDLINE * !!sched_cmdline_ref);
>> + flags = (RECORD_TGID * !!READ_ONCE(sched_tgid_ref)) +
>> + (RECORD_CMDLINE * !!READ_ONCE(sched_cmdline_ref));
>>
>> if (!flags)
>> return;
>> @@ -39,8 +39,8 @@ probe_sched_wakeup(void *ignore, struct task_struct *wakee)
>> {
>> int flags;
>>
>> - flags = (RECORD_TGID * !!sched_tgid_ref) +
>> - (RECORD_CMDLINE * !!sched_cmdline_ref);
>> + flags = (RECORD_TGID * !!READ_ONCE(sched_tgid_ref)) +
>> + (RECORD_CMDLINE * !!READ_ONCE(sched_cmdline_ref));
>>
>> if (!flags)
>> return;
>> @@ -89,21 +89,28 @@ static void tracing_sched_unregister(void)
>>
>> static void tracing_start_sched_switch(int ops)
>> {
>> - bool sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
>> + bool sched_register;
>> +
>> mutex_lock(&sched_register_mutex);
>> + sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
>>
>> switch (ops) {
>> case RECORD_CMDLINE:
>> - sched_cmdline_ref++;
>> + WRITE_ONCE(sched_cmdline_ref, sched_cmdline_ref + 1);
>> break;
>>
>> case RECORD_TGID:
>> - sched_tgid_ref++;
>> + WRITE_ONCE(sched_tgid_ref, sched_tgid_ref + 1);
>> break;
>> +
>> + default:
>> + WARN_ONCE(1, "Unsupported tracing op: %d", ops);
>> + goto end;
>> }
>>
>> - if (sched_register && (sched_cmdline_ref || sched_tgid_ref))
>> + if (sched_register)
>> tracing_sched_register();
>> +end:
>> mutex_unlock(&sched_register_mutex);
>> }
>>
>> @@ -113,16 +120,21 @@ static void tracing_stop_sched_switch(int ops)
>>
>> switch (ops) {
>> case RECORD_CMDLINE:
>> - sched_cmdline_ref--;
>> + WRITE_ONCE(sched_cmdline_ref, sched_cmdline_ref - 1);
>> break;
>>
>> case RECORD_TGID:
>> - sched_tgid_ref--;
>> + WRITE_ONCE(sched_tgid_ref, sched_tgid_ref - 1);
>> break;
>> +
>> + default:
>> + WARN_ONCE(1, "Unsupported tracing op: %d", ops);
>> + goto end;
>> }
>>
>> if (!sched_cmdline_ref && !sched_tgid_ref)
>> tracing_sched_unregister();
>> +end:
>> mutex_unlock(&sched_register_mutex);
>> }

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com