Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID

From: Mathieu Desnoyers
Date: Tue Jul 07 2020 - 06:48:19 EST


----- On Jul 7, 2020, at 3:29 AM, Florian Weimer fw@xxxxxxxxxxxxx wrote:

> * Mathieu Desnoyers:
>
>> commit 93b585c08d16 ("Fix: sched: unreliable rseq cpu_id for new tasks")
>> addresses an issue with cpu_id field of newly created processes. Expose
>> a flag which can be used by user-space to query whether the kernel
>> implements this fix.
>>
>> Considering that this issue can cause corruption of user-space per-cpu
>> data updated with rseq, it is recommended that user-space detects
>> availability of this fix by using the RSEQ_FLAG_RELIABLE_CPU_ID flag
>> either combined with registration or on its own before using rseq.
>
> Presumably, the intent is that glibc uses RSEQ_FLAG_RELIABLE_CPU_ID to
> register the rseq area. That will surely prevent glibc itself from
> activating rseq on broken kernels. But if another rseq library
> performs registration and has not been updated to use
> RSEQ_FLAG_RELIABLE_CPU_ID, we still end up with an active rseq area
> (and incorrect CPU IDs from sched_getcpu in glibc). So further glibc
> changes will be needed. I suppose we could block third-party rseq
> registration with a registration of a hidden rseq area (not
> __rseq_abi). But then the question is if any of the third-party rseq
> users are expecting the EINVAL error code from their failed
> registration.
>
> The rseq registration state machine is quite tricky already, and the
> need to use RSEQ_FLAG_RELIABLE_CPU_ID would make it even more
> complicated. Even if we implemented all the changes, it's all going
> to be essentially dead, untestable code in a few months, when the
> broken kernels are out of circulation. It does not appear to be good
> investment to me.

Those are very good points. One possibility we have would be to let
glibc do the rseq registration without the RSEQ_FLAG_RELIABLE_CPU_ID
flag. On kernels with the bug present, the cpu_id field is still good
enough for typical uses of sched_getcpu() which does not appear to
have a very strict correctness requirement on returning the right
cpu number.

Then libraries and applications which require a reliable cpu_id field
could check this on their own by calling rseq with the
RSEQ_FLAG_RELIABLE_CPU_ID flag. This would not make the state more
complex in __rseq_abi, and let each rseq user decide about its own fate:
whether it uses rseq or keeps using an rseq-free fallback.

I am still tempted to allow combining RSEQ_FLAG_REGISTER | RSEQ_FLAG_RELIABLE_CPU_ID
for applications which would not be using glibc, and want to check this flag on
thread registration.

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com