Re: [RFC PATCH v7 1/7] Restartable sequences system call

From: Andy Lutomirski
Date: Thu Aug 11 2016 - 03:23:48 EST


On Aug 11, 2016 12:01 AM, "Mathieu Desnoyers"
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> ----- On Aug 10, 2016, at 4:09 PM, Andy Lutomirski luto@xxxxxxxxxxxxxx wrote:
>
> > On Wed, Aug 10, 2016 at 1:06 PM, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> <snip>
>
> >>> u64 is a perfectly valid, if odd, userspace pointer on all
> >>> architecures that I know of, and it's certainly a valid userspace
> >>> pointer on x86 32-bit userspace (the high bits will just all be zero).
> >>> Can you just use u64?
> >>
> >> My concern is about a 32-bit user-space putting garbage rather than zeroes
> >> (on purpose) to fool the kernel on those upper 32 bits. Doing
> >>
> >> compat_ptr((compat_uptr_t)rseq_cs.start_ip)
> >>
> >> effectively ends up clearing the upper 32 bits.
> >>
> >> But since we only use those pointer values for comparisons, perhaps we
> >> just don't care if a 32-bit userspace app try to shoot itself in
> >> the foot by passing garbage upper 32 bits ?
> >>
> >
> > How is garbage in the high bits any different than garbage in any
> > other bits in there?
>
> It's not :)
>
> >
> >>
> >>> If this would be a performance problem on ARM, then maybe that's a
> >>> reason to use compat helpers.
> >>
> >> We already use 64-bit values for the pointers, even on 32-bit. Normally
> >> userspace just puts zeroes in the top bits. It's mostly a question of
> >> clearing the top 32 bits or not when loading them in the kernel. If we
> >> don't need to, then I can remove the compat code entirely, and we don't
> >> care about user_64bit_mode() anymore, as you initially recommended.
> >> Does it make sense ?
> >
> > Yes, I think so. I'd suggest just honoring all the bits.
>
> OK, will do !
>
> >
> >>
> >>>
> >>>>
> >>>>>
> >>>>>
> >>>>>>>> +SYSCALL_DEFINE2(rseq, struct rseq __user *, rseq, int, flags)
> >>>>>>>> +{
> >>>>>>>> + if (unlikely(flags))
> >>>>>>>> + return -EINVAL;
> >>>>>>>
> >>>>>>> (add whitespace)
> >>>>>>
> >>>>>> fixed.
> >>>>>>
> >>>>>>>
> >>>>>>>> + if (!rseq) {
> >>>>>>>> + if (!current->rseq)
> >>>>>>>> + return -ENOENT;
> >>>>>>>> + return 0;
> >>>>>>>> + }
> >>>>>
> >>>>> This looks entirely wrong. Setting rseq to NULL fails if it's already
> >>>>> NULL but silently does nothing if rseq is already set? Surely it
> >>>>> should always succeed and it should actually do something if rseq is
> >>>>> set.
> >>>>
> >>>> From the proposed rseq(2) manpage:
> >>>>
> >>>> "A NULL rseq value can be used to check whether rseq is registered
> >>>> for the current thread."
> >>>>
> >>>> The implementation does just that: it returns -1, errno=ENOENT if no
> >>>> rseq is currently registered, or 0 if rseq is currently registered.
> >>>
> >>> I think that's problematic. Why can't you unregister an existing
> >>> rseq? If you can't, how is a thread supposed to clean up after
> >>> itself?
> >>>
> >>
> >> Unregistering an existing thread rseq would require that we keep reference
> >> counting, in case multiple libs and/or the app are using rseq. I am
> >> trying to keep things as simple as needed.
> >>
> >> If I understand your concern, the problematic scenario would be at
> >> thread exit (this is my current approximate understanding of glibc
> >> handling of library TLS variable reclaim at thread exit):
> >>
> >> thread exits in userspace:
> >> - glibc frees its rseq TLS memory area (in case the TLS is in a library),
> >> - thread preempted before really exiting,
> >> - kernel reads/writes to freed TLS memory.
> >> - corruption may occur (e.g. memory re-allocated by another thread already)
> >>
> >> Am I getting it right ?
> >
> > Yes.
>
> Hrm, then we should:
>
> - add a rseq_refcount field to the task struct,
> - increment this refcount whenever rseq receives a registration, after
> ensuring that we are registering the same address as was previously
> requested by preceding registrations for the thread (except if the
> refcount was 0),
> - When rseq receives a NULL address, decrement refcount. Set address to
> NULL when it reaches 0.
>
> Doing the refcounting in kernel-space rather than user-space allows us to
> keep both registration/unregistration and refcount atomic, which simplify
> things if we plan to use rseq from signal handlers.
>
> With current glibc, a library that would lazily register and use rseq
> without knowledge of the application would then have to use pthread_key_create()
> to set a destr_function to run at thread exit, which would take care of
> unregistration.

That sounds reasonable at first glance.

>
> We could add a RSEQ_FORCE_UNREGISTER flag to rseq flags to allow future
> glibc versions to force unregistering rseq before freeing its TLS memory,
> just in case a userspace library omits to unregister itself.

Sounds good too.