Re: [PATCH] ARC: Improve cmpxchng syscall implementation

From: Alexey Brodkin
Date: Tue Jun 19 2018 - 03:58:26 EST


Hi Vineet,

On Wed, 2018-04-18 at 11:16 -0700, Vineet Gupta wrote:
> On 03/21/2018 04:54 AM, Alexey Brodkin wrote:
> > /*
> > > > * This is only for old cores lacking LLOCK/SCOND, which by defintion
> > > > @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
> > > > /* Z indicates to userspace if operation succeded */
> > > > regs->status32 &= ~STATUS_Z_MASK;
> > > >
> > > > - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
> > > > - return -EFAULT;
> > > > + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr));
> > > > + if (!ret)
> > > > + goto fail;
> > > >
> > > > +again:
> > > > preempt_disable();
> > > >
> > > > - if (__get_user(uval, uaddr))
> > > > - goto done;
> > > > -
> > > > - if (uval == expected) {
> > > > - if (!__put_user(new, uaddr))
> > > > + ret = __get_user(val, uaddr);
> > > > + if (ret == -EFAULT) {
> > >
> > > Lets see if this warrants adding complexity ! This implies that TLB entry with
> > > Read permissions didn't exist for reading the var and page fault handler could not
> > > wire up even a zero page due to preempt_disable, meaning it was something not
> > > touched by userspace already - sort of uninitialized variable in user code.
> >
> > Ok I completely missed the fact that fast path TLB miss handler is being
> > executed even if we have preemption disabled. So given the mapping exist
> > we do not need to retry with enabled preemption.
> >
> > Still maybe I'm a bit paranoid here but IMHO it's good to be ready for a corner-case
> > when the pointer is completely bogus and there's no mapping for him.
> > I understand that today we only expect this syscall to be used from libc's
> > internals but as long as syscall exists nobody stops anybody from using it
> > directly without libc. So maybe instead of doing get_user_pages_fast() just
> > send a SIGSEGV to the process? At least user will realize there's some problem
> > at earlier stage.
>
> if the pointer is bogus, we currently return -EFAULT, is that not enough ! I'm
> fine if u want to change that to segv.

Ok good.

> > > Otherwise it is extremely unlikely to start with a TLB entry with Read
> > > permissions, followed by syscall Trap only to find the entry missing, unless a
> > > global TLB flush came from other cores, right in the middle. But this syscall is
> > > not guaranteed to work with SMP anyways, so lets ignore any SMP misdoings here.
> >
> > Well but that's exactly the situation I was debugging: we start from data from read-only
> > page and on attempt to write back modified value COW machinery gets involved.
>
> No exactly your situation. In your case the TLB entry *did* exist with Read
> permission. What I was pointing to is that case where it woudl vanish between user
> reading the backing page and making a syscall !

Probably I'm missing something here. Indeed there's already TLB entry with READ permission
and we need ProtV exception to happen to update this entry such that it becomes READ-WRITE
enabled. And in its turn for ProtV exception to happen we need to enable preemption and
execute __put_user().

> > > Now in case it was *an* uninitialized var, do we have to guarantee any well
> > > defined semantics for the kernel emulation of cmpxchg ? IMO it should be fine to
> > > return 0 or -EFAULT etc. Infact -EFAULT is better as it will force a retry loop on
> > > user side, given the typical cmpxchg usage pattern.
> >
> > The problem is libc only expects to get a value read from memory.
> > And in theory expected value might be -14 which is basically -EFAULT.
> > I'm not talking about 0 at all because in some cases that's exactly what
> > user-space expects.
> >
> > So if we read unexpected value then we'll just return it without even attempting
> > to write.
> >
> > If we read expected data but fail to write then we'll send a SIGSEGV and
> > return whatever... let it be -EFAULT - anyways the app will be killed on exit from
> > this syscall.
>
> I'm not sure what you mean here. I'm fine with adding segv kill semantics, but
> don't think complexity for get_user is worth it !

This complexity adds predictability - if we cannot read or write data we kill the app as if
our normal LD/ST fails fro whatever reason.

-Alexey