Re: Signal handling in a page fault handler

From: Matthew Wilcox
Date: Wed Apr 04 2018 - 10:39:07 EST


On Wed, Apr 04, 2018 at 11:32:54AM +0200, Daniel Vetter wrote:
> So we've done some experiments for the case where the fault originated
> from kernel context (copy_to|from_user and friends). The fixup code seems
> to retry the copy once after the fault (in copy_user_handle_tail), if that
> fails again we get a short read/write. This might result in an -EFAULT,
> short read()/write() or anything else really, depending upon the syscall
> api.
>
> Except in some code paths in gpu drivers where we convert anything into
> -ERESTARTSYS/EINTR if there's a signal pending it won't ever result in the
> syscall getting restarted (well except maybe short read/writes if
> userspace bothers with that).
>
> So I guess gpu fault handlers indeed break the kernel's expectations, but
> then I think we're getting away with that because the inner workings of
> gpu memory objects is all heavily abstracted away by opengl/vulkan and
> friends.
>
> I guess what we could do is try to only do killable sleeps if it's a
> kernel fault, but that means wiring a flag through all the callchains. Not
> pretty. Except when there's a magic set of functions that would convert
> all interruptible sleeps to killable ones only for us.

I actually have plans to allow mutex_lock_{interruptible,killable} to
return -EWOULDBLOCK if a flag is set. So this doesn't seem entirely
unrelated. Something like this perhaps:

struct task_struct {
+ unsigned int sleep_state;
};

static noinline int __sched
-__mutex_lock_interruptible_slowpath(struct mutex *lock)
+__mutex_lock_slowpath(struct mutex *lock, long state)
{
- return __mutex_lock(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_);
+ if (state == TASK_NOBLOCK)
+ return -EWOULDBLOCK;
+ return __mutex_lock(lock, state, 0, NULL, _RET_IP_);
}

+int __sched mutex_lock_state(struct mutex *lock, long state)
+{
+ might_sleep();
+
+ if (__mutex_trylock_fast(lock))
+ return 0;
+
+ return __mutex_lock_slowpath(lock, state);
+}
+EXPORT_SYMBOL(mutex_lock_state);

Then the page fault handler can do something like:

old_state = current->sleep_state;
current->sleep_state = TASK_INTERRUPTIBLE;
...
current->sleep_state = old_state;


This has the page-fault-in-a-signal-handler problem. I don't know if
there's a way to determine if we're already in a signal handler and use
a different sleep_state ...?