Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

From: Andy Lutomirski
Date: Thu Mar 15 2018 - 20:38:46 EST


On Thu, Mar 15, 2018 at 9:02 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
> <linux@xxxxxxxxxxxxxxxxxxxx> wrote:
>> Here is a re-spin of the first set of patches which reduce the number of
>> syscall invocations from within the kernel; the RFC may be found at
>>
>> The rationale for this change is described in patch 1 as follows:
>>
>> The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
>> and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
>> through kernel entry points, but not from the kernel itself. This
>> will allow cleanups and optimizations to the entry paths *and* to
>> the parts of the kernel code which currently need to pretend to be
>> userspace in order to make use of syscalls.
>>
>> The whole series can be found at
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-next
>>
>> and will be submitted for merging for the v4.17-rc1 cycle, probably together
>> with another batch of related patches I hope to send out tomorrow as an RFC.
>
> Nice work!
>
> I've already commented on a few patches that now have a kernel-internal
> helper function that takes a __user pointer. I think those are all only used
> in the early boot code (initramfs etc) that runs before we set_fs() to the
> user address space, but it also causes warnings with sparse. If we
> can change all of them to take kernel pointers, that would let us avoid
> the sparse warnings and start running with a normal user address space
> view. Unfortunately, some of the syscalls seem to be harder to change to
> that than others, so not sure if it's worth the effort.

It would be fantastic to get rid of set_fs() entirely and make it
impossible for get_user(), etc. to ever access kernel memory. And this
effort is necessary to ever achieve that.

I don't think this patch series should wait for any of these cleanups,
though. We need these patches to change the x86_64 internal syscall
function signature, which we've been wanting to do for a little while.
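
To illustrate why in-kernel callers get in the way of changing the entry
signature, here is a rough userspace sketch, not kernel code. It assumes
the new-style entry stub receives a single pointer to saved register
state and unpacks its arguments from there. That is an assumption made
for illustration, as are all the names (regs_sketch, sys_sketch_copy,
ksys_sketch_copy). Once in-kernel users call the typed helper directly,
only the stub has to know how the arguments are packed.

```c
/* Userspace sketch only; every name here is hypothetical, not a kernel API. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the user register state saved at syscall entry. */
struct regs_sketch {
	uintptr_t di, si, dx;
};

/* Internal helper: plain typed arguments, safe to call from other code. */
static long ksys_sketch_copy(char *dst, const char *src, size_t len)
{
	if (!dst || !src)
		return -1;
	memcpy(dst, src, len);
	return (long)len;
}

/* Entry stub: the only place that knows how arguments sit in registers. */
static long sys_sketch_copy(const struct regs_sketch *regs)
{
	return ksys_sketch_copy((char *)regs->di, (const char *)regs->si,
				(size_t)regs->dx);
}

/* Exercises both paths; returns 0 if they agree and the copy succeeded. */
int sketch_demo(void)
{
	char out[8] = {0};
	const char in[] = "abc";
	struct regs_sketch regs = {
		.di = (uintptr_t)out,
		.si = (uintptr_t)in,
		.dx = sizeof(in),
	};

	/* Path 1: what the entry code would do on behalf of userspace. */
	long via_stub = sys_sketch_copy(&regs);
	/* Path 2: what an internal caller does once it uses the helper. */
	long direct = ksys_sketch_copy(out, in, sizeof(in));

	assert(via_stub == direct);
	return (via_stub == (long)sizeof(in) && strcmp(out, "abc") == 0) ? 0 : 1;
}
```

With this split, changing how sys_sketch_copy() receives its arguments
touches the stub alone; callers of ksys_sketch_copy() are unaffected.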