Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

From: Greg KH
Date: Fri Feb 10 2023 - 01:48:30 EST


On Thu, Feb 09, 2023 at 05:22:09PM +0000, Jessica Clarke wrote:
> On 9 Feb 2023, at 17:13, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, Feb 09, 2023 at 09:09:16AM -0800, Evan Green wrote:
> >> On Mon, Feb 6, 2023 at 10:32 PM Conor Dooley <conor@xxxxxxxxxx> wrote:
> >>>
> >>> Hey Evan, Greg,
> >>>
> >>>
> >>> On 7 February 2023 06:13:39 GMT, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >>>> On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> >>>>> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> >>>>> system call that quite does this, so let's just provide an arch-specific
> >>>>> one to probe for hardware capabilities. This currently just provides
> >>>>> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> >>>>> the future.
> >>>>
> >>>> Ick, this is exactly what sysfs is designed to export in a sane way.
> >>>> Why not just use that instead? The "key" would be the filename, and the
> >>>> value the value read from the filename. If the key is not present, the
> >>>> file is not present and it's obvious what is happening, no fancy parsing
> >>>> and ABI issues at all.
> >>>
> >>> https://lore.kernel.org/linux-riscv/20221201160614.xpomlqq2fzpzfmcm@kamzik/
> >>>
> >>> This is the sysfs interface that I mentioned Drew suggested on v1.
> >>> I think it fits ~perfectly with what Greg is suggesting too.
> >>
> >> Whoops, I'll admit I missed that comment when I reviewed the feedback
> >> from v1. I spent some time thinking about sysfs. The problem is this
> >> interface will be needed in places like very early program startup. If
> >> we're trying to use this in places like the ifunc selector to decide
> >> which memcpy to use, having to go open and read a fistful of files is
> >> going to be complex that early, and rough on performance.
> >
> > How is it going to be any different on "performance" than a syscall? Or
> > complex? It should be almost identical overall as this is all in-ram
> > and not any real I/O is happening. You are limited only by the speed of
> > your cpu.
> >
> >> Really this is data that would go great in the aux vector, except
> >> there's probably too much of it to justify preparing and copying into
> >> every new process. You could point the aux vector into a vDSO data
> >> area. This has the advantage of great performance and no syscall, but
> >> has the disadvantages of making that data ABI, and requiring it all to
> >> be known up front (eg the kernel can't compute any answers on the
> >> fly).
> >>
> >> After discussions with Palmer, my plan for the next version is to move
> >> this into a vDSO function plus a syscall. Private vDSO data will be
> >> prepped with common answers for the "all CPUs" case, avoiding the need
> >> for a syscall in most cases and making this fast. Since the data is
> >> hidden behind the vdso function, it's not ABI, which is a plus. Then
> >> the vdso function can fall back to the syscall for cases with exotic
> >> CPU masks or keys that are unknown/expensive to compute at runtime.
> >
> > I still think that's wrong, as you are wanting a set of key/values here,
> > which is exactly what sysfs is designed for.
>
> But this needs to be a RISC-V standard interface that can be programmed
> against, not something tied to highly Linux-specific things like sysfs.
> You’re free to implement that interface with sysfs, but exposing that
> as *the* interface to use would be terrible for portability.

A vdso and a new kernel syscall are also highly Linux-specific things,
so I do not understand the objection here at all. You're going to have
to wrap all of this up in some sort of common userspace library code
anyway, and that will have to handle all of the different operating
system implementations.
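
To make that concrete, here is a rough sketch of what the Linux side of
such a wrapper could look like if the values were exported as one file
per key. Everything in it is an assumption for illustration only: the
function name hwprobe_read_key() is made up, the directory is passed in
by the caller because no sysfs layout has been agreed on in this thread,
and the one-integer-per-file format is just the usual sysfs convention,
not something any patch here defines.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not an existing API: read one key, stored as a
 * file named `key` under `base_dir`, and parse it as an integer.
 *
 * Returns 0 on success, -ENOENT if the key (file) is not present, or
 * another negative errno value on failure.
 */
static int hwprobe_read_key(const char *base_dir, const char *key,
                            uint64_t *value)
{
    char path[512];
    char buf[64];
    char *end;
    unsigned long long v;
    FILE *f;
    int n;

    /* The key is the filename. */
    n = snprintf(path, sizeof(path), "%s/%s", base_dir, key);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    /* A missing file means the key is not supported. */
    f = fopen(path, "r");
    if (!f)
        return -errno;

    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -EIO;
    }
    fclose(f);

    /* Base 0 accepts both decimal and 0x-prefixed hex values. */
    errno = 0;
    v = strtoull(buf, &end, 0);
    if (errno || end == buf)
        return -EINVAL;

    *value = v;
    return 0;
}
```

A caller would treat -ENOENT as "this key does not exist here" and pick
its fallback, and the same function signature could be backed by
something else entirely on another operating system.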

Also, frankly, I don't care about non-Linux implementations, so that
isn't a valid argument here :)

thanks,

greg k-h