Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

From: Evan Green
Date: Thu Feb 09 2023 - 12:10:18 EST


On Mon, Feb 6, 2023 at 10:32 PM Conor Dooley <conor@xxxxxxxxxx> wrote:
>
> Hey Evan, Greg,
>
>
> On 7 February 2023 06:13:39 GMT, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> >> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> >> system call that quite does this, so let's just provide an arch-specific
> >> one to probe for hardware capabilities. This currently just provides
> >> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> >> the future.
> >
> >Ick, this is exactly what sysfs is designed to export in a sane way.
> >Why not just use that instead? The "key" would be the filename, and the
> >value the value read from the filename. If the key is not present, the
> >file is not present and it's obvious what is happening, no fancy parsing
> >and ABI issues at all.
>
> https://lore.kernel.org/linux-riscv/20221201160614.xpomlqq2fzpzfmcm@kamzik/
>
> This is the sysfs interface that I mentioned Drew
> suggested on the v1.
> I think it fits ~perfectly with what Greg is suggesting too.

Whoops, I'll admit I missed that comment when I reviewed the feedback
from v1. I spent some time thinking about sysfs. The problem is that
this interface will be needed very early in program startup. If we want
to use it in something like an ifunc selector to decide which memcpy to
use, having to open and read a fistful of files is going to be complex
that early, and rough on performance.
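
To make the cost concrete, here is a rough sketch of what fetching a
single key would take with a one-value-per-file layout. The function
name is made up for illustration, and no real sysfs path is implied;
the point is only that every key costs an open/read/close round trip
plus string parsing.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one unsigned value from a
 * one-value-per-file attribute. Returns 0 on success and -1 on
 * failure (for example, when the file for that key is absent).
 */
int read_attr_u64(const char *path, uint64_t *out)
{
	char buf[32];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n <= 0)
		return -1;

	/* Base 0 lets strtoull() accept both "0x489" and "1161". */
	buf[n] = '\0';
	*out = strtoull(buf, NULL, 0);
	return 0;
}
```

That is three system calls per key, before any of the usual libc
conveniences are safe to rely on.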

Really this is data that would go great in the aux vector, except
there's probably too much of it to justify preparing and copying into
every new process. You could point the aux vector into a vDSO data
area. This has the advantage of great performance and no syscall, but
has the disadvantages of making that data ABI, and requiring it all to
be known up front (e.g. the kernel can't compute any answers on the
fly).

After discussions with Palmer, my plan for the next version is to move
this into a vDSO function plus a syscall. Private vDSO data will be
prepped with common answers for the "all CPUs" case, avoiding the need
for a syscall in most cases and making this fast. Since the data is
hidden behind the vDSO function, it's not ABI, which is a plus. Then
the vDSO function can fall back to the syscall for cases with exotic
CPU masks or keys that are unknown/expensive to compute at runtime.
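
Roughly, the shape I have in mind looks like the sketch below. This is
a userspace mock-up, not the actual patch: the struct, key names,
function names, and cached values are all placeholders, and the
"syscall" is a stub that only counts how often the slow path is taken.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key/value pair; names and layout are illustrative only. */
struct hw_pair {
	int64_t key;
	uint64_t value;
};

/* Hypothetical keys for the three IDs the patch currently exposes. */
enum { KEY_MVENDORID, KEY_MARCHID, KEY_MIMPID, NR_CACHED_KEYS };

/* Stand-in for private vDSO data; the values are placeholders. */
static const uint64_t cached_all_cpus[NR_CACHED_KEYS] = { 0x1, 0x2, 0x3 };

/* Counts slow-path entries so the behavior can be checked. */
static int syscall_count;

/* Stub for the real system call, where the kernel would compute answers. */
static int probe_syscall(struct hw_pair *pairs, size_t n, const void *cpus)
{
	(void)pairs;
	(void)n;
	(void)cpus;
	syscall_count++;
	return 0;
}

/*
 * Answer from cached data when the query covers all CPUs (cpus == NULL)
 * and every requested key is cached; otherwise fall back to the syscall.
 */
int probe_vdso(struct hw_pair *pairs, size_t n, const void *cpus)
{
	size_t i;

	if (cpus != NULL)
		return probe_syscall(pairs, n, cpus);

	for (i = 0; i < n; i++)
		if (pairs[i].key < 0 || pairs[i].key >= NR_CACHED_KEYS)
			return probe_syscall(pairs, n, cpus);

	for (i = 0; i < n; i++)
		pairs[i].value = cached_all_cpus[pairs[i].key];
	return 0;
}
```

Because callers only ever see probe_vdso(), the layout of the cached
array can change between kernel versions without breaking anyone.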

-Evan