Re: [PATCH v3 2/7] RISC-V: Add a syscall for HW probing

From: Arnd Bergmann
Date: Thu Feb 23 2023 - 05:06:18 EST


On Tue, Feb 21, 2023, at 20:08, Evan Green wrote:
> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> system call that quite does this, so let's just provide an arch-specific
> one to probe for hardware capabilities. This currently just provides
> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> the future.
>
> Co-developed-by: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
> Signed-off-by: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
> Signed-off-by: Evan Green <evan@xxxxxxxxxxxx>

I'm still skeptical about the need for a custom syscall interface here.
I had not looked at the interface so far, but there are a few things
that stick out:

> +RISC-V Hardware Probing Interface
> +---------------------------------
> +
> +The RISC-V hardware probing interface is based around a single syscall, which
> +is defined in <asm/hwprobe.h>::
> +
> +    struct riscv_hwprobe {
> +        __s64 key;
> +        __u64 value;
> +    };

The way this is defined, the kernel will always have to know
about the specific set of features; it can't just forward
unknown features to user space after probing them from an
architected hardware interface or from DT.

If 'key' is just an enumerated value with a small number of
possible values, I don't see anything wrong with using ELF
aux data. I understand it's hard to know how many keys
might be needed in the long run, but from the way you define
the key/value pairs here, I would expect it to have a lot
of the same limitations that the aux data has, except for
a few bytes to be copied.
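
For comparison, a minimal sketch of what the aux data route looks like
from user space, assuming glibc's getauxval() and a capability that fits
in one bit of AT_HWCAP. The bit number and the function names are made
up for illustration; real bit positions are per-architecture:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP */

/* Hypothetical bit position, for illustration only. */
#define EXAMPLE_FEATURE_BIT 3

/* Test a single bit in a capability word. */
static bool hwcap_has_bit(unsigned long hwcap, unsigned int bit)
{
	assert(bit < 8 * sizeof(hwcap));
	return (hwcap >> bit) & 1UL;
}

/*
 * Look up AT_HWCAP in the aux vector the kernel handed to the
 * process at exec time; no system call is made here.
 */
static bool cpu_has_example_feature(void)
{
	return hwcap_has_bit(getauxval(AT_HWCAP), EXAMPLE_FEATURE_BIT);
}
```

The lookup only reads data already placed in the process at exec time,
which is why the cost difference against a key/value syscall should come
down to a few copied bytes.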

> +    long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> +                           size_t cpu_count, cpu_set_t *cpus,
> +                           unsigned long flags);

The cpu set argument worries me more: there should never be a
need to optimize for broken hardware that has an asymmetric set
of features. Just let the kernel figure out the minimum set
of features that works across all CPUs and report that like we
do with HWCAP. If there is an SoC that is so broken that it has
important features on a subset of cores that some user might
actually want to rely on, then have them go through the slow
sysfs interface for probing the CPUs individually, but don't make
the broken case easier at the expense of normal users that
run on working hardware.
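
As a rough sketch of the "minimum set" idea, assuming each CPU's
features are already available as a bitmask (the array layout and the
function name are hypothetical, not the kernel's actual data
structures), the reported value is just the intersection:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Intersect per-CPU feature bitmasks: a bit survives only if every
 * CPU has it set, so the result is safe on whichever CPU a task
 * ends up running.
 */
static uint64_t common_features(const uint64_t *per_cpu, size_t ncpus)
{
	uint64_t common = ~UINT64_C(0);
	size_t i;

	assert(per_cpu != NULL || ncpus == 0);
	if (ncpus == 0)
		return 0;	/* no CPUs, nothing to report */

	for (i = 0; i < ncpus; i++)
		common &= per_cpu[i];

	return common;
}
```

With this, user space gets a single answer and never needs to pass a
cpu set at all.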

> +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t, uintptr_t,
> +				  uintptr_t, uintptr_t);

Why 'uintptr_t' rather than the correct type?

Arnd