Re: [PATCH v2] PM: EM: Add inotify support when the energy model is updated.
From: Rafael J. Wysocki
Date: Fri May 09 2025 - 12:41:31 EST
On Fri, May 9, 2025 at 12:55 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
>
> Hi Changwoo,
>
> On 5/7/25 02:47, Changwoo Min wrote:
> > The sched_ext schedulers [1] currently access the energy model through
> > debugfs to make energy-aware scheduling decisions [2]. The userspace part
> > of a sched_ext scheduler feeds the necessary (post-processed) energy-model
> > information to the BPF part of the scheduler.
> >
> > However, there is a limitation in the current debugfs support of the energy
> > model. When the energy model is updated (em_dev_update_perf_domain), the
> > userspace part has no way to learn of the change (besides polling the
> > debugfs files).
> >
> > Therefore, add inotify support (IN_MODIFY) when the energy model is updated.
> > With this inotify support, an IN_MODIFY event is generated for the directory
> > of an updated performance domain (e.g., /sys/kernel/debug/energy_model/cpu0)
> > and for its parent directory (e.g., /sys/kernel/debug/energy_model). A
> > sched_ext scheduler (or any userspace application) can then monitor energy
> > model changes in userspace using the regular inotify interface.
> >
> > Note that accessing the energy model information from userspace has many
> > advantages over other alternatives, especially adding new BPF kfuncs. The
> > userspace has much more freedom than the BPF code (e.g., using external
> > libraries and floating point arithmetic), which may be infeasible (if not
> > impossible) in the BPF/kernel code.
> >
> > [1] https://lwn.net/Articles/922405/
> > [2] https://github.com/sched-ext/scx/pull/1624
> >
> > Signed-off-by: Changwoo Min <changwoo@xxxxxxxxxx>
> > ---
> >
> > ChangeLog v1 -> v2:
> > - Change em_debug_update() to only inotify the directory of an updated
> > performance domain (and its parent directory).
> > - Move the em_debug_update() call outside of the mutex lock.
> > - Update the commit message to clarify its motivation and what will be
> > inotified when updated.
> >
> > kernel/power/energy_model.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
>
> I have discussed this with Rafael and we share a similar view:
> the EM debugfs is not the right interface for this purpose.
>
> A better design and mechanism for your purpose would be a netlink
> notification. One already exists in the kernel in the thermal framework
> and is used, e.g., by Intel HFI:
> - drivers/thermal/intel/intel_hfi.c
> - drivers/thermal/thermal_netlink.c
> It can send user space the information from FW about the CPUs'
> efficiency changes, which is similar to this EM modification.

In addition, after this patch

https://lore.kernel.org/linux-pm/3637203.iIbC2pHGDl@xxxxxxxxxxxxx/

which is about to get into linux-next, em_dev_update_perf_domain()
will not be the only place where the Energy Model can be updated.

Thanks!
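
For readers following the thread, here is a minimal sketch of the userspace side that the quoted patch description has in mind: watching a directory for IN_MODIFY through the regular inotify interface. The helper names em_watch_open() and em_watch_wait() are hypothetical and come from neither the patch nor the kernel. The sketch also assumes the kernel actually generates IN_MODIFY for the watched directory. For the EM debugfs directories that is exactly what the patch proposes, and the reviewers above prefer netlink instead.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/inotify.h>

/*
 * Hypothetical helper: create an inotify instance and watch `path`
 * (for example /sys/kernel/debug/energy_model) for IN_MODIFY.
 * Returns the inotify file descriptor, or -1 on failure.
 */
static int em_watch_open(const char *path)
{
	int fd = inotify_init1(IN_CLOEXEC);

	if (fd < 0)
		return -1;
	if (inotify_add_watch(fd, path, IN_MODIFY) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical helper: block until at least one event arrives on `fd`
 * and return the mask of the first event, or 0 on error.
 * The buffer is aligned for struct inotify_event, as inotify(7) suggests.
 */
static uint32_t em_watch_wait(int fd)
{
	char buf[4096]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	const struct inotify_event *ev;
	ssize_t len = read(fd, buf, sizeof(buf));

	if (len < (ssize_t)sizeof(*ev))
		return 0;
	ev = (const struct inotify_event *)buf;
	return ev->mask;
}
```

A caller would open the watch once, then loop on em_watch_wait() and re-read the performance domain files whenever the returned mask has IN_MODIFY set. This replaces periodic polling of the debugfs files with a blocking read.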