Re: [PATCH v2] PM: EM: Add inotify support when the energy model is updated.

From: Changwoo Min
Date: Sat May 10 2025 - 00:41:03 EST


Hi Lukasz and Rafael,

Thank you for the pointers and guidance.

On 5/9/25 19:55, Lukasz Luba wrote:
Hi Changwoo,

On 5/7/25 02:47, Changwoo Min wrote:
The sched_ext schedulers [1] currently access the energy model through the
debugfs to make energy-aware scheduling decisions [2]. The userspace part
of a sched_ext scheduler feeds the necessary (post-processed) energy-model
information to the BPF part of the scheduler.

However, there is a limitation in the current debugfs support of the energy
model. When the energy model is updated (em_dev_update_perf_domain), there
is no way for the userspace part to know such changes (besides polling the
debugfs files).

Therefore, add inotify support (IN_MODIFY) when the energy model is updated.
With this inotify support, the directory of an updated performance domain
(e.g., /sys/kernel/debug/energy_model/cpu0) and its parent directory (e.g.,
/sys/kernel/debug/energy_model) receive an IN_MODIFY event. A sched_ext
scheduler (or any userspace application) can therefore monitor energy model
changes from userspace using the regular inotify interface.

Note that accessing the energy model information from userspace has many
advantages over other alternatives, especially adding new BPF kfuncs. The
userspace has much more freedom than the BPF code (e.g., using external
libraries and floating-point arithmetic), which may be infeasible (if not
impossible) in the BPF/kernel code.

[1] https://lwn.net/Articles/922405/
[2] https://github.com/sched-ext/scx/pull/1624

Signed-off-by: Changwoo Min <changwoo@xxxxxxxxxx>
---

ChangeLog v1 -> v2:
   - Change em_debug_update() to only inotify the directory of an updated
     performance domain (and its parent directory).
   - Move the em_debug_update() call outside of the mutex lock.
   - Update the commit message to clarify its motivation and what will be
     inotified when updated.

  kernel/power/energy_model.c | 12 ++++++++++++
  1 file changed, 12 insertions(+)


I have discussed that with Rafael and we have a similar view.
The EM debugfs is not the right interface for this purpose.

A better design and mechanism for your purpose would be a netlink
notification. This already exists in the kernel's thermal framework
and is used, e.g., by Intel HFI:
- drivers/thermal/intel/intel_hfi.c
- drivers/thermal/thermal_netlink.c
It can send information from the FW about changes in the CPUs'
efficiency to user space, which is similar to this EM modification.


I have considered netlink before. However, I chose the debugfs-inotify
path since it requires fewer changes.

However, if the netlink interface is better for this purpose (I agree
*debugfs* is not ideal), sure, let's go in that direction.


Would you be interested in writing a similar mechanism in the EM framework?

Sure, I will work on it and send another patch set.


Regards,
Lukasz
