Re: [PATCH v2 2/5] cpufreq: intel_pstate: Always return last EPP value from sysfs

From: Srinivas Pandruvada
Date: Tue Aug 25 2020 - 11:06:25 EST


On Tue, 2020-08-25 at 16:51 +0200, Rafael J. Wysocki wrote:
> On Tue, Aug 25, 2020 at 8:20 AM Artem Bityutskiy <dedekind1@xxxxxxxxx>
> wrote:
> > On Mon, 2020-08-24 at 19:42 +0200, Rafael J. Wysocki wrote:
> > > From: "Rafael J. Wysocki" <rafael.j.wysocki@xxxxxxxxx>
> > >
> > > Make the energy_performance_preference policy attribute in sysfs
> > > always return the last EPP value written to it instead of the one
> > > currently in the HWP Request MSR to avoid possible confusion when
> > > the performance scaling algorithm is used in the active mode with
> > > HWP enabled (in which case the EPP is forced to 0 regardless of
> > > what value it has been set to via sysfs).
> >
> > Why is this a good idea, I wonder. If there was a prior discussion,
> > please, point to it.
> >
> > The general approach to changing settings via sysfs is often like
> > this:
> >
> > 1. Write new value.
> > 2. Read it back and verify that it is the same. Because there is no
> > better way to verify that the kernel "accepted" the value.
>
> If the write is successful (i.e. no errors returned and the value
> returned is equal to the number of written characters), the kernel
> *has* accepted the written value, but it may not have taken effect.
> These are two different things.
>
> The written value may take effect immediately or it may take effect
> later, depending on the current configuration etc. If you don't see
> the effect of it immediately, it doesn't mean that there was a failure
> of some sort.
>
> > Let's say I write 'balanced' to energy_performance_preference. I read
> > it back, and it contains 'balanced', so I am happy, I trust the kernel
> > changed EPP to "balanced".
> >
> > If the kernel, in fact, uses something else, I want to know about it
> > and have my script fail.
>
> Why do you want it to fail then?
>
> > Why is caching the value and making my script _think_ it succeeded
> > a good idea?
>
> Because when you change the scaling algorithm or the driver's
> operation mode, the value you have written will take effect.
>
> In this particular case it is explained in the driver documentation
> that the performance scaling algorithm in the active mode overrides
> the sysfs value and that's the only case when it can be overridden.
> So whatever you write to this attribute will not take effect
> immediately anyway, but it may take effect later.

In some cases this happens even without changing between active and
passive mode, when there was some error previously. For example:

#cat energy_performance_preference
127
[root@otcpl-perf-test-skx-i9 cpufreq]# rdmsr -p 1 0x774
8000ff00

I think we should show reality. A mode change can be a special case
that uses the stored value to restore the EPP in the new mode.
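
To spell out the mismatch in the example above, here is a minimal sketch
that decodes the rdmsr output. It assumes the IA32_HWP_REQUEST (MSR 0x774)
field layout from the Intel SDM, with EPP in bits 31:24;
decode_hwp_request() is a hypothetical helper name, not something from
the driver or from msr-tools.

```python
# Decode the low 32 bits of IA32_HWP_REQUEST (MSR 0x774).
# Assumed layout (Intel SDM): bits 7:0 minimum performance,
# bits 15:8 maximum performance, bits 23:16 desired performance,
# bits 31:24 energy performance preference (EPP).
# decode_hwp_request is a hypothetical helper for illustration only.

def decode_hwp_request(value):
    """Split a raw HWP Request value into its low four byte fields."""
    return {
        "min_perf": value & 0xFF,
        "max_perf": (value >> 8) & 0xFF,
        "desired_perf": (value >> 16) & 0xFF,
        "epp": (value >> 24) & 0xFF,
    }

# Value reported by "rdmsr -p 1 0x774" in the example above.
fields = decode_hwp_request(0x8000FF00)
print(fields["epp"])  # 128 (0x80), while sysfs reported 127
```

So the register holds EPP 128 while the sysfs attribute reads 127.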

Thanks,
Srinivas

> > In other words, in my usage scenarios at least, I prefer the kernel
> > telling the true EPP value, not some "cached, but not used" value.
>
> An alternative is to fail writes to energy_performance_preference if
> the driver works in the active mode and the scaling algorithm for the
> scaling CPU is performance and *then* to make reads from it return the
> value in the register.
>
> Accepting a write and returning a different value in a subsequent read
> is confusing.
>
> Thanks!