Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

From: Dan Williams
Date: Mon Mar 07 2022 - 15:25:48 EST


On Mon, Mar 7, 2022 at 11:55 AM Joseph, Jithu <jithu.joseph@xxxxxxxxx> wrote:
>
>
>
> On 3/7/2022 11:15 AM, Dan Williams wrote:
> > On Mon, Mar 7, 2022 at 11:09 AM Joseph, Jithu <jithu.joseph@xxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 3/7/2022 9:38 AM, Dan Williams wrote:
> >>> On Fri, Mar 4, 2022 at 12:42 PM Joseph, Jithu <jithu.joseph@xxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>>
> >>
> >>>>>> + */
> >>>>>> +static ssize_t details_show(struct device *dev,
> >>>>>> +			    struct device_attribute *attr,
> >>>>>> +			    char *buf)
> >>>>>> +{
> >>>>>> +	unsigned int cpu = dev->id;
> >>>>>> +	int ret;
> >>>>>> +
> >>>>>> +	if (down_trylock(&ifs_sem))
> >>>>>> +		return -EBUSY;
> >>>>>
> >>>>> What is the ifs_sem protecting? This result is immediately invalid
> >>>>> after the lock is dropped anyway, so why hold it over reading the
> >>>>> value? You can't prevent 2 threads racing each other here.
> >>>>
> >>>> The percpu thread running scan_test_worker() will update per_cpu(ifs_state, cpu).scan_details (this lock is acquired before signalling that thread to run).
> >>>> This is so that if the percpu thread is running a test and a user queries its status at the same time, they will see busy.
> >>>
> >>> That begs the question: why would userspace be polling this file? Is it
> >>> because it does not know when a test completes otherwise? How does it
> >>> know that the result it is seeing is from the test it ran and not some
> >>> other invocation to start a new test?
> >>
> >> I think I should have explained the need for locking at a higher level.
> >> It is to make sure that only one of the below actions happens at any time:
> >>
> >> 1. single percpu test
> >> 2. all-cpu test (tests all cores sequentially)
> >> 3. scan binary reload
> >> 4. querying the status
> >>
> >> (There are h/w reasons why we restrict to a single IFS test at any time.)
> >> If 4 happens while 1 or 2 is in progress, we return busy. My earlier explanation was trying to explain why we prevent 4 when 1 or 2 is in progress.
> >>
> >> As to the question of why a user would do 4 while 1 or 2 is in progress: there is no reason to do that, since both run_test variants (percpu and global) are blocking.
> >> But if a user issues a test from one shell and queries the status from another shell while it is still in progress, they will see busy.
> >
> > ...and what about the race that the semaphore cannot solve of test run
> > launch racing collection of previous run results?
>
>
> Pardon me if I am missing something obvious here … but every one of the 4 scenarios listed above tries to acquire the same semaphore, and returns -EBUSY if the trylock fails.
> So in the racing scenario you mention, triggering "run_test" versus viewing "status", below are the 2 interleavings I see:
>
> if "echo 1 > /sys/devices/sysem/cpu/cpu10/ifs/run_test" gets the lock , the parallel "cat /sys/devices/sysem/cpu/cpu10/ifs/status" will return busy (and not the previous status)
> if "cat /sys/devices/sysem/cpu/cpu10/ifs/status", gets the lock it will display the status of the last test result and the parallel "echo 1 > /sys/devices/sysem/cpu/cpu10/ifs/run_test" will fail with busy
>
> This I believe is consistent behavior.

I am speaking of the case where 2 threads are each doing run_test and
then polling for results. Unless you can guarantee that run2 does not
start before the results of run1 have been collected, those results
are lost in that scenario. No amount of kernel locking can resolve
that race to collect the previous result, which would not be a
problem in the first place if there were an atomic way to log test
results.
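
To make the window concrete, here is a minimal, purely illustrative
userspace sketch of that pattern (not from the patch set; it reuses
the cpu10 run_test/status paths from the example above, and error
handling is trimmed):

#include <pthread.h>
#include <stdio.h>

#define RUN_TEST "/sys/devices/system/cpu/cpu10/ifs/run_test"
#define STATUS   "/sys/devices/system/cpu/cpu10/ifs/status"

static void *run_and_collect(void *name)
{
	char result[64];
	FILE *f;

	/* Step 1: kick off a scan; the write blocks until the test ends. */
	f = fopen(RUN_TEST, "w");
	if (f) {
		fputs("1", f);
		/* a write error (e.g. -EBUSY) surfaces on the flush at close */
		if (fclose(f))
			fprintf(stderr, "%s: run_test failed\n", (char *)name);
	}

	/*
	 * The window: between the write above returning and the read
	 * below, the other thread may start (or even finish) a new scan.
	 * The kernel semaphore cannot close this gap because "run" and
	 * "collect" are separate syscalls.
	 */

	/* Step 2: collect the result of "our" run -- or so we hope. */
	f = fopen(STATUS, "r");
	if (f) {
		if (fgets(result, sizeof(result), f))
			printf("%s: status = %s", (char *)name, result);
		fclose(f);
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, run_and_collect, "thread-A");
	pthread_create(&b, NULL, run_and_collect, "thread-B");
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

At step 2, thread-A may see -EBUSY or thread-B's result instead of its
own; that is the loss I am describing, and it goes away only if the
result is reported atomically with the run itself.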