Re: [PATCH] x86/split_lock: fix delayed detection enabling

From: Guilherme G. Piccoli
Date: Sun Apr 21 2024 - 14:28:02 EST


On 19/04/2024 08:26, Maksim Davydov wrote:
>
> [...]
> Some information that should be taken into account:
> * sld_update_msr() enables/disables SLD on both CPUs on the same core
> * schedule_delayed_work_on() internally checks WORK_STRUCT_PENDING_BIT.
> If a work has the 'pending' status, then schedule_delayed_work_on()
> will return an error code and, most importantly, the work will not
> be placed in the workqueue.
>
> Let's say we have a multicore system on which split_lock_mitigate=0 and
> a multithreaded application is running that calls splitlock in multiple
> threads. Due to the fact that sld_update_msr() affects the entire core
> (both CPUs), we will consider 2 CPUs from different cores. Let the 2
> threads of this application schedule to CPU0 (core 0) and to CPU 2
> (core 1), then:
>
> | || |
> | CPU 0 (core 0) || CPU 2 (core 1) |
> |_________________________________||___________________________________|
> | || |
> | 1) SPLIT LOCK occured || |
> | || |
> | 2) split_lock_warn() || |
> | || |
> | 3) sysctl_sld_mitigate == 0 || |
> | (work = &sl_reenable) || |
> | || |
> | 4) schedule_delayed_work_on() || |
> | (reenable will be called || |
> | after 2 jiffies on CPU 0) || |
> | || |
> | 5) disable SLD for core 0 || |
> | || |
> | ------------------------- || |
> | || |
> | || 6) SPLIT LOCK occured |
> | || |
> | || 7) split_lock_warn() |
> | || |
> | || 8) sysctl_sld_mitigate == 0 |
> | || (work = &sl_reenable, |
> | || the same address as in 3) ) |
> | || |
> | 2 jiffies || 9) schedule_delayed_work_on() |
> | || fials because the work is in |
>
> | || the pending state since 4). |
> | || The work wasn't placed to the |
> | || workqueue. reenable won't be |
> | || called on CPU 2 |
> | || |
> | || 10) disable SLD for core 0 |
> | || |
> | || From now on SLD will |
> | || never be reenabled on core 1 |
> | || |
> | ------------------------- || |
> | || |
> | 11) enable SLD for core 0 by || |
> | __split_lock_reenable || |
> | || |
>
>
> If the application threads can be scheduled to all processor cores,
> then over time there will be only one core left, on which SLD will be
> enabled and split lock will be able to be detected; and on all other
> cores SLD will be disabled all the time.
> Most likely, this bug has not been noticed for so long because
> sysctl_sld_mitigate default value is 1, and in this case a semaphore
> is used that does not allow 2 different cores to have SLD disabled at
> the same time, that is, strictly only one work is placed in the
> workqueue.
>

Hi Maksim, this is awesome! Thanks a lot for the diagram, super clear now.

Well, I think you nailed it and we should get the patch merged, right?
I'm not sure if the diagram should be included or not in the commit
message - it's good but big, maybe include a lore archive mentioning the
diagram in a V2?

Cheers,


Guilherme