Re: [PATCH v5 5/9] rv: Retry when da monitor detects race conditions
From: Nam Cao
Date: Mon Jul 28 2025 - 10:23:59 EST
On Mon, Jul 28, 2025 at 03:50:17PM +0200, Gabriele Monaco wrote:
> A DA monitor can be accessed from multiple cores simultaneously. This is
> likely, for instance, with per-task monitors reacting to events that do
> not always occur on the CPU where the task is running.
> This can cause race conditions where two events change the next state
> and we see inconsistent values. E.g.:
>
> [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
> [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
> [63] error_srs: 27: event sched_switch_suspend not expected in the state running
>
> In this case the monitor fails because the event on CPU 62 wins against
> the one on CPU 63, although the correct state should have been
> sleepable, since the task gets suspended.
>
> Detect if the current state was modified by using try_cmpxchg while
> storing the next value. If it was, read the current state again and
> retry. After a maximum number of failed retries, react by calling a
> special tracepoint, printing to the console and resetting the monitor.
>
> Remove the functions da_monitor_curr_state() and da_monitor_set_state()
> as they only hide the underlying implementation in this case.
>
> Monitors where this type of condition can occur must be able to account
> for racing events in any possible order, as we cannot know the winner.
>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Signed-off-by: Gabriele Monaco <gmonaco@xxxxxxxxxx>
Reviewed-by: Nam Cao <namcao@xxxxxxxxxxxxx>
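
For readers following the thread, the retry scheme described in the commit
message can be sketched in userspace C. This is only an illustration under
stated assumptions, not the code from the patch: C11
atomic_compare_exchange_strong() stands in for the kernel's try_cmpxchg()
(both write the observed value back into the expected-value argument on
failure), and the state and event names, the transition table, MAX_RETRIES,
and the reset-to-initial-state behaviour are all hypothetical. Only three
table entries come from the trace above: sleepable x wakeup -> running,
sleepable x set_state_sleepable -> sleepable, and switch_suspend being
rejected in running.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical two-state model loosely based on the trace in the commit message. */
enum state { SLEEPABLE, RUNNING, STATE_MAX, INVALID_STATE = STATE_MAX };
enum event { WAKEUP, SET_STATE_SLEEPABLE, SWITCH_SUSPEND, EVENT_MAX };
enum result { EVENT_OK, EVENT_REJECTED, EVENT_RACE };

#define MAX_RETRIES 3 /* assumed bound, not taken from the patch */

/* transitions[state][event]; entries not visible in the trace are assumptions. */
static const int transitions[STATE_MAX][EVENT_MAX] = {
	[SLEEPABLE] = {
		[WAKEUP]              = RUNNING,
		[SET_STATE_SLEEPABLE] = SLEEPABLE,
		[SWITCH_SUSPEND]      = SLEEPABLE,     /* assumed */
	},
	[RUNNING] = {
		[WAKEUP]              = RUNNING,       /* assumed */
		[SET_STATE_SLEEPABLE] = SLEEPABLE,     /* assumed */
		[SWITCH_SUSPEND]      = INVALID_STATE,
	},
};

struct monitor {
	_Atomic int curr_state;
};

static enum result monitor_event(struct monitor *m, enum event ev)
{
	int curr = atomic_load(&m->curr_state);

	for (int i = 0; i < MAX_RETRIES; i++) {
		int next = transitions[curr][ev];

		if (next == INVALID_STATE)
			return EVENT_REJECTED;

		/*
		 * Store next only if the state is still curr. On failure,
		 * curr is updated to the value another CPU stored, so the
		 * next iteration recomputes the transition from it.
		 */
		if (atomic_compare_exchange_strong(&m->curr_state, &curr, next))
			return EVENT_OK;
	}

	/* Too many lost races: report and reset (stand-in for the tracepoint). */
	fprintf(stderr, "monitor: gave up after %d retries\n", MAX_RETRIES);
	atomic_store(&m->curr_state, SLEEPABLE);
	return EVENT_RACE;
}
```

The point of recomputing next from the refreshed curr is that an event which
loses the race is evaluated against the state the winner left behind, rather
than overwriting it with a transition computed from a stale state.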