Re: [PATCH 11/11 v2] ftrace: Add recording of functions that caused recursion

From: Steven Rostedt
Date: Mon Nov 02 2020 - 12:46:20 EST


On Mon, 2 Nov 2020 12:37:21 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:


> The only race that I see that can happen, is the one in the comment I
> showed. And that is after enabling the recursed functions again after
> clearing, one CPU could add a function while another CPU that just added
> that same function could be just exiting this routine, notice that a
> clearing of the array happened, and remove its function (which was the same
> as the one just happened). So we get a "zero" in the array. If this
> happens, it is likely that that function will recurse again and will be
> added later.
>

Updated version of this function:

-- Steve


void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
{
int index = 0;
int i;
unsigned long old;

again:
/* First check the last one recorded */
if (ip == cached_function)
return;

i = atomic_read(&nr_records);
/* nr_records is -1 when clearing records */
smp_mb__after_atomic();
if (i < 0)
return;

/*
* If there's two writers and this writer comes in second,
* the cmpxchg() below to update the ip will fail. Then this
* writer will try again. It is possible that index will now
* be greater than nr_records. This is because the writer
* that succeeded has not updated the nr_records yet.
* This writer could keep trying again until the other writer
* updates nr_records. But if the other writer takes an
* interrupt, and that interrupt locks up that CPU, we do
* not want this CPU to lock up due to the recursion protection,
* and have a bug report showing this CPU as the cause of
* locking up the computer. To not lose this record, this
* writer will simply use the next position to update the
* recursed_functions, and it will update the nr_records
* accordingly.
*/
if (index < i)
index = i;
if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
return;

for (i = index - 1; i >= 0; i--) {
if (recursed_functions[i].ip == ip) {
cached_function = ip;
return;
}
}

cached_function = ip;

/*
* We only want to add a function if it hasn't been added before.
* Add to the current location before incrementing the count.
* If it fails to add, then increment the index (save in i)
* and try again.
*/
old = cmpxchg(&recursed_functions[index].ip, 0, ip);
if (old != 0) {
/* Did something else already added this for us? */
if (old == ip)
return;
/* Try the next location (use i for the next index) */
index++;
goto again;
}

recursed_functions[index].parent_ip = parent_ip;

/*
* It's still possible that we could race with the clearing
* CPU0 CPU1
* ---- ----
* ip = func
* nr_records = -1;
* recursed_functions[0] = 0;
* i = -1
* if (i < 0)
* nr_records = 0;
* (new recursion detected)
* recursed_functions[0] = func
* cmpxchg(recursed_functions[0],
* func, 0)
*
* But the worse that could happen is that we get a zero in
* the recursed_functions array, and it's likely that "func" will
* be recorded again.
*/
i = atomic_read(&nr_records);
smp_mb__after_atomic();
if (i < 0)
cmpxchg(&recursed_functions[index].ip, ip, 0);
else if (i <= index)
atomic_cmpxchg(&nr_records, i, index + 1);
}