RE: [PATCH] perf sched: fix wrong conversion of task state

From: Tomoki Sekiyama
Date: Wed Jul 27 2016 - 23:10:58 EST


On 2016/7/28 1:50, David Ahern wrote:
>>>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>>>> index 0dfe8df..eb2f7f4 100644
>>>> --- a/tools/perf/builtin-sched.c
>>>> +++ b/tools/perf/builtin-sched.c
>>>> @@ -71,6 +71,7 @@ struct sched_atom {
>>>> };
>>>>
>>>> #define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
>>>> +#define TASK_STATE_MASK 0x7ff
>>>
>>> The mask should not be needed and looking at top of tree there are 2 new
>>> states (N and n) that need to be added.
>>
>> I couldn't find the state for 'n'. Where can I find it?
>
> Check Linus' tree -- top of tree; commit 7dc603c9028ea.

Thanks, I was failing to git pull by some network reason..
Will add 'n' too.

>> I have rechecked the code and found that __trace_sched_switch_state()
uses
>> TASK_STATE_MAX for preempted tasks.
>
>since TASK_STATE_MAX is kernel version dependent perf should not rely on
it.

Hmm, that is true (actually 'n' is added recently).
But that means we cannot handle preemption correctly as far as
sched:sched_switch
event uses TASK_STATE_MAX to mark preempted tasks.

Should we stop using TASK_STATE_MAX for preempted tasks in ftrace and
use (1 << 63) or something that doesn't change on kernel version instead?


This causes missing some context switches information by 'perf sched
latency'.
For example, running 2 'yes' commands (inifinitely call write(2)) on the
same
cpu by following steps results in the incomplete result:

$ taskset -c 0 yes > /dev/null &
$ taskset -c 0 yes > /dev/null &
# perf sched record sleep 1
# perf sched latency -p -C 0

like:

----------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms |
----------------------------------------------------------------------
yes:14187 | 499.705 ms | 17 | avg: 12.981 ms |
yes:14188 | 500.350 ms | 14 | avg: 12.023 ms |
gnome-terminal-:12722 | 0.285 ms | 3 | avg: 0.025 ms |
...

where avg delay * switches for yes commands don't cover expected total delay
500 ms although one make another delayed to be executed.
(12.981 ms * 17 = 220.677 ms << 500 ms)

With the patch, perf sched latency shows:
Now 12.838 ms * 39 = 500.682 ms as expected.

----------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms |
----------------------------------------------------------------------
yes:14187 | 499.705 ms | 39 | avg: 12.838 ms |
yes:14188 | 500.350 ms | 40 | avg: 12.506 ms |
gnome-terminal-:12722 | 0.285 ms | 3 | avg: 0.025 ms |
...


Thanks,
Tomoki Sekiyama