[PATCH 6/9] perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task

From: Yunlong Song
Date: Tue Mar 31 2015 - 09:45:04 EST


wait_for_tasks() calls sem_wait() for each task, e.g.
sem_wait(&task->work_done_sem). sem_wait() returns only when the value
of work_done_sem is greater than 0; otherwise it blocks. In perf sched
replay, one task may sem_post() the work_done_sem of another task, so
the operations on that task's work_done_sem happen in a defined order,
e.g. sem_post, sem_wait, sem_wait, sem_post... This order reproduces
the scheduling of the tasks that were running when perf sched record
ran. Replay therefore needs every task, and every task's thread must be
created successfully. If one task (task A) fails to create its thread,
another task (task B), whose work_done_sem needs a sem_post from the
failed task A, will likely block in sem_wait. This is a dead halt,
since task B's thread_func cannot make any progress. To solve this
problem, perf sched replay should exit as soon as any task fails to
create its thread.
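
The dependency described above can be reproduced outside perf with a
small sketch. It is not the code in builtin-sched.c: struct demo_task,
poster_func() and demo_replay() are hypothetical names used only for
illustration, and sem_timedwait() stands in for sem_wait() so that the
demonstration returns instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a replayed task (task B in the text). */
struct demo_task {
	sem_t work_done_sem;
};

/* Plays the role of task A: its only job is to post B's semaphore. */
static void *poster_func(void *arg)
{
	struct demo_task *b = arg;

	sem_post(&b->work_done_sem);
	return NULL;
}

/*
 * Returns 0 if B's wait was satisfied, -1 if it timed out.
 * With a plain sem_wait() the timeout case would block forever.
 */
static int demo_replay(bool create_poster)
{
	struct demo_task b;
	struct timespec deadline;
	pthread_t poster;
	bool created = false;
	int ret;

	ret = sem_init(&b.work_done_sem, 0, 0);
	assert(ret == 0);

	/* create_poster == false simulates task A failing to start. */
	if (create_poster)
		created = pthread_create(&poster, NULL, poster_func, &b) == 0;

	/* Give the poster up to one second before giving up. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1;

	do {
		ret = sem_timedwait(&b.work_done_sem, &deadline);
	} while (ret == -1 && errno == EINTR);

	if (created)
		pthread_join(poster, NULL);
	sem_destroy(&b.work_done_sem);
	return ret;
}
```

In this sketch demo_replay(true) returns 0 because the poster thread
runs, while demo_replay(false) returns -1 once the timeout expires,
which is the point where an untimed sem_wait() would never return.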

Example:

Test environment: x86_64 with 160 cores

Before this patch:

$ perf sched replay
...
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
------------------------------------------------------------ <- dead halt

After this patch:

$ perf sched replay
...
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
$

As shown above, perf sched replay now exits after printing the error
message instead of blocking.

Signed-off-by: Yunlong Song <yunlong.song@xxxxxxxxxx>
---
tools/perf/builtin-sched.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7fe3b3c..3261300 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -451,10 +451,12 @@ static int self_open_counters(void)
 	fd = sys_perf_event_open(&attr, 0, -1, -1,
 				 perf_event_open_cloexec_flag());
 
-	if (fd < 0)
+	if (fd < 0) {
 		pr_err("Error: sys_perf_event_open() syscall returned "
 		       "with %d (%s)\n", fd,
 		       strerror_r(errno, sbuf, sizeof(sbuf)));
+		exit(EXIT_FAILURE);
+	}
 	return fd;
 }

--
1.8.5.2
