Re: Scheduling issue with signal handling

From: Ingo Molnar
Date: Thu Jul 10 2008 - 04:55:29 EST



* Elias Oltmanns <eo@xxxxxxxxxxxxxx> wrote:

> By sprinkling some printk()s all over the place, I've managed to
> establish the following sequence of events taking place in the event
> of delayed signal handling as described above: The first Ctrl+Z event
> enqueues a SIGTSTP signal which eventually results in a call to
> kick_process(). For some reason though, the signal isn't handled
> straight away but remains on the queue for some time. Consequently,
> subsequent Ctrl+Z events result in echoing another ^Z to the terminal
> but everything related to sending a signal is skipped (and rightly so)
> because the kernel detects that a SIGTSTP is still pending.
> Eventually, get_signal_to_deliver() dequeues the SIGTSTP signal and
> the shell propt appears.
>
> My question is this: Even under high disk I/O pressure, the threads
> dealing with I/O to the terminal evidently still get their turn as
> indicated by the sequence of ^Z appearing on screen. Why is it then,
> that the threads which are meant to process the SIGTSTP or SIGINT
> signals aren't scheduled for some seconds and is there a way to change
> this?
>
> Please let me know if there is anything I can try to investigate this
> any further or if you need further information.
>
> Thanks in advance,
>
> Elias
>
> [1] http://lkml.org/lkml/2008/6/28/50

hm, kick_process() is a no-op on !SMP.

Basically, when a new signal is queued and a task is already running, it
will run in due course and process the signal the moment it's scheduled
again. (unless the signal is blocked)

If a task is not already running, then the signal code will wake up the
task and it will then process the signal the moment it's executed.

The maximum latency of a runnable task hitting the CPU is controlled via
/proc/sys/kernel/sched_latency [available if CONFIG_SCHED_DEBUG=y in the
.config] - 20 milliseconds on uniprocessors.

Several seconds of lag is almost out of question and would indicate a
serious scheduler bug, or - which is far more likely - either an
application signal processing hickup or a kernel signal processing
hickup.

If the lag happens with the task you can observe its worst-case
scheduling delay by looking at /proc/<PID>/sched, if you also have
CONFIG_SCHEDSTAT=y in your .config.

For example, a random shell's delays on a testbox:

phoenix:~> grep se.wait_max /proc/$$/sched
se.wait_max : 3.338588

That's 3.3 msecs _worst case_, on a system that has otherwise quite
insane load:

10:53:57 up 2:48, 2 users, load average: 77.56, 94.33, 102.75

So several seconds of delay, if it came from the scheduler, would be
really anomalous.

As a final measure, instead of printk's, you could try the scheduler
tracer in linux-next (CONFIG_CONTEXT_SWITCH_TRACER=y), to have an exact
idea about what is going on and when. (see /debug/tracing/README)

[ You might also want to try CONFIG_FTRACE=y and CONFIG_DYNAMIC_FTRACE=y
for extremely finegrained kernel tracing - available in linux-next
too. ]

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/