Re: [PATCH] x86_64: fix delayed signals

From: Linus Torvalds
Date: Fri Jul 11 2008 - 16:38:59 EST




On Fri, 11 Jul 2008, Linus Torvalds wrote:
>
> Btw, did any of the impacted people test -rc9? Edwin's report is about
> -rc2 and -rc8, and one of the things we fixed since -rc8 is that incorrect
> and unintentional nr_zones zeroing that effectively disabled kswapd - and
> made everybody do synchronous memory freeing when they wanted to allocate
> more memory.. That can play havoc with any interactive stuff.

No, after testing more, I think Edwin is right.

There _does_ seem to be something wrong in signal handling when the signal
happens during 'D' state (although I guess it could also be an artifact of
other scheduling activity). I just put my machine under heavy read load by
having a background process that does

while : ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 5; done

to flush the caches all the time, and then doing endless loops of "git
grep" and other things, and I can definitely re-create the bad latency.

The best example is doing just

[torvalds@woody ~]$ ls -l /usr/bin
^Ctotal 367720
-rwxr-xr-x 1 root root 43648 2008-07-04 09:35 [
[torvalds@woody ~]$

and there are several seconds between the '^C' and the printout of the two
lines (after which the 'ls' finally dies).

Perhaps more interestingly: when this happens (it doesn't happen all the
time), it will _always_ die at the first two lines of printout for me, on
two different machines: ie I *always* get that exact pattern of those two
lines printed out (except when it reacts immediately, or when I press ^C
so late that it has printed out much more).

And it's not some single system call latency that is long. I checked with
strace, and while the IO load is high and causes the whole 'ls' to take a
long time (up to 10 seconds - /usr/bin is *big*), each individual system
call never takes very long. The longest system call latency I saw was 0.24
seconds, which was for some very unlucky lstat() calls. Most of them were
in the millisecond range, even though they obviously had to do IO.

More tellingly, those two writes are simply not even done with a single
system call. The 'strace' for an 'ls -l' shows

...
write(1, "total 367720\n", 13) = 13 <0.000085>
open("/etc/localtime", O_RDONLY) = 3 <0.000016>
fstat(3, {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.000004>
fstat(3, {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 <0.000004>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6d996dd000 <0.000006>
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 4096) = 2819 <0.041416>
lseek(3, -1802, SEEK_CUR) = 1017 <0.000005>
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 4096) = 1802 <0.000006>
close(3) = 0 <0.000006>
munmap(0x7f6d996dd000, 4096) = 0 <0.000014>
write(1, "-rwxr-xr-x 1 root root 436"..., 54) = 54 <0.000008>
...

so there is actually *many* system calls in there between the header line
that does the 'total' and the first line of actual real output, but
despite that, it reacts to the ^C consistently after the second line.

Very odd.

And the problem definitely happens on x86-32 too.

It also happens on the text console, so this is not some bad interaction
with X and pty's and xterm or gnome-terminal or something like that. At
first I thought "no way can that actually happen, I think the terminal is
broken and only sends the signal after it has seen output", but no, it
happens for me in text-mode too.

Oleg added to the Cc as the master of signal handling code.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/